
AI Model

The Curse of Depth in Large Language Models

May 6

15:40 - 16:20

Large Language Models (LLMs) have achieved impressive results. However, recent research has shown that their deeper layers often contribute minimally, with effectiveness diminishing as layer depth increases. This pattern presents significant opportunities for model compression. In the first part of this seminar, we will explore how this phenomenon can be harnessed to improve the efficiency of LLM compression. Despite these opportunities, the underutilization of deeper layers leads to inefficiencies, wasting resources that could be better used to enhance model performance. The second part of the talk will address the root cause of this ineffectiveness in deeper layers and propose a solution. We identify the issue as stemming from the prevalent use of Pre-Layer Normalization (Pre-LN) and introduce LayerNorm Scaling to address it.
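For context on the two normalization schemes mentioned in the abstract, the sketch below contrasts a standard Pre-LN transformer block with a variant whose LayerNorm outputs are damped by a depth-dependent factor. It is a minimal illustration, assuming the scaling factor is 1/sqrt(layer_index); the exact formulation presented in the talk may differ, and names such as `PreLNBlock` and the hyperparameters are illustrative rather than taken from the session.

```python
import math
import torch
import torch.nn as nn


class PreLNBlock(nn.Module):
    """Pre-LN transformer block, optionally with a depth-dependent LayerNorm scale."""

    def __init__(self, d_model: int, n_heads: int, layer_index: int, scale_ln: bool = False):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Assumed form of LayerNorm Scaling: damp the LayerNorm output of deeper
        # layers by 1 / sqrt(layer_index) so their residual contributions do not
        # inflate the variance of the residual stream.
        self.ln_scale = 1.0 / math.sqrt(layer_index) if scale_ln else 1.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-LN: normalize (and optionally rescale) before each sublayer,
        # then add the sublayer output back onto the residual stream.
        h = self.ln_scale * self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln_scale * self.ln2(x))
        return x


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)  # (batch, sequence, d_model)
    plain = PreLNBlock(d_model=64, n_heads=4, layer_index=12)                   # vanilla Pre-LN
    scaled = PreLNBlock(d_model=64, n_heads=4, layer_index=12, scale_ln=True)   # with LayerNorm Scaling
    print(plain(x).shape, scaled(x).shape)  # both: torch.Size([2, 16, 64])
```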

Speakers