
AI Model

The Curse of Depth in Large Language Models

May 6

15:40 - 16:20

Large Language Models (LLMs) have achieved impressive results. However, recent research has shown that their deeper layers often contribute minimally, with effectiveness diminishing as layer depth increases. This pattern presents significant opportunities for model compression. In the first part of this seminar, we will explore how this phenomenon can be harnessed to improve the efficiency of LLM compression. Despite these opportunities, the underutilization of deeper layers leads to inefficiencies, wasting resources that could be better used to enhance model performance. The second part of the talk will address the root cause of this ineffectiveness in deeper layers and propose a solution. We identify the issue as stemming from the prevalent use of Pre-Layer Normalization (Pre-LN) and introduce LayerNorm Scaling to address it.
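For context on the two normalization schemes mentioned in the abstract, the sketch below contrasts a standard Pre-LN transformer block with a variant whose LayerNorm outputs are damped by a depth-dependent factor. It is a minimal illustration, assuming the scaling factor is 1/sqrt(layer_index); the exact formulation presented in the talk may differ, and names such as `PreLNBlock` and the hyperparameters are illustrative rather than taken from the session.

```python
import math
import torch
import torch.nn as nn


class PreLNBlock(nn.Module):
    """Pre-LN transformer block, optionally with a depth-dependent LayerNorm scale."""

    def __init__(self, d_model: int, n_heads: int, layer_index: int, scale_ln: bool = False):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Assumed form of LayerNorm Scaling: damp the LayerNorm output of deeper
        # layers by 1 / sqrt(layer_index) so their residual contributions do not
        # inflate the variance of the residual stream.
        self.ln_scale = 1.0 / math.sqrt(layer_index) if scale_ln else 1.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-LN: normalize (and optionally rescale) before each sublayer,
        # then add the sublayer output back onto the residual stream.
        h = self.ln_scale * self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln_scale * self.ln2(x))
        return x


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)  # (batch, sequence, d_model)
    plain = PreLNBlock(d_model=64, n_heads=4, layer_index=12)                   # vanilla Pre-LN
    scaled = PreLNBlock(d_model=64, n_heads=4, layer_index=12, scale_ln=True)   # with LayerNorm Scaling
    print(plain(x).shape, scaled(x).shape)  # both: torch.Size([2, 16, 64])
```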

Speakers