[Paper Review] Mistral 7B

paper: Jiang, Albert Q., et al. "Mistral 7B." arXiv preprint arXiv:2310.06825 (2023)

[Abstract]

Mistral 7B outperforms Llama 2 13B on all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation
It uses Grouped-Query Attention (GQA) and Sliding Window Attention (SWA)

1. Introduction

Mistral 7B aims for a balance between performance and efficiency
GQA: speeds up inference and reduces memory requirements during decoding, allowing higher batch sizes
SWA: handles longer sequences at a lower computational cost

2. Architectural details

Grouped Query Attention
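In GQA, query heads are split into groups and every head in a group shares one key/value head, so the KV cache only stores `n_kv_heads` heads instead of `n_heads`. The pure-Python sketch below shows a single decoding step under that scheme. The function names are hypothetical, and the configuration used at the end (32 layers, 32 query heads, 8 KV heads, head dimension 128, fp16) is an assumption based on the parameters reported for Mistral 7B, not something stated in this post.

```python
import math


def gqa_decode_step(q, k_cache, v_cache):
    """One decoding step of grouped-query attention (illustrative sketch).

    q:       [n_heads][head_dim]              query vectors for the current token
    k_cache: [n_kv_heads][seq_len][head_dim]  cached keys, one stream per KV head
    v_cache: [n_kv_heads][seq_len][head_dim]  cached values, one stream per KV head
    returns: [n_heads][head_dim]              attention output per query head
    """
    n_heads, n_kv_heads = len(q), len(k_cache)
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group_size = n_heads // n_kv_heads
    out = []
    for h, q_vec in enumerate(q):
        kv = h // group_size  # consecutive query heads share one K/V head
        keys, values = k_cache[kv], v_cache[kv]
        scale = 1.0 / math.sqrt(len(q_vec))
        scores = [scale * sum(a * b for a, b in zip(q_vec, k)) for k in keys]
        # Numerically stable softmax over the cached positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        head_dim = len(values[0])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(head_dim)])
    return out


def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """Bytes of K and V cached per token across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Assumed config: 32 layers, head_dim 128, fp16 (2 bytes per element).
gqa_bytes = kv_cache_bytes_per_token(32, 8, 128, 2)   # 8 KV heads (GQA)
mha_bytes = kv_cache_bytes_per_token(32, 32, 128, 2)  # 32 KV heads (standard multi-head)
print(gqa_bytes, mha_bytes, mha_bytes // gqa_bytes)  # prints: 131072 524288 4
```

Under these assumed numbers, letting 4 query heads share each K/V head cuts the per-token KV cache by 4x, which is the memory saving that makes the higher batch sizes possible.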

Sliding Window Attention
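- Previously used in Longformer
- In each layer, the hidden state at position $i$ attends only to the previous layer's hidden states at positions $i - W$ through $i$
- Stacking layers still lets information from tokens outside the window flow in, similar to the receptive field of a CNN
- After $k$ layers, a token can draw on up to $W \times k$ tokens (with $W = 4096$ and 32 layers, a theoretical attention span of about 131K tokens)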
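
To see why stacking layers extends the reach to $W \times k$ tokens, the sketch below tracks which input positions can influence a given position after $k$ sliding-window layers. It assumes, following the paper's description, that position $i$ in one layer reads positions $i - W$ through $i$ of the layer below; the function name is hypothetical.

```python
def reachable_inputs(pos: int, window: int, num_layers: int) -> set[int]:
    """Input positions whose information can reach `pos` after `num_layers` SWA layers.

    Assumes each layer lets position q attend to positions q - window .. q
    of the previous layer (causal sliding window).
    """
    frontier = {pos}
    for _ in range(num_layers):
        # Expand every currently reachable position by one window to the left.
        frontier = {k for q in frontier for k in range(max(0, q - window), q + 1)}
    return frontier


# A single layer with W = 4 sees positions 96..100 for token 100.
print(min(reachable_inputs(100, 4, 1)))  # prints 96
# Three stacked layers reach back W * k = 12 positions.
print(min(reachable_inputs(100, 4, 3)))  # prints 88
```

The reach grows linearly with depth while each layer's attention cost stays proportional to $W$; assuming $W = 4096$ and 32 layers, the bound is $4096 \times 32 = 131{,}072$ tokens.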

Rolling Buffer Cache
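- The cache has a fixed size of $W$
- The key and value for position $i$ are stored at slot $i \bmod W$ of the cache
- Once the position goes past $W$, older entries are overwritten, so the cache never holds more than $W$ entries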
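
A minimal sketch of the rolling buffer, assuming plain Python lists as storage and 0-indexed positions. The class and method names are hypothetical; a real implementation would hold tensors per layer and per KV head.

```python
class RollingBufferCache:
    """Fixed-size KV cache: the entry for position i lives in slot i % window."""

    def __init__(self, window: int):
        self.window = window
        self.keys = [None] * window
        self.values = [None] * window
        self.positions = [None] * window  # absolute position currently held by each slot

    def store(self, pos: int, key, value) -> None:
        slot = pos % self.window  # once pos >= window, this overwrites an old entry
        self.keys[slot] = key
        self.values[slot] = value
        self.positions[slot] = pos

    def ordered(self):
        """Cached (position, key, value) triples from oldest to newest."""
        filled = [i for i, p in enumerate(self.positions) if p is not None]
        filled.sort(key=lambda i: self.positions[i])
        return [(self.positions[i], self.keys[i], self.values[i]) for i in filled]


cache = RollingBufferCache(window=4)
for pos in range(6):
    cache.store(pos, f"k{pos}", f"v{pos}")
print([p for p, _, _ in cache.ordered()])  # prints [2, 3, 4, 5]
print(cache.keys)  # prints ['k4', 'k5', 'k2', 'k3']
```

Memory stays at $W$ entries no matter how long generation runs. The slots end up in rotated order, which is why `ordered()` sorts by the stored positions when chronological order is needed.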

Pre-fill and Chunking
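- Generation still predicts tokens one by one, but the prompt is known in advance, so the $(k, v)$ cache can be pre-filled with it; a very long prompt is split into chunks and the cache is pre-filled one chunk at a time
- Each chunk is processed in a single pass rather than token by token; the window size can serve as the chunk size, and for each chunk, attention is computed over both the cache and the chunk itself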
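
The sketch below builds the attention mask for each prefill chunk: queries are the chunk's positions, and keys are the up-to-$W$ positions still held in the cache followed by the chunk itself. It assumes the chunk size equals the window size and that query $i$ may attend to key $j$ when $i - W \le j \le i$; the function names are hypothetical.

```python
def prefill_chunk_mask(start: int, end: int, window: int):
    """Keys and boolean mask for prefilling prompt positions start..end-1.

    Keys are the positions still held in a rolling cache of size `window`
    (those just before `start`) followed by the chunk's own positions.
    mask[r][c] is True when query start + r may attend to keys[c], using the
    causal sliding-window rule q - window <= k <= q.
    """
    cached = list(range(max(0, start - window), start))
    keys = cached + list(range(start, end))
    mask = [[0 <= q - k <= window for k in keys] for q in range(start, end)]
    return keys, mask


def prefill(prompt_len: int, window: int):
    """Process a prompt chunk by chunk, using the window size as the chunk size."""
    plan = []
    for start in range(0, prompt_len, window):
        end = min(start + window, prompt_len)
        keys, mask = prefill_chunk_mask(start, end, window)
        plan.append((start, end, keys, mask))
    return plan


for start, end, keys, mask in prefill(prompt_len=10, window=4):
    print(f"chunk {start}..{end - 1}: keys {keys}")
# chunk 0..3: keys [0, 1, 2, 3]
# chunk 4..7: keys [0, 1, 2, 3, 4, 5, 6, 7]
# chunk 8..9: keys [4, 5, 6, 7, 8, 9]
```

Each chunk is one forward pass over at most $2W$ keys. In the second chunk, for example, query 7 masks out keys 0 to 2 because they fall outside its window. Once the prompt is in the cache, new tokens are still generated one at a time.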

3. Results

Various tasks
- Commonsense Reasoning
- World Knowledge
- Reading Comprehension
- Math
- Code
- Popular aggregated results
Models
- Mistral 7B
- Llama 2 7B / 13B
- Code-Llama 7B
- Llama 1 34B
Results
- Mistral 7B outperforms Llama 2 13B on all metrics
- It also outperforms Llama 1 34B on most benchmarks
- Its advantage is especially clear on code, math, and reasoning benchmarks

4. Instruction Finetuning

Mistral 7B - Instruct
- Fine-tuned on instruction datasets publicly available on Hugging Face
- Outperforms Llama 2 13B Chat and is comparable to other 13B chat models such as Vicuna 13B

MLZoo
