All Posts (10)
[Paper Review] Training Verifiers to Solve Math Word Problems (GSM8K)
paper: Cobbe, Karl, et al. "Training verifiers to solve math word problems." arXiv preprint arXiv:2110.14168 (2021).
link: https://arxiv.org/abs/2110.14168
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models..
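
The core recipe in this paper is best-of-N sampling with a learned verifier: the generator proposes many candidate solutions, a separately trained verifier scores each one, and the top-scored solution is returned. Below is a minimal sketch of that selection step; `generate_solutions` and `verifier_score` are hypothetical placeholders for the paper's finetuned generator and verifier, not their actual APIs.

```python
# Best-of-N selection with a verifier, as in GSM8K-style solution reranking.
# `generate_solutions` and `verifier_score` are hypothetical stand-ins.
import random
from typing import Callable, List

def best_of_n(
    problem: str,
    generate_solutions: Callable[[str, int], List[str]],
    verifier_score: Callable[[str, str], float],
    n: int = 100,
) -> str:
    """Sample n candidate solutions and return the one the verifier ranks highest."""
    candidates = generate_solutions(problem, n)
    return max(candidates, key=lambda sol: verifier_score(problem, sol))

if __name__ == "__main__":
    # Toy usage with dummy components; the real ones are LLMs / learned verifiers.
    dummy_gen = lambda q, n: [f"reasoning chain #{i} ... answer: {random.randint(1, 20)}" for i in range(n)]
    dummy_verifier = lambda q, sol: random.random()  # would be P(correct | question, solution)
    print(best_of_n("Natalia sold clips to 48 friends...", dummy_gen, dummy_verifier, n=8))
```
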
[Paper Review] Android in the Zoo: Chain-of-Action-Thought for GUI Agents (AITZ)
paper: Zhang, Jiwen, et al. "Android in the zoo: Chain-of-action-thought for GUI agents." arXiv preprint arXiv:2403.02713 (2024).
link: https://arxiv.org/abs/2403.02713
Large language models (LLMs) have led to a surge of autonomous GUI agents for smartphones, which complete a task triggered by natural language by predicting a sequence of ac..
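
Chain-of-Action-Thought annotates each GUI step with intermediate context (screen description, reasoning about what to do, the resulting outcome) before the next action is predicted. The sketch below shows one way such an annotated step could be represented; the field names are illustrative and not the AITZ dataset's actual schema.

```python
# Illustrative record for a Chain-of-Action-Thought (CoAT) style GUI step.
# Field names are illustrative, not the released dataset's schema.
from dataclasses import dataclass

@dataclass
class CoATStep:
    screen_description: str   # what is currently visible on the phone screen
    action_think: str         # reasoning about what to do next, given the goal
    next_action: str          # the concrete GUI action to execute
    action_result: str        # observed outcome after executing the action

step = CoATStep(
    screen_description="Home screen with a search bar and app icons.",
    action_think="The task asks to check the weather, so open the search bar first.",
    next_action='CLICK(element="search bar")',
    action_result="Keyboard appears with an empty query field.",
)
print(step.next_action)
```
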
[Paper Review] Mistral 7B
paper: Jiang, Albert Q., et al. "Mistral 7B." arXiv preprint arXiv:2310.06825 (2023).
link: https://arxiv.org/abs/2310.06825
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our m..
[Paper Review] Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models (ReST^EM)
paper: Singh, Avi, et al. "Beyond human data: Scaling self-training for problem-solving with language models." arXiv preprint arXiv:2312.06585 (2023).
link: https://arxiv.org/abs/2312.06585
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models ..
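
ReST^EM alternates an E-step (sample many solutions per problem and keep only those whose final answer passes a correctness check) with an M-step (fine-tune the base model on the filtered samples), repeating for a few iterations. A schematic of that loop, assuming placeholder `sample`, `is_correct`, and `finetune` callables in place of the actual model APIs:

```python
# Schematic ReST^EM-style self-training loop: generate, filter by a binary
# correctness reward, fine-tune, and repeat. All callables are placeholders.
from typing import Callable, List, Tuple

def rest_em(
    problems: List[Tuple[str, str]],                     # (question, reference answer) pairs
    sample: Callable[[object, str, int], List[str]],     # model, question, k -> k sampled solutions
    is_correct: Callable[[str, str], bool],              # solution, reference -> binary reward
    finetune: Callable[[object, List[Tuple[str, str]]], object],  # data -> updated model
    base_model: object,
    iterations: int = 3,
    k: int = 32,
):
    model = base_model
    for _ in range(iterations):
        # E-step: collect self-generated solutions that pass the correctness check.
        dataset = []
        for question, reference in problems:
            for solution in sample(model, question, k):
                if is_correct(solution, reference):
                    dataset.append((question, solution))
        # M-step: fine-tune from the base model on the filtered data
        # (the paper restarts from the base model each iteration to limit drift).
        model = finetune(base_model, dataset)
    return model
```
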
[Paper Review] AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training (TS-LLM)
paper: Feng, Xidong, et al. "Alphazero-like tree-search can guide large language model decoding and training." arXiv preprint arXiv:2309.17179 (2023).
link: https://arxiv.org/abs/2309.17179
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment the reasoning capabilities of LLMs by usin..
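
TS-LLM couples an LLM policy with a learned value network and runs an AlphaZero-style tree search over partial generations, so decoding (and later training) is guided by search rather than plain sampling. The sketch below is a much-simplified, greedy best-first variant of value-guided search; `propose_continuations` and `value` are hypothetical stand-ins for the paper's policy and value models, and the real method uses MCTS-style selection, expansion, and backup.

```python
# Simplified value-guided search over partial generations (best-first, not full MCTS).
# `propose_continuations` and `value` are hypothetical stand-ins for the policy/value models.
import heapq
from typing import Callable, List

def value_guided_search(
    prompt: str,
    propose_continuations: Callable[[str], List[str]],  # partial text -> candidate next chunks
    value: Callable[[str], float],                       # partial text -> estimated future reward
    is_terminal: Callable[[str], bool],
    max_expansions: int = 50,
) -> str:
    # Always expand the partial generation with the highest estimated value.
    frontier = [(-value(prompt), prompt)]
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_v, state = heapq.heappop(frontier)
        if is_terminal(state):
            return state
        for chunk in propose_continuations(state):
            child = state + chunk
            heapq.heappush(frontier, (-value(child), child))
    # Fall back to the best partial generation found so far.
    return min(frontier)[1] if frontier else prompt
```
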
[Paper Review] Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding (PPO-MCTS)
paper: Liu, Jiacheng, et al. "Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding." First Conference on Language Modeling. 2024.
link: https://arxiv.org/abs/2309.15028
[Abstract] The value model obtained from PPO serves as useful guidance for text decoding. 1. Preliminaries. Guided Decoding: for a given goal, s_t = (w, x_.. PPO: Policy objective & Value objective. Polic..
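
The idea is that the value model trained alongside the PPO policy (and normally discarded after training) can steer decoding so the generated text better satisfies the reward the policy was trained on. The toy sketch below only shows value-guided scoring of candidate next tokens; the actual paper runs full MCTS with PUCT selection and backup, and `policy_logprob` / `value_estimate` are hypothetical stand-ins for the PPO policy and value heads.

```python
# Toy value-guided next-token selection: combine the PPO policy's log-probability
# with the PPO value head's estimate of future reward (the real method runs MCTS).
from typing import Callable, List

def guided_step(
    prefix: List[int],
    candidate_tokens: List[int],
    policy_logprob: Callable[[List[int], int], float],   # log pi(token | prefix)
    value_estimate: Callable[[List[int]], float],        # V(prefix + [token])
    value_weight: float = 1.0,
) -> int:
    """Pick the candidate token with the best combined policy + value score."""
    def score(tok: int) -> float:
        return policy_logprob(prefix, tok) + value_weight * value_estimate(prefix + [tok])
    return max(candidate_tokens, key=score)
```
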
[Paper Review] ReFT: Reasoning with Reinforced Fine-Tuning
paper: Luong Trung, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li. "ReFT: Reasoning with Reinforced Fine-Tuning." Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. (aclanthology.org)
[Abstract] SFT alone on CoT data yields weak generalization. ReFT: after warming up with SFT, o..
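
ReFT first warms the model up with ordinary SFT on CoT data, then continues with reinforcement learning: reasoning chains are sampled online and rewarded when their final answer matches the gold answer, which improves generalization over SFT alone. A compact sketch of the rollout collection and reward computation, with all model calls left as placeholders; the paper optimizes this reward with PPO.

```python
# Sketch of the ReFT-style RL stage after SFT warmup: sample CoT rollouts and
# assign a terminal reward based on final-answer correctness.
# `sample_cot` and `extract_answer` are placeholders, not the paper's code.
from typing import Callable, List, Tuple

def collect_rollouts(
    model: object,
    batch: List[Tuple[str, str]],                 # (question, gold answer) pairs
    sample_cot: Callable[[object, str], str],     # question -> sampled reasoning chain
    extract_answer: Callable[[str], str],         # reasoning chain -> predicted final answer
) -> List[Tuple[str, str, float]]:
    rollouts = []
    for question, gold in batch:
        chain = sample_cot(model, question)
        # Terminal reward: 1 if the extracted final answer matches the gold answer, else 0.
        reward = 1.0 if extract_answer(chain) == gold else 0.0
        rollouts.append((question, chain, reward))
    return rollouts  # these (question, chain, reward) triples would feed a PPO update
```
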