All Posts (7)

[Paper Review] Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models (ReST^EM)
paper: Singh, Avi, et al. "Beyond human data: Scaling self-training for problem-solving with language models." arXiv preprint arXiv:2312.06585 (2023).
link: https://arxiv.org/abs/2312.06585

[Paper Review] AlphaZero-Like Tree-Search Can Guide Large Language Model Decoding and Training (TS-LLM)
paper: Feng, Xidong, et al. "Alphazero-like tree-search can guide large language model decoding and training." arXiv preprint arXiv:2309.17179 (2023).
link: https://arxiv.org/abs/2309.17179

[Paper Review] Android in the Zoo: Chain-of-Action-Thought for GUI Agents (AITZ)
paper: Zhang, Jiwen, et al. "Android in the zoo: Chain-of-action-thought for gui agents." arXiv preprint arXiv:2403.02713 (2024).
link: https://arxiv.org/abs/2403.02713

[Paper Review] Don't Throw Away Your Value Model! Generating More Preferable Text with Value-Guided Monte-Carlo Tree Search Decoding (PPO-MCTS)
paper: Liu, Jiacheng, et al. "Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding." First Conference on Language Modeling. 2024.
link: https://arxiv.org/abs/2309.15028
[Abstract] The value model obtained from PPO provides effective guidance for text decoding. ..

[Paper Review] ReFT: Reasoning with Reinforced Fine-Tuning
paper: Trung, Luong, et al. "ReFT: Reasoning with Reinforced Fine-Tuning." Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024.
[Abstract] SFT on CoT data alone yields weak generalization. ReFT: after warming up with SFT, ..

[Paper Review] Multimodal Chain-of-Thought Reasoning in Language Models (MM-CoT)
paper: Zhang, Zhuosheng, et al. "Multimodal chain-of-thought reasoning in language models." arXiv preprint arXiv:2302.00923 (2023).
link: https://arxiv.org/abs/2302.00923

[Paper Review] Reflexion: Language Agents with Verbal Reinforcement Learning
paper: Shinn, Noah, et al. "Reflexion: Language agents with verbal reinforcement learning." Advances in Neural Information Processing Systems 36 (2024).
link: https://proceedings.neurips.cc/paper_files/paper/2023/hash/1b44b878bb782e6954cd888628510e90-Abstract-Conference.html