In our previous work, “Surpassing GPT-4: Exploring How Agent Workflows Forge the Next Frontier of LLM Performance!”, we uncovered a crucial element propelling LLMs to new heights: the agent workflow. This system encompasses four patterns: Reflection, Tool Usage, Planning, and Multi-Agent Collaboration, laying the foundation for significant advancements in LLM performance. Now, let’s journey deeper into the Reflection agent workflow pattern.
Related works
Reflection hinges on an LLM scrutinizing its own past outputs and proposing improvements. Three seminal papers elaborate on this pattern and offer invaluable insights:
- Self-Refine: Iterative Refinement with Self-Feedback
- CRITIC: Empowering Large Language Models with Tool-Interactive Critiquing
- Reflexion: Nurturing Language Agents through Verbal Reinforcement Learning
Self-Refine
True to its name, Self-Refine is the art of self-optimization for LLMs. It employs a single LLM as generator, feedback provider, and refiner: the model produces an initial output, critiques it, and then revises it over iterative rounds using its own feedback. The paper demonstrates Self-Refine’s effectiveness across diverse tasks such as dialogue response generation, mathematical reasoning, and code generation, evaluated with state-of-the-art LLMs like GPT-3.5 and GPT-4. Notably, Self-Refine surpasses traditional single-step generation on all evaluation tasks with the same LLM, delivering an average performance boost of roughly 20%.
To illustrate, consider using ChatGPT to write a simple Python function and then refining it through self-feedback.
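Here is a minimal sketch of that generate-feedback-refine loop. The `chat()` helper is a hypothetical stand-in for a single-turn call to your LLM of choice, and the prompts and stopping criterion are illustrative assumptions, not the paper’s exact implementation.

```python
# Minimal sketch of the Self-Refine loop: generate -> feedback -> refine,
# all with the same LLM.

def chat(prompt: str) -> str:
    """Hypothetical single-turn LLM call; plug in a real client (e.g., the OpenAI SDK)."""
    raise NotImplementedError

def self_refine(task: str, max_iters: int = 3) -> str:
    # 1. Initial generation by the same LLM that will later critique it.
    output = chat(f"Write a Python function for this task:\n{task}")

    for _ in range(max_iters):
        # 2. The same model gives feedback on its own output.
        feedback = chat(
            "Review the following code and list concrete improvements "
            f"(correctness, edge cases, readability):\n{output}"
        )
        # 3. Illustrative stopping criterion (not from the paper).
        if "no further improvements" in feedback.lower():
            break
        # 4. The same model refines the output using its own feedback.
        output = chat(
            f"Task:\n{task}\n\nCurrent code:\n{output}\n\n"
            f"Feedback:\n{feedback}\n\nRewrite the code applying the feedback."
        )
    return output
```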
CRITIC
The authors introduce CRITIC, a framework that enables large language models (LLMs) to verify and correct their own outputs by interacting with external tools, much as humans do.
This open-source framework checks specific aspects of an initial output through interactions with tools such as search engines and code interpreters, then revises the output based on the feedback gathered during verification, repeating the cycle as needed. In evaluations across tasks like free-form question answering and mathematical program synthesis, CRITIC consistently improves LLM performance.
Notably, it achieves top scores across various datasets for mathematical evaluations.
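The verify-then-correct loop can be sketched as follows, with a Python interpreter serving as the external tool for a program-synthesis task. This reuses the hypothetical `chat()` helper from the Self-Refine sketch, and the prompts are illustrative assumptions rather than CRITIC’s exact templates.

```python
import subprocess
import sys

def chat(prompt: str) -> str:
    """Hypothetical single-turn LLM call (same stub as in the Self-Refine sketch)."""
    raise NotImplementedError

def run_code(code: str) -> str:
    """External tool: execute candidate code and capture its output and errors."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=10
    )
    return result.stdout + result.stderr

def critic(task: str, max_iters: int = 3) -> str:
    # Initial answer from the LLM.
    code = chat(f"Write a Python program that solves:\n{task}")

    for _ in range(max_iters):
        # Verify: ground the critique in real tool feedback, not the model's own guess.
        tool_feedback = run_code(code)
        critique = chat(
            f"Task:\n{task}\n\nProgram:\n{code}\n\n"
            f"Execution result:\n{tool_feedback}\n\n"
            "Does the program solve the task correctly? If not, explain the problem."
        )
        if "correct" in critique.lower():
            break
        # Correct: revise the output using the tool-grounded critique.
        code = chat(
            f"Task:\n{task}\n\nProgram:\n{code}\n\nCritique:\n{critique}\n\n"
            "Rewrite the program to fix the issues."
        )
    return code
```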
Reflexion
ReAct, SayCan, Toolformer, HuggingGPT, and WebGPT have shown the feasibility of building automated decision-making agents on top of LLMs. However, these methods often rely only on short-term memory, which limits their performance. Reflexion marks a paradigm shift in agent learning. By harnessing verbal reinforcement learning, Reflexion enables agents to learn from past failures, converting binary or scalar environmental feedback into textual self-reflections for future episodes. This self-reflective feedback acts as a semantic gradient signal, guiding agents toward optimal performance, much as humans master skills through practice and reflection.
As illustrated in the paper, the Reflexion framework leverages three distinct models: the Actor, the Evaluator, and the Self-Reflection model.
- The Actor generates text and actions based on observed states.
- The Evaluator computes reward scores for the Actor’s outputs.
- The Self-Reflection model generates verbal prompts to aid the Actor in self-improvement.
This process employs both short-term and long-term memory. Trajectory history functions as short-term memory, while the Self-Reflection model’s outputs are stored in long-term memory.
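The interaction between these components can be sketched as a single episode loop. This is a minimal illustration under assumed prompts and a toy environment interface (`env.step`, `evaluate`), not the official Reflexion implementation, whose source we will read in the next article.

```python
def chat(prompt: str) -> str:
    """Hypothetical single-turn LLM call; plug in a real client here."""
    raise NotImplementedError

def reflexion_episode(task, env, evaluate, long_term_memory, max_steps=10):
    trajectory = []  # short-term memory: the current episode's history

    for _ in range(max_steps):
        # Actor: choose the next action from the task, past reflections, and trajectory.
        lessons = "\n".join(long_term_memory)
        action = chat(
            f"Task: {task}\nLessons from past attempts:\n{lessons}\n"
            f"Trajectory so far:\n{trajectory}\nNext action:"
        )
        observation, done = env.step(action)  # assumed toy environment interface
        trajectory.append((action, observation))
        if done:
            break

    # Evaluator: score the trajectory (a heuristic, unit tests, or an LLM judge).
    reward = evaluate(trajectory)

    if reward < 1.0:
        # Self-Reflection model: turn the sparse reward into a verbal lesson,
        # stored in long-term memory for the next episode.
        reflection = chat(
            f"Task: {task}\nTrajectory:\n{trajectory}\nReward: {reward}\n"
            "Explain briefly what went wrong and what to try differently next time."
        )
        long_term_memory.append(reflection)

    return reward, trajectory
```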
Through Reflexion, decision-making capabilities improved by 22% on AlfWorld tasks, 20% on HotPotQA reasoning questions, and 11% on HumanEval programming tasks. For instance, it outperforms GPT-4 by roughly 11 points on HumanEval (PY), rising from 80.1 to an impressive 91.0.
Embrace the journey as we unravel the intricacies of the Reflexion framework and its transformative impact on LLM agents.
Next steps
In our exploration of LLMs and agent workflows, we’ve encountered game-changing tools like Self-Refine and CRITIC. Yet it’s Reflexion that truly stands out, promising to revolutionize LLM performance through verbal reinforcement.
Next up, we’ll dissect Reflexion’s source code, delving into its inner workings. From decoding environmental feedback to understanding its actor-evaluator-self-reflection model, we’re poised to unveil the secrets behind its transformative impact on LLM agents.
Stay tuned, give it a clap if you found this article helpful, drop a comment to share your thoughts, and don’t forget to follow me for the latest updates!