Tree of Thought Prompting - Walking the Path of a Unique Approach to Problem-Solving

Tree of Thought Prompting is a method for enhancing problem-solving in AI language models. Understand how this approach uses a tree search mechanism to generate, critique, and prune thoughts, thereby navigating the solution space more effectively.


A key area of interest in generative AI lies in the improvement of prompting techniques used with large language models. These models are tasked with generating human-like text based on provided prompts, aiming to create responses that are coherent, contextually accurate, and useful. This essay delves into three distinct types of prompting: Input Output Prompting, Chain of Thought Prompting, and Tree of Thought Prompting. Each of these methods has its merits and limitations, which will be examined alongside their applicability to AI development.

The integration of the "Tree of Thoughts" methodology into large language models significantly enhances their problem-solving capabilities, outperforming traditional input-output methods across a diverse range of tasks. By implementing a blend of planning, decision-making mechanisms, and a self-reflection system, these models can tackle complex problems with far greater efficiency.

Summary of Concept

Tree of Thought (ToT) is a framework proposed for improving problem-solving and reasoning abilities in large language models (LLMs) like GPT. The key ideas are:

  1. Represent the reasoning process as a tree, where each node is an intermediate "thought" or coherent piece of reasoning that serves as a step towards the final solution. For example, in mathematical problem-solving, each thought could be an equation.
  2. Actively generate multiple possible thoughts at each step, rather than just sampling one thought sequentially as in chain-of-thought prompting. This allows the model to explore diverse reasoning paths.
  3. Evaluate the promise of different thoughts/nodes using the LLM itself, by prompting it to assess the validity or likelihood of success of each thought. This provides a heuristic to guide the search through the reasoning tree.
  4. Use deliberate search algorithms like breadth-first search or depth-first search to systematically explore the tree of thoughts. Unlike chain of thought, ToT can look ahead, backtrack, and branch out to consider different possibilities.
  5. The overall framework is general and modular - the thought representation, generation, evaluation, and search algorithm can all be customized for different problems. No extra training of models is needed.
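
The five ideas above can be sketched as a compact search loop. The following is a minimal, hypothetical sketch: the generator and evaluator are stub functions standing in for real LLM calls, and the thought format and scoring heuristic are placeholder assumptions, not the paper's implementation.

```python
# Minimal Tree of Thoughts sketch. generate_thoughts and evaluate_thought are
# stubs standing in for LLM prompts; a real system would call a model here.

def generate_thoughts(state, k=3):
    """Stub generator: propose k candidate next thoughts for a partial path."""
    return [state + [f"step-{len(state)}.{i}"] for i in range(k)]

def evaluate_thought(state):
    """Stub evaluator: score a partial reasoning path (higher = more promising).
    A real system would prompt the LLM to assess the thought's validity."""
    return 1.0 / (1 + len(state))  # placeholder heuristic

def tree_of_thoughts(max_depth=3, beam_width=2):
    """Breadth-first search over thoughts, keeping the best beam_width states."""
    frontier = [[]]  # root: empty reasoning path
    for _ in range(max_depth):
        candidates = [t for s in frontier for t in generate_thoughts(s)]
        candidates.sort(key=evaluate_thought, reverse=True)
        frontier = candidates[:beam_width]  # prune to the most promising states
    return frontier[0]

print(tree_of_thoughts())
```

The same skeleton accommodates different tasks by swapping in task-specific generation and evaluation prompts, which is exactly the modularity point 5 describes.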

In experiments, ToT greatly improved performance over chain-of-thought prompting on problems that require search, such as the Game of 24, creative writing, and mini crosswords. Key benefits are the ability to explore multiple reasoning paths and to leverage the LM's own assessments to guide the search, instead of relying on left-to-right token generation alone. The results demonstrate the promise of incorporating more deliberate, human-like planning and metacognition into LLMs.

Decoding the Essence of the Paper

The paper in question, titled "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", seeks to enhance the problem-solving abilities of these behemoths of AI. How do large language models work, you ask? Well, they essentially predict the next token in a sequence of text, based on a given prompt. Sounds simple, right? But the reality is a little more complicated than that.


Highlighting the Current Model's Shortcomings

Unfortunately, current models are often confined to token-level, left-to-right decision-making during inference. They tend to stumble when faced with tasks requiring strategic exploration, where initial decisions can play a pivotal role in the overall problem-solving process. That's where the concept of a 'tree of thought' comes into play.

Input Output Prompting and Its Limitations

Input Output Prompting is the most fundamental method of interacting with a language model. This method involves providing the model with a specific task and, optionally, defining the desired format of the output. The task could range from writing an email to a boss to executing a classification task, which the model should then complete in the requested format.

A significant advantage of this method lies in its simplicity and straightforwardness, as it essentially requires the model to generate an output based on the given input. However, this technique comes with its limitations. For instance, it doesn't support intermediate steps of problem-solving, nor does it provide a clear process for arriving at a particular response. This lack of transparency can make it difficult to understand the reasoning behind the model's output, which can lead to challenges in debugging or refining the model's results.

Chain of Thought Prompting and Its Limitations

The Chain of Thought Prompting method seeks to improve upon Input Output Prompting by instructing the model to make and display intermediate steps. Instead of just providing input and receiving an output, the model is guided through a problem-solving process, where it outlines its 'thoughts' or steps leading up to the final answer.

This approach has the advantage of offering more insight into the problem-solving process of the AI model, which can enhance our understanding of the model's workings. However, it is still subject to limitations. While the method provides a structured way for the model to compute and solve the given task, it's not fully understood why exactly this method leads to better solutions. Hypotheses revolve around the idea of a 'scratch pad' for the model to write down its thoughts or a 'longer time to compute,' but more research is needed in this area.

Enter Tree of Thought Prompting

The Tree of Thought Prompting takes a more comprehensive approach to problem-solving by allowing multiple iterations of the Chain of Thought approach. It involves generating multiple 'thoughts' or problem-solving steps for a given prompt and then using the AI model to critique these steps based on their suitability to solve the original problem.

This method brings two primary benefits to the table. Firstly, it leverages the model's strength in evaluating the coherence between two things, which often outperforms the generation of new content. Secondly, by generating multiple thoughts and having the model critique them, the method offers a way to prune less suitable options and enhance the final output quality.
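
The critique-and-prune step can be illustrated with a small sketch. The scorer below is a stub standing in for an LLM call (for example, "rate how well this step addresses the problem"); the word-overlap heuristic, the sample problem, and the candidate thoughts are all hypothetical placeholders.

```python
# Score candidate thoughts against the problem, then discard the weakest.
# score_thought is a stub for an LLM critique prompt.

def score_thought(problem, thought):
    """Stub critic: crude word-overlap score; a real system would prompt the
    model to rate the thought and parse a numeric answer."""
    return len(set(problem.split()) & set(thought.split()))

def prune(problem, thoughts, keep=2):
    """Keep only the `keep` highest-scoring thoughts."""
    ranked = sorted(thoughts, key=lambda t: score_thought(problem, t), reverse=True)
    return ranked[:keep]

problem = "plan a trip to the mountains"
thoughts = [
    "book a hotel near the mountains",
    "buy a new phone",
    "plan the driving route to the trailhead",
]
print(prune(problem, thoughts))  # the off-topic thought is pruned
```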

Delving into the Experimental Framework

To put their theories to the test, the researchers employed tree search to explore candidate solutions and backtrack from dead ends. They tested their 'tree of thought' in a variety of scenarios including the Game of 24, creative writing, and crossword puzzles. On the Game of 24, their method achieved a success rate of 74%, marking a significant improvement in problem-solving capabilities.

From Input-Output Prompting to Chain of Thought

The Chain of Thought approach takes the concept of input-output prompting to the next level. It asks the large language model to explain, step by step, the reasoning behind its conclusion or solution, fostering a self-consistent train of thought.

The 'Tree of Thought': A Deeper Look

The Tree of Thought asks the large language model to consider multiple solutions at each level, exploring nodes and diving deep into permutations. The model is encouraged to continually evaluate its intermediate results, refining its decision-making process and boosting its confidence in the chosen solution.

Understanding the Complexity of the 'Tree of Thought'

The 'Tree of Thought' might sound complex, and that's because it is. Implementing it requires an in-depth understanding of coding and the ability to manipulate symbols. Yet, despite its complexity, the model's self-consistency and depth of reasoning have shown how effectively it tackles deep mathematical problems.

Breaking Down Problems - The Power of Decomposition

One of the keys to this approach is the ability to break down problems into smaller, manageable pieces - similar to how a human brain works. This problem decomposition allows the model to tackle each individual segment of a complex math problem and build upon each solution.

Thought Decomposition and Evaluation - The Path to Better Solutions

This approach doesn't just stop at breaking down the problem; it also applies thought decomposition. It generates intermediate steps and potential solutions, which are then evaluated to determine whether they're on the right path or not. This constant evaluation allows for dynamic decision-making, improving the model's problem-solving capabilities.

The Evaluator: The Large Language Model's Judge

A critical component of the process is the evaluator, which assesses potential solutions at each intermediate step. This in-depth evaluation helps the large language model determine whether a potential solution is viable or whether an alternative should be proposed.

Deliberate Reasoning: The Final Piece of the Puzzle

The goal is to enable the large language model to deliberately reason its way to a solution. Rather than being bound by pre-programmed rules, it's about creating efficient learned models that propose and evaluate methods based on the context of the problem. The large language model evaluates each intermediate step, moving forward only when a viable solution path is found. This, in essence, is the power of deliberate reasoning.

How Tree of Thought Prompting Works

The process starts with generating multiple initial thoughts, representing the initial steps of problem-solving. For instance, if the task is to create a plan, the model could generate multiple potential first steps for the plan. These initial thoughts are analogous to the root nodes of a tree, each branching into further thoughts or steps.

Once the initial thoughts are generated, the AI model is tasked with self-critiquing each of these thoughts with respect to the input prompt. It evaluates how well each thought or step aligns with the problem-solving objective, which takes advantage of the model's inherent strength in evaluating the coherence between two things. This assessment phase could involve ranking each thought or assigning scores to each, depending on the application.

Upon reviewing the critiques, the model discards the thoughts that were evaluated as less useful or suitable. The remaining thoughts, or nodes, are then expanded upon with further steps, again generated by the model. This forms the second layer of the tree, which is then subjected to the same process of self-critique and pruning.

The Importance of Backtracking

In instances where all generated thoughts for a node are evaluated as unsuitable, the method employs backtracking. Essentially, the model returns to the previous layer of the tree and branches out from another node, discarding the unfruitful path. This process prevents the model from wasting computational resources on non-constructive avenues and enhances the overall effectiveness of the problem-solving process.
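
Backtracking of this kind falls out naturally from depth-first search: a dead end simply returns to the caller, which moves on to the next sibling. Below is a self-contained sketch; `expand`, `promising`, and `solved` are hypothetical stubs for the LLM generator, the LLM evaluator, and a task-specific goal check.

```python
# Depth-first search with backtracking over a tree of thoughts.

def expand(path):
    """Stub generator: propose candidate next thoughts (none beyond depth 3)."""
    return [path + [c] for c in "ab"] if len(path) < 3 else []

def promising(path):
    """Stub evaluator: reject any path containing thought 'b' (placeholder rule)."""
    return "b" not in path

def solved(path):
    """Stub goal check: a complete path has three thoughts."""
    return len(path) == 3

def dfs(path=()):
    path = list(path)
    if solved(path):
        return path
    for child in expand(path):
        if not promising(child):
            continue  # prune: don't descend into unpromising branches
        result = dfs(child)
        if result is not None:
            return result
    return None  # dead end: the caller backtracks and tries its next sibling

print(dfs())
```

When every child of a node is rejected, the loop falls through and the function returns `None`, which is precisely the "return to the previous layer and branch out from another node" behavior described above.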

Tree Search in Problem-Solving

This process is akin to a tree search in computer science, where each node is evaluated and expanded upon in order of its assigned value. The process can be executed either breadth-first or depth-first, depending on the context of the problem. This structure allows for efficient searching through potential solutions, with the model consistently focusing on the most promising paths.
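
The value-ordered expansion described here can also be run as best-first search with a priority queue, so the most promising partial solution is always expanded next. The expansion rule and scoring function below are placeholder stubs, not any particular task's implementation.

```python
# Best-first search over thoughts: a min-heap keyed on negated scores pops the
# highest-valued node first.
import heapq

def children(path):
    """Stub generator: branch into three thoughts, up to depth 2."""
    return [path + (c,) for c in "xyz"] if len(path) < 2 else []

def score(path):
    """Stub evaluator; higher is more promising. This toy rule prefers 'x'."""
    return path.count("x")

def best_first(goal_depth=2):
    heap = [(-score(()), ())]  # negate scores: heapq is a min-heap
    while heap:
        _, path = heapq.heappop(heap)
        if len(path) == goal_depth:
            return path  # most promising complete path found
        for child in children(path):
            heapq.heappush(heap, (-score(child), child))
    return None

print(best_first())
```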

Dual Roles: Thought Generator and Critic

In this method, the AI model performs two distinct roles: the thought generator and the critic. As the thought generator, the model proposes intermediate steps based on the input and previous thoughts. It then shifts to the role of the critic, where it evaluates the relevance and efficacy of the generated thoughts in relation to the problem at hand. This dual role enables a more structured and coherent problem-solving process.

Performance Disparities Between Different Models

On the Game of 24, the gap between methods is stark. Standard input-output prompting clocked in at a 7.3 percent success rate, and Chain of Thought prompting fared no better, remaining in the single digits even when combined with the self-consistency parameter. These figures make plain how much room for improvement the Tree of Thought has to work with.

Triumph of the Tree of Thought Over Traditional Methods

The Tree of Thought methodology doesn't just edge out the other methods. With a 74 percent success rate on the Game of 24, against single-digit rates for both input-output and traditional Chain of Thought prompting, it's clear that the Tree of Thought represents a significant evolution in large language model problem-solving.

The Creative Writing Challenge

The ability to create a coherent passage from random sentences represents a significant challenge, requiring high levels of creativity and strategic planning. However, when pitted against this challenge, the GPT-4 model, guided by the Tree of Thought methodology, produced coherent passages that rival those written by humans.

From Astronauts to Handstands: A Creative Writing Plan

The Tree of Thought method excels at strategic planning, allowing for seamless integration of seemingly unrelated topics, like performing handstands and an astronaut's time in space. It can effectively weave these topics into a cohesive narrative, demonstrating a level of sophistication beyond traditional models.

Evaluating Perceptions with the Tree of Thought

The Tree of Thought's ability to contemplate perceptions and shape a narrative around identity showcases its impressive versatility. It consistently outperforms the input-output and Chain of Thought methods, scoring higher on both coherency and creativity scales.

Conquering Mini Crosswords: A Test of Reasoning

Crossword puzzles, a seemingly innocuous pastime, represent a complex challenge for AI, requiring a deep understanding of language and problem-solving. The Tree of Thought methodology takes this challenge in stride, achieving a success rate that far exceeds traditional models.

Tree-of-Thought Standard Operating Procedure (SOP)

Here is a draft standard operating procedure (SOP) for applying the Tree of Thoughts (ToT) framework to a new reasoning or problem-solving task:

  1. Define the problem input and desired output.
  2. Decompose the reasoning process into coherent thought steps. Determine an appropriate granularity for thoughts based on what the LLM can generate and evaluate effectively.
  3. Design a thought generator prompt to propose k possible next thoughts conditioned on the current thought sequence. This could sample thoughts independently or sequentially in context.
  4. Design a thought evaluation prompt to assess the promise of generated thoughts. This could value thoughts independently or vote/rank thoughts relative to each other.
  5. Choose a search algorithm like BFS or DFS based on the estimated tree depth and branching factor.
  6. Initialize the tree with the problem input as the root state. Use the thought generator to expand the leaf nodes and the thought evaluator to prioritize newly generated thoughts.
  7. Run a search for up to a maximum number of steps or until a solution is found. Extract the reasoning chain from the highest valued leaf node.
  8. Analyze results and refine prompts as needed to improve performance. Adjust search hyperparameters like branching factor and depth as needed.
  9. For new tasks, iterate on the design by adjusting the thought representation, search algorithm, or evaluation prompts. Leverage the LM's strengths and task properties.
  10. Compare ToT performance to baseline approaches like input-output prompting and analyze errors to identify areas for improvement.
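
To make steps 3 and 4 of the SOP concrete, here are hypothetical prompt templates for the thought generator and evaluator. The exact wording is an assumption for illustration and would be tuned per task; the sure/likely/impossible labels are loosely modeled on the paper's value-style evaluation.

```python
# Hypothetical generator and evaluator prompt templates (SOP steps 3 and 4).

GENERATE_PROMPT = """\
Problem: {problem}
Steps so far:
{steps}
Propose {k} distinct candidate next steps, one per line."""

EVALUATE_PROMPT = """\
Problem: {problem}
Candidate reasoning path:
{steps}
Rate the likelihood that this path leads to a correct solution
as one of: sure / likely / impossible."""

def render(template, **kwargs):
    """Fill a template with the current problem state."""
    return template.format(**kwargs)

print(render(GENERATE_PROMPT, problem="Make 24 from 4 9 10 13",
             steps="13 - 9 = 4", k=3))
```

The generator's output would be parsed into candidate thoughts (step 6), and the evaluator's label would be mapped to a numeric priority for the search.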

Following this general procedure allows ToT to be applied systematically to new reasoning tasks. The key is striking the right balance between thought size, search space, and evaluative reasoning. With some iteration, ToT provides a promising path to more deliberate and flexible reasoning with LLMs.


Tree of Thought Prompting, thus, offers a comprehensive approach to problem-solving with AI language models. By generating and evaluating multiple chains of thoughts, the model can navigate the solution space more effectively, pruning ineffective paths and focusing on promising ones. Despite its complexity, this method opens up new avenues for enhancing the usefulness and understandability of AI models.
