Prompting: Chain-of-thought and Skeleton-of-thought
Here we discuss two techniques currently finding success in prompting: chain-of-thought (CoT) and skeleton-of-thought (SoT).
If you are here, I am assuming you are already familiar with zero-shot and few-shot prompting.
I wanted to discuss two common but skilful techniques that have helped my LLM hallucinate less and stop acting like my toddler nephew. My nephew is an annoying little menace, much like a hallucinating LLM :), but I love him for the good in him and the impact he will hopefully have on the world, again like an LLM. Still, my sister has to instil the right knowledge in him, just as we do with CoT and SoT.
Chain-of-thought
Introduced by Wei et al. (2022), chain-of-thought (CoT) prompting elicits complex reasoning from an LLM by asking it, through the prompt, to produce intermediate reasoning steps before the final answer. There are multiple flavours of CoT.
Zero shot CoT
The basic idea of zero-shot CoT (Kojima et al., 2022) is that simply adding "Let's think step by step" to the original prompt helps the LLM "think" before "responding".
For example, for a basic maths problem:
prompt = f"""
Let's think step by step for the following problem before answering:
Problem: {problem}
"""a more detailed version
prompt = f"""
Let's solve the following math problem step-by-step:
Problem: {problem}
1. First, identify the operation needed to isolate the variable.
2. Perform the operation step-by-step.
3. Simplify the equation to find the solution.
"""Few shot CoT
Here we simply augment the zero-shot CoT prompt with a few worked examples and nothing else.
prompt = """
Let's solve some math problems step-by-step.
Example 1:
Problem: Solve for x in the equation 3x + 2 = 11.
1. Subtract 2 from both sides of the equation.
- 3x + 2 - 2 = 11 - 2
- 3x = 9
2. Divide both sides by 3.
- 3x / 3 = 9 / 3
- x = 3
Example 2:
Problem: Solve for y in the equation 4y - 5 = 7.
1. Add 5 to both sides of the equation.
- 4y - 5 + 5 = 7 + 5
- 4y = 12
2. Divide both sides by 4.
- 4y / 4 = 12 / 4
- y = 3
Now, let's solve a new problem step-by-step.
Problem: {problem}
"""Auto CoT
I have not dug into this one as deeply, but here is my understanding. In zero-shot CoT, the model gets a single prompt instructing it to solve the problem step by step without any prior examples, so it has to produce the intermediate reasoning on its own. Auto-CoT instead has the model autonomously construct those intermediate reasoning steps, potentially over multiple passes or iterations and in combination with self-consistency or prompt chaining, so that it can both create and verify its own reasoning paths.
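One concrete way to make this hands-on is the recipe from the Auto-CoT paper (Zhang et al., 2022): run zero-shot CoT over a handful of sample questions to auto-generate reasoning demonstrations, then prepend those demonstrations to the real question. The sketch below is only a rough illustration of that flow; the model name and questions are placeholders, and the paper's diversity-based question clustering is omitted.
# A minimal Auto-CoT-style sketch, assuming the OpenAI v1 Python client.
# Model name, sample questions, and the new question are placeholders.
from openai import OpenAI

client = OpenAI(api_key="your-api-key")
MODEL = "gpt-3.5-turbo-1106"

def zero_shot_cot(question: str) -> str:
    # Generate a reasoning chain for one question with plain zero-shot CoT
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Q: {question}\nA: Let's think step by step."}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# 1. Build demonstrations automatically by running zero-shot CoT on a few
#    sample questions (Auto-CoT proper also clusters questions for diversity,
#    which is skipped here); no hand-written rationales are needed.
sample_questions = [
    "Solve for x in the equation 3x + 2 = 11.",
    "Solve for y in the equation 4y - 5 = 7.",
]
demos = "\n\n".join(
    f"Q: {q}\nA: Let's think step by step. {zero_shot_cot(q)}"
    for q in sample_questions
)

# 2. Prepend the auto-generated demonstrations to the new question, so the
#    final call is effectively few-shot CoT built without manual effort.
new_question = "Solve for z in the equation 5z - 4 = 16."
final_prompt = f"{demos}\n\nQ: {new_question}\nA: Let's think step by step."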
Self-consistency
Self-consistency prompting is an advanced technique that builds on chain-of-thought prompting. Instead of relying on a single reasoning path, it generates multiple diverse reasoning paths and then selects the most consistent answer among them, which improves accuracy and reliability on complex reasoning tasks. Proposed by Wang et al. (2022), self-consistency aims "to replace the naive greedy decoding used in chain-of-thought prompting".
import re
from collections import Counter

from openai import OpenAI

# Set your OpenAI API key
client = OpenAI(api_key='your-api-key')

def generate_solutions(prompt, n=5):
    # Chat Completions API (the legacy text-davinci-003 completion endpoint is retired)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,  # enough room for the full reasoning chain
        temperature=0.7,
        n=n,  # Generate n different reasoning paths
    )
    solutions = [choice.message.content.strip() for choice in response.choices]
    return solutions

def extract_final_answer(solution):
    # Pull the last number out of the completion; full step-by-step texts
    # rarely match verbatim, so we vote on the extracted final answer instead
    numbers = re.findall(r'-?\d+(?:\.\d+)?', solution)
    return numbers[-1] if numbers else solution

def select_most_consistent_solution(solutions):
    # Count occurrences of each final answer and select the most common one
    answers = [extract_final_answer(s) for s in solutions]
    most_common_answer = Counter(answers).most_common(1)[0][0]
    return most_common_answer

# Example problem
problem = "Solve for x in the equation 2x + 3 = 11."
prompt = f"""
Solve the following math problem step-by-step and explain your reasoning.
Problem: {problem}
"""

# Self-consistency process
solutions = generate_solutions(prompt, n=10)  # Generate 10 solutions for better consistency
print("Generated Solutions:", solutions)
most_consistent_solution = select_most_consistent_solution(solutions)
print("Most Consistent Solution:", most_consistent_solution)
1. Generating multiple solutions:
• Set n=10 to generate 10 different completions, giving diverse reasoning paths to vote over.
2. Automated selection:
• Use Counter from the collections module to count how often each extracted final answer appears and automatically pick the most frequent one.
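As a tiny standalone illustration of that voting step (the final answers here are made-up strings, not model output):
from collections import Counter

# Pretend four sampled reasoning paths ended with these final answers
final_answers = ["x = 4", "x = 4", "x = 5", "x = 4"]
print(Counter(final_answers).most_common(1)[0][0])  # prints "x = 4"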
Skeleton-of-thought
Here we instruct the LLM to first generate a crisp skeleton/outline for the given topic, and then let it expand on that outline point by point.
The 3 steps involved are:
Prompt to generate the skeleton outline
Prompt to elaborate each point
Concurrent execution of each point
import json
import concurrent.futures

from openai import OpenAI

OPEN_AI_API_KEY = 'your-api-key'

class Gpt4Turbo:
    # Despite the class name, this example uses gpt-3.5-turbo-1106;
    # swap in a GPT-4 model name if you have access.
    def __init__(self):
        self.MODEL = 'gpt-3.5-turbo-1106'
        self.TOKEN_LIMIT = 4000
        self.client = OpenAI(api_key=OPEN_AI_API_KEY)
        self.temperature = 0.7
        self.streaming = False

    def gptCall_json(self, temperature, messages: list):
        try:
            response = self.client.chat.completions.create(
                model=self.MODEL,
                messages=messages,
                temperature=temperature,
                max_tokens=self.TOKEN_LIMIT,
                stream=False,
                response_format={"type": "json_object"})  ## Enforce JSON output format
            return response.choices[0].message.content
        except Exception as e:
            print(e)
            return ""

    def generate_skeleton(self):
        question = self.question
        outline_prompt = f'''
        You're an organizer responsible for only giving the skeleton (not the full content) for answering the question.
        Provide the skeleton as a JSON to answer the question. Instead of writing a full sentence, each skeleton point should
        be very short with only 2~5 words. Generally, the skeleton should have 3~10 points. The skeleton is an outline that would be expanded later.
        Don't elaborate on the point in the skeleton.
        Example:
        \n\nQuestion:\nWhat are the typical types of Chinese dishes?\n Response: {{"answer": ["Dumplings", "Noodles", "Dim Sum", "Hot Pot", "Wonton", "Ma Po Tofu", "Char Siu", "Fried Rice"]}}
        \n\nQuestion:\nWhat are some practical tips for individuals to reduce their carbon emissions?\n Response: {{"answer": ["Energy Conservation", "Efficient Transportation", "Home Energy Efficiency", "Reduce Water Consumption", "Sustainable Diet", "Sustainable Travel"]}}
        \n\nNow, please provide the skeleton for the following question.\n{question}\n Response: {{"answer": [...]}}
        '''
        ## Build the messages
        message = []
        message.append({"role": "system", "content": "You are a helpful assistant. You respond in JSON format."})
        message.append({"role": "user", "content": outline_prompt})
        self.result = self.gptCall_json(self.temperature, message)

    def elaborate_point(self, point):
        question = self.question
        point_prompt = f'''
        You help elaborate on the point the user wants. Your input is a question and one possible answer to the question, also called <point>. You will elaborate on the <point> and give a 2-3 sentence response
        on how the <point> helps answer the question. Start your response with the <point>, then a colon, and then your elaboration.
        Your response will be in JSON format. Example: {{"answer": "{point}: your response"}}
        \n\nNow, please elaborate on the following point. Question: {question}\n <Point>: {point} \n Response: {{"answer": "..."}}
        '''
        ## Build the messages
        message = []
        message.append({"role": "system", "content": "You are a helpful assistant. You respond in JSON format."})
        message.append({"role": "user", "content": point_prompt})
        result = self.gptCall_json(self.temperature, message)
        point_elaborate = json.loads(result)
        return point_elaborate['answer']

    def concurrent_results(self, question):
        self.question = question
        self.generate_skeleton()
        result = json.loads(self.result)
        num_points = len(result["answer"])
        # Create a thread pool with one worker per skeleton point
        with concurrent.futures.ThreadPoolExecutor(max_workers=num_points) as executor:
            # Submit the elaboration API calls to the executor
            outputs = [executor.submit(self.elaborate_point, point) for point in result['answer']]
            # Collect results in skeleton order (iterating over `outputs` rather
            # than as_completed preserves the original point order)
            results = [future.result() for future in outputs]
        # Enumerate the elaborated points and join them into the final answer
        string_list = [f"{i+1}. {record}\n" for i, record in enumerate(results)]
        final_output = ''.join(string_list)
        return final_output
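To tie it all together, here is a quick usage sketch (assuming the Gpt4Turbo class above and a valid API key; the question is just an illustrative placeholder):
# Hypothetical usage of the skeleton-of-thought helper defined above
sot = Gpt4Turbo()
answer = sot.concurrent_results(
    "What are some practical tips for individuals to reduce their carbon emissions?")
print(answer)  # A numbered list with one short elaboration per skeleton point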
That's all, folks!
#PromptEngineering #ChainOfThought #SkeletonOfThought #AIResearch #MachineLearning

