Not too long ago, I found myself in a spirited exchange on social media with a chap – genuinely a good fellow – who had some reservations about the concepts of prompt engineering and prompting.
His stance revolved around two main points.
- Firstly, he believed that prompting and prompt engineering were one and the same, even though we've delved deep into their distinctions in numerous articles.
- But the crux of his argument was his likening of prompting to mere "guessing" or "gambling", drawing a parallel to the simple act of rolling a dice.
This is an issue that I often face with students. His analogy did get the gears in my head turning. Indeed, the dice metaphor holds some water, but perhaps not in the way he initially thought. Here, I aim to unfold the layers of that conversation and shed some light on my perspective.
If you've ever thrown a dice during a board game (or at a casino), you're well on your way to grasping a fundamental concept of generative artificial intelligence. The beauty of this analogy is its simplicity. Dice, in their most common form, have six sides, numbered 1 through 6. When you roll the dice, you have a reasonable expectation of the outcome—a number between 1 and 6. There's a sense of predictability.
How a Prompt Sets the Scope
The Traditional Dice: Predictable Outcomes
With a traditional 6-sided dice, the outcomes are clear. Toss it, and you're sure to land on a number from 1 to 6. Nothing more, nothing less. Just as you have a limited number of outcomes with a die, so does an AI system when given a specific prompt. It operates within the parameters you set for it.
Modifying the Dice: New Expectations
Now, imagine a dice that doesn't conform to the standard 1-6 numbering. Instead, it has fractions or numbers from 10 to 15. Suddenly, the outcomes you expect shift dramatically. Instead of anticipating a 3 or a 5, you might be hoping for a 10.5 or a 14. It's still a dice, but the results have changed due to its modified structure.
With Generative AI, the prompts you provide are akin to the range of numbers on the dice. These prompts act as a guide, setting the boundaries for the AI's responses. The more context or specifics you provide, the narrower the scope of potential outcomes becomes. It's like customizing your dice to show only the numbers you want.
Flexible Possibilities Within Set Parameters
Does this mean AI has infinite capabilities? Not exactly. While the dice metaphor explains how prompts constrain possibilities, it also highlights that those possibilities are not limitless. AI systems cannot suddenly produce outcomes completely outside their training.

The AI operates based on the information provided. Think of it this way: the AI is like a dice, and the prompts you give are the numbers on its sides. By adjusting your prompts, making them more specific or more broad, you essentially "load" the dice in favour of certain outcomes.
Generative AI thrives on data and context. Feed it with clear, concise prompts, and it can produce results that align closely with your expectations. However, if your prompts are vague, the AI's responses might be equally uncertain, just as you'd be unsure of the outcome if someone handed you a dice with unfamiliar numbers.
The Myth of LLMs' Stochastic or Random Outputs
A sentiment I frequently encounter is the notion that Large Language Models (LLMs) produce outputs that are inherently random or stochastic. This perspective isn't entirely baseless; after all, at their core, LLMs can generate a wide array of responses given their vast training data. However, my hands-on experience paints a different picture.
In standard conditions, using commercially available or open-sourced LLMs, I've seldom, if ever, witnessed this unpredictable behavior. Why is that? Well, it boils down to the fine-tuning and control mechanisms implemented during their development.
While LLMs, in their most primitive forms, may indeed have a stochastic essence, by the time they reach end-users in a commercial or open-source setting, they've undergone rigorous refinement. This fine-tuning process involves training the model on specific tasks or datasets to align its outputs more closely with human expectations. It's a bit like training a wild horse; while the animal might start with unpredictable behaviors, consistent training and guidance can mold its actions to be more controlled and deliberate.
What we can learn here is that, though the foundational architecture of LLMs might have the potential for randomness, the versions we typically interact with have been optimized for consistency and reliability. The wild, stochastic nature is tamed to ensure that users get a more predictable and valuable experience.
Beyond the Prompt
When working with generative AI models like GPT-4, the user has the power to adjust certain settings, shaping the output to better fit specific needs.
Four commonly adjusted parameters are temperature, top p, frequency penalty, and presence penalty. Each plays a unique role in determining the model's responses. Let's delve into how these settings can influence the output and the degree of randomness.
Controlling Randomness with Temperature
One of the most direct ways to influence AI outputs is by adjusting the "temperature" parameter. Temperature controls the randomness of the model's responses: a higher value makes outputs more varied and unpredictable, while a lower value produces more deterministic, safer responses.
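As a rough sketch of what temperature does under the hood, the toy example below rescales a made-up set of token logits before applying a softmax. The vocabulary size and logit values are invented purely for illustration, not taken from any real model.

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by temperature, then softmax into a probability distribution."""
    scaled = [l / temperature for l in logits]
    # Subtract the max before exponentiating, for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for a 4-token vocabulary (illustrative values).
logits = [4.0, 2.0, 1.0, 0.5]

sharp = apply_temperature(logits, 0.5)  # low temperature: probability mass concentrates
flat = apply_temperature(logits, 2.0)   # high temperature: distribution flattens out

print(f"T=0.5 top-token probability: {sharp[0]:.3f}")
print(f"T=2.0 top-token probability: {flat[0]:.3f}")
```

At a low temperature the most likely token dominates the distribution, so sampling becomes nearly deterministic; at a high temperature the alternatives gain ground and the roll gets riskier.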
Adjusting Top p (Nucleus Sampling)
Top p, also known as nucleus sampling, is another method to control randomness. Instead of sampling from the entire distribution, the model selects from the "nucleus" of top tokens whose cumulative probability exceeds a certain threshold (p). A lower p value makes the model more conservative, since it picks from a smaller set of high-probability tokens, while a larger p value allows more varied outputs, drawing from a broader set.
Think of these sampling settings as loading the dice with more or fewer likely numbers. Higher temperature or top p values allow more creative freedom, while lower values narrow the scope. Tuning them provides further control over the AI's randomness beyond just crafting the prompt.
Discouraging Repetition with Frequency Penalty
Another useful parameter is the frequency penalty, which affects repetition. A positive value here pushes the AI to avoid reusing words and phrases it has already generated, increasing variety. Conversely, a negative penalty makes the model more willing to repeat key terms, at the cost of some redundancy.
Think of the frequency penalty as nudging the metaphorical dice away from or towards certain numbers. It provides another lever to shape the AI's responses as needed for the context.
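A small sketch of the mechanism, following the commonly documented formulation in which the penalty is scaled by how many times each token has already appeared. The vocabulary, logit values, and generation history are all invented for illustration.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Lower (or raise, if the penalty is negative) each token's logit in
    proportion to how many times it has already been generated."""
    counts = Counter(generated_tokens)
    return {tok: logit - penalty * counts[tok] for tok, logit in logits.items()}

# Toy logits for a tiny vocabulary (illustrative values).
logits = {"dice": 2.0, "roll": 1.5, "luck": 1.0}
history = ["dice", "dice", "roll"]  # "dice" has already appeared twice

penalized = apply_frequency_penalty(logits, history, penalty=0.8)
print(penalized)  # "dice" is penalized twice as hard as "roll"; "luck" is untouched
```

Because the penalty compounds with each repetition, a token the model keeps reaching for becomes progressively less attractive with every use.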
Encouraging Novelty with Presence Penalty
Presence penalty is a close cousin of the frequency penalty. Rather than scaling with how often a token has appeared, it applies a flat, one-time penalty to any token that has appeared at all. A positive value here encourages the introduction of new concepts or terms in the output, prompting fresher, more novel responses from the AI. Conversely, a negative presence penalty nudges the model to stick with ideas it has already raised.
Think of presence penalty as nudging the metaphorical dice towards new numbers beyond the standard set. It pushes the boundaries of the AI's responses in creative ways tailored for the context.
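The contrast with the frequency penalty can be sketched in a few lines: here the penalty is flat, hitting any previously seen token once, no matter how often it occurred. Again, the vocabulary, values, and history are invented for illustration.

```python
def apply_presence_penalty(logits, generated_tokens, penalty):
    """Subtract a flat penalty from any token that has appeared at least once,
    regardless of how many times it occurred."""
    seen = set(generated_tokens)
    return {tok: logit - (penalty if tok in seen else 0.0)
            for tok, logit in logits.items()}

# Toy logits for a tiny vocabulary (illustrative values).
logits = {"dice": 2.0, "roll": 1.5, "luck": 1.0}
history = ["dice", "dice", "roll"]

penalized = apply_presence_penalty(logits, history, penalty=0.5)
print(penalized)  # "dice" and "roll" each drop by 0.5; appearing twice costs no extra
```

Unseen tokens keep their full logits, so with a positive penalty the relative odds tilt toward fresh vocabulary rather than away from any one overused word.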
Guiding Generative AI with Intentional Prompting
Stepping back, this dice analogy vividly captures the essence of leveraging generative AI. These models do not have boundless potential, but considerate prompting provides guardrails that steer them toward contextually relevant, engaging responses. Understanding this relationship between prompts and possibilities unlocks the true utility of AI systems.
The next time you interact with a generative AI, envision the prompt as determining the numbers on a metaphorical dice. Thoughtful prompting improves outcomes, just as loading a die weights its results. While the possibilities are finite, our prompts make AI remarkably flexible and useful within those guardrails. It's a clever and apt visualization of how today's AI language models operate.