Large language models (LLMs) like GPT-3 and ChatGPT have traditionally produced non-deterministic outputs, meaning responses can vary for the same user prompt. This poses challenges for testing and auditing AI systems.
Recently, OpenAI has introduced beta features to enable reproducibility of LLM outputs. This article explores these capabilities and their implications.
Seeding LLM Inputs
OpenAI now allows seeding of prompts to associate a user input with a specific LLM response. The prompt text combined with the seed value produces the same output each time.
- The seed can be any integer value decided by the user. It links the prompt to the response.
- To reproduce a response, the exact prompt text and seed value must be provided.
- Seeding prompts require architectural changes to store seed values with prompts.
- Use cases include replicating specific customer journeys for testing and reusing optimal responses.
OpenAI exposes a
system_fingerprint parameter that represents the backend configuration used to run the model.
- If the fingerprint changes, the LLM output may also change even with the same seeded prompt.
- Fingerprints help track when backend updates impact output determinism.
- Saving outputs with the fingerprint allows checking if outputs change due to backend updates.
- Process needed to compare fingerprints and be alerted to backend changes.
Enabling Deterministic Outputs
To receive deterministic outputs:
- Use the same seed value for requests needing reproducible outputs.
- Ensure all parameters (prompt, temperature etc.) stay exactly the same.
- Check system fingerprint and regenerate outputs if it changes.
- Seeding requires overhead for data storage and architecture changes.
- Fingerprint comparisons add complexity to monitor backend updates.
- Flexibility is reduced compared to non-deterministic models.
- Use cases like testing may warrant the extra effort for determinism.
OpenAI's beta features allow seeding prompts and tracking backend changes to achieve reproducible outputs from traditionally non-deterministic LLMs. While adding complexity, this determinism enables use cases like testing and auditing AI systems.