Offline, frozen archive of pre-generated AI responses.
Canonical project documentation.
The AI Response Archive is an experimental offline collection of pre-generated AI responses, distributed as a self-contained ZIM file for use with Kiwix.
Instead of performing live inference, the project systematically enumerates a bounded prompt space and preserves the resulting responses as a static, offline artifact.
The distributed archive is available via Gumroad:
👉 Get the AI Response Archive on Gumroad
The archive captures all prompt strings composed of printable ASCII characters up to a maximum length N.
Let:
A = number of printable ASCII characters
N = maximum prompt length

The total enumerated prompt space is:
A¹ + A² + A³ + … + Aⁿ
Each prompt is evaluated once using a fixed model configuration, and the resulting output is preserved verbatim.
The archive is therefore:
Because the prompt space grows exponentially with respect to N, each increment substantially increases generation time and storage requirements.
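The size of this enumerated space is easy to check with a short script. A minimal sketch, assuming A = 95 printable ASCII characters (0x20 through 0x7E) and a small N for illustration:

```python
import itertools

# Printable ASCII as used here: the 95 characters from space (0x20) to '~' (0x7E).
ALPHABET = [chr(c) for c in range(0x20, 0x7F)]
A = len(ALPHABET)  # 95

def total_prompts(n_max):
    # Geometric sum A^1 + A^2 + ... + A^N
    return sum(A ** k for k in range(1, n_max + 1))

def enumerate_prompts(n_max):
    # Yields every prompt of length 1..N, shortest lengths first.
    for length in range(1, n_max + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            yield "".join(chars)

# For N = 2 the space already holds 95 + 95**2 = 9120 prompts.
assert total_prompts(2) == sum(1 for _ in enumerate_prompts(2))
```

Even at N = 2 the space holds 9,120 prompts, which is why each unit increase in N dominates the generation budget.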
The term “deterministic” in this project refers to runtime behavior, not model-level reproducibility.
At runtime:
At generation time:
This means the archive is:
Responses were generated using:
The archive therefore reflects the behavioral characteristics of this specific quantized model artifact.
Generation was performed using:
-DGGML_CUDA=ON

The exact commit hash of llama.cpp was not recorded at generation time.
The archive therefore represents a frozen behavioral snapshot of the model evaluated using a llama.cpp build from the ggml-org repository as of February 2026.
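The CUDA-enabled build referenced above corresponds to the standard llama.cpp CMake workflow. A sketch only, since the exact commit used at generation time is unknown:

```shell
# Clone and build llama.cpp with CUDA support enabled.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```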
All responses were generated using a single, fixed inference configuration.
Key parameters included:
Temperature was set to 0.0 to minimize stochastic variation.
Top-p remained at 1.0 (no nucleus truncation).
The max-tokens limit was intentionally left at 500 rather than reduced, to avoid artificially constraining output length across prompts of varying ambiguity.
n_predict: 1 ensured that exactly one completion was generated and retained for each prompt. No ranking, filtering, retries, or multi-sample selection was performed.
Each prompt was evaluated exactly once.
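The effect of temperature 0.0 can be illustrated as greedy token selection: at each step the single highest-scoring token is chosen, so repeated runs over the same prompt follow the same path. A minimal sketch (the score values are hypothetical, not taken from the model):

```python
def greedy_pick(logits):
    # Temperature 0.0 collapses sampling to argmax: the highest-scoring
    # token is always selected, with no random draw involved.
    return max(range(len(logits)), key=lambda i: logits[i])

# The same scores always yield the same choice, run after run.
scores = [0.1, 2.0, 0.5, 1.9]
assert all(greedy_pick(scores) == 1 for _ in range(10))
```

With top-p at 1.0 and no multi-sample selection, this greedy path is the only one the archive records per prompt.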
Each enumerated prompt was inserted into the following chat-format template:
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
[prompt]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Where [prompt] was replaced by the enumerated ASCII string.
Generation began immediately after the final assistant header marker.
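Wrapping each enumerated string into this template can be sketched as a simple formatting function. The marker strings are copied from the template shown above; the exact whitespace placement and the name `format_prompt` are assumptions:

```python
TEMPLATE = (
    "<|start_header_id|>system<|end_header_id|>\n"
    "You are a helpful assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n"
    "{prompt}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n"
)

def format_prompt(prompt):
    # Substitute the enumerated ASCII string into the chat template;
    # generation then begins right after the final assistant header.
    return TEMPLATE.format(prompt=prompt)

wrapped = format_prompt("Hello")
assert wrapped.endswith("assistant<|end_header_id|>\n")
assert "Hello<|eot_id|>" in wrapped
```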
The distributed ZIM file contains:
At runtime:
No network requests are made.
No APIs are contacted.
No model inference occurs.
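This lookup-only runtime model can be sketched as plain key-value retrieval. The in-memory dict and sample entries below are hypothetical; the real archive is a ZIM file served by Kiwix:

```python
# Hypothetical miniature archive: prompt -> stored response.
ARCHIVE = {
    "hi": "Hello! How can I help you today?",
    "2+2": "2 + 2 equals 4.",
}

def lookup(prompt):
    # Pure local retrieval: no network, no API call, no inference.
    # Prompts outside the enumerated space simply have no entry.
    return ARCHIVE.get(prompt)

assert lookup("hi") is not None
assert lookup("not in the archive") is None
```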
▶️ Watch the extended demonstration on YouTube
Let:
A = printable ASCII alphabet size
N = maximum prompt length

Total prompts:
A¹ + A² + … + Aⁿ
This geometric expansion defines the core scaling behavior of the archive.
Increasing N by 1 multiplies the largest term by A, resulting in substantial increases in storage size and generation workload.
This combinatorial structure is intentional and foundational to the experiment.
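The claim that incrementing N multiplies the workload by roughly A can be checked numerically (A = 95 assumed, as above):

```python
A = 95  # printable ASCII alphabet size

def total(n_max):
    # Geometric sum A^1 + A^2 + ... + A^N
    return sum(A ** k for k in range(1, n_max + 1))

# Going from N=2 to N=3 grows the total by nearly a factor of A,
# because the newly added largest term dominates the sum.
ratio = total(3) / total(2)
assert 94 < ratio < 96
```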
This repository serves as the canonical documentation and reference page for the AI Response Archive project.
It does not contain:
The archive itself is distributed commercially via Gumroad.
The project is experimental and evolving.
Future releases may increase the maximum prompt length N, subject to practical constraints such as storage growth and distribution feasibility.
© 2026 Anthony Karam. All rights reserved.