AI-Optimised Text Formats: Why Structure Matters
- annaboten101
- May 26
- 5 min read
Text formats matter.
The same information can be written in many ways, but some formats are much better than others when we speak about AI retrieval quality, token usage, cost, and human readability.
JSON is one of the most common formats for structured data. In many ways, it is the gold standard: reliable, widely supported, and excellent for software systems. Its weakness is repetition. Field names, brackets, quotes, and punctuation are repeated again and again. Pretty-printed JSON is easier for humans to read, but it is token-heavy. Minified JSON is smaller, but still repetitive and less pleasant to inspect.
YAML is more human-friendly. It uses indentation instead of many brackets, which can make it easier to read and write. It is useful for configuration files, notes, and structured documents. However, YAML can become ambiguous, indentation-sensitive, and sometimes surprisingly large in token count. It is readable, but not always efficient.
CSV is extremely compact for flat tables. It avoids repeated keys and keeps only rows and values. This makes it highly token-efficient. Its weakness is structure: CSV struggles with nested information, mixed data types, and complex relationships. It is excellent for spreadsheets and simple datasets, but poor for rich knowledge.
TOON, or Token-Oriented Object Notation, is especially interesting. It is a recent open-source, MIT-licensed format associated with Johann Schopplich's work. TOON combines YAML-like readability for nested objects with CSV-like rows for uniform data. It declares fields once, then places compact rows below them.
For AI use, this is valuable. Fewer tokens can mean lower cost, longer useful context, and faster processing. Some benchmarks report strong results, including around 40% fewer tokens in suitable cases and very high accuracy in structured tasks. But this should not be treated as magic. The result depends on the data shape. CSV may still be best for very flat tables, while JSON or YAML may sometimes work better for irregular or deeply nested data.
The practical implications are significant. You can ask an AI to summarise chapters of books in TOON while keeping the examples. A book that might take days to read can be reviewed in a couple of hours, with the structure of the ideas preserved clearly enough to study, compare, and revisit.
This is not a replacement for the original text. Compression always loses something: tone, rhythm, nuance, argument flow, and the full experience of reading. Original texts should still be used when deep understanding matters.
But for refreshing knowledge, building a study map, reviewing technical material, comparing chapters, or deciding whether a book deserves deeper reading, TOON-style summaries are priceless. They turn large bodies of text into compact, navigable knowledge without reducing everything to vague bullet points.
While my favourite is TOON, the best format depends on the task.
JSON is best for systems.
YAML is good for readable configuration.
CSV is best for simple tables.
TOON is very promising for AI-assisted learning and structured summaries.
Enough theory, let's see it in practice.
Below is a compressed version of this article represented in different formats. The numbers are approximate and example-dependent, but they illustrate the idea.
Token-Optimised Summary
Text formats matter for AI because structure affects readability, retrieval quality, token use, and cost.
JSON is reliable and widely supported, but repetitive and token-heavy.
Minified JSON is smaller, but still repetitive and harder to read.
YAML is readable and useful for configuration, but indentation-sensitive and not always token-efficient.
CSV is extremely compact for flat tables, but weak for nested or complex knowledge.
TOON is a recent MIT-licensed open format for compact, human-readable structured data.
It declares fields once and uses compact rows, making it useful for AI prompts and summaries.
TOON can reduce tokens in suitable cases, but results depend on the data shape.
Use TOON summaries for review, study maps, chapter previews, and technical refreshers.
Do not treat compressed summaries as replacements for full original texts when deep understanding matters.
JSON
{
"summary": {
"importance": "Text formats affect AI readability, retrieval quality, token use, and cost.",
"json": "Reliable and widely supported, but repetitive and token-heavy.",
"minified_json": "Smaller than pretty JSON, but still repetitive and harder to read.",
"yaml": "Readable and useful for configuration, but indentation-sensitive and not always efficient.",
"csv": "Very compact for flat tables, but weak for nested or complex knowledge.",
"toon": "Recent MIT-licensed open format for compact, human-readable structured data.",
"use_cases": "Review, study maps, chapter previews, technical refreshers.",
"caveat": "Compressed summaries lose nuance and should not replace full texts for deep understanding."
}
}
Minified JSON
{"summary":{"importance":"Text formats affect AI readability, retrieval quality, token use, and cost.","json":"Reliable and widely supported, but repetitive and token-heavy.","minified_json":"Smaller than pretty JSON, but still repetitive and harder to read.","yaml":"Readable and useful for configuration, but indentation-sensitive and not always efficient.","csv":"Very compact for flat tables, but weak for nested or complex knowledge.","toon":"Recent MIT-licensed open format for compact, human-readable structured data.","use_cases":"Review, study maps, chapter previews, technical refreshers.","caveat":"Compressed summaries lose nuance and should not replace full texts for deep understanding."}}
YAML
summary:
importance: "Text formats affect AI readability, retrieval quality, token use, and cost."
json: "Reliable and widely supported, but repetitive and token-heavy."
minified_json: "Smaller than pretty JSON, but still repetitive and harder to read."
yaml: "Readable and useful for configuration, but indentation-sensitive and not always efficient."
csv: "Very compact for flat tables, but weak for nested or complex knowledge."
toon: "Recent MIT-licensed open format for compact, human-readable structured data."
use_cases: "Review, study maps, chapter previews, technical refreshers."
caveat: "Compressed summaries lose nuance and should not replace full texts for deep understanding."
CSV
field,value
importance,"Text formats affect AI readability, retrieval quality, token use, and cost."
json,"Reliable and widely supported, but repetitive and token-heavy."
minified_json,"Smaller than pretty JSON, but still repetitive and harder to read."
yaml,"Readable and useful for configuration, but indentation-sensitive and not always efficient."
csv,"Very compact for flat tables, but weak for nested or complex knowledge."
toon,"Recent MIT-licensed open format for compact, human-readable structured data."
use_cases,"Review, study maps, chapter previews, technical refreshers."
caveat,"Compressed summaries lose nuance and should not replace full texts for deep understanding."
TOON
summary:
importance: Text formats affect AI readability, retrieval quality, token use, and cost.
formats[5]{name,strength,weakness}:
JSON,Reliable and widely supported,Repetitive and token-heavy
Minified JSON,Smaller than pretty JSON,Harder to read and still repetitive
YAML,Readable and useful for configuration,Indentation-sensitive and not always efficient
CSV,Very compact for flat tables,Weak for nested or complex knowledge
TOON,Compact human-readable structured data,Best results depend on data shape
use_cases[4]: review,study maps,chapter previews,technical refreshers
caveat: Compressed summaries lose nuance and should not replace full texts for deep understanding.
Approximate Token Comparison
Format | Approx. token count | Main point |
Original article section | ~650 | Clear for humans, but longer |
Pretty JSON | ~175 | Structured, but repetitive |
Minified JSON | ~145 | Smaller, less readable |
YAML | ~155 | Readable, moderate size |
CSV | ~120 | Very compact, flat structure |
TOON | ~115 | Compact while preserving structure |
TOON is not always the winner, but it is one of the most interesting formats for AI-assisted learning because it balances compression, structure, and human readability.
Practical AI guidance without the hype.
References
Comments