
AI-Optimised Text Formats: Why Structure Matters

  • Writer: annaboten101
  • May 26
  • 5 min read

Text formats matter.

The same information can be written in many ways, but when it comes to AI retrieval quality, token usage, cost, and human readability, some formats are much better than others.

JSON is one of the most common formats for structured data. In many ways, it is the gold standard: reliable, widely supported, and excellent for software systems. Its weakness is repetition. Field names, brackets, quotes, and punctuation are repeated again and again. Pretty-printed JSON is easier for humans to read, but it is token-heavy. Minified JSON is smaller, but still repetitive and less pleasant to inspect.
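To make the repetition concrete, here is a small sketch using only Python's standard json module. The sample records are made up. It shows that indent=2 adds whitespace and separators=(",", ":") removes it, but neither removes the repeated field names.

```python
import json

# Made-up sample records; any list of same-shaped objects behaves the same way.
records = [
    {"name": "JSON", "strength": "Reliable", "weakness": "Repetitive"},
    {"name": "YAML", "strength": "Readable", "weakness": "Indentation-sensitive"},
    {"name": "CSV", "strength": "Compact", "weakness": "Flat only"},
]

# Pretty-printed: easy to scan, but indentation and newlines add characters.
pretty = json.dumps(records, indent=2)

# Minified: separators=(",", ":") drops the spaces json.dumps adds by default.
minified = json.dumps(records, separators=(",", ":"))

print(len(pretty), len(minified))

# Minifying removes whitespace, not repetition: each key still appears once per record.
print(minified.count('"strength"'))  # 3

# Both strings decode to exactly the same data.
assert json.loads(pretty) == json.loads(minified) == records
```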

YAML is more human-friendly. It uses indentation instead of many brackets, which can make it easier to read and write. It is useful for configuration files, notes, and structured documents. However, YAML is indentation-sensitive, can be ambiguous (some YAML 1.1 parsers read an unquoted no as the boolean false), and is sometimes surprisingly large in token count. It is readable, but not always efficient.

CSV is extremely compact for flat tables. It writes the column names once in a header row and then keeps only values. This makes it highly token-efficient. Its weakness is structure: CSV struggles with nested information, mixed data types, and complex relationships. It is excellent for spreadsheets and simple datasets, but poor for richly structured knowledge.
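Python's standard csv module shows both sides of this trade-off. In this sketch (the rows are made up), the header is written once and a value containing a comma is quoted automatically, while a nested dictionary can only be stored as its string form.

```python
import csv
import io

# Made-up flat rows: every row has exactly the same two fields.
rows = [
    {"format": "JSON", "strength": "Reliable, widely supported"},
    {"format": "CSV", "strength": "Compact for flat tables"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["format", "strength"], lineterminator="\n")
writer.writeheader()    # field names are written once, in the header row
writer.writerows(rows)  # a value containing a comma is quoted automatically
text = buffer.getvalue()
print(text)

# Nested data has no native place in CSV: the inner dict is written via str().
nested = io.StringIO()
csv.writer(nested, lineterminator="\n").writerow(["TOON", {"tokens": "fewer"}])
round_trip = next(csv.reader(io.StringIO(nested.getvalue())))
print(round_trip)  # ['TOON', "{'tokens': 'fewer'}"]: a string, not a dict
```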

TOON, or Token-Oriented Object Notation, is especially interesting. It is a recent open-source, MIT-licensed format created by Johann Schopplich. TOON combines YAML-like indentation for nested objects with CSV-like rows for uniform arrays: it declares the field names once in a header, then lists each item as a compact row of values.
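A minimal Python sketch of the tabular idea for the simplest case: a list of flat objects that all share the same fields. It is not the official TOON library, which also handles quoting, nesting, and alternative delimiters. The helpers encode_tabular and _cell are hypothetical, written only for illustration, and the sketch assumes values never contain commas, colons, quotes, or newlines.

```python
def _cell(value):
    """Render one primitive value the way TOON spells it (true/false/null)."""
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return "null"
    return str(value)


def encode_tabular(key, rows, indent="  "):
    """Encode a list of flat, same-shaped dicts as a TOON-style tabular array.

    Simplified sketch: assumes no value needs quoting.
    """
    if not rows:
        return f"{key}[0]:"
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("all rows must have the same fields in the same order")
    # Header: key, row count in brackets, field names in braces, declared once.
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    # Each row then carries only its values, in header order.
    for row in rows:
        lines.append(indent + ",".join(_cell(row[f]) for f in fields))
    return "\n".join(lines)


formats = [
    {"name": "JSON", "strength": "Reliable", "flat_only": False},
    {"name": "CSV", "strength": "Compact", "flat_only": True},
]
print(encode_tabular("formats", formats))
```

This prints a header, formats[2]{name,strength,flat_only}:, followed by one indented row per object. The field names appear once no matter how many rows follow, which is where the savings over JSON come from.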

For AI use, this is valuable. Fewer tokens can mean lower cost, more room in the context window, and faster processing. Some benchmarks report around 40% fewer tokens than JSON in suitable cases, along with high accuracy on structured retrieval tasks. But this should not be treated as magic: the result depends on the data shape. CSV may still be best for very flat tables, while JSON or YAML may work better for irregular or deeply nested data.
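Whether the compact tabular form applies at all depends on that shape. The hypothetical helper is_tabular below checks the condition that makes a CSV-like table possible (same keys, primitive values), then compares sizes for a uniform list. Character counts are used here only as a rough stand-in for tokens; real token counts depend on the model's tokenizer.

```python
import json

# Value types that fit in a single CSV-like cell.
PRIMITIVES = (str, int, float, bool, type(None))


def is_tabular(rows):
    """True if rows is a non-empty list of dicts sharing the same keys, all primitive."""
    if not rows or not all(isinstance(row, dict) for row in rows):
        return False
    keys = list(rows[0].keys())
    return all(
        list(row.keys()) == keys
        and all(isinstance(value, PRIMITIVES) for value in row.values())
        for row in rows
    )


uniform = [{"id": i, "score": i * 10} for i in range(1, 6)]
irregular = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "meta": {"x": 1}}]
print(is_tabular(uniform), is_tabular(irregular))  # True False

# Rough size comparison for the uniform case, in characters rather than tokens.
as_json = json.dumps(uniform, separators=(",", ":"))
as_rows = f"items[{len(uniform)}]{{id,score}}:\n" + "\n".join(
    f"  {r['id']},{r['score']}" for r in uniform
)
print(len(as_json), len(as_rows))
```

For the irregular list there is no shared header to declare, so a table gains nothing and JSON or YAML remains the natural choice.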

The practical implications are significant. You can ask an AI to summarise book chapters in TOON while keeping the examples. A book that might take days to read can then be reviewed in a couple of hours, with the structure of its ideas preserved clearly enough to study, compare, and revisit.
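One way to make such a request repeatable is to give the model the exact TOON header to fill in. This is a hypothetical prompt builder; the field names title, key_idea, and example are illustrative choices, not part of TOON itself.

```python
def build_summary_prompt(book_title, chapter_count):
    """Return a prompt asking for a TOON chapter summary with one row per chapter.

    The field names below are illustrative; TOON itself does not prescribe them.
    """
    header = f"chapters[{chapter_count}]{{title,key_idea,example}}:"
    return "\n".join([
        f"Summarise each chapter of '{book_title}' in TOON format.",
        "Keep one concrete example from each chapter.",
        "Start with exactly this header, then one indented row per chapter:",
        header,
        "Wrap any value that contains a comma in double quotes.",
    ])


print(build_summary_prompt("Example Book", 12))
```

Declaring the chapter count in the header also gives you a quick check afterwards: if the reply has fewer rows than the header promises, something was skipped.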

This is not a replacement for the original text. Compression always loses something: tone, rhythm, nuance, argument flow, and the full experience of reading. Original texts should still be used when deep understanding matters.

But for refreshing knowledge, building a study map, reviewing technical material, comparing chapters, or deciding whether a book deserves deeper reading, TOON-style summaries are extremely useful. They turn large bodies of text into compact, navigable knowledge without reducing everything to vague bullet points.

While my favourite is TOON, the best format depends on the task.

  • JSON is best for systems.

  • YAML is good for readable configuration.

  • CSV is best for simple tables.

  • TOON is very promising for AI-assisted learning and structured summaries.

Enough theory; let's see it in practice.

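Below is a compressed version of this article in several formats. The token counts are approximate and example-dependent, but they illustrate the idea. The TOON version also regroups the format descriptions into a small table of names, strengths, and weaknesses, so it is not a field-for-field copy of the others.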

Token-Optimised Summary

Text formats matter for AI because structure affects readability, retrieval quality, token use, and cost.
JSON is reliable and widely supported, but repetitive and token-heavy.
Minified JSON is smaller, but still repetitive and harder to read.
YAML is readable and useful for configuration, but indentation-sensitive and not always token-efficient.
CSV is extremely compact for flat tables, but weak for nested or complex knowledge.
TOON is a recent MIT-licensed open format for compact, human-readable structured data.
It declares fields once and uses compact rows, making it useful for AI prompts and summaries.
TOON can reduce tokens in suitable cases, but results depend on the data shape.
Use TOON summaries for review, study maps, chapter previews, and technical refreshers.
Do not treat compressed summaries as replacements for full original texts when deep understanding matters.

JSON

{
  "summary": {
    "importance": "Text formats affect AI readability, retrieval quality, token use, and cost.",
    "json": "Reliable and widely supported, but repetitive and token-heavy.",
    "minified_json": "Smaller than pretty JSON, but still repetitive and harder to read.",
    "yaml": "Readable and useful for configuration, but indentation-sensitive and not always efficient.",
    "csv": "Very compact for flat tables, but weak for nested or complex knowledge.",
    "toon": "Recent MIT-licensed open format for compact, human-readable structured data.",
    "use_cases": "Review, study maps, chapter previews, technical refreshers.",
    "caveat": "Compressed summaries lose nuance and should not replace full texts for deep understanding."
  }
}

Minified JSON

{"summary":{"importance":"Text formats affect AI readability, retrieval quality, token use, and cost.","json":"Reliable and widely supported, but repetitive and token-heavy.","minified_json":"Smaller than pretty JSON, but still repetitive and harder to read.","yaml":"Readable and useful for configuration, but indentation-sensitive and not always efficient.","csv":"Very compact for flat tables, but weak for nested or complex knowledge.","toon":"Recent MIT-licensed open format for compact, human-readable structured data.","use_cases":"Review, study maps, chapter previews, technical refreshers.","caveat":"Compressed summaries lose nuance and should not replace full texts for deep understanding."}}

YAML

summary:
  importance: "Text formats affect AI readability, retrieval quality, token use, and cost."
  json: "Reliable and widely supported, but repetitive and token-heavy."
  minified_json: "Smaller than pretty JSON, but still repetitive and harder to read."
  yaml: "Readable and useful for configuration, but indentation-sensitive and not always efficient."
  csv: "Very compact for flat tables, but weak for nested or complex knowledge."
  toon: "Recent MIT-licensed open format for compact, human-readable structured data."
  use_cases: "Review, study maps, chapter previews, technical refreshers."
  caveat: "Compressed summaries lose nuance and should not replace full texts for deep understanding."

CSV

field,value
importance,"Text formats affect AI readability, retrieval quality, token use, and cost."
json,"Reliable and widely supported, but repetitive and token-heavy."
minified_json,"Smaller than pretty JSON, but still repetitive and harder to read."
yaml,"Readable and useful for configuration, but indentation-sensitive and not always efficient."
csv,"Very compact for flat tables, but weak for nested or complex knowledge."
toon,"Recent MIT-licensed open format for compact, human-readable structured data."
use_cases,"Review, study maps, chapter previews, technical refreshers."
caveat,"Compressed summaries lose nuance and should not replace full texts for deep understanding."

TOON

summary:
  importance: "Text formats affect AI readability, retrieval quality, token use, and cost."
  formats[5]{name,strength,weakness}:
    JSON,Reliable and widely supported,Repetitive and token-heavy
    Minified JSON,Smaller than pretty JSON,Harder to read and still repetitive
    YAML,Readable and useful for configuration,Indentation-sensitive and not always efficient
    CSV,Very compact for flat tables,Weak for nested or complex knowledge
    TOON,Compact human-readable structured data,Best results depend on data shape
  use_cases[4]: review,study maps,chapter previews,technical refreshers
  caveat: Compressed summaries lose nuance and should not replace full texts for deep understanding.
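Because the header declares both the row count and the field names, the tabular part of this example can be read back with very little code. A minimal Python sketch, assuming unquoted values without embedded commas (true for this example); decode_tabular is a hypothetical helper, not the official TOON decoder.

```python
import re

# The tabular part of the TOON example above, copied verbatim.
TOON_FORMATS = """
  formats[5]{name,strength,weakness}:
    JSON,Reliable and widely supported,Repetitive and token-heavy
    Minified JSON,Smaller than pretty JSON,Harder to read and still repetitive
    YAML,Readable and useful for configuration,Indentation-sensitive and not always efficient
    CSV,Very compact for flat tables,Weak for nested or complex knowledge
    TOON,Compact human-readable structured data,Best results depend on data shape
"""

# key[count]{field1,field2,...}:
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def decode_tabular(text):
    """Parse one TOON tabular array into (key, list of dicts).

    Simplified sketch: assumes unquoted, comma-delimited values with no
    embedded commas.
    """
    lines = [line.strip() for line in text.strip("\n").splitlines()]
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError(f"not a tabular header: {lines[0]!r}")
    key, count, fields = match.group(1), int(match.group(2)), match.group(3).split(",")
    rows = []
    for line in lines[1:]:
        values = line.split(",")
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} values: {line!r}")
        rows.append(dict(zip(fields, values)))
    # The declared length lets a reader detect truncated or extra rows.
    if len(rows) != count:
        raise ValueError(f"header declares {count} rows, found {len(rows)}")
    return key, rows


key, rows = decode_tabular(TOON_FORMATS)
print(key, len(rows), rows[1]["name"])  # formats 5 Minified JSON
```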

Approximate Token Comparison

Format                   | Approx. token count | Main point
Original article section | ~650                | Clear for humans, but longer
Pretty JSON              | ~175                | Structured, but repetitive
Minified JSON            | ~145                | Smaller, less readable
YAML                     | ~155                | Readable, moderate size
CSV                      | ~120                | Very compact, flat structure
TOON                     | ~115                | Compact while preserving structure

TOON is not always the winner, but it is one of the most interesting formats for AI-assisted learning because it balances compression, structure, and human readability.

Practical AI guidance without the hype.

© 2025 by AI-M Group. Powered and secured by Wix
