When you feed documents to a language model, format matters as much as content. Compared with HTML, XML, or plain text extracted from a PDF, Markdown is usually the stronger choice.
It’s token-efficient
HTML and XML waste tokens on tags and attributes. Markdown conveys the same structure — headings, lists, tables, emphasis — with a fraction of the overhead.
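As a rough illustration of the overhead difference, the sketch below writes the same small snippet in HTML and in Markdown and compares a crude token proxy. The sample content and the `rough_token_count` helper are assumptions made for this example. Real tokenizers split text differently, so treat the result as a direction, not a measurement.

```python
import re

# The same small document in two formats (illustrative content only).
HTML_DOC = """<h2 class="title">Setup</h2>
<ul>
  <li>Install the <strong>CLI</strong></li>
  <li>Run the <em>init</em> command</li>
</ul>"""

MARKDOWN_DOC = """## Setup
- Install the **CLI**
- Run the *init* command"""


def rough_token_count(text: str) -> int:
    """Crude proxy: count runs of word characters plus each punctuation mark.

    Hypothetical helper; a model's real tokenizer will give different numbers.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


html_tokens = rough_token_count(HTML_DOC)
md_tokens = rough_token_count(MARKDOWN_DOC)
print(f"HTML: {html_tokens} pieces, Markdown: {md_tokens} pieces")
```

For a real comparison, swap the proxy for the tokenizer of the model you actually use.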
It preserves structure
Unlike a flat text dump from a PDF, Markdown keeps the document hierarchy intact, which helps models reason about sections and relationships.
It chunks cleanly for RAG
- Headings give natural chunk boundaries.
- Tables stay readable after splitting.
- Links and code blocks are plain-text syntax, so they pass through splitting and embedding intact.
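The heading-based splitting described above can be sketched as follows. This is a minimal example, not how any particular tool implements it: `chunk_by_headings` is a hypothetical helper that handles only `#`-style headings, keeps each chunk's heading path as context, and skips heading detection inside fenced code blocks.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example easy to embed


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per heading, keeping the heading path."""
    chunks = []
    path = []   # stack of (level, title) leading to the current section
    body = []   # lines collected for the current section
    in_fence = False

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": [title for _, title in path], "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a fenced code block are never treated as headings.
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


sample = "# Guide\nOverview text.\n## Install\n- Step one\n- Step two"
for chunk in chunk_by_headings(sample):
    print(" > ".join(chunk["path"]), "|", chunk["text"].replace("\n", " / "))
```

Storing the heading path with each chunk means a retrieved fragment still carries its place in the document, even when its own text never mentions the section title.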
That’s why Kit for AI converts every document to clean Markdown first — and optionally extracts JSON on top.