Glossary

RAG (Retrieval-Augmented Generation)

RAG, or Retrieval-Augmented Generation, is an AI pattern where a model retrieves relevant passages from a source document before generating an answer, grounding output in the source.

A vanilla large language model answers from its training data alone; that knowledge is frozen at training time, and the model is prone to hallucinating details it never learned. RAG adds a retrieval step: when the user asks a question, the system searches a corpus (a book, a PDF, a knowledge base) for the most relevant passages, hands those passages to the LLM as context, and only then asks the model to answer.
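The retrieval step can be sketched in a few lines. This is a toy illustration under stated assumptions, not any product's implementation: `embed` below is a bag-of-words stand-in for a real dense embedding model, and the function names (`retrieve`, `cosine`) are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a term-frequency vector over lowercased words.
    # A real RAG system would call a dense embedding model here.
    cleaned = text.lower().replace("?", " ").replace(".", " ").replace(",", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, corpus, k=2):
    # Rank every passage by similarity to the question; keep the top k.
    q = embed(question)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

corpus = [
    "Photosynthesis converts sunlight into chemical energy.",
    "The mitochondria is the powerhouse of the cell.",
    "Paris is the capital of France.",
]
passages = retrieve("What is the capital of France?", corpus)
# The retrieved passages are prepended to the prompt as grounding context.
prompt = "Answer using ONLY these passages:\n" + "\n".join(passages)
```

The key property is that the model's prompt, not its weights, carries the knowledge: swap the corpus and the same frozen model answers from a different source.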

The result is an answer that cites specific passages and is constrained by the source rather than by the model’s training. This makes RAG the standard pattern for "chat with your document" features, customer-support bots over a help centre, and AI search engines that need to attribute claims.

RAG quality depends on three pieces: how the source is chunked (paragraph, section, sentence), how chunks are embedded for similarity search, and how the model is prompted to use the retrieved context. A weak link in any of these produces answers that are technically grounded but practically wrong.
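The chunking decision is the easiest of the three to illustrate. A minimal sketch, assuming plain text with blank-line paragraph breaks; the `chunk` function and its `granularity` parameter are hypothetical names, and the sentence splitter is deliberately naive:

```python
def chunk(document, granularity="paragraph"):
    """Split a document into retrievable chunks (toy sketch).

    Granularity is a trade-off: sentence chunks match queries precisely
    but lose surrounding context; paragraph or section chunks keep
    context but dilute the similarity signal.
    """
    if granularity == "sentence":
        # Naive sentence split on ". " / "? "; real systems use a
        # proper sentence tokenizer.
        parts = []
        for para in document.split("\n\n"):
            parts.extend(para.replace("? ", "?|").replace(". ", ".|").split("|"))
    else:
        parts = document.split("\n\n")  # paragraph chunks
    return [p.strip() for p in parts if p.strip()]

doc = ("RAG retrieves before generating. It grounds answers.\n\n"
       "Chunking controls what can be retrieved.")
paragraphs = chunk(doc)              # 2 paragraph chunks
sentences = chunk(doc, "sentence")   # 3 sentence chunks
```

Whichever granularity is chosen, each chunk is what gets embedded and retrieved, so a chunk boundary in the wrong place can separate a claim from the caveat that qualifies it.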

Where Summio fits

Summio uses RAG to ground every claim in a summary or chat response back to a passage in the original book, video transcript, article, or PDF. If the engine cannot cite a passage, it does not print the claim — that is the source-grounding policy that makes Summio trustworthy enough for professional reading.

Read more about Summio →

Common questions

How is RAG different from fine-tuning?

Fine-tuning bakes new knowledge into the model weights, which is slow, expensive, and updatable only by retraining. RAG keeps the model frozen and changes the retrieved context per query, so updating knowledge means re-indexing a document, not retraining anything.

Does RAG eliminate hallucinations?

It reduces them substantially but does not eliminate them. The model can still misread the retrieved passage, ignore it, or stitch together an incorrect synthesis from correct snippets. Source-citation policies (refuse to answer if no passage supports the claim) close most of the remaining gap.
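A refuse-unless-cited policy can be sketched as a gate in front of the output. This is a toy heuristic, not Summio's actual mechanism: the `grounded_claim` name and the token-overlap test are illustrative assumptions, and production systems would use an entailment model or a second LLM pass instead.

```python
def grounded_claim(claim, retrieved, threshold=0.7):
    # Emit a claim only when some retrieved passage lexically supports
    # it; otherwise decline. The overlap ratio is a crude stand-in for
    # a real entailment check.
    claim_tokens = set(claim.lower().split())
    for passage in retrieved:
        passage_tokens = set(passage.lower().replace(".", " ").split())
        overlap = len(claim_tokens & passage_tokens) / len(claim_tokens)
        if overlap >= threshold:
            return f'{claim} [source: "{passage}"]'
    return None  # no supporting passage: decline to print the claim

passages = ["Paris is the capital of France."]
cited = grounded_claim("Paris is the capital of France", passages)
blocked = grounded_claim("The moon is made of cheese", passages)
```

Here `cited` carries its source attribution while `blocked` is `None`: declining beats guessing when no passage supports the claim.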

Where is RAG used?

AI search engines (Perplexity, ChatGPT search, Claude with web access), enterprise document chat, customer-support bots over a help centre, and reading apps like Summio. Anywhere an answer needs to be attributable to a specific source.