Reading Notes: In-Context Retrieval-Augmented Language Models
The idea
In-Context RALM [1] is a retrieval-augmented generation (RAG) technique for autoregressive language models. In summary, RAG uses a retriever during model inference to fetch relevant documents, which are then concatenated with the original input.
In the in-context learning setting, a few examples are placed before the user's input, and the result is fed to the LLM. In-Context RALM works in a similar way: it concatenates the most relevant retrieved document directly in front of the model's input. The advantage is that there is no need to retrain the LLM. A diagram created with Mermaid is shown below.
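Before the diagram, a minimal Python sketch of this flow may help. The toy word-overlap retriever and the GPT-2 model here are illustrative assumptions, not the paper's actual setup (which uses stronger retrievers such as BM25); the key point is only that the retrieved document is prepended to the prompt of a frozen LM.

```python
# Minimal sketch of In-Context RALM inference (illustrative, not the
# paper's exact configuration).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def retrieve(query: str, corpus: list[str]) -> str:
    """Toy retriever: return the document sharing the most words with
    the query. A real system would use BM25 or a dense retriever."""
    query_words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(query_words & set(doc.lower().split())))

def ralm_generate(query: str, corpus: list[str], max_new_tokens: int = 32) -> str:
    # The core In-Context RALM step: prepend the retrieved document to
    # the input. The LM itself is used as-is, with no retraining.
    document = retrieve(query, corpus)
    prompt = document + "\n" + query
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```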