Better TF-IDF: BM25
Intro
You have probably encountered BM25 many times when reading papers on LLMs (especially RAG) or information retrieval. It is a ranking function that computes a relevance score for each document given a user query.
If you examine the BM25 formula carefully, you will notice how closely it resembles classic TF-IDF. In fact, BM25 is an enhanced version of TF-IDF, as we’ll explore shortly.
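As a baseline for that comparison, here is a minimal sketch of ranking documents with plain TF-IDF. It is only an illustration under stated assumptions, none of which come from this article: whitespace tokenization, raw term counts as TF, and idf = log(N / df). The function name `tfidf_scores` and the toy documents are made up for the example.

```python
import math
from collections import Counter


def tfidf_scores(query, docs):
    """Score each document against the query with a plain TF-IDF sum.

    Assumptions (illustrative only): whitespace tokenization,
    raw term counts as TF, and idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                # Term frequency in this document times inverse document frequency.
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores


# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "bm25 ranks documents for a query",
]
scores = tfidf_scores("cat mat", docs)

# Document indices sorted from most to least relevant.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Note that in this sketch the score grows linearly with the raw term count and takes no account of document length, which is worth keeping in mind when comparing it with BM25.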
The BM25
Before we dive into how BM25 works, let’s establish some notation: