Drain: a simple but effective log parsing algorithm
Background
Web service platforms generate a large volume of unstructured logs. However, machine learning and data analysis require structured input.
Therefore, extracting structured information from unstructured logs is a critical problem. A naive approach is to write regular expressions that pull the fields out of each log line. However, this method has some drawbacks:
- The volume of logs is so large that manually crafting regex patterns is impractical.
- Logs may come from different components, each with its own log format.
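To make the drawbacks above concrete, here is a minimal sketch of the regex approach. The two log lines, the component formats, and the field names are all made up for illustration; they do not come from any real system.

```python
import re

# Hypothetical log lines from two made-up components (illustration only).
LOGS = [
    "2017-06-25 10:00:01 INFO Received block blk_123 of size 67108864 from 10.0.0.5",
    "[auth] user=alice action=login status=ok",
]

# A hand-written pattern that only understands the first format.
BLOCK_PATTERN = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>\w+) "
    r"Received block (?P<block_id>\S+) of size (?P<size>\d+) from (?P<src>\S+)$"
)


def parse(line):
    """Return the named fields if the line matches the known format, else None."""
    match = BLOCK_PATTERN.match(line)
    return match.groupdict() if match else None


parsed = [parse(line) for line in LOGS]
for line, fields in zip(LOGS, parsed):
    print(fields if fields is not None else f"unparsed: {line}")
```

The first line is parsed into named fields, while the second line falls through unparsed because no pattern was written for its format. Every new component or message type needs another hand-written pattern, which is what makes this approach hard to scale.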
The Drain algorithm was proposed in 2017. At that time, many log parsing algorithms focused on offline batch processing, but logs in web service platforms are generated as a stream. Drain therefore focuses on online stream processing, parsing each log message as it arrives.