<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
    <channel>
        <title>Software-Engineering - Tag - MartinLwx&#39;s Blog</title>
        <link>https://martinlwx.github.io/en/tags/software-engineering/</link>
        <description>Software-Engineering - Tag - MartinLwx&#39;s Blog</description>
        <generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>martinlwx@163.com (MartinLwx)</managingEditor>
            <webMaster>martinlwx@163.com (MartinLwx)</webMaster><copyright>&lt;a rel=&#34;license noopener&#34; href=&#34;https://creativecommons.org/licenses/by-nc-nd/4.0/&#34; target=&#34;_blank&#34;&gt;CC BY-NC-ND 4.0&lt;/a&gt;</copyright><lastBuildDate>Mon, 20 Apr 2026 00:32:07 &#43;0800</lastBuildDate><atom:link href="https://martinlwx.github.io/en/tags/software-engineering/" rel="self" type="application/rss+xml" /><item>
    <title>Drain: a simple but effective log parsing algorithm</title>
    <link>https://martinlwx.github.io/en/log-parsing-algorithm-drain/</link>
    <pubDate>Mon, 20 Apr 2026 00:32:07 &#43;0800</pubDate><author>
        <name>MartinLwx</name>
    </author><guid>https://martinlwx.github.io/en/log-parsing-algorithm-drain/</guid>
    <description><![CDATA[<h2 id="backgroup" class="headerLink">
    <a href="#backgroup" class="header-mark"></a>Background</h2><p>Web service platforms generate a large volume of unstructured logs. However, machine learning and data analysis require structured input.</p>
<p>Therefore, <em>extracting structured information from unstructured logs</em> is a critical problem. A naive approach is to extract it with hand-written regular expressions. However, this method has some drawbacks<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>:</p>
<ul>
<li>The volume of logs is so large that manually crafting regex patterns is impractical.</li>
<li>Logs may come from different components, each with its own log format.</li>
</ul>
<p>The Drain algorithm was proposed in 2017. At that time, many log parsing algorithms focused on <em>offline batch processing</em>. However, logs in web service platforms are generated as a stream, so Drain focuses on <em>online stream processing</em>.</p>]]></description>
</item></channel>
</rss>
