Weight Tying in Language Models

Published by MartinLwx on 2025-03-11 in category ML-DL

Introduction

Quote

In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation - Attention Is All You Need, Section 3.4, Embeddings and Softmax¹
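The following is a minimal NumPy sketch of the weight tying described in the quote. It assumes a toy model in which a single matrix `E` of shape (vocab_size, d_model) serves two roles. On the input side it is the embedding lookup table. Transposed, it is the pre-softmax output projection. The `TiedLM` class and its method names are hypothetical, and the Transformer layers between the embedding and the output are omitted.

```python
import numpy as np

class TiedLM:
    """Toy model (hypothetical): one matrix E of shape (vocab_size, d_model)
    is used both as the input embedding table and as the output projection."""

    def __init__(self, vocab_size: int, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.E = rng.normal(0.0, d_model ** -0.5, size=(vocab_size, d_model))

    def embed(self, token_ids: np.ndarray) -> np.ndarray:
        # Input side: look up rows of E -> (seq_len, d_model)
        return self.E[token_ids]

    def logits(self, hidden: np.ndarray) -> np.ndarray:
        # Output side: reuse the SAME E, transposed -> (seq_len, vocab_size)
        return hidden @ self.E.T

    def num_params(self) -> int:
        # Untied input and output layers would need 2 * vocab_size * d_model (no biases)
        return self.E.size

lm = TiedLM(vocab_size=100, d_model=16)
tokens = np.array([3, 14, 15])
h = lm.embed(tokens)    # stand-in for the output of the omitted Transformer body
out = lm.logits(h)
print(out.shape)        # (3, 100)
print(lm.num_params())  # 1600
```

Both directions read the same `E`, so in a trainable version the gradients from the input side and the output side would both update that one matrix. The parameter count for these two layers also drops from 2 × vocab_size × d_model to vocab_size × d_model. The quoted paper additionally shares this matrix between its two embedding layers (encoder and decoder). The sketch shows only one embedding layer.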

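What Is Multi-Head Attention

Published by MartinLwx on 2025-03-04 in category ML-DL

Defining Multi-Head Attention

In the previous post we covered Self Attention. Here we build on self-attention by adding one more piece: Multi-Head Attention (MHA). This is actually the complete form of self-attention in the original Transformer¹. Most of the groundwork was laid in the previous post, so this one won't be long~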
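Here is a minimal NumPy sketch of multi-head self-attention. It assumes a single unbatched sequence with no masking and no bias terms. It uses a common implementation trick: each of Q, K and V gets one d_model × d_model projection, and the result is split into `num_heads` slices of width d_k = d_model / num_heads. This is equivalent to giving each head its own smaller projection matrix. The function and variable names are illustrative, not taken from the post.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads):
    """x: (seq_len, d_model); every W_*: (d_model, d_model)."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_k = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_k)
        return t.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    # Scaled dot-product attention, computed independently for each head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)  # (num_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                # (num_heads, seq_len, d_k)
    # Concatenate the heads back to (seq_len, d_model), then mix them with W_o
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights

rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 8, 2, 5
x = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads)
print(out.shape, attn.shape)  # (5, 8) (2, 5, 5)
```

The attention weights come out with shape (num_heads, seq_len, seq_len), so each head forms its own attention pattern over the same sequence. The final projection `W_o` then combines the concatenated per-head outputs.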

How to Understand the Transformer Self-Attention Formula

Published by MartinLwx on 2025-03-02 in category ML-DL

Info

Further reading:

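From Basic Block to Control Flow Graph

Published by MartinLwx on 2025-02-20 in categories Program-Analysis, Compiler

Info

Note: three-address code is the foundation of basic blocks (BB), and basic blocks are the foundation of the control flow graph (CFG). Before reading this post, you should have some familiarity with three-address code; see my previous blog post.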
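This is a rough sketch of how three-address code is split into basic blocks and then linked into a CFG. It uses the textbook leader-based partitioning. The leaders are:

- the first instruction,
- every jump target, and
- every instruction immediately after a jump.

Each block then runs from a leader up to the next leader. The `Instr` class and its fields are a hypothetical representation made up for illustration, not the post's own data structure. The sketch also does not treat `return` specially or add explicit entry/exit nodes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Instr:
    text: str                    # the three-address instruction, for display
    label: Optional[str] = None  # label attached to this instruction, if any
    jump: Optional[str] = None   # target label of a goto / if-goto
    conditional: bool = False    # True for "if ... goto", which may fall through

def find_leaders(code: List[Instr]):
    label_pos = {ins.label: i for i, ins in enumerate(code) if ins.label}
    leaders = {0}                             # rule 1: the first instruction
    for i, ins in enumerate(code):
        if ins.jump is not None:
            leaders.add(label_pos[ins.jump])  # rule 2: every jump target
            if i + 1 < len(code):
                leaders.add(i + 1)            # rule 3: the instruction after a jump
    return sorted(leaders), label_pos

def build_cfg(code: List[Instr]):
    if not code:
        return [], []
    leaders, label_pos = find_leaders(code)
    # Block b covers instructions [start, end): from its leader up to the next leader
    blocks: List[Tuple[int, int]] = list(zip(leaders, leaders[1:] + [len(code)]))
    block_of = {start: b for b, (start, _) in enumerate(blocks)}
    edges = set()
    for b, (start, end) in enumerate(blocks):
        last = code[end - 1]
        if last.jump is not None:             # explicit jump edge
            edges.add((b, block_of[label_pos[last.jump]]))
        if (last.jump is None or last.conditional) and b + 1 < len(blocks):
            edges.add((b, b + 1))             # fall-through edge
    return blocks, sorted(edges)

code = [
    Instr("i = 0"),
    Instr("if i >= n goto L2", label="L1", jump="L2", conditional=True),
    Instr("t1 = i * 4"),
    Instr("i = i + 1"),
    Instr("goto L1", jump="L1"),
    Instr("return i", label="L2"),
]
blocks, edges = build_cfg(code)
for b, (start, end) in enumerate(blocks):
    print(f"B{b}:", [ins.text for ins in code[start:end]])
print(edges)  # [(0, 1), (1, 2), (1, 3), (2, 1)]
```

In this loop example, block B1 holds the loop test. Its conditional jump gives it two successors: the loop body B2 and the exit B3. B2 ends in an unconditional `goto`, so its only successor is the back edge to B1.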

What Is Three-Address Code (3AC/TAC)

Published by MartinLwx on 2025-02-18 in category Program-Analysis

Info

Further reading:

The GraphRAG Workflow

Published by MartinLwx on 2025-02-11 in category ML-DL

Motivation

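Current RAG techniques cannot answer global questions about an entire corpus, such as "What are the main themes of this dataset?". Retrieval augmentation cannot solve this kind of question, because the answer usually does not sit in any single passage. Answering correctly requires understanding the whole corpus and producing an abstract summary. The authors call these query-focused summarization (QFS) problems¹, and ordinary RAG handles them poorly.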