
LLM

2025

Transformer architecture variation: RMSNorm 05-11
Weight Tying in Language Models: A Technique for Parameter Efficiency 03-11
What is Multi-Head Attention (MHA) 03-04
An Explanation of the Self-Attention Mechanism in the Transformer 03-02
The Flow of GraphRAG 02-12

2024

Reading Notes: Generalization through Memorization: Nearest Neighbor Language Models 12-23
Reading Notes: In-Context Retrieval-Augmented Language Models 12-04

2023

LLM inference optimization - KV Cache 10-12
BPE Tokenization Demystified: Implementation and Examples 08-24
Powered by Hugo | Theme - DoIt
2019 - 2025 MartinLwx | CC BY-NC 4.0