Handing Obsidian Over to AI: My Experience with LLM Wiki

MartinLwx included in category Tool

2026-06-03 2026-06-04 2744 words 13 minutes

Background

It has been roughly two months since Andrej Karpathy shared the LLM Wiki idea in his gist, and I have been using my own LLM Wiki for over a month. In this post, I share my thoughts.

Before LLM Wiki emerged, I had already been using Obsidian extensively for several years to take a large number of notes—over 10,000 in total. The high count is largely due to my adherence to the principle of atomic notes, where I create a separate note for each logically self-contained concept. Over time, this has allowed me to build a personal knowledge system and benefit from the compounding effects of structured knowledge.

However, maintaining this system comes with several pain points:

Citations are tedious to maintain: When taking notes, I try to use footnotes ([^]) to reference sources. This ensures that when I revisit a note later, I can quickly trace where factual statements, logical inferences, or quantitative data came from. While Obsidian already offers some convenience—for example, allowing footnotes to be edited in a floating window—the overall process is still quite monotonous and cumbersome.
Maintaining bidirectional links is inconvenient: One of Obsidian’s key features is the ability to create links between notes using the [[wikilink]] syntax. It’s a powerful idea, but in practice, maintaining these links manually is inconvenient. After creating a new note, I often have to go back and manually add links from related notes, and it’s easy to miss connections.
The note-taking process itself is dull: Writing notes is inherently repetitive and unengaging. It requires constant formatting and annotation—using $...$ for math, **...** for bold text, and verbose Markdown syntax for tables and other structures.

Tip

I admit that taking notes manually may be helpful for learning purposes, as it forces you to summarize the knowledge in your own words :)

Because of the issues described above, my note system gradually became somewhat disorganized. Some notes were missing citations, others lacked proper bidirectional links—none of it was in a fully consistent or “ideal” state.

To address this, I adopted a manual curation strategy: I maintain a dedicated folder and treat it as a “standard-compliant” subset of my knowledge base. Whenever I come across a note that does not meet the standard, I manually revise it and gradually migrate it into this folder. The long-term goal is to bring the entire knowledge base into this ideal form.

The figure below shows the “knowledge graph” of my earlier wiki system. Each visible cluster represents effort I invested in linking and organizing notes :(

It is not difficult to see the underlying trade-off: note-taking is valuable because it enables long-term knowledge accumulation, retrieval, and reuse. However, maintaining a perfectly structured note system requires substantial cognitive and operational overhead. At some point, the cost of maintaining the system can potentially exceed the benefit it provides.

The introduction of LLM Wiki partially alleviates this issue, but it also introduces new problems, which I will discuss later.

LLM Wiki Architecture

Note

The content in this section is taken from my LLM Wiki, which was generated by DeepSeek V4 Pro after ingesting Andrej Karpathy’s gist using the setup described in this article. I’ve simply organized it into a format more suitable for article reading to demonstrate the results. The diagram was drawn by me manually :)

LLM Wiki is a pattern for incrementally building and maintaining a persistent Wiki knowledge base using LLMs (Large Language Models). The core idea is: when users add new source documents, the LLM doesn’t merely index them for later retrieval — it actively reads, extracts key information, and integrates it into the existing Wiki — updating entity pages, revising topic summaries, flagging conflicts between old and new data, and reinforcing or challenging existing synthesis. Knowledge is compiled once and continuously maintained, rather than being re-derived on every query. The Wiki is a persistent, compound artifact: cross-references are already in place, contradictions are flagged, and synthesis reflects everything that has been read.

Its core principles are:

Persistent Compilation: Knowledge is compiled into structured markdown pages and continuously maintained, rather than being re-derived from raw documents on every query. The Wiki is a constantly enriching, compound artifact.
Incremental Integration: Each time a new source document is added, the LLM not only creates its summary but also updates all related entity and concept pages and revises cross-references. A single source document may touch 10-15 Wiki pages.
Cross-Referencing & Consistency: Cross-references within the Wiki are automatically maintained by the LLM. When new data contradicts old conclusions, the LLM flags the contradiction and updates the synthesis. Humans don’t need to maintain consistency manually.
Human-Machine Division of Labor: Humans curate source documents, guide the direction of analysis, and ask good questions; the LLM handles all the “grunt work” — summarization, cross-referencing, archiving, and consistency maintenance. Obsidian is the IDE, the LLM is the programmer, and the Wiki is the codebase.

The LLM Wiki architecture consists of three layers:

Raw Sources: A human-curated collection of source documents — articles, papers, images, data files. The LLM reads from these but is never allowed to modify them. This is the immutable source of knowledge.
Wiki: An LLM-generated collection of markdown files — summary pages, entity pages, concept pages, comparison pages, overview pages, synthesis pages. The LLM fully owns this layer: creating pages, updating pages when new source documents arrive, maintaining cross-references, and ensuring site-wide consistency.
Schema: A document (such as CLAUDE.md or AGENTS.md) that defines the Wiki’s structure, naming conventions, and workflows. This is the critical configuration document — it makes the LLM a disciplined Wiki maintainer.

Two special files help the LLM (and humans) navigate as the Wiki grows:

INDEX.md: Content-oriented. A directory of all pages in the Wiki — each page includes a link, a one-line summary, and optional metadata (date, source document count). Organized by category (entities, concepts, sources, etc.). The LLM updates it after every Ingest. When querying, the LLM first reads the INDEX to find relevant pages before diving deeper. This navigation approach is surprisingly effective at moderate scale.
LOG.md: Time-oriented. An append-only operation log — dates and descriptions of Ingests, Queries, and Lints. If each entry starts with a consistent prefix (e.g., ## [2026-04-02] ingest | Article Title), the log can be parsed by simple Unix tools. The log provides a timeline of the Wiki’s evolution and gives the LLM context about recent operations.
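
As a rough illustration of the "consistent prefix" point, here is a minimal Python sketch that parses such log headings. It assumes every entry heading follows exactly the `## [date] operation | title` shape from the example above; the sample entries and the `parse_log` helper are made up for illustration and are not part of the gist or of any tool.

```python
import re
from collections import Counter

# Heading shape assumed from the example above:
#   ## [2026-04-02] ingest | Article Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")

def parse_log(text):
    """Return a list of (date, operation, title) tuples found in LOG.md text."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(match.groups())
    return entries

# Hypothetical log content; body lines under a heading are simply skipped.
sample = """\
## [2026-04-02] ingest | Article Title
Touched 12 pages.
## [2026-04-03] query | Compare A and B
## [2026-04-05] lint | Weekly health check
"""

entries = parse_log(sample)
counts = Counter(operation for _, operation, _ in entries)
print(entries[0])
print(dict(counts))
```

Because the log is append-only, the last few parsed entries are also a cheap way to give the agent a summary of recent activity.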

The 3 Key Operations of LLM Wiki

Note

The content in this section is also taken from my LLM Wiki, generated by DeepSeek V4 Pro after ingesting Andrej Karpathy’s gist using the setup described in this article. I’ve simply organized it into a format more suitable for article reading to demonstrate the results.

There are 3 important operations in LLM Wiki:

Ingest

Ingest is the core operation for integrating a new source document into the Wiki. The typical workflow is:

🧑‍💻/👩‍💻 — The user places a new source document into the Raw Sources collection and notifies the LLM to process it.
🤖 — The LLM reads the source document, discusses key points with the user, writes a summary page in the Wiki, updates INDEX.md, updates all related entity and concept pages, and finally appends an operation record to LOG.md.

A single source document may touch 10-15 Wiki pages. Users can ingest one at a time (single-shot, high engagement) or in batches (multi-shot, low supervision), depending on personal workflow preferences.

Query

Query is the process by which users ask questions through the Wiki and receive answers. The typical workflow is:

🧑‍💻/👩‍💻 — The user asks a question about the Wiki.
🤖 — The LLM first searches for relevant pages, reads the content, synthesizes an answer, and includes citations. Answers can take various forms: markdown pages, comparison tables, slides (Marp), charts (matplotlib), canvas, etc.

The key insight is: good answers can be saved back as new Wiki pages. Comparisons, analyses, and discovered connections raised by users — these are all valuable knowledge that shouldn’t disappear into chat history. This allows the exploration process to compound within the knowledge base just like source documents do.

Lint

Lint is the operation of periodically running health checks on the Wiki. The typical workflow is:

🧑‍💻/👩‍💻 — The user asks the LLM to inspect for issues, such as: contradictions between pages, outdated claims superseded by newer source documents, orphaned pages with no inbound links, important concepts mentioned but lacking standalone pages, missing cross-references, or data gaps that could be filled through web searches.
🤖 — Based on rules set by the user, the LLM inspects existing pages, proposes corrections, and implements the changes.

Lint keeps the Wiki system evolving healthily.
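
One of the checks listed above, orphaned pages with no inbound links, is mechanical enough to sketch in a few lines of Python. This is only an illustration under some assumptions: pages are held in a dictionary keyed by page name, links use the `[[wikilink]]` syntax, and navigation pages such as INDEX and LOG are excluded by a hypothetical `ignore` parameter. It is not how any particular LLM Wiki implementation performs linting.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures the target only.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(pages, ignore=("INDEX", "LOG")):
    """pages maps page name -> markdown body. Returns names with no inbound links."""
    linked = set()
    for name, body in pages.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target != name:  # a self-link does not count as inbound
                linked.add(target)
    return sorted(set(pages) - linked - set(ignore))

# Hypothetical miniature wiki.
pages = {
    "INDEX": "[[Transformer]] [[Attention]]",
    "Transformer": "Built on [[Attention|attention layers]].",
    "Attention": "See [[Transformer#History]].",
    "Tokenizer": "Mentions [[Transformer]] but nothing links here.",
}
orphans = find_orphans(pages)
print(orphans)
```

Checks like this one can run as a plain script, leaving the LLM to handle the judgment-heavy items such as contradictions and outdated claims.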

Pros and Cons

The technical approach behind LLM Wiki is appealing because it directly addresses many of the maintenance pain points in a manually curated wiki system. It reduces the overhead of keeping the system consistent and shifts effort away from bookkeeping toward higher-level cognitive work.

However, it quickly becomes clear that the approach has non-trivial limitations.

First, LLM Wiki may not scale well. In the original gist, Andrej Karpathy suggests that handling a few hundred wiki pages is feasible. The question is what happens when the system grows to thousands or even tens of thousands of pages. At some point, a global INDEX.md may no longer fit within an agent’s context window. This raises an issue: how does the agent efficiently locate relevant pages, and how does it reason about relationships between notes without full visibility of the index?
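
To make the scaling concern concrete, here is a back-of-the-envelope estimate in Python. Both numbers are assumptions chosen for illustration: roughly 40 tokens per index entry (a link plus a one-line summary) and a 200,000-token context budget. Neither figure comes from the gist or from any specific model.

```python
def index_tokens(num_pages, tokens_per_entry=40):
    """Rough size of a one-line-per-page index, in tokens (assumed average)."""
    return num_pages * tokens_per_entry

CONTEXT_WINDOW = 200_000  # hypothetical budget, not any specific model's limit

for pages in (300, 3_000, 30_000):
    needed = index_tokens(pages)
    share = needed / CONTEXT_WINDOW
    print(f"{pages:>6} pages -> ~{needed:>9,} tokens ({share:.0%} of the window)")
```

Under these assumptions, a few hundred pages take a small slice of the window, a few thousand take most of it, and tens of thousands no longer fit at all.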

Second, there is the problem of hallucination propagation. During ingestion, the LLM reads source material and generates structured notes. However, the model may introduce inaccuracies or even fully fabricated interpretations due to hallucination. If such errors are stored in the wiki, they become part of the retrieval content. Later, during querying, the system may retrieve and rely on these corrupted entries, leading to compounding errors. In that case, the reliability of the entire wiki system becomes difficult to guarantee.

I have seen the same concerns raised on the Internet. After some research, I found an interesting LLM Wiki implementation, LLM Wiki Compiler, that may mitigate the hallucination problem. In short, LLM Wiki Compiler suggests putting a coverage indicator tag on each section, in the format [coverage: {level} -- {N} sources]. The level is one of high, medium, or low, and each level corresponds to a range of N (the number of source files that contributed to the section). The tag tells both you and the agent whether the section can be trusted directly. Here is an example¹:

## Summary [coverage: high -- 15 sources]
...trust this, it's well-sourced...

## Experiments & Results [coverage: medium -- 3 sources]
...decent overview, check raw files for details...

## Gotchas [coverage: low -- 1 source]
...read the raw gotchas.md directly...

Naturally, the more sources that contribute to a section, the better. We should also teach the LLM how to use the coverage indicator effectively¹:

[coverage: high] – trust this section, skip the raw files.
[coverage: medium] – good overview, check raw sources for granular questions.
[coverage: low] – read the raw sources listed in that section directly.
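
The tag format is regular enough for a script to read as well. The sketch below, in Python, extracts the tag from each heading and maps the level to a reading policy paraphrased from the three rules above. The regular expression and the `read_coverage` helper are my own illustration and are not taken from LLM Wiki Compiler.

```python
import re

# Heading shape assumed from the example: "## Title [coverage: level -- N sources]"
COVERAGE = re.compile(
    r"^(#+)\s+(.*?)\s*\[coverage:\s*(high|medium|low)\s*--\s*(\d+)\s+sources?\]\s*$"
)

# Reading policy paraphrased from the three rules above.
POLICY = {
    "high": "trust the section",
    "medium": "use as overview, check raw sources for details",
    "low": "read the raw sources directly",
}

def read_coverage(markdown):
    """Yield (heading, level, source_count, action) for each tagged heading."""
    for line in markdown.splitlines():
        match = COVERAGE.match(line)
        if match:
            _, heading, level, count = match.groups()
            yield heading, level, int(count), POLICY[level]

page = """\
## Summary [coverage: high -- 15 sources]
...
## Experiments & Results [coverage: medium -- 3 sources]
...
## Gotchas [coverage: low -- 1 source]
...
"""
sections = list(read_coverage(page))
for heading, level, count, action in sections:
    print(f"{heading}: {level} ({count}) -> {action}")
```

Headings without a tag are skipped here; a stricter checker could report them as missing coverage information instead.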

In my opinion, the design of the coverage indicator is amazing. It establishes a lineage between the trusted raw sources and the summarized wiki pages, and it can be consumed by both humans and agents. I started building my LLM Wiki after I saw this strategy :)

LLM Wiki in practice

Tip

In the beginning, I used OpenCode with three customized commands to build the LLM Wiki, and I also refined the ingest, query, and lint commands. However, I decided to switch from OpenCode + Commands to Hermes Agent + SKILL for these reasons:

Hermes Agent has a self-improving ability that can enhance my LLM Wiki skill over time.
With Hermes Gateway set up, I can trigger the ingest procedure from my mobile phone.
Hermes Agent has a built-in Obsidian SKILL, so it knows how to write an Obsidian Markdown file.

If you have read the comments section in Andrej Karpathy’s gist, you will find plenty of implementations. In my opinion, the core principle of LLM Wiki is quite simple and intuitive. Writing dedicated software for this would be overkill. Given the existing Obsidian base and regular interaction through Hermes Agent and LLMs, the setup can already function as a full LLM Wiki system using Obsidian + Hermes. A concrete implementation can be structured as follows:

(optional) Obsidian Web Clipper: automatically converts web content into Markdown and stores it in the Obsidian vault. This is optional because Hermes can also handle URLs directly and process them into structured notes.
(optional) Obsidian CLI: an official CLI tool for interacting with the Obsidian vault. In many cases, it provides a more efficient interface than raw Bash operations when automating vault-level tasks.
LLM Wiki SKILL: a comprehensive skill that instructs the LLM on what an LLM wiki is, and defines standardized workflows for ingestion, querying, and linting. This is the core behavioral layer of the system.
AGENTS.md: a schema document describing how the wiki is organized, including directory structure, file naming conventions, and required note templates or structural rules for each page.

Among these components, the most critical is the LLM Wiki SKILL. This is also the most complex part to design properly. However, it is worth noting that a substantial version of it already exists in Hermes Agent’s built-in SKILL set. The implementation can be found in the official documentation.

A practical recommendation is to start from this existing implementation and adapt it to your own workflow. The default version is general-purpose and may not align with individual constraints or preferences.

In my own setup, I extended the official LLM Wiki SKILL mainly in two areas:

Wiki organization model: definition of directory structure and the semantic meaning of each folder
Coverage Indicator mechanism: formalization of what it means, how it is computed, and how it is used during ingestion and review

You can find my implementation here.

Another component is SCHEMA.md. I will skip it here because different people have different structural requirements for a wiki page and for the wiki system as a whole.

Here I would like to show a concrete example of why I chose Hermes: the combination of skill self-improvement and the memory mechanism is just incredible. In short, the agent just understands you. Yesterday I complained about the misalignment between high/medium/low and N. Hermes found the fault patterns and improved my skill. It now forces the health check procedure after ingestion. It also added an entry to its memory file:

Health check is now Step 5 of the my-llm-wiki Ingest workflow (references/audit.py).
Run BEFORE updating INDEX/LOG. Recurring mistakes: (1) coverage N counts footnote
references instead of unique source docs → [^1],[^2],[^3] all pointing to
same doc = N=1; (2) footnote-free sections wrongly tagged N>0; 
(3) source_cnt ≠ actual unique sources. 
Also: wikilink inline at point of mention, not just footer.
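
The first recurring mistake in the memory entry above (counting footnote references instead of unique source documents) is easy to show in code. The following Python sketch is not the author's references/audit.py. It is a minimal illustration that assumes each footnote definition has the form `[^label]: source` and that identical definition text means the same source document. The file name `raw/karpathy-gist` is hypothetical.

```python
import re

# "[^1]: some source" definitions, one per line.
FOOTNOTE_DEF = re.compile(r"^\[\^([^\]]+)\]:\s*(.+)$", re.MULTILINE)
# "[^1]" references inside running text (a definition line is excluded by the colon check).
FOOTNOTE_REF = re.compile(r"\[\^([^\]]+)\](?!:)")

def unique_source_count(section, page):
    """Count distinct source documents cited in `section`, using definitions from `page`."""
    sources = {label: target.strip() for label, target in FOOTNOTE_DEF.findall(page)}
    cited = {sources[label] for label in FOOTNOTE_REF.findall(section) if label in sources}
    return len(cited)

section = "Claim A.[^1] Claim B.[^2] Claim C.[^3]"
page = section + """

[^1]: [[raw/karpathy-gist]]
[^2]: [[raw/karpathy-gist]]
[^3]: [[raw/karpathy-gist]]
"""

n = unique_source_count(section, page)
print(n)  # three footnotes, but they all point to the same document
```

A section without any footnotes yields 0 under the same function, which covers the second mistake in the list (footnote-free sections tagged with N > 0).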

Wrap-up

After a month of intensive use, my note-taking workflow has changed significantly.

Before: I would take notes while reading, resulting in a loop that looked like Read → Take notes → Read → ..... Sometimes I read articles on my mobile phone, but typing on a mobile device is just too inefficient. In those cases, I would usually save the article and wait until I got back to my computer before processing it properly.
Now: Regardless of the device I’m using, I can focus entirely on understanding the content itself. Once I finish reading an article, I typically ask Hermes to ingest it into my LLM Wiki. I then review the generated output to ensure it follows my formatting requirements and captures the points I actually care about. If it doesn’t, I simply ask Hermes to revise it.

The Coverage Indicator design helps mitigate hallucination to some extent, but the scalability problem remains unresolved. Once a wiki reaches a certain size, several challenges begin to emerge:

Agents may struggle to discover relevant pages reliably.
The maintenance burden of the wiki continues to grow.
Consistency across related pages becomes increasingly difficult to enforce.

These are all legitimate concerns. Nevertheless, based on my personal experience so far, LLM Wiki has produced a net positive impact on my workflow. More importantly, I am optimistic that the community will eventually develop robust solutions to these problems—whether through better algorithms, stronger engineering practices, or perhaps simply because future generations of LLMs will operate at an entirely different level 😈️

¹ Coverage Indicator - LLM Wiki Compiler