
Merkle Trees
- Each leaf node = hash of a data block (e.g., a file). Each non-leaf node = hash of its children's hashes.
- Any change propagates up to the root hash, enabling fast change detection.
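To make the propagation concrete, here is a minimal sketch of a binary Merkle tree root in Python. It assumes SHA-256 as the hash and duplicates the last hash when a level has an odd number of nodes; both are illustrative choices, not something these notes specify. The names `sha256_hex` and `merkle_root` are made up for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(blocks: list[bytes]) -> str:
    """Return the root hash of a binary Merkle tree built over `blocks`."""
    if not blocks:
        return sha256_hex(b"")
    # Leaf level: one hash per data block.
    level = [sha256_hex(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad odd levels by repeating the last hash.
            level.append(level[-1])
        # Parent level: hash of the two concatenated child hashes.
        level = [
            sha256_hex((level[i] + level[i + 1]).encode())
            for i in range(0, len(level), 2)
        ]
    return level[0]


files = [b"def a(): pass", b"def b(): pass", b"def c(): pass"]
root_before = merkle_root(files)

files[1] = b"def b(): return 1"  # edit a single block
root_after = merkle_root(files)

print(root_before != root_after)  # True
```

Editing one block changes its leaf hash, then every ancestor up to the root, so comparing two root hashes is enough to tell whether anything changed at all.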
Cursor's Process
- Chunking: Split code into semantic units (functions, classes) via AST
- Merkle tree: Compute hashes for each chunk, sync with server
- Embedding: OpenAI or custom models vectorize chunks
- Vector DB: Stored in Turbopuffer with obfuscated file paths
- Change detection: Every 10 minutes, compare Merkle hashes -- upload only changed files
- RAG: When user queries (@Codebase), retrieve relevant chunks for LLM context
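The chunking and change-detection steps can be sketched roughly as follows. This is an illustration only, not Cursor's implementation: it assumes Python source, uses the standard `ast` module to take top-level functions and classes as chunks, and compares per-chunk SHA-256 hashes. The helpers `chunk_source`, `chunk_hashes`, and `changed_chunks` are hypothetical names.

```python
import ast
import hashlib


def chunk_source(source: str) -> dict[str, str]:
    """Map each top-level function or class name to its source text."""
    tree = ast.parse(source)
    chunks = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks[node.name] = ast.get_source_segment(source, node)
    return chunks


def chunk_hashes(source: str) -> dict[str, str]:
    """Hash every chunk so versions can be compared without the code itself."""
    return {
        name: hashlib.sha256(text.encode()).hexdigest()
        for name, text in chunk_source(source).items()
    }


def changed_chunks(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Names of chunks that were added, removed, or modified."""
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}


v1 = "def add(a, b):\n    return a + b\n\nclass Stack:\n    pass\n"
v2 = "def add(a, b):\n    return b + a\n\nclass Stack:\n    pass\n"

print(changed_chunks(chunk_hashes(v1), chunk_hashes(v2)))  # {'add'}
```

Only the chunks reported as changed would need to be re-embedded and re-uploaded; the notes describe this at file granularity, and the same comparison applies there.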
Security
- Code is not stored in the database -- it is deleted after the request
- File paths obfuscated with secret keys
- Git integration for team collaboration
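One way path obfuscation with a secret key could work is a keyed hash (HMAC) applied to each path segment, so the directory structure is preserved while the names are hidden. The notes do not say which scheme Cursor uses, so treat this as an assumption; `obfuscate_path` and the key value are hypothetical.

```python
import hashlib
import hmac


def obfuscate_path(path: str, key: bytes) -> str:
    """Replace every path segment with a truncated HMAC-SHA256 of that segment."""
    masked = [
        hmac.new(key, part.encode(), hashlib.sha256).hexdigest()[:16]
        for part in path.split("/")
    ]
    return "/".join(masked)


key = b"example-secret-key"  # hypothetical key, for illustration only

path_a = obfuscate_path("src/utils/hash.py", key)
path_b = obfuscate_path("src/utils/tree.py", key)

# Files in the same directory share their obfuscated prefix.
print(path_a.split("/")[:2] == path_b.split("/")[:2])  # True
```

Without the key, the server sees a stable identifier for each file and directory but cannot read the original names.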
Key Insight
"Merkle trees are a fast, secure fingerprint system for detecting changes in data."