DocuClaw
YOUR DOCUMENTS. YOUR RULES.
Open-source, local-first, AI-powered document intelligence. Extract, organize, and archive invoices, receipts, and contracts — 100% on your machine.
⟩ Quick Start
# Clone & install
$ git clone https://github.com/astonysh/DocuClaw.git
$ cd DocuClaw && pip install -e .

# Process a document
$ docuclaw process \
    --entity-id "org_mycompany_01" \
    --country DE \
    --input ./scans/invoice.png
⟩ What It Does
100% Sovereign
All data stays on YOUR machine. Zero cloud dependency. Zero telemetry. Your privacy is non-negotiable.
Multi-Entity
Manage personal docs, company invoices, and team files — all in one install. Separate or combine as you wish.
Plugin Architecture
Country-specific parsers snap in like LEGO bricks. Germany, US, China — extend DocuClaw for any locale.
Markdown-Native
Every document becomes a searchable .md file with structured YAML frontmatter. Human-readable, version-controllable.
AI-Powered Extraction
Multimodal LLM extracts structured data from scans, photos, and emails. Works with Ollama, OpenAI, or any model.
Compliance-Ready
Designed with GoBD (Germany), GDPR, and audit-trail principles baked in. Enterprise-grade from day one.
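As an illustration of the AI-powered extraction step with a local model, here is a minimal Python sketch that sends a scanned page to an Ollama server running on the same machine. This is not DocuClaw's own code: the prompt text, the `llava` model choice, and the function names `build_request` and `extract_fields` are assumptions made for this example. The endpoint and the `images`, `format`, and `stream` fields follow Ollama's `/api/generate` REST interface.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; nothing is sent beyond localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt; the field names mirror the frontmatter schema.
PROMPT = (
    "Extract sender_name, amount_total, currency and date_received from this "
    "document. Answer with a single JSON object."
)


def build_request(image_bytes: bytes, model: str = "llava") -> dict:
    """Build the JSON body for a non-streaming call that asks for JSON output."""
    return {
        "model": model,
        "prompt": PROMPT,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "format": "json",
        "stream": False,
    }


def extract_fields(image_bytes: bytes, model: str = "llava") -> dict:
    """Send the image to the local Ollama server and parse the model's answer."""
    body = json.dumps(build_request(image_bytes, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        answer = json.load(response)
    # The generated text arrives in the "response" field as a JSON string.
    return json.loads(answer["response"])
```

Calling `extract_fields(open("./scans/invoice.png", "rb").read())` needs a running Ollama instance with a multimodal model already pulled. The model's answer should still be validated against the schema before it is stored.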
⟩ GDPR & Compliance
DocuClaw is designed from the ground up with EU GDPR compliance in mind. By keeping all data processing local and giving you full control, DocuClaw eliminates the most common compliance risks associated with cloud-based document management.
Local-First by Design
With a local model such as Ollama, no data leaves your machine: no third-party servers, no cross-border data transfers, no sub-processors. GDPR Articles 44–49 on international data transfers never come into play, because nothing is transferred.
Data Minimization
DocuClaw only extracts and stores the structured fields you define. No hidden telemetry, no usage analytics, no behavioral tracking. Aligned with GDPR Article 5(1)(c) — data minimization principle.
Right to Erasure
Since all data is stored as plain Markdown files on your local filesystem, exercising the right to erasure (GDPR Article 17) is as simple as deleting a file. No vendor lock-in, no deletion request tickets.
Audit Trail & Accountability
Built-in audit logging and hash-chain integrity verification support the accountability requirement of GDPR Article 5(2) and GoBD-compliant archiving (Germany).
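The idea behind hash-chain integrity verification can be shown in a short Python sketch. It is not DocuClaw's actual implementation: the entry layout, the `GENESIS` value, and the function names `append_entry` and `verify` are assumptions made for illustration. Each log entry stores the SHA-256 of the previous entry's hash plus its own payload, so changing an earlier entry breaks every later link.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the very first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a payload to the log, linking it to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list) -> bool:
    """Recompute every link; an edited, reordered, or mid-chain-removed entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, {"action": "created", "doc": "doc_20260215_a1b2c3d4"})
append_entry(log, {"action": "status_changed", "doc": "doc_20260215_a1b2c3d4", "to": "paid"})
print(verify(log))  # True

log[0]["payload"]["action"] = "deleted"  # tamper with history
print(verify(log))  # False
```

A chain like this does not detect entries cut from the end of the log. Catching that requires recording the latest hash somewhere outside the log itself.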
⟩ Architecture
┌─────────────────────────────────────────────┐
│                  CLI / API                  │
├─────────────────────────────────────────────┤
│                 Core Engine                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  │
│  │  Schema  │  │ Storage  │  │ Registry  │  │
│  │(Pydantic)│  │  Layer   │  │ (Plugin)  │  │
│  └──────────┘  └──────────┘  └───────────┘  │
├─────────────────────────────────────────────┤
│               Parser Plugins                │
│  ┌────────┐  ┌────────┐  ┌──────────────┐   │
│  │ DE 🇩🇪  │  │ US 🇺🇸  │  │ Custom ...   │   │
│  │Invoice │  │Invoice │  │ Your Parser  │   │
│  └────────┘  └────────┘  └──────────────┘   │
├─────────────────────────────────────────────┤
│           Input Adapters (Future)           │
│ 📷 Scanner │ 📧 Email │ 🔗 Webhook │ 🔌 API │
└─────────────────────────────────────────────┘
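One common way to wire country-specific parser plugins into a registry is a decorator that files each parser under its country code. The Python sketch below shows that pattern only. The names `register`, `get_parser`, and the parser signature are hypothetical and are not taken from the DocuClaw codebase.

```python
from typing import Callable, Dict

# Hypothetical parser signature: raw extracted text in, structured fields out.
Parser = Callable[[str], dict]

_REGISTRY: Dict[str, Parser] = {}


def register(country: str) -> Callable[[Parser], Parser]:
    """Decorator that files a parser under a country code such as "DE"."""
    def decorator(func: Parser) -> Parser:
        _REGISTRY[country.upper()] = func
        return func
    return decorator


def get_parser(country: str) -> Parser:
    """Look up the parser for a country, failing loudly if none is installed."""
    try:
        return _REGISTRY[country.upper()]
    except KeyError:
        raise LookupError(f"no parser registered for country {country!r}") from None


@register("DE")
def parse_de_invoice(text: str) -> dict:
    # Placeholder logic; a real parser would extract VAT IDs, totals, and so on.
    return {"country": "DE", "document_type": "b2b_invoice", "raw": text}


@register("US")
def parse_us_invoice(text: str) -> dict:
    return {"country": "US", "document_type": "b2b_invoice", "raw": text}


result = get_parser("de")("Rechnung Nr. 4521")
print(result["country"])  # DE
```

With this shape, adding a locale means writing one decorated function. The core engine only ever calls `get_parser` and does not need to know which plugins exist.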
⟩ The Data Contract
Every document, whether a €10K enterprise invoice or a personal electricity bill, is normalized into a universal Markdown schema with structured YAML frontmatter.
---
id: doc_20260215_a1b2c3d4
entity_id: "org_acme_01"
entity_type: "company"
source_type: physical_mail
country: DE
document_type: b2b_invoice
date_received: "2026-02-15"
sender_name: "AWS EMEA SARL"
amount_total: 125.50
currency: EUR
status: pending
tags: [IT_Infrastructure, Q1_Expense]
---
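To make the contract concrete, the following Python sketch reads the frontmatter above into a typed, validated record. The architecture diagram names Pydantic for the schema layer; this sketch uses only standard-library dataclasses to stay dependency-free. The class `DocumentRecord`, the helpers `parse_frontmatter` and `to_record`, and the validation rules are assumptions for illustration, not DocuClaw's real schema. The tiny parser handles flat `key: value` lines only, where a real implementation would use a YAML library.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

SAMPLE = """---
id: doc_20260215_a1b2c3d4
entity_id: "org_acme_01"
entity_type: "company"
source_type: physical_mail
country: DE
document_type: b2b_invoice
date_received: "2026-02-15"
sender_name: "AWS EMEA SARL"
amount_total: 125.50
currency: EUR
status: pending
tags: [IT_Infrastructure, Q1_Expense]
---
Body text of the archived document.
"""


@dataclass(frozen=True)
class DocumentRecord:
    """Hypothetical mirror of the frontmatter fields shown above."""
    id: str
    entity_id: str
    entity_type: str
    source_type: str
    country: str
    document_type: str
    date_received: date
    sender_name: str
    amount_total: Decimal
    currency: str
    status: str
    tags: tuple = ()

    def __post_init__(self) -> None:
        # Assumed rules: two-letter country, three-letter currency, no negative totals.
        if len(self.country) != 2 or not self.country.isupper():
            raise ValueError(f"country must be a two-letter code, got {self.country!r}")
        if len(self.currency) != 3 or not self.currency.isupper():
            raise ValueError(f"currency must be a three-letter code, got {self.currency!r}")
        if self.amount_total < 0:
            raise ValueError("amount_total must not be negative")


def parse_frontmatter(markdown: str) -> dict:
    """Read flat `key: value` pairs between the first two `---` lines."""
    lines = markdown.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with frontmatter")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    raise ValueError("frontmatter is not closed")


def to_record(fields: dict) -> DocumentRecord:
    """Convert string fields to typed values; unknown keys raise TypeError."""
    values = dict(fields)
    values["date_received"] = date.fromisoformat(values["date_received"])
    values["amount_total"] = Decimal(values["amount_total"])
    raw_tags = values.pop("tags", "[]").strip("[]")
    values["tags"] = tuple(t.strip() for t in raw_tags.split(",") if t.strip())
    return DocumentRecord(**values)


record = to_record(parse_frontmatter(SAMPLE))
print(record.sender_name, record.amount_total, record.currency)  # AWS EMEA SARL 125.50 EUR
```

Parsing `amount_total` as `Decimal` instead of `float` keeps monetary values exact, which matters once totals are summed for reports.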
⟩ How It Works
⟩ AI-Powered Output
DocuClaw doesn't just archive your documents — it makes them actionable. Through AI agent integration, your structured data becomes a living knowledge base that can answer questions, automate workflows, and feed directly into the tools you already use.
Ask Your Documents
Chat with your document archive through an AI agent. "How much did I spend on AWS last quarter?" "When does my lease expire?" — Get instant answers from your own data.
Calendar & Reminders
Auto-extract payment due dates, contract renewals, and deadlines from your documents and sync them to your calendar. Never miss a deadline again.
Tax Filing & Reports
Generate tax-ready summaries, expense reports, and financial overviews directly from your archived invoices and receipts. Export in formats your accountant or tax software expects.
To-Do & Task Lists
Automatically create action items from documents — "Pay invoice #4521 by March 15", "Renew insurance policy", "Submit quarterly VAT return" — and push them to your task manager.
Third-Party Systems
Generate and submit data in the exact format required by accounting software (DATEV, Xero, QuickBooks), ERP systems, government portals, and banking platforms — all from your local archive.
Custom Analytics
Build custom dashboards and reports from your document data. Track spending trends, vendor relationships, contract status, and compliance metrics — all processed locally.
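Because every document carries the same frontmatter fields, questions like "How much did I spend on AWS last quarter?" reduce to a filter and a sum over local records. The Python sketch below assumes the frontmatter has already been loaded into plain dictionaries. The sample records (apart from the AWS sender name in the data contract) and the function `spend_by_sender` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical records shaped like the frontmatter in the data contract.
RECORDS = [
    {"sender_name": "AWS EMEA SARL", "amount_total": "125.50",
     "currency": "EUR", "date_received": "2026-02-15"},
    {"sender_name": "AWS EMEA SARL", "amount_total": "98.20",
     "currency": "EUR", "date_received": "2026-03-15"},
    {"sender_name": "Example Utility GmbH", "amount_total": "64.00",
     "currency": "EUR", "date_received": "2026-03-02"},
    {"sender_name": "AWS EMEA SARL", "amount_total": "110.00",
     "currency": "EUR", "date_received": "2026-04-15"},
]


def spend_by_sender(records, start: date, end: date) -> dict:
    """Sum amount_total per (sender, currency) for dates in [start, end]."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for rec in records:
        received = date.fromisoformat(rec["date_received"])
        if start <= received <= end:
            key = (rec["sender_name"], rec["currency"])
            totals[key] += Decimal(rec["amount_total"])
    return dict(totals)


q1 = spend_by_sender(RECORDS, date(2026, 1, 1), date(2026, 3, 31))
for (sender, currency), total in sorted(q1.items()):
    print(f"{sender}: {total} {currency}")
# AWS EMEA SARL: 223.70 EUR
# Example Utility GmbH: 64.00 EUR
```

Totals are keyed by sender and currency together, so amounts in different currencies are never added into one figure.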