DocuMind

Problem

Research teams and legal professionals routinely work across dozens of PDFs simultaneously — case files, academic papers, regulatory documents — and the bottleneck is not reading speed, it is locating the exact passage that supports a claim. Manual search across large document sets takes hours and introduces the risk of missing critical information buried in a file that was only partially reviewed.

Existing AI-powered document tools compound the problem rather than solve it. Retrieval-augmented generation systems that skip citation enforcement will confidently state a fact while pointing to the wrong document or no document at all. For legal work in particular, an answer without a traceable source is worthless — and an answer with a fabricated source is actively harmful.

The hard requirement driving every architectural decision: every single answer must cite the exact file name and page number of its source. No exceptions, no graceful degradation. If the system cannot produce a verifiable citation, it must refuse to answer rather than speculate.

Approach

A stateless multi-document RAG pipeline was the foundation. It is stateless because session affinity would become a scaling constraint the moment traffic spiked: each request carries everything the backend needs, and any instance can serve it. LangGraph drives the agent layer, routing between a retrieval node and a citation verification node before any response reaches the client. The verification node is not optional middleware; it is a hard gate that blocks any answer lacking a file name and page number.
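The retrieve-then-verify flow can be sketched in plain Python. This is a minimal illustration and not the LangGraph graph itself: the node functions, the `generate` callable, the state keys, and the refusal text are all hypothetical names, and a keyword match stands in for vector search. It also assumes that "verifiable" means the cited file and page were among the retrieved chunks.

```python
import re
from typing import Callable, TypedDict

# Hypothetical refusal text; the real wording is not given in this write-up.
REFUSAL = "No verifiable source found in the uploaded documents."


class Chunk(TypedDict):
    file_name: str
    page: int
    text: str


def _words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def retrieval_node(state: dict, corpus: list) -> dict:
    """Stand-in for vector search: keep chunks sharing a word with the question."""
    terms = _words(state["question"])
    hits = [c for c in corpus if terms & _words(c["text"])]
    return {**state, "chunks": hits}


def verification_node(state: dict) -> dict:
    """Hard gate: every citation must name a file and page that were retrieved."""
    retrieved = {(c["file_name"], c["page"]) for c in state["chunks"]}
    citations = state.get("citations", [])
    verified = bool(citations) and all(
        (c.get("file_name"), c.get("page")) in retrieved for c in citations
    )
    if not verified:
        # Refuse rather than pass along an unsupported answer.
        return {**state, "answer": REFUSAL, "citations": [], "verified": False}
    return {**state, "verified": True}


def answer_question(question: str, corpus: list, generate: Callable) -> dict:
    """All inputs arrive as arguments, so any instance could serve the call."""
    state = retrieval_node({"question": question}, corpus)
    answer, citations = generate(question, state["chunks"])  # LLM stand-in
    return verification_node({**state, "answer": answer, "citations": citations})
```

Because the gate runs after generation and before anything is returned, a draft that cites nothing, or cites a page that was never retrieved, comes back as a refusal.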

Each uploaded PDF is processed at upload time: extracted, split into overlapping chunks, embedded, and written to both a local FAISS index for in-session speed and Pinecone Serverless for persistence across sessions. At query time, retrieval runs across all indexed documents simultaneously rather than sequentially, keeping latency flat regardless of how many files are in the session.
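As a rough illustration of the splitting step, the sketch below cuts each page into overlapping character windows and tags every chunk with its file name and page number, which is what later makes a page-level citation possible. The function name, the character-based sizes, and the choice to never let a chunk span two pages are assumptions made for the example; the actual splitter settings are not stated here.

```python
def chunk_pages(file_name: str, pages: list, chunk_size: int = 800, overlap: int = 200) -> list:
    """Split per-page text into overlapping chunks that keep their source location.

    `pages` is a list of page texts in order; page numbers start at 1.
    Sizes are in characters and are illustrative defaults only.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, max(len(text), 1), step):
            piece = text[start:start + chunk_size]
            if piece.strip():
                chunks.append({"file_name": file_name, "page": page_number, "text": piece})
            if start + chunk_size >= len(text):
                break  # this window already reached the end of the page
    return chunks


# Example: two short "pages" with tiny windows to make the overlap visible.
for chunk in chunk_pages("brief.pdf", ["abcdefghij", "klmno"], chunk_size=4, overlap=2):
    print(chunk["page"], chunk["text"])
```

Keeping chunks inside page boundaries trades a little context at page breaks for an unambiguous page number on every chunk.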

Provider reliability was a real constraint. A single LLM provider going down mid-session would mean a broken product with no recourse. A three-provider fallback chain — Gemini Flash as primary, Groq Llama as first fallback, OpenRouter as final fallback — means the system stays live through individual provider outages. The chain is transparent to the user; responses stream continuously via Server-Sent Events regardless of which provider is serving the request.
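A fallback chain of this shape can be sketched as a generator that tries providers in order and formats tokens as Server-Sent Events. The provider callables here are hypothetical stand-ins for the Gemini, Groq, and OpenRouter clients, and the function names are invented for the example. One assumption worth flagging: the sketch only falls back if a provider fails before emitting anything, since switching mid-stream would duplicate output already sent.

```python
from typing import Callable, Iterable, Iterator

# (name, callable) pairs; each callable takes a prompt and yields text tokens.
Provider = tuple[str, Callable[[str], Iterable[str]]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain errors out."""


def stream_with_fallback(prompt: str, providers: list) -> Iterator[str]:
    """Yield tokens from the first provider that responds."""
    errors = []
    for name, call in providers:
        emitted = False
        try:
            for token in call(prompt):
                emitted = True
                yield token
            return  # provider finished cleanly
        except Exception as exc:
            if emitted:
                raise  # partial output already sent; do not restart silently
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def to_sse(token: str) -> str:
    """Frame one token as an SSE message (one `data:` line per text line)."""
    return "".join(f"data: {line}\n" for line in token.split("\n")) + "\n"


# Usage with fake providers: the primary is down, the first fallback answers.
def primary(prompt: str):
    raise ConnectionError("primary unavailable")


def first_fallback(prompt: str):
    yield from ["Hello", " world"]


for event in map(to_sse, stream_with_fallback("hi", [("primary", primary), ("fallback", first_fallback)])):
    print(event, end="")
```

The client only ever sees a stream of `data:` messages, so which provider produced them stays hidden.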

Stack

Each technology was chosen to solve a specific constraint.

  • FastAPI — Async-first Python framework; native SSE support for streaming LLM responses
  • LangChain — Document loading, text splitting, and embedding orchestration
  • LangGraph — Multi-agent graph for routing between retrieval and citation verification nodes
  • FAISS — Local vector index for fast in-session similarity search
  • Pinecone Serverless — Persistent cross-session vector storage with zero infrastructure management
  • React + Tailwind CSS — Rapid UI iteration; Tailwind's utility classes match the component-per-feature architecture
  • Upstash Redis — Serverless key-value store for session state; no cold starts
  • DigitalOcean Spaces — S3-compatible object storage for raw PDF files
  • Docker — Reproducible build environment; single container deploy
  • DigitalOcean — Predictable monthly pricing vs. AWS; adequate for this traffic profile

Results

Measured against an internal 200-question evaluation set built from real research and legal document workflows.

  • 100% Citation Coverage: every answer includes file name and page number
  • 94% Answer Accuracy: measured on internal 200-question eval set
  • 2.1s Avg Response Time: end-to-end including retrieval and generation
  • 99.2% Uptime: 3-provider fallback chain handles provider outages
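For readers who want to reproduce this kind of scoring, the two ratio metrics can be computed from per-question records as sketched below. The record fields (`correct`, `citations`) and the function name are hypothetical, and how an answer is judged correct is not specified here. At 94% on 200 questions, the accuracy figure would correspond to 188 answers judged correct.

```python
def summarize_eval(records: list) -> dict:
    """Compute citation coverage and answer accuracy as fractions of all records.

    Each record is assumed to look like:
        {"correct": bool, "citations": [{"file_name": str, "page": int}, ...]}
    """
    total = len(records)
    if total == 0:
        raise ValueError("no evaluation records")
    # An answer counts as cited only if it has at least one citation and
    # every citation carries both a file name and a page number.
    cited = sum(
        1
        for r in records
        if r["citations"]
        and all(c.get("file_name") and c.get("page") for c in r["citations"])
    )
    correct = sum(1 for r in records if r["correct"])
    return {"citation_coverage": cited / total, "answer_accuracy": correct / total}
```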

The system handles up to 50 MB of PDFs per session across multiple files.
