Production-ready structured data extraction system supporting Vision LLMs and pluggable workflow orchestration for invoices, bank statements, financial tables, and more.
Overview#
Sparrow is a production-ready structured data extraction system developed by Katana ML (lead developer Andrej Baranovskij). It converts unstructured documents such as invoices, receipts, bank statements, and financial tables into structured JSON, with an emphasis on extraction precision and automation.
Architecture#
The project uses a monorepo multi-app architecture with three top-level directories:
- sparrow-data/: Sample test data (bonds_table.png, bank_statement.pdf, invoice.pdf, etc.)
- sparrow-ml/: Core computation engine, containing llm/ (main API server & CLI) and agents/ (workflow orchestration service)
- sparrow-ui/: Gradio-based interactive web interface
Pipeline Engines#
- Sparrow Parse: Vision LLM-based visual structured extraction pipeline, supporting Mistral, Qwen 2.5-VL, DeepSeek OCR, dots.ocr, dots-mocr, and more
- Sparrow Instructor: Text LLM-based instruction processing, validation, and decision pipeline (supporting GPT-OSS, Mistral, Qwen 3.5, etc.)
- Sparrow Agents: Prefect-orchestrated multi-step processing pipeline that decomposes complex scenarios into subtasks like classification → extraction → validation
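The classification → extraction → validation decomposition can be sketched in a few lines. This is a minimal illustration using plain Python functions, not Sparrow's actual code: in Sparrow the steps are orchestrated by Prefect and backed by LLM calls, whereas here `DocumentState`, the keyword-based `classify`, and the line-parsing `extract` are hypothetical stand-ins chosen only to show how state flows from one subtask to the next.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentState:
    """Carries one document through the pipeline (illustrative fields only)."""
    text: str
    doc_type: str = "unknown"
    fields: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


def classify(state: DocumentState) -> DocumentState:
    # Stand-in for an LLM classification call: a simple keyword check.
    state.doc_type = "invoice" if "invoice" in state.text.lower() else "other"
    return state


def extract(state: DocumentState) -> DocumentState:
    # Stand-in for Vision LLM extraction: parse "key: value" lines.
    for line in state.text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            state.fields[key.strip().lower()] = value.strip()
    return state


def validate(state: DocumentState) -> DocumentState:
    # Record every required field that the extraction step did not produce.
    required = ("invoice", "total")
    state.errors = [f"missing field: {name}" for name in required
                    if name not in state.fields]
    return state


def run_pipeline(text: str) -> DocumentState:
    # Each step receives the state returned by the previous one.
    state = DocumentState(text=text)
    for step in (classify, extract, validate):
        state = step(state)
    return state


result = run_pipeline("Invoice: INV-001\nTotal: 42.50")
print(result.doc_type, result.fields, result.errors)
```

Keeping each subtask as a function over a shared state object is what makes it straightforward to wrap the steps in an orchestrator later, since every step has the same input and output type.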
Inference Backends#
- Apple Silicon: Deep MLX framework integration, leveraging unified memory for efficient large-model execution (dependency package sparrow-parse[mlx])
- NVIDIA/AMD GPU: Compatible with vLLM and Ollama backends, requiring CUDA environment
- Cloud & CPU: Supports Hugging Face Cloud GPU backend, or local execution of small models (≤7B parameters)
Document Processing#
- Supports PNG, JPG images and multi-page PDF input
- Natively adapts to invoices, receipts, forms, bank statements, financial tables, and other document types
- --crop-size parameter for document border cropping to improve Vision LLM focus accuracy
- JSON Schema-driven output constraints with automatic validation
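To make the idea of schema-driven output validation concrete, here is a minimal sketch of checking extracted rows against an expected set of typed fields. It does not reproduce Sparrow's validator or the full JSON Schema standard: `BOND_ROW_SCHEMA` and `validate_rows` are hypothetical names, and the schema is a simplified field-to-type mapping built around the `instrument_name` and `valuation` fields mentioned for bond tables.

```python
import json

# Hypothetical, simplified schema: field name -> accepted Python type(s).
BOND_ROW_SCHEMA = {"instrument_name": str, "valuation": (int, float)}


def validate_rows(raw_json: str, schema: dict) -> list[str]:
    """Return a list of problems found; an empty list means the rows are valid."""
    problems = []
    rows = json.loads(raw_json)
    for index, row in enumerate(rows):
        for name, expected in schema.items():
            if name not in row:
                problems.append(f"row {index}: missing field '{name}'")
            # bool is a subclass of int in Python, so reject it explicitly.
            elif isinstance(row[name], bool) or not isinstance(row[name], expected):
                problems.append(f"row {index}: field '{name}' has wrong type")
    return problems


good = '[{"instrument_name": "Bond A", "valuation": 1200.5}]'
bad = '[{"instrument_name": "Bond B", "valuation": "n/a"}, {"valuation": 3}]'
print(validate_rows(good, BOND_ROW_SCHEMA))
print(validate_rows(bad, BOND_ROW_SCHEMA))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field in a single pass.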
Usage#
CLI#
./sparrow.sh "<JSON_SCHEMA>" --pipeline "<PIPELINE>" [OPTIONS] --file-path "<FILE>"
Key parameters: --pipeline (select pipeline), --options (backend & model), --instruction (text instruction), --validation (field validation), --crop-size (crop pixels), --page-type (page classification).
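When the CLI is driven from a script, assembling the arguments as a list avoids shell-quoting problems with the JSON schema string. The sketch below only builds and prints the command; it does not run it. Several details are assumptions rather than documented behavior: the helper name `build_sparrow_command`, the pipeline identifier `sparrow-parse`, the example schema shape, the backend and model values, the file path, and the convention of repeating `--options` once per value.

```python
import json
import shlex


def build_sparrow_command(schema, pipeline, file_path, options=(), crop_size=None):
    """Assemble an argv list for ./sparrow.sh from the flags listed above."""
    argv = ["./sparrow.sh", json.dumps(schema), "--pipeline", pipeline]
    for option in options:
        # Assumption: --options is repeated once per value (backend, then model).
        argv += ["--options", option]
    if crop_size is not None:
        argv += ["--crop-size", str(crop_size)]
    argv += ["--file-path", file_path]
    return argv


argv = build_sparrow_command(
    schema=[{"instrument_name": "str", "valuation": 0}],  # assumed schema shape
    pipeline="sparrow-parse",                             # assumed identifier
    file_path="sparrow-data/bonds_table.png",             # placeholder path
    options=("mlx", "example-model-name"),                # placeholder values
    crop_size=60,
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

The resulting list can be passed directly to `subprocess.run(argv)` without `shell=True`, which keeps the JSON schema intact as a single argument.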
RESTful API#
- Document extraction: POST /api/v1/sparrow-llm/inference (multipart/form-data)
- Text instruction: POST /api/v1/sparrow-llm/instruction-inference (form-urlencoded)
- Access http://localhost:8002/api/v1/sparrow-llm/docs for Swagger interactive docs after startup
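As a rough illustration of calling the text instruction endpoint, the sketch below prepares a form-urlencoded POST with the Python standard library. It builds the request object without sending it. The form field names (`query`, `pipeline`) and their values are hypothetical placeholders, and the base URL assumes the API is served on the same host and port as the Swagger page; the Swagger docs are the place to confirm the real parameter names.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Assumption: the API shares the host and port shown for the Swagger docs.
BASE_URL = "http://localhost:8002/api/v1/sparrow-llm"


def build_instruction_request(fields: dict) -> Request:
    """Prepare a form-urlencoded POST for the instruction-inference endpoint."""
    body = urlencode(fields).encode("ascii")
    return Request(
        f"{BASE_URL}/instruction-inference",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical field names; check the Swagger page for the real ones.
request = build_instruction_request(
    {"query": "Summarize the payment terms", "pipeline": "sparrow-instructor"}
)
print(request.get_method(), request.full_url)
print(parse_qs(request.data.decode("ascii")))
```

Once the field names are confirmed, passing the object to `urllib.request.urlopen(request)` would send it to a running server.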
Agent API#
curl -X POST 'http://localhost:8001/api/v1/sparrow-agents/execute/file' \
-F 'agent_name=medical_prescriptions' \
-F 'extraction_params={"sparrow_key":"123456"}' \
-F 'file=@prescription.pdf'
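The same call can be prepared from Python. The sketch below encodes the three form parts from the curl command (agent_name, extraction_params, file) as multipart/form-data using only the standard library, and stops short of sending the request. The helper name `build_agent_request`, the generic `application/octet-stream` content type, and the placeholder file bytes are assumptions for illustration.

```python
import json
import uuid
from urllib.request import Request

AGENT_URL = "http://localhost:8001/api/v1/sparrow-agents/execute/file"


def build_agent_request(agent_name: str, params: dict,
                        filename: str, content: bytes) -> Request:
    """Encode the curl command's three form parts as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    # Text parts: agent_name and the JSON-encoded extraction_params.
    for name, value in (("agent_name", agent_name),
                        ("extraction_params", json.dumps(params))):
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    # File part: raw bytes with a generic content type (an assumption).
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="file"; '
                   f'filename="{filename}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_header.encode("utf-8") + content + b"\r\n")
    # Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(AGENT_URL, data=b"".join(parts), headers=headers, method="POST")


agent_request = build_agent_request(
    "medical_prescriptions",
    {"sparrow_key": "123456"},
    "prescription.pdf",
    b"%PDF-1.4 placeholder bytes",  # stand-in for the real file contents
)
print(agent_request.get_method(), agent_request.full_url)
```

In practice the file bytes would come from reading prescription.pdf in binary mode, and `urllib.request.urlopen(agent_request)` would submit the call to a running agents service.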
Use Cases#
- Financial document automation: Extract full structured data from invoices and bank statements in PDF/image format
- Financial table extraction: Extract fields like instrument_name and valuation from bond tables
- Medical prescription processing: Multi-step workflow via Agent orchestration (classification → extraction → validation)
- Text instruction processing: Math operations, text analysis, summarization, Q&A
- Function Calling integration: External data access such as stock data queries
Enterprise Features#
- API-First design with complete RESTful API and Swagger documentation
- Prefect-based Dashboard and Agent workflow tracking (Dashboard requires local Oracle Database 23ai Free)
- Built-in rate limiting and usage analytics
- Open-sourced under GPL-3.0 with commercial dual-licensing available
Installation Requirements#
Python 3.12.10+, macOS/Linux/Windows, GPU (matching model VRAM requirements). PDF processing requires poppler.