Sparrow

Production-ready structured data extraction system supporting Vision LLMs and pluggable workflow orchestration for invoices, bank statements, financial tables, and more.

Overview

Sparrow is a production-ready structured data extraction system developed by Katana ML (lead developer: Andrej Baranovskij). It converts unstructured documents such as invoices, receipts, bank statements, and financial tables into structured JSON, with an emphasis on precision and automation.

Architecture

The project uses a monorepo multi-app architecture with three top-level directories:

sparrow-data/: Sample test data (bonds_table.png, bank_statement.pdf, invoice.pdf, etc.)
sparrow-ml/: Core computation engine, containing llm/ (main API server & CLI) and agents/ (workflow orchestration service)
sparrow-ui/: Gradio-based interactive web interface

Pipeline Engines

Sparrow Parse: Vision LLM-based visual structured extraction pipeline, supporting Mistral, Qwen 2.5-VL, DeepSeek OCR, dots.ocr, dots-mocr, and more
Sparrow Instructor: Text LLM-based instruction processing, validation, and decision pipeline (supporting GPT-OSS, Mistral, Qwen 3.5, etc.)
Sparrow Agents: Prefect-orchestrated multi-step processing pipeline that decomposes complex scenarios into subtasks like classification → extraction → validation
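The classification → extraction → validation decomposition can be illustrated with a minimal sketch in plain Python. This is not Sparrow's agent code and does not use Prefect; the `Document` class, the three step functions, and `run_pipeline` are hypothetical names chosen only to show how a document passes through ordered subtasks.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical container passed from step to step."""
    text: str
    doc_type: str = "unknown"
    fields: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


def classify(doc):
    # Toy classifier: a real agent would call a model here.
    doc.doc_type = "invoice" if "invoice" in doc.text.lower() else "other"
    return doc


def extract(doc):
    # Toy extraction: collect "key: value" lines into a dict.
    for line in doc.text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            doc.fields[key.strip().lower()] = value.strip()
    return doc


def validate(doc):
    # Record every required field that extraction did not produce.
    required = {"invoice": ["total"]}.get(doc.doc_type, [])
    doc.errors = [name for name in required if name not in doc.fields]
    return doc


def run_pipeline(doc, steps):
    # Run the steps in order, feeding each step's result to the next.
    for step in steps:
        doc = step(doc)
    return doc


result = run_pipeline(Document("Invoice\nTotal: 42.50"), [classify, extract, validate])
print(result.doc_type, result.fields, result.errors)
```

In an orchestrated setup each function would typically become a tracked task, so a failed validation step can be inspected or retried without rerunning classification.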

Inference Backends

Apple Silicon: Deep MLX framework integration that uses unified memory to run large models efficiently (installed via the sparrow-parse[mlx] package)
NVIDIA/AMD GPU: Compatible with the vLLM and Ollama backends; requires a CUDA environment
Cloud & CPU: Supports a Hugging Face Cloud GPU backend, or local execution of small models (7B parameters or fewer)

Document Processing

Accepts PNG and JPG images and multi-page PDFs as input
Natively handles invoices, receipts, forms, bank statements, financial tables, and other document types
--crop-size parameter crops document borders so the Vision LLM focuses on the content
JSON Schema-driven output constraints with automatic validation
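To make schema-driven validation concrete, here is a minimal sketch that checks an extracted record against a field template. It is not Sparrow's validator: the template format (field names mapped to example values whose types are enforced) and the function name `validate_against_template` are assumptions for illustration. The field names `instrument_name` and `valuation` are taken from the bond table use case.

```python
import json


def validate_against_template(template, data):
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for key, example in template.items():
        if key not in data:
            problems.append(f"missing field: {key}")
            continue
        value = data[key]
        if isinstance(example, (int, float)) and not isinstance(example, bool):
            # A numeric example accepts any int or float, but not bool.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
            expected = "number"
        else:
            ok = isinstance(value, type(example))
            expected = type(example).__name__
        if not ok:
            problems.append(f"{key}: expected {expected}, got {type(value).__name__}")
    return problems


# Assumed template style: values are examples that only convey the type.
template = {"instrument_name": "str", "valuation": 0}

good = json.loads('{"instrument_name": "Bond A", "valuation": 1200}')
bad = json.loads('{"instrument_name": "Bond A", "valuation": "n/a"}')

print(validate_against_template(template, good))
print(validate_against_template(template, bad))
```

Returning a list of problems rather than raising on the first one lets a caller report every failing field in a single pass.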

Usage

CLI

./sparrow.sh "<JSON_SCHEMA>" --pipeline "<PIPELINE>" [OPTIONS] --file-path "<FILE>"

Key parameters: --pipeline (select pipeline), --options (backend & model), --instruction (text instruction), --validation (field validation), --crop-size (crop pixels), --page-type (page classification).
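When the CLI is driven from a script, assembling the arguments as a list avoids shell-quoting mistakes around the JSON schema. The sketch below only builds and prints the command; `build_sparrow_command` is a hypothetical helper, the `<PIPELINE>`, `<BACKEND>`, and `<MODEL>` values are placeholders, and passing `--options` once per value is an assumption, not confirmed CLI behavior.

```python
import json
import shlex


def build_sparrow_command(schema, pipeline, file_path, options=(), crop_size=None):
    """Assemble an argv list following the documented usage line."""
    cmd = ["./sparrow.sh", json.dumps(schema), "--pipeline", pipeline]
    for opt in options:
        # Assumption: each option value gets its own --options flag.
        cmd += ["--options", opt]
    if crop_size is not None:
        cmd += ["--crop-size", str(crop_size)]
    cmd += ["--file-path", file_path]
    return cmd


schema = [{"instrument_name": "str", "valuation": 0}]
cmd = build_sparrow_command(
    schema,
    pipeline="<PIPELINE>",            # placeholder, as in the usage line
    file_path="bonds_table.png",
    options=["<BACKEND>", "<MODEL>"],  # placeholders
    crop_size=60,
)

# shlex.join produces a copy-pasteable, correctly quoted shell line.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```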

RESTful API

Document extraction: POST /api/v1/sparrow-llm/inference (multipart/form-data)
Text instruction: POST /api/v1/sparrow-llm/instruction-inference (form-urlencoded)
After startup, interactive Swagger docs are available at http://localhost:8002/api/v1/sparrow-llm/docs
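As a rough illustration of the form-urlencoded endpoint, the sketch below builds (but does not send) a POST request using only the Python standard library. The form field names `instruction` and `pipeline` are hypothetical; the actual field names should be taken from the Swagger docs. `build_instruction_request` is likewise an illustrative helper, not part of Sparrow.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8002/api/v1/sparrow-llm"


def build_instruction_request(fields):
    """Build a form-urlencoded POST for the instruction endpoint."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        f"{BASE_URL}/instruction-inference",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical field names; check the Swagger docs for the real ones.
request = build_instruction_request(
    {"instruction": "sum 2 and 3", "pipeline": "<PIPELINE>"}
)

print(request.get_method(), request.full_url)
print(request.data)
# To send it against a running server:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```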

Agent API

curl -X POST 'http://localhost:8001/api/v1/sparrow-agents/execute/file' \
  -F 'agent_name=medical_prescriptions' \
  -F 'extraction_params={"sparrow_key":"123456"}' \
  -F 'file=@prescription.pdf'
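The same call can be sketched in Python with only the standard library. The form field names (`agent_name`, `extraction_params`, `file`) mirror the curl command above; the `build_multipart` helper, the fixed boundary, and the placeholder file bytes are illustrative assumptions. The request is built but not sent, so the snippet runs without a server.

```python
import json
import uuid
from urllib.request import Request

AGENT_URL = "http://localhost:8001/api/v1/sparrow-agents/execute/file"


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(head.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_head.encode("utf-8") + file_bytes + b"\r\n")
    # The closing delimiter carries two extra trailing dashes.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


fields = {
    "agent_name": "medical_prescriptions",
    "extraction_params": json.dumps({"sparrow_key": "123456"}),
}
# Placeholder bytes; in practice read them from prescription.pdf.
body, content_type = build_multipart(
    fields, "prescription.pdf", b"%PDF-1.4 placeholder", boundary="sketchboundary"
)
request = Request(AGENT_URL, data=body, headers={"Content-Type": content_type}, method="POST")

print(request.get_method(), request.full_url, len(body))
# To send: urllib.request.urlopen(request) against a running agent service.
```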

Use Cases

Financial document automation: Extract full structured data from invoices and bank statements in PDF/image format
Financial table extraction: Extract fields like instrument_name and valuation from bond tables
Medical prescription processing: Multi-step workflow via Agent orchestration (classification → extraction → validation)
Text instruction processing: Math operations, text analysis, summarization, Q&A
Function Calling integration: External data access such as stock data queries

Enterprise Features

API-First design with complete RESTful API and Swagger documentation
Prefect-based Dashboard and Agent workflow tracking (Dashboard requires local Oracle Database 23ai Free)
Built-in rate limiting and usage analytics
Open-sourced under GPL-3.0 with commercial dual-licensing available

Installation Requirements

Python 3.12.10 or later on macOS, Linux, or Windows, plus a GPU with enough VRAM for the chosen model. PDF processing requires poppler.