Open-source LLMOps platform with a high-performance Rust gateway unifying LLM access, observability, evaluation, optimization, and experimentation
TensorZero is an open-source LLMOps platform for production environments, developed by TensorZero Inc. (NYC; $7.3M seed funding) and released under the Apache-2.0 license. At its core is a high-performance LLM gateway written in Rust (<1ms p99 latency overhead at 10k+ QPS) that provides unified API access to 18+ providers, including Anthropic, OpenAI, Azure OpenAI, AWS Bedrock, GCP Vertex AI, Mistral, and DeepSeek, with built-in routing, retries, fallbacks, load balancing, rate limiting, and authentication.
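As a minimal sketch of what this unified access looks like from the official Python client (assumptions: a gateway running locally on the default port 3000, and illustrative provider-prefixed model names; check the client version for the exact constructor):

```python
from tensorzero import TensorZeroGateway  # pip install tensorzero

# Connect to a locally running gateway (assumed at the default port 3000).
with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    # One call shape for every provider: the gateway translates to each
    # provider's wire format and applies retries/fallbacks as configured.
    for model in ("openai::gpt-4o-mini", "anthropic::claude-3-5-haiku-20241022"):
        response = client.inference(
            model_name=model,  # provider-prefixed shorthand (illustrative)
            input={"messages": [{"role": "user", "content": "Say hello in five words."}]},
        )
        print(model, "->", response.content)
```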
The platform is organized around a Function → Variant data model: a Function defines task intent, and its Variants are concrete implementations (prompt + model combinations). Everything is driven by a declarative tensorzero.toml configuration, which fits naturally into GitOps workflows. The gateway also exposes an OpenAI-compatible API, so existing applications built on the OpenAI SDK can migrate with minimal friction.
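A minimal sketch of that migration path, assuming a gateway at localhost:3000 and a hypothetical `extract_data` function; the tensorzero.toml excerpt in the comments is illustrative, not taken from the source:

```python
from openai import OpenAI  # pip install openai

# Hypothetical tensorzero.toml defining one Function with one Variant:
#
#   [functions.extract_data]
#   type = "chat"
#
#   [functions.extract_data.variants.baseline]
#   type = "chat_completion"
#   model = "openai::gpt-4o-mini"
#
# Existing OpenAI-SDK code only needs a new base_url; requests then address
# the Function rather than a hardcoded provider model.
client = OpenAI(
    base_url="http://localhost:3000/openai/v1",  # gateway's OpenAI-compatible endpoint
    api_key="not-used",  # ignored unless gateway authentication is enabled
)

response = client.chat.completions.create(
    model="tensorzero::function_name::extract_data",
    messages=[{"role": "user", "content": "Extract the date: 'Meet on 2025-03-01.'"}],
)
print(response.choices[0].message.content)
```

Because the model string addresses a Function rather than a provider model, swapping or A/B-testing Variants becomes a configuration change rather than a code change.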
For observability, TensorZero stores all data in self-hosted infrastructure (ClickHouse for high-throughput analytics, plus Postgres for transactional gateway state) and provides a Web UI for inspecting individual inferences and aggregate metrics, building datasets from historical inferences, and replaying inferences; traces and metrics can also be exported via the OpenTelemetry (OTLP) and Prometheus standards to integrate with existing observability toolchains.
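Because storage is self-hosted, observability data can also be queried directly with SQL. A sketch under stated assumptions (the host, database, and the `ChatInference` table and column names are guesses at the deployed schema, not confirmed by the source):

```python
import clickhouse_connect  # pip install clickhouse-connect

# Connect to the ClickHouse instance backing the gateway
# (host, port, and database name are deployment-specific assumptions).
ch = clickhouse_connect.get_client(host="localhost", port=8123, database="tensorzero")

# Illustrative query: inference volume per Variant over the last day.
# Table/column names are assumptions; consult the deployed schema.
result = ch.query(
    """
    SELECT variant_name, count() AS n
    FROM ChatInference
    WHERE timestamp > now() - INTERVAL 1 DAY
    GROUP BY variant_name
    ORDER BY n DESC
    """
)
for variant_name, n in result.result_rows:
    print(variant_name, n)
```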
The evaluation system operates at two levels: inference-level evaluations (heuristics plus LLM judges, analogous to unit tests) and workflow-level evaluations (analogous to integration tests), where the LLM judges are themselves optimizable to align with human preferences. Optimization capabilities span supervised fine-tuning (SFT), RLHF, automatic prompt engineering (GEPA), dynamic in-context learning (DICL), and best-of-N / mixture-of-N sampling, forming a feedback flywheel that turns production data into better models and prompts. Experiment management supports adaptive and static A/B testing with namespace isolation, and the Episode concept groups related inferences across multi-step workflows such as multi-turn conversations.
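A sketch of one turn of that flywheel with the Python client, assuming a hypothetical boolean metric `task_success` declared in tensorzero.toml (excerpt in the comments) and the hypothetical `extract_data` function from above:

```python
from tensorzero import TensorZeroGateway

# Hypothetical metric declaration in tensorzero.toml:
#
#   [metrics.task_success]
#   type = "boolean"
#   optimize = "max"
#   level = "inference"
#
with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        function_name="extract_data",
        input={"messages": [{"role": "user", "content": "Extract the date: 'Meet on 2025-03-01.'"}]},
    )

    # Downstream logic decides whether the inference succeeded, then attaches
    # feedback to it; the accumulated feedback becomes the training and
    # evaluation signal for SFT, DICL, prompt optimization, and judges.
    client.feedback(
        metric_name="task_success",
        inference_id=response.inference_id,
        value=True,
    )
```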
Typical applications include: an enterprise-wide unified LLM gateway that reduces integration complexity; fine-tuning plus DICL to make smaller models outperform larger ones on specific tasks (with significant cost and latency reductions); agentic RAG for multi-hop question answering; multimodal fine-tuning (e.g., document image classification); and production workloads such as automating code changelogs at a bank. Deployment options range from a single Docker container to Docker Compose (up and running in about five minutes) to Kubernetes with Helm.