Mooncake
✨ A KVCache-centric disaggregated architecture platform for LLM serving. It provides distributed KVCache pooling, a topology-aware high-speed transfer engine, and a centralized scheduler, and it supports Prefill-Decode separation and elastic MoE inference.
Python, Rust, PyTorch