Orchard
Orchard is the Local Intelligence compute platform for Apple Silicon. It's the engine that powers Proxy's local inference — running open models natively on your Mac with no cloud dependency.
What Orchard Is
Orchard is a vertically integrated inference stack:
orchard-py (Python SDK) ──┐
├──→ PIE (C++ inference engine)
orchard-rs (Rust SDK) ────┘ ├── PAL (Metal GPU kernels)
├── PSE (structured generation)
└── Carbon (private MLX fork)
- PIE (Proxy Inference Engine) — C++23 inference server optimized for Apple Silicon
- PAL (Proxy Attention Lab) — Custom Metal GPU kernels for paged attention
- PSE (Proxy State Engine) — Grammar and structured generation engine
- Carbon — Private fork of Apple's MLX framework with multi-stream concurrency and epoch-based buffer safety
Design Philosophy
Orchard exists because local inference on consumer hardware is a fundamentally different problem than cloud inference on GPU clusters.
- Single device — no distributed coordination, no fleet management, no network hops
- Apple Silicon — unified memory, Metal compute, exceptional performance-per-watt
- Continuous batching — multiple agents can share the same model simultaneously
- Structured generation — PSE guarantees models produce valid output (JSON, function calls, schema-constrained responses) without sacrificing creative capability
The bet: local gives velocity to outrun cloud. No server farm reconfiguration. No distributed KV cache coordination. Everything on one device means faster iteration.
How Proxy Uses Orchard
Proxy connects to Orchard through orchard-rs, the Rust client library. The connection flows:
Proxy (SwiftUI)
└── Glue (Rust FFI)
└── Grand Central
└── orchard-rs ──→ PIE (IPC) ──→ Model inference
PIE runs as a separate process. orchard-rs communicates with it over IPC (nanomsg). Grand Central manages the lifecycle — starting PIE when needed, routing inference requests, handling streaming responses.
Client Libraries
| Library | Language | Distribution | Purpose |
|---|---|---|---|
| orchard-py | Python | PyPI | Python SDK, FastAPI server, OpenAI-compatible API |
| orchard-rs | Rust | crates.io | Rust SDK, IPC client, used by Grand Central |
| orchard-swift | Swift | SPM | Telemetry only (not inference) |
Both orchard-py and orchard-rs expose an OpenAI-compatible API surface — chat completions, streaming, tool calling, structured output.