Orchard

Orchard is the Local Intelligence compute platform for Apple Silicon. It's the engine that powers Proxy's local inference — running open models natively on your Mac with no cloud dependency.

What Orchard Is

Orchard is a vertically integrated inference stack:

orchard-py (Python SDK) ──┐
                          ├──→ PIE (C++ inference engine)
orchard-rs (Rust SDK) ────┘       ├── PAL (Metal GPU kernels)
                                  ├── PSE (structured generation)
                                  └── Carbon (private MLX fork)
  • PIE (Proxy Inference Engine) — C++23 inference server optimized for Apple Silicon
  • PAL (Proxy Attention Lab) — Custom Metal GPU kernels for paged attention
  • PSE (Proxy State Engine) — Grammar and structured generation engine
  • Carbon — Private fork of Apple's MLX framework with multi-stream concurrency and epoch-based buffer safety

Design Philosophy

Orchard exists because local inference on consumer hardware is a fundamentally different problem than cloud inference on GPU clusters.

  • Single device — no distributed coordination, no fleet management, no network hops
  • Apple Silicon — unified memory, Metal compute, exceptional performance-per-watt
  • Continuous batching — multiple agents can share the same model simultaneously
  • Structured generation — PSE guarantees models produce valid output (JSON, function calls, schema-constrained responses) without sacrificing creative capability
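
A minimal sketch of what schema-constrained output looks like from the client side, assuming the OpenAI-compatible server exposed by orchard-py honors the Chat Completions response_format parameter; the base URL, port, and model name are placeholders, not documented Orchard defaults:

  from openai import OpenAI

  # Point the standard OpenAI client at a locally running orchard-py server.
  # The URL and model name below are hypothetical.
  client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

  weather_schema = {
      "name": "weather_report",
      "strict": True,
      "schema": {
          "type": "object",
          "properties": {
              "city": {"type": "string"},
              "temperature_c": {"type": "number"},
          },
          "required": ["city", "temperature_c"],
          "additionalProperties": False,
      },
  }

  response = client.chat.completions.create(
      model="local-model",
      messages=[{"role": "user", "content": "Weather report for Cupertino, please."}],
      response_format={"type": "json_schema", "json_schema": weather_schema},
  )

  # With schema-constrained generation, the content is valid JSON matching the schema.
  print(response.choices[0].message.content)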

The bet: staying local gives Orchard the velocity to outrun cloud stacks. No server-farm reconfiguration. No distributed KV-cache coordination. Everything on one device means faster iteration.

How Proxy Uses Orchard

Proxy connects to Orchard through orchard-rs, the Rust client library. The connection flows through these layers:

Proxy (SwiftUI)
  └── Glue (Rust FFI)
      └── Grand Central
          └── orchard-rs ──→ PIE (IPC) ──→ Model inference

PIE runs as a separate process. orchard-rs communicates with it over IPC (nanomsg). Grand Central manages the lifecycle — starting PIE when needed, routing inference requests, handling streaming responses.
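
The wire protocol between orchard-rs and PIE is internal; as a rough illustration of the request/reply IPC pattern involved, here is a sketch in Python using pynng (bindings for nng, the successor to nanomsg). The socket path and the JSON envelope are hypothetical, not PIE's actual protocol:

  import json
  import pynng

  PIE_ADDRESS = "ipc:///tmp/pie.sock"  # hypothetical socket path

  # A made-up request envelope; PIE's real message format is not documented here.
  request = {"op": "generate", "prompt": "Hello, Orchard", "max_tokens": 64}

  with pynng.Req0() as sock:
      sock.dial(PIE_ADDRESS, block=True)   # connect to the running PIE process
      sock.send(json.dumps(request).encode())
      reply = sock.recv()                  # blocks until PIE replies
      print(json.loads(reply))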

Client Libraries

Library          Language   Distribution   Purpose
orchard-py       Python     PyPI           Python SDK, FastAPI server, OpenAI-compatible API
orchard-rs       Rust       crates.io      Rust SDK, IPC client, used by Grand Central
orchard-swift    Swift      SPM            Telemetry only (not inference)

Both orchard-py and orchard-rs expose an OpenAI-compatible API surface — chat completions, streaming, tool calling, structured output.
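
As a quick usage sketch against that API surface, here is a streamed chat completion through the standard OpenAI Python client pointed at a local orchard-py server; as above, the base URL and model name are placeholders:

  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

  stream = client.chat.completions.create(
      model="local-model",
      messages=[{"role": "user", "content": "Summarize what Orchard is in one sentence."}],
      stream=True,
  )

  # Print tokens as they arrive from the local model.
  for chunk in stream:
      if chunk.choices and chunk.choices[0].delta.content:
          print(chunk.choices[0].delta.content, end="", flush=True)
  print()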