Building AI Infrastructure
What started as experiments in multi-agent orchestration has grown into a connected ecosystem of tools that I use daily to run my businesses and manage my life.
I want to share what I've been building, how I work, and where this is heading.
The Partnership That Makes This Possible
Before diving into the projects, I need to acknowledge the collaboration that accelerates everything: Claude (claude.ai and Claude Code).
I work with Claude daily—not as a tool I query occasionally, but as a genuine development partner. We architect systems together, debug issues in real-time, and iterate on ideas across sessions. Claude remembers context about my projects, understands my preferences, and pushes back when I'm overcomplicating things.
This isn't hyperbole. The projects below were built in a fraction of the time they would have taken solo, with higher quality than I could achieve alone. The voice pipeline we built yesterday? From "can this model run on my hardware?" to "766ms latency streaming system" in a single session.
If you're building AI systems and not leveraging AI as a collaborator, you're leaving 10x on the table.
The Projects
Distributed Electrons (cloudflare-multiagent)
github.com/Logos-Flux/cloudflare-multiagent
This is where it started. Distributed Electrons is a universal LLM router and provider abstraction layer built on Cloudflare Workers.
The core insight: AI applications shouldn't be tightly coupled to a single provider. Models change, pricing shifts, rate limits hit. DE provides a unified interface that lets you swap providers without touching application code.
Since publishing this, Cloudflare has released AI Gateway, which I highly recommend and have integrated into my latest development version. AI Gateway does a great job of organizing keys and related plumbing; Distributed Electrons focuses more on the multi-app orchestration queue, deliverable QA, prompt library, and multi-agent workflow coordination.
Features:
- Admin-managed model configurations (no code changes to add providers)
- Multi-provider routing (OpenAI, Anthropic, Google, local models)
- Built for multi-agent workflows where different agents need different models
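To make the decoupling concrete, here's a minimal sketch of the alias-based routing idea (not the actual Distributed Electrons code; the model names, aliases, and `ModelConfig` fields are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    provider: str     # e.g. "openai", "anthropic", "local"
    model: str        # provider-specific model identifier
    base_url: str     # endpoint the request is dispatched to
    api_key_env: str  # env var holding the credential

# Admin-managed registry: adding or swapping a provider is a config
# change, not an application-code change.
REGISTRY = {
    "fast": ModelConfig("openai", "gpt-4o-mini",
                        "https://api.openai.com/v1", "OPENAI_API_KEY"),
    "reasoning": ModelConfig("anthropic", "claude-sonnet-4",
                             "https://api.anthropic.com/v1", "ANTHROPIC_API_KEY"),
}

def route(alias: str) -> ModelConfig:
    """Resolve an application-level alias to a concrete provider config."""
    if alias not in REGISTRY:
        raise KeyError(f"No model configured for alias {alias!r}")
    return REGISTRY[alias]
```

Application code only ever speaks in aliases like `"fast"` or `"reasoning"`; when pricing shifts or a better model ships, only the registry changes.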
This got some traction on r/ClaudeAI back in October 2025—turns out a lot of people were solving the same problem in messier ways.
Mnemo
Mnemo started as a way to give Claude extended memory using Gemini's context caching. Load a GitHub repo, documentation, or PDF into a cache, then query it across sessions.
The architecture has evolved significantly:
- Tiered query routing: Fast RAG via Vectorize + AI Search for simple lookups, full-context processing via local Nemotron for complex queries
- Cost optimization: Moved from Gemini API to local inference on my DGX Spark to reduce costs
- MCP integration: Works as an MCP server, so any Claude session can access cached context
Use case: I load entire codebases into Mnemo before working on them. Instead of re-explaining project structure every session, Claude can query the cache for specifics.
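The tiered routing can be sketched roughly like this. The classifier heuristic, marker words, and stub backends are all illustrative assumptions, not Mnemo's actual implementation:

```python
def rag_lookup(query: str) -> str:
    # Stub: in the real system this would hit Vectorize + AI Search.
    return f"[rag] {query}"

def full_context_query(query: str) -> str:
    # Stub: in the real system this would run local Nemotron over the cache.
    return f"[full] {query}"

BROAD_MARKERS = ("architecture", "refactor", "how does", "why", "across")

def classify(query: str) -> str:
    """Rough heuristic: short keyword lookups go to the fast RAG tier;
    long or cross-cutting questions get the full cached context."""
    q = query.lower()
    if len(query.split()) > 20 or any(m in q for m in BROAD_MARKERS):
        return "full_context"
    return "rag"

def answer(query: str) -> str:
    return rag_lookup(query) if classify(query) == "rag" else full_context_query(query)
```

The point of the split is cost and latency: cheap vector lookups handle most traffic, and the expensive full-context path only fires when the query actually needs it.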
Spark Voice Pipeline
github.com/Logos-Flux/spark-voice-pipeline
The newest release. A real-time voice assistant running entirely on local hardware (NVIDIA DGX Spark) with 766ms latency to first audio.
Stack:
- STT: whisper.cpp with CUDA acceleration
- LLM: Ollama (llama3.2)
- TTS: Microsoft VibeVoice-Realtime-0.5B
The key innovation is sentence-level streaming. Instead of waiting for the full LLM response before generating audio, we buffer tokens until a sentence boundary, then immediately stream that sentence to TTS while the LLM keeps generating. Combined with continuous audio playback (WebSocket streaming + OutputStream callbacks), it feels responsive rather than turn-based.
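The buffering logic at the heart of this can be sketched in a few lines. This is a simplified illustration with a naive boundary detector (abbreviations, decimals, etc. need more care), not the pipeline's exact code:

```python
import re

# Naive sentence boundary: terminal punctuation followed by whitespace.
SENTENCE_END = re.compile(r"[.!?]\s$")

def sentence_stream(token_iter):
    """Accumulate LLM tokens and yield each sentence the moment its
    boundary appears, instead of waiting for the full response."""
    buf = ""
    for token in token_iter:
        buf += token
        if SENTENCE_END.search(buf):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush any trailing partial sentence

# Each yielded sentence would be handed to TTS immediately,
# while the LLM keeps generating the next one.
for sentence in sentence_stream(iter(["Hello ", "there. ", "How are ", "you? "])):
    print(sentence)
```

Because TTS for sentence N overlaps with LLM generation of sentence N+1, perceived latency collapses to roughly the time-to-first-sentence rather than time-to-full-response.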
This also documents how to get PyTorch + CUDA working properly on DGX Spark—a common pain point for Spark owners.
The Private Stack
Some projects aren't public yet but are central to how I work:
Nexus
Nexus is my personal productivity system—a voice-first task and idea manager with AI-powered planning and execution.
The concept: Capture ideas and tasks via voice or text. AI classifies and routes them. When you're ready to execute an idea, Nexus generates a plan, breaks it into tasks, and manages dependencies. Tasks route to different executors (human, AI, or human-AI collaborative) based on their nature.
It runs as an MCP server, so Claude can read and write to it directly. When I say "add this to Nexus," Claude actually adds it. When I ask "what's on my plate today?", Claude checks the queue.
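The executor-routing idea can be sketched as a simple policy function. The task fields and the rules here are hypothetical stand-ins, not Nexus internals:

```python
from enum import Enum

class Executor(Enum):
    HUMAN = "human"
    AI = "ai"
    COLLABORATIVE = "human+ai"

def route_task(task: dict) -> Executor:
    """Toy routing policy: judgment calls go to a human, fully-specified
    mechanical work goes to AI, everything else is collaborative."""
    if task.get("requires_judgment"):
        return Executor.HUMAN
    if task.get("fully_specified"):
        return Executor.AI
    return Executor.COLLABORATIVE
```

In practice this classification would come from the AI itself rather than hand-set flags, but the three-way split (human, AI, collaborative) is the core of the design.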
Content Forge
Content Forge is a 120-agent orchestrator for brand identity and content generation. Originally planned as a B2C SaaS, it's evolved into a B2B content services approach—using the sophisticated multi-agent system to deliver premium positioning for clients.
The architecture uses specialized agents for research, writing, editing, brand voice consistency, and more. Each agent has defined capabilities and handoff protocols. The result is content that maintains consistency across formats while adapting to specific channel requirements.
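A drastically reduced sketch of the handoff pattern, with four hypothetical stage agents standing in for the real 120 (the stage names and `Piece` structure are illustrative only):

```python
from dataclasses import dataclass, field

@dataclass
class Piece:
    brief: str
    text: str = ""
    handoffs: list = field(default_factory=list)

# Each agent records a handoff note so the next agent knows what was
# done — a minimal version of defined handoff protocols.
def researcher(p: Piece) -> Piece:
    p.handoffs.append("researcher: sources gathered")
    return p

def writer(p: Piece) -> Piece:
    p.text = f"Draft for: {p.brief}"
    p.handoffs.append("writer: draft complete")
    return p

def editor(p: Piece) -> Piece:
    p.text = p.text.strip()
    p.handoffs.append("editor: copy pass done")
    return p

def brand_voice(p: Piece) -> Piece:
    p.handoffs.append("brand_voice: tone verified")
    return p

PIPELINE = [researcher, writer, editor, brand_voice]

def produce(brief: str) -> Piece:
    piece = Piece(brief=brief)
    for agent in PIPELINE:  # explicit, ordered handoff chain
        piece = agent(piece)
    return piece
```

The real system's agents fan out and specialize per channel rather than running a single linear chain, but the explicit-handoff discipline is what keeps 120 agents consistent.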
The Hardware
Everything above runs on:
DGX Spark
- NVIDIA GB10 (Blackwell architecture)
- 128GB unified memory
- CUDA 13
This machine is the hub. Voice processing, local inference, context caching—it handles the compute-heavy work while edge devices handle I/O.
Workstations
- AMD Ryzen 9
- 64GB RAM
- RTX 3080 and RTX 4070 Ti
Development environment and secondary inference.
The goal is a hybrid architecture: local processing for latency-sensitive and privacy-sensitive work, cloud APIs when their capabilities justify the cost.
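That placement policy fits in a few lines. This is a toy illustration of the split described above, not a real scheduler:

```python
def choose_backend(latency_sensitive: bool, private: bool, needs_frontier: bool) -> str:
    """Toy hybrid-routing policy: privacy and latency pin work to local
    hardware; cloud APIs are used only when frontier capability is worth
    the cost."""
    if private or latency_sensitive:
        return "local"   # DGX Spark / workstation inference
    if needs_frontier:
        return "cloud"   # frontier API justified by capability
    return "local"       # default to free local compute
```

Note that privacy and latency outrank capability here: a private task stays local even if a cloud model would do it better.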
What's Next
Active development:
- Wake word implementation for always-on voice interaction
- Voice cloning testing with VibeVoice 1.5B for custom voice personas
- Dedicated voice hardware (exploring Zoom phone hacks, Jetson boards)
- Bridge architecture connecting edge devices to the Spark "brain"
The vision is an AI-augmented workflow where voice, text, and automated agents work together seamlessly. Not replacing human judgment, but eliminating friction between intention and action.
Get Involved
Everything under Logos Flux is MIT licensed. If you're building similar infrastructure, I'd love to hear about it.
- GitHub: github.com/Logos-Flux
- Reddit: u/logos_flux
- Web: logosflux.io