ADR 001: The Multi-Node Hybrid AI Ecosystem
Status: Accepted
Date: 2026-02-09
Architect: Stefan Binder, Gemini 3 Pro (My Independent AI Architect)
1. Context and Problem Statement
The goal is to build a modular, private AI ecosystem that spans multiple local nodes (NAS, PC, Laptop) and a central cloud-based "brain" on Google Cloud Platform (GCP). The system must process highly sensitive personal data (WhatsApp, Gmail, Photos) while ensuring that no raw PII is ever exposed to cloud models.
The architecture must support "Scale to Zero" cost optimization while providing high-performance RAG (Retrieval-Augmented Generation) capabilities.
2. Decision Drivers
- Privacy First: Local-only PII scrubbing; raw data never touches the cloud.
- Cost Efficiency: No monthly "minimum tax" for cloud databases or idle VMs.
- Node Flexibility: Seamless transitions between local compute (PC/Laptop) and low-power durability (NAS).
- Data Sovereignty: Use open-source standards (Qdrant, SQLite) to avoid SaaS vendor lock-in.
3. Architecture Blueprint
[Image of a hybrid cloud AI architecture diagram showing local nodes syncing to a cloud-based Qdrant vector database via Litestream and GCS]
```mermaid
graph TD
    %% Boundary: LOCAL ECOSYSTEM
    subgraph LOCAL_NETWORK [LOCAL BOUNDARY: NAS - LAPTOP - PC]
        direction TB

        %% Subgraph: Data Importers
        subgraph Importers ["Data Importers"]
            WHATSAPP["WhatsApp"]
            FS_DOCS["FS: txt, PDFs"]
            FS_OFFICE["FS: docx, xlsx"]
            G_DRIVE["Google: Drive"]
            G_GMAIL["Google: Gmail"]
            S_PHOTO["Synology Photos"]
        end

        %% Subgraph: AI Orchestrator
        subgraph Orchestrator_Stack ["AI Orchestrator Core"]
            ORCH["Central Orchestrator"]
            PC["Privacy Core (Presidio)"]
            MDB[("Privacy MapDB: SQLite")]
            BC["Local Border Control: Guardrails & Routing"]
        end

        %% Subgraph: Admin Dashboard
        subgraph Admin_UI ["Admin Dashboard"]
            DASH["Chat Interface"]
            CONF["System Config"]
            CDB[("Config DB: SQLite")]
        end

        %% Local Models
        L_INF["Local Sovereign Inference: Llama 4 Scout 8B"]
        L_EMB["Local Embeddings: Nomic-Embed-Text-V2"]
    end

    %% Boundary: GOOGLE CLOUD PLATFORM
    subgraph GCP_CLOUD [GCP BOUNDARY: Private Cloud]
        direction TB

        %% GCP Storage & Locks
        subgraph GCP_Buckets ["GCP Buckets (CMEK)"]
            S3_STORE["sqlite3-store"]
            DATA_VAULT["myindependent-ai-data"]
            SYNC_LOCK["global.sync.lock"]
        end

        %% Cloud Intelligence
        subgraph Cloud_Models ["Cloud Intelligence"]
            GEMINI["Public Managed Inference: Gemini Pro API"]
            VERTEX["Private Managed Inference: Vertex AI Endpoint"]
        end

        %% Vector Store
        subgraph QDRANT_VM ["Qdrant: Vector Store"]
            direction TB
            C_PUB["Collection: Public Managed Vectors (Gemini)"]
            C_PRI["Collection: Private Managed Vectors (Vertex)"]
            C_LOC["Collection: Local Vectors (Local EMB)"]
        end
    end

    %% RELATIONSHIPS & DATA FLOWS

    %% Importer Flow & Config Direction
    Importers -- Context Retrieval --> ORCH
    Importers -- Define Sources --> CONF

    %% Sync Lock Mechanism
    ORCH <-->|Acquire/Release| SYNC_LOCK

    %% Orchestrator Internal Handshakes
    ORCH -- Routing --> BC
    BC -- Inference Req --> L_INF
    ORCH -- PII Scrub --> PC
    PC <--> MDB
    ORCH -- Vectorize --> L_EMB

    %% State Sync to Cloud
    MDB -.->|Litestream| S3_STORE
    CDB -.->|Litestream| S3_STORE

    %% Dashboard Interaction
    DASH <--> ORCH
    CONF <--> CDB

    %% Cloud Logic
    ORCH -- Secure Query --> GEMINI
    ORCH -- Private Tunnel --> VERTEX
    ORCH -- Store/Retrieve --> QDRANT_VM

    %% Collection Mapping
    GEMINI -.-> C_PUB
    VERTEX -.-> C_PRI
    L_EMB -.-> C_LOC
```
4. Sequence Diagrams
4.1. Sequence 1: WhatsApp Ingestion (Triggered & Local)
This sequence shows the Orchestrator's Scheduler initiating the job.
```mermaid
sequenceDiagram
    autonumber
    participant S as Scheduler/Cron (Internal)
    participant O as Orchestrator
    participant W as WhatsApp Importer
    participant P as Privacy Core (Local)
    participant E as Local Embed (Nomic)
    participant Q as Qdrant (GCP VM)

    S->>O: 1. Trigger Scheduled Sync (e.g., Every 4h)
    O->>W: 2. START_SIGNAL (Fetch New)
    W->>W: Auth with Meta/Local API
    W-->>O: 3. Return Raw Messages
    O->>P: 4. Request PII scrubbing
    P->>P: Map names to IDs (SQLite)
    P-->>O: 5. Return Scrubbed Text
    O->>E: 6. Vectorize (Local Nomic)
    E-->>O: 7. Return Local Vector
    O->>Q: 8. Upsert to 'collection_local'
    Q-->>O: ACK (Success)
```
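The core loop of this sequence (steps 2 through 8) can be sketched as follows. This is a minimal illustration, not the actual Orchestrator code: the importer, scrubber, embedder, and vector-store writer are injected callables, so local stubs can stand in for WhatsApp, Presidio, Nomic, and the Qdrant client.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """One scrubbed, vectorized message ready for upsert."""
    id: str
    text: str
    vector: list

def ingest(importer, scrub, embed, upsert, collection="collection_local"):
    """Fetch raw messages, scrub PII locally, vectorize, upsert to Qdrant."""
    points = []
    for msg_id, raw in importer():       # steps 2-3: fetch new raw messages
        clean = scrub(raw)               # steps 4-5: PII scrubbing (never leaves LAN)
        vec = embed(clean)               # steps 6-7: local Nomic embedding
        points.append(Chunk(msg_id, clean, vec))
    upsert(collection, points)           # step 8: single batched upsert
    return len(points)
```

Batching the upsert at the end keeps the Qdrant VM's wake window short, which matters for the idle-shutdown lifecycle described in Section 5.2.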
4.2. Sequence 2: Gmail Ingestion (Manual Trigger)
```mermaid
sequenceDiagram
    autonumber
    participant U as User (Admin UI)
    participant O as Orchestrator
    participant G as Gmail Importer
    participant P as Privacy Core (Local)
    participant CE as Cloud Embed (Vertex)
    participant Q as Qdrant (GCP VM)

    U->>O: 1. Click "Sync Gmail Now"
    O->>G: 2. START_SIGNAL (Fetch New)
    G->>G: Auth with Google OAuth
    G-->>O: 3. Return Raw Emails
    O->>P: 4. Scrub PII (Names/Addresses)
    P-->>O: 5. Return Anonymized Text
    O->>CE: 6. Get High-Dim Embedding (Vertex)
    CE-->>O: 7. Return Vector
    O->>Q: 8. Upsert to 'collection_private'
    Q-->>O: ACK (Success)
```
4.3. Sequence 3: Private Chat (Sensitive Context Search)
```mermaid
sequenceDiagram
    autonumber
    participant U as User (Admin UI)
    participant O as Orchestrator
    participant BC as Local Guardrail/Router
    participant E as Local Embed (Nomic)
    participant Q as Qdrant (GCP VM)
    participant L as Local Chat (Llama 8B)
    participant M as Mapping DB (Local)

    U->>O: "What did Mario say about the project?"

    %% GUARDRAILS & ROUTING
    O->>BC: 1. Is this safe? Where should I look?
    BC-->>O: 2. { "status": "safe", "route": "local_search", "privacy": "sensitive" }

    %% RETRIEVAL
    O->>E: 3. Vectorize Query (Nomic)
    E-->>O: 4. Query Vector
    O->>Q: 5. Search 'collection_local'
    Q-->>O: 6. Return Scrubbed Chunks (e.g., "User_7 said...")

    %% INFERENCE
    O->>L: 7. Prompt + Scrubbed Context
    L-->>O: 8. "User_7 mentioned the project is on track."

    %% REHYDRATION (Inside Orchestrator RAM)
    O->>M: 9. Lookup: Who is "User_7"?
    M-->>O: 10. "Mario"
    O->>O: 11. Replace "User_7" with "Mario" in string
    O-->>U: 12. Final Answer: "Mario mentioned the project is on track."
```
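The rehydration step (9 through 11) is the only place where placeholders meet real names, and it happens entirely in the Orchestrator's memory. A minimal sketch, assuming a `pii_map(placeholder, name)` table in the local mapping database (the table and column names are illustrative, not the actual mapping.db schema):

```python
import re
import sqlite3

def rehydrate(answer: str, db: sqlite3.Connection) -> str:
    """Replace every User_<n> placeholder in the model's answer with the
    real name stored in the local mapping DB; unknown IDs pass through."""
    def lookup(match: re.Match) -> str:
        row = db.execute(
            "SELECT name FROM pii_map WHERE placeholder = ?",
            (match.group(0),),
        ).fetchone()
        return row[0] if row else match.group(0)
    return re.sub(r"User_\d+", lookup, answer)
```

Passing unknown IDs through unchanged is deliberate: a missing mapping should degrade to an opaque placeholder, never to a wrong name.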
4.4. Sequence 4: Public Chat (General Knowledge or Designated Non-Sensitive Data)
```mermaid
sequenceDiagram
    autonumber
    participant U as User (Admin UI)
    participant O as Orchestrator
    participant BC as Local Guardrail/Router
    participant CE as Cloud Embed (Vertex)
    participant Q as Qdrant (GCP VM)
    participant C as Cloud Model (Gemini)

    U->>O: "How do I bake sourdough bread?"

    %% GUARDRAILS & ROUTING
    O->>BC: 1. Is this safe? (Intent Classification)
    BC-->>O: 2. { "status": "safe", "route": "cloud_search", "privacy": "public" }

    %% RETRIEVAL
    O->>CE: 3. Vectorize Query (Vertex API)
    CE-->>O: 4. Query Vector
    O->>Q: 5. Search 'collection_public'
    Q-->>O: 6. Return General Knowledge Chunks

    %% INFERENCE
    O->>C: 7. Prompt + Public Context
    Note right of C: Cloud VM was manually woken by the user
    C-->>O: 8. "To bake sourdough, you need to..."

    %% POST-PROCESSING (Inside Orchestrator RAM)
    O->>O: 9. Scrubbing Check (Ensure no accidental PII leak)
    O->>O: 10. Metadata Tagging (Source: Gemini Pro)
    O-->>U: 11. Final Answer: "To bake sourdough..."
```
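The Border Control verdict in steps 1 and 2 of Sequences 3 and 4 is a structured routing decision. The sketch below uses a naive keyword heuristic as a stand-in for the real intent classifier (the hint list is purely illustrative), but the returned dict matches the routing contract shown in the diagrams:

```python
# Illustrative only: real routing would use a proper intent classifier,
# not substring matching.
SENSITIVE_HINTS = ("say", "said", "email", "photo", "whatsapp", "mario")

def route(query: str) -> dict:
    """Return the Border Control verdict for a user query."""
    q = query.lower()
    if any(hint in q for hint in SENSITIVE_HINTS):
        return {"status": "safe", "route": "local_search", "privacy": "sensitive"}
    return {"status": "safe", "route": "cloud_search", "privacy": "public"}
```

The important design point is the contract, not the heuristic: the Orchestrator only ever branches on `route` and `privacy`, so the classifier can be upgraded without touching the pipeline.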
5. Master Technical Decisions
5.1. Privacy: Local-First PII Scrubbing
Decision: Use Microsoft Presidio for NER-based PII identification, combined with a local SQLite mapping table (mapping.db).
Rationale: Industry-standard detection. "Mario" is transformed into User_7 locally; the mapping key remains exclusively on the local LAN.
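A minimal sketch of the scrub-and-map step. In the real system, Presidio's AnalyzerEngine supplies the `(start, end)` PII spans; here the detector is an injected callable so the mapping logic stays self-contained. The table layout and `User_<n>` counter are assumptions, not the actual mapping.db schema:

```python
import sqlite3

def scrub(text: str, detect, db: sqlite3.Connection) -> str:
    """Replace each detected PII span with a stable User_<n> placeholder,
    persisting the name -> placeholder mapping in the local SQLite DB."""
    db.execute("CREATE TABLE IF NOT EXISTS pii_map "
               "(name TEXT PRIMARY KEY, placeholder TEXT)")
    # Process spans right-to-left so earlier offsets stay valid after edits.
    for start, end in sorted(detect(text), reverse=True):
        name = text[start:end]
        row = db.execute("SELECT placeholder FROM pii_map WHERE name = ?",
                         (name,)).fetchone()
        if row:
            placeholder = row[0]
        else:
            n = db.execute("SELECT COUNT(*) FROM pii_map").fetchone()[0] + 1
            placeholder = f"User_{n}"
            db.execute("INSERT INTO pii_map VALUES (?, ?)", (name, placeholder))
        text = text[:start] + placeholder + text[end:]
    return text
```

Stability is the point: the same name must always map to the same placeholder, otherwise the vectors in Qdrant and the rehydration step in Sequence 3 would disagree.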
5.2. Vector DB: Self-Hosted Qdrant on VM
Decision: Use Qdrant (Docker) on a GCP Compute Engine (Spot/Preemptible instance).
Rationale: Avoids the proprietary lock-in and high minimum costs of Pinecone. Qdrant is open-source, Rust-based, and VPC-compliant.
Lifecycle: The Orchestrator triggers gcloud compute instances start on-demand. An inactivity sidecar shuts the VM down after 10 minutes of idle time, achieving zero idle cost.
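The inactivity sidecar reduces to a small countdown. In this sketch the stop action and the clock are injected so the logic is testable in isolation; in production the stop callable would wrap `gcloud compute instances stop` (or the Compute Engine API), and the class name is illustrative:

```python
import time

class IdleWatchdog:
    """Stop the Qdrant VM after a fixed period with no activity."""

    def __init__(self, stop_vm, idle_limit_s: float = 600.0, clock=time.monotonic):
        self.stop_vm = stop_vm          # e.g., wraps `gcloud compute instances stop`
        self.idle_limit_s = idle_limit_s
        self.clock = clock
        self.last_activity = clock()

    def touch(self) -> None:
        """Record activity, e.g., a Qdrant query just went through."""
        self.last_activity = self.clock()

    def poll(self) -> bool:
        """Stop the VM if idle past the limit; return True if it was stopped."""
        if self.clock() - self.last_activity >= self.idle_limit_s:
            self.stop_vm()
            return True
        return False
```

A cron or loop would call `poll()` every minute; any proxied Qdrant request calls `touch()`.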
5.3. State Sync: SQLite + Litestream + GCS Lock
Decision: Sync config.db and mapping.db via Litestream to a GCS bucket.
Concurrency Control: An atomic GCS lock object (global.sync.lock) prevents write collisions across nodes.
Node Identity: Every node defines a MIAI_NODE_NAME and follows an offset schedule (:00 for NAS, :15 for Laptop, :55 for PC) to minimize lock contention.
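The atomicity of the GCS lock comes from object-creation preconditions: creating the lock object with `if_generation_match=0` (as exposed by the google-cloud-storage client) succeeds for exactly one caller and fails with 412 Precondition Failed for everyone else. The sketch below models that semantics with an in-memory dict so the acquire/release flow can run locally; it is a stand-in, not the production client code:

```python
class GcsLock:
    """In-memory stand-in for the atomic global.sync.lock object in GCS.

    Real implementation (assumption): blob.upload_from_string(node,
    if_generation_match=0), which is an atomic create-if-absent.
    """

    def __init__(self):
        self._objects = {}  # simulated bucket: object name -> holder node

    def acquire(self, node: str) -> bool:
        """True if this node won the lock; False if another node holds it."""
        if "global.sync.lock" in self._objects:
            return False
        self._objects["global.sync.lock"] = node
        return True

    def release(self, node: str) -> None:
        """Only the current holder may delete the lock object."""
        if self._objects.get("global.sync.lock") == node:
            del self._objects["global.sync.lock"]
```

Combined with the offset schedule above, contention should be rare; the lock exists for the pathological case where two nodes' sync windows overlap.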
5.4. Multi-Node Configuration
Decision: The system environment is determined by the MIAI_NODE_NAME environment variable, which gates resource usage (e.g., local LLM model size) and execution targets.
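A sketch of how MIAI_NODE_NAME could gate resources. Only the environment variable itself and the sync offsets (:00 NAS, :15 Laptop, :55 PC) come from this ADR; the profile table's model tiers and keys are illustrative assumptions:

```python
import os

# Hypothetical per-node profiles; model names are placeholders.
NODE_PROFILES = {
    "nas":    {"local_llm": None,               "sync_offset_min": 0},   # low-power node: no local LLM
    "laptop": {"local_llm": "llama-scout-q4",   "sync_offset_min": 15},
    "pc":     {"local_llm": "llama-scout-fp16", "sync_offset_min": 55},
}

def node_profile(env=os.environ) -> dict:
    """Resolve this node's resource profile from MIAI_NODE_NAME."""
    name = env.get("MIAI_NODE_NAME", "").lower()
    if name not in NODE_PROFILES:
        raise RuntimeError(f"MIAI_NODE_NAME must be one of {sorted(NODE_PROFILES)}")
    return NODE_PROFILES[name]
```

Failing fast on an unknown node name is intentional: a misconfigured node should refuse to start rather than sync on the wrong schedule or load the wrong model.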
6. Consequences
Positive: Privacy-guaranteed AI; Costs ~$0/month when idle; Full ownership of vector data.
Negative: Cold-start latency (VM boot time ~45s); Complexity in managing node locks and Litestream sidecars.