ADR 001: The Multi-Node Hybrid AI Ecosystem
Status: Accepted
Date: 2026-02-09
Architect: Stefan Binder, Gemini 3 Pro (My Independent AI Architect)
1. Context and Problem Statement
The goal is to build a modular, private AI ecosystem that spans multiple local nodes (NAS, PC, Laptop) and a central cloud-based "brain" on Google Cloud Platform (GCP). The system must process highly sensitive personal data (WhatsApp, Gmail, Photos) while ensuring that no raw PII is ever exposed to cloud models.
The architecture must support "Scale to Zero" cost optimization while providing high-performance RAG (Retrieval-Augmented Generation) capabilities.
2. Decision Drivers
- Privacy First: Local-only PII scrubbing; raw data never touches the cloud.
- Cost Efficiency: No monthly "minimum tax" for cloud databases or idle VMs.
- Node Flexibility: Seamless transitions between local compute (PC/Laptop) and low-power durability (NAS).
- Data Sovereignty: Use open-source standards (Qdrant, SQLite) to avoid SaaS vendor lock-in.
3. Architecture Blueprint
[Image of a hybrid cloud AI architecture diagram showing local nodes syncing to a cloud-based Qdrant vector database via Litestream and GCS]
```mermaid
graph TD
    %% Boundary: LOCAL ECOSYSTEM
    subgraph LOCAL_NETWORK [LOCAL BOUNDARY: NAS - LAPTOP - PC]
        direction TB

        %% Subgraph: Data Importers
        subgraph Importers ["Data Importers"]
            WHATSAPP["WhatsApp"]
            FS_DOCS["FS: txt, PDFs"]
            FS_OFFICE["FS: docx, xlsx"]
            G_DRIVE["Google: Drive"]
            G_GMAIL["Google: Gmail"]
            S_PHOTO["Synology Photos"]
        end

        %% Subgraph: AI Orchestrator
        subgraph Orchestrator_Stack ["AI Orchestrator Core"]
            ORCH["Central Orchestrator"]
            PC["Privacy Core (Presidio)"]
            MDB[("Privacy MapDB: SQLite")]
            BC["Local Border Control: Guardrails & Routing"]
        end

        %% Subgraph: Admin Dashboard
        subgraph Admin_UI ["Admin Dashboard"]
            DASH["Chat Interface"]
            CONF["System Config"]
            CDB[("Config DB: SQLite")]
        end

        %% Local Models
        L_INF["Local Sovereign Inference: Llama 4 Scout 8B"]
        L_EMB["Local Embeddings: Nomic-Embed-Text-V2"]
    end

    %% Boundary: GOOGLE CLOUD PLATFORM
    subgraph GCP_CLOUD [GCP BOUNDARY: Private Cloud]
        direction TB

        %% GCP Storage & Locks
        subgraph GCP_Buckets ["GCP Buckets (CMEK)"]
            S3_STORE["sqlite3-store"]
            DATA_VAULT["myindependent-ai-data"]
            SYNC_LOCK["global.sync.lock"]
        end

        %% Cloud Intelligence
        subgraph Cloud_Models ["Cloud Intelligence"]
            GEMINI["Public Managed Inference: Gemini Pro API"]
            VERTEX["Private Managed Inference: Vertex AI Endpoint"]
        end

        %% Vector Store
        subgraph QDRANT_VM ["Qdrant: Vector Store"]
            direction TB
            C_PUB["Collection: Public Managed Vectors (Gemini)"]
            C_PRI["Collection: Private Managed Vectors (Vertex)"]
            C_LOC["Collection: Local Vectors (Local EMB)"]
        end
    end

    %% RELATIONSHIPS & DATA FLOWS

    %% Importer Flow & Config Direction
    Importers -- Context Retrieval --> ORCH
    Importers -- Define Sources --> CONF

    %% Sync Lock Mechanism
    ORCH <-->|Acquire/Release| SYNC_LOCK

    %% Orchestrator Internal Handshakes
    ORCH -- Routing --> BC
    BC -- Inference Req --> L_INF
    ORCH -- PII Scrub --> PC
    PC <--> MDB
    ORCH -- Vectorize --> L_EMB

    %% State Sync to Cloud
    MDB -.->|Litestream| S3_STORE
    CDB -.->|Litestream| S3_STORE

    %% Dashboard Interaction
    DASH <--> ORCH
    CONF <--> CDB

    %% Cloud Logic
    ORCH -- Secure Query --> GEMINI
    ORCH -- Private Tunnel --> VERTEX
    ORCH -- Store/Retrieve --> QDRANT_VM

    %% Collection Mapping
    GEMINI -.-> C_PUB
    VERTEX -.-> C_PRI
    L_EMB -.-> C_LOC
```
4. Sequence Diagrams
4.1. Sequence 1: WhatsApp Ingestion (Triggered & Local)
This sequence shows the Orchestrator's Scheduler initiating the job.
```mermaid
sequenceDiagram
    autonumber
    participant S as Scheduler/Cron (Internal)
    participant O as Orchestrator
    participant W as WhatsApp Importer
    participant P as Privacy Core (Local)
    participant E as Local Embed (Nomic)
    participant Q as Qdrant (GCP VM)

    S->>O: 1. Trigger Scheduled Sync (e.g., Every 4h)
    O->>W: 2. START_SIGNAL (Fetch New)
    W->>W: Auth with Meta/Local API
    W-->>O: 3. Return Raw Messages
    O->>P: 4. Request PII scrubbing
    P->>P: Map names to IDs (SQLite)
    P-->>O: 5. Return Scrubbed Text
    O->>E: 6. Vectorize (Local Nomic)
    E-->>O: 7. Return Local Vector
    O->>Q: 8. Upsert to 'collection_local'
    Q-->>O: ACK (Success)
```
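The core loop of this sequence (steps 2 through 8) can be sketched as follows. This is a minimal illustration, not the actual Orchestrator code: the importer, scrubber, embedder, and vector-store writer are injected callables, so local stubs can stand in for WhatsApp, Presidio, Nomic, and the Qdrant client.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """One scrubbed, vectorized message ready for upsert."""
    id: str
    text: str
    vector: list

def ingest(importer, scrub, embed, upsert, collection="collection_local"):
    """Fetch raw messages, scrub PII locally, vectorize, upsert to Qdrant."""
    points = []
    for msg_id, raw in importer():       # steps 2-3: fetch new raw messages
        clean = scrub(raw)               # steps 4-5: PII scrubbing (never leaves LAN)
        vec = embed(clean)               # steps 6-7: local Nomic embedding
        points.append(Chunk(msg_id, clean, vec))
    upsert(collection, points)           # step 8: single batched upsert
    return len(points)
```

Batching the upsert at the end keeps the Qdrant VM's wake window short, which matters for the idle-shutdown lifecycle described in Section 5.2.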
4.2. Sequence 2: Gmail Ingestion (Manual Trigger)
```mermaid
sequenceDiagram
    autonumber
    participant U as User (Admin UI)
    participant O as Orchestrator
    participant G as Gmail Importer
    participant P as Privacy Core (Local)
    participant CE as Cloud Embed (Vertex)
    participant Q as Qdrant (GCP VM)

    U->>O: 1. Click "Sync Gmail Now"
    O->>G: 2. START_SIGNAL (Fetch New)
    G->>G: Auth with Google OAuth
    G-->>O: 3. Return Raw Emails
    O->>P: 4. Scrub PII (Names/Addresses)
    P-->>O: 5. Return Anonymized Text
    O->>CE: 6. Get High-Dim Embedding (Vertex)
    CE-->>O: 7. Return Vector
    O->>Q: 8. Upsert to 'collection_private'
    Q-->>O: ACK (Success)
```
4.3. Sequence 3: Private Chat (Sensitive Context Search)
```mermaid
sequenceDiagram
    autonumber
    participant U as User (Admin UI)
    participant O as Orchestrator
    participant BC as Local Guardrail/Router
    participant E as Local Embed (Nomic)
    participant Q as Qdrant (GCP VM)
    participant L as Local Chat (Llama 8B)
    participant M as Mapping DB (Local)

    U->>O: "What did Mario say about the project?"

    %% GUARDRAILS & ROUTING
    O->>BC: 1. Is this safe? Where should I look?
    BC-->>O: 2. { "status": "safe", "route": "local_search", "privacy": "sensitive" }

    %% RETRIEVAL
    O->>E: 3. Vectorize Query (Nomic)
    E-->>O: 4. Query Vector
    O->>Q: 5. Search 'collection_local'
    Q-->>O: 6. Return Scrubbed Chunks (e.g., "User_7 said...")

    %% INFERENCE
    O->>L: 7. Prompt + Scrubbed Context
    L-->>O: 8. "User_7 mentioned the project is on track."

    %% REHYDRATION (Inside Orchestrator RAM)
    O->>M: 9. Lookup: Who is "User_7"?
    M-->>O: 10. "Mario"
    O->>O: 11. Replace "User_7" with "Mario" in string
    O-->>U: 12. Final Answer: "Mario mentioned the project is on track."
```
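The rehydration step (9 through 11) is the only place where placeholders meet real names, and it happens entirely in the Orchestrator's memory. A minimal sketch, assuming a `pii_map(placeholder, name)` table in the local mapping database (the table and column names are illustrative, not the actual mapping.db schema):

```python
import re
import sqlite3

def rehydrate(answer: str, db: sqlite3.Connection) -> str:
    """Replace every User_<n> placeholder in the model's answer with the
    real name stored in the local mapping DB; unknown IDs pass through."""
    def lookup(match: re.Match) -> str:
        row = db.execute(
            "SELECT name FROM pii_map WHERE placeholder = ?",
            (match.group(0),),
        ).fetchone()
        return row[0] if row else match.group(0)
    return re.sub(r"User_\d+", lookup, answer)
```

Passing unknown IDs through unchanged is deliberate: a missing mapping should degrade to an opaque placeholder, never to a wrong name.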
4.4. Sequence 4: Public Chat (General Knowledge or Designated Non-Sensitive Data)
```mermaid
sequenceDiagram
    autonumber
    participant U as User (Admin UI)
    participant O as Orchestrator
    participant BC as Local Guardrail/Router
    participant CE as Cloud Embed (Vertex)
    participant Q as Qdrant (GCP VM)
    participant C as Cloud Model (Gemini)

    U->>O: "How do I bake sourdough bread?"

    %% GUARDRAILS & ROUTING
    O->>BC: 1. Is this safe? (Intent Classification)
    BC-->>O: 2. { "status": "safe", "route": "cloud_search", "privacy": "public" }

    %% RETRIEVAL
    O->>CE: 3. Vectorize Query (Vertex API)
    CE-->>O: 4. Query Vector
    O->>Q: 5. Search 'collection_public'
    Q-->>O: 6. Return General Knowledge Chunks

    %% INFERENCE
    O->>C: 7. Prompt + Public Context
    Note right of C: Cloud VM was manually woken by the user
    C-->>O: 8. "To bake sourdough, you need to..."

    %% POST-PROCESSING (Inside Orchestrator RAM)
    O->>O: 9. Scrubbing Check (Ensure no accidental PII leak)
    O->>O: 10. Metadata Tagging (Source: Gemini Pro)
    O-->>U: 11. Final Answer: "To bake sourdough..."
```
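The Border Control verdict in steps 1 and 2 of Sequences 3 and 4 is a structured routing decision. The sketch below uses a naive keyword heuristic as a stand-in for the real intent classifier (the hint list is purely illustrative), but the returned dict matches the routing contract shown in the diagrams:

```python
# Illustrative only: real routing would use a proper intent classifier,
# not substring matching.
SENSITIVE_HINTS = ("say", "said", "email", "photo", "whatsapp", "mario")

def route(query: str) -> dict:
    """Return the Border Control verdict for a user query."""
    q = query.lower()
    if any(hint in q for hint in SENSITIVE_HINTS):
        return {"status": "safe", "route": "local_search", "privacy": "sensitive"}
    return {"status": "safe", "route": "cloud_search", "privacy": "public"}
```

The important design point is the contract, not the heuristic: the Orchestrator only ever branches on `route` and `privacy`, so the classifier can be upgraded without touching the pipeline.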
5. Master Technical Decisions
5.1. Privacy: Local-First PII Scrubbing
Decision: Use Microsoft Presidio for NER-based PII identification, combined with a local SQLite mapping table (mapping.db).
Rationale: Industry-standard detection. "Mario" is transformed into User_7 locally; the mapping key remains exclusively on the local LAN.
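A minimal sketch of the scrub-and-map step. In the real system, Presidio's AnalyzerEngine supplies the `(start, end)` PII spans; here the detector is an injected callable so the mapping logic stays self-contained. The table layout and `User_<n>` counter are assumptions, not the actual mapping.db schema:

```python
import sqlite3

def scrub(text: str, detect, db: sqlite3.Connection) -> str:
    """Replace each detected PII span with a stable User_<n> placeholder,
    persisting the name -> placeholder mapping in the local SQLite DB."""
    db.execute("CREATE TABLE IF NOT EXISTS pii_map "
               "(name TEXT PRIMARY KEY, placeholder TEXT)")
    # Process spans right-to-left so earlier offsets stay valid after edits.
    for start, end in sorted(detect(text), reverse=True):
        name = text[start:end]
        row = db.execute("SELECT placeholder FROM pii_map WHERE name = ?",
                         (name,)).fetchone()
        if row:
            placeholder = row[0]
        else:
            n = db.execute("SELECT COUNT(*) FROM pii_map").fetchone()[0] + 1
            placeholder = f"User_{n}"
            db.execute("INSERT INTO pii_map VALUES (?, ?)", (name, placeholder))
        text = text[:start] + placeholder + text[end:]
    return text
```

Stability is the point: the same name must always map to the same placeholder, otherwise the vectors in Qdrant and the rehydration step in Sequence 3 would disagree.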
5.2. Vector DB: Self-Hosted Qdrant on VM
Decision: Use Qdrant (Docker) on a GCP Compute Engine (Spot/Preemptible instance).
Rationale: Avoids the proprietary lock-in and high minimum costs of Pinecone. Qdrant is open-source, Rust-based, and VPC-compliant.
Lifecycle: The Orchestrator triggers gcloud compute instances start on-demand. An inactivity sidecar shuts the VM down after 10 minutes of idle time, achieving zero idle cost.
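The inactivity sidecar reduces to a small countdown. In this sketch the stop action and the clock are injected so the logic is testable in isolation; in production the stop callable would wrap `gcloud compute instances stop` (or the Compute Engine API), and the class name is illustrative:

```python
import time

class IdleWatchdog:
    """Stop the Qdrant VM after a fixed period with no activity."""

    def __init__(self, stop_vm, idle_limit_s: float = 600.0, clock=time.monotonic):
        self.stop_vm = stop_vm          # e.g., wraps `gcloud compute instances stop`
        self.idle_limit_s = idle_limit_s
        self.clock = clock
        self.last_activity = clock()

    def touch(self) -> None:
        """Record activity, e.g., a Qdrant query just went through."""
        self.last_activity = self.clock()

    def poll(self) -> bool:
        """Stop the VM if idle past the limit; return True if it was stopped."""
        if self.clock() - self.last_activity >= self.idle_limit_s:
            self.stop_vm()
            return True
        return False
```

A cron or loop would call `poll()` every minute; any proxied Qdrant request calls `touch()`.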
5.3. State Sync: SQLite + Litestream + GCS Lock
Decision: Sync config.db and mapping.db via Litestream to a GCS bucket.
Concurrency Control: An atomic GCS lock object (global.sync.lock) prevents write collisions across nodes.
Node Identity: Every node defines a MIAI_NODE_NAME and follows an offset schedule (:00 for NAS, :15 for Laptop, :55 for PC) to minimize lock contention.
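The atomicity of the GCS lock comes from object-creation preconditions: creating the lock object with `if_generation_match=0` (as exposed by the google-cloud-storage client) succeeds for exactly one caller and fails with 412 Precondition Failed for everyone else. The sketch below models that semantics with an in-memory dict so the acquire/release flow can run locally; it is a stand-in, not the production client code:

```python
class GcsLock:
    """In-memory stand-in for the atomic global.sync.lock object in GCS.

    Real implementation (assumption): blob.upload_from_string(node,
    if_generation_match=0), which is an atomic create-if-absent.
    """

    def __init__(self):
        self._objects = {}  # simulated bucket: object name -> holder node

    def acquire(self, node: str) -> bool:
        """True if this node won the lock; False if another node holds it."""
        if "global.sync.lock" in self._objects:
            return False
        self._objects["global.sync.lock"] = node
        return True

    def release(self, node: str) -> None:
        """Only the current holder may delete the lock object."""
        if self._objects.get("global.sync.lock") == node:
            del self._objects["global.sync.lock"]
```

Combined with the offset schedule above, contention should be rare; the lock exists for the pathological case where two nodes' sync windows overlap.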
5.4. Multi-Node Configuration
Decision: The system environment is determined by the MIAI_NODE_NAME environment variable, which gates resource usage (e.g., local LLM model size) and execution targets.
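A sketch of how MIAI_NODE_NAME could gate resources. Only the environment variable itself and the sync offsets (:00 NAS, :15 Laptop, :55 PC) come from this ADR; the profile table's model tiers and keys are illustrative assumptions:

```python
import os

# Hypothetical per-node profiles; model names are placeholders.
NODE_PROFILES = {
    "nas":    {"local_llm": None,               "sync_offset_min": 0},   # low-power node: no local LLM
    "laptop": {"local_llm": "llama-scout-q4",   "sync_offset_min": 15},
    "pc":     {"local_llm": "llama-scout-fp16", "sync_offset_min": 55},
}

def node_profile(env=os.environ) -> dict:
    """Resolve this node's resource profile from MIAI_NODE_NAME."""
    name = env.get("MIAI_NODE_NAME", "").lower()
    if name not in NODE_PROFILES:
        raise RuntimeError(f"MIAI_NODE_NAME must be one of {sorted(NODE_PROFILES)}")
    return NODE_PROFILES[name]
```

Failing fast on an unknown node name is intentional: a misconfigured node should refuse to start rather than sync on the wrong schedule or load the wrong model.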
6. Consequences
Positive: Privacy-guaranteed AI; Costs ~$0/month when idle; Full ownership of vector data.
Negative: Cold-start latency (VM boot time ~45s); Complexity in managing node locks and Litestream sidecars.