🏗️ CFS LLM Platform: Architecture Documentation

📋 Project Overview

Objective: Multi-LLM management platform with user permissions and billing integrations.

Version: 1.0 (Initialization Phase)

Date: January 2026


🎯 Infrastructure Overview

Hosting Architecture

text
┌─────────────────────────────────────────────────────────────────┐
│                    CFS LLM Platform                             │
│                  Initialization Architecture                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                                |
┌──────────────────────────────────────────────────────────────────┐
│                    STRATO VPS Server                             │
│   • 32 GB RAM                                                    │
│   • 8 CPU Cores                                                  │
│   • Linux                                                        │
│   • Host compute & local containerized DB storage                │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────────────┐  ┌────────────────────┐                │
│  │  Ollama (Docker)   │  │  Docker Containers │                │
│  ├────────────────────┤  ├────────────────────┤                │
│  │ • gemma4-4b        │  │ • Development      │                │
│  │ • gemma4:e4b       │  │   Port: 3001       │                │
│  │ • mistral:7b       │  │                    │                │
│  │ • qwen3:8b         │  │ • Production       │                │
│  │ Port: 11434 (Int.) │  │   Port: 3000       │                │
│  └────────────────────┘  └────────────────────┘                │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │  MariaDB Database Stack (Local Containerized Layers)      │ │
│  ├────────────────────────────────────────────────────────────┤ │
│  │  1. cfs-db-local (Development)                             │ │
│  │     • Host: localhost (cfs-network-dev)  • Port: 3307      │ │
│  │  2. cfs-database-staging (Staging & QA)                    │ │
│  │     • Host: localhost (cfs-network-dev)  • Port: 3308      │ │
│  │  3. cfs-database-prod (Production Live)                    │ │
│  │     • Host: localhost (cfs-network-prod) • Port: 3306      │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

🏢 1. Infrastructure Components

1.1 STRATO VPS Server (Host Compute Environment)

Specifications:

  • RAM: 32 GB
  • CPU: 8 Cores
  • OS: Linux
  • Role: Runs the application containers and stores the containerized databases locally.

Deployed Services:

A. Ollama (Docker Container Service)

yaml
Installation: Docker container running in Traefik network
Port: 11434 (internal)
Models:
  - gemma4-4b-checkin-fast:latest
  - gemma4:e4b
  - mistral:7b
  - qwen3:8b

RAM Allocation: ~12-16 GB

Benefits:
  - Seamless integration within the Traefik network stack.
  - Easy model isolation and scaling.
  - Resource usage bounded via Docker memory limits.

B. Docker Engine Stack

yaml
Docker Compose Orchestration:
  - Development Application Container (Port 3001)
  - Production Application Container (Port 3000)
  - Three local MariaDB 10.11 Containers (Ports 3306, 3307, 3308)
  - Ollama Engine Container (Port 11434 internal)
  - Traefik Reverse Proxy (Ports 80/443)

RAM Allocation:
  - Development: ~4 GB
  - Production: ~8 GB
  - Database Layer: ~4 GB
  - System overhead: ~4 GB
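
The two allocation lists above can be checked against the 32 GB host with simple arithmetic. The following is a minimal sketch, not part of the platform scripts; the helper name `ram_budget_total` is hypothetical and the figures are the approximate values quoted above.

```bash
#!/usr/bin/env bash
# Sketch: sum planned RAM allocations (whole GB) and compare with host capacity.
# ram_budget_total is a hypothetical helper, not part of the platform scripts.
ram_budget_total() {
    local total=0 gb
    for gb in "$@"; do
        total=$((total + gb))
    done
    echo "$total"
}

HOST_RAM_GB=32

# Order: Ollama, development, production, database layer, system overhead.
low=$(ram_budget_total 12 4 8 4 4)    # Ollama at its lower estimate
high=$(ram_budget_total 16 4 8 4 4)   # Ollama at its upper estimate

echo "Planned allocation: ${low}-${high} GB of ${HOST_RAM_GB} GB"
if [ "$high" -gt "$HOST_RAM_GB" ]; then
    echo "Upper estimate exceeds host RAM by $((high - HOST_RAM_GB)) GB"
fi
```

With these figures the lower estimate fills the host exactly (32 GB) and the upper estimate exceeds it by 4 GB, so the Ollama allocation probably needs to stay near 12 GB unless another component is reduced.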

1.2 Containerized Multi-DB Layer (MariaDB 10.11)

Role: Local Docker container-based database orchestration.

Architecture Benefits:

  • Container Isolation: Independent MariaDB containers prevent cross-environment table conflicts.
  • Granular Backups: Backups executed per database container via local shell tools on the VPS.
  • Local Autonomy: Complete control over DB backups and configurations without third-party dependencies.
  • Portability: Fast VPS migrations by restoring standardized SQL backup dumps.
  • Safe Testing: Sandboxed development and staging databases eliminate live data loss risks.

Databases:

Database 1: Local Dev DB

yaml
Name: cfs-db-local
Host: localhost (cfs-network-dev)
Port: 3307
User: cfs_dev
Purpose: Local development & testing
Backup: Triggered by SERVER-MANAGER backup scripts

Database 2: Production DB

yaml
Name: cfs-database-prod
Host: localhost (cfs-network-prod)
Port: 3306
User: cfs_prod
Purpose: Production live state
Backup: Daily automated VPS backup script

Database 3: Staging DB

yaml
Name: cfs-database-staging
Host: localhost (cfs-network-dev)
Port: 3308
User: cfs_staging
Purpose: Staging & QA testing
Backup: Triggered by SERVER-MANAGER backup scripts

🐳 2. Docker Architecture

2.1 Directory Structure

text
/srv/cfs-llm-platform/
├── docker-compose.dev.yml          # Development Environment
├── docker-compose.prod.yml         # Production Environment
├── .env.dev                        # Dev Environment Variables
├── .env.prod                       # Prod Environment Variables
├── volumes/
│   ├── dev/
│   │   └── openwebui-data/         # App data, incl. Open WebUI's internal SQLite DB
│   └── prod/
│       └── openwebui-data/         # App data, incl. Open WebUI's internal SQLite DB
├── scripts/
│   ├── deploy-dev.sh
│   ├── deploy-prod.sh
│   ├── health-check.sh
│   └── backup.sh
└── logs/
    ├── dev/
    └── prod/

2.2 Development Environment

docker-compose.dev.yml:

yaml
version: '3.8'

services:
  openwebui-dev:
    image: ghcr.io/open-webui/open-webui:dev
    container_name: cfs-openwebui-dev
    ports:
      - '3001:8080'
    environment:
      # Database Connection (Uses default internal SQLite database persisted in volume)

      # Ollama Connection (Docker Container)
      - OLLAMA_BASE_URL=http://ollama:11434

      # External LLM APIs
      - OPENAI_API_KEY=${OPENAI_API_KEY_DEV}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY_DEV}

      # Features
      - ENABLE_RAG=true
      - ENABLE_WEB_SEARCH=true
      - ENABLE_IMAGE_GENERATION=false

      # Token Tracking
      - ENABLE_TOKEN_TRACKING=true

      # Development Settings
      - ENV=development
      - DEBUG=true
      - LOG_LEVEL=debug

      # Security
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY_DEV}

    volumes:
      - ./volumes/dev/openwebui-data:/app/backend/data
      - ./logs/dev:/app/backend/logs

    restart: unless-stopped

    extra_hosts:
      - 'host.docker.internal:host-gateway'

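    networks:
      - cfs-network-dev
      - traefik-network   # needed to resolve the `ollama` hostname (see section 3.1)

    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:8080/health']
      interval: 30s
      timeout: 10s
      retries: 3

networks:
  cfs-network-dev:
    driver: bridge
  traefik-network:
    external: true   # assumes the network already exists under this exact name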

Environment Variables (.env.dev):

bash
# .env.dev

# Local Database (Containerized)
DB_USER_DEV=cfs_dev
DB_PASSWORD_DEV=secure_dev_password_here
DB_HOST_DEV=cfs-db-local
DB_PORT_DEV=3306
DB_NAME_DEV=openwebui_dev

# External APIs
OPENAI_API_KEY_DEV=sk-dev-xxx
GOOGLE_API_KEY_DEV=AIza-dev-xxx

# Security
WEBUI_SECRET_KEY_DEV=generate_with_openssl_rand_hex_32

# Ollama
OLLAMA_HOST=http://ollama:11434

2.3 Production Environment

docker-compose.prod.yml:

yaml
version: '3.8'

services:
  openwebui-prod:
    image: ghcr.io/open-webui/open-webui:main
    container_name: cfs-openwebui-prod
    ports:
      - '3000:8080'
    environment:
      # Database Connection (Uses default internal SQLite database persisted in volume)

      # Ollama Connection (Docker Container)
      - OLLAMA_BASE_URL=http://ollama:11434

      # External LLM APIs
      - OPENAI_API_KEY=${OPENAI_API_KEY_PROD}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY_PROD}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY_PROD}

      # Features
      - ENABLE_RAG=true
      - ENABLE_WEB_SEARCH=true
      - ENABLE_IMAGE_GENERATION=true

      # Token Tracking & Billing
      - ENABLE_TOKEN_TRACKING=true
      - ENABLE_USAGE_LIMITS=true
      - ENABLE_BILLING=true

      # Production Settings
      - ENV=production
      - DEBUG=false
      - LOG_LEVEL=info

      # Security
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY_PROD}
      - ENABLE_OAUTH=true
      - OAUTH_PROVIDER=google

      # Performance
      - WORKERS=4
      - TIMEOUT=300

    volumes:
      - ./volumes/prod/openwebui-data:/app/backend/data
      - ./logs/prod:/app/backend/logs

    restart: always

    extra_hosts:
      - 'host.docker.internal:host-gateway'

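    networks:
      - cfs-network-prod
      - traefik-network   # needed to resolve the `ollama` hostname (see section 3.1)

    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:8080/health']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

    logging:
      driver: 'json-file'
      options:
        max-size: '10m'
        max-file: '3'

networks:
  cfs-network-prod:
    driver: bridge
  traefik-network:
    external: true   # assumes the network already exists under this exact name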

Environment Variables (.env.prod):

bash
# .env.prod

# Local Database (Containerized)
DB_USER_PROD=cfs_prod
DB_PASSWORD_PROD=secure_prod_password_here
DB_HOST_PROD=cfs-database-prod
DB_PORT_PROD=3306
DB_NAME_PROD=openwebui_prod

# External APIs
OPENAI_API_KEY_PROD=sk-prod-xxx
GOOGLE_API_KEY_PROD=AIza-prod-xxx
ANTHROPIC_API_KEY_PROD=sk-ant-prod-xxx

# Security
WEBUI_SECRET_KEY_PROD=generate_with_openssl_rand_hex_32

# Ollama
OLLAMA_HOST=http://ollama:11434

# OAuth (Optional)
OAUTH_CLIENT_ID=your_google_client_id
OAUTH_CLIENT_SECRET=your_google_client_secret

🤖 3. Ollama: Containerized Docker Deployment

3.1 Docker Compose Configuration

Compose Service Definition:

yaml
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - /opt/cfs-infra/volumes/ollama:/root/.ollama
    environment:
      - OLLAMA_NUM_PARALLEL=3
      - OLLAMA_MAX_LOADED_MODELS=2
      - OLLAMA_KEEP_ALIVE=5m
    restart: unless-stopped
    networks:
      - traefik-network

3.2 Resource & Model Layout

RAM budget for the 32 GB host:

yaml
Ollama Models Footprints:
  gemma4-4b-checkin-fast: ~3 GB RAM
  gemma4:e4b: ~4 GB RAM
  mistral:7b: ~4-6 GB RAM
  qwen3:8b: ~5-7 GB RAM

Active Strategy:
  - Enforce max 2 models concurrently active.
  - OLLAMA_MAX_LOADED_MODELS=2
  - OLLAMA_NUM_PARALLEL=3
  - Budgeted total Ollama RAM usage: ~12-16 GB

Remaining Memory Spaces:
  - Open WebUI Development: ~4 GB
  - Open WebUI Production: ~8 GB
  - System overhead: ~4 GB
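
The effect of limiting Ollama to two loaded models can be estimated from the footprints above. The following is a rough sketch; `worst_case_loaded_gb` is a hypothetical helper, and it uses the upper end of each range. It does not account for per-request context memory, which grows with OLLAMA_NUM_PARALLEL.

```bash
#!/usr/bin/env bash
# Sketch: worst-case resident model memory when at most N models are loaded.
# worst_case_loaded_gb is a hypothetical helper; footprints are whole GB.
worst_case_loaded_gb() {
    local max_loaded=$1
    shift
    # Sort footprints descending, keep the N largest, and add them up.
    printf '%s\n' "$@" | sort -nr | head -n "$max_loaded" | awk '{ sum += $1 } END { print sum + 0 }'
}

# Upper-bound footprints from the list above: 3, 4, 6 and 7 GB.
echo "Worst case with 2 loaded models: $(worst_case_loaded_gb 2 3 4 6 7) GB"
```

The two largest models (mistral:7b and qwen3:8b) together need about 13 GB at the upper estimate, which fits inside the ~12-16 GB budget only if context memory stays small.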

Architectural Rationale for Containerized Ollama:

  • Network Security: Traefik isolates the Ollama API, keeping it internal and accessible only to Open WebUI inside the internal network.
  • Unified Orchestration: Managed within the main docker-compose.yml lifecycle of the VPS.
  • Persistence: Models stored on host volume (/opt/cfs-infra/volumes/ollama) for easy persistence and backup.

🎛️ 4. Open WebUI Features

4.1 LLM Connection Layer

Open WebUI acts as a centralized LLM provider management gateway:

yaml
# Admin Settings → Connections

Connection 1: Ollama Local
  Type: Ollama
  URL: http://ollama:11434
  Status: ✅ Active
  Models:
    - gemma4-4b-checkin-fast:latest (Free tier)
    - gemma4:e4b (Free tier)
    - mistral:7b (Free tier)
    - qwen3:8b (Free tier)
  Cost: €0.00 / 1K tokens

Connection 2: OpenAI
  Type: OpenAI
  URL: https://api.openai.com/v1
  API Key: sk-prod-xxx
  Status: ✅ Active
  Models:
    - gpt-4-turbo (€0.01 / 1K tokens)
    - gpt-3.5-turbo (€0.0015 / 1K tokens)

Connection 3: Google Gemini
  Type: OpenAI-Compatible
  URL: https://generativelanguage.googleapis.com/v1beta/openai
  API Key: AIza-xxx
  Status: ✅ Active
  Models:
    - gemini-1.5-pro (€0.0035 / 1K tokens)
    - gemini-pro (€0.0005 / 1K tokens)

Connection 4: Anthropic Claude
  Type: OpenAI-Compatible
  URL: https://api.anthropic.com/v1
  API Key: sk-ant-xxx
  Status: ✅ Active
  Models:
    - claude-3-opus (€0.015 / 1K tokens)
    - claude-3-sonnet (€0.003 / 1K tokens)
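
The per-1K-token rates above translate into a per-request cost as follows. This is a minimal sketch with a hypothetical helper, `request_cost`. It assumes a single rate per model, as in the list above, although providers may price prompt and completion tokens differently.

```bash
#!/usr/bin/env bash
# Sketch: cost of one request from its token count and a per-1K-token rate.
# request_cost is a hypothetical helper; output uses six decimal places.
request_cost() {
    local total_tokens=$1 rate_per_1k=$2
    # LC_ALL=C keeps the decimal separator a dot regardless of the host locale.
    LC_ALL=C awk -v t="$total_tokens" -v r="$rate_per_1k" \
        'BEGIN { printf "%.6f\n", t / 1000 * r }'
}

request_cost 1500 0.01    # gpt-4-turbo rate from the list above
request_cost 1500 0       # local Ollama models
```

Six decimal places are used so that the cost of a small request against a cheap model does not round to zero.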

4.2 Role-Based Access Control & User Permissions

Permission boundaries are defined per tier and enforced per user:

yaml
# Tier-based Permissions Matrix

Tier: Trial (14-day evaluation)
  Token Boundaries:
    Daily: 10,000
    Monthly: 100,000
  Allowed Models:
    - mistral:7b-instruct (Local only)
  Features:
    - Basic Chat: 
    - RAG Document ingestion: 
    - Web Search: 
    - Image Generation: 
  Cost: €0.00

Tier: Basic (€9.90 / Month)
  Token Boundaries:
    Daily: 50,000
    Monthly: 1,000,000
  Allowed Models:
    - mistral:7b-instruct
    - gemini-pro
  Features:
    - Basic Chat: 
    - RAG Document ingestion: ✅ (Max 5 files)
    - Web Search: ✅ (Max 100 queries / Month)
    - Image Generation: 
  Overage rate: €0.01 / 1K tokens

Tier: Premium (€49.90 / Month)
  Token Boundaries:
    Daily: 500,000
    Monthly: 10,000,000
  Allowed Models:
    - mistral:7b-instruct
    - gemini-1.5-pro
    - gpt-4-turbo
    - claude-3-opus
  Features:
    - Basic Chat: 
    - RAG Document ingestion: ✅ (Unlimited)
    - Web Search: ✅ (Unlimited)
    - Image Generation: ✅ (Max 100 / Month)
    - Priority support: 
  Overage rate: €0.008 / 1K tokens

Tier: Enterprise (Custom SLA)
  Token Boundaries: Unlimited
  Allowed Models: Complete catalog including custom fine-tuned weights
  Features: All + 99.9% uptime SLA
  Cost: Bespoke pricing
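
The monthly limits and overage rates in the matrix above can be expressed as lookup functions. The following is a sketch only: all function names are hypothetical, and it assumes that the Trial tier is never charged for overage because no rate is listed for it.

```bash
#!/usr/bin/env bash
# Sketch of the tier matrix as lookup functions. All function names are hypothetical.

# Monthly token limit per tier; prints "unlimited" for Enterprise.
tier_monthly_limit() {
    case "$1" in
        trial)      echo 100000 ;;
        basic)      echo 1000000 ;;
        premium)    echo 10000000 ;;
        enterprise) echo unlimited ;;
        *)          echo "unknown tier: $1" >&2; return 1 ;;
    esac
}

# Overage rate in EUR per 1K tokens; only Basic and Premium list one.
tier_overage_rate() {
    case "$1" in
        basic)   echo 0.01 ;;
        premium) echo 0.008 ;;
        *)       echo 0 ;;
    esac
}

# Overage charge in EUR for one month's token usage.
overage_charge() {
    local tier=$1 used=$2 limit rate
    limit=$(tier_monthly_limit "$tier") || return 1
    if [ "$limit" = "unlimited" ]; then
        echo "0.00"
        return 0
    fi
    rate=$(tier_overage_rate "$tier")
    LC_ALL=C awk -v u="$used" -v l="$limit" -v r="$rate" \
        'BEGIN { over = u - l; if (over < 0) over = 0; printf "%.2f\n", over / 1000 * r }'
}

overage_charge basic 1200000      # 200K tokens over the limit at EUR 0.01 / 1K
overage_charge premium 9000000    # under the limit
```

A Basic user who consumes 1,200,000 tokens in a month would therefore pay €2.00 in overage on top of the €9.90 base charge.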

4.3 Token Tracking & Analytics

Token usage is tracked automatically for each user:

Database schema (hosted on MariaDB 10.11):

sql
-- Token Usage Logs Table
CREATE TABLE token_usage (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id VARCHAR(36) NOT NULL,
    session_id VARCHAR(255),
    model VARCHAR(100) NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    total_tokens INTEGER NOT NULL,
    cost DECIMAL(10,6) NOT NULL,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_user_date ON token_usage(user_id, timestamp);
CREATE INDEX idx_model ON token_usage(model);
CREATE INDEX idx_cost ON token_usage(cost);

-- User Subscription status Table
CREATE TABLE user_subscriptions (
    user_id VARCHAR(36) PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    tier VARCHAR(50) NOT NULL,
    status VARCHAR(50) NOT NULL,
    started_at TIMESTAMP NOT NULL,
    expires_at TIMESTAMP NULL,
    monthly_token_limit INTEGER,
    current_monthly_usage INTEGER DEFAULT 0,
    last_reset_at TIMESTAMP NULL
);

CREATE INDEX idx_tier ON user_subscriptions(tier);
CREATE INDEX idx_status ON user_subscriptions(status);

-- Invoicing Records Table
CREATE TABLE invoices (
    id UUID PRIMARY KEY,
    user_id VARCHAR(36) NOT NULL,
    period_start DATE NOT NULL,
    period_end DATE NOT NULL,
    base_charge DECIMAL(10,2) NOT NULL,
    token_usage INTEGER NOT NULL,
    overage_tokens INTEGER DEFAULT 0,
    overage_charge DECIMAL(10,2) DEFAULT 0,
    total_amount DECIMAL(10,2) NOT NULL,
    status VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    paid_at TIMESTAMP,
    payment_method VARCHAR(50),
    stripe_invoice_id VARCHAR(255)
);

CREATE INDEX idx_user ON invoices(user_id);
CREATE INDEX idx_status_invoices ON invoices(status);
CREATE INDEX idx_period ON invoices(period_start, period_end);

-- Daily Usage Aggregation View
-- MariaDB does not support materialized views, so this is a regular view
-- that is evaluated at query time.
CREATE VIEW daily_usage_summary AS
SELECT
    user_id,
    DATE(timestamp) as usage_date,
    model,
    SUM(total_tokens) as total_tokens,
    SUM(cost) as total_cost,
    COUNT(*) as request_count
FROM token_usage
GROUP BY user_id, DATE(timestamp), model;

-- Views cannot be indexed in MariaDB; queries filtered by user_id
-- can use idx_user_date on token_usage instead.

Billing cycle lifecycle:

text
Day 1-28: Live Token Auditing
├─ Every LLM inference call is captured.
├─ Count processed tokens (prompt vs. completion).
├─ Compute pricing costs based on provider rates.
├─ Record directly inside local MariaDB database.
└─ Expose real-time usage meters inside user dashboards.

Day 29: Invoice Verification Previews
├─ Compile monthly billing previews.
├─ Send automated notification emails to users.
├─ Start a 24-hour review window.
└─ Allow service tier changes before final invoices lock.

Day 30: Invoice Generation
├─ Compile final re-calculated payment statements.
├─ Print static PDF invoices.
├─ Dispatch automated checkout link notifications.
└─ Queue processes inside Stripe integrations.

Day 31: Payment Settlement Transactions
├─ Execute automated card billing routines.
├─ Success: Reset month-to-date counters.
├─ Failure: Initiate 3-day grace access tiers.
└─ Dispatch export logs to bookkeeping frameworks.

Day 34: Overdue Notifications (Dunning)
├─ Send email payment reminders.
└─ Restrict permissions to local free model catalogs.

Day 37: Session Suspensions
├─ Lock premium endpoint APIs.
└─ Flag identities for administrative review.
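
The timeline above can be read as a mapping from billing day to phase. The following is a sketch with a hypothetical helper, `billing_phase`. It assumes that days 32-33 belong to the grace period and days 35-36 to dunning, which is one interpretation of the schedule, and the phases after day 31 apply only when payment has failed.

```bash
#!/usr/bin/env bash
# Sketch: map a day of the billing timeline to its phase.
# billing_phase is a hypothetical helper; the handling of days 32-33 and 35-36
# is an assumption, because the timeline lists only the days on which a phase starts.
billing_phase() {
    local day=$1
    if   [ "$day" -ge 1 ] && [ "$day" -le 28 ]; then echo "tracking"
    elif [ "$day" -eq 29 ]; then echo "preview"
    elif [ "$day" -eq 30 ]; then echo "invoicing"
    elif [ "$day" -eq 31 ]; then echo "settlement"
    elif [ "$day" -ge 32 ] && [ "$day" -le 33 ]; then echo "grace"
    elif [ "$day" -ge 34 ] && [ "$day" -le 36 ]; then echo "dunning"
    elif [ "$day" -ge 37 ]; then echo "suspended"
    else
        echo "invalid day: $day" >&2
        return 1
    fi
}

for day in 15 29 31 34 37; do
    echo "Day $day: $(billing_phase "$day")"
done
```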

🚀 5. Deployment Procedures

5.1 Initial Setup

Platform setup orchestration script:

bash
#!/bin/bash
# setup-cfs-platform.sh

echo "🚀 Starting CFS LLM Platform Installation"
echo "=========================================="

# 1. Update operating system packages
echo "📦 Updating OS libraries..."
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git docker.io docker-compose

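# 2. Verify the Ollama container
# (assumes the infrastructure stack from section 3.1 is already running)
echo "🤖 Checking Ollama container..."
if ! docker exec ollama ollama list > /dev/null 2>&1; then
    echo "❌ Ollama container is not running. Start it before continuing."
    exit 1
fi

# 3. Cache base system LLM weights inside the container
echo "📥 Fetching system models..."
docker exec ollama ollama pull gemma4:e4b
docker exec ollama ollama pull mistral:7b
docker exec ollama ollama pull qwen3:8b
# gemma4-4b-checkin-fast:latest is assumed to be a locally built model and is not pulled here.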

# 4. Initialize storage directories
echo "📁 Initializing storage paths..."
sudo mkdir -p /srv/cfs-llm-platform/{volumes/{dev,prod},scripts,logs/{dev,prod}}
cd /srv/cfs-llm-platform

# 5. Populate Dev and Prod environments
echo "🔐 Constructing configuration keys..."
cat > .env.dev <<EOF
# Local Database (Containerized)
DB_USER_DEV=cfs_dev
DB_PASSWORD_DEV=$(openssl rand -base64 32)
DB_HOST_DEV=cfs-db-local
DB_PORT_DEV=3306
DB_NAME_DEV=openwebui_dev

# External APIs
OPENAI_API_KEY_DEV=sk-dev-xxx
GOOGLE_API_KEY_DEV=AIza-dev-xxx

# Cryptographic Keys
WEBUI_SECRET_KEY_DEV=$(openssl rand -hex 32)
EOF

cat > .env.prod <<EOF
# Local Database (Containerized)
DB_USER_PROD=cfs_prod
DB_PASSWORD_PROD=$(openssl rand -base64 32)
DB_HOST_PROD=cfs-database-prod
DB_PORT_PROD=3306
DB_NAME_PROD=openwebui_prod

# External APIs
OPENAI_API_KEY_PROD=sk-prod-xxx
GOOGLE_API_KEY_PROD=AIza-prod-xxx
ANTHROPIC_API_KEY_PROD=sk-ant-prod-xxx

# Cryptographic Keys
WEBUI_SECRET_KEY_PROD=$(openssl rand -hex 32)
EOF

# 6. Notify of database provisioning steps
echo "💾 Preparing Database Layer..."
echo "⚠️  ACTION REQUIRED: Manually configure MariaDB tables inside local docker instances:"
echo "   - Create database: cfs-database-staging (Assigned user: cfs_staging)"
echo "   - Create database: cfs-database-prod (Assigned user: cfs_prod)"
read -p "Press Enter once the databases are configured..."

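# 7. Start services
# --env-file is needed because Compose only auto-loads a file named .env
echo "🔧 Activating Development Environment..."
docker-compose --env-file .env.dev -f docker-compose.dev.yml up -d

echo "🚀 Activating Production Environment..."
docker-compose --env-file .env.prod -f docker-compose.prod.yml up -d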

# 8. Uptime checks
echo "🏥 Executing post-deployment health checks..."
sleep 10
curl -f http://localhost:3001/health && echo "✅ Development Server: OK" || echo "❌ Development Server: OFFLINE"
curl -f http://localhost:3000/health && echo "✅ Production Server: OK" || echo "❌ Production Server: OFFLINE"
docker exec ollama ollama list > /dev/null && echo "✅ Ollama Engine: OK" || echo "❌ Ollama Engine: OFFLINE"

echo ""
echo "✅ Initialization Completed Successfully!"
echo "=========================================="
echo "📍 Development URL: http://dev.llm.cfs-platform.de:3001"
echo "📍 Production URL:  https://llm.cfs-platform.de"
echo "📍 Ollama API:       http://ollama:11434 (internal Docker network only)"
echo ""
echo "⚠️  Pending Actions:"
echo "1. Input active API credentials in dev and prod environment files"
echo "2. Configure Nginx Reverse Proxy parameters"
echo "3. Issue SSL/TLS Certificates via Let's Encrypt"
echo "4. Activate strict host firewall policies"

5.2 Deployment Scripts

Development environment deployment:

bash
#!/bin/bash
# scripts/deploy-dev.sh

cd /srv/cfs-llm-platform

echo "🔧 Deploying Development Environment..."

# Update local docker cache
# (--env-file is needed because Compose only auto-loads a file named .env)
docker-compose --env-file .env.dev -f docker-compose.dev.yml pull

# Re-orchestrate running dev services
docker-compose --env-file .env.dev -f docker-compose.dev.yml up -d

# Verify server availability
sleep 5
curl -f http://localhost:3001/health && echo "✅ Development successfully updated" || echo "❌ Dev deployment failed"

# Display container logs
docker-compose --env-file .env.dev -f docker-compose.dev.yml logs -f --tail=50

Production environment deployment:

bash
#!/bin/bash
# scripts/deploy-prod.sh

cd /srv/cfs-llm-platform

echo "🚀 Deploying Production Environment..."

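# Compile snapshot of application volumes state
# (earlier backup archives are excluded so a snapshot never contains previous snapshots)
echo "💾 Creating volumes snapshot backup..."
docker-compose --env-file .env.prod -f docker-compose.prod.yml exec -T openwebui-prod \
  tar -czf /app/backend/data/backup-$(date +%Y%m%d-%H%M%S).tar.gz \
  --exclude='backup-*.tar.gz' /app/backend/data

# Pull updated images from registries
docker-compose --env-file .env.prod -f docker-compose.prod.yml pull

# Recreate only the application container (expect a brief interruption while it restarts)
docker-compose --env-file .env.prod -f docker-compose.prod.yml up -d --no-deps openwebui-prod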

# Verify container endpoint
sleep 10
curl -f http://localhost:3000/health && echo "✅ Production successfully updated" || echo "❌ Production deployment failed"

# Display container logs
docker-compose -f docker-compose.prod.yml logs -f --tail=50 openwebui-prod
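
The deployment scripts above wait a fixed number of seconds before probing the health endpoint. Polling until the endpoint answers is usually more robust, because container start-up time varies. The following is a minimal sketch; `wait_until` is a hypothetical helper and is not part of the existing scripts.

```bash
#!/usr/bin/env bash
# Sketch: poll a command until it succeeds instead of sleeping a fixed time.
# wait_until is a hypothetical helper, not part of the deployment scripts.
wait_until() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage against the production health endpoint: up to 12 attempts, 5 seconds apart.
# wait_until 12 5 curl -sf -o /dev/null http://localhost:3000/health \
#     && echo "✅ Production successfully updated" \
#     || echo "❌ Production deployment failed"
```

The usage lines are commented out so the block can be sourced without a running container. The attempt count and delay are illustrative values.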

🔐 6. Security & Networking

6.1 Host Firewall Policies

bash
#!/bin/bash
# scripts/setup-firewall.sh

# Initialize UFW configurations
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Restrict SSH entry to authorized developer IPs
sudo ufw allow from YOUR_TRUSTED_DEVELOPER_IP to any port 22

# Expose standard web traffic endpoints
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

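# Restrict development endpoints to internal networks / VPN connections
# Note: ports published by Docker are handled by Docker's own iptables rules
# and are not filtered by UFW, so this rule alone does not protect port 3001.
sudo ufw allow from 10.0.0.0/8 to any port 3001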

# Deny direct public connections targeting background microservices
sudo ufw deny 11434/tcp

# Apply firewall policies
sudo ufw enable
sudo ufw status verbose

6.2 Nginx Reverse Proxy Configuration

/etc/nginx/sites-available/cfs-llm-platform:

nginx
# Rate limiting zone
# (limit_req_zone is only valid in the http context, so it is declared outside the server blocks)
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

# Production Environment
server {
    listen 443 ssl http2;
    server_name llm.cfs-platform.de;

    # SSL Certificates Configuration
    ssl_certificate /etc/letsencrypt/live/llm.cfs-platform.de/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm.cfs-platform.de/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    # Security Headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Proxy traffic to Open WebUI Production container
    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;

        # WebSockets support
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';

        # Request Headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Connection Timeouts
        proxy_connect_timeout 300s;
        proxy_send_timeout 300s;
        proxy_read_timeout 300s;

        # Disable buffering to stream LLM responses instantly
        proxy_buffering off;
        proxy_cache_bypass $http_upgrade;
    }

    # API Request Rate Limiting (uses the api_limit zone declared at the top of this file)
    limit_req zone=api_limit burst=20 nodelay;

    # Log file configurations
    access_log /var/log/nginx/cfs-llm-prod-access.log;
    error_log /var/log/nginx/cfs-llm-prod-error.log;
}

# Development Environment (Basic Auth Protected)
server {
    listen 443 ssl http2;
    server_name dev.llm.cfs-platform.de;

    ssl_certificate /etc/letsencrypt/live/dev.llm.cfs-platform.de/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/dev.llm.cfs-platform.de/privkey.pem;

    # Enforce basic HTTP auth gate
    auth_basic "Development Environment Access Gate";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }

    access_log /var/log/nginx/cfs-llm-dev-access.log;
    error_log /var/log/nginx/cfs-llm-dev-error.log;
}

# Insecure HTTP Redirect Rule
server {
    listen 80;
    server_name llm.cfs-platform.de dev.llm.cfs-platform.de;
    # $host keeps the requested hostname; $server_name would always redirect to the first name listed
    return 301 https://$host$request_uri;
}

Generate Developer Access Credentials:

bash
# Provision htpasswd file containing administrator password hash
sudo htpasswd -c /etc/nginx/.htpasswd cfs_admin

📊 7. Platform Monitoring & Maintenance

7.1 Health Auditing Agent

bash
#!/bin/bash
# scripts/health-check.sh

LOG_FILE="/var/log/cfs-llm-health.log"
ALERT_EMAIL="admin@cfs-platform.de"

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a $LOG_FILE
}

send_alert() {
    echo "$1" | mail -s "⚠️ CFS LLM Platform Alert" $ALERT_EMAIL
}

# Verify Ollama container status (the API port is internal, so query it from inside the container)
if ! docker exec ollama ollama list > /dev/null 2>&1; then
    log "❌ Ollama container is OFFLINE - Restarting container..."
    docker restart ollama
    send_alert "Ollama container failed validation; container restarted."
else
    log "✅ Ollama service: ONLINE"
fi

# Verify Development Container
if ! curl -sf http://localhost:3001/health > /dev/null; then
    log "❌ Development application server is OFFLINE - Restarting container..."
    cd /srv/cfs-llm-platform
    docker-compose -f docker-compose.dev.yml restart openwebui-dev
    send_alert "Development container failed health checks; container restarted."
else
    log "✅ Development application server: ONLINE"
fi

# Verify Production Container
if ! curl -sf http://localhost:3000/health > /dev/null; then
    log "❌ Production application server is OFFLINE - Restarting container..."
    cd /srv/cfs-llm-platform
    docker-compose -f docker-compose.prod.yml restart openwebui-prod
    send_alert "Production container failed health checks; container restarted."
else
    log "✅ Production application server: ONLINE"
fi

# Verify local production MariaDB connectivity (cfs-database-prod, host port 3306)
if ! timeout 5 bash -c "cat < /dev/null > /dev/tcp/127.0.0.1/3306"; then
    log "⚠️ Production MariaDB container (cfs-database-prod) is UNREACHABLE."
    send_alert "Database connection failure detected for cfs-database-prod on port 3306."
else
    log "✅ Database connection: ONLINE"
fi

# Monitor host physical storage parameters
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ $DISK_USAGE -gt 80 ]; then
    log "⚠️ Host disk space footprint is critical: ${DISK_USAGE}%"
    send_alert "Host storage volume running low: ${DISK_USAGE}% allocated."
fi

# Monitor host memory footprint
MEM_USAGE=$(free | grep Mem | awk '{print int($3/$2 * 100)}')
if [ $MEM_USAGE -gt 90 ]; then
    log "⚠️ Host memory saturation is high: ${MEM_USAGE}%"
    send_alert "Host memory exhaustion warning: ${MEM_USAGE}% saturated."
fi

log "✅ Health check auditing cycle completed."

Cron definitions:

bash
# /etc/cron.d/cfs-llm-health

# Execute platform audits at 5-minute intervals
*/5 * * * * root /srv/cfs-llm-platform/scripts/health-check.sh

# Delete log files older than 7 days (runs daily at midnight)
0 0 * * * root find /srv/cfs-llm-platform/logs -name "*.log" -mtime +7 -delete

7.2 Backup & Retention Schedules

NOTE

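This script covers application volumes, configuration files, and Ollama model weights. The MariaDB containers are backed up separately (see section 1.2): production by the daily automated VPS backup script, development and staging by the SERVER-MANAGER backup scripts.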

bash
#!/bin/bash
# scripts/backup.sh

BACKUP_DIR="/srv/cfs-llm-platform/backups"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p $BACKUP_DIR/{volumes,configs}

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# 1. Compress Dev and Prod application data volumes (excluding DB)
log "📦 Compressing dev and prod storage volumes..."
tar -czf $BACKUP_DIR/volumes/dev-volumes-$DATE.tar.gz \
    /srv/cfs-llm-platform/volumes/dev/

tar -czf $BACKUP_DIR/volumes/prod-volumes-$DATE.tar.gz \
    /srv/cfs-llm-platform/volumes/prod/

# 2. Backup system configurations
log "⚙️ Compressing platform properties files..."
tar -czf $BACKUP_DIR/configs/configs-$DATE.tar.gz \
    /srv/cfs-llm-platform/*.yml \
    /srv/cfs-llm-platform/.env.* \
    /etc/nginx/sites-available/cfs-llm-platform

# 3. Compress cached Ollama weights (host volume of the Ollama container, see section 3.1)
log "🤖 Compressing Ollama model weights..."
tar -czf $BACKUP_DIR/ollama-models-$DATE.tar.gz \
    /opt/cfs-infra/volumes/ollama/

# 4. Prune local backup files older than 7 days
log "🗑️ Pruning stale local snapshots..."
find $BACKUP_DIR -name "*.tar.gz" -mtime +7 -delete

log "✅ Backup sequence finished successfully: $DATE"
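
The retention rule in step 4 can be tried safely in a throwaway directory before it runs against real backups. The following is a sketch; `prune_backups` is a hypothetical helper, and the demonstration relies on GNU `touch -d`, which is available on typical Linux hosts.

```bash
#!/usr/bin/env bash
# Sketch: the retention rule from step 4 as a reusable function.
# prune_backups is a hypothetical helper. `-mtime +N` matches files whose age,
# counted in whole 24-hour periods, is greater than N.
prune_backups() {
    local dir=$1 days=$2
    find "$dir" -type f -name '*.tar.gz' -mtime +"$days" -delete
}

# Demonstration in a throwaway directory (touch -d requires GNU coreutils).
demo_dir=$(mktemp -d)
touch -d '10 days ago' "$demo_dir/old.tar.gz"
touch "$demo_dir/new.tar.gz"
prune_backups "$demo_dir" 7
ls "$demo_dir"
rm -rf "$demo_dir"
```

Only `new.tar.gz` remains after the call. Files that do not match `*.tar.gz` are left alone regardless of their age.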

Configure cron job execution:

bash
# Execute local storage backups daily at 02:00 AM
0 2 * * * /srv/cfs-llm-platform/scripts/backup.sh

📈 8. Scaling Roadmap

Phase 1: Bootstrap Uptime (Current Status)

yaml
Status: Deployed & Active
Host Infrastructure:
  - STRATO VPS compute (32GB RAM, 8 CPU Cores)
  - Local containerized MariaDB 10.11 databases (dev, staging, prod)
  - Containerized Ollama inference service
  - Containerized Open WebUI Dev and Prod instances

Key Features:
  - Multi-LLM provider abstraction layer.
  - Role-based permissions matrix.
  - Real-time token usage logs.
  - Basic monthly billing invoices generator.

Constraints:
  - Single host point of failure.
  - Shared local inference service resource locks.
  - Memory ceilings (32GB RAM total).

Phase 2: High Availability Scaling (3-6 Months Pipeline)

yaml
Status: Planning Stage
Target Improvements:
  - High availability load balancer layers.
  - Redis caches supporting distributed sessions.
  - Dedicated isolated LLM inference engines.
  - Advanced Grafana performance monitoring dashboards.
  - Automated offsite backup archives syncing.

Target Infrastructure:
  - Upgrade VPS compute capacity (64GB RAM).
  - Database layer remains in local MariaDB containers.
  - Containerized Redis daemon.

Phase 3: Cloud Native Cluster (6-12 Months Pipeline)

yaml
Status: Vision Stage
Target Improvements:
  - Kubernetes cluster auto-scaling orchestration.
  - Geo-replicated deployments.
  - Auto-scaling clusters of GPU nodes.
  - Microservices decoupling.

✅ 9. Architecture Decision Records Summary

Platform Layer  | Chosen Technology   | Hosting Site                         | ADR Rationale
Host Compute    | STRATO VPS          | Cloud (32GB RAM, 8 Cores)            | Highly cost-efficient computing layer.
LLM Inference   | Ollama              | Docker container (Traefik network)   | Keeps the Ollama API internal and managed with the rest of the Compose stack.
Development App | Docker Container    | Host OS (Port 3001)                  | Provides isolated testing sandboxes.
Production App  | Docker Container    | Host OS (Port 3000)                  | Enables smooth deployments and updates.
Dev Database    | MariaDB 10.11       | VPS Host (cfs-db-local)              | Safely isolates dev trials from live production databases.
Prod Database   | MariaDB 10.11       | VPS Host (cfs-database-prod)         | Keeps user identity and billing data in a dedicated container, isolated from dev and staging.
Uptime Monitor  | Cron Auditing Agent | Host OS (Internal cron)              | Provides automated self-healing scripts on failure.
User Access     | Open WebUI RBAC     | Container Engine                     | Provides user directory management tools out-of-the-box.

🎯 10. Core Architectural Strengths

✅ Environment Isolation and Recovery

  • Each environment has its own MariaDB container, so dev sandbox changes cannot touch production data.
  • Per-container SQL dumps allow each database to be restored independently.
  • The VPS remains a single point of failure until offsite backups are added (see Phase 2).

✅ Optimized Compute Performance

  • Limiting Ollama to two loaded models keeps inference within a ~12-16 GB budget on the 32GB host.
  • Ollama runs in a Docker container directly on the host kernel, so no hypervisor layer sits between LLM inference and the host CPU cores.

✅ Infrastructure Portability

  • The whole stack is defined in Compose files and environment files.
  • Migrating to another VPS means restoring the SQL dumps and data volumes and starting the same Compose stack.

📝 11. VPS Host RAM Allocation Mapping (32GB RAM Capacity)

yaml
Total VPS Capacity: 32 GB RAM
==============================

Ollama Container (Inference):
  Allocated Limit: 12-16 GB
  Resource footprint detail:
    - Loaded model limit: 2 models concurrently (OLLAMA_MAX_LOADED_MODELS=2).
    - gemma4-4b-checkin-fast footprint: ~3 GB
    - gemma4:e4b footprint:             ~4 GB
    - mistral:7b footprint:             ~4-6 GB
    - qwen3:8b footprint:               ~5-7 GB

Open WebUI Production Container:
  Allocated Limit: 8 GB
  Resource footprint detail:
    - Direct user interactions.
    - Token tracking logging.
    - Billing systems tasks.

Open WebUI Development Container:
  Allocated Limit: 4 GB
  Resource footprint detail:
    - Sandboxed developer workspaces.
    - Debug logging engines.

System Daemons & OS Overhead:
  Allocated Limit: 4 GB
  Resource footprint detail:
    - Linux kernel processes.
    - Docker engine runtime.
    - Nginx reverse proxies.

Reserved Safety Buffer:
  Allocated Limit: 4 GB
  Resource footprint detail:
    - Absorbs transient utilization peaks.

🚀 12. Quick Start Deployment Guide

Step 1: Connect to VPS Host

bash
# Connect to the STRATO VPS
ssh root@your-strato-server.com

# Synchronize system packages lists
sudo apt update && sudo apt upgrade -y

Step 2: Initialize Platform Deploys

bash
# Execute setup automation scripts
curl -fsSL https://raw.githubusercontent.com/your-repo/setup-cfs-platform.sh | bash

Step 3: Configure Local MariaDB Instances

text
1. Connect to local MariaDB containers on the VPS.
2. Create staging database named 'cfs-database-staging' (Assign user: cfs_staging).
3. Create production database named 'cfs-database-prod' (Assign user: cfs_prod).
4. Verify connections from the application stack.
5. Save generated credentials securely in environment variables.

Step 4: Populate Environment Files

bash
# Edit credentials inside properties files
nano /srv/cfs-llm-platform/.env.dev
nano /srv/cfs-llm-platform/.env.prod

Step 5: Orchestrate Services

bash
cd /srv/cfs-llm-platform

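# Activate Dev environments (--env-file is needed because Compose only auto-loads a file named .env)
docker-compose --env-file .env.dev -f docker-compose.dev.yml up -d

# Activate Prod environments
docker-compose --env-file .env.prod -f docker-compose.prod.yml up -d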

Step 6: Verify Endpoints Availability

bash
# Query health check endpoints
curl http://localhost:3001/health  # Dev App
curl http://localhost:3000/health  # Prod App
docker exec ollama ollama list  # Ollama Models (API port is internal to the Docker network)

📞 Support & Contacts

Documentation Site: https://docs.cfs-platform.de

Internal Support Helpdesk: support@cfs-platform.de


📄 License & Credits

© 2026 CFS Platform. All rights reserved.

Last Updated: January 2026

Version: 1.0

Author: Christian Friedrich Schacht

Released under proprietary license.