🏗️ CFS LLM Platform: Architecture Documentation
📋 Project Overview
Objective: Multi-LLM management platform with user permissions and billing integrations.
Version: 1.0 (Initialization Phase)
Date: January 2026
🎯 Infrastructure Overview
Hosting Architecture
text
┌─────────────────────────────────────────────────────────────────┐
│ CFS LLM Platform │
│ Initialization Architecture │
│ │
└─────────────────────────────────────────────────────────────────┘
|
┌──────────────────────────────────────────────────────────────────┐
│ STRATO VPS Server │
│ • 32 GB RAM │
│ • 8 CPU Cores │
│ • Linux │
│ • Host compute & local containerized DB storage │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Ollama (Docker) │ │ Docker Containers │ │
│ ├────────────────────┤ ├────────────────────┤ │
│ │ • gemma4-4b │ │ • Development │ │
│ │ • gemma4:e4b │ │ Port: 3001 │ │
│ │ • mistral:7b │ │ │ │
│ │ • qwen3:8b │ │ • Production │ │
│ │ Port: 11434 (Int.) │ │ Port: 3000 │ │
│ └────────────────────┘ └────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ MariaDB Database Stack (Local Containerized Layers) │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ 1. cfs-db-local (Development) │ │
│ │ • Host: localhost (cfs-network-dev) • Port: 3307 │ │
│ │ 2. cfs-database-staging (Staging & QA) │ │
│ │ • Host: localhost (cfs-network-dev) • Port: 3308 │ │
│ │ 3. cfs-database-prod (Production Live) │ │
│ │ • Host: localhost (cfs-network-prod) • Port: 3306 │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
🏢 1. Infrastructure Components
1.1 STRATO VPS Server (Host Compute Environment)
Specifications:
- RAM: 32 GB
- CPU: 8 Cores
- OS: Linux
- Role: Host compute runtime execution and local containerized database storage.
Deployed Services:
A. Ollama (Docker Container Service)
yaml
Installation: Docker container running in Traefik network
Port: 11434 (internal)
Models:
- gemma4-4b-checkin-fast:latest
- gemma4:e4b
- mistral:7b
- qwen3:8b
RAM Allocation: ~12-16 GB
Benefits:
- Seamless integration within the Traefik network stack.
- Easy model isolation and scaling.
- Resource usage bounded via Docker memory limits.
B. Docker Engine Stack
yaml
Docker Compose Orchestration:
- Development Application Container (Port 3001)
- Production Application Container (Port 3000)
- Three local MariaDB 10.11 Containers (Ports 3306, 3307, 3308)
- Ollama Engine Container (Port 11434 internal)
- Traefik Reverse Proxy (Ports 80/443)
RAM Allocation:
- Development: ~4 GB
- Production: ~8 GB
- Database Layer: ~4 GB
- System overhead: ~4 GB
1.2 Containerized Multi-DB Layer (MariaDB 10.11)
Role: Local Docker container-based database orchestration.
Architecture Benefits:
- ✅ Container Isolation: Independent MariaDB containers prevent cross-environment table conflicts.
- ✅ Granular Backups: Backups executed per database container via local shell tools on the VPS.
- ✅ Local Autonomy: Complete control over DB backups and configurations without third-party dependencies.
- ✅ Portability: Fast VPS migrations by restoring standardized SQL backup dumps.
- ✅ Safe Testing: Sandboxed development and staging databases eliminate live data loss risks.
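As a rough sketch of what a per-container backup could look like, the functions below write one timestamped SQL dump per MariaDB container. The container names come from this document; the backup directory, the function names, and the reliance on a `MARIADB_ROOT_PASSWORD` variable inside each container are assumptions, and the actual SERVER-MANAGER scripts may work differently.

```shell
#!/bin/sh
# Sketch: one SQL dump per MariaDB container (container names from this document).
# Assumptions: BACKUP_DIR is a hypothetical path; each container has its root
# password available as MARIADB_ROOT_PASSWORD (official mariadb image convention).

BACKUP_DIR="${BACKUP_DIR:-/srv/cfs-llm-platform/backups/db}"
DB_CONTAINERS="cfs-db-local cfs-database-staging cfs-database-prod"

# Build the target file name for a container and a timestamp.
dump_path() (
  container="$1"
  stamp="$2"
  printf '%s/%s-%s.sql.gz\n' "$BACKUP_DIR" "$container" "$stamp"
)

# Dump all databases of one container, then compress the result.
# Returns non-zero if the dump itself fails.
dump_container() (
  container="$1"
  target="$(dump_path "$container" "$(date +%Y%m%d_%H%M%S)")"
  raw="${target%.gz}"
  mkdir -p "$BACKUP_DIR" || exit 1
  docker exec "$container" sh -c \
    'mariadb-dump --all-databases --single-transaction -uroot -p"$MARIADB_ROOT_PASSWORD"' \
    > "$raw" && gzip -f "$raw"
)

# Usage (commented out so that loading this file has no side effects):
# for c in $DB_CONTAINERS; do dump_container "$c" || echo "dump failed: $c"; done
```

Writing the dump to a plain file first and compressing afterwards keeps the exit status of the dump visible; piping directly into gzip would hide a failed dump behind a successful compression.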
Databases:
Database 1: Local Dev DB
yaml
Name: cfs-db-local
Host: localhost (cfs-network-dev)
Port: 3307
User: cfs_dev
Purpose: Local development & testing
Backup: Triggered by SERVER-MANAGER backup scripts
Database 2: Production DB
yaml
Name: cfs-database-prod
Host: localhost (cfs-network-prod)
Port: 3306
User: cfs_prod
Purpose: Production live state
Backup: Daily automated VPS backup script
Database 3: Staging DB
yaml
Name: cfs-database-staging
Host: localhost (cfs-network-dev)
Port: 3308
User: cfs_staging
Purpose: Staging & QA testing
Backup: Triggered by SERVER-MANAGER backup scripts
🐳 2. Docker Architecture
2.1 Directory Structure
text
/srv/cfs-llm-platform/
├── docker-compose.dev.yml        # Development Environment
├── docker-compose.prod.yml       # Production Environment
├── .env.dev                      # Dev Environment Variables
├── .env.prod                     # Prod Environment Variables
├── volumes/
│   ├── dev/
│   │   └── openwebui-data/       # App data only (no DB storage)
│   └── prod/
│       └── openwebui-data/       # App data only (no DB storage)
├── scripts/
│   ├── deploy-dev.sh
│   ├── deploy-prod.sh
│   ├── health-check.sh
│   └── backup.sh
└── logs/
    ├── dev/
    └── prod/
2.2 Development Environment
docker-compose.dev.yml:
yaml
version: '3.8'
services:
  openwebui-dev:
    image: ghcr.io/open-webui/open-webui:dev
    container_name: cfs-openwebui-dev
    ports:
      - '3001:8080'
    environment:
      # Database Connection (Uses default internal SQLite database persisted in volume)
      # Ollama Connection (Docker Container)
      - OLLAMA_BASE_URL=http://ollama:11434
      # External LLM APIs
      - OPENAI_API_KEY=${OPENAI_API_KEY_DEV}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY_DEV}
      # Features
      - ENABLE_RAG=true
      - ENABLE_WEB_SEARCH=true
      - ENABLE_IMAGE_GENERATION=false
      # Token Tracking
      - ENABLE_TOKEN_TRACKING=true
      # Development Settings
      - ENV=development
      - DEBUG=true
      - LOG_LEVEL=debug
      # Security
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY_DEV}
    volumes:
      - ./volumes/dev/openwebui-data:/app/backend/data
      - ./logs/dev:/app/backend/logs
    restart: unless-stopped
    extra_hosts:
      - 'host.docker.internal:host-gateway'
    networks:
      - cfs-network-dev
      # Joined so that the hostname "ollama" resolves; the Ollama container is attached to this network (section 3.1)
      - traefik-network
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:8080/health']
      interval: 30s
      timeout: 10s
      retries: 3
networks:
  cfs-network-dev:
    driver: bridge
  # Assumption: traefik-network is created outside this Compose file
  traefik-network:
    external: true
Environment Variables (.env.dev):
bash
# .env.dev
# Local Database (Containerized)
DB_USER_DEV=cfs_dev
DB_PASSWORD_DEV=secure_dev_password_here
DB_HOST_DEV=cfs-db-local
DB_PORT_DEV=3306
DB_NAME_DEV=openwebui_dev
# External APIs
OPENAI_API_KEY_DEV=sk-dev-xxx
GOOGLE_API_KEY_DEV=AIza-dev-xxx
# Security
WEBUI_SECRET_KEY_DEV=generate_with_openssl_rand_hex_32
# Ollama
OLLAMA_HOST=http://ollama:11434
2.3 Production Environment
docker-compose.prod.yml:
yaml
version: '3.8'
services:
  openwebui-prod:
    image: ghcr.io/open-webui/open-webui:main
    container_name: cfs-openwebui-prod
    ports:
      - '3000:8080'
    environment:
      # Database Connection (Uses default internal SQLite database persisted in volume)
      # Ollama Connection (Docker Container)
      - OLLAMA_BASE_URL=http://ollama:11434
      # External LLM APIs
      - OPENAI_API_KEY=${OPENAI_API_KEY_PROD}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY_PROD}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY_PROD}
      # Features
      - ENABLE_RAG=true
      - ENABLE_WEB_SEARCH=true
      - ENABLE_IMAGE_GENERATION=true
      # Token Tracking & Billing
      - ENABLE_TOKEN_TRACKING=true
      - ENABLE_USAGE_LIMITS=true
      - ENABLE_BILLING=true
      # Production Settings
      - ENV=production
      - DEBUG=false
      - LOG_LEVEL=info
      # Security
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY_PROD}
      - ENABLE_OAUTH=true
      - OAUTH_PROVIDER=google
      # Performance
      - WORKERS=4
      - TIMEOUT=300
    volumes:
      - ./volumes/prod/openwebui-data:/app/backend/data
      - ./logs/prod:/app/backend/logs
    restart: always
    extra_hosts:
      - 'host.docker.internal:host-gateway'
    networks:
      - cfs-network-prod
      # Joined so that the hostname "ollama" resolves; the Ollama container is attached to this network (section 3.1)
      - traefik-network
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:8080/health']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: 'json-file'
      options:
        max-size: '10m'
        max-file: '3'
networks:
  cfs-network-prod:
    driver: bridge
  # Assumption: traefik-network is created outside this Compose file
  traefik-network:
    external: true
Environment Variables (.env.prod):
bash
# .env.prod
# Local Database (Containerized)
DB_USER_PROD=cfs_prod
DB_PASSWORD_PROD=secure_prod_password_here
DB_HOST_PROD=cfs-database-prod
DB_PORT_PROD=3306
DB_NAME_PROD=openwebui_prod
# External APIs
OPENAI_API_KEY_PROD=sk-prod-xxx
GOOGLE_API_KEY_PROD=AIza-prod-xxx
ANTHROPIC_API_KEY_PROD=sk-ant-prod-xxx
# Security
WEBUI_SECRET_KEY_PROD=generate_with_openssl_rand_hex_32
# Ollama
OLLAMA_HOST=http://ollama:11434
# OAuth (Optional)
OAUTH_CLIENT_ID=your_google_client_id
OAUTH_CLIENT_SECRET=your_google_client_secret
🤖 3. Ollama: Containerized Docker Deployment
3.1 Docker Compose Configuration
Compose Service Definition:
yaml
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  volumes:
    - /opt/cfs-infra/volumes/ollama:/root/.ollama
  environment:
    - OLLAMA_NUM_PARALLEL=3
    - OLLAMA_MAX_LOADED_MODELS=2
    - OLLAMA_KEEP_ALIVE=5m
  restart: unless-stopped
  networks:
    - traefik-network
3.2 Resource & Model Layout
RAM Budget allocation mapping for 32GB target:
yaml
Ollama Models Footprints:
gemma4-4b-checkin-fast: ~3 GB RAM
gemma4:e4b: ~4 GB RAM
mistral:7b: ~4-6 GB RAM
qwen3:8b: ~5-7 GB RAM
Active Strategy:
- Enforce max 2 models concurrently active.
- OLLAMA_MAX_LOADED_MODELS=2
- OLLAMA_NUM_PARALLEL=3
- Budgeted total Ollama RAM usage: ~12-16 GB
Remaining Memory Spaces:
- Open WebUI Development: ~4 GB
- Open WebUI Production: ~8 GB
- System overhead: ~4 GB
Architectural Rationale for Containerized Ollama:
- ✅ Network Security: Traefik isolates the Ollama API, keeping it internal and accessible only to Open WebUI inside the internal network.
- ✅ Unified Orchestration: Managed within the main docker-compose.yml lifecycle of the VPS.
- ✅ Persistence: Models stored on host volume (/opt/cfs-infra/volumes/ollama) for easy persistence and backup.
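The RAM budget from sections 1.1 and 3.2 can be sanity-checked with a few lines of integer arithmetic. The sketch below only adds up the figures stated in this document (Ollama 12-16 GB, development 4 GB, production 8 GB, database layer 4 GB, system overhead 4 GB); the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: add up the RAM budget stated in sections 1.1 and 3.2 (values in GB).
TOTAL_GB=32
DEV_GB=4
PROD_GB=8
DB_GB=4
SYSTEM_GB=4

# Print the total planned allocation for a given Ollama allowance.
planned_total() (
  ollama_gb="$1"
  echo $(( ollama_gb + DEV_GB + PROD_GB + DB_GB + SYSTEM_GB ))
)

# Print the headroom for a given Ollama allowance (negative means over-committed).
headroom() (
  echo $(( TOTAL_GB - $(planned_total "$1") ))
)

echo "Ollama at 12 GB: planned $(planned_total 12) GB, headroom $(headroom 12) GB"
echo "Ollama at 16 GB: planned $(planned_total 16) GB, headroom $(headroom 16) GB"
```

With these figures the plan fits exactly when Ollama stays at 12 GB and exceeds the host by 4 GB at 16 GB, which is a practical argument for enforcing a Docker memory limit near the lower end of the range.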
🎛️ 4. Open WebUI Features
4.1 LLM Connection Layer
Open WebUI acts as a centralized LLM provider management gateway:
yaml
# Admin Settings → Connections
Connection 1: Ollama Local
Type: Ollama
URL: http://ollama:11434
Status: ✅ Active
Models:
- gemma4-4b-checkin-fast:latest (Free tier)
- gemma4:e4b (Free tier)
- mistral:7b (Free tier)
- qwen3:8b (Free tier)
Cost: €0.00 / 1K tokens
Connection 2: OpenAI
Type: OpenAI
URL: https://api.openai.com/v1
API Key: sk-prod-xxx
Status: ✅ Active
Models:
- gpt-4-turbo (€0.01 / 1K tokens)
- gpt-3.5-turbo (€0.0015 / 1K tokens)
Connection 3: Google Gemini
Type: OpenAI-Compatible
URL: https://generativelanguage.googleapis.com/v1beta
API Key: AIza-xxx
Status: ✅ Active
Models:
- gemini-1.5-pro (€0.0035 / 1K tokens)
- gemini-pro (€0.0005 / 1K tokens)
Connection 4: Anthropic Claude
Type: OpenAI-Compatible
URL: https://api.anthropic.com/v1
API Key: sk-ant-xxx
Status: ✅ Active
Models:
- claude-3-opus (€0.015 / 1K tokens)
- claude-3-sonnet (€0.003 / 1K tokens)
4.2 Role-Based Access Control & User Permissions
Granular permission boundaries up to user levels:
yaml
# Tier-based Permissions Matrix
Tier: Trial (14-day evaluation)
Token Boundaries:
Daily: 10,000
Monthly: 100,000
Allowed Models:
- mistral:7b-instruct (Local only)
Features:
- Basic Chat: ✅
- RAG Document ingestion: ❌
- Web Search: ❌
- Image Generation: ❌
Cost: €0.00
Tier: Basic (€9.90 / Month)
Token Boundaries:
Daily: 50,000
Monthly: 1,000,000
Allowed Models:
- mistral:7b-instruct
- gemini-pro
Features:
- Basic Chat: ✅
- RAG Document ingestion: ✅ (Max 5 files)
- Web Search: ✅ (Max 100 queries / Month)
- Image Generation: ❌
Overage rate: €0.01 / 1K tokens
Tier: Premium (€49.90 / Month)
Token Boundaries:
Daily: 500,000
Monthly: 10,000,000
Allowed Models:
- mistral:7b-instruct
- gemini-1.5-pro
- gpt-4-turbo
- claude-3-opus
Features:
- Basic Chat: ✅
- RAG Document ingestion: ✅ (Unlimited)
- Web Search: ✅ (Unlimited)
- Image Generation: ✅ (Max 100 / Month)
- Priority support: ✅
Overage rate: €0.008 / 1K tokens
Tier: Enterprise (Custom SLA)
Token Boundaries: Unlimited
Allowed Models: Complete catalog including custom fine-tuned weights
Features: All + 99.9% uptime SLA
Cost: Bespoke pricing
4.3 Token Tracking & Analytics
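Token tracking ultimately feeds two numbers: a cost per request and an overage charge per billing period. The sketch below illustrates that arithmetic with the per-1K-token rates listed in sections 4.1 and 4.2. It works in integer micro-euros to avoid floating-point rounding, and all function names are hypothetical rather than part of Open WebUI.

```shell
#!/bin/sh
# Sketch: per-request cost and monthly overage in integer micro-euros
# (1 EUR = 1,000,000 micro-euros). Function names are hypothetical.

# cost_micro_eur TOKENS RATE_MICRO_EUR_PER_1K
# Example rate: EUR 0.01 per 1K tokens = 10000 micro-euros per 1K tokens.
cost_micro_eur() (
  tokens="$1"
  rate_per_1k="$2"
  echo $(( tokens * rate_per_1k / 1000 ))
)

# overage_tokens USED MONTHLY_LIMIT -> tokens above the limit (never negative)
overage_tokens() (
  used="$1"
  limit="$2"
  if [ "$used" -gt "$limit" ]; then
    echo $(( used - limit ))
  else
    echo 0
  fi
)

# format_eur MICRO_EUR -> decimal string with six fractional digits
format_eur() (
  micro="$1"
  printf '%d.%06d\n' $(( micro / 1000000 )) $(( micro % 1000000 ))
)

# Basic tier: 1,000,000 tokens included, overage EUR 0.01 per 1K tokens.
extra=$(overage_tokens 1250000 1000000)
format_eur "$(cost_micro_eur "$extra" 10000)"
```

In this example a Basic-tier user with 1,250,000 tokens in a month is 250,000 tokens over the limit, and the script prints an overage of 2.500000 euros. Six fractional digits were chosen to line up with a `DECIMAL(10,6)` cost column.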
Automated tracking at individual user resolution:
Database schema (hosted on MariaDB 10.11):
sql
-- Token Usage Logs Table
CREATE TABLE token_usage (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
user_id VARCHAR(36) NOT NULL,
session_id VARCHAR(255),
model VARCHAR(100) NOT NULL,
prompt_tokens INTEGER NOT NULL,
completion_tokens INTEGER NOT NULL,
total_tokens INTEGER NOT NULL,
cost DECIMAL(10,6) NOT NULL,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_user_date ON token_usage(user_id, timestamp);
CREATE INDEX idx_model ON token_usage(model);
CREATE INDEX idx_cost ON token_usage(cost);
-- User Subscription status Table
CREATE TABLE user_subscriptions (
user_id VARCHAR(36) PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
tier VARCHAR(50) NOT NULL,
status VARCHAR(50) NOT NULL,
started_at TIMESTAMP NOT NULL,
expires_at TIMESTAMP NULL,
monthly_token_limit INTEGER,
current_monthly_usage INTEGER DEFAULT 0,
last_reset_at TIMESTAMP NULL
);
CREATE INDEX idx_tier ON user_subscriptions(tier);
CREATE INDEX idx_status ON user_subscriptions(status);
-- Invoicing Records Table
CREATE TABLE invoices (
id UUID PRIMARY KEY,
user_id UUID NOT NULL,
period_start DATE NOT NULL,
period_end DATE NOT NULL,
base_charge DECIMAL(10,2) NOT NULL,
token_usage INTEGER NOT NULL,
overage_tokens INTEGER DEFAULT 0,
overage_charge DECIMAL(10,2) DEFAULT 0,
total_amount DECIMAL(10,2) NOT NULL,
status VARCHAR(50) NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
paid_at TIMESTAMP,
payment_method VARCHAR(50),
stripe_invoice_id VARCHAR(255)
);
CREATE INDEX idx_user ON invoices(user_id);
CREATE INDEX idx_status_invoices ON invoices(status);
CREATE INDEX idx_period ON invoices(period_start, period_end);
-- Daily Usage Aggregation View
-- MariaDB has no materialized views, so this is a regular view evaluated at query time.
CREATE VIEW daily_usage_summary AS
SELECT
  user_id,
  DATE(timestamp) AS usage_date,
  model,
  SUM(total_tokens) AS total_tokens,
  SUM(cost) AS total_cost,
  COUNT(*) AS request_count
FROM token_usage
GROUP BY user_id, DATE(timestamp), model;
-- Views cannot be indexed in MariaDB; idx_user_date on token_usage serves per-user, per-date lookups.
Billing cycle lifecycle:
text
Day 1-28: Live Token Auditing
├─ Every LLM inference call is captured.
├─ Count processed tokens (prompt vs. completion).
├─ Compute pricing costs based on provider rates.
├─ Record directly inside local MariaDB database.
└─ Expose real-time usage meters inside user dashboards.
Day 29: Invoice Verification Previews
├─ SSO engines compile monthly billing previews.
├─ Send automated system notification emails to user bases.
├─ Initiate a 24-hour review buffer window.
└─ Allow service tier modifications before final invoices lock.
Day 30: Invoice Generation
├─ Compile final re-calculated payment statements.
├─ Print static PDF invoices.
├─ Dispatch automated checkout link notifications.
└─ Queue processes inside Stripe integrations.
Day 31: Payment Settlement Transactions
├─ Execute automated card billing routines.
├─ Success: Reset month-to-date counters.
├─ Failure: Initiate 3-day grace access tiers.
└─ Dispatch export logs to bookkeeping frameworks.
Day 34: Overdue Notifications (Dunning)
├─ Send email payment reminders.
└─ Restrict permissions to local free model catalogs.
Day 37: Session Suspensions
├─ Lock premium endpoint APIs.
└─ Flag identities for administrative review.
🚀 5. Deployment Procedures
5.1 Initial Setup
Platform setup orchestration script:
bash
#!/bin/bash
# setup-cfs-platform.sh
echo "🚀 Starting CFS LLM Platform Installation"
echo "=========================================="
# 1. Update operating system packages
echo "📦 Updating OS libraries..."
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git docker.io docker-compose
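# 2. Verify the containerized Ollama service
# Assumes the "ollama" container from section 3.1 has already been started from the main Compose stack
echo "🤖 Checking Ollama container..."
docker exec ollama ollama --version || { echo "❌ Ollama container is not running"; exit 1; }
# 3. Cache base system LLM weights inside the container
echo "📥 Fetching system models..."
docker exec ollama ollama pull gemma4:e4b
docker exec ollama ollama pull mistral:7b
docker exec ollama ollama pull qwen3:8b
# gemma4-4b-checkin-fast:latest is assumed to be a custom model and is not pulled here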
# 4. Initialize storage directories
echo "📁 Initializing storage paths..."
sudo mkdir -p /srv/cfs-llm-platform/{volumes/{dev,prod},scripts,logs/{dev,prod}}
cd /srv/cfs-llm-platform
# 5. Populate Dev and Prod environments
echo "🔐 Constructing configuration keys..."
cat > .env.dev <<EOF
# Local Database (Containerized)
DB_USER_DEV=cfs_dev
DB_PASSWORD_DEV=$(openssl rand -base64 32)
DB_HOST_DEV=cfs-db-local
DB_PORT_DEV=3306
DB_NAME_DEV=openwebui_dev
# External APIs
OPENAI_API_KEY_DEV=sk-dev-xxx
GOOGLE_API_KEY_DEV=AIza-dev-xxx
# Cryptographic Keys
WEBUI_SECRET_KEY_DEV=$(openssl rand -hex 32)
EOF
cat > .env.prod <<EOF
# Local Database (Containerized)
DB_USER_PROD=cfs_prod
DB_PASSWORD_PROD=$(openssl rand -base64 32)
DB_HOST_PROD=cfs-database-prod
DB_PORT_PROD=3306
DB_NAME_PROD=openwebui_prod
# External APIs
OPENAI_API_KEY_PROD=sk-prod-xxx
GOOGLE_API_KEY_PROD=AIza-prod-xxx
ANTHROPIC_API_KEY_PROD=sk-ant-prod-xxx
# Cryptographic Keys
WEBUI_SECRET_KEY_PROD=$(openssl rand -hex 32)
EOF
# 6. Notify of database provisioning steps
echo "💾 Preparing Database Layer..."
echo "⚠️ ACTION REQUIRED: Manually configure MariaDB tables inside local docker instances:"
echo " - Create database: cfs-database-staging (Assigned user: cfs_staging)"
echo " - Create database: cfs-database-prod (Assigned user: cfs_prod)"
read -p "Press Enter once the databases are configured..."
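# 7. Start services
# --env-file is required because Compose only loads a file named ".env" by default
echo "🔧 Activating Development Environment..."
docker-compose --env-file .env.dev -f docker-compose.dev.yml up -d
echo "🚀 Activating Production Environment..."
docker-compose --env-file .env.prod -f docker-compose.prod.yml up -d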
# 8. Uptime checks
echo "🏥 Executing post-deployment health checks..."
sleep 10
curl -f http://localhost:3001/health && echo "✅ Development Server: OK" || echo "❌ Development Server: OFFLINE"
curl -f http://localhost:3000/health && echo "✅ Production Server: OK" || echo "❌ Production Server: OFFLINE"
# Port 11434 is not published on the host, so the Ollama check runs inside the container
docker exec ollama ollama list > /dev/null && echo "✅ Ollama Engine: OK" || echo "❌ Ollama Engine: OFFLINE"
echo ""
echo "✅ Initialization Completed Successfully!"
echo "=========================================="
echo "📍 Development URL: http://dev.llm.cfs-platform.de:3001"
echo "📍 Production URL: https://llm.cfs-platform.de"
echo "📍 Ollama API (internal only): http://ollama:11434"
echo ""
echo "⚠️ Pending Actions:"
echo "1. Input active API credentials in dev and prod environment files"
echo "2. Configure Nginx Reverse Proxy parameters"
echo "3. Issue SSL/TLS Certificates via Let's Encrypt"
echo "4. Activate strict host firewall policies"
5.2 Deployment Scripts
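The scripts in sections 5.1 and 5.2 sleep for a fixed interval before probing the health endpoint, which can report a failure while a container is still starting. One alternative is to poll with a bounded number of retries. The helper below is a sketch and not part of the original scripts; `wait_for` and its parameters are hypothetical names.

```shell
#!/bin/sh
# Sketch: retry a probe command until it succeeds or the attempts run out.
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() (
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      exit 0            # probe succeeded
    fi
    i=$(( i + 1 ))
    # Sleep only if another attempt follows
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  exit 1                # all attempts failed
)

# Usage in a deployment script (commented out; requires the running container):
# wait_for 12 5 curl -sf http://localhost:3001/health \
#   && echo "Development is healthy" || echo "Development did not become healthy"
```

Because the function body runs in a subshell, `exit` only ends the function call and its variables do not leak into the calling script.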
Development environment deployment:
bash
#!/bin/bash
# scripts/deploy-dev.sh
cd /srv/cfs-llm-platform
echo "🔧 Deploying Development Environment..."
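# Update local docker cache
# --env-file is required because Compose only loads a file named ".env" by default
docker-compose --env-file .env.dev -f docker-compose.dev.yml pull
# Re-orchestrate running dev services
docker-compose --env-file .env.dev -f docker-compose.dev.yml up -d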
# Verify server availability
sleep 5
curl -f http://localhost:3001/health && echo "✅ Development successfully updated" || echo "❌ Dev deployment failed"
# Display container logs
docker-compose -f docker-compose.dev.yml logs -f --tail=50
Production environment deployment:
bash
#!/bin/bash
# scripts/deploy-prod.sh
cd /srv/cfs-llm-platform
echo "🚀 Deploying Production Environment..."
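# Snapshot the application data volume before updating
echo "💾 Creating volumes snapshot backup..."
# -T disables TTY allocation so the command also works from cron or CI;
# the exclude keeps earlier snapshots out of the new archive
docker-compose --env-file .env.prod -f docker-compose.prod.yml exec -T openwebui-prod \
tar --exclude='backup-*.tar.gz' -czf /app/backend/data/backup-$(date +%Y%m%d-%H%M%S).tar.gz /app/backend/data
# Pull updated images from registries
docker-compose --env-file .env.prod -f docker-compose.prod.yml pull
# Recreate only the application container (expect a brief interruption while it restarts)
docker-compose --env-file .env.prod -f docker-compose.prod.yml up -d --no-deps openwebui-prod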
# Verify container endpoint
sleep 10
curl -f http://localhost:3000/health && echo "✅ Production successfully updated" || echo "❌ Production deployment failed"
# Display container logs
docker-compose -f docker-compose.prod.yml logs -f --tail=50 openwebui-prod
🔐 6. Security & Networking
6.1 Host Firewall Policies
bash
#!/bin/bash
# scripts/setup-firewall.sh
# Initialize UFW with a default-deny inbound policy
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Restrict SSH entry to authorized developer IPs
sudo ufw allow from YOUR_TRUSTED_DEVELOPER_IP to any port 22
# Expose standard web traffic endpoints
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Restrict development endpoints to internal networks / VPN connections
sudo ufw allow from 10.0.0.0/8 to any port 3001
# Deny direct public connections targeting background microservices
sudo ufw deny 11434/tcp
# Apply firewall policies
sudo ufw enable
sudo ufw status verbose
6.2 Nginx Reverse Proxy Configuration
/etc/nginx/sites-available/cfs-llm-platform:
nginx
# Production Environment
server {
listen 443 ssl http2;
server_name llm.cfs-platform.de;
# SSL Certificates Configuration
ssl_certificate /etc/letsencrypt/live/llm.cfs-platform.de/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/llm.cfs-platform.de/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;
# Security Headers
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
# Proxy traffic to Open WebUI Production container
location / {
proxy_pass http://localhost:3000;
proxy_http_version 1.1;
# WebSockets support
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
# Request Headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Connection Timeouts
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
proxy_read_timeout 300s;
# Disable buffering to stream LLM responses instantly
proxy_buffering off;
proxy_cache_bypass $http_upgrade;
}
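# API Request Rate Limiting
# The zone must be declared once in the http context (for example in /etc/nginx/nginx.conf), not inside a server block:
#   limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
limit_req zone=api_limit burst=20 nodelay;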
# Log file configurations
access_log /var/log/nginx/cfs-llm-prod-access.log;
error_log /var/log/nginx/cfs-llm-prod-error.log;
}
# Development Environment (Basic Auth Protected)
server {
listen 443 ssl http2;
server_name dev.llm.cfs-platform.de;
ssl_certificate /etc/letsencrypt/live/dev.llm.cfs-platform.de/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/dev.llm.cfs-platform.de/privkey.pem;
# Enforce basic HTTP auth gate
auth_basic "Development Environment Access Gate";
auth_basic_user_file /etc/nginx/.htpasswd;
location / {
proxy_pass http://localhost:3001;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
}
access_log /var/log/nginx/cfs-llm-dev-access.log;
error_log /var/log/nginx/cfs-llm-dev-error.log;
}
# Insecure HTTP Redirect Rule
server {
listen 80;
server_name llm.cfs-platform.de dev.llm.cfs-platform.de;
# $host preserves the requested hostname; $server_name would always redirect to the first name listed
return 301 https://$host$request_uri;
}
Generate Developer Access Credentials:
bash
# Provision htpasswd file containing administrator password hash
sudo htpasswd -c /etc/nginx/.htpasswd cfs_admin
📊 7. Platform Monitoring & Maintenance
7.1 Health Auditing Agent
bash
#!/bin/bash
# scripts/health-check.sh
LOG_FILE="/var/log/cfs-llm-health.log"
ALERT_EMAIL="admin@cfs-platform.de"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a $LOG_FILE
}
send_alert() {
echo "$1" | mail -s "⚠️ CFS LLM Platform Alert" $ALERT_EMAIL
}
# Verify the Ollama container (its API is not published on the host, so probe from inside)
if ! docker exec ollama ollama list > /dev/null 2>&1; then
log "❌ Ollama container is OFFLINE - Restarting container..."
docker restart ollama
send_alert "Ollama container failed validation; container restarted."
else
log "✅ Ollama service: ONLINE"
fi
# Verify Development Container
if ! curl -sf http://localhost:3001/health > /dev/null; then
log "❌ Development application server is OFFLINE - Restarting container..."
cd /srv/cfs-llm-platform
docker-compose -f docker-compose.dev.yml restart openwebui-dev
send_alert "Development container failed health checks; container restarted."
else
log "✅ Development application server: ONLINE"
fi
# Verify Production Container
if ! curl -sf http://localhost:3000/health > /dev/null; then
log "❌ Production application server is OFFLINE - Restarting container..."
cd /srv/cfs-llm-platform
docker-compose -f docker-compose.prod.yml restart openwebui-prod
send_alert "Production container failed health checks; container restarted."
else
log "✅ Production application server: ONLINE"
fi
# Verify local MariaDB production container connectivity
if ! timeout 5 bash -c "cat < /dev/null > /dev/tcp/127.0.0.1/3306"; then
log "⚠️ Production MariaDB container (cfs-database-prod) is UNREACHABLE."
send_alert "Database connection failure detected for cfs-database-prod on port 3306."
else
log "✅ Database connection: ONLINE"
fi
# Monitor host physical storage parameters
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ $DISK_USAGE -gt 80 ]; then
log "⚠️ Host disk space footprint is critical: ${DISK_USAGE}%"
send_alert "Host storage volume running low: ${DISK_USAGE}% allocated."
fi
# Monitor host memory footprint
MEM_USAGE=$(free | grep Mem | awk '{print int($3/$2 * 100)}')
if [ $MEM_USAGE -gt 90 ]; then
log "⚠️ Host memory saturation is high: ${MEM_USAGE}%"
send_alert "Host memory exhaustion warning: ${MEM_USAGE}% saturated."
fi
log "✅ Health check auditing cycle completed."
Cron definitions:
bash
# /etc/cron.d/cfs-llm-health
# Execute platform audits at 5-minute intervals
*/5 * * * * root /srv/cfs-llm-platform/scripts/health-check.sh
# Delete application log files older than 7 days (runs daily at midnight)
0 0 * * * root find /srv/cfs-llm-platform/logs -name "*.log" -mtime +7 -delete
7.2 Backup & Retention Schedules
NOTE
Database tables reside in the local MariaDB containers on the VPS and are backed up per container (see section 1.2). The script below therefore covers application volumes, configuration files, and Ollama model weights only.
bash
#!/bin/bash
# scripts/backup.sh
BACKUP_DIR="/srv/cfs-llm-platform/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p $BACKUP_DIR/{volumes,configs}
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}
# 1. Compress Dev and Prod application data volumes (excluding DB)
log "📦 Compressing dev and prod storage volumes..."
tar -czf $BACKUP_DIR/volumes/dev-volumes-$DATE.tar.gz \
/srv/cfs-llm-platform/volumes/dev/
tar -czf $BACKUP_DIR/volumes/prod-volumes-$DATE.tar.gz \
/srv/cfs-llm-platform/volumes/prod/
# 2. Backup system configurations
log "⚙️ Compressing platform properties files..."
tar -czf $BACKUP_DIR/configs/configs-$DATE.tar.gz \
/srv/cfs-llm-platform/*.yml \
/srv/cfs-llm-platform/.env.* \
/etc/nginx/sites-available/cfs-llm-platform
# 3. Compress cached Ollama weights (host volume mounted into the Ollama container)
log "🤖 Compressing Ollama model weights..."
tar -czf $BACKUP_DIR/ollama-models-$DATE.tar.gz \
/opt/cfs-infra/volumes/ollama/
# 4. Prune local backup files older than 7 days
log "🗑️ Pruning stale local snapshots..."
find $BACKUP_DIR -name "*.tar.gz" -mtime +7 -delete
log "✅ Backup sequence finished successfully: $DATE"
Configure cron job execution:
bash
# Execute local storage backups daily at 02:00 AM
0 2 * * * /srv/cfs-llm-platform/scripts/backup.sh
📈 8. Scaling Roadmap
Phase 1: Bootstrap Uptime (Current Status)
yaml
Status: Deployed & Active
Host Infrastructure:
- STRATO VPS compute (32GB RAM, 8 CPU Cores)
- Local MariaDB 10.11 containers (dev, staging, prod)
- Containerized Ollama inference service
- Containerized Open WebUI Dev and Prod instances
Key Features:
- Multi-LLM provider abstraction layer.
- Role-based permissions matrix.
- Real-time token usage logs.
- Basic monthly billing invoices generator.
Constraints:
- Single host point of failure.
- Shared local inference service resource locks.
- Memory ceilings (32GB RAM total).
Phase 2: High Availability Scaling (3-6 Months Pipeline)
yaml
Status: Planning Stage
Target Improvements:
- High availability load balancer layers.
- Redis caches supporting distributed sessions.
- Dedicated isolated LLM inference engines.
- Advanced Grafana performance monitoring dashboards.
- Automated offsite backup archives syncing.
Target Infrastructure:
- Upgrade VPS compute capacity (64GB RAM).
- Database layer remains in local MariaDB containers.
- Containerized Redis daemon.
Phase 3: Cloud Native Cluster (6-12 Months Pipeline)
yaml
Status: Vision Stage
Target Improvements:
- Kubernetes cluster auto-scaling orchestration.
- Geo-replicated deployments.
- Auto-scaling clusters of GPU nodes.
- Microservices decoupling.
✅ 9. Architecture Decision Records Summary
| Platform Layer | Chosen Technology | Hosting Site | ADR Rationale |
|---|---|---|---|
| Host Compute | STRATO VPS | Cloud (32GB RAM, 8 Cores) | Highly cost-efficient computing layer. |
| LLM Inference | Ollama | Docker container (traefik-network) | Keeps the API internal and ties the service to the Compose lifecycle. |
| Development App | Docker Container | Host OS (Port 3001) | Provides isolated testing sandboxes. |
| Production App | Docker Container | Host OS (Port 3000) | Enables smooth deployments and updates. |
| Dev Database | MariaDB 10.11 | VPS Host (cfs-db-local) | Safely isolates dev trials from live production databases. |
| Prod Database | MariaDB 10.11 | VPS Host (cfs-database-prod) | Isolates live user and billing data in its own container with daily automated backups. |
| Uptime Monitor | Cron Auditing Agent | Host OS (Internal cron) | Provides automated self-healing scripts on failure. |
| User Access | Open WebUI RBAC | Container Engine | Provides user directory management tools out-of-the-box. |
🎯 10. Core Architectural Strengths
✅ Environment Isolation and Recoverability
- Development, staging, and production each run in their own MariaDB container, so sandbox changes cannot touch production data.
- Per-container SQL dumps allow a database to be restored on a new host after a VPS failure, provided the dumps are also copied off the server.
✅ Bounded Compute Usage
- Ollama runs in a container that can be bounded by Docker memory limits and loads at most two models at once, keeping inference within its 12-16 GB share of the 32 GB host.
- The Ollama API stays on the internal Docker network and is not exposed publicly.
✅ Infrastructure Portability
- The whole stack is described by Docker Compose files and host volumes.
- Moving to another VPS amounts to copying the volumes, restoring the SQL dumps, and starting the Compose stacks.
📝 11. VPS Host RAM Allocation Mapping (32GB RAM Capacity)
yaml
Total VPS Capacity: 32 GB RAM
==============================
Ollama Container (Inference):
Allocated Limit: 12-16 GB
Resource footprint detail:
- At most 2 models loaded concurrently (OLLAMA_MAX_LOADED_MODELS=2).
- gemma4-4b-checkin-fast footprint: ~3 GB
- gemma4:e4b footprint: ~4 GB
- mistral:7b footprint: ~4-6 GB
- qwen3:8b footprint: ~5-7 GB
Open WebUI Production Container:
Allocated Limit: 8 GB
Resource footprint detail:
- Direct user interactions.
- Token tracking logging.
- Billing systems tasks.
Open WebUI Development Container:
Allocated Limit: 4 GB
Resource footprint detail:
- Sandboxed developer workspaces.
- Debug logging engines.
System Daemons & OS Overhead:
Allocated Limit: 4 GB
Resource footprint detail:
- Linux kernel processes.
- Docker engine runtime.
- Nginx reverse proxies.
Reserved Safety Buffer:
Allocated Limit: 4 GB
Resource footprint detail:
- Absorbs transient utilization peaks.
🚀 12. Quick Start Deployment Guide
Step 1: Connect to VPS Host
bash
# Connect to the STRATO VPS
ssh root@your-strato-server.com
# Synchronize system packages lists
sudo apt update && sudo apt upgrade -y
Step 2: Initialize Platform Deploys
bash
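# Download the setup script, then run it; piping it straight into bash would break its interactive prompt
curl -fsSL https://raw.githubusercontent.com/your-repo/setup-cfs-platform.sh -o setup-cfs-platform.sh
bash setup-cfs-platform.sh
Step 3: Configure Local MariaDB Instances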
text
1. Connect to local MariaDB containers on the VPS.
2. Create staging database named 'cfs-database-staging' (Assign user: cfs_staging).
3. Create production database named 'cfs-database-prod' (Assign user: cfs_prod).
4. Verify connections from the application stack.
5. Save generated credentials securely in environment variables.
Step 4: Populate Environment Files
bash
# Edit credentials inside properties files
nano /srv/cfs-llm-platform/.env.dev
nano /srv/cfs-llm-platform/.env.prod
Step 5: Orchestrate Services
bash
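cd /srv/cfs-llm-platform
# Activate Dev environments (--env-file is needed because Compose only loads ".env" by default)
docker-compose --env-file .env.dev -f docker-compose.dev.yml up -d
# Activate Prod environments
docker-compose --env-file .env.prod -f docker-compose.prod.yml up -d
Step 6: Verify Endpoints Availability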
bash
# Query health check endpoints
curl http://localhost:3001/health # Dev App
curl http://localhost:3000/health # Prod App
docker exec ollama ollama list # Ollama Models (the API is internal to the Docker network)
📞 Support & Contacts
Documentation Site: https://docs.cfs-platform.de
Internal Support Helpdesk: support@cfs-platform.de
📄 License & Credits
© 2026 CFS Platform. All rights reserved.
Last Updated: January 2026
Version: 1.0
Author: Christian Friedrich Schacht