Skip to content

Open WebUI Setup - Project Documentation

Date: January 20, 2026
Project: Open WebUI Installation & Configuration
Goal: Local LLM environment with cloud integration


1. Accomplished Steps

1.1 Ollama Installation (Local)

  • ✅ Ollama installed on the local system.
  • ✅ Successfully established connection to Open WebUI (http://ollama:11434).
  • ✅ Local models are fully operational and accessible.

1.2 Configured Local Models

Model NameSizeTarget Use CaseStatus
gemma4-4b-checkin-fast:latest4BRapid requests, fast checkin✅ Active
gemma4:e4b4BGeneral reasoning tasks✅ Active
mistral:7b7BCoding and technical assistance✅ Active
qwen3:8b8BGeneral reasoning and analysis✅ Active

1.3 Google Gemini Integration

  • ✅ Bound Google AI Studio API Key.
  • ✅ Configured OpenAI-compatible API connector endpoint.
  • ✅ All core Gemini models are fully accessible.

API Connector Parameters:

yaml
Base URL: https://generativelanguage.googleapis.com/v1beta/openai/
API Key: AIzaSyC... [REDACTED FOR SECURITY]
Provider Type: OpenAI compatible

Accessible Gemini Models:

  • gemini-2.0-flash-exp (Recommended for fast responses)
  • gemini-1.5-pro (Deep logical analysis)
  • gemini-1.5-flash (Balanced performance/speed)

2. Key Learnings & Best Practices

2.1 Model Selection Strategy

Local Hosting vs. Cloud Hosting:

  • Local (Ollama): High privacy (zero third-party data transmission), zero running API costs, offline operational capacity.
  • Cloud (Gemini): Maximum capabilities, search grounding, standard API volume costs.

Operational Recommendations:

  • Simple Requests ---> Local Models (gemma4-4b-checkin-fast:latest, mistral:7b)
  • Multi-step / Hard reasoning ---> gemini-1.5-pro or qwen3:8b
  • Rapid Cloud queries ---> gemini-2.0-flash-exp

2.2 System Prompts Tuning

System prompts can be set on a per-model basis: Modelling Dashboard ---> Select Model ---> Edit ---> System Prompt

System Prompt Example for gemma4-4b-checkin-fast:latest (Web Search Optimization):

text
You rely EXCLUSIVELY on provided web search results to answer the query.
Synthesize the facts concisely and cite facts exactly.

System Prompt Example for Gemini 2.0 Flash:

text
You are a highly precise and structured assistant.
Respond concisely, utilizing markdown for optimal readability.

2.3 Connections Routing

Settings ---> Connections:

  • OpenAI API: For external cloud engines (Gemini, OpenAI).
  • Ollama API: For local containerized models.
  • Direct Connectors: For dedicated custom model API endpoints.

2.4 Performance Tuning

  • Enable base models list caching: Speeds up model listings and GUI loading.
  • Model Size Rules of Thumb:
    • 3B-7B: Fast responses, lower conceptual reasoning.
    • 8B-14B: Excellent balance of speed and logic.
    • 70B+: Extremely accurate, high computing demands.

3. Current Project State

3.1 Active Infrastructure Components

  • Open WebUI: Modern web interface.
  • Ollama: Local LLM compute host.
  • Google Gemini API: Cloud-grounded reasoning.

3.2 Available LLMs Matrix

Local Catalog (Ollama):

  • gemma4-4b-checkin-fast:latest
  • gemma4:e4b
  • mistral:7b
  • qwen3:8b

Cloud Catalog (Google Gemini):

  • gemini-2.0-flash-exp
  • gemini-1.5-pro
  • gemini-1.5-flash

3.2 Open Backlog

  • ⏸️ Fine-tune system prompts on all models.
  • ⏸️ Evaluate qwen3:8b performance balance.
  • ⏸️ Integrate workflow automation with n8n.
  • ⏸️ Evaluate additional API providers (Anthropic Claude, OpenAI).

4. Technical CLI & API Reference

4.1 Connection Endpoints

Ollama Local Engine:

text
http://ollama:11434

Google Gemini Gateway:

text
https://generativelanguage.googleapis.com/v1beta/openai/

4.2 Essential Commands

Ollama Model Administration:

bash
# List all active models downloaded on host
ollama list

# Pull a new model from Ollama registry
ollama pull qwen3:8b

# Remove a model
ollama rm mistral:7b

# Test a model directly inside host terminal
ollama run gemma4-4b-checkin-fast:latest

4.3 Troubleshooting Guide

Error: Ollama API is Unreachable
Remediation steps:

  • Check Ollama docker service state.
  • Ensure the connection URL matches: http://ollama:11434.
  • Check local firewall rules (UFW ports).

Error: Gemini Cloud API Connection Failure
Remediation steps:

  • Re-verify the API key validity in Google AI Studio.
  • Ensure the gateway URL ends with a trailing slash (/).
  • Verify the provider type parameter is set to "OpenAI".

5. Next Steps Roadmap

  1. Optimize system prompts across all models.
  2. Conduct task alignment testing (determine optimal model assignments).
  3. Monitor cloud API costs dynamically.
  4. Evaluate advanced models (Anthropic Claude, GPT-4).
  5. Secure backup routines for Open WebUI user database volumes.

6. Knowledge Base Prompt

For future system reference:

text
Open WebUI Setup - State: Jan 20, 2026

Active components:
- Ollama (local): gemma4-4b-checkin-fast:latest, gemma4:e4b, mistral:7b, qwen3:8b
- Google Gemini API: gemini-2.0-flash-exp, gemini-1.5-pro, gemini-1.5-flash

Endpoints & APIs:
- Ollama Connection: http://ollama:11434
- Gemini Connection: https://generativelanguage.googleapis.com/v1beta/openai/
- API Key: Registered in Open WebUI connections panel

Model Routing Guidelines:
- Standard / Web queries: gemma4-4b-checkin-fast:latest (local)
- Programming / Debugging: mistral:7b (local)
- Hard reasoning / Deep analysis: qwen3:8b (local) or gemini-1.5-pro (cloud)
- Fast cloud tasks: gemini-2.0-flash-exp

Documentation compiled on: 20.01.2026
Version: 1.0
Author: Christian Friedrich Schacht

Released under proprietary license.