LiteLLM¶
LiteLLM provides a unified API gateway for multiple LLM providers, allowing you to use a single API to access OpenAI, Anthropic, local models, and more.
Features¶
- Unified OpenAI-compatible API
- Support for 100+ LLM providers
- Request caching with Redis
- Usage tracking and analytics
- Cost management
- Load balancing across models
Architecture¶
| Component | Purpose |
|---|---|
| LiteLLM Proxy | API gateway (4 workers) |
| PostgreSQL | Configuration and metrics |
| Redis | Response caching |
Prerequisites¶
- Foundation complete
- API keys for desired providers
- DNS record configured
System Requirements¶
| Users | vCPU | RAM | Storage |
|---|---|---|---|
| 100 | 4 | 4 GB | 20 GB |
| 250 | 4 | 8 GB | 30 GB |
| 500+ | 8 | 16 GB | 50 GB |
Installation¶
Deploy VM¶
DNS Configuration¶
Add DNS record:
| Record | Type | Value |
|---|---|---|
| llm.yourdomain.com | A | Your public IP |
Reverse Proxy¶
Configure Caddy:
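A minimal site block, assuming LiteLLM listens on its default port 4000 and Caddy handles TLS for the DNS record added above (paths and upstream are assumptions; adjust to your deployment):

```shell
# Write a site block for LiteLLM (upstream port 4000 is LiteLLM's default).
cat > litellm.caddy <<'EOF'
llm.yourdomain.com {
    reverse_proxy localhost:4000
}
EOF
# Then append or import this into /etc/caddy/Caddyfile and reload:
#   sudo systemctl reload caddy
```

Caddy provisions and renews the TLS certificate automatically once the DNS record resolves to this host.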
Configuration¶
Add API Keys¶
Configure provider API keys via the admin interface or configuration file:
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: sk-...
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: sk-ant-...
  - model_name: local-llama
    litellm_params:
      model: ollama/llama2
      api_base: http://ollama.mgmt.internal:11434
```
Master Key¶
A master API key is generated during installation. Retrieve it:
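One way to retrieve it, assuming the key is stored as LITELLM_MASTER_KEY in the service environment file or as general_settings.master_key in the proxy config (both paths here are hypothetical; adjust to your install):

```shell
# Look for the master key in the usual locations (paths are assumptions).
grep -h -i 'master_key' /etc/litellm/litellm.env /etc/litellm/config.yaml 2>/dev/null
```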
Use this key for admin access and to create user keys.
Supported Providers¶
| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-3.5, GPT-4 | Most popular |
| Anthropic | Claude 3 family | Strong reasoning |
| OpenRouter | Multiple | Aggregator |
| Perplexity | pplx-* | Search-enhanced |
| Ollama | Local models | Self-hosted |
Usage¶
API Access¶
Use the LiteLLM endpoint like OpenAI:
```python
import openai

client = openai.OpenAI(
    base_url="https://llm.yourdomain.com/v1",
    api_key="your-litellm-key"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
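The gateway speaks the rest of the OpenAI-compatible API as well; listing /v1/models is a quick smoke test that the aliases from model_list are live (URL and key below are placeholders):

```shell
# List the model aliases the proxy currently exposes.
curl -s https://llm.yourdomain.com/v1/models \
  -H "Authorization: Bearer your-litellm-key"
```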
Caching¶
Redis caching reduces costs and latency:
- Identical requests return cached responses
- 60-80% reduction in API calls for common queries
- Configurable cache TTL
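Caching is switched on in the proxy configuration; a minimal sketch following LiteLLM's cache settings, assuming Redis is reachable on the management network (host and TTL are illustrative):

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: redis.mgmt.internal
    port: 6379
    ttl: 600   # cache lifetime in seconds
```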
Monitoring¶
View Usage¶
Access the admin dashboard at https://llm.yourdomain.com/ui (LiteLLM serves its UI under the /ui path).
Metrics¶
```shell
# Check service status
systemctl status litellm

# View logs
journalctl -u litellm -f

# Check container stats
podman stats litellm
```
Cost Management¶
Set Budgets¶
Configure spending limits per user or team:
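Budgets can be attached when issuing keys through the proxy's key-management API; a sketch using the master key (endpoint and field names follow LiteLLM's docs — max_budget in USD, budget_duration as a period string; all values here are illustrative):

```shell
# Issue a key capped at $25 per 30 days for a hypothetical "research" team.
curl -s https://llm.yourdomain.com/key/generate \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"max_budget": 25.0, "budget_duration": "30d", "team_id": "research"}'
```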
Track Spending¶
View cost breakdown in the admin dashboard or via API:
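A sketch of the API route, per LiteLLM's spend-tracking endpoints (URL and key are placeholders):

```shell
# Fetch per-request spend records from the proxy.
curl -s https://llm.yourdomain.com/spend/logs \
  -H "Authorization: Bearer $MASTER_KEY"
```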
Backup¶
Daily automated backups include:
- PostgreSQL database
- Redis snapshots
- Configuration files
Retention: 30 days
Restore¶
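A restore sketch — backup paths and database names below are assumptions for this deployment; substitute your actual backup locations:

```shell
# Stop the proxy before restoring state.
sudo systemctl stop litellm

# Restore the PostgreSQL database from the daily dump (path is hypothetical).
sudo -u postgres psql litellm < /var/backups/litellm/postgres/litellm.sql

# Restore configuration files (path is hypothetical).
sudo cp -r /var/backups/litellm/config/. /etc/litellm/

sudo systemctl start litellm
```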
Troubleshooting¶
Provider Errors¶
- Verify API key is valid
- Check provider status page
- Review rate limits
Slow Responses¶
- Check Redis cache hit rate
- Verify network connectivity
- Consider adding more workers
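The cache hit rate comes from Redis's keyspace_hits and keyspace_misses counters in `redis-cli INFO stats`; a small helper to turn those counters into a percentage (the Redis host below is hypothetical):

```shell
# Compute hit rate (%) from keyspace_hits and keyspace_misses.
hit_rate() { awk -v h="$1" -v m="$2" 'BEGIN { printf "%.1f\n", 100*h/(h+m) }'; }

# On the server, pull the live counters, e.g.:
#   redis-cli -h redis.mgmt.internal info stats | grep keyspace
hit_rate 8000 2000
```

A rate well below the 60-80% range cited above for common queries suggests the cache TTL is too short or requests differ in ways that defeat the cache key.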
Database Issues¶
```shell
# Check PostgreSQL
systemctl status postgresql

# Check connections
psql -U litellm -c "SELECT count(*) FROM pg_stat_activity;"
```