Early Bird $20
v1.17.0 Live

Persistent Memory for Your LLM Stack.

UPtrim sits between your chat clients and LLM backend to keep conversations coherent at scale — graduated trimming, strict isolation, identity-aware memory, and file retrieval.
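The flow is simple to picture: the proxy enriches each request with stored memory before the backend ever sees it. A minimal sketch of that idea (function and field names here are illustrative, not UPtrim's actual internals):

```python
# Illustrative sketch of the request flow described above: inject stored
# memories, then forward the enriched context to the LLM backend.
# Names (inject_memories, handle_request) are hypothetical, not UPtrim's API.

def inject_memories(messages, memories):
    """Prepend relevant user memories as a system message."""
    if not memories:
        return messages
    note = "Known facts about this user:\n" + "\n".join(f"- {m}" for m in memories)
    return [{"role": "system", "content": note}] + messages

def handle_request(messages, memories, forward):
    """Enrich the context, then hand it to the backend via `forward`."""
    return forward(inject_memories(messages, memories))

# Stub backend that just reports how many messages it received.
reply = handle_request(
    [{"role": "user", "content": "Hi"}],
    ["prefers concise answers"],
    forward=lambda msgs: f"{len(msgs)} messages received",
)
```

Because the enrichment happens inside the proxy, clients keep speaking the plain OpenAI-compatible protocol on both sides.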

One-time payment · OpenAI-compatible · Self-hosted
[Diagram: Client → UPtrim (proxy_core.py) → LLM, showing memory injection, context trimming, identity resolution, and file retrieval]
Works with your stack: Open WebUI · SillyTavern · FastAPI · SQLite (WAL, FTS5 search) · spaCy NLP

Core Pillars

Six capabilities that make your LLM stack production-ready for real multi-user deployments.

✂️

Context Trimming Engine

Pressure-aware history management with soft/hard token zones and adaptive pruning. Long chats stay coherent without hitting context limits.
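The soft/hard zone idea can be sketched in a few lines. The thresholds, the rough 4-characters-per-token estimate, and the function names below are illustrative assumptions, not UPtrim's actual algorithm:

```python
# Hedged sketch of pressure-aware trimming with soft/hard token zones.
# Real tokenization and zone behavior in UPtrim will differ.

def estimate_tokens(msg):
    # crude heuristic: ~4 characters per token
    return max(1, len(msg["content"]) // 4)

def trim_history(messages, soft_limit=6000, hard_limit=7500):
    """Below soft_limit: keep everything. In the soft zone: shed only the
    single oldest turn (graduated pressure). At/above hard_limit: prune
    until comfortably back under the soft limit. System prompts survive."""
    msgs = list(messages)

    def total():
        return sum(estimate_tokens(m) for m in msgs)

    def drop_oldest():
        for i, m in enumerate(msgs):
            if m["role"] != "system":
                del msgs[i]
                return True
        return False

    if total() >= hard_limit:
        while total() > soft_limit and drop_oldest():
            pass
    elif total() >= soft_limit:
        drop_oldest()
    return msgs
```

The graduated middle zone is the point: pruning a little early and often avoids the cliff where one long reply blows past the model's context window.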

🔒

Multi-User Isolation

Strict, required, and quarantine identity modes enforce per-user memory boundaries in shared deployments. Zero bleed.
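One plausible reading of the three modes, as a sketch (the mode names come from UPtrim; the gating logic below is an illustrative guess, not its actual implementation):

```python
# Hypothetical sketch of how strict / required / quarantine identity modes
# might gate memory access. Not UPtrim's real logic.

def resolve_memory_scope(mode, user_id):
    """Return a per-user memory namespace, or None to disable memory."""
    if user_id is not None:
        return f"user:{user_id}"            # every mode isolates known users
    if mode == "strict":
        raise PermissionError("identity required")  # reject anonymous requests
    if mode == "required":
        return None                          # pass through, memory disabled
    if mode == "quarantine":
        return "quarantine:anonymous"        # sandboxed scope, never mixed in
    raise ValueError(f"unknown mode: {mode}")
```

Whatever the exact semantics, the invariant is the same: a request can only ever read and write its own namespace, which is what "zero bleed" means in practice.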

🧠

Memory Intelligence

Category-aware extraction, dedup, contradiction resolution, audit trails, TTL rules, and intent-aware relevance scoring.

📄

File-Aware Retrieval

Upload documents, inject excerpts into context, and optionally use embeddings for deeper semantic search across your knowledge base.
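At its simplest, excerpt injection is a ranking problem: score each stored chunk against the query and splice the winners into context. A toy keyword-overlap version (UPtrim's actual ranking, and its optional embedding search, will be more sophisticated):

```python
# Toy sketch of keyword-based excerpt retrieval; illustrative only.

def score(query, chunk):
    """Count shared words between the query and a document chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def top_excerpts(query, chunks, k=2):
    """Return the k best-matching chunks for injection into context."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

docs = [
    "Invoices are archived monthly in the billing folder.",
    "The proxy listens on port 8000 by default.",
    "Context trimming keeps long chats under the token limit.",
]
hits = top_excerpts("how does context trimming work", docs, k=1)
```

Swapping the scorer for cosine similarity over embeddings is the "deeper semantic search" upgrade path the card mentions.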

📊

Dashboard + TUI

Full web dashboard for memory management, user admin, and real-time monitoring. Terminal UI for headless environments.

🔗

Client Compatibility

Drop-in support for Open WebUI and SillyTavern with automatic identity-aware header resolution. No code changes needed.
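"Identity-aware header resolution" generally means mapping request headers to a user before any memory lookup happens. A sketch under assumed conventions (the header names below, `X-User-Id` and `Authorization`, are common OpenAI-proxy conventions, not necessarily the exact headers UPtrim reads):

```python
# Hypothetical header-based identity resolution for OpenAI-compatible
# clients. Header names are assumptions, not confirmed UPtrim behavior.

def resolve_user(headers):
    """Derive a user identity from request headers, or None if anonymous."""
    h = {k.lower(): v for k, v in headers.items()}
    if "x-user-id" in h:
        return h["x-user-id"]
    auth = h.get("authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]  # fall back to the API key as identity
    return None
```

Since clients like Open WebUI and SillyTavern already send such headers on every request, identity can ride along without any client-side changes.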

Up and Running in Minutes

Three steps from download to full operation.

1

Connect Clients

Point Open WebUI or SillyTavern at the proxy endpoint. Identity resolution is automatic.

2

Set Identity Mode

Choose strict, required, or quarantine profiles to control memory safety for each user.

3

Tune Context + Memory

Adjust trimming thresholds, the memory-injection budget, and upload settings through the dashboard controls.
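As a sketch, the knobs from step 3 might look something like this (every key name here is illustrative; consult UPtrim's own configuration reference for the real ones):

```python
# Hypothetical tuning configuration for step 3 — key names are invented
# for illustration, not UPtrim's actual settings schema.
config = {
    "trimming": {"soft_limit_tokens": 6000, "hard_limit_tokens": 7500},
    "memory":   {"injection_budget_tokens": 800, "ttl_days": 90},
    "uploads":  {"max_file_mb": 25, "excerpt_chars": 1200},
}

# sanity check: the hard limit must sit above the soft limit
assert config["trimming"]["hard_limit_tokens"] > config["trimming"]["soft_limit_tokens"]
```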

Start Building with Memory

Persistent memory, context trimming, and multi-user isolation for your LLM stack.