A microservice for managing short-term memory in LLM-based conversational agents, with guaranteed consistency and no gaps or duplicates in conversation history.
The Problem: Chatbots lose context or waste tokens
- ❌ Long conversations hit token limits → forced to truncate history
- ❌ Keeping all messages → expensive API calls, slow responses
- ❌ Manual context management → buggy, inconsistent results
The Solution: Automatic sliding window that just works
- ✅ Recent messages stay full (high detail where it matters)
- ✅ Old messages auto-compress into summaries (save up to 80% of tokens)
- ✅ Guaranteed consistency (zero gaps, zero duplicates)
- ✅ Set it and forget it (background processing, no manual work)
| Approach | Token Usage | Consistency | Latency | Setup |
|---|---|---|---|---|
| This Project | ✅ Low (~70% avg, up to 80% on 100+ messages) | ✅ Guaranteed | ✅ <200ms | ✅ 5 min |
| Manual truncation | | ❌ Gaps likely | ✅ Fast | |
| Full history | ❌ Very High | ✅ Complete | ❌ Slow | ✅ Easy |
| Custom solution | | ❌ Error-prone | | ❌ Days/weeks |
Get started in 5 minutes: See QUICKSTART.md for detailed setup instructions.
Prerequisites: Docker, Docker Compose, OpenAI API key
What you'll get:
- ✅ REST API running on localhost:3000
- ✅ PostgreSQL database with memory storage
- ✅ Ready to integrate with your chatbot
Step-by-step:
- Clone and set up:

  ```bash
  git clone <repo-url>
  cd llm-sliding-window-memory
  cp .env.example .env  # Add your OPENAI_API_KEY
  ```
- Start services:

  ```bash
  docker-compose up -d
  ```
- Initialize database:

  ```bash
  docker exec -i memory-core-postgres psql -U memoryuser -d memory_core < src/storage/migrations/001_initial_schema.sql
  ```
- Verify it works:

  ```bash
  curl http://localhost:3000/api/health
  # Should return: {"status":"ok"}
  ```
- Zero Configuration - Works out of the box with sensible defaults
- Save 80% on API Costs - Automatic compression reduces token usage dramatically
- Low Latency - Summaries are pre-generated in the background, so context retrieval is instant
- Bulletproof Consistency - State machine guarantees no gaps or duplicates
- Production Ready - Full Docker support, tested with real users
- Easy Integration - Simple REST API, works with any LLM
| Metric | Value |
|---|---|
| Add Message | < 200ms |
| Get Context | < 100ms |
| Token Savings | ~80% |
| Compression | GPT-5 Nano |
| Storage | PostgreSQL + pgvector |
| Deployment | Docker ready |
- Message Pairs: User-assistant message exchanges with indices
- Blocks: Ranges of message pairs that get summarized
- Summaries: Compressed representations of blocks
- State Machine: Controls block lifecycle transitions
- Window Manager: Core logic for memory window management
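For orientation, here is a rough sketch of how these concepts could be modeled as TypeScript types (illustrative only; the field names are assumptions, not the project's actual src/core/types.ts):

```typescript
// Illustrative sketch only; field names are assumptions, not the actual src/core/types.ts.
type BlockState =
  | 'NOT_CREATED'
  | 'SUMMARY_PENDING'
  | 'SUMMARY_PREPARED'
  | 'SUMMARY_ACTIVE'
  | 'ARCHIVED';

interface MessagePair {
  pairIndex: number;        // 1-based position in the conversation
  userMessage: string;
  assistantMessage: string;
}

interface Block {
  startPairIndex: number;   // first pair covered by this block
  endPairIndex: number;     // last pair covered by this block
  state: BlockState;
  summaryText?: string;     // present once a summary has been generated
}
```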
```
NOT_CREATED → SUMMARY_PENDING → SUMMARY_PREPARED → SUMMARY_ACTIVE → ARCHIVED
                      ↓
                NOT_CREATED (rollback on error)
```
No Gaps: Every message pair index from 1 to N is accounted for in the effective context, either as a raw pair in the window OR covered by an active summary.
No Duplicates: No message pair appears both in raw window AND in an active summary simultaneously.
Atomic Transitions: Block state transitions use row-level locking to prevent race conditions.
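A minimal sketch of how these guarantees can be checked, reusing the `Block` and `BlockState` shapes from the sketch above (illustrative only; the exact rollback edges and function names are assumptions, not the project's state-machine or window-manager code):

```typescript
// Illustrative sketch; rollback edges are an assumption, not the actual state machine.
const ALLOWED_TRANSITIONS: Record<BlockState, BlockState[]> = {
  NOT_CREATED: ['SUMMARY_PENDING'],
  SUMMARY_PENDING: ['SUMMARY_PREPARED', 'NOT_CREATED'],  // rollback on error (assumed)
  SUMMARY_PREPARED: ['SUMMARY_ACTIVE', 'NOT_CREATED'],   // rollback on error (assumed)
  SUMMARY_ACTIVE: ['ARCHIVED'],
  ARCHIVED: [],
};

// "No gaps, no duplicates": every pair index 1..N is covered exactly once,
// either by an active summary's block range or by a raw pair in the window.
function checkCoverage(
  activeBlocks: Block[],
  rawPairIndices: number[],
  totalPairs: number
): boolean {
  const seen = new Set<number>();
  const cover = (i: number) => {
    if (seen.has(i)) return false;  // duplicate coverage
    seen.add(i);
    return true;
  };
  for (const b of activeBlocks) {
    for (let i = b.startPairIndex; i <= b.endPairIndex; i++) {
      if (!cover(i)) return false;
    }
  }
  for (const i of rawPairIndices) {
    if (!cover(i)) return false;
  }
  return seen.size === totalPairs;  // no gaps
}
```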
Customize memory behavior by editing .env:
```bash
# Memory Window Settings
DEFAULT_RAW_WINDOW_SIZE=10           # Recent message pairs kept in full (default: 10, recommended: 10-15)
DEFAULT_BLOCK_SIZE=5                 # Pairs compressed per summary (default: 5)
DEFAULT_MAX_SUMMARIES_IN_CONTEXT=3   # Maximum active summaries (default: 3)
DEFAULT_TRIGGER_OFFSET=1             # Extra messages before compression (default: 1)
```

How it works:

- When the raw window reaches `RAW_WINDOW_SIZE + TRIGGER_OFFSET`, the oldest `BLOCK_SIZE` pairs compress into a summary (see the sketch below)
- `RAW_WINDOW_SIZE`: use `10` for faster compression/lower memory, `15` for more context retention
- The system maintains the `MAX_SUMMARIES_IN_CONTEXT` most recent summaries + raw pairs
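For intuition, with the defaults above (`RAW_WINDOW_SIZE=10`, `BLOCK_SIZE=5`, `TRIGGER_OFFSET=1`): when pair 11 arrives the raw window reaches 10 + 1, so pairs 1-5 are compressed into one summary and pairs 6-11 stay raw. A minimal sketch of that arithmetic (illustrative only, not the actual window-manager code):

```typescript
// Illustrative sketch of the trigger arithmetic; the real logic lives in the window manager.
function pairsToCompress(
  rawPairCount: number,       // pairs currently held in full
  rawWindowSize = 10,         // DEFAULT_RAW_WINDOW_SIZE
  blockSize = 5,              // DEFAULT_BLOCK_SIZE
  triggerOffset = 1           // DEFAULT_TRIGGER_OFFSET
): number {
  // Compression triggers once the raw window reaches RAW_WINDOW_SIZE + TRIGGER_OFFSET.
  return rawPairCount >= rawWindowSize + triggerOffset ? blockSize : 0;
}

pairsToCompress(10); // 0 -> still within the window
pairsToCompress(11); // 5 -> pairs 1-5 become one summary, pairs 6-11 stay raw
```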
After changing .env:
```bash
docker-compose down && docker-compose up -d
```

Automatically returns the existing conversation for a user or creates a new one. This is the recommended endpoint for integrations like Telegram bots.
```http
POST /api/conversations/get-or-create
Content-Type: application/json

{
  "userId": "telegram_383946741",
  "metadata": {"source": "telegram"}
}
```

Response (existing conversation):
```json
{
  "success": true,
  "data": {
    "conversationId": "uuid",
    "isNew": false,
    "conversation": {
      "id": "uuid",
      "userId": "telegram_383946741",
      "createdAt": "2025-12-02T...",
      "metadata": {"source": "telegram"}
    }
  }
}
```

Response (new conversation):
```json
{
  "success": true,
  "data": {
    "conversationId": "uuid",
    "isNew": true,
    "conversation": {...}
  }
}
```
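A minimal client-side sketch of calling this endpoint, assuming the service runs on http://localhost:3000 as in the quickstart (the helper name and userId format are illustrative):

```typescript
// Hypothetical client-side sketch; the base URL and userId format are assumptions.
const baseUrl = 'http://localhost:3000';

async function getOrCreateConversation(telegramUserId: string): Promise<string> {
  const res = await fetch(`${baseUrl}/api/conversations/get-or-create`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      userId: `telegram_${telegramUserId}`,
      metadata: { source: 'telegram' }
    })
  });
  const { data } = await res.json();
  // data.isNew indicates whether a new conversation was created
  return data.conversationId;
}
```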
```http
POST /api/conversations
Content-Type: application/json

{
  "userId": "user-123",
  "metadata": {"source": "telegram"}
}
```

Response:
```json
{
  "success": true,
  "data": {
    "id": "uuid",
    "userId": "user-123",
    "createdAt": "2025-12-02T...",
    "metadata": {"source": "telegram"}
  }
}
```

```http
POST /api/conversations/:conversationId/messages
Content-Type: application/json

{
  "userMessage": "What is the capital of France?",
  "assistantMessage": "The capital of France is Paris."
}
```

Response:
```json
{
  "success": true,
  "data": {
    "pair": {...},
    "pairIndex": 1,
    "blocksCreated": 0,
    "blocksActivated": 0
  }
}
```

```http
GET /api/conversations/:conversationId/context
```

Response:
```json
{
  "success": true,
  "data": {
    "activeSummaries": [
      {
        "blockRange": [1, 5],
        "summaryText": "User asked about weather, assistant provided forecast..."
      }
    ],
    "rawPairs": [
      {
        "pairIndex": 6,
        "userMessage": "...",
        "assistantMessage": "..."
      }
    ],
    "totalEffectivePairs": 15
  }
}
```

```http
GET /api/conversations/:conversationId/diagnostics
```

Returns detailed memory state information including all blocks, their states, and coverage analysis.
```http
PATCH /api/conversations/:conversationId/config
Content-Type: application/json

{
  "rawWindowSize": 20,
  "blockSize": 5,
  "maxSummariesInContext": 3,
  "triggerOffset": 2
}
```
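A hedged example of calling this endpoint from TypeScript, assuming the quickstart's http://localhost:3000 base URL and the same `{ success, data }` response envelope as the other endpoints:

```typescript
// Hypothetical sketch; base URL and response envelope are assumptions.
async function updateMemoryConfig(conversationId: string) {
  const res = await fetch(
    `http://localhost:3000/api/conversations/${conversationId}/config`,
    {
      method: 'PATCH',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        rawWindowSize: 20,
        blockSize: 5,
        maxSummariesInContext: 3,
        triggerOffset: 2
      })
    }
  );
  return res.json();
}
```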
```
llm-sliding-window-memory/
├── src/
│   ├── core/                  # Core business logic
│   │   ├── types.ts
│   │   ├── state-machine.ts
│   │   ├── window-manager.ts
│   │   └── summary-generator.ts
│   ├── storage/               # Database layer
│   │   ├── database.ts
│   │   ├── repositories/
│   │   └── migrations/
│   ├── api/                   # REST API
│   │   ├── routes/
│   │   └── middleware/
│   ├── services/              # Service orchestration
│   └── server.ts              # Entry point
├── tests/
│   ├── unit/
│   ├── integration/
│   └── scenarios/
└── docker-compose.yml
```
- Add Message: < 200ms (excluding async summary generation)
- Get Context: < 100ms
- Summary Generation: ~5s per 5-pair block (GPT-5 Nano with minimal reasoning)
- Maintain conversation history for chat bots with automatic context management.
- Keep track of customer interactions with intelligent summarization.
- Power conversational AI agents with reliable memory management.
- Remember student progress and past lessons across sessions.
- Maintain patient conversation history with HIPAA-compliant storage.
- Track client conversations and preferences over time.
```typescript
// 1. Create a conversation when the user starts a chat
const convRes = await fetch('http://api/conversations', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    userId: telegramUserId,
    metadata: { chatId: telegramChatId }
  })
});
const convId = (await convRes.json()).data.id;

// 2. Get context before sending to the LLM
const ctxRes = await fetch(`http://api/conversations/${convId}/context`);
const { activeSummaries, rawPairs } = (await ctxRes.json()).data;

// 3. Format context for the LLM
const messages = [
  ...activeSummaries.map(s => ({ role: 'system', content: s.summaryText })),
  ...rawPairs.flatMap(p => [
    { role: 'user', content: p.userMessage },
    { role: 'assistant', content: p.assistantMessage }
  ]),
  { role: 'user', content: newUserMessage }
];

// 4. Send to the LLM and save the new exchange
const llmResponse = await callLLM(messages);
await fetch(`http://api/conversations/${convId}/messages`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    userMessage: newUserMessage,
    assistantMessage: llmResponse
  })
});
```

```bash
# Run all tests
npm test
# Run specific test suites
npm run test:unit
npm run test:integration
npm run test:e2e
```

- WebSocket streaming for real-time updates
- Bull queue for reliable background jobs
- Metrics and monitoring dashboard
- Summary consolidation (merge multiple summaries)
- RAG integration with embeddings
- Multi-tenant isolation
- Multi-language support
- Cloud deployment templates (AWS, GCP, Azure)
- GraphQL API
- Analytics dashboard
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Write tests
- Submit a pull request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Join our community to get help, share ideas, and stay updated:
- Discussions - Ask questions, share ideas
- Issues - Report bugs, request features
- Star this repo - Show your support
- Watch releases - Get notified of updates
For issues and questions, please open a GitHub issue.
- QUICKSTART.md - Setup instructions
- ARCHITECTURE.md - Detailed architecture
- DIAGRAMS.md - System diagrams