
Marmot v2


What & Why?

Marmot v2 is a leaderless, distributed SQLite replication system built on a gossip-based protocol with distributed transactions and eventual consistency.

Key Features:

  • Leaderless Architecture: No single point of failure - any node can accept writes
  • MySQL Protocol Compatible: Connect with any MySQL client (DBeaver, MySQL Workbench, mysql CLI)
  • Distributed Transactions: Percolator-style write intents with conflict detection
  • Multi-Database Support: Create and manage multiple databases per cluster
  • DDL Replication: Distributed schema changes with automatic idempotency and cluster-wide locking
  • Production-Ready SQL Parser: Powered by rqlite/sql AST parser for MySQL→SQLite transpilation
  • CDC-Based Replication: Row-level change data capture for consistent replication

Quick Start

# Start a single-node cluster
./marmot-v2

# Connect with MySQL client
mysql -h localhost -P 3306 -u root

# Or use DBeaver, MySQL Workbench, etc.

Testing Replication

# Test DDL and DML replication across a 2-node cluster
./scripts/test-ddl-replication.sh

# This script will:
# 1. Start a 2-node cluster
# 2. Create a table on node 1 and verify it replicates to node 2
# 3. Insert data on node 1 and verify it replicates to node 2
# 4. Update data on node 2 and verify it replicates to node 1
# 5. Delete data on node 1 and verify it replicates to node 2

# Manual cluster testing
./examples/start-seed.sh              # Start seed node (port 8081, mysql 3307)
./examples/join-cluster.sh 2 localhost:8081  # Join node 2 (port 8082, mysql 3308)
./examples/join-cluster.sh 3 localhost:8081  # Join node 3 (port 8083, mysql 3309)

# Connect to any node and run queries
mysql --protocol=TCP -h localhost -P 3307 -u root
mysql --protocol=TCP -h localhost -P 3308 -u root

# Cleanup
pkill -f marmot-v2


Architecture

Marmot v2 uses a fundamentally different architecture from other SQLite replication solutions:

vs. rqlite/dqlite/LiteFS:

  • ❌ They require a primary node for all writes
  • ✅ Marmot allows writes on any node
  • ❌ They use leader election (Raft)
  • ✅ Marmot uses gossip protocol (no leader)
  • ❌ They require a proxy layer or page-level interception
  • ✅ Marmot uses MySQL protocol for direct database access

How It Works:

  1. Write Coordination: 2PC (Two-Phase Commit) with configurable consistency (ONE, QUORUM, ALL)
  2. Conflict Resolution: Last-Write-Wins (LWW) with HLC timestamps
  3. Cluster Membership: SWIM-style gossip with failure detection
  4. Data Replication: Full database replication - all nodes receive all data
  5. DDL Replication: Cluster-wide schema changes with automatic idempotency
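
For illustration, here is a minimal Go sketch of how a write coordinator could translate the consistency levels in step 1 into a required number of acknowledgements for an n-node cluster. The names are illustrative and do not mirror Marmot's internal API.

```go
package main

import "fmt"

// requiredAcks returns how many node acknowledgements a write needs before
// the coordinator reports success, for a cluster of n nodes.
func requiredAcks(level string, n int) int {
    switch level {
    case "ONE":
        return 1
    case "ALL":
        return n
    default: // QUORUM
        return n/2 + 1 // strict majority
    }
}

func main() {
    for _, level := range []string{"ONE", "QUORUM", "ALL"} {
        fmt.Printf("%-6s -> %d of 3 nodes must ACK\n", level, requiredAcks(level, 3))
    }
}
```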

DDL Replication

Marmot v2 supports distributed DDL (Data Definition Language) replication without requiring master election:

How It Works

  1. Cluster-Wide Locking: Each DDL operation acquires a distributed lock per database (default: 30-second lease)

    • Prevents concurrent schema changes on the same database
    • Locks automatically expire if a node crashes
    • Different databases can have concurrent DDL operations
  2. Automatic Idempotency: DDL statements are automatically rewritten for safe replay (see the sketch after this list)

    CREATE TABLE users (id INT)
    → CREATE TABLE IF NOT EXISTS users (id INT)
    
    DROP TABLE users
    → DROP TABLE IF EXISTS users
  3. Schema Version Tracking: Each database maintains a schema version counter

    • Incremented on every DDL operation
    • Exchanged via gossip protocol for drift detection
    • Used by delta sync to validate transaction applicability
  4. Quorum-Based Replication: DDL replicates like DML through the same 2PC mechanism

    • No special master node needed
    • Works with existing consistency levels (QUORUM, ALL, etc.)
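
As a rough illustration of the idempotent rewrite from step 2, the sketch below shows the transformation at the string level; Marmot performs the rewrite on the parsed AST, so treat this purely as a conceptual example.

```go
package main

import (
    "fmt"
    "regexp"
)

// Illustrative sketch only: CREATE gains IF NOT EXISTS and DROP gains
// IF EXISTS, so replaying the statement on a node that already applied
// it becomes a no-op.
var (
    createRe = regexp.MustCompile(`(?i)^\s*CREATE\s+(TABLE|INDEX|VIEW|TRIGGER)\s+`)
    dropRe   = regexp.MustCompile(`(?i)^\s*DROP\s+(TABLE|INDEX|VIEW|TRIGGER)\s+`)
    guardRe  = regexp.MustCompile(`(?i)\bIF\s+(NOT\s+)?EXISTS\b`)
)

func makeIdempotent(ddl string) string {
    if guardRe.MatchString(ddl) {
        return ddl // already guarded
    }
    if createRe.MatchString(ddl) {
        return createRe.ReplaceAllString(ddl, "CREATE $1 IF NOT EXISTS ")
    }
    if dropRe.MatchString(ddl) {
        return dropRe.ReplaceAllString(ddl, "DROP $1 IF EXISTS ")
    }
    return ddl
}

func main() {
    fmt.Println(makeIdempotent("CREATE TABLE users (id INT)"))
    // CREATE TABLE IF NOT EXISTS users (id INT)
    fmt.Println(makeIdempotent("DROP TABLE users"))
    // DROP TABLE IF EXISTS users
}
```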

Configuration

[ddl]
# DDL lock lease duration (seconds)
lock_lease_seconds = 30

# Automatically rewrite DDL for idempotency
enable_idempotent = true

Best Practices

  • Do: Execute DDL from a single connection/node at a time
  • Do: Use qualified table names (mydb.users instead of users)
  • ⚠️ Caution: ALTER TABLE is less idempotent - avoid replaying failed ALTER operations
  • Don't: Run concurrent DDL on the same database from multiple nodes

CDC-Based Replication

Marmot v2 uses Change Data Capture (CDC) for replication instead of SQL statement replay:

How It Works

  1. Row-Level Capture: Instead of replicating SQL statements, Marmot captures the actual row data changes (INSERT/UPDATE/DELETE)
  2. Binary Data Format: Row data is serialized as CDC messages with column values, ensuring consistent replication regardless of SQL dialect
  3. Deterministic Application: Row data is applied directly to the target database, avoiding parsing ambiguities
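
The sketch below shows the kind of information a row-level CDC record has to carry so it can be applied deterministically on any node. The struct and field names are illustrative, not Marmot's actual wire format.

```go
package main

import "fmt"

// Op identifies the type of row change captured.
type Op int

const (
    OpInsert Op = iota
    OpUpdate
    OpDelete
)

// RowChange is a hypothetical row-level CDC record.
type RowChange struct {
    Database string                 // target database
    Table    string                 // target table
    Op       Op                     // INSERT / UPDATE / DELETE
    Key      map[string]interface{} // primary key (or rowid) identifying the row
    Values   map[string]interface{} // full column values for INSERT/UPDATE
    HLC      uint64                 // hybrid logical clock timestamp for LWW
    NodeID   uint64                 // originating node, used as tie-breaker
}

func main() {
    c := RowChange{
        Database: "mydb", Table: "users", Op: OpInsert,
        Key:    map[string]interface{}{"id": int64(42)},
        Values: map[string]interface{}{"id": int64(42), "name": "alice"},
    }
    fmt.Printf("%+v\n", c)
}
```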

Benefits

  • Consistency: Same row data applied everywhere, no SQL parsing differences
  • Performance: Binary format is more efficient than SQL text
  • Reliability: No issues with SQL syntax variations between MySQL and SQLite

Row Key Extraction

For UPDATE and DELETE operations, Marmot automatically extracts row keys:

  • Uses PRIMARY KEY columns when available
  • Falls back to ROWID for tables without explicit primary key
  • Handles composite primary keys correctly
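
A minimal Go sketch of this lookup, assuming direct access to the SQLite schema via pragma_table_info; the helper name is hypothetical.

```go
package main

import (
    "database/sql"
    "fmt"

    _ "github.com/mattn/go-sqlite3"
)

// primaryKeyColumns returns the declared PRIMARY KEY columns of a table
// (including composite keys, in order), falling back to SQLite's implicit
// rowid when no explicit primary key exists.
func primaryKeyColumns(db *sql.DB, table string) ([]string, error) {
    rows, err := db.Query(`SELECT name, pk FROM pragma_table_info(?)`, table)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    // pk is the 1-based position of the column within the primary key
    // (0 = not part of the key), so composite keys come back in order.
    keys := map[int]string{}
    for rows.Next() {
        var name string
        var pk int
        if err := rows.Scan(&name, &pk); err != nil {
            return nil, err
        }
        if pk > 0 {
            keys[pk] = name
        }
    }
    if len(keys) == 0 {
        return []string{"rowid"}, nil // no explicit primary key
    }
    ordered := make([]string, len(keys))
    for pos, name := range keys {
        ordered[pos-1] = name
    }
    return ordered, nil
}

func main() {
    db, _ := sql.Open("sqlite3", ":memory:")
    db.Exec(`CREATE TABLE t (a INT, b INT, c TEXT, PRIMARY KEY (a, b))`)
    cols, _ := primaryKeyColumns(db, "t")
    fmt.Println(cols) // [a b]
}
```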

SQL Statement Compatibility

Marmot supports a wide range of MySQL/SQLite statements through its MySQL protocol server. The following table shows compatibility for different statement types:

| Statement Type | Support | Replication | Notes |
| --- | --- | --- | --- |
| DML - Data Manipulation | | | |
| INSERT / REPLACE | ✅ Full | ✅ Yes | Includes qualified table names (db.table) |
| UPDATE | ✅ Full | ✅ Yes | Includes qualified table names |
| DELETE | ✅ Full | ✅ Yes | Includes qualified table names |
| SELECT | ✅ Full | N/A | Read operations |
| LOAD DATA | ✅ Full | ✅ Yes | Bulk data loading |
| DDL - Data Definition | | | |
| CREATE TABLE | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| ALTER TABLE | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| DROP TABLE | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| TRUNCATE TABLE | ✅ Full | ✅ Yes | |
| RENAME TABLE | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| CREATE/DROP INDEX | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| CREATE/DROP VIEW | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| CREATE/DROP TRIGGER | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| Database Management | | | |
| CREATE DATABASE | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| DROP DATABASE | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| ALTER DATABASE | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| SHOW DATABASES | ✅ Full | N/A | Metadata query |
| SHOW TABLES | ✅ Full | N/A | Metadata query |
| USE database | ✅ Full | N/A | Session state |
| Transaction Control | | | |
| BEGIN / START TRANSACTION | ✅ Full | N/A | Transaction boundary |
| COMMIT | ✅ Full | ✅ Yes | Commits distributed transaction |
| ROLLBACK | ✅ Full | ✅ Yes | Aborts distributed transaction |
| SAVEPOINT | ✅ Full | ✅ Yes | Nested transaction support |
| Locking | | | |
| LOCK TABLES | ✅ Parsed | ❌ No | Requires distributed locking coordination |
| UNLOCK TABLES | ✅ Parsed | ❌ No | Requires distributed locking coordination |
| Session Configuration | | | |
| SET statements | ✅ Parsed | ❌ No | Session-local, not replicated |
| XA Transactions | | | |
| XA START/END/PREPARE | ✅ Parsed | ❌ No | Marmot uses its own 2PC protocol |
| XA COMMIT/ROLLBACK | ✅ Parsed | ❌ No | Not compatible with Marmot's model |
| DCL - Data Control | | | |
| GRANT / REVOKE | ✅ Parsed | ❌ No | User management not replicated |
| CREATE/DROP USER | ✅ Parsed | ❌ No | User management not replicated |
| ALTER USER | ✅ Parsed | ❌ No | User management not replicated |
| Administrative | | | |
| OPTIMIZE TABLE | ✅ Parsed | ❌ No | Node-local administrative command |
| REPAIR TABLE | ✅ Parsed | ❌ No | Node-local administrative command |

Legend

  • ✅ Full: Fully supported and working
  • ✅ Parsed: Statement is parsed and recognized
  • ⚠️ Limited: Works but has limitations in distributed context
  • ❌ No: Not supported or not replicated
  • N/A: Not applicable (read-only or session-local)

Important Notes

  1. Schema Changes (DDL): DDL statements are fully replicated with cluster-wide locking and automatic idempotency. See the DDL Replication section for details.

  2. XA Transactions: Marmot has its own distributed transaction protocol based on 2PC. MySQL XA transactions are not compatible with Marmot's replication model.

  3. User Management (DCL): User and privilege management statements are local to each node. For production deployments, consider handling authentication at the application or proxy level.

  4. Table Locking: LOCK TABLES statements are recognized but not enforced across the cluster. Use application-level coordination for distributed locking needs.

  5. Qualified Names: Marmot fully supports qualified table names (e.g., db.table) in DML and DDL operations.

MySQL Protocol & Metadata Queries

Marmot includes a MySQL-compatible protocol server, allowing you to connect using any MySQL client (DBeaver, MySQL Workbench, mysql CLI, etc.).

Metadata Query Support

Marmot provides full support for MySQL metadata queries, enabling GUI tools like DBeaver to browse databases, tables, and columns:

  • SHOW Commands: SHOW DATABASES, SHOW TABLES, SHOW COLUMNS FROM table, SHOW CREATE TABLE, SHOW INDEXES
  • INFORMATION_SCHEMA: Queries against INFORMATION_SCHEMA.TABLES, INFORMATION_SCHEMA.COLUMNS, INFORMATION_SCHEMA.SCHEMATA, and INFORMATION_SCHEMA.STATISTICS
  • Type Conversion: Automatic SQLite-to-MySQL type mapping for compatibility

These metadata queries are powered by the rqlite/sql AST parser, providing production-grade MySQL query compatibility.
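
As an illustration of the type conversion mentioned above, the sketch below maps SQLite declared types to MySQL-friendly types by following SQLite's type-affinity rules. The exact mapping Marmot uses may differ.

```go
package main

import (
    "fmt"
    "strings"
)

// sqliteToMySQLType maps a SQLite declared column type to a MySQL type name
// so that GUI clients see familiar types. Illustrative only.
func sqliteToMySQLType(decl string) string {
    d := strings.ToUpper(decl)
    switch {
    case strings.Contains(d, "INT"):
        return "BIGINT"
    case strings.Contains(d, "CHAR"), strings.Contains(d, "CLOB"), strings.Contains(d, "TEXT"):
        return "TEXT"
    case strings.Contains(d, "BLOB"), d == "":
        return "BLOB"
    case strings.Contains(d, "REAL"), strings.Contains(d, "FLOA"), strings.Contains(d, "DOUB"):
        return "DOUBLE"
    default:
        return "DECIMAL" // SQLite's NUMERIC affinity bucket
    }
}

func main() {
    fmt.Println(sqliteToMySQLType("INTEGER"))      // BIGINT
    fmt.Println(sqliteToMySQLType("VARCHAR(100)")) // TEXT
    fmt.Println(sqliteToMySQLType("REAL"))         // DOUBLE
}
```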

Connecting with MySQL Clients

# Using mysql CLI
mysql -h localhost -P 3306 -u root

# Connection string for applications
mysql://root@localhost:3306/marmot

Recovery Scenarios

Marmot handles various failure and recovery scenarios automatically:

Network Partition (Split-Brain)

| Scenario | Behavior |
| --- | --- |
| Minority partition | Writes fail - cannot achieve quorum |
| Majority partition | Writes succeed - quorum achieved |
| Partition heals | Delta sync + LWW merges divergent data |

How it works:

  1. During partition, only the majority side can commit writes (quorum enforcement)
  2. When partition heals, nodes exchange transaction logs via StreamChanges RPC
  3. Conflicts are resolved using Last-Write-Wins (LWW) with HLC timestamps
  4. Higher node ID breaks ties for simultaneous writes

Node Failure & Recovery

| Scenario | Recovery Method |
| --- | --- |
| Brief outage | Delta sync - replay missed transactions |
| Extended outage | Snapshot transfer + delta sync |
| New node joining | Full snapshot from existing node |

Anti-Entropy Background Process:

Marmot v2 includes an automatic anti-entropy system that continuously monitors and repairs replication lag across the cluster:

  1. Lag Detection: Every 60 seconds (configurable), each node queries peers for their replication state
  2. Smart Recovery Decision:
    • Delta Sync if lag < 10,000 transactions AND < 1 hour: Streams missed transactions incrementally
    • Snapshot Transfer if lag exceeds thresholds: Full database file transfer for efficiency
  3. Gap Detection: Detects when transaction logs have been GC'd and automatically falls back to snapshot
  4. Multi-Database Support: Tracks and syncs each database independently
  5. GC Coordination: Garbage collection respects peer replication state - logs aren't deleted until all peers have applied them
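
The recovery decision in step 2 boils down to a small predicate. The sketch below uses the default thresholds documented in the Replication configuration (10,000 transactions / 1 hour) and illustrative names.

```go
package main

import (
    "fmt"
    "time"
)

// chooseRecovery picks a repair strategy: small, recent lag is healed with
// incremental delta sync; anything larger (or older) falls back to a full
// snapshot transfer.
func chooseRecovery(lagTxns int64, lagAge time.Duration) string {
    const (
        maxDeltaTxns = 10_000
        maxDeltaAge  = time.Hour
    )
    if lagTxns < maxDeltaTxns && lagAge < maxDeltaAge {
        return "delta-sync"
    }
    return "snapshot"
}

func main() {
    fmt.Println(chooseRecovery(250, 5*time.Minute))    // delta-sync
    fmt.Println(chooseRecovery(80_000, 2*time.Minute)) // snapshot
    fmt.Println(chooseRecovery(50, 3*time.Hour))       // snapshot
}
```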

Delta Sync Process:

  1. Lagging node queries last_applied_txn_id for each peer/database
  2. Requests transactions since that ID via StreamChanges RPC
  3. Gap Detection: Checks if first received txn_id has a large gap from requested ID
    • If the gap exceeds delta_sync_threshold_transactions, it indicates missing (GC'd) transactions
    • Automatically falls back to snapshot transfer to prevent data loss
  4. Applies changes using LWW conflict resolution
  5. Updates replication state tracking (per-database)
  6. Progress logged every 100 transactions
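
A minimal sketch of the gap check in step 3, with an illustrative function name and threshold:

```go
package main

import "fmt"

// hasGap reports whether the first transaction a peer streams back is so far
// ahead of the one we asked for that the intervening transactions must have
// been garbage-collected; in that case delta sync cannot produce a complete
// result and the node falls back to a snapshot.
func hasGap(requestedTxnID, firstReceivedTxnID, threshold uint64) bool {
    return firstReceivedTxnID > requestedTxnID &&
        firstReceivedTxnID-requestedTxnID > threshold
}

func main() {
    fmt.Println(hasGap(1_000, 1_050, 10_000))  // false: small gap, delta sync continues
    fmt.Println(hasGap(1_000, 60_000, 10_000)) // true: logs were GC'd, take a snapshot
}
```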

GC Coordination with Anti-Entropy:

  • Transaction logs are retained with a two-tier policy:
    • Min retention (2 hours): Must be >= delta sync threshold, respects peer lag
    • Max retention (24 hours): Force delete after this time to prevent unbounded growth
  • Config validation enforces: gc_min >= delta_threshold and gc_max >= 2x delta_threshold
  • Each database tracks replication progress per peer
  • GC queries minimum applied txn_id across all peers before cleanup
  • Gap detection prevents data loss if GC runs while nodes are offline
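
A sketch of the GC watermark idea: only transactions at or below the minimum applied ID reported by all peers become eligible for deletion (the time-based min/max retention above still applies). Names are illustrative, not Marmot's internals.

```go
package main

import "fmt"

// gcWatermark returns the highest transaction ID that every peer has already
// applied; logs at or below it may be considered for garbage collection.
func gcWatermark(peerApplied map[string]uint64) uint64 {
    var min uint64
    first := true
    for _, applied := range peerApplied {
        if first || applied < min {
            min = applied
            first = false
        }
    }
    if first {
        return 0 // no peers known: retain everything
    }
    return min
}

func main() {
    applied := map[string]uint64{"node-2": 4_200, "node-3": 3_950}
    fmt.Println(gcWatermark(applied)) // 3950
}
```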

Consistency Guarantees

| Write Consistency | Behavior |
| --- | --- |
| ONE | Returns after 1 node ACK (fast, less durable) |
| QUORUM | Returns after majority ACK (default, balanced) |
| ALL | Returns after all nodes ACK (slow, most durable) |

Conflict Resolution:

  • All conflicts resolved via LWW using HLC timestamps
  • Deterministic convergence - the later write always wins, so all nodes end up with the same value
  • Tie-breaker: higher node ID wins for equal timestamps
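
A minimal sketch of this comparison, assuming a version stamp made of an HLC timestamp plus the writer's node ID (field and method names are illustrative):

```go
package main

import "fmt"

// Version is a hypothetical stamp attached to each row change.
type Version struct {
    HLC    uint64 // hybrid logical clock timestamp
    NodeID uint64 // originating node
}

// wins reports whether version a should overwrite version b under LWW.
func (a Version) wins(b Version) bool {
    if a.HLC != b.HLC {
        return a.HLC > b.HLC // later write wins
    }
    return a.NodeID > b.NodeID // equal timestamps: higher node ID wins
}

func main() {
    local := Version{HLC: 100, NodeID: 1}
    remote := Version{HLC: 100, NodeID: 3}
    fmt.Println(remote.wins(local)) // true: same HLC, node 3 breaks the tie
}
```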

Limitations

  • Selective Table Watching: All tables in a database are replicated. Selective table replication is not supported.
  • WAL Mode Required: SQLite must use WAL mode for reliable multi-process changes.
  • Eventually Consistent: Rows may sync out of order. SERIALIZABLE transaction assumptions may not hold across nodes.
  • Concurrent DDL: Avoid running concurrent DDL operations on the same database from multiple nodes (protected by cluster-wide lock with 30s lease).

Compatibility

AUTO_INCREMENT and Integer Types

IMPORTANT: Marmot automatically converts INT AUTO_INCREMENT to BIGINT

This is a breaking change from standard MySQL/SQLite behavior. Marmot does not respect 32-bit INT for auto-increment columns - they are automatically promoted to BIGINT to support distributed ID generation.

Why?

In a distributed, leaderless system, each node must generate unique IDs independently without coordination. Marmot uses HLC-based (Hybrid Logical Clock) 64-bit IDs to ensure:

  1. Global Uniqueness: IDs are unique across all nodes without central coordination
  2. Monotonicity: IDs increase over time (within each node)
  3. No Collisions: Unlike auto-increment sequences, HLC IDs cannot collide between nodes
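
One possible way to pack such an ID is sketched below: physical milliseconds, a logical counter, and the node ID share the 64 bits. The bit layout here is an assumption chosen for illustration; Marmot's actual encoding may differ.

```go
package main

import (
    "fmt"
    "time"
)

// Hypothetical bit split: time gets the high bits so IDs stay roughly
// time-ordered, the counter disambiguates same-millisecond writes, and the
// node ID guarantees two nodes never mint the same value.
const (
    counterBits = 12
    nodeBits    = 10
)

func hlcID(nowMillis uint64, counter uint16, nodeID uint16) uint64 {
    return nowMillis<<(counterBits+nodeBits) |
        uint64(counter&(1<<counterBits-1))<<nodeBits |
        uint64(nodeID&(1<<nodeBits-1))
}

func main() {
    id := hlcID(uint64(time.Now().UnixMilli()), 7, 3)
    fmt.Println(id) // a 64-bit, roughly time-ordered, node-unique ID
}
```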

How It Works:

  1. DDL Transformation: When you create a table with AUTO_INCREMENT:

    CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100))
    -- Becomes internally:
    CREATE TABLE users (id BIGINT PRIMARY KEY, name TEXT)
  2. DML ID Injection: When inserting with 0 or NULL for an auto-increment column:

    INSERT INTO users (id, name) VALUES (0, 'alice')
    -- Becomes internally:
    INSERT INTO users (id, name) VALUES (7318624812345678901, 'alice')
  3. Explicit IDs Preserved: If you provide an explicit non-zero ID, it is used as-is:

    INSERT INTO users (id, name) VALUES (12345, 'bob')
    -- Remains:
    INSERT INTO users (id, name) VALUES (12345, 'bob')

Important Considerations:

| Aspect | Behavior |
| --- | --- |
| ID Range | 64-bit (up to 9.2 quintillion) instead of 32-bit (4.2 billion) |
| ID Format | HLC-based, not sequential integers |
| SQLite ROWID | Not used - Marmot manages IDs explicitly |
| Client Libraries | Ensure your client handles BIGINT correctly (some JSON serializers may lose precision) |
| Existing Data | Migrate existing INT columns to BIGINT before enabling Marmot |

Schema-Based Detection:

Marmot automatically detects auto-increment columns by querying SQLite schema directly. A column is considered auto-increment if:

  • It is a single-column INTEGER PRIMARY KEY (SQLite rowid alias), or
  • It is a single-column BIGINT PRIMARY KEY (Marmot's transformed columns)

This means:

  • No registration required - columns are detected from schema at runtime
  • Works across restarts - no need to re-execute DDL statements
  • Works with existing databases - tables created directly on SQLite work too
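
A sketch of this detection against a live SQLite connection, assuming pragma_table_info is queried directly; the helper is hypothetical.

```go
package main

import (
    "database/sql"
    "fmt"
    "strings"

    _ "github.com/mattn/go-sqlite3"
)

// isAutoIncrementPK treats a table as auto-increment managed when it has
// exactly one primary-key column declared INTEGER (SQLite's rowid alias)
// or BIGINT (Marmot's transformed columns).
func isAutoIncrementPK(db *sql.DB, table string) (bool, string, error) {
    rows, err := db.Query(`SELECT name, type, pk FROM pragma_table_info(?)`, table)
    if err != nil {
        return false, "", err
    }
    defer rows.Close()

    var pkCols []string
    var pkType string
    for rows.Next() {
        var name, typ string
        var pk int
        if err := rows.Scan(&name, &typ, &pk); err != nil {
            return false, "", err
        }
        if pk > 0 {
            pkCols = append(pkCols, name)
            pkType = strings.ToUpper(typ)
        }
    }
    if len(pkCols) != 1 {
        return false, "", nil // composite or missing key: not auto-increment
    }
    if pkType == "INTEGER" || pkType == "BIGINT" {
        return true, pkCols[0], nil
    }
    return false, "", nil
}

func main() {
    db, _ := sql.Open("sqlite3", ":memory:")
    db.Exec(`CREATE TABLE users (id BIGINT PRIMARY KEY, name TEXT)`)
    ok, col, _ := isAutoIncrementPK(db, "users")
    fmt.Println(ok, col) // true id
}
```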

Configuration

Marmot v2 uses a TOML configuration file (default: config.toml). All settings have sensible defaults.

Core Configuration

node_id = 0  # 0 = auto-generate
data_dir = "./marmot-data"

Transaction Manager

[transaction]
heartbeat_timeout_seconds = 10  # Transaction timeout without heartbeat
conflict_window_seconds = 10    # Conflict resolution window
lock_wait_timeout_seconds = 50  # Lock wait timeout (MySQL: innodb_lock_wait_timeout)

Note: Transaction log garbage collection is managed by the replication configuration to coordinate with anti-entropy. See replication.gc_min_retention_hours and replication.gc_max_retention_hours.

Connection Pool

[connection_pool]
pool_size = 4              # Number of SQLite connections
max_idle_time_seconds = 10 # Max idle time before closing
max_lifetime_seconds = 300 # Max connection lifetime (0 = unlimited)

gRPC Client

[grpc_client]
keepalive_time_seconds = 10    # Keepalive ping interval
keepalive_timeout_seconds = 3  # Keepalive ping timeout
max_retries = 3                # Max retry attempts
retry_backoff_ms = 100         # Retry backoff duration

Coordinator

[coordinator]
prepare_timeout_ms = 2000 # Prepare phase timeout
commit_timeout_ms = 2000  # Commit phase timeout
abort_timeout_ms = 2000   # Abort phase timeout

Cluster

[cluster]
grpc_bind_address = "0.0.0.0"
grpc_port = 8080
seed_nodes = []                # List of seed node addresses
cluster_secret = ""            # PSK for cluster authentication (see Security section)
gossip_interval_ms = 1000      # Gossip interval
gossip_fanout = 3              # Number of peers to gossip to
suspect_timeout_ms = 5000      # Suspect timeout
dead_timeout_ms = 10000        # Dead timeout

Security

Marmot supports Pre-Shared Key (PSK) authentication for cluster communication. This is strongly recommended for production deployments.

[cluster]
# All nodes in the cluster must use the same secret
cluster_secret = "your-secret-key-here"

Environment Variable (Recommended):

For production, use the environment variable to avoid storing secrets in config files:

export MARMOT_CLUSTER_SECRET="your-secret-key-here"
./marmot-v2

The environment variable takes precedence over the config file.

Generating a Secret:

# Generate a secure random secret
openssl rand -base64 32

Behavior:

  • If cluster_secret is empty and MARMOT_CLUSTER_SECRET is not set, authentication is disabled
  • A warning is logged at startup when authentication is disabled
  • All gRPC endpoints (gossip, replication, snapshots) are protected when authentication is enabled
  • Nodes with mismatched secrets will fail to communicate (connection rejected with "invalid cluster secret")
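
For a sense of how PSK authentication can work over gRPC, the sketch below attaches the secret to outgoing calls via a client interceptor. The metadata key name is an assumption, not Marmot's actual header, and the server-side check is omitted.

```go
package main

import (
    "context"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/metadata"
)

// secretKey is a hypothetical metadata key; Marmot's real key may differ.
const secretKey = "x-marmot-cluster-secret"

// pskUnaryInterceptor attaches the shared secret to every unary RPC so the
// receiving node can reject calls whose secret does not match.
func pskUnaryInterceptor(secret string) grpc.UnaryClientInterceptor {
    return func(ctx context.Context, method string, req, reply interface{},
        cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
        ctx = metadata.AppendToOutgoingContext(ctx, secretKey, secret)
        return invoker(ctx, method, req, reply, cc, opts...)
    }
}

func main() {
    // Dial a peer with the secret attached to every unary RPC.
    conn, err := grpc.Dial("localhost:8080",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithUnaryInterceptor(pskUnaryInterceptor("your-secret-key-here")),
    )
    if err != nil {
        panic(err)
    }
    defer conn.Close()
}
```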

Cluster Membership Management

Marmot provides admin HTTP endpoints for managing cluster membership (requires cluster_secret to be configured):

Node Lifecycle:

  • New/restarted nodes auto-join via gossip - no manual intervention needed
  • Nodes marked REMOVED via admin API cannot auto-rejoin - must be explicitly allowed
  • This prevents decommissioned nodes from accidentally rejoining the cluster

# View cluster members and quorum info
curl -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/members

# Remove a node from the cluster (excludes from quorum, blocks auto-rejoin)
curl -X POST -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/remove/2

# Allow a removed node to rejoin (node must then restart to join)
curl -X POST -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/allow/2

See the Operations documentation for detailed usage and examples.

Replication

[replication]
default_write_consistency = "QUORUM"      # Write consistency level: ONE, QUORUM, ALL
default_read_consistency = "LOCAL_ONE"    # Read consistency level
write_timeout_ms = 5000                   # Write operation timeout
read_timeout_ms = 2000                    # Read operation timeout

# Anti-Entropy: Background healing for eventual consistency
# - Detects and repairs divergence between replicas
# - Uses delta sync for small lags, snapshot for large lags
# - Includes gap detection to prevent incomplete data after GC
enable_anti_entropy = true                 # Enable automatic catch-up for lagging nodes
anti_entropy_interval_seconds = 60         # How often to check for lag (default: 60s)
delta_sync_threshold_transactions = 10000  # Delta sync if lag < 10K txns
delta_sync_threshold_seconds = 3600        # Snapshot if lag > 1 hour

# Garbage Collection: Reclaim disk space by deleting old transaction records
# - gc_min must be >= delta_sync_threshold (validated at startup)
# - gc_max should be >= 2x delta_sync_threshold (recommended)
# - Set gc_max = 0 for unlimited retention
gc_min_retention_hours = 2   # Keep at least 2 hours (>= 1 hour delta threshold)
gc_max_retention_hours = 24  # Force delete after 24 hours

Anti-Entropy Tuning:

  • Small clusters (2-3 nodes): Use default settings (60s interval)
  • Large clusters (5+ nodes): Consider increasing interval to 120-180s to reduce network overhead
  • High write throughput: Increase delta_sync_threshold_transactions to 50000+
  • Long-running clusters: Keep gc_max_retention_hours at 24+ to handle extended outages

GC Configuration Rules (Validated at Startup):

  • gc_min_retention_hours must be >= delta_sync_threshold_seconds (in hours)
  • gc_max_retention_hours should be >= 2x delta_sync_threshold_seconds
  • Violating these rules will cause startup failure with helpful error messages
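
These rules boil down to a small validation step; here is a sketch under the assumption of a duration-based config (field and function names are hypothetical):

```go
package main

import (
    "fmt"
    "time"
)

// validateGC checks the retention rules described above: the minimum
// retention must cover the delta sync window, and the maximum retention
// (when not unlimited) should cover at least twice that window.
func validateGC(gcMin, gcMax, deltaThreshold time.Duration) error {
    if gcMin < deltaThreshold {
        return fmt.Errorf("gc_min_retention_hours (%v) must be >= delta_sync_threshold_seconds (%v)", gcMin, deltaThreshold)
    }
    if gcMax != 0 && gcMax < 2*deltaThreshold {
        return fmt.Errorf("gc_max_retention_hours (%v) should be >= 2x delta_sync_threshold_seconds (%v)", gcMax, deltaThreshold)
    }
    return nil
}

func main() {
    // Defaults: 2h min retention, 24h max retention, 1h delta sync threshold.
    if err := validateGC(2*time.Hour, 24*time.Hour, time.Hour); err != nil {
        panic(err)
    }
    fmt.Println("GC configuration OK")
}
```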

Query Pipeline

[query_pipeline]
transpiler_cache_size = 10000  # LRU cache for MySQL→SQLite transpilation
validator_pool_size = 8        # SQLite connection pool for validation

MySQL Protocol Server

[mysql]
enabled = true
bind_address = "0.0.0.0"
port = 3306
max_connections = 1000

Logging

[logging]
verbose = false          # Enable verbose logging
format = "console"       # Log format: console or json

Prometheus Metrics

[prometheus]
enabled = true  # Metrics served on gRPC port at /metrics endpoint

Accessing Metrics:

# Metrics are multiplexed with gRPC on the same port
curl http://localhost:8080/metrics

# Prometheus scrape config
scrape_configs:
  - job_name: 'marmot'
    static_configs:
      - targets: ['node1:8080', 'node2:8080', 'node3:8080']

See config.toml for complete configuration reference with detailed comments.

Benchmarks

Performance benchmarks on a local development machine (Apple M-series, 3-node cluster, single machine):

Test Configuration

| Parameter | Value |
| --- | --- |
| Nodes | 3 (ports 3307, 3308, 3309) |
| Threads | 16 |
| Batch Size | 10 ops/transaction |
| Consistency | QUORUM |

Load Phase (INSERT-only)

| Metric | Value |
| --- | --- |
| Throughput | 4,175 ops/sec |
| TX Throughput | 417 tx/sec |
| Records Loaded | 200,000 |
| Errors | 0 |

Mixed Workload

| Metric | Value |
| --- | --- |
| Throughput | 3,370 ops/sec |
| TX Throughput | 337 tx/sec |
| Duration | 120 seconds |
| Total Operations | 404,930 |
| Errors | 0 |
| Retries | 37 (0.09%) |

Operation Distribution:

  • READ: 20%
  • UPDATE: 30%
  • INSERT: 35%
  • DELETE: 5%
  • UPSERT: 10%

Latency (Mixed Workload)

| Percentile | Latency |
| --- | --- |
| P50 | 4.3ms |
| P90 | 14.0ms |
| P95 | 36.8ms |
| P99 | 85.1ms |

Replication Verification

All 3 nodes maintained identical row counts (346,684 rows) throughout the test, confirming consistent replication.

Note: These benchmarks are from a local development machine with all nodes on the same host. Production deployments across multiple machines will have different characteristics based on network latency.

FAQs & Community

  • For FAQs, visit this page
  • For community, join our Discord or the discussions on GitHub


No packages published