Marmot v2 is a leaderless, distributed SQLite replication system built on a gossip-based protocol with distributed transactions and eventual consistency.
Key Features:
- Leaderless Architecture: No single point of failure - any node can accept writes
- MySQL Protocol Compatible: Connect with any MySQL client (DBeaver, MySQL Workbench, mysql CLI)
- Distributed Transactions: Percolator-style write intents with conflict detection
- Multi-Database Support: Create and manage multiple databases per cluster
- DDL Replication: Distributed schema changes with automatic idempotency and cluster-wide locking
- Production-Ready SQL Parser: Powered by rqlite/sql AST parser for MySQL→SQLite transpilation
- CDC-Based Replication: Row-level change data capture for consistent replication
```bash
# Start a single-node cluster
./marmot-v2

# Connect with MySQL client
mysql -h localhost -P 3306 -u root
# Or use DBeaver, MySQL Workbench, etc.
```

```bash
# Test DDL and DML replication across a 2-node cluster
./scripts/test-ddl-replication.sh

# This script will:
# 1. Start a 2-node cluster
# 2. Create a table on node 1 and verify it replicates to node 2
# 3. Insert data on node 1 and verify it replicates to node 2
# 4. Update data on node 2 and verify it replicates to node 1
# 5. Delete data on node 1 and verify it replicates to node 2
```

```bash
# Manual cluster testing
./examples/start-seed.sh                     # Start seed node (port 8081, mysql 3307)
./examples/join-cluster.sh 2 localhost:8081  # Join node 2 (port 8082, mysql 3308)
./examples/join-cluster.sh 3 localhost:8081  # Join node 3 (port 8083, mysql 3309)

# Connect to any node and run queries
mysql --protocol=TCP -h localhost -P 3307 -u root
mysql --protocol=TCP -h localhost -P 3308 -u root

# Cleanup
pkill -f marmot-v2
```

Marmot v2 uses a fundamentally different architecture from other SQLite replication solutions:
vs. rqlite/dqlite/LiteFS:
- ❌ They require a primary node for all writes
- ✅ Marmot allows writes on any node
- ❌ They use leader election (Raft)
- ✅ Marmot uses gossip protocol (no leader)
- ❌ They require proxy layer or page-level interception
- ✅ Marmot uses MySQL protocol for direct database access
How It Works:
- Write Coordination: 2PC (Two-Phase Commit) with configurable consistency (ONE, QUORUM, ALL)
- Conflict Resolution: Last-Write-Wins (LWW) with HLC timestamps
- Cluster Membership: SWIM-style gossip with failure detection
- Data Replication: Full database replication - all nodes receive all data
- DDL Replication: Cluster-wide schema changes with automatic idempotency
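A minimal sketch of what this looks like from a client, using the manual test cluster from the examples above (ports 3307/3308); the `appdb` database and `events` table are illustrative names, not part of Marmot:

```bash
# Leaderless writes: any node accepts writes, reads work from any node.
# Illustrative sketch - appdb/events are made-up names; ports follow the example scripts.

# Write on node 1 (QUORUM write consistency by default)
mysql --protocol=TCP -h localhost -P 3307 -u root -e "
  CREATE DATABASE IF NOT EXISTS appdb;
  CREATE TABLE IF NOT EXISTS appdb.events (id BIGINT PRIMARY KEY, payload TEXT);
  INSERT INTO appdb.events (id, payload) VALUES (1, 'written on node 1');"

# Write on node 2 - no leader, no primary
mysql --protocol=TCP -h localhost -P 3308 -u root -e "
  INSERT INTO appdb.events (id, payload) VALUES (2, 'written on node 2');"

# Read back from either node once replication has caught up
mysql --protocol=TCP -h localhost -P 3307 -u root -e "SELECT * FROM appdb.events;"
mysql --protocol=TCP -h localhost -P 3308 -u root -e "SELECT * FROM appdb.events;"
```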
Marmot v2 supports distributed DDL (Data Definition Language) replication without requiring master election:
- Cluster-Wide Locking: Each DDL operation acquires a distributed lock per database (default: 30-second lease)
  - Prevents concurrent schema changes on the same database
  - Locks automatically expire if a node crashes
  - Different databases can have concurrent DDL operations
- Automatic Idempotency: DDL statements are automatically rewritten for safe replay
  - `CREATE TABLE users (id INT)` → `CREATE TABLE IF NOT EXISTS users (id INT)`
  - `DROP TABLE users` → `DROP TABLE IF EXISTS users`
- Schema Version Tracking: Each database maintains a schema version counter
  - Incremented on every DDL operation
  - Exchanged via gossip protocol for drift detection
  - Used by delta sync to validate transaction applicability
- Quorum-Based Replication: DDL replicates like DML through the same 2PC mechanism
  - No special master node needed
  - Works with existing consistency levels (QUORUM, ALL, etc.)
```toml
[ddl]
# DDL lock lease duration (seconds)
lock_lease_seconds = 30
# Automatically rewrite DDL for idempotency
enable_idempotent = true
```

- ✅ Do: Execute DDL from a single connection/node at a time
- ✅ Do: Use qualified table names (`mydb.users` instead of `users`)
- ⚠️ Caution: ALTER TABLE is less idempotent - avoid replaying failed ALTER operations
- ❌ Don't: Run concurrent DDL on the same database from multiple nodes
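As a rough illustration of the DDL flow above (the `appdb` database and table names are hypothetical; ports follow the example scripts), a schema change issued on one node should appear on the others once the cluster-wide lock is released and the DDL replicates:

```bash
# Illustrative DDL replication check - appdb/users are made-up names.

# Create a table on node 1
mysql --protocol=TCP -h localhost -P 3307 -u root -e "
  CREATE DATABASE IF NOT EXISTS appdb;
  CREATE TABLE appdb.users (id BIGINT PRIMARY KEY, name VARCHAR(100));"

# Verify the schema change arrived on node 2
mysql --protocol=TCP -h localhost -P 3308 -u root -e "SHOW TABLES FROM appdb;"

# DDL is not tied to any particular node: add an index from node 2
mysql --protocol=TCP -h localhost -P 3308 -u root -e "CREATE INDEX idx_users_name ON appdb.users (name);"

# Confirm on node 1
mysql --protocol=TCP -h localhost -P 3307 -u root -e "SHOW INDEXES FROM appdb.users;"
```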
Marmot v2 uses Change Data Capture (CDC) for replication instead of SQL statement replay:
- Row-Level Capture: Instead of replicating SQL statements, Marmot captures the actual row data changes (INSERT/UPDATE/DELETE)
- Binary Data Format: Row data is serialized as CDC messages with column values, ensuring consistent replication regardless of SQL dialect
- Deterministic Application: Row data is applied directly to the target database, avoiding parsing ambiguities
- Consistency: Same row data applied everywhere, no SQL parsing differences
- Performance: Binary format is more efficient than SQL text
- Reliability: No issues with SQL syntax variations between MySQL and SQLite
For UPDATE and DELETE operations, Marmot automatically extracts row keys:
- Uses PRIMARY KEY columns when available
- Falls back to ROWID for tables without explicit primary key
- Handles composite primary keys correctly
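The sketch below shows the three shapes this covers (hypothetical tables, run through the mysql CLI): a single-column primary key, a composite primary key, and a table with no explicit primary key where the ROWID fallback applies:

```bash
# Illustrative table shapes for row-key extraction - names are made up.
mysql --protocol=TCP -h localhost -P 3306 -u root -e "
  CREATE DATABASE IF NOT EXISTS appdb;

  -- Single-column primary key: the PK value identifies the row in CDC messages
  CREATE TABLE appdb.users (id BIGINT PRIMARY KEY, name VARCHAR(100));

  -- Composite primary key: both columns together form the row key
  CREATE TABLE appdb.order_items (
    order_id BIGINT,
    line_no  INT,
    qty      INT,
    PRIMARY KEY (order_id, line_no)
  );

  -- No explicit primary key: Marmot falls back to the SQLite ROWID
  CREATE TABLE appdb.audit_log (message TEXT);"
```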
Marmot supports a wide range of MySQL/SQLite statements through its MySQL protocol server. The following table shows compatibility for different statement types:
| Statement Type | Support | Replication | Notes |
|---|---|---|---|
| DML - Data Manipulation | | | |
| `INSERT` / `REPLACE` | ✅ Full | ✅ Yes | Includes qualified table names (`db.table`) |
| `UPDATE` | ✅ Full | ✅ Yes | Includes qualified table names |
| `DELETE` | ✅ Full | ✅ Yes | Includes qualified table names |
| `SELECT` | ✅ Full | N/A | Read operations |
| `LOAD DATA` | ✅ Full | ✅ Yes | Bulk data loading |
| DDL - Data Definition | | | |
| `CREATE TABLE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `ALTER TABLE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `DROP TABLE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `TRUNCATE TABLE` | ✅ Full | ✅ Yes | |
| `RENAME TABLE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `CREATE/DROP INDEX` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `CREATE/DROP VIEW` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `CREATE/DROP TRIGGER` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| Database Management | | | |
| `CREATE DATABASE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `DROP DATABASE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `ALTER DATABASE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `SHOW DATABASES` | ✅ Full | N/A | Metadata query |
| `SHOW TABLES` | ✅ Full | N/A | Metadata query |
| `USE database` | ✅ Full | N/A | Session state |
| Transaction Control | | | |
| `BEGIN` / `START TRANSACTION` | ✅ Full | N/A | Transaction boundary |
| `COMMIT` | ✅ Full | ✅ Yes | Commits distributed transaction |
| `ROLLBACK` | ✅ Full | ✅ Yes | Aborts distributed transaction |
| `SAVEPOINT` | ✅ Full | ✅ Yes | Nested transaction support |
| Locking | | | |
| `LOCK TABLES` | ✅ Parsed | ❌ No | Requires distributed locking coordination |
| `UNLOCK TABLES` | ✅ Parsed | ❌ No | Requires distributed locking coordination |
| Session Configuration | | | |
| `SET` statements | ✅ Parsed | ❌ No | Session-local, not replicated |
| XA Transactions | | | |
| `XA START/END/PREPARE` | ✅ Parsed | ❌ No | Marmot uses its own 2PC protocol |
| `XA COMMIT/ROLLBACK` | ✅ Parsed | ❌ No | Not compatible with Marmot's model |
| DCL - Data Control | | | |
| `GRANT` / `REVOKE` | ✅ Parsed | ❌ No | User management not replicated |
| `CREATE/DROP USER` | ✅ Parsed | ❌ No | User management not replicated |
| `ALTER USER` | ✅ Parsed | ❌ No | User management not replicated |
| Administrative | | | |
| `OPTIMIZE TABLE` | ✅ Parsed | ❌ No | Node-local administrative command |
| `REPAIR TABLE` | ✅ Parsed | ❌ No | Node-local administrative command |
- ✅ Full: Fully supported and working
- ✅ Parsed: Statement is parsed and recognized
- ⚠️ Limited: Works but has limitations in distributed context
- ❌ No: Not supported or not replicated
- N/A: Not applicable (read-only or session-local)
- Schema Changes (DDL): DDL statements are fully replicated with cluster-wide locking and automatic idempotency. See the DDL Replication section for details.
- XA Transactions: Marmot has its own distributed transaction protocol based on 2PC. MySQL XA transactions are not compatible with Marmot's replication model.
- User Management (DCL): User and privilege management statements are local to each node. For production deployments, consider handling authentication at the application or proxy level.
- Table Locking: `LOCK TABLES` statements are recognized but not enforced across the cluster. Use application-level coordination for distributed locking needs.
- Qualified Names: Marmot fully supports qualified table names (e.g., `db.table`) in DML and DDL operations.
Marmot includes a MySQL-compatible protocol server, allowing you to connect using any MySQL client (DBeaver, MySQL Workbench, mysql CLI, etc.). The server supports the metadata queries and connection methods described below.
Marmot provides full support for MySQL metadata queries, enabling GUI tools like DBeaver to browse databases, tables, and columns:
- SHOW Commands: `SHOW DATABASES`, `SHOW TABLES`, `SHOW COLUMNS FROM table`, `SHOW CREATE TABLE`, `SHOW INDEXES`
- INFORMATION_SCHEMA: Queries against `INFORMATION_SCHEMA.TABLES`, `INFORMATION_SCHEMA.COLUMNS`, `INFORMATION_SCHEMA.SCHEMATA`, and `INFORMATION_SCHEMA.STATISTICS`
- Type Conversion: Automatic SQLite-to-MySQL type mapping for compatibility
These metadata queries are powered by the rqlite/sql AST parser, providing production-grade MySQL query compatibility.
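For example, the kinds of queries a GUI tool typically issues can also be run by hand through the mysql CLI (the `appdb` database and `users` table are illustrative):

```bash
# Metadata queries, as issued by GUI tools - appdb/users are made-up names.
mysql --protocol=TCP -h localhost -P 3306 -u root -e "SHOW DATABASES;"
mysql --protocol=TCP -h localhost -P 3306 -u root -e "SHOW TABLES FROM appdb;"
mysql --protocol=TCP -h localhost -P 3306 -u root -e "SHOW COLUMNS FROM appdb.users;"
mysql --protocol=TCP -h localhost -P 3306 -u root -e "SHOW CREATE TABLE appdb.users;"
mysql --protocol=TCP -h localhost -P 3306 -u root -e "
  SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = 'appdb';"
```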
```bash
# Using mysql CLI
mysql -h localhost -P 3306 -u root

# Connection string for applications
mysql://root@localhost:3306/marmot
```

Marmot handles various failure and recovery scenarios automatically:
| Scenario | Behavior |
|---|---|
| Minority partition | Writes fail - cannot achieve quorum |
| Majority partition | Writes succeed - quorum achieved |
| Partition heals | Delta sync + LWW merges divergent data |
How it works:
- During partition, only the majority side can commit writes (quorum enforcement)
- When partition heals, nodes exchange transaction logs via `StreamChanges` RPC
- Conflicts resolved using Last-Writer-Wins (LWW) with HLC timestamps
- Higher node ID breaks ties for simultaneous writes
| Scenario | Recovery Method |
|---|---|
| Brief outage | Delta sync - replay missed transactions |
| Extended outage | Snapshot transfer + delta sync |
| New node joining | Full snapshot from existing node |
Anti-Entropy Background Process:
Marmot v2 includes an automatic anti-entropy system that continuously monitors and repairs replication lag across the cluster:
- Lag Detection: Every 60 seconds (configurable), each node queries peers for their replication state
- Smart Recovery Decision:
- Delta Sync if lag < 10,000 transactions AND < 1 hour: Streams missed transactions incrementally
- Snapshot Transfer if lag exceeds thresholds: Full database file transfer for efficiency
- Gap Detection: Detects when transaction logs have been GC'd and automatically falls back to snapshot
- Multi-Database Support: Tracks and syncs each database independently
- GC Coordination: Garbage collection respects peer replication state - logs aren't deleted until all peers have applied them
Delta Sync Process:
- Lagging node queries `last_applied_txn_id` for each peer/database
- Requests transactions since that ID via `StreamChanges` RPC
- Gap Detection: Checks if first received txn_id has a large gap from requested ID
  - If gap > `delta_sync_threshold_txns`, indicates missing (GC'd) transactions
  - Automatically falls back to snapshot transfer to prevent data loss
- Applies changes using LWW conflict resolution
- Updates replication state tracking (per-database)
- Progress logged every 100 transactions
GC Coordination with Anti-Entropy:
- Transaction logs are retained with a two-tier policy:
- Min retention (2 hours): Must be >= delta sync threshold, respects peer lag
- Max retention (24 hours): Force delete after this time to prevent unbounded growth
- Config validation enforces: `gc_min >= delta_threshold` and `gc_max >= 2x delta_threshold`
- Each database tracks replication progress per peer
- GC queries minimum applied txn_id across all peers before cleanup
- Gap detection prevents data loss if GC runs while nodes are offline
| Write Consistency | Behavior |
|---|---|
| `ONE` | Returns after 1 node ACK (fast, less durable) |
| `QUORUM` | Returns after majority ACK (default, balanced) |
| `ALL` | Returns after all nodes ACK (slow, most durable) |
Conflict Resolution:
- All conflicts resolved via LWW using HLC timestamps
- No data loss - later write always wins deterministically
- Tie-breaker: higher node ID wins for equal timestamps
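A rough way to observe LWW from a client (illustrative `appdb.settings` table, ports from the example scripts): update the same row on two nodes, then read it back after replication settles - every node converges on the write with the later HLC timestamp:

```bash
# Illustrative LWW check - appdb.settings is a made-up table.
mysql --protocol=TCP -h localhost -P 3307 -u root -e "
  CREATE DATABASE IF NOT EXISTS appdb;
  CREATE TABLE IF NOT EXISTS appdb.settings (k VARCHAR(64) PRIMARY KEY, v TEXT);
  REPLACE INTO appdb.settings (k, v) VALUES ('theme', 'light');"

# Conflicting updates to the same row from two different nodes
mysql --protocol=TCP -h localhost -P 3307 -u root -e "UPDATE appdb.settings SET v = 'dark' WHERE k = 'theme';"
mysql --protocol=TCP -h localhost -P 3308 -u root -e "UPDATE appdb.settings SET v = 'solar' WHERE k = 'theme';"

# After replication settles, both nodes return the same value:
# the write with the later HLC timestamp (higher node ID breaks ties).
mysql --protocol=TCP -h localhost -P 3307 -u root -e "SELECT v FROM appdb.settings WHERE k = 'theme';"
mysql --protocol=TCP -h localhost -P 3308 -u root -e "SELECT v FROM appdb.settings WHERE k = 'theme';"
```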
- Selective Table Watching: All tables in a database are replicated. Selective table replication is not supported.
- WAL Mode Required: SQLite must use WAL mode for reliable multi-process changes.
- Eventually Consistent: Rows may sync out of order. `SERIALIZABLE` transaction assumptions may not hold across nodes.
- Concurrent DDL: Avoid running concurrent DDL operations on the same database from multiple nodes (protected by cluster-wide lock with 30s lease).
IMPORTANT: Marmot automatically converts `INT AUTO_INCREMENT` to `BIGINT`.
This is a breaking change from standard MySQL/SQLite behavior. Marmot does not respect 32-bit `INT` for auto-increment columns - they are automatically promoted to `BIGINT` to support distributed ID generation.
Why?
In a distributed, leaderless system, each node must generate unique IDs independently without coordination. Marmot uses HLC-based (Hybrid Logical Clock) 64-bit IDs to ensure:
- Global Uniqueness: IDs are unique across all nodes without central coordination
- Monotonicity: IDs increase over time (within each node)
- No Collisions: Unlike auto-increment sequences, HLC IDs cannot collide between nodes
How It Works:
- DDL Transformation: When you create a table with `AUTO_INCREMENT`:

  ```sql
  CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100))
  -- Becomes internally:
  CREATE TABLE users (id BIGINT PRIMARY KEY, name TEXT)
  ```

- DML ID Injection: When inserting with `0` or `NULL` for an auto-increment column:

  ```sql
  INSERT INTO users (id, name) VALUES (0, 'alice')
  -- Becomes internally:
  INSERT INTO users (id, name) VALUES (7318624812345678901, 'alice')
  ```

- Explicit IDs Preserved: If you provide an explicit non-zero ID, it is used as-is:

  ```sql
  INSERT INTO users (id, name) VALUES (12345, 'bob')
  -- Remains:
  INSERT INTO users (id, name) VALUES (12345, 'bob')
  ```
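To see this end to end from a client, a sketch using the mysql CLI (the `appdb` database is an illustrative name; the generated IDs will differ on every run):

```bash
# Illustrative: observe HLC-based ID injection for auto-increment columns.
mysql --protocol=TCP -h localhost -P 3306 -u root -e "
  CREATE DATABASE IF NOT EXISTS appdb;
  CREATE TABLE IF NOT EXISTS appdb.users (
    id   INT AUTO_INCREMENT PRIMARY KEY,  -- stored internally as BIGINT
    name VARCHAR(100)
  );
  INSERT INTO appdb.users (id, name) VALUES (0, 'alice');    -- id is replaced with an HLC-based BIGINT
  INSERT INTO appdb.users (id, name) VALUES (12345, 'bob');  -- explicit id is preserved
  SELECT id, name FROM appdb.users;"
```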
Important Considerations:
| Aspect | Behavior |
|---|---|
| ID Range | 64-bit (up to 9.2 quintillion) instead of 32-bit (4.2 billion) |
| ID Format | HLC-based, not sequential integers |
| SQLite ROWID | Not used - Marmot manages IDs explicitly |
| Client Libraries | Ensure your client handles BIGINT correctly (some JSON serializers may lose precision) |
| Existing Data | Migrate existing INT columns to BIGINT before enabling Marmot |
Schema-Based Detection:
Marmot automatically detects auto-increment columns by querying SQLite schema directly. A column is considered auto-increment if:
- It is a single-column `INTEGER PRIMARY KEY` (SQLite rowid alias), or
- It is a single-column `BIGINT PRIMARY KEY` (Marmot's transformed columns)
This means:
- No registration required - columns are detected from schema at runtime
- Works across restarts - no need to re-execute DDL statements
- Works with existing databases - tables created directly on SQLite work too
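A sketch of how the detection rules above play out for different column shapes (hypothetical tables; whether each is treated as auto-increment follows directly from the two rules):

```bash
# Illustrative column shapes for schema-based auto-increment detection - names are made up.
mysql --protocol=TCP -h localhost -P 3306 -u root -e "
  CREATE DATABASE IF NOT EXISTS appdb;

  -- Detected: single-column INTEGER PRIMARY KEY (SQLite rowid alias)
  CREATE TABLE appdb.t1 (id INTEGER PRIMARY KEY, v TEXT);

  -- Detected: single-column BIGINT PRIMARY KEY (Marmot's transformed form)
  CREATE TABLE appdb.t2 (id BIGINT PRIMARY KEY, v TEXT);

  -- Not detected: composite primary key is not a single-column integer PK
  CREATE TABLE appdb.t3 (a BIGINT, b BIGINT, v TEXT, PRIMARY KEY (a, b));"
```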
Marmot v2 uses a TOML configuration file (default: config.toml). All settings have sensible defaults.
```toml
node_id = 0                      # 0 = auto-generate
data_dir = "./marmot-data"

[transaction]
heartbeat_timeout_seconds = 10   # Transaction timeout without heartbeat
conflict_window_seconds = 10     # Conflict resolution window
lock_wait_timeout_seconds = 50   # Lock wait timeout (MySQL: innodb_lock_wait_timeout)
```

Note: Transaction log garbage collection is managed by the replication configuration to coordinate with anti-entropy. See `replication.gc_min_retention_hours` and `replication.gc_max_retention_hours`.
```toml
[connection_pool]
pool_size = 4                    # Number of SQLite connections
max_idle_time_seconds = 10       # Max idle time before closing
max_lifetime_seconds = 300       # Max connection lifetime (0 = unlimited)

[grpc_client]
keepalive_time_seconds = 10      # Keepalive ping interval
keepalive_timeout_seconds = 3    # Keepalive ping timeout
max_retries = 3                  # Max retry attempts
retry_backoff_ms = 100           # Retry backoff duration

[coordinator]
prepare_timeout_ms = 2000        # Prepare phase timeout
commit_timeout_ms = 2000         # Commit phase timeout
abort_timeout_ms = 2000          # Abort phase timeout

[cluster]
grpc_bind_address = "0.0.0.0"
grpc_port = 8080
seed_nodes = []                  # List of seed node addresses
cluster_secret = ""              # PSK for cluster authentication (see Security section)
gossip_interval_ms = 1000        # Gossip interval
gossip_fanout = 3                # Number of peers to gossip to
suspect_timeout_ms = 5000        # Suspect timeout
dead_timeout_ms = 10000          # Dead timeout
```

Marmot supports Pre-Shared Key (PSK) authentication for cluster communication. This is strongly recommended for production deployments.
```toml
[cluster]
# All nodes in the cluster must use the same secret
cluster_secret = "your-secret-key-here"
```

Environment Variable (Recommended):
For production, use the environment variable to avoid storing secrets in config files:

```bash
export MARMOT_CLUSTER_SECRET="your-secret-key-here"
./marmot
```

The environment variable takes precedence over the config file.

Generating a Secret:

```bash
# Generate a secure random secret
openssl rand -base64 32
```

Behavior:
- If `cluster_secret` is empty and `MARMOT_CLUSTER_SECRET` is not set, authentication is disabled
- All gRPC endpoints (gossip, replication, snapshots) are protected when authentication is enabled
- Nodes with mismatched secrets will fail to communicate (connection rejected with "invalid cluster secret")
Marmot provides admin HTTP endpoints for managing cluster membership (requires cluster_secret to be configured):
Node Lifecycle:
- New/restarted nodes auto-join via gossip - no manual intervention needed
- Nodes marked REMOVED via admin API cannot auto-rejoin - must be explicitly allowed
- This prevents decommissioned nodes from accidentally rejoining the cluster
```bash
# View cluster members and quorum info
curl -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/members

# Remove a node from the cluster (excludes from quorum, blocks auto-rejoin)
curl -X POST -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/remove/2

# Allow a removed node to rejoin (node must then restart to join)
curl -X POST -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/allow/2
```

See the Operations documentation for detailed usage and examples.
```toml
[replication]
default_write_consistency = "QUORUM"       # Write consistency level: ONE, QUORUM, ALL
default_read_consistency = "LOCAL_ONE"     # Read consistency level
write_timeout_ms = 5000                    # Write operation timeout
read_timeout_ms = 2000                     # Read operation timeout

# Anti-Entropy: Background healing for eventual consistency
# - Detects and repairs divergence between replicas
# - Uses delta sync for small lags, snapshot for large lags
# - Includes gap detection to prevent incomplete data after GC
enable_anti_entropy = true                 # Enable automatic catch-up for lagging nodes
anti_entropy_interval_seconds = 60         # How often to check for lag (default: 60s)
delta_sync_threshold_transactions = 10000  # Delta sync if lag < 10K txns
delta_sync_threshold_seconds = 3600        # Snapshot if lag > 1 hour

# Garbage Collection: Reclaim disk space by deleting old transaction records
# - gc_min must be >= delta_sync_threshold (validated at startup)
# - gc_max should be >= 2x delta_sync_threshold (recommended)
# - Set gc_max = 0 for unlimited retention
gc_min_retention_hours = 2                 # Keep at least 2 hours (>= 1 hour delta threshold)
gc_max_retention_hours = 24                # Force delete after 24 hours
```

Anti-Entropy Tuning:
- Small clusters (2-3 nodes): Use default settings (60s interval)
- Large clusters (5+ nodes): Consider increasing interval to 120-180s to reduce network overhead
- High write throughput: Increase `delta_sync_threshold_transactions` to 50000+
- Long-running clusters: Keep `gc_max_retention_hours` at 24+ to handle extended outages
GC Configuration Rules (Validated at Startup):
- `gc_min_retention_hours` must be >= `delta_sync_threshold_seconds` (in hours)
- `gc_max_retention_hours` should be >= 2x `delta_sync_threshold_seconds`
- Violating these rules will cause startup failure with helpful error messages
```toml
[query_pipeline]
transpiler_cache_size = 10000    # LRU cache for MySQL→SQLite transpilation
validator_pool_size = 8          # SQLite connection pool for validation

[mysql]
enabled = true
bind_address = "0.0.0.0"
port = 3306
max_connections = 1000

[logging]
verbose = false                  # Enable verbose logging
format = "console"               # Log format: console or json

[prometheus]
enabled = true                   # Metrics served on gRPC port at /metrics endpoint
```

Accessing Metrics:
```bash
# Metrics are multiplexed with gRPC on the same port
curl http://localhost:8080/metrics
```

```yaml
# Prometheus scrape config
scrape_configs:
  - job_name: 'marmot'
    static_configs:
      - targets: ['node1:8080', 'node2:8080', 'node3:8080']
```

See config.toml for complete configuration reference with detailed comments.
Performance benchmarks on a local development machine (Apple M-series, 3-node cluster, single machine):
| Parameter | Value |
|---|---|
| Nodes | 3 (ports 3307, 3308, 3309) |
| Threads | 16 |
| Batch Size | 10 ops/transaction |
| Consistency | QUORUM |
| Metric | Value |
|---|---|
| Throughput | 4,175 ops/sec |
| TX Throughput | 417 tx/sec |
| Records Loaded | 200,000 |
| Errors | 0 |
| Metric | Value |
|---|---|
| Throughput | 3,370 ops/sec |
| TX Throughput | 337 tx/sec |
| Duration | 120 seconds |
| Total Operations | 404,930 |
| Errors | 0 |
| Retries | 37 (0.09%) |
Operation Distribution:
- READ: 20%
- UPDATE: 30%
- INSERT: 35%
- DELETE: 5%
- UPSERT: 10%
| Percentile | Latency |
|---|---|
| P50 | 4.3ms |
| P90 | 14.0ms |
| P95 | 36.8ms |
| P99 | 85.1ms |
All 3 nodes maintained identical row counts (346,684 rows) throughout the test, confirming consistent replication.
Note: These benchmarks are from a local development machine with all nodes on the same host. Production deployments across multiple machines will have different characteristics based on network latency.