Commit a789bbf

committed
docs: split up architecture guide
1 parent da56212 commit a789bbf

22 files changed, +2449 -2301 lines changed

README.md

Lines changed: 6 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
# <a><img src="./docs/images/toydb.svg" height="40" valign="top" /></a> toyDB
1+
# <a><img src="./docs/architecture/images/toydb.svg" height="40" valign="top" /></a> toyDB
22

33
Distributed SQL database in Rust, built from scratch as an educational project. Main features:
44

@@ -16,9 +16,8 @@ Distributed SQL database in Rust, built from scratch as an educational project.
1616
I originally wrote toyDB in 2020 to learn more about database internals. Since then, I've spent
1717
several years building real distributed SQL databases at
1818
[CockroachDB](https://github.com/cockroachdb/cockroach) and
19-
[Neon](https://github.com/neondatabase/neon), where I learnt a lot more. Based on this experience,
20-
I've rewritten toyDB as a simple illustration of the architecture and concepts behind distributed
21-
SQL databases.
19+
[Neon](https://github.com/neondatabase/neon). Based on this experience, I've rewritten toyDB as a
20+
simple illustration of the architecture and concepts behind distributed SQL databases.
2221

2322
toyDB is intended to be simple and understandable, and also functional and correct. Other aspects
2423
like performance, scalability, and availability are non-goals -- these are major sources of
@@ -36,7 +35,7 @@ been taken where possible.
3635

3736
## Documentation
3837

39-
* [Architecture guide](docs/architecture.md): a guided tour of toyDB's code and architecture.
38+
* [Architecture guide](docs/architecture/index.md): a guided tour of toyDB's code and architecture.
4039

4140
* [SQL examples](docs/examples.md): walkthrough of toyDB's SQL features.
4241

@@ -106,9 +105,9 @@ Remap: m.title, genre, studio, m.rating (dropped: m.released)
106105

107106
toyDB's architecture is fairly typical for a distributed SQL database: a transactional
108107
key/value store managed by a Raft cluster with a SQL query engine on top. See the
109-
[architecture guide](./docs/architecture.md) for more details.
108+
[architecture guide](./docs/architecture/index.md) for more details.
110109

111-
[![toyDB architecture](./docs/images/architecture.svg)](./docs/architecture.md)
110+
[![toyDB architecture](./docs/architecture/images/architecture.svg)](./docs/architecture/index.md)
112111

113112
## Tests
114113

docs/architecture.md

Lines changed: 0 additions & 2294 deletions
This file was deleted.

docs/architecture/encoding.md

Lines changed: 172 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,172 @@
1+
# Key/Value Encoding
2+
3+
The key/value store uses binary `Vec<u8>` keys and values, so we need an encoding scheme to
4+
translate between Rust in-memory data structures and the on-disk binary data. This is provided by
5+
the [`encoding`](https://github.com/erikgrinaker/toydb/tree/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/encoding)
6+
module, with separate schemes for key and value encoding.
7+
8+
## `Bincode` Value Encoding
9+
10+
Values are encoded using [Bincode](https://github.com/bincode-org/bincode), a third-party binary
11+
encoding scheme for Rust. Bincode is convenient because it can easily encode any Rust
12+
data type. But we could also have chosen e.g. [JSON](https://en.wikipedia.org/wiki/JSON),
13+
[Protobuf](https://protobuf.dev), [MessagePack](https://msgpack.org/), or any other encoding.
14+
15+
We won't dwell on the actual binary format here; see the [Bincode specification](https://github.com/bincode-org/bincode/blob/trunk/docs/spec.md)
16+
for details.
17+
18+
To use a consistent configuration for all encoding and decoding, we provide helper functions using
19+
`bincode::config::standard()` in the [`encoding::bincode`](https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/encoding/bincode.rs)
20+
module:
21+
22+
https://github.com/erikgrinaker/toydb/blob/0ce1fb34349fda043cb9905135f103bceb4395b4/src/encoding/bincode.rs#L15-L27
23+
24+
Bincode uses the very common [Serde](https://serde.rs) framework for its API. toyDB also provides
25+
an `encoding::Value` helper trait for value types with automatic `encode()` and `decode()` methods:
26+
27+
https://github.com/erikgrinaker/toydb/blob/b57ae6502e93ea06df00d94946a7304b7d60b977/src/encoding/mod.rs#L39-L68
28+
29+
Here's an example of how this is used to encode and decode an arbitrary `Dog` data type:
30+
31+
```rust
#[derive(serde::Serialize, serde::Deserialize)]
struct Dog {
    name: String,
    age: u8,
    good_boy: bool,
}

impl encoding::Value for Dog {}

let pluto = Dog { name: "Pluto".into(), age: 4, good_boy: true };
let bytes = pluto.encode();
println!("{bytes:02x?}");

// Outputs [05, 50, 6c, 75, 74, 6f, 04, 01].
//
// * Length of string "Pluto": 05.
// * String "Pluto": 50 6c 75 74 6f.
// * Age 4: 04.
// * Good boy: 01 (true).

let pluto = Dog::decode(&bytes)?; // gives us back Pluto
```
54+
55+
## `Keycode` Key Encoding
56+
57+
Unlike values, keys can't just use any binary encoding like Bincode. As mentioned before, the
58+
storage engine sorts data by key to enable range scans, which will be used e.g. for SQL table scans,
59+
limited SQL index scans, Raft log scans, etc. Because of this, the encoding needs to preserve the
60+
[lexicographical order](https://en.wikipedia.org/wiki/Lexicographic_order) of the encoded values:
61+
the binary byte slices must sort in the same order as the original values.
62+
63+
As an example of why we can't just use Bincode, let's consider two strings: "house" should be
64+
sorted before "key", alphabetically. However, Bincode encodes strings prefixed by their length, so
65+
"key" would be sorted before "house" in binary form:
66+
67+
```
03 6b 65 79        ← 3 bytes: key
05 68 6f 75 73 65  ← 5 bytes: house
```
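The ordering problem can be reproduced with a minimal sketch. The `length_prefixed` helper below is a hypothetical stand-in for a length-prefixed string encoding, not the Bincode API; it assumes short strings whose length fits in a single byte.

```rust
/// Hypothetical stand-in for a length-prefixed string encoding (not the
/// Bincode API): one length byte followed by the UTF-8 bytes.
fn length_prefixed(s: &str) -> Vec<u8> {
    assert!(s.len() < 251, "sketch only handles short strings");
    let mut out = vec![s.len() as u8];
    out.extend_from_slice(s.as_bytes());
    out
}

fn main() {
    let house = length_prefixed("house");
    let key = length_prefixed("key");

    // The original strings sort alphabetically...
    assert!("house" < "key");
    // ...but the encoded bytes are compared starting with the length prefix,
    // so the shorter string wins.
    assert!(key < house);

    println!("{key:02x?} sorts before {house:02x?}");
}
```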
71+
72+
For similar reasons, we can't just encode numbers in their native binary form, because the
73+
[little-endian](https://en.wikipedia.org/wiki/Endianness) representation will sometimes order very
74+
large numbers before small numbers, and the [sign bit](https://en.wikipedia.org/wiki/Sign_bit)
75+
will order positive numbers before negative numbers.
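Both effects can be checked with the standard library alone. This is a small illustration rather than toyDB code; `sorts_before` is a hypothetical helper that compares byte slices the way an ordered key/value store would.

```rust
/// Returns true if `a` sorts before `b` when compared byte by byte, which is
/// how an ordered key/value store would compare encoded keys.
fn sorts_before(a: &[u8], b: &[u8]) -> bool {
    a < b
}

fn main() {
    // Little-endian stores the least significant byte first, so 256
    // (00 01 00 ...) sorts before 1 (01 00 00 ...).
    assert!(sorts_before(&256u64.to_le_bytes(), &1u64.to_le_bytes()));

    // Big-endian stores the most significant byte first and keeps the order.
    assert!(sorts_before(&1u64.to_be_bytes(), &256u64.to_be_bytes()));

    // Two's complement sets the top bit for negative numbers, so even in
    // big-endian form 1 (00 ... 01) sorts before -1 (ff ... ff).
    assert!(sorts_before(&1i64.to_be_bytes(), &(-1i64).to_be_bytes()));

    println!("native integer bytes do not preserve numeric order");
}
```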
76+
77+
We also have to be careful with value sequences, which should be ordered element-wise. For example,
78+
the pair ("a", "xyz") should be ordered before ("ab", "cd"), so we can't just encode the strings
79+
one after the other like "axyz" and "abcd" since that would sort "abcd" first.
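A short sketch of the pair example, using only the standard library. The `concat` helper is hypothetical and simply joins the raw bytes of the two strings with no separator.

```rust
/// Hypothetical naive encoding: the raw bytes of both strings, back to back.
fn concat(a: &str, b: &str) -> Vec<u8> {
    [a.as_bytes(), b.as_bytes()].concat()
}

fn main() {
    // Rust tuples compare element-wise, which is the order we want.
    assert!(("a", "xyz") < ("ab", "cd"));

    // The naive encoding yields "axyz" and "abcd", which sort the other way.
    assert!(concat("ab", "cd") < concat("a", "xyz"));

    println!("concatenation without a terminator breaks element-wise order");
}
```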
80+
81+
toyDB implements an encoding called "Keycode" which provides these properties, in the
82+
[`encoding::keycode`](https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/encoding/keycode.rs)
83+
module. It is implemented as a [Serde](https://serde.rs) (de)serializer, which
84+
requires a lot of boilerplate code, but we'll just focus on the actual encoding.
85+
86+
Keycode only supports a handful of primary data types, and just needs to order values of the same
87+
type:
88+
89+
* `bool`: `00` for `false` and `01` for `true`.
90+
91+
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L113-L117
92+
93+
* `u64`: the [big-endian](https://en.wikipedia.org/wiki/Endianness) binary encoding.
94+
95+
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L157-L161
96+
97+
* `i64`: the [big-endian](https://en.wikipedia.org/wiki/Endianness) binary encoding, but with the
98+
sign bit flipped to order negative numbers before positive ones.
99+
100+
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L131-L143
101+
102+
* `f64`: the [big-endian IEEE 754](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)
103+
binary encoding, but with the sign bit flipped, and all bits flipped for negative numbers, to
104+
order negative numbers correctly.
105+
106+
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L167-L179
107+
108+
* `Vec<u8>`: terminated by `00 00`, with `00` escaped as `00 ff` to disambiguate it.
109+
110+
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L190-L205
111+
112+
* `String`: like `Vec<u8>`.
113+
114+
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L185-L188
115+
116+
* `Vec<T>`, `[T]`, `(T,)`: just the concatenation of the inner values.
117+
118+
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L295-L307
119+
120+
* `enum`: the enum variant's numerical index as a `u8`, then the inner values (if any).
121+
122+
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L223-L227
123+
124+
Decoding is just the inverse of the encoding.
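To make the rules above concrete, here is a standalone sketch of the `i64`, `f64`, and `Vec<u8>` encodings as described in the list. The function names `encode_i64`, `encode_f64`, and `encode_bytes` are hypothetical and this is not toyDB's Serde serializer; the sketch also assumes floats are never NaN.

```rust
/// i64: big-endian, with the sign bit flipped so negatives sort first.
fn encode_i64(v: i64) -> [u8; 8] {
    let mut bytes = v.to_be_bytes();
    bytes[0] ^= 1 << 7;
    bytes
}

/// f64: big-endian IEEE 754. Non-negative values get the sign bit flipped;
/// negative values get all bits flipped, so larger magnitudes sort first.
fn encode_f64(v: f64) -> [u8; 8] {
    let mut bytes = v.to_be_bytes();
    if bytes[0] & (1 << 7) == 0 {
        bytes[0] ^= 1 << 7;
    } else {
        bytes.iter_mut().for_each(|b| *b = !*b);
    }
    bytes
}

/// Bytes: 00 is escaped as 00 ff, and the value is terminated by 00 00.
fn encode_bytes(v: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(v.len() + 2);
    for &b in v {
        out.push(b);
        if b == 0x00 {
            out.push(0xff);
        }
    }
    out.extend_from_slice(&[0x00, 0x00]);
    out
}

fn main() {
    // Encoded integers sort in numeric order.
    let ints = [i64::MIN, -2, -1, 0, 1, i64::MAX];
    assert!(ints.windows(2).all(|w| encode_i64(w[0]) < encode_i64(w[1])));

    // Encoded floats sort in numeric order.
    let floats = [f64::NEG_INFINITY, -2.5, -1.0, 0.0, 1.0, 2.5, f64::INFINITY];
    assert!(floats.windows(2).all(|w| encode_f64(w[0]) < encode_f64(w[1])));

    // Terminated strings can be concatenated and still sort element-wise:
    // ("a", "xyz") sorts before ("ab", "cd").
    let first = [encode_bytes(b"a"), encode_bytes(b"xyz")].concat();
    let second = [encode_bytes(b"ab"), encode_bytes(b"cd")].concat();
    assert!(first < second);

    println!("all orderings preserved");
}
```

The `00 00` terminator is what makes plain concatenation safe for sequences: the terminator of a shorter string always compares lower than any continuation of a longer one.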
125+
126+
Like `encoding::Value`, there is also an `encoding::Key` helper trait:
127+
128+
https://github.com/erikgrinaker/toydb/blob/b57ae6502e93ea06df00d94946a7304b7d60b977/src/encoding/mod.rs#L20-L37
129+
130+
We typically use enums to represent different kinds of keys. For example, if we wanted to store
131+
cars and video games, we could use:
132+
133+
```rust
#[derive(serde::Serialize, serde::Deserialize)]
enum Key {
    Car(String, String, u64), // make, model, year
    Game(String, u64, Platform), // name, year, platform
}

#[derive(serde::Serialize, serde::Deserialize)]
enum Platform {
    PC,
    PS5,
    Switch,
    Xbox,
}

impl encoding::Key for Key {}

let returnal = Key::Game("Returnal".into(), 2021, Platform::PS5);
let bytes = returnal.encode();
println!("{bytes:02x?}");

// Outputs [01, 52, 65, 74, 75, 72, 6e, 61, 6c, 00, 00, 00, 00, 00, 00, 00, 00, 07, e5, 01].
//
// * Key::Game: 01
// * Returnal: 52 65 74 75 72 6e 61 6c 00 00
// * 2021: 00 00 00 00 00 00 07 e5
// * Platform::PS5: 01

let returnal = Key::decode(&bytes)?;
```
163+
164+
Because the keys are sorted in element-wise order, this would allow us to e.g. perform a prefix
165+
scan to fetch all platforms which Returnal (2021) was released on, or perform a range scan to fetch
166+
all models of Nissan Altima released between 2010 and 2015.
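The range scan can be sketched with a `BTreeMap` standing in for the ordered key/value store. Everything here is an assumption for illustration: `car_key` hand-encodes a `Key::Car` following the rules described above rather than calling toyDB's `encoding::Key`, and `scan_years` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Appends a string terminated by 00 00, with 00 escaped as 00 ff.
fn encode_str(s: &str, out: &mut Vec<u8>) {
    for &b in s.as_bytes() {
        out.push(b);
        if b == 0x00 {
            out.push(0xff);
        }
    }
    out.extend_from_slice(&[0x00, 0x00]);
}

/// Hypothetical helper: hand-encodes Key::Car(make, model, year).
fn car_key(make: &str, model: &str, year: u64) -> Vec<u8> {
    let mut out = vec![0x00]; // variant index of Key::Car
    encode_str(make, &mut out);
    encode_str(model, &mut out);
    out.extend_from_slice(&year.to_be_bytes()); // big-endian u64
    out
}

/// Returns the values for a make and model within an inclusive year range.
fn scan_years(
    store: &BTreeMap<Vec<u8>, String>,
    make: &str,
    model: &str,
    from: u64,
    to: u64,
) -> Vec<String> {
    store
        .range(car_key(make, model, from)..=car_key(make, model, to))
        .map(|(_, value)| value.clone())
        .collect()
}

fn main() {
    let mut store = BTreeMap::new();
    for (make, model, year) in [
        ("Nissan", "Altima", 2008),
        ("Nissan", "Altima", 2012),
        ("Nissan", "Altima", 2015),
        ("Nissan", "Altima", 2019),
        ("Nissan", "Leaf", 2013),
        ("Toyota", "Corolla", 2012),
    ] {
        store.insert(car_key(make, model, year), format!("{make} {model} {year}"));
    }

    let hits = scan_years(&store, "Nissan", "Altima", 2010, 2015);
    assert_eq!(hits, vec!["Nissan Altima 2012", "Nissan Altima 2015"]);
    println!("{hits:?}");
}
```

Because the year is the last element of the key and is encoded big-endian, all Altima entries are adjacent and ordered by year, so the scan touches only the matching range.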
167+
168+
---
169+
170+
<p align="center">
171+
← <a href="storage.md">Storage Engine</a> &nbsp; | &nbsp; <a href="mvcc.md">MVCC Transactions</a> →
172+
</p>
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

docs/architecture/index.md

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
1+
# toyDB Architecture
2+
3+
toyDB is a simple distributed SQL database, intended to illustrate how such systems are built. The
4+
overall structure is similar to real-world distributed databases, but the design and implementation
5+
have been kept as simple as possible for understandability. Performance and scalability are explicit
6+
non-goals, as these are major sources of complexity in real-world systems.
7+
8+
This guide will walk through toyDB's architecture and code from the bottom up, with plenty of links
9+
to the actual code.
10+
11+
> ℹ️ View on GitHub desktop for inline code listings.
12+
13+
* [Overview](overview.md)
14+
* [Properties](overview.md#properties)
15+
* [Components](overview.md#components)
16+
* [Storage Engine](storage.md)
17+
* [`Memory` Storage Engine](storage.md#memory-storage-engine)
18+
* [`BitCask` Storage Engine](storage.md#bitcask-storage-engine)
19+
* [Key/Value Encoding](encoding.md)
20+
* [`Bincode` Value Encoding](encoding.md#bincode-value-encoding)
21+
* [`Keycode` Key Encoding](encoding.md#keycode-key-encoding)
22+
* [MVCC Transactions](mvcc.md)
23+
* [Raft Consensus](raft.md)
24+
* [Log Storage](raft.md#log-storage)
25+
* [State Machine Interface](raft.md#state-machine-interface)
26+
* [Node Roles](raft.md#node-roles)
27+
* [Node Interface and Communication](raft.md#node-interface-and-communication)
28+
* [Leader Election and Terms](raft.md#leader-election-and-terms)
29+
* [Client Requests and Forwarding](raft.md#client-requests-and-forwarding)
30+
* [Write Replication and Application](raft.md#write-replication-and-application)
31+
* [Read Processing](raft.md#read-processing)
32+
* [SQL Engine](sql.md)
33+
* [Data Model](sql-data.md)
34+
* [Data Types](sql-data.md#data-types)
35+
* [Schemas](sql-data.md#schemas)
36+
* [Expressions](sql-data.md#expressions)
37+
* [Storage](sql-storage.md)
38+
* [Key/Value Representation](sql-storage.md#keyvalue-representation)
39+
* [Schema Catalog](sql-storage.md#schema-catalog)
40+
* [Row Storage and Transactions](sql-storage.md#row-storage-and-transactions)
41+
* [Raft Replication](sql-raft.md)
42+
* [Parsing](sql-parser.md)
43+
* [Lexer](sql-parser.md#lexer)
44+
* [Abstract Syntax Tree](sql-parser.md#abstract-syntax-tree)
45+
* [Parser](sql-parser.md#parser)
46+
* [Planning](sql-planner.md)
47+
* [Execution Plan](sql-planner.md#execution-plan)
48+
* [Scope and Name Resolution](sql-planner.md#scope-and-name-resolution)
49+
* [Planner](sql-planner.md#planner)
50+
* [Optimization](sql-optimizer.md)
51+
* [Constant Folding](sql-optimizer.md#constant-folding)
52+
* [Filter Pushdown](sql-optimizer.md#filter-pushdown)
53+
* [Index Lookups](sql-optimizer.md#index-lookups)
54+
* [Hash Join](sql-optimizer.md#hash-join)
55+
* [Short Circuiting](sql-optimizer.md#short-circuiting)
56+
* [Execution](sql-execution.md)
57+
* [Plan Executor](sql-execution.md#plan-executor)
58+
* [Session Management](sql-execution.md#session-management)
59+
* [Server and Client](server.md)
60+
* [Server](server.md#server)
61+
* [Raft Routing](server.md#raft-routing)
62+
* [SQL Service](server.md#sql-service)
63+
* [`toydb` Binary](server.md#toydb-binary)
64+
* [Client Library](server.md#client-library)
65+
* [`toysql` Binary](server.md#toysql-binary)
66+
67+
---
68+
69+
<p align="center">
70+
<a href="overview.md">Overview</a> →
71+
</p>
