// garage/doc/talks/2026-06-03-adr/talk.typ

#import "@preview/slydst:0.1.5": *
// some display rules
#set par(spacing: 2em)
#set list(spacing: 1em)
#show link: set text(font: "DejaVu Sans Mono", size: 9pt)
// some functions to customize styles
#let vhcenter(content) = [
#v(1fr)
#align(center)[#content]
#v(1fr)
]
#let imgcenter(..args) = vhcenter(image(..args))
#let mytable(..args) = {
show table.cell: set text(size: 9pt)
set table(stroke: 0.5pt + black)
grid(
columns: (1cm, 1fr, 1cm),
[], table(..args), []
)
}
// actual slides
#show: slides.with(
//title: "Garage",
authors: ("Alex Auvolat",),
date: "2026-06-03",
layout: "large",
//ratio: 16/9,
ratio: 4/3,
title-color: rgb("#ff9329"),
)
#title-slide[
#align(center)[
#image("../../sticker/Garage.png", width: 20%)
#v(1em)
*An introduction to Garage*\
Alex Auvolat, Deuxfleurs
#v(1em)
#link("https://garagehq.deuxfleurs.fr/")\
Matrix channel: `#garage:deuxfleurs.fr`
]
]
== A non-profit initiative
#grid(
columns: (2fr, 8fr),
[#v(2em)],[],
[
#image("../assets/logos/deuxfleurs.svg", width: 50%)
],
[
*Part of a degrowth initiative*\
Garage was created at Deuxfleurs, where we experiment with running Internet services on commodity and refurbished hardware, without a datacenter.
],
[#v(2em)],[],
[
#image("../assets/community.png", width: 50%)
],
[
*Developed by a community*\
#text(size: 0.8em)[Some recent contributors: Arthur C, Charles H, dongdigua, Etienne L, Jonah A, Julien K, Lapineige, MagicRR, Milas B, Niklas M, RockWolf, Schwitzd, trinity-1686a, Xavier S, babykart, Baptiste J, eddster2309, James O'C, Joker9944, Maximilien R, Renjaya RZ, Yureka...]
],
[#v(3em)],[],
[
#image("../assets/logos/AGPLv3_Logo.png", width: 50%)
],
[
*Owned by nobody, open-core is impossible, zero VC money*\
AGPL + no Contributor License Agreement = ownership of Garage is spread among hundreds of contributors.
]
)
== Our initial objective at Deuxfleurs
#v(4em)
#align(center)[
#text(weight: "bold")[
Promote self-hosting and small-scale hosting\
as an alternative to large cloud providers
]
]
#v(2em)
Why is it hard?
#v(2em)
#align(center)[
#underline[Resilience]\
#text(size: 0.8em)[we want good uptime/availability with low supervision]
]
== Our very low-tech infrastructure
//== Building a resilient system with cheap stuff
//
#v(4em)
#[
#set list(spacing: 2em)
- Commodity hardware (e.g. old desktop PCs)\
#text(size: 0.8em)[(can die at any time)]
- Regular Internet (e.g. FTTB, FTTH) and power grid connections\
#text(size: 0.8em)[(can be unavailable randomly)]
- *Geographical redundancy* (multi-site replication)
]
#pagebreak()
#imgcenter("../assets/neptune.jpg", width: 100%)
#pagebreak()
#imgcenter("../assets/atuin.jpg", width: 100%)
#pagebreak()
#imgcenter("../assets/inframap_jdll2023.pdf", width: 100%)
== Object storage: a crucial component
#vhcenter[
#grid(
columns: (3fr, 3fr, 3fr),
[#image("../assets/logos/Amazon-S3.jpg", height: 6em)],
[#image("../assets/logos/minio.png", height: 5em)],
[#image("../../logo/garage_hires_crop.png", height: 6em)]
)
]
S3: a de-facto standard, many compatible applications
MinIO: not suited for geo-distributed deployments, becoming closed source

*Garage is a self-hosted drop-in replacement for the Amazon S3 object store*

#v(2em)

== Principle 1: based on CRDTs
#v(1cm)
#underline[Internally, Garage uses only CRDTs] (conflict-free replicated data types)
Why not Raft, Paxos, ...? Issues of consensus algorithms:
- *Software complexity*
- *Performance issues:*
- The leader is a *bottleneck* for all requests
- *Sensitive to higher latency* between nodes
- *Takes time to reconverge* when disrupted (e.g. node going down)
== The data model of object storage
#[
#set list(spacing: 1em)
Object storage is basically a *key-value store*:
#mytable(
columns: (2fr, 5fr),
align: left,
[*Key: file path + name*], [*Value: file data + metadata*],
[`index.html`], text(size: 8pt)[
`Content-Type: text/html; charset=utf-8`\
`Content-Length: 24929`\
`<binary blob>`
],
[`img/logo.svg`], text(size: 8pt)[
`Content-Type: text/svg+xml`\
`Content-Length: 13429`\
`<binary blob>`
],
[`download/index.html`], text(size: 8pt)[
`Content-Type: text/html; charset=utf-8`\
`Content-Length: 26563`\
`<binary blob>`
]
)
*Consistency model:*
- Not ACID (not required by S3 spec) / not linearizable
- *Read-after-write consistency*\
#text(size: 0.8em)[(stronger than eventual consistency)]
]
== Principle 2: geo-distributed data model
#imgcenter("../assets/map.png", width: 90%)

Garage stores replicas in different zones whenever possible

== Zone-aware cluster configuration

#imgcenter("../assets/screenshots/garage_status_0.9_prod_zonehl.png", width: 100%)

== Performance evaluation

#imgcenter("../assets/perf/endpoint_latency_0.7_0.8_minio.png", width: 100%)
#pagebreak()

#imgcenter("../assets/perf/ttfb.png", width: 100%)
#pagebreak()

#imgcenter("../assets/perf/io-0.7-0.8-minio.png", width: 100%)

= Deploying Garage

== Garage in the wild

#imgcenter("../assets/cluster_kind.png", width: 100%)

== Size of known deployments

#imgcenter("../assets/cluster_size.png", width: 100%)

_"Petabyte storage setup for a video site. Nginx as CDN in-front using garage-s3-website feature. Each storage node has ~64TB storage with raid10, no replication within garage. 25gbit nic. haproxy to loadbalance across 5 nodes. mostly reads with very few writes."_

_"We currently manage 7 Garage nodes, 28TB total storage, 6M blocks for 3M objects and 4TB of object data. We have been running Garage in production for 2.5 years."_

== Setting up data and metadata storage
#vhcenter[
#mytable(
columns: (0.7fr, 1fr, 1fr),
inset: 0.8em,
align: center + horizon,
table.header[][*Metadata storage*][*Data storage*],
[*Content*],[access keys, buckets\ index of objects],[raw data blocks],
[*Size*],[\< 10\% of data\ rarely over 100GB],[replication × dataset size\ *no erasure-coding*],
[*Constraints*],[latency sensitive\ write-intensive under load],[big\ many files],
[*Ideal hardware*],[enterprise-grade SSD],[HDD],
[*Recommended redundancy*],[RAID1],[none, use disks directly\ *avoid RAID if possible*],
[*Recommended filesystem*],[ZFS, Btrfs],[XFS on individual disks],
[*Tunables in Garage*],[database engine\
automatic snapshots],[block size\ compression],
)
]
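For reference, the storage-related part of `garage.toml` might look like the following sketch. Paths and values are examples only; adapt them to your hardware:

```toml
# Sketch of storage-related settings in garage.toml -- paths are examples
metadata_dir = "/var/lib/garage/meta"  # enterprise-grade SSD, RAID1
data_dir = "/mnt/hdd1/garage/data"     # HDD on XFS, no RAID
db_engine = "lmdb"                     # metadata engine (see "Picking a metadata engine")
compression_level = 1                  # zstd compression of data blocks
```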
== Choosing a replication factor
#vhcenter[
#mytable(
columns: (0.7fr, 1fr, 1.3fr),
inset: 0.8em,
align: center + horizon,
table.header[*Replication factor*][*Pro*][*Cons*],
[*1*], [easy single-node setup\ full space efficiency], [no metadata redundancy\ *vulnerable to hardware crash or data corruption*\ no high-availability],
[*2*], [redundancy\ limited storage overhead], [limited high-availability\ (read-only when one node is unavailable)],
[*3*], [high-availability setup\ best data resilience], [big storage overhead],
[*4, 5, ...*], [possible if needed], [...],
)
#v(0.5cm)
*Important note:* the same replication factor applies to both metadata and data\
Choose well, it cannot be changed easily!
]
== Picking a metadata engine
#vhcenter[
The metadata engine stores all file-to-block mappings, as well as bucket and object metadata. Files smaller than 3 KB are stored directly in the metadata engine.
#v(0.5cm)
#mytable(
columns: (0.7fr, 1fr, 1.3fr),
inset: 0.8em,
align: center + horizon,
table.header[*Metadata engine*][*Characteristics*][*Use case*],
[*SQLite*],[safer],[single node deployment\ small clusters\ clusters with infrequent access],
[*LMDB*],[faster\ sometimes has inexplicable corruptions],[larger clusters with metadata redundancy],
[*Fjall*],[experimental\ best of both worlds?],[help us test it!],
)
#v(0.5cm)
The metadata engine can be chosen per node, and changed later with a migration tool
]
== Avoiding common issues as soon as possible
#vhcenter[
#mytable(
columns: (1fr, 1.4fr),
inset: 0.8em,
align: center + horizon,
table.header[*Risk*][*How to avoid*],
[*Metadata corruption*\ (esp. with LMDB)],[Configure automatic snapshots with\ `metadata_auto_snapshot_interval`\ Use replication factor 2 or 3],
[*Data not well balanced between nodes*],[Avoid clusters with too many nodes\ Target: \#nodes ≤ 10 × replication_factor],
[*Performance issues with many objects in one single bucket*],[Spread your data over multiple buckets],
[*Performance issues with big objects*],[Increase `block_size` configuration parameter\ Target: object size ≤ 1000 × `block_size`,\ `block_size` ≤ 100MB],
[*Performance issues with many small objects*],[Have enough RAM to fit the entire metadata DB],
)
]
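In `garage.toml`, the two tunables mentioned above would look like this sketch (the values are illustrative; tune them to your workload):

```toml
# Illustrative values only -- tune to your workload
metadata_auto_snapshot_interval = "6h"  # guard against metadata corruption
block_size = "10M"                      # raise the default of 1M for very large objects
```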
== Other things to consider during set-up
#vhcenter[
#mytable(
columns: (1fr, 1.2fr),
inset: 0.8em,
align: center + horizon,
[*Tools for cluster deployment*],[Ansible + systemd\ NixOS\ Kubernetes or Nomad with Docker],
[*Initial cluster setup*],[Manual layout configuration\ Read the documentation!],
[*TLS support on public endpoints*],[Add an external reverse-proxy (Nginx, ...)],
[*S3 anonymous access*],[Not implemented, use website endpoint],
[*Monitoring*],[Prometheus + Grafana for Garage metrics\ External tool to monitor HDD health],
)
]
== Monitoring with Prometheus + Grafana
#imgcenter("../2026-01-31-fosdem/assets/garage-stats.png", width: 83%)
== Common issues and their solutions
#vhcenter[
#mytable(
columns: (1fr, 1.5fr),
inset: 0.8em,
align: center + horizon,
table.header[*Problem*][*Solution*],
table.cell(rowspan: 2)[*S3 access authorization issues*],[Correctly set the `region` parameter in your S3 client\ default = `garage`, not `us-east-1`],[Check your reverse proxy configuration],
[*Debugging other API issues*],[Set `RUST_LOG=garage=debug` to investigate],
[*Resync queue fills up*],[`garage worker set -a resync-worker-count 8`\ `garage worker set -a resync-tranquility 0`],
[*LMDB database too big*],[Stop garage and compact with `mdb_copy -c`],
[*Data recovery with dead/unavailable nodes*],[Consistency mode `degraded` allows reading data from an unhealthy cluster. *Do not use it for regular operation.*],
[*Other issues*],[Ask us on matrix `#garage:deuxfleurs.fr` or open an issue on `git.deuxfleurs.fr`\
Provide the output of `garage status`, `garage stats` and relevant metrics and logs],
)
]
== Future developments

#imgcenter("../assets/survey_requested_features.png", width: 80%)

#pagebreak()
#imgcenter("../2026-01-31-fosdem/assets/Garage Web Admin - Dashboard@2x.png", width: 100%)

#pagebreak()
#imgcenter("../2026-01-31-fosdem/assets/Garage Web Admin - Bucket details page@2x.png", width: 100%)

== Where to find us
#align(center)[
#v(1fr)
#image("../../logo/garage_hires.png", width: 25%)
#link("https://garagehq.deuxfleurs.fr/")\
#link("mailto:garagehq@deuxfleurs.fr")\
`#garage:deuxfleurs.fr` on Matrix
#v(1fr)
#grid(columns: (6%,3%,13%),
image("../assets/logos/rust_logo.png"),
[],
image("../assets/logos/AGPLv3_Logo.png"),
)
]