The OPTIONS placeholder for buckets without a resolvable global alias returns `Access-Control-Allow-Origin: *` and `Access-Control-Allow-Methods: *` but omits `Access-Control-Allow-Headers`.
Bug verified against Garage v2.2.0 with a local-aliased bucket: the OPTIONS placeholder lacks `Access-Control-Allow-Headers`, which causes the browser to reject preflights for signed PUT requests.
The current placeholder fails open for unsigned simple requests but blocks every signed request, undermining the design intent flagged in the FIXME:
```rs
// We take the permissive approach of allowing everything,
// because we don't want to prevent web apps that use
// local bucket names from making API calls.
```
This PR adds `Access-Control-Allow-Headers: *` so the permissive default is actually permissive for the request shapes that exist in practice.
Refs #258. Does not address the broader FIXME (CORS rule resolution for local-aliased buckets); the placeholder approach is preserved.
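A minimal sketch of the resulting placeholder response, assuming `http` crate types (the actual construction in Garage may differ):
```rs
// Hypothetical sketch only; Garage's real placeholder code differs.
use http::{Response, StatusCode};

fn options_placeholder() -> Response<()> {
    Response::builder()
        .status(StatusCode::OK)
        .header("Access-Control-Allow-Origin", "*")
        .header("Access-Control-Allow-Methods", "*")
        // The added header: signed requests always carry Authorization
        // and x-amz-* request headers, so preflights fail without it.
        .header("Access-Control-Allow-Headers", "*")
        .body(())
        .unwrap()
}
```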
All tests pass locally:
```bash
cargo test -p garage_api_common cors::
running 5 tests
test cors::tests::preflight_with_single_allowed_origin_returns_request_origin ... ok
test cors::tests::preflight_with_multiple_allowed_origins_reflects_request_origin ... ok
test cors::tests::preflight_with_wildcard_allowed_origin_returns_wildcard ... ok
test xml::cors::tests::test_deserialize_norules ... ok
test xml::cors::tests::test_deserialize ... ok
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 16 filtered out; finished in 0.00s
```
Co-authored-by: smattymatty <smattymatt@gmail.com>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1450
Reviewed-by: Alex <lx@deuxfleurs.fr>
## Title
fix(cors): return single matching origin instead of multiple values in `Access-Control-Allow-Origin`
## Summary
This PR fixes bucket CORS responses when a single CORS rule contains multiple `AllowedOrigins`.
Previously, Garage returned the configured origins as a comma-separated list in `Access-Control-Allow-Origin`, for example:
```http
Access-Control-Allow-Origin: https://app.example.test, https://admin.example.test
```
This is not the expected browser-facing behavior.
When a request origin matches a configured rule, the response should reflect **only the matching request origin**, unless the rule contains `*`.
## What changed
- `Access-Control-Allow-Origin` now behaves as follows (sketched after this list):
- returns `*` when the matched rule contains a wildcard origin
- otherwise returns the request `Origin` as a **single value**
- added `Vary: Origin` when ACAO reflects the request origin
- added preflight-specific `Vary` handling in the preflight path for:
- `Origin`
- `Access-Control-Request-Method`
- `Access-Control-Request-Headers`
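A minimal sketch of the origin-matching behavior described above, with hypothetical names (`allowed_origins` stands in for a rule's configured origins):
```rs
// Illustrative only; Garage's actual rule types and call sites differ.
fn allow_origin_value(allowed_origins: &[String], request_origin: &str) -> Option<String> {
    if allowed_origins.iter().any(|o| o == "*") {
        // Wildcard rule: return `*` as-is.
        Some("*".to_string())
    } else if allowed_origins.iter().any(|o| o == request_origin) {
        // Reflect only the single matching request origin,
        // and emit `Vary: Origin` alongside it.
        Some(request_origin.to_string())
    } else {
        None
    }
}
```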
## Scope
This change applies to shared bucket CORS handling paths, including:
- S3 API responses
- K2V API responses
- S3 POST object responses
- web bucket responses
- preflight (`OPTIONS`) bucket CORS responses
This does **not** change admin API fixed CORS behavior.
## Reproduction
A direct repro script is included:
```bash
./script/test-cors-multi-origin.sh
```
It exercises two cases against a local single-node Garage instance:
1. **single-origin control**
2. **multi-origin repro**
Before this fix, the multi-origin case returned a comma-separated ACAO value.
After this fix, both cases reflect only the request origin.
## Example behavior
### Before
```http
Access-Control-Allow-Origin: https://app.example.test, https://admin.example.test
```
### After
```http
Access-Control-Allow-Origin: https://app.example.test
```
## Tests
Added/updated tests in `src/api/common/cors.rs` for:
- single-origin control
- multiple allowed origins reflecting the request origin
- wildcard origin preserving `*`
- preserving existing `Vary` values while appending `Origin`
## Validation
Used for validation:
```bash
cargo test -p garage_api_common cors::tests -- --nocapture
cargo build -p garage --bin garage
./script/test-cors-multi-origin.sh
```
## Reproducibility
For reviewers who want to validate behavior by commit:
- Before fix: `aa368e4b`
  - includes the direct repro script and the regression test setup
  - multi-origin ACAO is reproduced as a comma-separated value
- After fix: `f630eb92`
  - reflects only the matching request origin
  - preserves wildcard behavior
  - adds `Vary: Origin` and preflight-specific `Vary` handling
Branch:
- `fix/cors-multiple-allow-origin`
Base used during validation:
- `74ad3bf8` (`main-v2`)
Closes Deuxfleurs/garage#1149
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1419
## Summary
Garage's SigV4 canonical-request builder trims leading/trailing whitespace from signed header values but does not collapse sequential internal whitespace, which the SigV4 spec requires:
> Convert sequential spaces to a single space.
— https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
AWS SDKs apply this normalization before computing the signature, but transmit the raw value on the wire. The receiver must therefore apply the same normalization when reconstructing the canonical request, otherwise the recomputed hash differs and the request is rejected as `Invalid signature`.
Same class of canonicalization-drift bug as #1155 / !1382, but on the canonical-headers axis rather than the canonical-URI axis.
## Reproduction
This surfaces in practice with `gitlab-runner`'s S3 cache uploader. I was in the midst of migrating my runner cache from AWS S3 to Garage when I noticed that some shared runner caches were no longer uploading.
I was using `sha256sum | sha256sum` to compute my cache keys, which leaves a trailing ` -` on the value. Once GitLab appends `-protected` for protected branches, the resulting `x-amz-meta-cachekey` header value contains internal sequential whitespace and triggers the mismatch:
```
x-amz-meta-cachekey:php-  --protected
                        ^^
                        two spaces, preserved by Garage
```
Without the fix the included regression test (`test_presigned_put_with_user_metadata`) fails with HTTP 403; with the fix it returns 200.
`aws-cli` is unaffected because it signs `Content-Type` rather than user metadata, so the specific code path with whitespace-bearing signed header values isn't exercised.
## Fix
In `canonical_request` (`src/api/common/signature/payload.rs`), replace the `.trim()` call on the joined header value with the full SigV4 normalization — `split_whitespace().collect::<Vec<_>>().join(" ")` — which both trims edges and collapses internal runs.
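A minimal sketch of that normalization, factored into a hypothetical helper (the real change is inline in `canonical_request`):
```rs
// Illustrative helper; the actual fix replaces a `.trim()` call inline.
fn sigv4_normalize(raw: &str) -> String {
    // Trims leading/trailing whitespace and collapses internal
    // sequential whitespace to single spaces, per the SigV4 spec.
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}

#[test]
fn collapses_sequential_whitespace() {
    assert_eq!(sigv4_normalize(" php-  --protected "), "php- --protected");
}
```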
## Tests
* New regression test `test_presigned_put_with_user_metadata` covering a presigned PUT whose `x-amz-meta-*` value contains internal sequential whitespace.
* Full integration suite passes: `40 passed; 0 failed; 2 ignored`.
* `garage_api_common` unit tests pass: `18 passed; 0 failed`.
## Notes
* Backwards-compatible: any signature that validated before still validates, because clients are spec-required to collapse on their side; Garage was only rejecting requests where the client had collapsed correctly but Garage hadn't.
* No config or migration changes.
* Fix applies to both presigned-URL and Authorization-header code paths since they share the canonical-request builder.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1424
Reviewed-by: Alex <lx@deuxfleurs.fr>
known_addrs in PeerInfoInternal is append-only — addresses accumulate
via add_addr() and PeerList gossip but are never removed. In dynamic
environments (k8s pod restarts, DHCP, NAT traversal), this list grows
unboundedly with stale addresses.
Combined with sequential iteration in try_connect() and no TCP connect
timeout in netapp.rs, each unreachable address blocks reconnection for
the kernel's TCP SYN timeout (75-130s on Linux). With 10+ stale
addresses, worst-case reconnection exceeds 750s — a full outage for
replication_factor=3 clusters.
This commit contains the following two changes (sketched below):
1. Address failure tracking and pruning (peering.rs): Track consecutive
connection failures per address in PeerInfoInternal. After 3 failures,
prune from known_addrs. Reset count when address is re-advertised via
gossip or incoming connection. Prevents unbounded list growth.
2. Shuffle before connecting (peering.rs): Randomize address order in
try_connect() so the valid address (often appended last) gets a fair
chance instead of always trying stale addresses first.
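A hedged sketch of both changes; `PeerInfoInternal`'s real fields and the actual peering.rs call sites differ:
```rs
// Hypothetical sketch of changes 1 and 2; names are illustrative.
use rand::seq::SliceRandom;
use std::collections::HashMap;
use std::net::SocketAddr;

const MAX_CONSECUTIVE_FAILURES: u32 = 3;

struct KnownAddrs {
    // Consecutive connection failures per known address.
    failures: HashMap<SocketAddr, u32>,
}

impl KnownAddrs {
    // Change 1: after 3 consecutive failures, prune the address.
    fn record_failure(&mut self, addr: SocketAddr) {
        let count = self.failures.entry(addr).or_insert(0);
        *count += 1;
        let count = *count;
        if count >= MAX_CONSECUTIVE_FAILURES {
            self.failures.remove(&addr);
        }
    }

    // Reset the count when the address is re-advertised via gossip
    // or seen on an incoming connection.
    fn record_seen(&mut self, addr: SocketAddr) {
        self.failures.insert(addr, 0);
    }

    // Change 2: shuffle the candidate order for try_connect() so a
    // valid address appended last gets a fair chance.
    fn candidates(&self) -> Vec<SocketAddr> {
        let mut addrs: Vec<_> = self.failures.keys().copied().collect();
        addrs.shuffle(&mut rand::thread_rng());
        addrs
    }
}
```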
A separate patch includes a first change to fix this issue (sketched below):
1. TCP connect timeout (netapp.rs): Wrap TcpStream::connect() in
tokio::time::timeout(10s). Caps per-address attempt from 75-130s
to 10s, reducing worst-case 10-addr reconnection from ~750s to ~100s.
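A minimal sketch of the wrapper, assuming tokio (the surrounding connection logic in netapp.rs is elided):
```rs
use std::io;
use std::net::SocketAddr;
use std::time::Duration;
use tokio::net::TcpStream;

async fn connect_with_timeout(addr: SocketAddr) -> io::Result<TcpStream> {
    match tokio::time::timeout(Duration::from_secs(10), TcpStream::connect(addr)).await {
        // Connect completed (successfully or with an error) within 10s.
        Ok(res) => res,
        // Timer fired first: cap the attempt at 10s instead of the
        // kernel's 75-130s TCP SYN timeout.
        Err(_elapsed) => Err(io::Error::new(io::ErrorKind::TimedOut, "connect timed out")),
    }
}
```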
## Summary
This PR ensures that the `LifecycleWorker` yields at least once to the Tokio scheduler in between each batch of 100 objects.
## Problem being solved
I'm administering a Garage cluster which has been experiencing timeouts on all endpoints while the lifecycle worker is running at midnight UTC: `Ping timeout` error messages, and even requests eventually failing due to `Could not reach quorum ...`.
I have found that this happens while the lifecycle worker is working on a big bucket (containing millions of objects) with a lifecycle rule that applies to very few objects.
The `process_object()` function does not hit any `await`:
- `last_bucket` is always the same, so the `bucket_table` is not read asynchronously
- no transaction is made on the `object_table` because my lifecycle rule (almost) never applies to any object
The first commit in this PR adds an executable which reproduces the problem I've been experiencing in a self-contained way: the lifecycle worker starves the Tokio scheduler so much that no other task is able to run (or only very rarely).
To run it: `cargo run -p garage_model --bin lifecycle-starvation-test`.
This commit can be dropped post-review, as it's only useful to demonstrate the starvation.
The error messages completely stopped after adding the extra yield to the nodes of my cluster.
The duration of the lifecycle worker task does not appear to have changed at all from what I can see (looking at the timestamps produced either by the self-contained binary or by each of my nodes with the `Lifecycle worker finished` message).
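A minimal sketch of the fix, with hypothetical names for the batch loop (the real `LifecycleWorker` code differs):
```rs
// Illustrative shape of the worker loop.
async fn process_batch<T>(objects: &[T], mut process_object: impl FnMut(&T)) {
    for obj in objects {
        // May complete without reaching any .await when no rule applies.
        process_object(obj);
    }
    // Yield to the Tokio scheduler between each batch of 100 objects
    // so ping handlers and other tasks are not starved.
    tokio::task::yield_now().await;
}
```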
## Note
Another potential fix would have been to force the `WorkerProcessor` to yield before re-enqueuing a busy task, but this would have affected all Garage workers even though only the `LifecycleWorker` is being uncooperative.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1396
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: Gauthier Zirnhelt <gauthier.zirnhelt@insimo.fr>
Co-committed-by: Gauthier Zirnhelt <gauthier.zirnhelt@insimo.fr>
This fixes a regression wrt garage-v1, likely caused by the version upgrade of quick_xml.
Currently, garage-v2 emits empty ErrorDocument/IndexDocument/RedirectAllRequestsTo elements in the response of GetBucketWebsite if there are no corresponding values.
This is somewhat wrong; at least, the S3 documentation for RedirectAllRequestsTo (https://docs.aws.amazon.com/AmazonS3/latest/API/API_RedirectAllRequestsTo.html) states that it has a required HostName field, so emitting an empty RedirectAllRequestsTo is invalid.
This PR skips emitting these XML elements if they contain no value.
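A hedged sketch of the usual serde-level way to skip empty optional elements; Garage's actual struct definitions and quick_xml setup may differ:
```rs
use serde::Serialize;

// Hypothetical simplified shapes; the real structs use dedicated types.
#[derive(Serialize)]
struct WebsiteConfiguration {
    #[serde(rename = "IndexDocument", skip_serializing_if = "Option::is_none")]
    index_document: Option<IndexDocument>,
    #[serde(rename = "ErrorDocument", skip_serializing_if = "Option::is_none")]
    error_document: Option<ErrorDocument>,
    #[serde(rename = "RedirectAllRequestsTo", skip_serializing_if = "Option::is_none")]
    redirect_all_requests_to: Option<RedirectTarget>,
}

#[derive(Serialize)]
struct IndexDocument {
    #[serde(rename = "Suffix")]
    suffix: String,
}

#[derive(Serialize)]
struct ErrorDocument {
    #[serde(rename = "Key")]
    key: String,
}

#[derive(Serialize)]
struct RedirectTarget {
    // HostName is required by the S3 API, hence never empty here.
    #[serde(rename = "HostName")]
    host_name: String,
}
```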
Co-authored-by: Armaël Guéneau <armael.gueneau@ens-lyon.org>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1391
Co-authored-by: Armael <armael@noreply.localhost>
Co-committed-by: Armael <armael@noreply.localhost>
This is a port of #1320 on top of the main-v2 branch.
Co-authored-by: Armaël Guéneau <armael.gueneau@ens-lyon.org>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1392
Co-authored-by: Armael <armael@noreply.localhost>
Co-committed-by: Armael <armael@noreply.localhost>
This makes it easier to correlate an error with the request that caused it, which can be helpful during debugging or when setting up some sort of automation based on log content.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1390
Reviewed-by: Alex <lx@deuxfleurs.fr>
Reviewed-by: maximilien <git@mricher.fr>
Co-authored-by: trinity-1686a <trinity@deuxfleurs.fr>
Co-committed-by: trinity-1686a <trinity@deuxfleurs.fr>
Made a quick PR to add a sub-command called `completions` for generating shell completions; I was going pretty crazy that this wasn't a thing :P.
Tried my best to do everything properly; let me know if I need to change something. I tested it and it works perfectly.
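For reference, a hedged sketch of how such a subcommand is typically wired up with `clap_complete` (assuming that crate; Garage's actual CLI types differ):
```rs
use clap::Command;
use clap_complete::{generate, Shell};

// Print a completion script for the given shell to stdout.
fn print_completions(shell: Shell, cmd: &mut Command) {
    let name = cmd.get_name().to_string();
    generate(shell, cmd, name, &mut std::io::stdout());
}
```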
Co-authored-by: MrSnowy <snow@mrsnowy.dev>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1386
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: MrSnowy <mrsnowy@noreply.localhost>
Co-committed-by: MrSnowy <mrsnowy@noreply.localhost>
## Summary
This PR fixes S3 `DeleteObjects` XML parsing when the request body is pretty-printed (contains indentation/newlines as whitespace text nodes).
Although PR #1324 already tried to address this, parsing could still fail with:
`InvalidRequest: Bad request: Invalid delete XML query`
because non-element nodes were validated but not actually skipped in the parsing loop.
## What changed
- In `src/api/s3/delete.rs`:
- Properly skip whitespace-only (non-element) text nodes while iterating over `<Delete>` children; see the sketch after this list.
- Keep rejecting non-whitespace stray text content.
- Parse the root `<Delete>` element more robustly by selecting the first element child.
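A hedged sketch of the skipping logic, assuming roxmltree-style node APIs (the actual code in `delete.rs` may differ):
```rs
// Illustrative only; names and error handling are simplified.
fn element_children<'a, 'input>(
    parent: roxmltree::Node<'a, 'input>,
) -> Result<Vec<roxmltree::Node<'a, 'input>>, &'static str> {
    let mut elements = Vec::new();
    for child in parent.children() {
        if child.is_element() {
            elements.push(child);
        } else if child.is_text() && !child.text().unwrap_or("").trim().is_empty() {
            // Non-whitespace stray text is still rejected.
            return Err("Invalid delete XML query");
        }
        // Whitespace-only text nodes (pretty-print indentation) are skipped.
    }
    Ok(elements)
}
```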
## Tests added
New unit tests in `src/api/s3/delete.rs`:
- `parse_delete_objects_xml_with_formatting`
- pretty-printed valid XML is accepted.
- `parse_delete_objects_xml_accepts_compact_valid_xml`
- compact valid XML is accepted.
- `parse_delete_objects_xml_rejects_non_whitespace_text_node`
- compact XML with stray text is rejected.
- `parse_delete_objects_xml_rejects_pretty_print_with_stray_text`
- pretty-printed XML with stray text is rejected.
## Validation
Executed:
```bash
cargo test -p garage_api_s3 parse_delete_objects_xml -- --nocapture
```
Result: all parser tests pass.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1374
Co-authored-by: milouz1985 <francois.hoyez@gmail.com>
Co-committed-by: milouz1985 <francois.hoyez@gmail.com>
## Problem
`hugo deploy` is broken with Garage on recent hugo versions when using gzip matchers
## Why?
We don't handle multi-value headers correctly; in this case, this specific header combination:
```
Content-Encoding: gzip
Content-Encoding: aws-chunked
```
is interpreted as:
```
Content-Encoding: gzip
```
instead of:
```
Content-Encoding: gzip,aws-chunked
```
It fails both (1) the signature check and (2) the streaming check.
## Proposed fix
- Take multi-value headers into account when building the Canonical Request (validated with hugo deploy + AWS SDK v2)
- Take multi-value headers into account (both comma-separated and HeaderEntry-separated) when removing `aws-chunked` (validated with hugo deploy + AWS SDK v2)
## Full explanation
Currently, `hugo deploy` on version `hugo v0.152.2` or more recent uses AWS SDK v2 only and supports sending gzipped content.
That's configured with a matcher like this:
```yaml
deployment:
  matchers:
    - pattern: "^.+\\.(woff2|woff|svg|ttf|otf|eot|js|css)$"
      cacheControl: "max-age=31536000, no-transform, public"
      gzip: true # <-------- here
```
Also, with SDK v2, hugo is streaming all of its files.
Thus, it sends this kind of request:
```python
Request {
    method: PUT,
    uri: /sebou/pagefind/pagefind.js?x-id=PutObject,
    version: HTTP/1.1,
    headers: {
        "host": "localhost",
        "user-agent": "aws-sdk-go-v2/1.39.2 ua/2.1 os/linux lang/go#1.25.6 md/GOOS#linux md/GOARCH#amd64 api/s3#1.84.0 ft/s3-transfer m/E,G,Z,g",
        "content-length": "10026",
        "accept-encoding": "identity",
        "amz-sdk-invocation-id": "aed6df34-a67c-4bab-b63b-2b3777b751a0",
        "amz-sdk-request": "attempt=1; max=3",
        "authorization": "AWS4-HMAC-SHA256 Credential=GKxxxxx/20260227/garage/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;cache-control;content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-meta-md5chksum;x-amz-trailer, Signature=76cd9b77f693ca89c2e6dd2a4dc55f83d4a82eca0f563d9d095ff96076f7b057",
        "cache-control": "max-age=31536000, no-transform, public",
        "content-encoding": "gzip", # <---- see here 1st instance of Content-Encoding
        "content-encoding": "aws-chunked", # <---- 2nd instance of Content-Encoding
        "content-type": "text/javascript",
        "via": "2.0 Caddy",
        "x-amz-content-sha256": "STREAMING-UNSIGNED-PAYLOAD-TRAILER",
        "x-amz-date": "20260227T132212Z",
        "x-amz-decoded-content-length": "9982",
        "x-amz-meta-md5chksum": "aad88ac0bf704e91584b8d9ad9796670",
        "x-amz-trailer": "x-amz-checksum-crc32",
        "x-forwarded-for": "::1",
        "x-forwarded-host": "localhost",
        "x-forwarded-proto": "https"
    },
    body: Body(Streaming)
}
```
But our canonical request function only calls `HeaderMap::get()`, which returns just the first value, rather than `HeaderMap::get_all()`, which returns all values for a header.
This leads to the following invalid `CanonicalRequest` value:
```python
PUT
/sebou/pagefind/pagefind.js
x-id=PutObject
accept-encoding:identity
amz-sdk-invocation-id:aed6df34-a67c-4bab-b63b-2b3777b751a0
amz-sdk-request:attempt=1; max=3
cache-control:max-age=31536000, no-transform, public
content-encoding:gzip # <----- see here, we kept only gzip and dropped aws-chunked
content-length:10026
content-type:text/javascript
host:localhost
x-amz-content-sha256:STREAMING-UNSIGNED-PAYLOAD-TRAILER
x-amz-date:20260227T132212Z
x-amz-decoded-content-length:9982
x-amz-meta-md5chksum:aad88ac0bf704e91584b8d9ad9796670
x-amz-trailer:x-amz-checksum-crc32
accept-encoding;amz-sdk-invocation-id;amz-sdk-request;cache-control;content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-meta-md5chksum;x-amz-trailer
```
Amazon is crystal clear that, instead of dropping the other values, we should concatenate them with a comma:

https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_sigv-create-signed-request.html#create-canonical-request
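A minimal sketch of the corrected lookup, using `http::HeaderMap` (the actual canonical-request builder in Garage differs):
```rs
use http::HeaderMap;

fn canonical_header_value(headers: &HeaderMap, name: &str) -> String {
    // Concatenate ALL values for the header with commas, per SigV4,
    // instead of HeaderMap::get(), which returns only the first value.
    headers
        .get_all(name)
        .iter()
        .filter_map(|v| v.to_str().ok())
        .map(|v| v.trim())
        .collect::<Vec<_>>()
        .join(",")
}
```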
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1369
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: Quentin Dufour <quentin@deuxfleurs.fr>
Co-committed-by: Quentin Dufour <quentin@deuxfleurs.fr>