1
0
Fork 1
mirror of https://github.com/git-pkgs/proxy.git synced 2026-06-02 08:38:17 -04:00
Commit graph

4 commits

Author SHA1 Message Date
Lars Wallenborn
76f41cf271
Add storage backend probe to /health (closes #73) (#119)
* config: add Health.StorageProbeInterval

* metrics: add proxy_health_probe_failures_total counter

* server: add storageProbe with happy-path test

* server: add storageProbe failure-mode tests

* server: add healthCache with TTL, single-flight, transition logging

* server: wire storage probe into /health

* server: update TestHealthEndpoint for JSON; wire healthCache into newTestServer

Also fix Windows file-locking issue in storageProbe: close the reader
explicitly before Delete so the file handle is released prior to os.Remove.

* server: clean up stale comment in storageProbe

* docs: document storage health probe and new metric

* docs: regenerate Swagger for /health JSON response

* server: simplify rc.Close error handling in storageProbe

* server: defer probe cleanup so size/open/read/verify failures don't leak objects

Previously, storageProbe only called Delete on the success path. Any
failure between Store and the final Delete (size mismatch, Open error,
mid-stream read failure, content mismatch) left the probe object orphaned
in the storage backend. With caching disabled and Kubernetes-rate probing,
the leak could accumulate noticeably on backends like S3.

Use a named return + defer to attempt Delete after every successful Store.
The earlier-step failure remains the primary error; Delete failure only
surfaces as step="delete" when nothing else went wrong. Add a table-driven
test that asserts cleanup runs for each non-delete failure path.

Reported by Copilot on #119.

* config: validate health.storage_probe_interval in Config.Validate

The new duration field was only validated at use time in newHealthCache.
The existing codebase already validates other duration fields
(MetadataTTL, DirectServeTTL, Gradle.MaxAge, Gradle.SweepInterval) in
Config.Validate() so misconfiguration fails fast at startup with a
config-key-specific error.

Match that pattern. The parse-at-use code in newHealthCache stays as
a safety net, mirroring the MetadataTTL precedent.

Reported by Copilot on #119.

* docs: lowercase "counter" in metrics table for consistency

Other rows in the table use lowercase type names (counter/gauge/histogram).
Match that style.

Reported by Copilot on #119.

* docs: include size-check step in /health probe description

The probe is write → size-check → read → verify → delete; the
architecture note was missing the size-check step.

Reported by Copilot on #119.

* server: address andrew's review on #119

- Drop unused callerCtx parameter from healthCache.Check (Check is now
  parameter-less; the comment-only "accepted for symmetry" justification
  wasn't carrying its weight).
- Emit "storage": {"status": "skipped"} on DB short-circuit instead of
  omitting the key, so monitors expecting a fixed key set keep working.
- Reject negative storage_probe_interval at config validation time
  (previously parsed and silently behaved like "0").
- Extract HealthConfig.Validate to keep Config.Validate under the
  gocognit threshold and match the existing GradleBuildCacheConfig pattern.
- README Health Check section: note that /health is intended as a
  readiness probe rather than a liveness probe (Check holds a mutex
  for up to the 10s probe timeout).
- cmd/proxy/main.go godoc: column-align the new env var with the
  surrounding Gradle entries.

Reported by andrew on #119.
2026-05-22 12:14:01 +01:00
Andrew Nesbitt
61741123bf
Verify cached artifacts on read (#111)
checkCache opened the storage reader and streamed it to the client
without checking that the bytes still matched what was originally
stored, or what the upstream registry declared. Disk corruption,
accidental overwrites, or local tampering would go unnoticed.

Wrap the storage reader in a verifyingReader that computes SHA256
(against artifact.content_hash) and, when version.integrity holds an
SRI string, the corresponding sha256/384/512 digest as bytes flow
through. At EOF the digests are compared; on mismatch we log at
error level, bump proxy_integrity_failures_total, and clear the
artifact's cache entry so the next request refetches from upstream.

Verification is skipped when the stream was not fully consumed
(client disconnect) to avoid evicting good artifacts on partial
reads. The DirectServe presigned-URL path is unverified since the
proxy never sees those bytes.

Refs #42 (part 1)
2026-05-03 10:36:28 +01:00
c655399a07 Apply 'go fmt' as suggested in CONTRIBUTING.md. 2026-04-18 07:43:22 -04:00
Andrew Nesbitt
2d7cb8eae5
Refactoring and features 2026-02-03 22:40:40 +00:00