Bake the extended linter set into a project config so plain
golangci-lint run matches what we check locally, with goconst tuned
to ignore tests and bare lowercase words to drop ~200 ecosystem-name
and test-literal false positives.
Clear the remaining real findings: extract GradleBuildCacheConfig.Validate
from Config.Validate, pull the eviction sort comparator into
sortOldestFirst (zero time.Time already sorts first via Before so the
switch was redundant), add headerAcceptEncoding and SQL column-type
constants, and drop a dead empty-key recheck in the gradle handler.
* add Gradle Build Cache support with handler and tests
* linting issue
* MR Suggestions: Add Gradle HTTP Build Cache configuration to README
* implement minor stuff: Refactor Gradle handler to remove unnecessary URL parameter and update related tests
Co-authored-by: Copilot <copilot@github.com>
* Add Gradle build cache configuration and eviction support
- Introduced configuration options for Gradle build cache in config files and documentation.
- Implemented read-only mode and upload size limits for the Gradle build cache.
- Added cache eviction logic based on age and size, with corresponding tests.
- Enhanced storage interfaces to support listing objects by prefix.
* implement minor stuff: Refactor Gradle handler to remove unnecessary URL parameter and update related tests
* last finding fix
* fix tests and implement PR suggestions
Co-authored-by: Copilot <copilot@github.com>
* unify path
---------
Co-authored-by: Mateusz (Mati) Kepa <m.kepa@sportradar.com>
Co-authored-by: Copilot <copilot@github.com>
checkCache opened the storage reader and streamed it to the client
without checking that the bytes still matched what was originally
stored, or what the upstream registry declared. Disk corruption,
accidental overwrites, or local tampering would go unnoticed.
Wrap the storage reader in a verifyingReader that computes SHA256
(against artifact.content_hash) and, when version.integrity holds an
SRI string, the corresponding sha256/384/512 digest as bytes flow
through. At EOF the digests are compared; on mismatch we log at
error level, bump proxy_integrity_failures_total, and clear the
artifact's cache entry so the next request refetches from upstream.
Verification is skipped when the stream was not fully consumed
(client disconnect) to avoid evicting good artifacts on partial
reads. The DirectServe presigned-URL path is unverified since the
proxy never sees those bytes.
Refs #42 (part 1)
containsPathTraversal only checked literal ".." segments separated by
forward slashes. Encoded forms like %2e%2e%2f or backslash separators
would slip past if a caller ever passed a raw or Windows-style path.
The check now URL-decodes the input and treats backslashes as
separators before splitting. Go's stdlib already decodes r.URL.Path so
the encoded case is mostly belt-and-braces for cache keys and other
non-router inputs, but the storage layer guard from #106 makes this
worth locking in with tests.
Fixes#74
When the proxy reaches storage at an internal address (127.0.0.1, a
Docker service name) the presigned URLs it generates point there too,
which is useless to external clients. This adds an optional base URL
that replaces the scheme and host of signed URLs before they're returned,
keeping the signed path and query intact.
When storage.direct_serve is enabled and the backend supports it (S3,
Azure), cached artifact downloads return a 302 redirect to a presigned
URL instead of streaming bytes through the proxy. Falls back to
streaming when the backend can't sign (fileblob, local filesystem) or
signing fails.
Adds the azureblob driver so azblob:// storage URLs work.
Cache-hit accounting already happened before io.Copy so redirects are
counted correctly; the metrics calls are pulled into a helper so both
paths share them.
Closes#96
Cached metadata is now served directly within a configurable TTL window
(default 5m) without contacting upstream, reducing latency and upstream
load. When upstream is unreachable and the cache is past its TTL, stale
content is served with a Warning: 110 header per RFC 7234.
New config: `metadata_ttl` (YAML) / `PROXY_METADATA_TTL` (env).
Set to "0" to always revalidate with upstream.
- ProxyCached now stores upstream Last-Modified in the cache and uses it
(along with ETag) for conditional request handling, returning 304 when
client validators match. Adds Content-Length to cached responses.
- Handlers calling FetchOrCacheMetadata (pypi, composer, pub, nuget) now
check for ErrUpstreamNotFound and return 404 instead of 502, matching
the existing npm and cargo behavior.
- Mirror jobs report live progress via a periodic callback while running,
so API polls return real counts instead of zeroed progress.
- Registry mirroring removed from CLI flags, API acceptance, README, and
docs since every enumerator was a stub returning "not yet implemented".
- Added tests for the conditional metadata path (ETag/If-None-Match,
Last-Modified/If-Modified-Since, 304 responses, header omission).
- Fix race where runJob could overwrite canceled state set by Cancel()
- Fix Debian ecosystem name inconsistency ("deb" -> "debian")
- Stream metadata responses when caching is disabled to avoid buffering
- Add metadata_cache table to initial schema strings for consistency
- Gate mirror API behind mirror_api config flag (disabled by default)
- Fix goconst lint in metadata_cache_test.go
Add a `proxy mirror` CLI command and `/api/mirror` API endpoints that
pre-populate the cache from various input sources: individual PURLs,
SBOM files (CycloneDX and SPDX), or full registry enumeration.
The mirror reuses the existing handler.Proxy.GetOrFetchArtifact()
pipeline so cached artifacts are identical to those fetched on demand.
A bounded worker pool controls download parallelism.
Metadata caching is opt-in via `cache_metadata: true` in config (or
PROXY_CACHE_METADATA=true). The mirror command always enables it. When
enabled, upstream metadata responses are stored for offline fallback
with ETag-based conditional revalidation.
New internal/mirror package with Source interface, PURLSource,
SBOMSource, RegistrySource, and async JobStore. New metadata_cache
database table for offline metadata serving.
ReadMetadata used io.LimitReader which silently truncated responses at
the size limit. For packages like drizzle-orm (~92MB metadata), this
produced invalid JSON that was served to clients.
Now returns ErrMetadataTooLarge when the limit is exceeded, and bumps
the limit from 50MB to 100MB.
Fixes#78
* Fix container blob caching by passing auth token to fetcher
The container handler was calling GetOrFetchArtifactFromURL without
authentication headers, causing Docker Hub to return 401. The fallback
proxyBlobWithAuth path had auth but bypassed the cache entirely.
Now passes the Bearer token through GetOrFetchArtifactFromURLWithHeaders
so blobs are both authenticated and cached.
Fixesgit-pkgs/proxy#43
* Update registries to v0.4.0
Replace pre-release pseudo-version with the released v0.4.0 now that
git-pkgs/registries#13 has been merged.
* Fix all golangci-lint issues across the codebase
Resolve 77 lint issues reported by golangci-lint with gocritic, gocognit,
gocyclo, maintidx, dupl, mnd, unparam, ireturn, goconst, and errcheck
enabled. Net reduction of ~175 lines through shared helpers and
deduplication.
* Suppress staticcheck SA1019 for intentional deprecated field usage
The Storage.Path field is deprecated but still read for backwards
compatibility with existing configs that haven't migrated to the URL field.
All handler metadata and proxy requests were using http.DefaultClient directly,
bypassing any timeout or transport configuration. Added an HTTPClient field to
the Proxy struct with a 30-second default timeout, and updated every handler
to use it for upstream HTTP requests.
POST endpoints (/api/outdated, /api/bulk) now reject bodies over 1 MB
using http.MaxBytesReader. Upstream metadata reads (npm, pypi, composer,
nuget, pub) now use io.LimitReader capped at 50 MB to prevent OOM from
unexpectedly large responses.
The debian and rpm handlers take the request path and pass it directly
to the upstream URL without checking for ".." segments. This could let
a client craft a request that reaches unintended upstream paths.
Add a containsPathTraversal check at the entry point of both handlers
and return 400 for any path containing ".." segments.
Hides package versions published too recently from metadata responses,
giving the community time to spot malicious releases. Configurable
per-ecosystem and per-package with duration overrides. Supported for
npm, PyPI, pub.dev, and Composer.
Uses purl.MakePURLString() instead of fmt.Sprintf("pkg:...") for
correct namespace handling (npm scopes, Go module paths, Maven group
IDs) and percent-encoding. Replaces hand-rolled extractEcosystem and
inline PURL parsing in the bulk lookup fallback with purl.Parse().
Use the new client/ and fetch/ sub-packages from git-pkgs/registries
instead of the local upstream package. The fetcher, circuit breaker, and
resolver now live in registries where they can be shared across projects.
Depends on git-pkgs/registries#8.
The proxy can now use an existing git-pkgs database as a starting point.
Packages and versions tables match git-pkgs schema, using PURL-based
references instead of integer IDs. The proxy adds its own artifacts
table for caching functionality.