Cached metadata is now served directly within a configurable TTL window
(default 5m) without contacting upstream, reducing latency and upstream
load. When upstream is unreachable and the cache is past its TTL, stale
content is served with a Warning: 110 header per RFC 7234.
New config: `metadata_ttl` (YAML) / `PROXY_METADATA_TTL` (env).
Set to "0" to always revalidate with upstream.
- ProxyCached now stores upstream Last-Modified in the cache and uses it
(along with ETag) for conditional request handling, returning 304 when
client validators match. Adds Content-Length to cached responses.
- Handlers calling FetchOrCacheMetadata (pypi, composer, pub, nuget) now
check for ErrUpstreamNotFound and return 404 instead of 502, matching
the existing npm and cargo behavior.
- Mirror jobs report live progress via a periodic callback while running,
so API polls return real counts instead of zeroed progress.
- Registry mirroring removed from CLI flags, API acceptance, README, and
docs since every enumerator was a stub returning "not yet implemented".
- Added tests for the conditional metadata path (ETag/If-None-Match,
Last-Modified/If-Modified-Since, 304 responses, header omission).
- Fix race where runJob could overwrite canceled state set by Cancel()
- Fix Debian ecosystem name inconsistency ("deb" -> "debian")
- Stream metadata responses when caching is disabled to avoid buffering
- Add metadata_cache table to initial schema strings for consistency
- Gate mirror API behind mirror_api config flag (disabled by default)
- Fix goconst lint in metadata_cache_test.go
- Wire job contexts to server shutdown context so jobs are canceled on
server stop instead of running indefinitely
- Defer context cancel in runJob so completed jobs don't leak contexts
- Cap error accumulation in progressTracker to 1000 entries to prevent
OOM on large mirror operations with many failures
- Add panic recovery in errgroup workers to prevent process crashes
- Use defer for db.Close() in runMirror CLI to ensure cleanup on all
error paths
Add a `proxy mirror` CLI command and `/api/mirror` API endpoints that
pre-populate the cache from various input sources: individual PURLs,
SBOM files (CycloneDX and SPDX), or full registry enumeration.
The mirror reuses the existing handler.Proxy.GetOrFetchArtifact()
pipeline so cached artifacts are identical to those fetched on demand.
A bounded worker pool controls download parallelism.
Metadata caching is opt-in via `cache_metadata: true` in config (or
PROXY_CACHE_METADATA=true). The mirror command always enables it. When
enabled, upstream metadata responses are stored for offline fallback
with ETag-based conditional revalidation.
New internal/mirror package with Source interface, PURLSource,
SBOMSource, RegistrySource, and async JobStore. New metadata_cache
database table for offline metadata serving.
Request application/vnd.npm.install-v1+json from the npm registry
when cooldown filtering is not enabled. This format strips READMEs
and other bulk data, reducing drizzle-orm metadata from 92MB to 4MB.
Fall back to full metadata when cooldown is enabled since the
abbreviated format lacks the time map needed for publish-date filtering.
ReadMetadata used io.LimitReader which silently truncated responses at
the size limit. For packages like drizzle-orm (~92MB metadata), this
produced invalid JSON that was served to clients.
Now returns ErrMetadataTooLarge when the limit is exceeded, and bumps
the limit from 50MB to 100MB.
Fixes#78
GitHub zipballs wrap all files in a repo-hash/ directory. Instead of
hardcoding prefixes per ecosystem, open the archive once to check if all
files share a single root directory and strip it automatically. The npm
package/ prefix is still handled as a special case.
GitHub zipball URLs end in a bare commit hash with no file extension.
rewriteDistURL now appends .zip when the filename has no extension and
the dist type is zip. expandMinifiedVersions deep copies inherited
values so in-place URL rewriting no longer corrupts shared references.
browse.go infers .zip for extensionless filenames so existing cached
artifacts can still be opened.
Decode the Hex registry protobuf format, filter releases by fetching
timestamps from the Hex HTTP API (hex.pm/api/packages/{name}), and
re-encode without the original signature.
The protobuf handling uses protowire for low-level encoding/decoding
of the Signed wrapper, Package, and Release messages. Timestamps come
from the inserted_at field in the JSON API response.
Since the proxy re-encodes the payload without the original signature,
users need to disable registry signature verification.
Filter versions from the compact index (/info/{name}) by fetching
timestamps from the versions API (/api/v1/versions/{name}.json).
Both requests run concurrently to minimize latency. If the versions
API is unavailable, the compact index is proxied unfiltered.
Handles platform-specific versions (e.g. 1.0.0-java) by matching
the compact index format.
* Add cooldown support for Conda
Filter entries from Conda repodata.json based on the timestamp field
(milliseconds since epoch). Filters both packages and packages.conda
sections. When cooldown is disabled, repodata requests are proxied
directly without parsing.
* Update README table to mark Conda cooldown support
* Add cooldown support for NuGet
Filter versions from NuGet registration pages based on the
catalogEntry.published timestamp. Handles both RFC3339 and NuGet's
fractional-second timestamp formats. When cooldown is disabled,
registration requests are proxied directly without parsing.
* Update README table to mark NuGet cooldown support
`fileblob` creates temp files in `os.TempDir()` (`/tmp`) by default,
then uses `os.Rename` to move them to the final path. When the storage
directory is on a different filesystem (e.g. a Docker volume mount at
`/data`), the rename fails with "invalid cross-device link".
Set `no_tmp_dir=true` on file:// bucket URLs so fileblob creates temp
files next to the final destination instead.
Fixes#65
* Fix Composer minified metadata expansion and namespaced package routing
Packagist serves metadata in a minified format where only the first version
entry has all fields and subsequent entries inherit from the previous one.
The proxy was passing this through without expanding it, which meant cooldown
filtering could break the inheritance chain (losing fields like `name`) and
`~dev` sentinel markers were silently dropped.
The proxy now expands the minified format before filtering and rewriting,
ensuring every version entry is self-contained.
Web UI and API routes used single-segment chi URL params for package names,
which broke for Composer's `vendor/name` format. `/package/composer/monolog/monolog`
would match the version show route instead of the package show route.
All `/package/` and related API routes now use wildcard paths with a
`resolvePackageName` helper that tries increasingly longer path prefixes as
package names via DB lookup, correctly handling namespaced packages across
all endpoints (show, version, browse, compare, vulns).
Fixes#61, fixes#62
* Add namespaced package routing tests for all affected ecosystems
Verifies the wildcard routing handles slashes in package names for
npm (@babel/core), Go modules (github.com/stretchr/testify),
OCI images (library/nginx), Conda (conda-forge/numpy), and
Conan (zlib/1.2.13@demo/stable).
* Regenerate swagger docs after route refactor
The swagger annotations for the old per-endpoint handlers were removed
during the wildcard routing refactor. Regenerate to match current state.
* Track applied migrations to skip column checks on startup
Add a migrations table that records which migrations have been applied.
On boot, load the set of applied names in one query and only run new ones.
A fully migrated database now does 1 query instead of ~12 HasColumn/HasTable
checks.
Fresh databases created via CreateSchema record all migrations as already
applied. Old databases get the migrations table on first MigrateSchema call
and each migration is recorded after it runs.
Closes#54
* Add benchmark for MigrateSchema on fully migrated database
* Optimize MigrateSchema to single query for fully migrated databases
Skip HasTable/HasColumn checks when the migrations table already exists.
A fully migrated database now does one SELECT instead of ~12 individual
column and table checks.
* Add migration docs and link from architecture
* Add test for upgrade from fully migrated database without migrations table
* Fix startup message and add connectivity check for S3 storage
When S3 storage is configured, the startup log incorrectly showed the
default local path (./cache/artifacts) instead of the actual S3 URL.
This also adds a lightweight connectivity check at startup so bad
credentials or endpoints fail immediately rather than on first request.
Add URL() and Close() to the Storage interface so all backends report
their URL and can be cleaned up properly. Rename the stats JSON field
from storage_path to storage_url. Close storage in error paths and
during graceful shutdown.
Fixes#49
* Fix Windows test assertion for file:// URL format
OpenBucket normalizes Windows paths to file:///C:/path (three slashes)
but the test expected file://C:/path (two slashes).
* Add Cargo cooldown support
- Added support for cooldowns for cargo
- Added a test to test cooldowns with cargo
* Update README.md
add cargo to registry's with support for cooldowns
* Apply suggestion from @andrew
Co-authored-by: Andrew Nesbitt <andrewnez@gmail.com>
Adds regression test for the PyPI double-packages bug fixed in #50,
and adds fetchedURL assertions to every ecosystem that constructs
upstream download URLs (Conda, CRAN, Maven, NuGet, Conan, Debian, RPM).
* Fix container blob caching by passing auth token to fetcher
The container handler was calling GetOrFetchArtifactFromURL without
authentication headers, causing Docker Hub to return 401. The fallback
proxyBlobWithAuth path had auth but bypassed the cache entirely.
Now passes the Bearer token through GetOrFetchArtifactFromURLWithHeaders
so blobs are both authenticated and cached.
Fixesgit-pkgs/proxy#43
* Update registries to v0.4.0
Replace pre-release pseudo-version with the released v0.4.0 now that
git-pkgs/registries#13 has been merged.
* Fix all golangci-lint issues across the codebase
Resolve 77 lint issues reported by golangci-lint with gocritic, gocognit,
gocyclo, maintidx, dupl, mnd, unparam, ireturn, goconst, and errcheck
enabled. Net reduction of ~175 lines through shared helpers and
deduplication.
* Suppress staticcheck SA1019 for intentional deprecated field usage
The Storage.Path field is deprecated but still read for backwards
compatibility with existing configs that haven't migrated to the URL field.
Covers HTTP download paths for gem, hex, go, conda, cran, and maven
handlers with cache hit, invalid input, and upstream proxy scenarios.
Adds server tests for formatTimeAgo, formatSize, categorizeLicense,
LoggerMiddleware, search/pagination, and API packages list endpoint.
All handler metadata and proxy requests were using http.DefaultClient directly,
bypassing any timeout or transport configuration. Added an HTTPClient field to
the Proxy struct with a 30-second default timeout, and updated every handler
to use it for upstream HTTP requests.
POST endpoints (/api/outdated, /api/bulk) now reject bodies over 1 MB
using http.MaxBytesReader. Upstream metadata reads (npm, pypi, composer,
nuget, pub) now use io.LimitReader capped at 50 MB to prevent OOM from
unexpectedly large responses.
The debian and rpm handlers take the request path and pass it directly
to the upstream URL without checking for ".." segments. This could let
a client craft a request that reaches unintended upstream paths.
Add a containsPathTraversal check at the entry point of both handlers
and return 400 for any path containing ".." segments.
Replace err.Error() in HTTP error responses with generic messages.
Internal details like database driver errors and enrichment failures
were being sent directly to clients.
File paths from archive contents were interpolated directly into onclick
handlers and innerHTML via template literals. A crafted filename containing
quotes could break out of the string context and execute arbitrary JS.
Add an escapeHTML helper and use it on all interpolated path and URL values
in the browse source page.
Hides package versions published too recently from metadata responses,
giving the community time to spot malicious releases. Configurable
per-ecosystem and per-package with duration overrides. Supported for
npm, PyPI, pub.dev, and Composer.
Uses purl.MakePURLString() instead of fmt.Sprintf("pkg:...") for
correct namespace handling (npm scopes, Go module paths, Maven group
IDs) and percent-encoding. Replaces hand-rolled extractEcosystem and
inline PURL parsing in the bulk lookup fallback with purl.Parse().
The diff package has been extracted into the archives module where it
belongs, since it operates on archives.Reader. This removes the internal
copy and imports from github.com/git-pkgs/archives/diff instead.
Use the new client/ and fetch/ sub-packages from git-pkgs/registries
instead of the local upstream package. The fetcher, circuit breaker, and
resolver now live in registries where they can be shared across projects.
Depends on git-pkgs/registries#8.