Closes #99. The max_size storage config was parsed and validated but
never enforced. This adds a background eviction loop that periodically
checks total cache size and evicts least recently used artifacts when
the limit is exceeded.
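A minimal sketch of the loop's shape (the store methods, names, and interval handling here are illustrative, not the actual implementation):

```go
package cache

import (
	"context"
	"time"
)

// Store stands in for the artifact store; the method set is hypothetical.
type Store interface {
	TotalSize() int64   // current total bytes in the cache
	EvictOldest() error // remove the least recently used artifact
}

// runEviction periodically compares total cache size against maxSize and
// evicts LRU artifacts until the cache is back under the limit.
func runEviction(ctx context.Context, s Store, maxSize int64, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			for s.TotalSize() > maxSize {
				if s.EvictOldest() != nil {
					break // nothing evictable or a storage error; retry next tick
				}
			}
		}
	}
}
```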
When the proxy reaches storage at an internal address (e.g. 127.0.0.1 or
a Docker service name), the presigned URLs it generates point there too,
which is useless to external clients. This adds an optional base URL
that replaces the scheme and host of signed URLs before they're returned,
keeping the signed path and query intact.
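Since only the scheme and host change on the parsed URL, the signed path and query survive untouched. A sketch, with the function name assumed:

```go
package handler

import "net/url"

// rewriteSignedURL swaps the scheme and host of a presigned URL for the
// configured external base URL; the path and query (which carry the
// signature) are left intact. Name and shape are illustrative.
func rewriteSignedURL(signed, base string) (string, error) {
	s, err := url.Parse(signed)
	if err != nil {
		return "", err
	}
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	s.Scheme = b.Scheme
	s.Host = b.Host
	return s.String(), nil
}
```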
When storage.direct_serve is enabled and the backend supports it (S3,
Azure), cached artifact downloads return a 302 redirect to a presigned
URL instead of streaming bytes through the proxy. Falls back to
streaming when the backend can't sign (fileblob, local filesystem) or
when signing fails.
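A sketch of that decision, assuming a small Signer abstraction over backends that can presign (interface and names are illustrative):

```go
package handler

import (
	"net/http"
	"time"
)

// Signer is implemented by backends that can produce presigned URLs
// (S3, Azure); nil means the backend can't sign. Hypothetical interface.
type Signer interface {
	SignedGetURL(key string, ttl time.Duration) (string, error)
}

// serveArtifact redirects to a presigned URL when possible and falls
// back to streaming through the proxy otherwise.
func serveArtifact(w http.ResponseWriter, r *http.Request, key string, s Signer, stream func()) {
	if s != nil {
		if u, err := s.SignedGetURL(key, 15*time.Minute); err == nil {
			http.Redirect(w, r, u, http.StatusFound) // 302
			return
		}
		// Signing failed; fall through to streaming.
	}
	stream()
}
```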
Adds the azureblob driver so azblob:// storage URLs work.
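Assuming the storage layer is built on the Go CDK blob package (the fileblob driver name above suggests it), supporting a new scheme is a blank import that registers the driver:

```go
package storage

import (
	"context"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/azureblob" // registers the azblob:// URL scheme
	_ "gocloud.dev/blob/fileblob"  // registers the file:// URL scheme
)

// openBucket opens whatever backend the storage URL names, e.g.
// "azblob://my-container".
func openBucket(ctx context.Context, storageURL string) (*blob.Bucket, error) {
	return blob.OpenBucket(ctx, storageURL)
}
```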
Cache-hit accounting already happened before io.Copy so redirects are
counted correctly; the metrics calls are pulled into a helper so both
paths share them.
Closes #96
Cached metadata is now served directly within a configurable TTL window
(default 5m) without contacting upstream, reducing latency and upstream
load. When upstream is unreachable and the cache is past its TTL, stale
content is served with a Warning: 110 header per RFC 7234.
New config: `metadata_ttl` (YAML) / `PROXY_METADATA_TTL` (env).
Set to "0" to always revalidate with upstream.
- ProxyCached now stores upstream Last-Modified in the cache and uses it
(along with ETag) for conditional request handling, returning 304 when
client validators match; see the sketch after this list. Adds
Content-Length to cached responses.
- Handlers calling FetchOrCacheMetadata (pypi, composer, pub, nuget) now
check for ErrUpstreamNotFound and return 404 instead of 502, matching
the existing npm and cargo behavior.
- Mirror jobs report live progress via a periodic callback while running,
so API polls return real counts instead of zeroed progress.
- Registry mirroring removed from CLI flags, API acceptance, README, and
docs since every enumerator was a stub returning "not yet implemented".
- Added tests for the conditional metadata path (ETag/If-None-Match,
Last-Modified/If-Modified-Since, 304 responses, header omission).
- Fix race where runJob could overwrite canceled state set by Cancel()
- Fix Debian ecosystem name inconsistency ("deb" -> "debian")
- Stream metadata responses when caching is disabled to avoid buffering
- Add metadata_cache table to initial schema strings for consistency
- Gate mirror API behind mirror_api config flag (disabled by default)
- Fix goconst lint in metadata_cache_test.go
- Wire job contexts to server shutdown context so jobs are canceled on
server stop instead of running indefinitely
- Defer context cancel in runJob so completed jobs don't leak contexts
- Cap error accumulation in progressTracker to 1000 entries to prevent
OOM on large mirror operations with many failures
- Add panic recovery in errgroup workers to prevent process crashes
- Use defer for db.Close() in runMirror CLI to ensure cleanup on all
error paths
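A sketch of the validator handling described in the first bullet (names are illustrative, and real code should parse If-Modified-Since as a date rather than compare strings):

```go
package handler

import (
	"net/http"
	"strconv"
)

// writeCachedMetadata serves a cached response, honoring client validators.
// etag and lastModified come from the cache entry; empty values mean the
// upstream never sent that header, so it is omitted rather than emitted blank.
func writeCachedMetadata(w http.ResponseWriter, r *http.Request, etag, lastModified string, body []byte) {
	if etag != "" {
		w.Header().Set("ETag", etag)
	}
	if lastModified != "" {
		w.Header().Set("Last-Modified", lastModified)
	}
	// If-None-Match takes precedence over If-Modified-Since (RFC 7232).
	if inm := r.Header.Get("If-None-Match"); inm != "" && inm == etag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	if ims := r.Header.Get("If-Modified-Since"); ims != "" && ims == lastModified {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	w.Header().Set("Content-Length", strconv.Itoa(len(body)))
	w.Write(body)
}
```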
Add a `proxy mirror` CLI command and `/api/mirror` API endpoints that
pre-populate the cache from various input sources: individual PURLs,
SBOM files (CycloneDX and SPDX), or full registry enumeration.
The mirror reuses the existing handler.Proxy.GetOrFetchArtifact()
pipeline so cached artifacts are identical to those fetched on demand.
A bounded worker pool controls download parallelism.
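A sketch of the bounded pool, combining errgroup's limit with the panic recovery mentioned above (the fetch callback stands in for GetOrFetchArtifact; everything else is illustrative):

```go
package mirror

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// fetchAll downloads artifacts with bounded parallelism.
func fetchAll(ctx context.Context, purls []string, workers int, fetch func(context.Context, string) error) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(workers) // bounded worker pool
	for _, p := range purls {
		p := p
		g.Go(func() (err error) {
			// Recover panics so one bad artifact can't crash the process.
			defer func() {
				if r := recover(); r != nil {
					err = fmt.Errorf("panic fetching %s: %v", p, r)
				}
			}()
			return fetch(ctx, p)
		})
	}
	return g.Wait()
}
```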
Metadata caching is opt-in via `cache_metadata: true` in config (or
PROXY_CACHE_METADATA=true). The mirror command always enables it. When
enabled, upstream metadata responses are stored for offline fallback
with ETag-based conditional revalidation.
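A sketch of the conditional revalidation (function shape and return values are assumptions):

```go
package metadata

import (
	"context"
	"net/http"
)

// revalidate asks upstream whether the cached entry is still current,
// using the stored ETag as a conditional validator.
func revalidate(ctx context.Context, c *http.Client, url, cachedETag string) (fresh bool, resp *http.Response, err error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, nil, err
	}
	if cachedETag != "" {
		req.Header.Set("If-None-Match", cachedETag)
	}
	resp, err = c.Do(req)
	if err != nil {
		return false, nil, err // upstream unreachable: caller may fall back to the cache
	}
	if resp.StatusCode == http.StatusNotModified {
		resp.Body.Close()
		return true, nil, nil // cached entry still valid
	}
	return false, resp, nil // caller stores the new body and ETag
}
```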
New internal/mirror package with Source interface, PURLSource,
SBOMSource, RegistrySource, and async JobStore. New metadata_cache
database table for offline metadata serving.
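For orientation only, a guess at how the pieces fit; the real interface almost certainly differs:

```go
package mirror

import "context"

// Source yields package URLs (PURLs) to mirror; PURLSource, SBOMSource,
// and RegistrySource would each implement it. Hypothetical method set.
type Source interface {
	// Packages sends PURLs on out until the source is exhausted or ctx ends.
	Packages(ctx context.Context, out chan<- string) error
}
```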
Request application/vnd.npm.install-v1+json from the npm registry
when cooldown filtering is not enabled. This format strips READMEs
and other bulk data, reducing drizzle-orm metadata from 92MB to 4MB.
Fall back to full metadata when cooldown is enabled since the
abbreviated format lacks the time map needed for publish-date filtering.
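A sketch of the header selection (function name and shape are illustrative; the media type is the one named above):

```go
package npm

import "net/http"

// newMetadataRequest picks the metadata format based on whether cooldown
// filtering needs the full document's "time" map.
func newMetadataRequest(url string, cooldownEnabled bool) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if cooldownEnabled {
		// Full format: includes the time map needed for publish-date filtering.
		req.Header.Set("Accept", "application/json")
	} else {
		// Abbreviated format: strips READMEs and other bulk data.
		req.Header.Set("Accept", "application/vnd.npm.install-v1+json")
	}
	return req, nil
}
```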
ReadMetadata used io.LimitReader, which silently truncated responses at
the size limit. For packages like drizzle-orm (~92MB metadata), this
produced invalid JSON that was served to clients.
Now returns ErrMetadataTooLarge when the limit is exceeded, and bumps
the limit from 50MB to 100MB.
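The usual pattern is to read one byte past the limit so truncation is detectable; a sketch with assumed names:

```go
package metadata

import (
	"errors"
	"io"
)

var ErrMetadataTooLarge = errors.New("metadata exceeds size limit")

const maxMetadataBytes = 100 << 20 // 100MB

// readMetadata reads at most maxMetadataBytes. Reading one extra byte
// reveals whether the body was actually larger, instead of silently
// truncating at exactly the limit.
func readMetadata(r io.Reader) ([]byte, error) {
	b, err := io.ReadAll(io.LimitReader(r, maxMetadataBytes+1))
	if err != nil {
		return nil, err
	}
	if len(b) > maxMetadataBytes {
		return nil, ErrMetadataTooLarge
	}
	return b, nil
}
```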
Fixes #78