The UI now lives under /ui so reverse proxies can apply different
access rules to it (e.g. require auth) while leaving the package
endpoints (/npm, /pypi, /v2, ...) open to build machines.
- GET / redirects to /ui/
- /api/browse and /api/compare move to /ui/api/browse and
/ui/api/compare since only the browser JS calls them
- /health, /stats, /metrics, /openapi.json and /api/* stay at root
* config: add Health.StorageProbeInterval
* metrics: add proxy_health_probe_failures_total counter
* server: add storageProbe with happy-path test
* server: add storageProbe failure-mode tests
* server: add healthCache with TTL, single-flight, transition logging
* server: wire storage probe into /health
* server: update TestHealthEndpoint for JSON; wire healthCache into newTestServer
Also fix Windows file-locking issue in storageProbe: close the reader
explicitly before Delete so the file handle is released prior to os.Remove.
* server: clean up stale comment in storageProbe
* docs: document storage health probe and new metric
* docs: regenerate Swagger for /health JSON response
* server: simplify rc.Close error handling in storageProbe
* server: defer probe cleanup so size/open/read/verify failures don't leak objects
Previously, storageProbe only called Delete on the success path. Any
failure between Store and the final Delete (size mismatch, Open error,
mid-stream read failure, content mismatch) left the probe object orphaned
in the storage backend. With caching disabled and Kubernetes-rate probing,
the leak could accumulate noticeably on backends like S3.
Use a named return + defer to attempt Delete after every successful Store.
The earlier-step failure remains the primary error; Delete failure only
surfaces as step="delete" when nothing else went wrong. Add a table-driven
test that asserts cleanup runs for each non-delete failure path.
Reported by Copilot on #119.
* config: validate health.storage_probe_interval in Config.Validate
The new duration field was only validated at use time in newHealthCache.
The existing codebase already validates other duration fields
(MetadataTTL, DirectServeTTL, Gradle.MaxAge, Gradle.SweepInterval) in
Config.Validate() so misconfiguration fails fast at startup with a
config-key-specific error.
Match that pattern. The parse-at-use code in newHealthCache stays as
a safety net, mirroring the MetadataTTL precedent.
Reported by Copilot on #119.
* docs: lowercase "counter" in metrics table for consistency
Other rows in the table use lowercase type names (counter/gauge/histogram).
Match that style.
Reported by Copilot on #119.
* docs: include size-check step in /health probe description
The probe is write → size-check → read → verify → delete; the
architecture note was missing the size-check step.
Reported by Copilot on #119.
* server: address andrew's review on #119
- Drop unused callerCtx parameter from healthCache.Check (Check is now
parameter-less; the comment-only "accepted for symmetry" justification
wasn't carrying its weight).
- Emit "storage": {"status": "skipped"} on DB short-circuit instead of
omitting the key, so monitors expecting a fixed key set keep working.
- Reject negative storage_probe_interval at config validation time
(previously parsed and silently behaved like "0").
- Extract HealthConfig.Validate to keep Config.Validate under the
gocognit threshold and match the existing GradleBuildCacheConfig pattern.
- README Health Check section: note that /health is intended as a
readiness probe rather than a liveness probe (Check holds a mutex
for up to the 10s probe timeout).
- cmd/proxy/main.go godoc: column-align the new env var with the
surrounding Gradle entries.
Reported by andrew on #119.
- Implement /julia/* handler for the Pkg server protocol
(registries, registry, package, artifact, meta)
- Resolve package UUIDs to names by parsing Registry.toml from
the General registry tarball, with a hash-guarded background
refresh on registry updates
- Wire into router, ecosystem list, install page, badge styles
- Update README and architecture docs
- Bump github.com/git-pkgs/registries to v0.6.0: the fetcher now
honours HTTP_PROXY, gates dialled IPs against the safehttp block
list, and Version.Integrity is populated for pub, julia and nuget
- Replace internal/cooldown with github.com/git-pkgs/cooldown v0.1.1
(identical surface, lifted from this repo)
- Update docs/architecture.md to point at the external package
Bake the extended linter set into a project config so plain
golangci-lint run matches what we check locally, with goconst tuned
to ignore tests and bare lowercase words to drop ~200 ecosystem-name
and test-literal false positives.
Clear the remaining real findings: extract GradleBuildCacheConfig.Validate
from Config.Validate, pull the eviction sort comparator into
sortOldestFirst (zero time.Time already sorts first via Before so the
switch was redundant), add headerAcceptEncoding and SQL column-type
constants, and drop a dead empty-key recheck in the gradle handler.
* add Gradle Build Cache support with handler and tests
* linting issue
* MR Suggestions: Add Gradle HTTP Build Cache configuration to README
* implement minor stuff: Refactor Gradle handler to remove unnecessary URL parameter and update related tests
Co-authored-by: Copilot <copilot@github.com>
* Add Gradle build cache configuration and eviction support
- Introduced configuration options for Gradle build cache in config files and documentation.
- Implemented read-only mode and upload size limits for the Gradle build cache.
- Added cache eviction logic based on age and size, with corresponding tests.
- Enhanced storage interfaces to support listing objects by prefix.
* implement minor stuff: Refactor Gradle handler to remove unnecessary URL parameter and update related tests
* last finding fix
* fix tests and implement PR suggestions
Co-authored-by: Copilot <copilot@github.com>
* unify path
---------
Co-authored-by: Mateusz (Mati) Kepa <m.kepa@sportradar.com>
Co-authored-by: Copilot <copilot@github.com>
* Structured JSON error responses for API endpoints
API handlers returned errors via http.Error (text/plain) with ad-hoc
strings, while the mirror API used a different {"error": "..."} shape
and leaked internal err.Error() text to clients.
Add ErrorResponse{Code, Message} with stable codes (BAD_REQUEST,
NOT_FOUND, UPSTREAM_ERROR, INTERNAL_ERROR) and writeError/badRequest/
notFound/internalError helpers. Convert all JSON API handlers in
api.go, browse.go, mirror_api.go and the /stats endpoint. Enrichment
failures now report 502 UPSTREAM_ERROR rather than 500.
Protocol handlers in internal/handler/ are deliberately unchanged
since npm/pip/cargo clients expect their own response formats, not
JSON. HTML page handlers in server.go also keep text/plain.
Swagger @Failure annotations updated and docs regenerated.
Fixes#76
* Convert validatePackagePath errors to JSON in API handlers
The wildcard package routes (/packages/{ecosystem}/*, /api/package/*,
/api/vulns/*, /api/browse/*, /api/compare/*) only checked for an empty
path before passing user input to GetPackageByEcosystemName and the
enrichment service.
Add validatePackagePath as a coarse first-line filter: reject null
bytes, other control characters, and paths over 512 bytes. Wired into
all five entry handlers immediately after the chi wildcard is read.
This is the generic layer; ecosystem-specific name format rules (npm
scoped name shape, Maven coordinate structure, etc.) can be added on
top per #75.
Fixes#75
The browse and compare handlers buffer the full artifact into memory for
prefix detection. Without a cap, a single request for a large cached
artifact could exhaust server memory.
These file types were served with executable content types (text/html,
image/svg+xml) allowing stored XSS via package archive contents.
Also adds Content-Security-Policy: sandbox and X-Content-Type-Options:
nosniff headers to all browse file responses.
Closes#99. The max_size storage config was parsed and validated but
never enforced. This adds a background eviction loop that periodically
checks total cache size and evicts least recently used artifacts when
the limit is exceeded.
When the proxy reaches storage at an internal address (127.0.0.1, a
Docker service name) the presigned URLs it generates point there too,
which is useless to external clients. This adds an optional base URL
that replaces the scheme and host of signed URLs before they're returned,
keeping the signed path and query intact.
When storage.direct_serve is enabled and the backend supports it (S3,
Azure), cached artifact downloads return a 302 redirect to a presigned
URL instead of streaming bytes through the proxy. Falls back to
streaming when the backend can't sign (fileblob, local filesystem) or
signing fails.
Adds the azureblob driver so azblob:// storage URLs work.
Cache-hit accounting already happened before io.Copy so redirects are
counted correctly; the metrics calls are pulled into a helper so both
paths share them.
Closes#96
Cached metadata is now served directly within a configurable TTL window
(default 5m) without contacting upstream, reducing latency and upstream
load. When upstream is unreachable and the cache is past its TTL, stale
content is served with a Warning: 110 header per RFC 7234.
New config: `metadata_ttl` (YAML) / `PROXY_METADATA_TTL` (env).
Set to "0" to always revalidate with upstream.
- Fix race where runJob could overwrite canceled state set by Cancel()
- Fix Debian ecosystem name inconsistency ("deb" -> "debian")
- Stream metadata responses when caching is disabled to avoid buffering
- Add metadata_cache table to initial schema strings for consistency
- Gate mirror API behind mirror_api config flag (disabled by default)
- Fix goconst lint in metadata_cache_test.go
- Wire job contexts to server shutdown context so jobs are canceled on
server stop instead of running indefinitely
- Defer context cancel in runJob so completed jobs don't leak contexts
- Cap error accumulation in progressTracker to 1000 entries to prevent
OOM on large mirror operations with many failures
- Add panic recovery in errgroup workers to prevent process crashes
- Use defer for db.Close() in runMirror CLI to ensure cleanup on all
error paths
Add a `proxy mirror` CLI command and `/api/mirror` API endpoints that
pre-populate the cache from various input sources: individual PURLs,
SBOM files (CycloneDX and SPDX), or full registry enumeration.
The mirror reuses the existing handler.Proxy.GetOrFetchArtifact()
pipeline so cached artifacts are identical to those fetched on demand.
A bounded worker pool controls download parallelism.
Metadata caching is opt-in via `cache_metadata: true` in config (or
PROXY_CACHE_METADATA=true). The mirror command always enables it. When
enabled, upstream metadata responses are stored for offline fallback
with ETag-based conditional revalidation.
New internal/mirror package with Source interface, PURLSource,
SBOMSource, RegistrySource, and async JobStore. New metadata_cache
database table for offline metadata serving.
GitHub zipballs wrap all files in a repo-hash/ directory. Instead of
hardcoding prefixes per ecosystem, open the archive once to check if all
files share a single root directory and strip it automatically. The npm
package/ prefix is still handled as a special case.
GitHub zipball URLs end in a bare commit hash with no file extension.
rewriteDistURL now appends .zip when the filename has no extension and
the dist type is zip. expandMinifiedVersions deep copies inherited
values so in-place URL rewriting no longer corrupts shared references.
browse.go infers .zip for extensionless filenames so existing cached
artifacts can still be opened.
`fileblob` creates temp files in `os.TempDir()` (`/tmp`) by default,
then uses `os.Rename` to move them to the final path. When the storage
directory is on a different filesystem (e.g. a Docker volume mount at
`/data`), the rename fails with "invalid cross-device link".
Set `no_tmp_dir=true` on file:// bucket URLs so fileblob creates temp
files next to the final destination instead.
Fixes#65
* Fix Composer minified metadata expansion and namespaced package routing
Packagist serves metadata in a minified format where only the first version
entry has all fields and subsequent entries inherit from the previous one.
The proxy was passing this through without expanding it, which meant cooldown
filtering could break the inheritance chain (losing fields like `name`) and
`~dev` sentinel markers were silently dropped.
The proxy now expands the minified format before filtering and rewriting,
ensuring every version entry is self-contained.
Web UI and API routes used single-segment chi URL params for package names,
which broke for Composer's `vendor/name` format. `/package/composer/monolog/monolog`
would match the version show route instead of the package show route.
All `/package/` and related API routes now use wildcard paths with a
`resolvePackageName` helper that tries increasingly longer path prefixes as
package names via DB lookup, correctly handling namespaced packages across
all endpoints (show, version, browse, compare, vulns).
Fixes#61, fixes#62
* Add namespaced package routing tests for all affected ecosystems
Verifies the wildcard routing handles slashes in package names for
npm (@babel/core), Go modules (github.com/stretchr/testify),
OCI images (library/nginx), Conda (conda-forge/numpy), and
Conan (zlib/1.2.13@demo/stable).
* Regenerate swagger docs after route refactor
The swagger annotations for the old per-endpoint handlers were removed
during the wildcard routing refactor. Regenerate to match current state.
* Fix startup message and add connectivity check for S3 storage
When S3 storage is configured, the startup log incorrectly showed the
default local path (./cache/artifacts) instead of the actual S3 URL.
This also adds a lightweight connectivity check at startup so bad
credentials or endpoints fail immediately rather than on first request.
Add URL() and Close() to the Storage interface so all backends report
their URL and can be cleaned up properly. Rename the stats JSON field
from storage_path to storage_url. Close storage in error paths and
during graceful shutdown.
Fixes#49
* Fix Windows test assertion for file:// URL format
OpenBucket normalizes Windows paths to file:///C:/path (three slashes)
but the test expected file://C:/path (two slashes).
* Fix all golangci-lint issues across the codebase
Resolve 77 lint issues reported by golangci-lint with gocritic, gocognit,
gocyclo, maintidx, dupl, mnd, unparam, ireturn, goconst, and errcheck
enabled. Net reduction of ~175 lines through shared helpers and
deduplication.
* Suppress staticcheck SA1019 for intentional deprecated field usage
The Storage.Path field is deprecated but still read for backwards
compatibility with existing configs that haven't migrated to the URL field.
Covers HTTP download paths for gem, hex, go, conda, cran, and maven
handlers with cache hit, invalid input, and upstream proxy scenarios.
Adds server tests for formatTimeAgo, formatSize, categorizeLicense,
LoggerMiddleware, search/pagination, and API packages list endpoint.
POST endpoints (/api/outdated, /api/bulk) now reject bodies over 1 MB
using http.MaxBytesReader. Upstream metadata reads (npm, pypi, composer,
nuget, pub) now use io.LimitReader capped at 50 MB to prevent OOM from
unexpectedly large responses.
Replace err.Error() in HTTP error responses with generic messages.
Internal details like database driver errors and enrichment failures
were being sent directly to clients.
File paths from archive contents were interpolated directly into onclick
handlers and innerHTML via template literals. A crafted filename containing
quotes could break out of the string context and execute arbitrary JS.
Add an escapeHTML helper and use it on all interpolated path and URL values
in the browse source page.
Hides package versions published too recently from metadata responses,
giving the community time to spot malicious releases. Configurable
per-ecosystem and per-package with duration overrides. Supported for
npm, PyPI, pub.dev, and Composer.
Uses purl.MakePURLString() instead of fmt.Sprintf("pkg:...") for
correct namespace handling (npm scopes, Go module paths, Maven group
IDs) and percent-encoding. Replaces hand-rolled extractEcosystem and
inline PURL parsing in the bulk lookup fallback with purl.Parse().
The diff package has been extracted into the archives module where it
belongs, since it operates on archives.Reader. This removes the internal
copy and imports from github.com/git-pkgs/archives/diff instead.
Use the new client/ and fetch/ sub-packages from git-pkgs/registries
instead of the local upstream package. The fetcher, circuit breaker, and
resolver now live in registries where they can be shared across projects.
Depends on git-pkgs/registries#8.
Adds proxy support for Docker/OCI container registries, Debian/APT
repositories, and RPM/Yum repositories. Includes a new enrichment API
for package metadata, vulnerability scanning, and outdated detection.
Updates the dashboard with Tailwind CSS, dark mode support, and a
security overview section showing vulnerability counts.