1
0
Fork 1
mirror of https://github.com/git-pkgs/proxy.git synced 2026-06-02 16:48:16 -04:00
Commit graph

76 commits

Author SHA1 Message Date
Andrew Nesbitt
8fab14da59
WIP: vendor diff2html via pin
Adds pin.yaml/pin.lock and the vendored diff2html bundle. Templates not
yet updated to reference /static/vendor/, and tailwind still served from
the manually-copied static/tailwind.js (blocked on git-pkgs/pin#2).

Anchors .gitignore vendor/ rule to repo root so it stops matching
internal/server/static/vendor/.
2026-05-13 21:51:49 +01:00
Andrew Nesbitt
f2a5b704f0
Add Julia Pkg server support (#117)
- Implement /julia/* handler for the Pkg server protocol
  (registries, registry, package, artifact, meta)
- Resolve package UUIDs to names by parsing Registry.toml from
  the General registry tarball, with a hash-guarded background
  refresh on registry updates
- Wire into router, ecosystem list, install page, badge styles
- Update README and architecture docs
2026-05-13 06:46:35 +01:00
Andrew Nesbitt
5315883c3b
Bump registries to v0.6.0 and replace internal/cooldown (#120)
- Bump github.com/git-pkgs/registries to v0.6.0: the fetcher now
  honours HTTP_PROXY, gates dialled IPs against the safehttp block
  list, and Version.Integrity is populated for pub, julia and nuget
- Replace internal/cooldown with github.com/git-pkgs/cooldown v0.1.1
  (identical surface, lifted from this repo)
- Update docs/architecture.md to point at the external package
2026-05-13 06:45:33 +01:00
Andrew Nesbitt
992f5c68a7
Add .golangci.yml and clear gocognit/goconst findings (#113)
Bake the extended linter set into a project config so plain
golangci-lint run matches what we check locally, with goconst tuned
to ignore tests and bare lowercase words to drop ~200 ecosystem-name
and test-literal false positives.

Clear the remaining real findings: extract GradleBuildCacheConfig.Validate
from Config.Validate, pull the eviction sort comparator into
sortOldestFirst (zero time.Time already sorts first via Before so the
switch was redundant), add headerAcceptEncoding and SQL column-type
constants, and drop a dead empty-key recheck in the gradle handler.
2026-05-05 10:25:17 +01:00
Mati Kepa
31a9ca75b2
add Gradle Build Cache support with handler and tests (#87)
* add Gradle Build Cache support with handler and tests

* linting issue

* MR Suggestions: Add Gradle HTTP Build Cache configuration to README

* implement  minor stuff: Refactor Gradle handler to remove unnecessary URL parameter and update related tests

Co-authored-by: Copilot <copilot@github.com>

* Add Gradle build cache configuration and eviction support

- Introduced configuration options for Gradle build cache in config files and documentation.
- Implemented read-only mode and upload size limits for the Gradle build cache.
- Added cache eviction logic based on age and size, with corresponding tests.
- Enhanced storage interfaces to support listing objects by prefix.

* implement minor stuff: Refactor Gradle handler to remove unnecessary URL parameter and update related tests

* last finding fix

* fix tests and implement PR suggestions

Co-authored-by: Copilot <copilot@github.com>

* unify path

---------

Co-authored-by: Mateusz (Mati) Kepa <m.kepa@sportradar.com>
Co-authored-by: Copilot <copilot@github.com>
2026-05-04 11:15:16 +01:00
Andrew Nesbitt
61741123bf
Verify cached artifacts on read (#111)
checkCache opened the storage reader and streamed it to the client
without checking that the bytes still matched what was originally
stored, or what the upstream registry declared. Disk corruption,
accidental overwrites, or local tampering would go unnoticed.

Wrap the storage reader in a verifyingReader that computes SHA256
(against artifact.content_hash) and, when version.integrity holds an
SRI string, the corresponding sha256/384/512 digest as bytes flow
through. At EOF the digests are compared; on mismatch we log at
error level, bump proxy_integrity_failures_total, and clear the
artifact's cache entry so the next request refetches from upstream.

Verification is skipped when the stream was not fully consumed
(client disconnect) to avoid evicting good artifacts on partial
reads. The DirectServe presigned-URL path is unverified since the
proxy never sees those bytes.

Refs #42 (part 1)
2026-05-03 10:36:28 +01:00
Andrew Nesbitt
8d2740624f
Structured JSON error responses for API endpoints (#110)
* Structured JSON error responses for API endpoints

API handlers returned errors via http.Error (text/plain) with ad-hoc
strings, while the mirror API used a different {"error": "..."} shape
and leaked internal err.Error() text to clients.

Add ErrorResponse{Code, Message} with stable codes (BAD_REQUEST,
NOT_FOUND, UPSTREAM_ERROR, INTERNAL_ERROR) and writeError/badRequest/
notFound/internalError helpers. Convert all JSON API handlers in
api.go, browse.go, mirror_api.go and the /stats endpoint. Enrichment
failures now report 502 UPSTREAM_ERROR rather than 500.

Protocol handlers in internal/handler/ are deliberately unchanged
since npm/pip/cargo clients expect their own response formats, not
JSON. HTML page handlers in server.go also keep text/plain.

Swagger @Failure annotations updated and docs regenerated.

Fixes #76

* Convert validatePackagePath errors to JSON in API handlers
2026-05-03 09:42:03 +01:00
Andrew Nesbitt
e912227e3b
Use archives.OpenBytes in browse handler to cut buffer copies (#107)
* Use archives.OpenBytes in openArchive to avoid redundant buffer copies

* Bump git-pkgs/archives to v0.3.0
2026-05-03 09:29:42 +01:00
Andrew Nesbitt
522c6f233e
Validate package paths before database lookups (#109)
The wildcard package routes (/packages/{ecosystem}/*, /api/package/*,
/api/vulns/*, /api/browse/*, /api/compare/*) only checked for an empty
path before passing user input to GetPackageByEcosystemName and the
enrichment service.

Add validatePackagePath as a coarse first-line filter: reject null
bytes, other control characters, and paths over 512 bytes. Wired into
all five entry handlers immediately after the chi wildcard is read.

This is the generic layer; ecosystem-specific name format rules (npm
scoped name shape, Maven coordinate structure, etc.) can be added on
top per #75.

Fixes #75
2026-05-03 09:14:18 +01:00
Andrew Nesbitt
a4fd333d48
Check for path traversal after URL decoding (#108)
containsPathTraversal only checked literal ".." segments separated by
forward slashes. Encoded forms like %2e%2e%2f or backslash separators
would slip past if a caller ever passed a raw or Windows-style path.

The check now URL-decodes the input and treats backslashes as
separators before splitting. Go's stdlib already decodes r.URL.Path so
the encoded case is mostly belt-and-braces for cache keys and other
non-router inputs, but the storage layer guard from #106 makes this
worth locking in with tests.

Fixes #74
2026-05-03 09:07:16 +01:00
Andrew Nesbitt
71e8d3b9eb
Reject path traversal in filesystem storage (#106) 2026-05-02 18:00:28 +01:00
Andrew Nesbitt
d6066af230
Add MaxBytesReader to mirror API HandleCreate (#105)
Matches the pattern already used in api.go to prevent unbounded request
body reads.
2026-05-02 18:00:25 +01:00
Andrew Nesbitt
cfcf480f69
Limit io.ReadAll in openArchive to 512 MB (#104)
The browse and compare handlers buffer the full artifact into memory for
prefix detection. Without a cap, a single request for a large cached
artifact could exhaust server memory.
2026-05-02 18:00:22 +01:00
Andrew Nesbitt
f1ea8b50a1
Serve .html, .svg and .xhtml as text/plain in browse file handler (#103)
These file types were served with executable content types (text/html,
image/svg+xml) allowing stored XSS via package archive contents.
Also adds Content-Security-Policy: sandbox and X-Content-Type-Options:
nosniff headers to all browse file responses.
2026-05-02 18:00:04 +01:00
Andrew Nesbitt
e2495ef0aa
Merge pull request #102 from git-pkgs/enforce-max-size-eviction
Enforce max_size config with LRU cache eviction
2026-04-30 23:26:16 +01:00
Andrew Nesbitt
461a95c518
Enforce max_size config with LRU cache eviction
Closes #99. The max_size storage config was parsed and validated but
never enforced. This adds a background eviction loop that periodically
checks total cache size and evicts least recently used artifacts when
the limit is exceeded.
2026-04-30 18:09:01 +01:00
Andrew Nesbitt
1ad182782d
Add storage.direct_serve_base_url to override presigned URL host
When the proxy reaches storage at an internal address (127.0.0.1, a
Docker service name) the presigned URLs it generates point there too,
which is useless to external clients. This adds an optional base URL
that replaces the scheme and host of signed URLs before they're returned,
keeping the signed path and query intact.
2026-04-27 12:14:37 +01:00
Andrew Nesbitt
c73b0a35a1
Add direct-serve via presigned storage URLs
When storage.direct_serve is enabled and the backend supports it (S3,
Azure), cached artifact downloads return a 302 redirect to a presigned
URL instead of streaming bytes through the proxy. Falls back to
streaming when the backend can't sign (fileblob, local filesystem) or
signing fails.

Adds the azureblob driver so azblob:// storage URLs work.

Cache-hit accounting already happened before io.Copy so redirects are
counted correctly; the metrics calls are pulled into a helper so both
paths share them.

Closes #96
2026-04-27 12:04:38 +01:00
c655399a07 Apply 'go fmt' as suggested in CONTRIBUTING.md. 2026-04-18 07:43:22 -04:00
Andrew Nesbitt
7346008aa5
Add metadata TTL and stale-while-revalidate support
Cached metadata is now served directly within a configurable TTL window
(default 5m) without contacting upstream, reducing latency and upstream
load. When upstream is unreachable and the cache is past its TTL, stale
content is served with a Warning: 110 header per RFC 7234.

New config: `metadata_ttl` (YAML) / `PROXY_METADATA_TTL` (env).
Set to "0" to always revalidate with upstream.
2026-04-13 09:01:05 +01:00
Andrew Nesbitt
c01f0a5c05
Fix metadata caching, 404 propagation, mirror progress, and registry stubs
- ProxyCached now stores upstream Last-Modified in the cache and uses it
  (along with ETag) for conditional request handling, returning 304 when
  client validators match. Adds Content-Length to cached responses.

- Handlers calling FetchOrCacheMetadata (pypi, composer, pub, nuget) now
  check for ErrUpstreamNotFound and return 404 instead of 502, matching
  the existing npm and cargo behavior.

- Mirror jobs report live progress via a periodic callback while running,
  so API polls return real counts instead of zeroed progress.

- Registry mirroring removed from CLI flags, API acceptance, README, and
  docs since every enumerator was a stub returning "not yet implemented".

- Added tests for the conditional metadata path (ETag/If-None-Match,
  Last-Modified/If-Modified-Since, 304 responses, header omission).
2026-04-13 09:01:05 +01:00
Andrew Nesbitt
47681066b5
Fix review issues in mirror feature
- Fix race where runJob could overwrite canceled state set by Cancel()
- Fix Debian ecosystem name inconsistency ("deb" -> "debian")
- Stream metadata responses when caching is disabled to avoid buffering
- Add metadata_cache table to initial schema strings for consistency
- Gate mirror API behind mirror_api config flag (disabled by default)
- Fix goconst lint in metadata_cache_test.go
2026-04-13 09:01:04 +01:00
Andrew Nesbitt
02738651ab
Fix concurrency, resource, and reliability issues in mirror
- Wire job contexts to server shutdown context so jobs are canceled on
  server stop instead of running indefinitely
- Defer context cancel in runJob so completed jobs don't leak contexts
- Cap error accumulation in progressTracker to 1000 entries to prevent
  OOM on large mirror operations with many failures
- Add panic recovery in errgroup workers to prevent process crashes
- Use defer for db.Close() in runMirror CLI to ensure cleanup on all
  error paths
2026-04-13 09:01:04 +01:00
Andrew Nesbitt
d62c42b8d7
Add mirror command and API for selective package mirroring
Add a `proxy mirror` CLI command and `/api/mirror` API endpoints that
pre-populate the cache from various input sources: individual PURLs,
SBOM files (CycloneDX and SPDX), or full registry enumeration.

The mirror reuses the existing handler.Proxy.GetOrFetchArtifact()
pipeline so cached artifacts are identical to those fetched on demand.
A bounded worker pool controls download parallelism.

Metadata caching is opt-in via `cache_metadata: true` in config (or
PROXY_CACHE_METADATA=true). The mirror command always enables it. When
enabled, upstream metadata responses are stored for offline fallback
with ETag-based conditional revalidation.

New internal/mirror package with Source interface, PURLSource,
SBOMSource, RegistrySource, and async JobStore. New metadata_cache
database table for offline metadata serving.
2026-04-13 09:01:04 +01:00
Andrew Nesbitt
01b4e7210d
Use abbreviated npm metadata when cooldown is disabled
Request application/vnd.npm.install-v1+json from the npm registry
when cooldown filtering is not enabled. This format strips READMEs
and other bulk data, reducing drizzle-orm metadata from 92MB to 4MB.

Fall back to full metadata when cooldown is enabled since the
abbreviated format lacks the time map needed for publish-date filtering.
2026-04-08 16:12:43 +01:00
Andrew Nesbitt
8b762ffb39
Fix silent truncation of large npm metadata responses
ReadMetadata used io.LimitReader which silently truncated responses at
the size limit. For packages like drizzle-orm (~92MB metadata), this
produced invalid JSON that was served to clients.

Now returns ErrMetadataTooLarge when the limit is exceeded, and bumps
the limit from 50MB to 100MB.

Fixes #78
2026-04-08 16:02:30 +01:00
a947a7546a Sort the ecosystems list for presentation in the UI
In the page footer and the 'select' list on the packages page, the
list of ecosystems should be sorted in a predictable order.
2026-04-06 18:06:16 -04:00
Andrew Nesbitt
e36a92433e
Clean up review feedback: use path.Ext for extension checks, remove dead getStripPrefix, add openArchive tests 2026-04-06 19:06:48 +01:00
Andrew Nesbitt
941ed51f76
Auto-detect and strip single top-level directory prefix when browsing archives
GitHub zipballs wrap all files in a repo-hash/ directory. Instead of
hardcoding prefixes per ecosystem, open the archive once to check if all
files share a single root directory and strip it automatically. The npm
package/ prefix is still handled as a special case.
2026-04-06 17:14:15 +01:00
Andrew Nesbitt
b68184cbab
Fix composer dist URL rewriting and browse source for extensionless filenames
GitHub zipball URLs end in a bare commit hash with no file extension.
rewriteDistURL now appends .zip when the filename has no extension and
the dist type is zip. expandMinifiedVersions deep copies inherited
values so in-place URL rewriting no longer corrupts shared references.
browse.go infers .zip for extensionless filenames so existing cached
artifacts can still be opened.
2026-04-06 17:07:20 +01:00
Andrew Nesbitt
bcbb883d1b
Add failing tests for composer dist URL and shared reference bugs
GitHub zipball URLs produce filenames without .zip extension, breaking
browse source. Minified version expansion shares nested map references,
causing dist URL corruption when versions inherit unchanged dist fields.
2026-04-06 17:07:20 +01:00
Andrew Nesbitt
43a164ed72
Add cooldown support for Hex
Decode the Hex registry protobuf format, filter releases by fetching
timestamps from the Hex HTTP API (hex.pm/api/packages/{name}), and
re-encode without the original signature.

The protobuf handling uses protowire for low-level encoding/decoding
of the Signed wrapper, Package, and Release messages. Timestamps come
from the inserted_at field in the JSON API response.

Since the proxy re-encodes the payload without the original signature,
users need to disable registry signature verification.
2026-04-06 13:18:57 +01:00
Andrew Nesbitt
cb9bbbc385
Add cooldown support for RubyGems
Filter versions from the compact index (/info/{name}) by fetching
timestamps from the versions API (/api/v1/versions/{name}.json).
Both requests run concurrently to minimize latency. If the versions
API is unavailable, the compact index is proxied unfiltered.

Handles platform-specific versions (e.g. 1.0.0-java) by matching
the compact index format.
2026-04-06 13:16:26 +01:00
Andrew Nesbitt
75ff85f2f0
Add cooldown support for Conda (#68)
* Add cooldown support for Conda

Filter entries from Conda repodata.json based on the timestamp field
(milliseconds since epoch). Filters both packages and packages.conda
sections. When cooldown is disabled, repodata requests are proxied
directly without parsing.

* Update README table to mark Conda cooldown support
2026-04-06 13:16:00 +01:00
Andrew Nesbitt
70fe686953
Add cooldown support for NuGet (#67)
* Add cooldown support for NuGet

Filter versions from NuGet registration pages based on the
catalogEntry.published timestamp. Handles both RFC3339 and NuGet's
fractional-second timestamp formats. When cooldown is disabled,
registration requests are proxied directly without parsing.

* Update README table to mark NuGet cooldown support
2026-04-06 13:12:18 +01:00
Andrew Nesbitt
24d5e77443
Fix cross-device link error when running in Docker with volumes (#66)
`fileblob` creates temp files in `os.TempDir()` (`/tmp`) by default,
then uses `os.Rename` to move them to the final path. When the storage
directory is on a different filesystem (e.g. a Docker volume mount at
`/data`), the rename fails with "invalid cross-device link".

Set `no_tmp_dir=true` on file:// bucket URLs so fileblob creates temp
files next to the final destination instead.

Fixes #65
2026-04-06 13:07:31 +01:00
Andrew Nesbitt
15c133f1fa
Fix Composer minified metadata expansion and namespaced package routing (#63)
* Fix Composer minified metadata expansion and namespaced package routing

Packagist serves metadata in a minified format where only the first version
entry has all fields and subsequent entries inherit from the previous one.
The proxy was passing this through without expanding it, which meant cooldown
filtering could break the inheritance chain (losing fields like `name`) and
`~dev` sentinel markers were silently dropped.

The proxy now expands the minified format before filtering and rewriting,
ensuring every version entry is self-contained.

Web UI and API routes used single-segment chi URL params for package names,
which broke for Composer's `vendor/name` format. `/package/composer/monolog/monolog`
would match the version show route instead of the package show route.

All `/package/` and related API routes now use wildcard paths with a
`resolvePackageName` helper that tries increasingly longer path prefixes as
package names via DB lookup, correctly handling namespaced packages across
all endpoints (show, version, browse, compare, vulns).

Fixes #61, fixes #62

* Add namespaced package routing tests for all affected ecosystems

Verifies the wildcard routing handles slashes in package names for
npm (@babel/core), Go modules (github.com/stretchr/testify),
OCI images (library/nginx), Conda (conda-forge/numpy), and
Conan (zlib/1.2.13@demo/stable).

* Regenerate swagger docs after route refactor

The swagger annotations for the old per-endpoint handlers were removed
during the wildcard routing refactor. Regenerate to match current state.
2026-04-06 13:07:02 +01:00
Andrew Nesbitt
e45706d808
Track applied migrations to skip column checks on startup (#60)
* Track applied migrations to skip column checks on startup

Add a migrations table that records which migrations have been applied.
On boot, load the set of applied names in one query and only run new ones.
A fully migrated database now does 1 query instead of ~12 HasColumn/HasTable
checks.

Fresh databases created via CreateSchema record all migrations as already
applied. Old databases get the migrations table on first MigrateSchema call
and each migration is recorded after it runs.

Closes #54

* Add benchmark for MigrateSchema on fully migrated database

* Optimize MigrateSchema to single query for fully migrated databases

Skip HasTable/HasColumn checks when the migrations table already exists.
A fully migrated database now does one SELECT instead of ~12 individual
column and table checks.

* Add migration docs and link from architecture

* Add test for upgrade from fully migrated database without migrations table
2026-04-06 13:06:45 +01:00
Andrew Nesbitt
34009bad98
Lazy-load HTML templates behind sync.Once (#59)
Templates are parsed on first Render call instead of at server startup.
API-only traffic never pays the ~780µs parsing cost.

Closes #53
2026-04-06 13:06:25 +01:00
Kevin P. Fleming
ec9c437498
Correct ecosystem name in UI for Go (golang). (#64) 2026-04-05 16:20:57 +01:00
Andrew Nesbitt
beddf8357a
Fix startup message and add connectivity check for S3 storage (#57)
* Fix startup message and add connectivity check for S3 storage

When S3 storage is configured, the startup log incorrectly showed the
default local path (./cache/artifacts) instead of the actual S3 URL.
This also adds a lightweight connectivity check at startup so bad
credentials or endpoints fail immediately rather than on first request.

Add URL() and Close() to the Storage interface so all backends report
their URL and can be cleaned up properly. Rename the stats JSON field
from storage_path to storage_url. Close storage in error paths and
during graceful shutdown.

Fixes #49

* Fix Windows test assertion for file:// URL format

OpenBucket normalizes Windows paths to file:///C:/path (three slashes)
but the test expected file://C:/path (two slashes).
2026-04-03 14:06:51 +01:00
Lily Young
922d44b34e
Add Cargo cooldown support (#48)
* Add Cargo cooldown support

- Added support for cooldowns for cargo
- Added a test to test cooldowns with cargo

* Update README.md

add cargo to registry's with support for cooldowns

* Apply suggestion from @andrew

Co-authored-by: Andrew Nesbitt <andrewnez@gmail.com>
2026-04-01 18:15:07 +01:00
Andrew Nesbitt
5e04182bbd
Add upstream URL tests for all ecosystem download handlers (#51)
Adds regression test for the PyPI double-packages bug fixed in #50,
and adds fetchedURL assertions to every ecosystem that constructs
upstream download URLs (Conda, CRAN, Maven, NuGet, Conan, Debian, RPM).
2026-04-01 15:22:52 +01:00
Andrew Nesbitt
bdc246dc10
Fix container blob caching by passing auth token to fetcher (#44)
* Fix container blob caching by passing auth token to fetcher

The container handler was calling GetOrFetchArtifactFromURL without
authentication headers, causing Docker Hub to return 401. The fallback
proxyBlobWithAuth path had auth but bypassed the cache entirely.

Now passes the Bearer token through GetOrFetchArtifactFromURLWithHeaders
so blobs are both authenticated and cached.

Fixes git-pkgs/proxy#43

* Update registries to v0.4.0

Replace pre-release pseudo-version with the released v0.4.0 now that
git-pkgs/registries#13 has been merged.
2026-04-01 15:22:39 +01:00
Kevin P. Fleming
03ddad10ec
Fix paths for files.pythonhosted.org (#50)
The URLs constructed for downloading package assets from PyPI had
'packages' twice, resulting in 404s.
2026-04-01 14:58:08 +01:00
Andrew Nesbitt
599fe9e254
Fix all golangci-lint issues across the codebase (#32)
* Fix all golangci-lint issues across the codebase

Resolve 77 lint issues reported by golangci-lint with gocritic, gocognit,
gocyclo, maintidx, dupl, mnd, unparam, ireturn, goconst, and errcheck
enabled. Net reduction of ~175 lines through shared helpers and
deduplication.

* Suppress staticcheck SA1019 for intentional deprecated field usage

The Storage.Path field is deprecated but still read for backwards
compatibility with existing configs that haven't migrated to the URL field.
2026-03-18 10:59:29 +00:00
Andrew Nesbitt
3afa5e050d
Add handler download flow and server utility tests
Covers HTTP download paths for gem, hex, go, conda, cran, and maven
handlers with cache hit, invalid input, and upstream proxy scenarios.
Adds server tests for formatTimeAgo, formatSize, categorizeLicense,
LoggerMiddleware, search/pagination, and API packages list endpoint.
2026-03-17 20:31:54 +00:00
Andrew Nesbitt
d820f75fa6
Add direct tests for handler core methods, template rendering, and query validation 2026-03-13 17:05:14 +00:00
Andrew Nesbitt
240a61c537
Merge pull request #28 from git-pkgs/add-handler-tests
Add tests for Conan and NuGet handlers
2026-03-13 08:23:20 +00:00
Andrew Nesbitt
e2a683c7a6
Route handler metadata requests through Proxy.HTTPClient instead of http.DefaultClient
All handler metadata and proxy requests were using http.DefaultClient directly,
bypassing any timeout or transport configuration. Added an HTTPClient field to
the Proxy struct with a 30-second default timeout, and updated every handler
to use it for upstream HTTP requests.
2026-03-13 07:46:28 +00:00