Commit graph

18 commits

Author SHA1 Message Date
c655399a07 Apply 'go fmt' as suggested in CONTRIBUTING.md. 2026-04-18 07:43:22 -04:00
Andrew Nesbitt
7346008aa5
Add metadata TTL and stale-while-revalidate support
Cached metadata is now served directly within a configurable TTL window
(default 5m) without contacting upstream, reducing latency and upstream
load. When upstream is unreachable and the cache is past its TTL, stale
content is served with a Warning: 110 header per RFC 7234.

New config: `metadata_ttl` (YAML) / `PROXY_METADATA_TTL` (env).
Set to "0" to always revalidate with upstream.
2026-04-13 09:01:05 +01:00
Andrew Nesbitt
c01f0a5c05
Fix metadata caching, 404 propagation, mirror progress, and registry stubs
- ProxyCached now stores upstream Last-Modified in the cache and uses it
  (along with ETag) for conditional request handling, returning 304 when
  client validators match. Adds Content-Length to cached responses.

- Handlers calling FetchOrCacheMetadata (pypi, composer, pub, nuget) now
  check for ErrUpstreamNotFound and return 404 instead of 502, matching
  the existing npm and cargo behavior.

- Mirror jobs report live progress via a periodic callback while running,
  so API polls return real counts instead of zeroed progress.

- Registry mirroring removed from CLI flags, API acceptance, README, and
  docs since every enumerator was a stub returning "not yet implemented".

- Added tests for the conditional metadata path (ETag/If-None-Match,
  Last-Modified/If-Modified-Since, 304 responses, header omission).
2026-04-13 09:01:05 +01:00
Andrew Nesbitt
47681066b5
Fix review issues in mirror feature
- Fix race where runJob could overwrite canceled state set by Cancel()
- Fix Debian ecosystem name inconsistency ("deb" -> "debian")
- Stream metadata responses when caching is disabled to avoid buffering
- Add metadata_cache table to initial schema strings for consistency
- Gate mirror API behind mirror_api config flag (disabled by default)
- Fix goconst lint in metadata_cache_test.go
2026-04-13 09:01:04 +01:00
Andrew Nesbitt
d62c42b8d7
Add mirror command and API for selective package mirroring
Add a `proxy mirror` CLI command and `/api/mirror` API endpoints that
pre-populate the cache from various input sources: individual PURLs,
SBOM files (CycloneDX and SPDX), or full registry enumeration.

The mirror reuses the existing handler.Proxy.GetOrFetchArtifact()
pipeline so cached artifacts are identical to those fetched on demand.
A bounded worker pool controls download parallelism.

Metadata caching is opt-in via `cache_metadata: true` in config (or
PROXY_CACHE_METADATA=true). The mirror command always enables it. When
enabled, upstream metadata responses are stored for offline fallback
with ETag-based conditional revalidation.

New internal/mirror package with Source interface, PURLSource,
SBOMSource, RegistrySource, and async JobStore. New metadata_cache
database table for offline metadata serving.
2026-04-13 09:01:04 +01:00
Andrew Nesbitt
8b762ffb39
Fix silent truncation of large npm metadata responses
ReadMetadata used io.LimitReader which silently truncated responses at
the size limit. For packages like drizzle-orm (~92MB metadata), this
produced invalid JSON that was served to clients.

Now returns ErrMetadataTooLarge when the limit is exceeded, and bumps
the limit from 50MB to 100MB.

Fixes #78
2026-04-08 16:02:30 +01:00
Andrew Nesbitt
bdc246dc10
Fix container blob caching by passing auth token to fetcher (#44)
* Fix container blob caching by passing auth token to fetcher

The container handler was calling GetOrFetchArtifactFromURL without
authentication headers, causing Docker Hub to return 401. The fallback
proxyBlobWithAuth path had auth but bypassed the cache entirely.

Now passes the Bearer token through GetOrFetchArtifactFromURLWithHeaders
so blobs are both authenticated and cached.

Fixes git-pkgs/proxy#43

* Update registries to v0.4.0

Replace pre-release pseudo-version with the released v0.4.0 now that
git-pkgs/registries#13 has been merged.
2026-04-01 15:22:39 +01:00
Andrew Nesbitt
599fe9e254
Fix all golangci-lint issues across the codebase (#32)
* Fix all golangci-lint issues across the codebase

Resolve 77 lint issues reported by golangci-lint with gocritic, gocognit,
gocyclo, maintidx, dupl, mnd, unparam, ireturn, goconst, and errcheck
enabled. Net reduction of ~175 lines through shared helpers and
deduplication.

* Suppress staticcheck SA1019 for intentional deprecated field usage

The Storage.Path field is deprecated but still read for backwards
compatibility with existing configs that haven't migrated to the URL field.
2026-03-18 10:59:29 +00:00
Andrew Nesbitt
e2a683c7a6
Route handler metadata requests through Proxy.HTTPClient instead of http.DefaultClient
All handler metadata and proxy requests were using http.DefaultClient directly,
bypassing any timeout or transport configuration. Added an HTTPClient field to
the Proxy struct with a 30-second default timeout, and updated every handler
to use it for upstream HTTP requests.
2026-03-13 07:46:28 +00:00
Andrew Nesbitt
0e1a06c5e6
Add size limits on request bodies and upstream metadata reads
POST endpoints (/api/outdated, /api/bulk) now reject bodies over 1 MB
using http.MaxBytesReader. Upstream metadata reads (npm, pypi, composer,
nuget, pub) now use io.LimitReader capped at 50 MB to prevent OOM from
unexpectedly large responses.
2026-03-13 07:28:20 +00:00
Andrew Nesbitt
bf7e83efe3
Reject path traversal in debian and rpm handlers
The debian and rpm handlers take the request path and pass it directly
to the upstream URL without checking for ".." segments. This could let
a client craft a request that reaches unintended upstream paths.

Add a containsPathTraversal check at the entry point of both handlers
and return 400 for any path containing ".." segments.
2026-03-12 12:05:52 +00:00
Andrew Nesbitt
4f8f63f354
Add version cooldown to filter recently published packages
Hides package versions published too recently from metadata responses,
giving the community time to spot malicious releases. Configurable
per-ecosystem and per-package with duration overrides. Supported for
npm, PyPI, pub.dev, and Composer.
2026-03-04 19:00:31 +00:00
Andrew Nesbitt
364549ad14
Replace inline PURL construction with purl library
Uses purl.MakePURLString() instead of fmt.Sprintf("pkg:...") for
correct namespace handling (npm scopes, Go module paths, Maven group
IDs) and percent-encoding. Replaces hand-rolled extractEcosystem and
inline PURL parsing in the bulk lookup fallback with purl.Parse().
2026-03-04 09:20:16 +00:00
Andrew Nesbitt
be8c4b9860
Replace internal/upstream with registries/fetch
Use the new client/ and fetch/ sub-packages from git-pkgs/registries
instead of the local upstream package. The fetcher, circuit breaker, and
resolver now live in registries where they can be shared across projects.

Depends on git-pkgs/registries#8.
2026-02-20 17:31:12 +00:00
Andrew Nesbitt
2d7cb8eae5
Refactoring and features 2026-02-03 22:40:40 +00:00
Andrew Nesbitt
1eb9d71bd7
Align database schema with git-pkgs for compatibility
The proxy can now use an existing git-pkgs database as a starting point.
Packages and versions tables match git-pkgs schema, using PURL-based
references instead of integer IDs. The proxy adds its own artifacts
table for caching functionality.
2026-01-29 16:44:01 +00:00
Andrew Nesbitt
7b1bdf75f7
Extract checkCache helper to reduce duplication 2026-01-21 22:47:23 +00:00
Andrew Nesbitt
7b22638ef7
Hello world 2026-01-20 22:00:31 +00:00