Commit graph

25 commits

Author SHA1 Message Date
25d5d741c3
WIP
2026-05-01 06:16:13 -04:00
Andrew Nesbitt
e2495ef0aa
Merge pull request #102 from git-pkgs/enforce-max-size-eviction
Enforce max_size config with LRU cache eviction
2026-04-30 23:26:16 +01:00
Andrew Nesbitt
461a95c518
Enforce max_size config with LRU cache eviction
Closes #99. The max_size storage config was parsed and validated but
never enforced. This adds a background eviction loop that periodically
checks total cache size and evicts least recently used artifacts when
the limit is exceeded.
2026-04-30 18:09:01 +01:00
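The eviction step described in 461a95c518 could look roughly like the sketch below. The `artifact` record, its field names, and the `evictLRU` function are hypothetical and chosen for illustration; the log does not show the real schema or how the background loop is scheduled. The sketch only covers the selection step: given the cached items and a limit, pick the least recently used keys until the total fits.

```go
package main

import (
	"fmt"
	"sort"
)

// artifact is a hypothetical cache record (assumption, not the project's schema).
type artifact struct {
	Key        string
	Size       int64 // bytes
	LastAccess int64 // Unix seconds of the most recent read
}

// evictLRU returns the keys to delete, least recently used first, until the
// remaining total size is at or below maxSize. A maxSize of zero or less is
// treated here as "no limit", which is an assumption of this sketch.
func evictLRU(items []artifact, maxSize int64) []string {
	var total int64
	for _, a := range items {
		total += a.Size
	}
	if maxSize <= 0 || total <= maxSize {
		return nil
	}

	// Sort a copy so the caller's slice keeps its order.
	sorted := append([]artifact(nil), items...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].LastAccess < sorted[j].LastAccess
	})

	var evicted []string
	for _, a := range sorted {
		if total <= maxSize {
			break
		}
		evicted = append(evicted, a.Key)
		total -= a.Size
	}
	return evicted
}

func main() {
	items := []artifact{
		{"a", 40, 100},
		{"b", 40, 300},
		{"c", 40, 200},
	}
	fmt.Println(evictLRU(items, 50))
}
```

A periodic loop would presumably call something like this on a timer and then delete the returned keys from storage and the database.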
Andrew Nesbitt
1ad182782d
Add storage.direct_serve_base_url to override presigned URL host
When the proxy reaches storage at an internal address (127.0.0.1, a
Docker service name) the presigned URLs it generates point there too,
which is useless to external clients. This adds an optional base URL
that replaces the scheme and host of signed URLs before they're returned,
keeping the signed path and query intact.
2026-04-27 12:14:37 +01:00
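A minimal sketch of the rewrite described in 1ad182782d, using only `net/url` from the standard library. The function name `rewriteSignedURL` and the example hosts are assumptions; the log only states that the scheme and host are replaced while the signed path and query stay intact.

```go
package main

import (
	"fmt"
	"net/url"
)

// rewriteSignedURL swaps the scheme and host of signed for those of base and
// leaves the signed path and query string untouched.
func rewriteSignedURL(signed, base string) (string, error) {
	s, err := url.Parse(signed)
	if err != nil {
		return "", fmt.Errorf("parse signed URL: %w", err)
	}
	b, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	if b.Scheme == "" || b.Host == "" {
		return "", fmt.Errorf("base URL %q needs a scheme and a host", base)
	}
	s.Scheme = b.Scheme
	s.Host = b.Host
	return s.String(), nil
}

func main() {
	out, err := rewriteSignedURL(
		"http://127.0.0.1:9000/cache/pkg.tgz?X-Amz-Signature=abc",
		"https://artifacts.example.com",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```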
Andrew Nesbitt
c73b0a35a1
Add direct-serve via presigned storage URLs
When storage.direct_serve is enabled and the backend supports it (S3,
Azure), cached artifact downloads return a 302 redirect to a presigned
URL instead of streaming bytes through the proxy. Falls back to
streaming when the backend can't sign (fileblob, local filesystem) or
signing fails.

Adds the azureblob driver so azblob:// storage URLs work.

Cache-hit accounting already happened before io.Copy so redirects are
counted correctly; the metrics calls are pulled into a helper so both
paths share them.

Closes #96
2026-04-27 12:04:38 +01:00
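The redirect-or-stream decision from c73b0a35a1 can be sketched as a small pure function. Everything named here (`signFunc`, `planDelivery`, `errSigningUnsupported`) is hypothetical; the actual handler, its metrics helper, and the storage interface are not shown in the log.

```go
package main

import (
	"errors"
	"fmt"
)

// errSigningUnsupported stands in for a backend that cannot presign URLs
// (for example a local filesystem bucket). Hypothetical name.
var errSigningUnsupported = errors.New("backend cannot sign URLs")

// signFunc returns a presigned URL for a storage key.
type signFunc func(key string) (string, error)

// planDelivery reports whether a cache hit should be answered with a 302 to
// redirectTo, or streamed through the proxy. It falls back to streaming when
// direct serve is off, the backend cannot sign, or signing fails.
func planDelivery(directServe bool, sign signFunc, key string) (redirectTo string, stream bool) {
	if !directServe || sign == nil {
		return "", true
	}
	u, err := sign(key)
	if err != nil || u == "" {
		return "", true
	}
	return u, false
}

func main() {
	s3Like := func(key string) (string, error) {
		return "https://bucket.example.com/" + key + "?sig=abc", nil
	}
	fileLike := func(string) (string, error) { return "", errSigningUnsupported }

	fmt.Println(planDelivery(true, s3Like, "npm/left-pad.tgz"))
	fmt.Println(planDelivery(true, fileLike, "npm/left-pad.tgz"))
}
```

Keeping the decision separate from the HTTP handler makes it easy for both paths to share the same cache-hit accounting before either responding with a redirect or copying bytes.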
Andrew Nesbitt
7346008aa5
Add metadata TTL and stale-while-revalidate support
Cached metadata is now served directly within a configurable TTL window
(default 5m) without contacting upstream, reducing latency and upstream
load. When upstream is unreachable and the cache is past its TTL, stale
content is served with a Warning: 110 header per RFC 7234.

New config: `metadata_ttl` (YAML) / `PROXY_METADATA_TTL` (env).
Set to "0" to always revalidate with upstream.
2026-04-13 09:01:05 +01:00
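One way to read the TTL rules in 7346008aa5 is as a three-way decision. The sketch below is an assumption about the control flow, not the project's code: the names `action`, `decide`, and `staleWarning` are made up, and in a real handler "upstream reachable" is only known after attempting the revalidation request.

```go
package main

import (
	"fmt"
	"time"
)

type action int

const (
	serveCached action = iota // inside the TTL window: no upstream contact
	revalidate                // past the TTL, or TTL is 0: ask upstream
	serveStale                // past the TTL and upstream unreachable
)

// staleWarning is the Warning header value for stale responses (RFC 7234, section 5.5.1).
const staleWarning = `110 - "Response is Stale"`

// decide picks what to do with a cached metadata entry of the given age.
// A ttl of 0 means "always revalidate", matching the documented "0" setting.
func decide(age, ttl time.Duration, upstreamReachable bool) action {
	if ttl > 0 && age < ttl {
		return serveCached
	}
	if upstreamReachable {
		return revalidate
	}
	return serveStale
}

func main() {
	ttl := 5 * time.Minute // the documented default
	fmt.Println(decide(2*time.Minute, ttl, true) == serveCached)
	fmt.Println(decide(10*time.Minute, ttl, true) == revalidate)
	if decide(10*time.Minute, ttl, false) == serveStale {
		fmt.Println("Warning:", staleWarning)
	}
}
```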
Andrew Nesbitt
47681066b5
Fix review issues in mirror feature
- Fix race where runJob could overwrite canceled state set by Cancel()
- Fix Debian ecosystem name inconsistency ("deb" -> "debian")
- Stream metadata responses when caching is disabled to avoid buffering
- Add metadata_cache table to initial schema strings for consistency
- Gate mirror API behind mirror_api config flag (disabled by default)
- Fix goconst lint in metadata_cache_test.go
2026-04-13 09:01:04 +01:00
Andrew Nesbitt
02738651ab
Fix concurrency, resource, and reliability issues in mirror
- Wire job contexts to server shutdown context so jobs are canceled on
  server stop instead of running indefinitely
- Defer context cancel in runJob so completed jobs don't leak contexts
- Cap error accumulation in progressTracker to 1000 entries to prevent
  OOM on large mirror operations with many failures
- Add panic recovery in errgroup workers to prevent process crashes
- Use defer for db.Close() in runMirror CLI to ensure cleanup on all
  error paths
2026-04-13 09:01:04 +01:00
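The error cap from 02738651ab might be implemented along these lines. `progressTracker` is named in the commit message, but its fields, the `addError` and `summary` methods, and the choice to count dropped errors are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// maxTrackedErrors mirrors the 1000-entry cap mentioned in the commit message.
const maxTrackedErrors = 1000

// progressTracker collects errors from concurrent workers. Field names are hypothetical.
type progressTracker struct {
	mu      sync.Mutex
	errs    []string
	dropped int
}

// addError records msg unless the cap is reached, in which case it only
// increments a counter so memory use stays bounded.
func (p *progressTracker) addError(msg string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.errs) >= maxTrackedErrors {
		p.dropped++
		return
	}
	p.errs = append(p.errs, msg)
}

// summary reports how many errors were kept and how many were dropped.
func (p *progressTracker) summary() (kept, dropped int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.errs), p.dropped
}

func main() {
	tracker := &progressTracker{}
	for i := 0; i < 1500; i++ {
		tracker.addError(fmt.Sprintf("download %d failed", i))
	}
	fmt.Println(tracker.summary())
}
```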
Andrew Nesbitt
d62c42b8d7
Add mirror command and API for selective package mirroring
Add a `proxy mirror` CLI command and `/api/mirror` API endpoints that
pre-populate the cache from various input sources: individual PURLs,
SBOM files (CycloneDX and SPDX), or full registry enumeration.

The mirror reuses the existing handler.Proxy.GetOrFetchArtifact()
pipeline so cached artifacts are identical to those fetched on demand.
A bounded worker pool controls download parallelism.

Metadata caching is opt-in via `cache_metadata: true` in config (or
PROXY_CACHE_METADATA=true). The mirror command always enables it. When
enabled, upstream metadata responses are stored for offline fallback
with ETag-based conditional revalidation.

New internal/mirror package with Source interface, PURLSource,
SBOMSource, RegistrySource, and async JobStore. New metadata_cache
database table for offline metadata serving.
2026-04-13 09:01:04 +01:00
Andrew Nesbitt
15c133f1fa
Fix Composer minified metadata expansion and namespaced package routing (#63)
* Fix Composer minified metadata expansion and namespaced package routing

Packagist serves metadata in a minified format where only the first version
entry has all fields and subsequent entries inherit from the previous one.
The proxy was passing this through without expanding it, which meant cooldown
filtering could break the inheritance chain (losing fields like `name`) and
`~dev` sentinel markers were silently dropped.

The proxy now expands the minified format before filtering and rewriting,
ensuring every version entry is self-contained.

Web UI and API routes used single-segment chi URL params for package names,
which broke for Composer's `vendor/name` format. `/package/composer/monolog/monolog`
would match the version show route instead of the package show route.

All `/package/` and related API routes now use wildcard paths with a
`resolvePackageName` helper that tries increasingly longer path prefixes as
package names via DB lookup, correctly handling namespaced packages across
all endpoints (show, version, browse, compare, vulns).

Fixes #61, fixes #62

* Add namespaced package routing tests for all affected ecosystems

Verifies the wildcard routing handles slashes in package names for
npm (@babel/core), Go modules (github.com/stretchr/testify),
OCI images (library/nginx), Conda (conda-forge/numpy), and
Conan (zlib/1.2.13@demo/stable).

* Regenerate swagger docs after route refactor

The swagger annotations for the old per-endpoint handlers were removed
during the wildcard routing refactor. Regenerate to match current state.
2026-04-06 13:07:02 +01:00
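The prefix search that 15c133f1fa attributes to `resolvePackageName` can be sketched as follows. The signature, the `exists` callback standing in for the DB lookup, and the shortest-match-wins behavior are assumptions; the commit message only says that increasingly longer path prefixes are tried as package names.

```go
package main

import (
	"fmt"
	"strings"
)

// resolvePackageName splits a wildcard route path into a package name and the
// remaining path by trying prefixes of one segment, two segments, and so on.
// exists stands in for the database lookup (hypothetical callback).
func resolvePackageName(path string, exists func(name string) bool) (name, rest string, ok bool) {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	for i := 1; i <= len(segs); i++ {
		candidate := strings.Join(segs[:i], "/")
		if exists(candidate) {
			return candidate, strings.Join(segs[i:], "/"), true
		}
	}
	return "", "", false
}

func main() {
	known := map[string]bool{"monolog/monolog": true, "@babel/core": true}
	lookup := func(name string) bool { return known[name] }

	fmt.Println(resolvePackageName("monolog/monolog", lookup))
	fmt.Println(resolvePackageName("@babel/core/7.24.0", lookup))
	fmt.Println(resolvePackageName("missing/pkg", lookup))
}
```

With this shape, `/package/composer/monolog/monolog` resolves to the package page because `monolog` alone is not a known name, while any trailing segment is left over for the version route.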
Andrew Nesbitt
34009bad98
Lazy-load HTML templates behind sync.Once (#59)
Templates are parsed on first Render call instead of at server startup.
API-only traffic never pays the ~780µs parsing cost.

Closes #53
2026-04-06 13:06:25 +01:00
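A minimal sketch of the lazy-loading pattern from 34009bad98, assuming a `renderer` type with a single inline template. The type, its fields, and the `parsed` counter are illustrative only; the project's real templates and `Render` signature are not shown in the log.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
	"sync"
)

// renderer parses its template on the first Render call instead of at startup.
type renderer struct {
	once   sync.Once
	tmpl   *template.Template
	err    error
	parsed int // counts parse runs, only to make the behavior visible
}

// Render executes the template, parsing it first if nobody has done so yet.
// sync.Once guarantees the parse runs exactly once, even under concurrent calls.
func (r *renderer) Render(data any) (string, error) {
	r.once.Do(func() {
		r.parsed++
		r.tmpl, r.err = template.New("page").Parse("<h1>{{.}}</h1>")
	})
	if r.err != nil {
		return "", r.err
	}
	var buf bytes.Buffer
	if err := r.tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	r := &renderer{}
	for _, title := range []string{"cache", "packages"} {
		out, err := r.Render(title)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(out)
	}
	fmt.Println("parse runs:", r.parsed)
}
```

A process that only serves API requests never calls `Render`, so it never pays the parsing cost.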
Andrew Nesbitt
beddf8357a
Fix startup message and add connectivity check for S3 storage (#57)
* Fix startup message and add connectivity check for S3 storage

When S3 storage is configured, the startup log incorrectly showed the
default local path (./cache/artifacts) instead of the actual S3 URL.
This also adds a lightweight connectivity check at startup so bad
credentials or endpoints fail immediately rather than on first request.

Add URL() and Close() to the Storage interface so all backends report
their URL and can be cleaned up properly. Rename the stats JSON field
from storage_path to storage_url. Close storage in error paths and
during graceful shutdown.

Fixes #49

* Fix Windows test assertion for file:// URL format

OpenBucket normalizes Windows paths to file:///C:/path (three slashes)
but the test expected file://C:/path (two slashes).
2026-04-03 14:06:51 +01:00
Andrew Nesbitt
599fe9e254
Fix all golangci-lint issues across the codebase (#32)
* Fix all golangci-lint issues across the codebase

Resolve 77 lint issues reported by golangci-lint with gocritic, gocognit,
gocyclo, maintidx, dupl, mnd, unparam, ireturn, goconst, and errcheck
enabled. Net reduction of ~175 lines through shared helpers and
deduplication.

* Suppress staticcheck SA1019 for intentional deprecated field usage

The Storage.Path field is deprecated but still read for backwards
compatibility with existing configs that haven't migrated to the URL field.
2026-03-18 10:59:29 +00:00
Andrew Nesbitt
3d6ebc9522
Stop leaking internal error messages in API and health responses
Replace err.Error() in HTTP error responses with generic messages.
Internal details like database driver errors and enrichment failures
were being sent directly to clients.
2026-03-12 12:01:29 +00:00
Andrew Nesbitt
82443e137f
Add generated OpenAPI docs support
2026-03-12 11:49:31 +00:00
Andrew Nesbitt
4f8f63f354
Add version cooldown to filter recently published packages
Hides package versions published too recently from metadata responses,
giving the community time to spot malicious releases. Configurable
per-ecosystem and per-package with duration overrides. Supported for
npm, PyPI, pub.dev, and Composer.
2026-03-04 19:00:31 +00:00
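The filtering idea in 4f8f63f354 could be sketched like this. The `version` type, the `visibleVersions` function, and the sample dates are hypothetical, and the per-ecosystem and per-package overrides mentioned in the commit are left out; the caller is assumed to have already picked the cooldown that applies.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// epoch is an arbitrary reference date used for the sample data.
var epoch = time.Date(2026, 3, 1, 0, 0, 0, 0, time.UTC)

// version is a hypothetical slice of a registry metadata entry.
type version struct {
	Number      string
	PublishedAt time.Time
}

// visibleVersions drops every version published less than cooldown before now.
// A cooldown of zero or less disables filtering.
func visibleVersions(all []version, cooldown time.Duration, now time.Time) []version {
	if cooldown <= 0 {
		return all
	}
	var out []version
	for _, v := range all {
		if now.Sub(v.PublishedAt) >= cooldown {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	all := []version{
		{"1.0.0", epoch},
		{"1.1.0", epoch.Add(9 * day)},
	}
	now := epoch.Add(10 * day)
	for _, v := range visibleVersions(all, 3*day, now) {
		fmt.Println(v.Number)
	}
}
```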
Andrew Nesbitt
364549ad14
Replace inline PURL construction with purl library
Uses purl.MakePURLString() instead of fmt.Sprintf("pkg:...") for
correct namespace handling (npm scopes, Go module paths, Maven group
IDs) and percent-encoding. Replaces hand-rolled extractEcosystem and
inline PURL parsing in the bulk lookup fallback with purl.Parse().
2026-03-04 09:20:16 +00:00
Andrew Nesbitt
be8c4b9860
Replace internal/upstream with registries/fetch
Use the new client/ and fetch/ sub-packages from git-pkgs/registries
instead of the local upstream package. The fetcher, circuit breaker, and
resolver now live in registries where they can be shared across projects.

Depends on git-pkgs/registries#8.
2026-02-20 17:31:12 +00:00
Andrew Nesbitt
8ddf07587c
Fix golangci-lint errors (errcheck, staticcheck, unused)
2026-02-16 10:53:21 +00:00
Andrew Nesbitt
2d7cb8eae5
Refactoring and features
2026-02-03 22:40:40 +00:00
Andrew Nesbitt
658e9621d8
Add Container, Debian, RPM handlers and enrichment API
Adds proxy support for Docker/OCI container registries, Debian/APT
repositories, and RPM/Yum repositories. Includes a new enrichment API
for package metadata, vulnerability scanning, and outdated detection.

Updates the dashboard with Tailwind CSS, dark mode support, and a
security overview section showing vulnerability counts.
2026-01-29 19:35:15 +00:00
Andrew Nesbitt
fcc5289f97
Add auth pass-through for upstream registries
Configure authentication per URL prefix in config:

  upstream:
    auth:
      "https://registry.npmjs.org":
        type: bearer
        token: "${NPM_TOKEN}"

Supports bearer tokens, basic auth, and custom headers.
Credentials can reference environment variables with ${VAR_NAME} syntax.
The longest matching URL prefix wins when multiple patterns match.
2026-01-29 16:33:09 +00:00
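The "longest matching URL prefix wins" rule from fcc5289f97 can be sketched as below. `authRule`, `matchAuth`, and `expandToken` are hypothetical names. `os.ExpandEnv` is a standard library call that expands both `${VAR}` and `$VAR`, so it is slightly more permissive than the `${VAR_NAME}` syntax the commit describes; the project may well use something stricter.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// authRule is a hypothetical in-memory form of one upstream.auth entry.
type authRule struct {
	Type  string
	Token string
}

// matchAuth returns the rule whose URL prefix matches target and is longest.
func matchAuth(rules map[string]authRule, target string) (authRule, bool) {
	best, found := "", false
	for prefix := range rules {
		if !strings.HasPrefix(target, prefix) {
			continue
		}
		if !found || len(prefix) > len(best) {
			best, found = prefix, true
		}
	}
	if !found {
		return authRule{}, false
	}
	return rules[best], true
}

// expandToken resolves environment references such as ${NPM_TOKEN}.
func expandToken(token string) string {
	return os.ExpandEnv(token)
}

func main() {
	rules := map[string]authRule{
		"https://registry.npmjs.org":       {Type: "bearer", Token: "${NPM_TOKEN}"},
		"https://registry.npmjs.org/@acme": {Type: "bearer", Token: "${ACME_TOKEN}"},
	}
	rule, ok := matchAuth(rules, "https://registry.npmjs.org/@acme/widgets")
	fmt.Println(ok, rule.Type, rule.Token)
	fmt.Println("token set:", expandToken(rule.Token) != "")
}
```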
Andrew Nesbitt
ba754f8a79
Add gocloud.dev/blob for S3 and filesystem storage
Replace custom filesystem storage with gocloud.dev/blob for unified
storage backend support.

Supported backends:
- file:///path/to/dir - Local filesystem (default)
- s3://bucket-name - Amazon S3
- s3://bucket?endpoint=http://localhost:9000 - S3-compatible (MinIO)

Configuration via:
- CLI flag: -storage-url
- Environment: PROXY_STORAGE_URL
- Config file: storage.url

The old storage.path config is deprecated but still supported.
2026-01-29 16:13:16 +00:00
Andrew Nesbitt
41aa11ab66
Add sqlx with SQLite default and PostgreSQL option
Replace raw database/sql with jmoiron/sqlx for cleaner query handling.
Support both SQLite (default) and PostgreSQL as configurable backends.

Configuration via:
- CLI flags: -database-driver, -database-path, -database-url
- Environment: PROXY_DATABASE_DRIVER, PROXY_DATABASE_PATH, PROXY_DATABASE_URL
- Config file: database.driver, database.path, database.url

Tests run against both databases when PROXY_DATABASE_URL is set.
2026-01-29 16:06:56 +00:00
Andrew Nesbitt
7b22638ef7
Hello world
2026-01-20 22:00:31 +00:00