pkg-proxy/README.md

943 lines
22 KiB
Markdown
Raw Permalink Normal View History

2026-01-20 21:52:44 +00:00
# git-pkgs proxy
A caching proxy for package registries. Speeds up package downloads by caching artifacts locally, reducing bandwidth usage and improving reliability.
## Version Cooldown
Most supply chain attacks rely on speed: a malicious version gets published and consumed by automated pipelines within minutes, before anyone notices. The cooldown feature adds a quarantine period to newly published versions. When enabled, the proxy strips versions from metadata responses until they've aged past a configurable threshold.
```yaml
cooldown:
default: "3d" # hide versions published less than 3 days ago
ecosystems:
npm: "7d" # npm gets a longer window
cargo: "0" # disable for cargo
packages:
"pkg:npm/lodash": "0" # exempt trusted packages
```
A 3-day cooldown means that when `lodash` publishes version `4.18.0`, your builds keep using `4.17.21` until 3 days have passed. If the new release turns out to be compromised, you were never exposed.
Resolution order: package override, then ecosystem override, then global default. This lets you set a conservative default and carve out exceptions for packages where you need faster updates. See [docs/configuration.md](docs/configuration.md) for the full config reference.
2026-01-20 21:52:44 +00:00
## Supported Registries
| Registry | Language/Platform | Cooldown | Completed |
|----------|-------------------|:--------:|:---------:|
| npm | JavaScript | Yes | ✓ |
| Cargo | Rust | Yes | ✓ |
| RubyGems | Ruby | Yes | ✓ |
| Go proxy | Go | | ✓ |
| Hex | Elixir | Yes* | ✓ |
| pub.dev | Dart | Yes | ✓ |
| PyPI | Python | Yes | ✓ |
| Maven | Java | | ✓ |
| NuGet | .NET | Yes | ✓ |
| Composer | PHP | Yes | ✓ |
| Conan | C/C++ | | ✓ |
| Conda | Python/R | Yes | ✓ |
| CRAN | R | | ✓ |
| Container | Docker/OCI | | ✓ |
| Debian | Debian/Ubuntu | | ✓ |
| RPM | RHEL/Fedora | | ✓ |
| Alpine | Alpine Linux | | ✗ |
| Arch | Arch Linux | | ✗ |
| Chef | Chef | | ✗ |
| Generic | Any | | ✗ |
| Helm | Kubernetes | | ✗ |
| Swift | Swift | | ✗ |
| Vagrant | Vagrant | | ✗ |
Cooldown requires publish timestamps in metadata. Registries without a "Yes" in the cooldown column either don't expose timestamps or haven't been wired up yet.
2026-01-20 21:52:44 +00:00
\* Hex cooldown requires disabling registry signature verification (`HEX_NO_VERIFY_REPO_ORIGIN=1`) since the proxy re-encodes the protobuf payload.
2026-04-13 11:03:59 +01:00
## Install
```bash
brew install git-pkgs/git-pkgs/proxy
```
Or download a binary from the [releases page](https://github.com/git-pkgs/proxy/releases).
2026-01-20 21:52:44 +00:00
## Quick Start
```bash
# Build from source
go build -o proxy ./cmd/proxy
# Run with defaults (listens on :8080)
./proxy
# Run with custom settings
./proxy -listen :3000 -base-url https://proxy.example.com
```
The proxy is now running. Configure your package managers to use it.
2026-03-11 17:18:29 +00:00
## OpenAPI (Swagger)
This repo uses swaggo to generate an OpenAPI spec from annotated handlers.
Generate the spec:
```bash
go install github.com/swaggo/swag/cmd/swag@latest
go generate ./internal/server
```
Generated files are written to `docs/swagger/`.
When the proxy is running, fetch the live spec from:
- `http://localhost:8080/openapi.json`
Or replace `http://localhost:8080` with your configured base URL. This link is also shown on the dashboard.
2026-01-20 21:52:44 +00:00
## Configuring Package Managers
### npm
Create or edit `~/.npmrc`:
```
registry=http://localhost:8080/npm/
```
Or set per-project in `.npmrc`:
```
registry=http://localhost:8080/npm/
```
Or use environment variable:
```bash
npm_config_registry=http://localhost:8080/npm/ npm install
```
### Cargo
Create or edit `~/.cargo/config.toml`:
```toml
[source.crates-io]
replace-with = "proxy"
[source.proxy]
registry = "sparse+http://localhost:8080/cargo/"
```
Or set per-project in `.cargo/config.toml` in your project root.
### RubyGems / Bundler
Set the gem source in your `Gemfile`:
```ruby
source "http://localhost:8080/gem"
```
Or configure globally:
```bash
gem sources --add http://localhost:8080/gem/
bundle config mirror.https://rubygems.org http://localhost:8080/gem
```
### Go modules
Set the GOPROXY environment variable:
```bash
export GOPROXY=http://localhost:8080/go,direct
```
Or in your shell profile for persistence.
### Hex (Elixir)
Configure in `~/.hex/hex.config`:
```erlang
{default_url, <<"http://localhost:8080/hex">>}.
```
Or set the environment variable:
```bash
export HEX_MIRROR=http://localhost:8080/hex
```
### pub.dev (Dart/Flutter)
Set the PUB_HOSTED_URL environment variable:
```bash
export PUB_HOSTED_URL=http://localhost:8080/pub
```
### PyPI (pip)
Configure pip to use the proxy:
```bash
pip install --index-url http://localhost:8080/pypi/simple/ package_name
```
Or set in `~/.pip/pip.conf`:
```ini
[global]
index-url = http://localhost:8080/pypi/simple/
```
### Maven
Add to your `~/.m2/settings.xml`:
```xml
<settings>
<mirrors>
<mirror>
<id>proxy</id>
<mirrorOf>central</mirrorOf>
<url>http://localhost:8080/maven/</url>
</mirror>
</mirrors>
</settings>
```
### NuGet
Configure in `nuget.config`:
```xml
<configuration>
<packageSources>
<clear />
<add key="proxy" value="http://localhost:8080/nuget/v3/index.json" />
</packageSources>
</configuration>
```
Or use the CLI:
```bash
dotnet nuget add source http://localhost:8080/nuget/v3/index.json -n proxy
```
### Composer (PHP)
Configure in `composer.json`:
```json
{
"repositories": [
{
"type": "composer",
"url": "http://localhost:8080/composer"
}
]
}
```
Or set globally:
```bash
composer config -g repositories.proxy composer http://localhost:8080/composer
```
### Conan (C/C++)
Add the proxy as a remote:
```bash
conan remote add proxy http://localhost:8080/conan
conan remote disable conancenter
```
Or configure in `~/.conan2/remotes.json`.
### Conda
Configure in `~/.condarc`:
```yaml
channels:
- http://localhost:8080/conda/main
- http://localhost:8080/conda/conda-forge
default_channels:
- http://localhost:8080/conda/main
```
Or set via command:
```bash
conda config --add channels http://localhost:8080/conda/main
```
### CRAN (R)
Set the repository in R:
```r
options(repos = c(CRAN = "http://localhost:8080/cran"))
```
Or in `~/.Rprofile` for persistence:
```r
local({
r <- getOption("repos")
r["CRAN"] <- "http://localhost:8080/cran"
options(repos = r)
})
```
### Docker / Container Registry
Configure Docker to use the proxy as a registry mirror in `/etc/docker/daemon.json`:
```json
{
"registry-mirrors": ["http://localhost:8080"]
}
```
Then restart Docker:
```bash
sudo systemctl restart docker
```
Or pull images directly:
```bash
docker pull localhost:8080/library/nginx:latest
```
### Debian / APT
Configure APT to use the proxy in `/etc/apt/sources.list.d/proxy.list`:
```
deb http://localhost:8080/debian stable main contrib
```
Replace your existing sources.list entries, then:
```bash
sudo apt update
```
### RPM / Yum / DNF
Configure yum/dnf to use the proxy in `/etc/yum.repos.d/proxy.repo`:
```ini
[proxy-fedora]
name=Fedora via Proxy
baseurl=http://localhost:8080/rpm/releases/$releasever/Everything/$basearch/os/
enabled=1
gpgcheck=0
```
Then:
```bash
sudo dnf clean all
sudo dnf update
```
2026-01-20 21:52:44 +00:00
## Configuration
The proxy can be configured via:
1. Command line flags (highest priority)
2. Environment variables
3. Configuration file (YAML or JSON)
### Command Line Flags
```
-config string Path to configuration file
-listen string Address to listen on (default ":8080")
-base-url string Public URL of this proxy (default "http://localhost:8080")
-storage-url string Storage URL (file:// or s3://)
-storage-path string Path to artifact storage directory (deprecated, use -storage-url)
-database-driver string Database driver: sqlite or postgres (default "sqlite")
-database-path string Path to SQLite database file (default "./cache/proxy.db")
-database-url string PostgreSQL connection URL
-log-level string Log level: debug, info, warn, error (default "info")
-log-format string Log format: text, json (default "text")
-version Print version and exit
2026-01-20 21:52:44 +00:00
```
### Environment Variables
```bash
PROXY_LISTEN=:8080
PROXY_BASE_URL=http://localhost:8080
PROXY_STORAGE_URL=file:///var/cache/proxy/artifacts
PROXY_DATABASE_DRIVER=sqlite
2026-01-20 21:52:44 +00:00
PROXY_DATABASE_PATH=./cache/proxy.db
PROXY_DATABASE_URL=postgres://user:pass@localhost/proxy?sslmode=disable
2026-01-20 21:52:44 +00:00
PROXY_LOG_LEVEL=info
PROXY_LOG_FORMAT=text
```
### Configuration File
```yaml
listen: ":8080"
base_url: "http://localhost:8080"
storage:
url: "file:///var/cache/proxy/artifacts"
2026-01-20 21:52:44 +00:00
max_size: "10GB" # Optional: evict LRU when exceeded
database:
driver: "sqlite"
2026-01-20 21:52:44 +00:00
path: "/var/lib/proxy/cache.db"
log:
level: "info"
format: "text"
# Optional: override upstream URLs
upstream:
npm: "https://registry.npmjs.org"
cargo: "https://index.crates.io"
# Optional: version cooldown (see above)
cooldown:
default: "3d"
2026-01-20 21:52:44 +00:00
```
Run with config file:
```bash
./proxy -config /etc/proxy/config.yaml
```
### PostgreSQL
SQLite is the default and works well for single-node deployments. For multi-node setups or if you prefer a managed database, switch to Postgres:
```yaml
database:
driver: "postgres"
url: "postgres://user:password@localhost:5432/proxy?sslmode=disable"
```
Or via environment variables:
```bash
PROXY_DATABASE_DRIVER=postgres
PROXY_DATABASE_URL=postgres://user:password@localhost:5432/proxy?sslmode=disable
```
The proxy creates tables automatically on first run.
### S3 Storage
The proxy can store cached artifacts in S3 or any S3-compatible service (MinIO, R2, etc.) instead of the local filesystem.
```yaml
storage:
url: "s3://my-bucket-name?region=us-east-1"
```
For S3-compatible services like MinIO:
```yaml
storage:
url: "s3://my-bucket?endpoint=http://localhost:9000&disableSSL=true&s3ForcePathStyle=true"
```
Set credentials via standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`).
2026-01-20 21:52:44 +00:00
## CLI Commands
### serve (default)
Start the proxy server. This is the default command if none is specified.
```bash
proxy serve [flags]
proxy [flags] # same as 'proxy serve'
```
### mirror
Pre-populate the cache from PURLs, SBOM files, or entire registries. Useful for ensuring offline availability or warming the cache before deployments.
```bash
# Mirror specific package versions
proxy mirror pkg:npm/lodash@4.17.21 pkg:cargo/serde@1.0.0
# Mirror all versions of a package
proxy mirror pkg:npm/lodash
# Mirror from a CycloneDX or SPDX SBOM
proxy mirror --sbom sbom.cdx.json
# Preview what would be mirrored
proxy mirror --dry-run pkg:npm/lodash
# Control parallelism
proxy mirror --concurrency 8 pkg:npm/lodash@4.17.21
```
The mirror command accepts the same storage and database flags as `serve`. Already-cached artifacts are skipped.
A mirror API is also available when the server is running:
```bash
# Start a mirror job
curl -X POST http://localhost:8080/api/mirror \
-H "Content-Type: application/json" \
-d '{"purls": ["pkg:npm/lodash@4.17.21"]}'
# Check job status
curl http://localhost:8080/api/mirror/mirror-1
# Cancel a running job
curl -X DELETE http://localhost:8080/api/mirror/mirror-1
```
2026-01-20 21:52:44 +00:00
### stats
Show cache statistics without running the server.
```bash
# Text output
proxy stats
# JSON output
proxy stats -json
# Custom database path
proxy stats -database-path /var/lib/proxy/cache.db
# With PostgreSQL
proxy stats -database-driver postgres -database-url postgres://user:pass@localhost/proxy
2026-01-20 21:52:44 +00:00
# Show top 20 most popular packages
proxy stats -popular 20
```
Example output:
```
Cache Statistics
================
Packages: 45
Versions: 128
Artifacts: 128
Total size: 892.4 MB
Total hits: 1547
Packages by ecosystem:
npm 32
cargo 13
Most popular packages:
1. npm/lodash (342 hits, 24.7 KB)
2. npm/react (198 hits, 89.3 KB)
3. cargo/serde (156 hits, 234.1 KB)
Recently cached:
npm/express@4.18.2 (2024-01-15 14:32, 54.2 KB)
cargo/tokio@1.35.0 (2024-01-15 14:28, 412.8 KB)
```
## API Endpoints
### Registry Protocols
2026-01-20 21:52:44 +00:00
| Endpoint | Description |
|----------|-------------|
| `GET /` | Dashboard (web UI) |
2026-01-20 21:52:44 +00:00
| `GET /health` | Health check (returns "ok" if healthy) |
| `GET /stats` | Cache statistics (JSON) |
| `GET /metrics` | Prometheus metrics |
2026-01-20 21:52:44 +00:00
| `GET /npm/*` | npm registry protocol |
| `GET /cargo/*` | Cargo sparse index protocol |
| `GET /gem/*` | RubyGems protocol |
| `GET /go/*` | Go module proxy protocol |
| `GET /hex/*` | Hex.pm protocol |
| `GET /pub/*` | pub.dev protocol |
| `GET /pypi/*` | PyPI simple/JSON API |
| `GET /maven/*` | Maven repository protocol |
| `GET /nuget/*` | NuGet V3 API |
| `GET /composer/*` | Composer/Packagist protocol |
| `GET /conan/*` | Conan C/C++ protocol |
| `GET /conda/*` | Conda/Anaconda protocol |
| `GET /cran/*` | CRAN (R) protocol |
| `GET /v2/*` | OCI/Docker registry protocol |
| `GET /debian/*` | Debian/APT repository protocol |
| `GET /rpm/*` | RPM/Yum repository protocol |
### Mirror API
| Endpoint | Description |
|----------|-------------|
| `POST /api/mirror` | Start a mirror job (JSON body with `purls`) |
| `GET /api/mirror/{id}` | Get job status and progress |
| `DELETE /api/mirror/{id}` | Cancel a running job |
### Enrichment API
The proxy provides REST endpoints for package metadata enrichment, vulnerability scanning, and outdated detection.
| Endpoint | Description |
|----------|-------------|
| `GET /api/package/{ecosystem}/{name}` | Get package metadata |
| `GET /api/package/{ecosystem}/{name}/{version}` | Get version metadata with vulnerabilities |
| `GET /api/vulns/{ecosystem}/{name}` | Get all vulnerabilities for a package |
| `GET /api/vulns/{ecosystem}/{name}/{version}` | Get vulnerabilities for a specific version |
| `POST /api/outdated` | Check multiple packages for outdated versions |
| `POST /api/bulk` | Bulk package metadata lookup |
#### Get Package Metadata
```bash
curl http://localhost:8080/api/package/npm/lodash
```
Response:
```json
{
"ecosystem": "npm",
"name": "lodash",
"latest_version": "4.17.21",
"license": "MIT",
"license_category": "permissive",
"description": "Lodash modular utilities",
"homepage": "https://lodash.com/",
"repository": "https://github.com/lodash/lodash",
"registry_url": "https://registry.npmjs.org"
}
```
#### Get Version with Vulnerabilities
```bash
curl http://localhost:8080/api/package/npm/lodash/4.17.0
```
Response:
```json
{
"package": {
"ecosystem": "npm",
"name": "lodash",
"latest_version": "4.17.21",
"license": "MIT",
"license_category": "permissive"
},
"version": {
"ecosystem": "npm",
"name": "lodash",
"version": "4.17.0",
"license": "MIT",
"published_at": "2016-06-17T03:59:56Z",
"yanked": false,
"is_outdated": true
},
"vulnerabilities": [
{
"id": "GHSA-p6mc-m468-83gw",
"summary": "Prototype Pollution in lodash",
"severity": "HIGH",
"cvss_score": 7.4,
"fixed_version": "4.17.12"
}
],
"is_outdated": true,
"license_category": "permissive"
}
```
#### Check Outdated Packages
```bash
curl -X POST http://localhost:8080/api/outdated \
-H "Content-Type: application/json" \
-d '{
"packages": [
{"ecosystem": "npm", "name": "lodash", "version": "4.17.0"},
{"ecosystem": "pypi", "name": "requests", "version": "2.25.0"}
]
}'
```
Response:
```json
{
"results": [
{
"ecosystem": "npm",
"name": "lodash",
"version": "4.17.0",
"latest_version": "4.17.21",
"is_outdated": true
},
{
"ecosystem": "pypi",
"name": "requests",
"version": "2.25.0",
"latest_version": "2.31.0",
"is_outdated": true
}
]
}
```
#### Bulk Package Lookup
```bash
curl -X POST http://localhost:8080/api/bulk \
-H "Content-Type: application/json" \
-d '{
"purls": [
"pkg:npm/lodash@4.17.21",
"pkg:pypi/requests@2.28.0"
]
}'
```
Response:
```json
{
"packages": {
"pkg:npm/lodash": {
"ecosystem": "npm",
"name": "lodash",
"latest_version": "4.17.21",
"license": "MIT",
"license_category": "permissive"
},
"pkg:pypi/requests": {
"ecosystem": "pypi",
"name": "requests",
"latest_version": "2.31.0",
"license": "Apache-2.0",
"license_category": "permissive"
}
}
}
```
2026-01-20 21:52:44 +00:00
### Stats Response (HTTP endpoint)
```json
{
"cached_artifacts": 142,
"total_size_bytes": 523456789,
"total_size": "499.2 MB",
"storage_url": "file:///path/to/cache/artifacts",
2026-01-20 21:52:44 +00:00
"database_path": "./cache/proxy.db"
}
```
## How It Works
1. Package manager requests package metadata from the proxy
2. Proxy fetches metadata from upstream, rewrites artifact URLs to point at proxy
3. Package manager requests artifact (tarball, crate, etc.)
4. Proxy checks local cache:
- **Cache hit**: Serve from local storage
- **Cache miss**: Fetch from upstream, store locally, serve to client
5. Subsequent requests for the same artifact are served from cache
```
┌─────────────┐ ┌─────────┐ ┌──────────┐
│ npm/cargo │────▶│ proxy │────▶│ upstream │
│ client │◀────│ │◀────│ registry │
└─────────────┘ └─────────┘ └──────────┘
┌─────────┐
│ cache │
│ storage │
└─────────┘
```
## Web Interface
The proxy serves a web UI at the root URL. No separate frontend build is needed -- templates and assets are embedded in the binary.
- **Dashboard** (`/`) -- cache stats, popular packages, recently cached artifacts, and vulnerability overview.
- **Install guide** (`/install`) -- per-ecosystem configuration instructions, so you don't have to look them up here.
- **Package browser** (`/packages`) -- browse all cached packages with filtering by ecosystem and sorting by hits, size, name, or vulnerability count.
- **Search** (`/search?q=...`) -- search cached packages by name.
- **Package detail** (`/package/{ecosystem}/{name}`) -- metadata, license, vulnerabilities, and version list for a package. You can select two versions to compare.
- **Version detail** (`/package/{ecosystem}/{name}/{version}`) -- per-version metadata, integrity hash, artifact cache status, and hit counts.
- **Source browser** (`/package/{ecosystem}/{name}/{version}/browse`) -- browse files inside cached archives with syntax highlighting for text files and image previews.
- **Version diff** (`/package/{ecosystem}/{name}/compare/{v1}...{v2}`) -- side-by-side diff of two cached versions showing added, removed, and changed files.
## Monitoring
The proxy exposes Prometheus metrics at `GET /metrics`. All metric names are prefixed with `proxy_`.
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `proxy_cache_hits_total` | counter | `ecosystem` | Cache hits |
| `proxy_cache_misses_total` | counter | `ecosystem` | Cache misses |
| `proxy_cache_size_bytes` | gauge | | Total size of cached artifacts |
| `proxy_cached_artifacts_total` | gauge | | Number of cached artifacts |
| `proxy_upstream_fetch_duration_seconds` | histogram | `ecosystem` | Time spent fetching from upstream |
| `proxy_upstream_errors_total` | counter | `ecosystem`, `error_type` | Upstream fetch failures |
| `proxy_storage_operation_duration_seconds` | histogram | `operation` | Storage read/write latency |
| `proxy_storage_errors_total` | counter | `operation` | Storage read/write failures |
| `proxy_active_requests` | gauge | | In-flight requests |
Cache size and artifact count are refreshed every 60 seconds. The remaining metrics update on each request.
Scrape config for Prometheus:
```yaml
scrape_configs:
- job_name: git-pkgs-proxy
static_configs:
- targets: ["localhost:8080"]
```
2026-01-20 21:52:44 +00:00
## Production Deployment
### Systemd Service
Create `/etc/systemd/system/proxy.service`:
```ini
[Unit]
Description=git-pkgs proxy
After=network.target
[Service]
Type=simple
User=proxy
ExecStart=/usr/local/bin/proxy -config /etc/proxy/config.yaml
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl enable proxy
sudo systemctl start proxy
```
### Docker
A Dockerfile is included in the repo. Build and run:
2026-01-20 21:52:44 +00:00
```bash
docker build -t proxy .
docker run -p 8080:8080 -v proxy-data:/data proxy
2026-01-20 21:52:44 +00:00
```
With Postgres and S3:
2026-01-20 21:52:44 +00:00
```bash
docker run -p 8080:8080 \
-e PROXY_DATABASE_DRIVER=postgres \
-e PROXY_DATABASE_URL=postgres://user:pass@db:5432/proxy \
-e PROXY_STORAGE_URL=s3://my-bucket?region=us-east-1 \
-e AWS_ACCESS_KEY_ID=... \
-e AWS_SECRET_ACCESS_KEY=... \
proxy
2026-01-20 21:52:44 +00:00
```
### Behind a Reverse Proxy
When running behind nginx, Apache, or another reverse proxy, set `base_url` to your public URL:
```yaml
base_url: "https://proxy.example.com"
```
nginx example:
```nginx
server {
listen 443 ssl;
server_name proxy.example.com;
location / {
proxy_pass http://127.0.0.1:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_buffering off;
}
}
```
## Cache Management
The proxy stores artifacts in the configured storage directory with this structure:
```
cache/artifacts/
├── npm/
│ └── lodash/
│ └── 4.17.21/
│ └── lodash-4.17.21.tgz
├── cargo/
│ └── serde/
│ └── 1.0.193/
│ └── serde-1.0.193.crate
├── oci/
│ └── library/nginx/
│ └── sha256:abc123.../
│ └── sha256:abc123...
├── deb/
│ └── nginx/
│ └── 1.18.0-6/
│ └── nginx_1.18.0-6_amd64.deb
└── rpm/
└── nginx/
└── 1.24.0-1.fc39/
└── nginx-1.24.0-1.fc39.x86_64.rpm
2026-01-20 21:52:44 +00:00
```
Cache metadata is stored in SQLite (default) or PostgreSQL. To clear a local cache:
2026-01-20 21:52:44 +00:00
```bash
rm -rf ./cache/artifacts/*
rm ./cache/proxy.db
```
The proxy will recreate the database on next start.
## Building from Source
Requirements:
- Go 1.25 or later
2026-01-20 21:52:44 +00:00
```bash
git clone https://github.com/git-pkgs/proxy.git
cd proxy
go build -o proxy ./cmd/proxy
```
Run tests:
```bash
go test ./...
```
## License
GPL-3.0-or-later