pkg-proxy/docs/configuration.md

334 lines
8.7 KiB
Markdown
Raw Permalink Normal View History

# Configuration
The proxy can be configured via command line flags, environment variables, or a configuration file. Command line flags take precedence over environment variables, which take precedence over the configuration file.
## Configuration File
Create a YAML or JSON file and pass it with `-config`:
```bash
proxy serve -config config.yaml
```
See `config.example.yaml` in the repository root for a complete example.
## Server Settings
| Config | Environment | Flag | Default | Description |
|--------|-------------|------|---------|-------------|
| `listen` | `PROXY_LISTEN` | `-listen` | `:8080` | Address to listen on |
| `base_url` | `PROXY_BASE_URL` | `-base-url` | `http://localhost:8080` | Public URL for the proxy |
## Storage
The proxy stores cached artifacts using gocloud.dev/blob, supporting local filesystem and S3-compatible storage.
### Local Filesystem
```yaml
storage:
url: "file:///var/cache/proxy"
```
Or using the legacy path option:
```yaml
storage:
path: "./cache/artifacts"
```
| Config | Environment | Flag | Description |
|--------|-------------|------|-------------|
| `storage.url` | `PROXY_STORAGE_URL` | `-storage-url` | Storage URL (file:// or s3://) |
| `storage.path` | `PROXY_STORAGE_PATH` | `-storage-path` | Local path (deprecated, use url) |
| `storage.max_size` | `PROXY_STORAGE_MAX_SIZE` | - | Max cache size (e.g., "10GB") |
### Amazon S3
```yaml
storage:
url: "s3://my-bucket"
```
Configure credentials via environment variables:
```bash
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AWS_REGION=us-east-1
```
### S3-Compatible (MinIO, etc.)
```yaml
storage:
url: "s3://my-bucket?endpoint=http://localhost:9000&disableSSL=true&s3ForcePathStyle=true"
```
## Database
The proxy supports SQLite (default) and PostgreSQL for storing package metadata.
### SQLite
```yaml
database:
driver: "sqlite"
path: "./cache/proxy.db"
```
| Config | Environment | Flag | Description |
|--------|-------------|------|-------------|
| `database.driver` | `PROXY_DATABASE_DRIVER` | `-database-driver` | `sqlite` or `postgres` |
| `database.path` | `PROXY_DATABASE_PATH` | `-database-path` | SQLite file path |
### PostgreSQL
```yaml
database:
driver: "postgres"
url: "postgres://user:password@localhost:5432/proxy?sslmode=disable"
```
| Config | Environment | Flag | Description |
|--------|-------------|------|-------------|
| `database.url` | `PROXY_DATABASE_URL` | `-database-url` | PostgreSQL connection URL |
## Logging
```yaml
log:
level: "info"
format: "text"
```
| Config | Environment | Flag | Values |
|--------|-------------|------|--------|
| `log.level` | `PROXY_LOG_LEVEL` | `-log-level` | `debug`, `info`, `warn`, `error` |
| `log.format` | `PROXY_LOG_FORMAT` | `-log-format` | `text`, `json` |
## Upstream Registries
Override default upstream registry URLs:
```yaml
upstream:
npm: "https://registry.npmjs.org"
cargo: "https://index.crates.io"
cargo_download: "https://static.crates.io/crates"
```
## Authentication
Configure authentication for private upstream registries. Auth is matched by URL prefix, and credentials can reference environment variables using `${VAR_NAME}` syntax.
### Bearer Token
Used by npm, GitHub Package Registry, and many other registries:
```yaml
upstream:
auth:
"https://registry.npmjs.org":
type: bearer
token: "${NPM_TOKEN}"
"https://npm.pkg.github.com":
type: bearer
token: "${GITHUB_TOKEN}"
```
### Basic Authentication
Used by PyPI, Artifactory, and others:
```yaml
upstream:
auth:
"https://pypi.org":
type: basic
username: "__token__"
password: "${PYPI_TOKEN}"
"https://artifactory.mycompany.com":
type: basic
username: "deploy"
password: "${ARTIFACTORY_PASSWORD}"
```
### Custom Header
For registries that use non-standard authentication headers:
```yaml
upstream:
auth:
"https://maven.mycompany.com":
type: header
header_name: "X-Auth-Token"
header_value: "${MAVEN_TOKEN}"
```
### URL Matching
Auth configs are matched by URL prefix. The longest matching prefix wins, so you can configure different credentials for different paths:
```yaml
upstream:
auth:
# All requests to this registry
"https://registry.mycompany.com":
type: bearer
token: "${REGISTRY_TOKEN}"
# Override for a specific scope
"https://registry.mycompany.com/@private":
type: bearer
token: "${PRIVATE_TOKEN}"
```
## Cooldown
The cooldown feature hides package versions published too recently, giving the community time to spot malicious releases before they reach your projects. When a version is within its cooldown period, it's stripped from metadata responses so package managers won't install it.
```yaml
cooldown:
default: "3d"
ecosystems:
npm: "7d"
cargo: "0"
packages:
"pkg:npm/lodash": "0"
"pkg:npm/@babel/core": "14d"
```
| Config | Environment | Description |
|--------|-------------|-------------|
| `cooldown.default` | `PROXY_COOLDOWN_DEFAULT` | Global default cooldown |
| `cooldown.ecosystems` | - | Per-ecosystem overrides |
| `cooldown.packages` | - | Per-package overrides (keyed by PURL) |
Durations support days (`7d`), hours (`48h`), and minutes (`30m`). Set to `0` to disable.
Resolution order: package override, then ecosystem override, then global default. This lets you set a conservative default while exempting trusted packages.
Currently supported for npm, PyPI, pub.dev, Composer, Cargo, NuGet, Conda, RubyGems, and Hex. These ecosystems include publish timestamps in their metadata.
Note: Hex cooldown requires disabling registry signature verification since the proxy re-encodes the protobuf payload without the original signature. Set `HEX_NO_VERIFY_REPO_ORIGIN=1` or configure your repo with `no_verify: true`.
## Metadata Caching
By default the proxy fetches metadata fresh from upstream on every request. Enable `cache_metadata` to store metadata responses in the database and storage backend for offline fallback. When upstream is unreachable, the proxy serves the last cached copy. ETag-based revalidation avoids re-downloading unchanged metadata.
```yaml
cache_metadata: true
```
Or via environment variable: `PROXY_CACHE_METADATA=true`.
The `proxy mirror` command always enables metadata caching regardless of this setting.
### Metadata TTL
When metadata caching is enabled, `metadata_ttl` controls how long a cached response is considered fresh before revalidating with upstream. During the TTL window, cached metadata is served directly without contacting upstream, reducing latency and upstream load.
```yaml
metadata_ttl: "5m" # default
```
Or via environment variable: `PROXY_METADATA_TTL=10m`.
Set to `"0"` to always revalidate with upstream (ETag-based conditional requests still avoid re-downloading unchanged content).
When upstream is unreachable and the cached entry is past its TTL, the proxy serves the stale cached copy with a `Warning: 110 - "Response is Stale"` header so clients can tell the data may be outdated.
## Mirror API
The `/api/mirror` endpoints are disabled by default. Enable them to allow starting mirror jobs via HTTP:
```yaml
mirror_api: true
```
Or via environment variable: `PROXY_MIRROR_API=true`.
When disabled, the endpoints are not registered and return 404.
## Mirror Command
The `proxy mirror` command pre-populates the cache from various sources. It accepts the same storage and database flags as `serve`.
| Flag | Default | Description |
|------|---------|-------------|
| `--sbom` | | Path to CycloneDX or SPDX SBOM file |
| `--concurrency` | `4` | Number of parallel downloads |
| `--dry-run` | `false` | Show what would be mirrored without downloading |
| `--config` | | Path to configuration file |
| `--storage-url` | | Storage URL |
| `--database-driver` | | Database driver |
| `--database-path` | | SQLite database file |
| `--database-url` | | PostgreSQL connection URL |
Positional arguments are treated as PURLs:
```bash
proxy mirror pkg:npm/lodash@4.17.21 pkg:cargo/serde@1.0.0
```
## Docker
### SQLite with Local Storage
```bash
docker compose up
```
### PostgreSQL with Local Storage
```bash
docker compose --profile postgres up
```
### PostgreSQL with S3 (MinIO)
```bash
docker compose --profile s3 up
```
## Example Configurations
### Minimal (defaults)
```yaml
listen: ":8080"
```
### Production with PostgreSQL and S3
```yaml
listen: ":8080"
base_url: "https://proxy.example.com"
storage:
url: "s3://my-cache-bucket"
max_size: "100GB"
database:
driver: "postgres"
url: "postgres://proxy:secret@db.example.com:5432/proxy?sslmode=require"
log:
level: "info"
format: "json"
```
### Private npm Registry
```yaml
listen: ":8080"
base_url: "http://localhost:8080"
upstream:
npm: "https://npm.pkg.github.com"
auth:
npm:
type: bearer
token: "${GITHUB_TOKEN}"
```