Gabriel Cosi fb4fd8b386
feat: improve install flow and file-based configuration (#1)
- Add file-based daemon configuration using Viper and a committed llama-suspendd.conf example.
- Add a release installer script that downloads the correct binary, installs config/systemd files, and enables the service.
- Streamline the README around binary releases, installer usage, and runtime flags.
- Simplify GoReleaser asset names to avoid repeated architecture suffixes.

Reviewed-on: #1
2026-05-09 08:47:51 +00:00

llama-suspendd

llama-suspendd is a small Linux daemon for hosts dedicated to llama-swap. It watches the llama-swap event stream and runs a power action after backend API inactivity.

It ignores UI traffic, keeps the machine awake during active remote SSH sessions, and can optionally block suspend or shutdown while NVIDIA GPU utilization or VRAM usage is above a configured threshold. If it loses the event stream or cannot make a safe decision, it does nothing.
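
The "do nothing when unsure" rule above can be sketched as a single gate in which every unknown or busy condition refuses the power action. This is an illustrative sketch, not the daemon's actual code: the `Snapshot` type, its fields, and `ShouldAct` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Snapshot is a hypothetical summary of what the daemon knows when it
// evaluates a power decision. None of these names come from the project.
type Snapshot struct {
	StreamConnected bool // the event stream is currently connected
	ChecksOK        bool // every probe (sessions, GPU) returned without error
	Inflight        int  // tracked backend requests still running
	IdleFor         int  // seconds since the last tracked request completed
	IdleSeconds     int  // configured idle window in seconds
	RemoteSSH       bool // an active remote SSH session exists
	GPUBusy         bool // the GPU guard reports usage at or above a threshold
}

// ShouldAct returns true only when every condition is known to be safe.
// Any missing information or sign of activity yields false plus a reason.
func ShouldAct(s Snapshot) (bool, string) {
	switch {
	case !s.StreamConnected:
		return false, "event stream unavailable"
	case !s.ChecksOK:
		return false, "a check failed, state unknown"
	case s.Inflight > 0:
		return false, "requests in flight"
	case s.IdleFor < s.IdleSeconds:
		return false, "idle window not elapsed"
	case s.RemoteSSH:
		return false, "remote SSH session active"
	case s.GPUBusy:
		return false, "GPU guard blocking"
	}
	return true, "idle"
}

func main() {
	ok, reason := ShouldAct(Snapshot{
		StreamConnected: true,
		ChecksOK:        true,
		IdleFor:         900,
		IdleSeconds:     600,
	})
	fmt.Println(ok, reason)
}
```

Ordering the cases from "state unknown" to "state known but busy" keeps the failure mode conservative: the zero value of `Snapshot` never triggers an action.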

Behavior

Tracked backend routes:

POST /v1/completions
POST /v1/chat/completions
POST /v1/responses
POST /v1/embeddings
POST /v1/audio/speech
POST /v1/audio/transcriptions
POST /v1/audio/voices
GET  /v1/audio/voices
POST /v1/images/generations
POST /v1/images/edits
POST /v1/messages
POST /v1/messages/count_tokens
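
One way to express this route list in code is a lookup keyed by method and path. The sketch below assumes exact path matching with no query string or trailing-slash handling; the real matching rules are not stated here, and `IsTracked` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// trackedRoutes mirrors the tracked backend routes listed above,
// keyed as "METHOD path".
var trackedRoutes = map[string]bool{
	"POST /v1/completions":          true,
	"POST /v1/chat/completions":     true,
	"POST /v1/responses":            true,
	"POST /v1/embeddings":           true,
	"POST /v1/audio/speech":         true,
	"POST /v1/audio/transcriptions": true,
	"POST /v1/audio/voices":         true,
	"GET /v1/audio/voices":          true,
	"POST /v1/images/generations":   true,
	"POST /v1/images/edits":         true,
	"POST /v1/messages":             true,
	"POST /v1/messages/count_tokens": true,
}

// IsTracked reports whether a request counts as backend API activity.
// Assumption: the path must match exactly; the method is case-insensitive.
func IsTracked(method, path string) bool {
	return trackedRoutes[strings.ToUpper(method)+" "+path]
}

func main() {
	fmt.Println(IsTracked("POST", "/v1/chat/completions"))
	fmt.Println(IsTracked("GET", "/v1/chat/completions"))
	fmt.Println(IsTracked("GET", "/ui/"))
}
```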

The daemon connects to /api/events, tracks inflight request events, and records completed tracked requests from proxy logData events. On first startup it writes the current time to the state file, so a newly installed service waits one full idle window before taking action.

Requirements

  • Linux with systemd
  • Reachable llama-swap instance
  • loginctl for SSH session detection
  • nvidia-smi, required only when GPU_GUARD_ENABLED=true

Install

Run the installer on the llama-swap host:

curl -fsSL https://git.xcd.dev/gabrielcosi/llama-suspendd/raw/branch/main/install.sh | sh

The script downloads the latest release binary for linux/amd64 or linux/arm64 and installs it to /usr/local/bin. It copies llama-suspendd.conf to /etc/llama-suspendd/config if that file does not already exist, then creates the systemd service and enables it.

Configuration

Edit /etc/llama-suspendd/config, then restart the service:

sudo systemctl restart llama-suspendd.service

Variable                           Default                Meaning
LLAMA_SWAP_URL                     http://127.0.0.1:9292  Base URL for llama-swap
IDLE_SECONDS                       600                    Idle time before the power action is allowed
CHECK_INTERVAL_SECONDS             30                     Power decision interval
RECONNECT_DELAY_SECONDS            5                      Delay before reconnecting to the event stream
RECONNECT_GRACE_SECONDS            IDLE_SECONDS           Safety window after reconnect
SESSION_CHECK_TIMEOUT_SECONDS      5                      Timeout for each loginctl check
POWER_ACTION                       shutdown               suspend or shutdown
GPU_GUARD_ENABLED                  false                  Enable NVIDIA GPU checks
GPU_CHECK_TIMEOUT_SECONDS          5                      Timeout for each nvidia-smi check
GPU_UTILIZATION_THRESHOLD_PERCENT  0                      Block at or above this GPU utilization; 0 disables
GPU_MEMORY_THRESHOLD_PERCENT       0                      Block at or above this VRAM usage; 0 disables
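
The two GPU thresholds share the same semantics: block at or above the value, with 0 disabling that check. A minimal sketch of that rule follows; `GPUBlocks`, `exceeds`, and `MemoryPercent` are hypothetical helpers, and how the daemon obtains the readings from nvidia-smi is not shown.

```go
package main

import "fmt"

// exceeds reports whether a reading trips a threshold.
// A threshold of 0 disables the check, matching the table above.
func exceeds(valuePercent, thresholdPercent float64) bool {
	return thresholdPercent > 0 && valuePercent >= thresholdPercent
}

// MemoryPercent converts used and total VRAM (same unit) to a percentage.
// A total of 0 yields 0 so that a bad reading cannot divide by zero.
func MemoryPercent(used, total float64) float64 {
	if total <= 0 {
		return 0
	}
	return used / total * 100
}

// GPUBlocks reports whether the GPU guard should block the power action.
// enabled corresponds to GPU_GUARD_ENABLED; the thresholds correspond to
// GPU_UTILIZATION_THRESHOLD_PERCENT and GPU_MEMORY_THRESHOLD_PERCENT.
func GPUBlocks(enabled bool, utilPct, memPct, utilThreshold, memThreshold float64) bool {
	if !enabled {
		return false
	}
	return exceeds(utilPct, utilThreshold) || exceeds(memPct, memThreshold)
}

func main() {
	mem := MemoryPercent(6144, 24576) // 25% of VRAM in use
	fmt.Println(GPUBlocks(true, 3, mem, 10, 20))  // memory at or above 20%
	fmt.Println(GPUBlocks(true, 3, mem, 10, 0))   // memory check disabled
	fmt.Println(GPUBlocks(false, 99, 99, 10, 20)) // guard disabled
}
```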

Runtime-only options are binary flags: -config, -state-file, and -dry-run.
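
For readers unfamiliar with Go's flag package, the three flags could be declared roughly as below. This is a sketch only: the default paths and the help strings are assumptions made for the example (in particular the `-state-file` default is hypothetical), and `Options` and `ParseFlags` are illustrative names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Options holds the runtime-only settings that are flags, not config keys.
type Options struct {
	ConfigPath string
	StateFile  string
	DryRun     bool
}

// ParseFlags parses -config, -state-file, and -dry-run from args.
// The defaults below are assumptions, not the project's documented values.
func ParseFlags(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("llama-suspendd", flag.ContinueOnError)
	fs.StringVar(&o.ConfigPath, "config", "/etc/llama-suspendd/config",
		"path to the configuration file")
	fs.StringVar(&o.StateFile, "state-file", "/var/lib/llama-suspendd/state",
		"path to the state file (hypothetical default)")
	fs.BoolVar(&o.DryRun, "dry-run", false,
		"assumed meaning: report the power action without running it")
	err := fs.Parse(args)
	return o, err
}

func main() {
	opts, err := ParseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Using a dedicated `flag.FlagSet` instead of the package-level flags keeps the parser testable with arbitrary argument slices.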

Development

go test ./...
go build ./cmd/llama-suspendd
go run ./cmd/llama-suspendd

Troubleshooting

journalctl -u llama-suspendd.service -f
curl -N http://127.0.0.1:9292/api/events
loginctl list-sessions --no-legend