Go 96.7%
Shell 3.3%

Gabriel Cosi	fb4fd8b386	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00
- Add file-based daemon configuration using Viper and a committed llama-suspendd.conf example.
- Add a release installer script that downloads the correct binary, installs config/systemd files, and enables the service.
- Streamline the README around binary releases, installer usage, and runtime flags.
- Simplify GoReleaser asset names to avoid repeated architecture suffixes.
Reviewed-on: #1
.forgejo/workflows	ci: limit GoReleaser parallelism	2026-05-08 22:58:34 +02:00
cmd/llama-suspendd	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00
internal/monitor	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00
.gitignore	ci: add Forgejo release automation	2026-05-08 22:24:13 +02:00
.goreleaser.yaml	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00
devbox.json	chore: add devbox	2026-05-01 20:33:49 +00:00
devbox.lock	chore: add devbox.lock to version control	2026-05-01 20:52:53 +00:00
go.mod	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00
go.sum	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00
install.sh	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00
llama-suspendd.conf	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00
README.md	feat: improve install flow and file-based configuration (#1)	2026-05-09 08:47:51 +00:00

README.md

llama-suspendd

llama-suspendd is a small Linux daemon for hosts dedicated to llama-swap. It watches the llama-swap event stream and runs a power action (suspend or shutdown) once the backend API has been idle for a configured period.

It ignores UI traffic, keeps the machine awake during active remote SSH sessions, and can optionally block suspend or shutdown while NVIDIA GPU utilization or VRAM usage is above a configured threshold. If it loses the event stream or cannot make a safe decision, it does nothing.
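The fail-safe rule described above can be sketched as a single decision function. This is an illustration only, not the daemon's actual code: the `Snapshot` type, its field names, and `shouldAct` are hypothetical, and the real implementation in `internal/monitor` may be structured differently.

```go
package main

import "fmt"

// Snapshot is a hypothetical summary of what the daemon knows at one check.
type Snapshot struct {
	StreamHealthy   bool // event stream is connected and past any reconnect grace period
	ChecksOK        bool // session and GPU checks completed without error or timeout
	InflightTracked int  // tracked backend requests still in progress
	IdleSeconds     int  // seconds since the last tracked backend activity
	IdleWindow      int  // configured idle window (IDLE_SECONDS)
	RemoteSSH       bool // an active remote SSH session exists
	GPUBusy         bool // GPU guard is enabled and a threshold is met
}

// shouldAct reports whether the power action may run. Every uncertain or
// busy condition returns false, so the default outcome is "do nothing".
func shouldAct(s Snapshot) bool {
	if !s.StreamHealthy || !s.ChecksOK {
		return false // cannot make a safe decision
	}
	if s.InflightTracked > 0 || s.RemoteSSH || s.GPUBusy {
		return false // the host is in use
	}
	return s.IdleSeconds >= s.IdleWindow
}

func main() {
	idle := Snapshot{StreamHealthy: true, ChecksOK: true, IdleSeconds: 900, IdleWindow: 600}
	fmt.Println(shouldAct(idle)) // true

	idle.RemoteSSH = true
	fmt.Println(shouldAct(idle)) // false
}
```

Ordering the checks so that "unknown" is tested before "idle" keeps a lost event stream from ever being mistaken for inactivity.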

Behavior

Tracked backend routes:

POST /v1/completions
POST /v1/chat/completions
POST /v1/responses
POST /v1/embeddings
POST /v1/audio/speech
POST /v1/audio/transcriptions
POST /v1/audio/voices
GET  /v1/audio/voices
POST /v1/images/generations
POST /v1/images/edits
POST /v1/messages
POST /v1/messages/count_tokens
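
One way to express the tracked-route list is a lookup set keyed by method and path. The sketch below is an assumption about shape, not the project's code: `route`, `trackedRoutes`, and `isTracked` are hypothetical names, and it assumes exact path matching with no query string or trailing slash handling.

```go
package main

import "fmt"

// route identifies a backend endpoint by HTTP method and exact path.
type route struct {
	method string
	path   string
}

// trackedRoutes mirrors the list above. Anything not in this set,
// such as UI traffic, does not count as backend activity.
var trackedRoutes = map[route]struct{}{
	{"POST", "/v1/completions"}:           {},
	{"POST", "/v1/chat/completions"}:      {},
	{"POST", "/v1/responses"}:             {},
	{"POST", "/v1/embeddings"}:            {},
	{"POST", "/v1/audio/speech"}:          {},
	{"POST", "/v1/audio/transcriptions"}:  {},
	{"POST", "/v1/audio/voices"}:          {},
	{"GET", "/v1/audio/voices"}:           {},
	{"POST", "/v1/images/generations"}:    {},
	{"POST", "/v1/images/edits"}:          {},
	{"POST", "/v1/messages"}:              {},
	{"POST", "/v1/messages/count_tokens"}: {},
}

// isTracked reports whether a request should reset the idle timer.
func isTracked(method, path string) bool {
	_, ok := trackedRoutes[route{method, path}]
	return ok
}

func main() {
	fmt.Println(isTracked("POST", "/v1/chat/completions")) // true
	fmt.Println(isTracked("GET", "/v1/chat/completions"))  // false
	fmt.Println(isTracked("GET", "/ui/"))                  // false
}
```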

The daemon connects to /api/events, tracks inflight request events, and records completed tracked requests from proxy logData events. On first startup it writes the current time to the state file, so a newly installed service waits one full idle window before taking action.
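
The first-startup behavior can be illustrated with a small state-file helper. This is a sketch under stated assumptions: the on-disk format (a single Unix timestamp in seconds) and the names `loadOrSeed` and `idleElapsed` are hypothetical, since the README does not describe the state file's contents.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// loadOrSeed returns the last-activity time stored at path. If the file does
// not exist yet, it writes nowUnix and returns it, so a fresh install starts
// a full idle window from "now". The file format is an assumption.
func loadOrSeed(path string, nowUnix int64) (int64, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		seed := strconv.FormatInt(nowUnix, 10) + "\n"
		if err := os.WriteFile(path, []byte(seed), 0o644); err != nil {
			return 0, err
		}
		return nowUnix, nil
	}
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(data)), 10, 64)
}

// idleElapsed reports whether a full idle window has passed since lastUnix.
func idleElapsed(lastUnix, nowUnix, idleSeconds int64) bool {
	return nowUnix-lastUnix >= idleSeconds
}

func main() {
	dir, err := os.MkdirTemp("", "llama-suspendd-example")
	if err != nil {
		fmt.Println("temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state")

	last, err := loadOrSeed(path, 1000) // first start: seeds the file with 1000
	if err != nil {
		fmt.Println("state:", err)
		return
	}
	fmt.Println(last, idleElapsed(last, 1599, 600), idleElapsed(last, 1600, 600))

	again, _ := loadOrSeed(path, 5000) // later start: keeps the stored value
	fmt.Println(again)
}
```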

Requirements

Linux with systemd
Reachable llama-swap instance
loginctl for SSH session detection
nvidia-smi only when GPU_GUARD_ENABLED=true
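
As an illustration of how `loginctl` can be used for SSH session detection, the sketch below inspects one session's properties. It is a guess at the approach, not the daemon's code: the function names are hypothetical, and the choice of the `Remote`, `Service`, and `State` properties (with `sshd` as the service name) is an assumption that may not match every distribution.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// sessionProps runs `loginctl show-session` for one session ID, bounded by a
// timeout in the spirit of SESSION_CHECK_TIMEOUT_SECONDS.
func sessionProps(id string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	out, err := exec.CommandContext(ctx, "loginctl", "show-session", id,
		"-p", "Remote", "-p", "Service", "-p", "State").Output()
	return string(out), err
}

// isActiveRemoteSSH parses KEY=VALUE lines and reports whether they describe
// an active remote session opened through sshd.
func isActiveRemoteSSH(props string) bool {
	kv := map[string]string{}
	for _, line := range strings.Split(props, "\n") {
		if k, v, ok := strings.Cut(strings.TrimSpace(line), "="); ok {
			kv[k] = v
		}
	}
	return kv["Remote"] == "yes" && kv["Service"] == "sshd" && kv["State"] == "active"
}

func main() {
	// Canned output so the example runs without loginctl installed.
	fmt.Println(isActiveRemoteSSH("Remote=yes\nService=sshd\nState=active\n"))        // true
	fmt.Println(isActiveRemoteSSH("Remote=no\nService=gdm-password\nState=active\n")) // false
}
```

A caller following the fail-safe rule would treat an error or timeout from `sessionProps` as "cannot decide" rather than "no session".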

Install

Run the installer on the llama-swap host:

curl -fsSL https://git.xcd.dev/gabrielcosi/llama-suspendd/raw/branch/main/install.sh | sh

The script downloads the latest release binary for linux/amd64 or linux/arm64, installs it to /usr/local/bin, installs llama-suspendd.conf to /etc/llama-suspendd/config if missing, creates the systemd service, and enables it.

Configuration

Edit /etc/llama-suspendd/config, then restart the service:

sudo systemctl restart llama-suspendd.service

Variable	Default	Meaning
`LLAMA_SWAP_URL`	`http://127.0.0.1:9292`	Base URL for `llama-swap`
`IDLE_SECONDS`	`600`	Idle time before the power action is allowed
`CHECK_INTERVAL_SECONDS`	`30`	Power decision interval
`RECONNECT_DELAY_SECONDS`	`5`	Delay before reconnecting to the event stream
`RECONNECT_GRACE_SECONDS`	`IDLE_SECONDS`	Safety window after reconnect
`SESSION_CHECK_TIMEOUT_SECONDS`	`5`	Timeout for each `loginctl` check
`POWER_ACTION`	`shutdown`	`suspend` or `shutdown`
`GPU_GUARD_ENABLED`	`false`	Enable NVIDIA GPU checks
`GPU_CHECK_TIMEOUT_SECONDS`	`5`	Timeout for each `nvidia-smi` check
`GPU_UTILIZATION_THRESHOLD_PERCENT`	`0`	Block at or above this GPU utilization; `0` disables
`GPU_MEMORY_THRESHOLD_PERCENT`	`0`	Block at or above this VRAM usage; `0` disables
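
The defaults and the two less obvious rules in the table (`RECONNECT_GRACE_SECONDS` inheriting `IDLE_SECONDS`, and `0` disabling a GPU threshold) can be sketched as follows. The project loads its configuration with Viper; this plain struct is only an illustration, and the `Config` type, its methods, and the use of `0` to mean "unset" for the grace period are assumptions.

```go
package main

import "fmt"

// Config mirrors the variables in the table above (hypothetical struct).
type Config struct {
	LlamaSwapURL                   string
	IdleSeconds                    int
	CheckIntervalSeconds           int
	ReconnectDelaySeconds          int
	ReconnectGraceSeconds          int // 0 here means "unset": inherit IdleSeconds
	SessionCheckTimeoutSeconds     int
	PowerAction                    string
	GPUGuardEnabled                bool
	GPUCheckTimeoutSeconds         int
	GPUUtilizationThresholdPercent int
	GPUMemoryThresholdPercent      int
}

// defaultConfig returns the documented defaults.
func defaultConfig() Config {
	return Config{
		LlamaSwapURL:               "http://127.0.0.1:9292",
		IdleSeconds:                600,
		CheckIntervalSeconds:       30,
		ReconnectDelaySeconds:      5,
		SessionCheckTimeoutSeconds: 5,
		PowerAction:                "shutdown",
		GPUCheckTimeoutSeconds:     5,
	}
}

// reconnectGrace falls back to the idle window when no grace period is set.
func (c Config) reconnectGrace() int {
	if c.ReconnectGraceSeconds > 0 {
		return c.ReconnectGraceSeconds
	}
	return c.IdleSeconds
}

// gpuBlocks reports whether GPU load should block the power action.
// A threshold of 0 disables that particular check.
func (c Config) gpuBlocks(utilPercent, memPercent int) bool {
	if !c.GPUGuardEnabled {
		return false
	}
	util := c.GPUUtilizationThresholdPercent
	mem := c.GPUMemoryThresholdPercent
	return (util > 0 && utilPercent >= util) || (mem > 0 && memPercent >= mem)
}

func main() {
	c := defaultConfig()
	fmt.Println(c.reconnectGrace()) // 600

	c.GPUGuardEnabled = true
	c.GPUUtilizationThresholdPercent = 50
	fmt.Println(c.gpuBlocks(50, 0), c.gpuBlocks(49, 99)) // true false
}
```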

Runtime-only options are not read from the config file; pass them to the binary as command-line flags: -config, -state-file, and -dry-run.
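
For readers unfamiliar with Go's flag handling, the three flags could be declared roughly as below. Only the flag names come from this README. The default values, the help strings, and the interpretation of -dry-run as "log instead of act" are assumptions, and `options` and `parseFlags` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the runtime-only settings (hypothetical struct).
type options struct {
	configPath string
	stateFile  string
	dryRun     bool
}

// parseFlags parses the three documented flags from args.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("llama-suspendd", flag.ContinueOnError)
	// Assumed default: the path the installer writes the config to.
	fs.StringVar(&o.configPath, "config", "/etc/llama-suspendd/config", "path to the configuration file")
	// The real default state-file location is not documented here, so it is left empty.
	fs.StringVar(&o.stateFile, "state-file", "", "path to the state file")
	// Assumed meaning: report the power action without running it.
	fs.BoolVar(&o.dryRun, "dry-run", false, "log the power action instead of running it")
	err := fs.Parse(args)
	return o, err
}

func main() {
	opts, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```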

Development

go test ./...
go build ./cmd/llama-suspendd
go run ./cmd/llama-suspendd

Troubleshooting

journalctl -u llama-suspendd.service -f
curl -N http://127.0.0.1:9292/api/events
loginctl list-sessions --no-legend