Server prerequisites (macOS)
Install Homebrew tools on your Mac mini (or any Apple Silicon machine):
# Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Required tools
brew install yq   # YAML parser used by launch.sh
brew install uv   # Python package manager
uv, tmux, and node_exporter are also installed automatically on each Pi by launch.sh — no need to install them manually on the workers.
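Before launching, it can help to confirm that the server-side tools are actually on PATH. The following is a minimal sketch, not part of the repo; `missing_cmds` is a hypothetical helper name.

```shell
# missing_cmds CMD...
# Prints each named command that is not on PATH.
# Returns non-zero if at least one is missing.
# (Hypothetical helper for illustration; not shipped with the repo.)
missing_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the two tools installed with brew above.
# missing_cmds yq uv || echo "install the missing tools with brew"
```

Because the function only relies on `command -v`, it behaves the same on the macOS server and on the Pis.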
Pi prerequisites
Follow the Raspberry Pi cluster setup guide to get your Pis networked and SSH-accessible. Then on each Pi:
sudo apt update && sudo apt install -y python3.13 python3.13-venv curl git
The Host alias in ~/.ssh/config must exactly match the host field in configs/config.yaml, because launch.sh uses those values directly as SSH targets.
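A mismatch between the SSH alias and the config value is easy to miss, so a quick check may be useful. The sketch below is an assumption-laden illustration: `has_ssh_alias` is a hypothetical helper, and it only handles plain `Host name [name...]` lines (no wildcards, no `Host=name` syntax, no Include files).

```shell
# has_ssh_alias FILE ALIAS
# Succeeds if FILE contains a "Host" line that lists ALIAS exactly.
# (Hypothetical helper; handles only simple "Host a b c" lines.)
has_ssh_alias() {
  awk -v want="$2" '
    tolower($1) == "host" {
      for (i = 2; i <= NF; i++) if ($i == want) found = 1
    }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Example: pi4-1 is the alias used elsewhere in this guide.
# has_ssh_alias ~/.ssh/config pi4-1 || echo "pi4-1 is not defined in ~/.ssh/config"
```

The comparison is exact on purpose: `pi4` does not match `pi4-1`, which mirrors the exact-match requirement stated above.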
Verify SSH access from the server:
ssh pi4-1 # or whatever alias you chose
Clone & configure
Clone the repo on the server and install Python dependencies:
git clone https://github.com/YuvrajSingh-mist/smoltorrent
cd smoltorrent
uv sync
Edit configs/config.yaml — set ckpt_root and each worker's host, ip, port, and rank:
After launching, run curl http://localhost:8000/discover: mDNS discovery will find all workers on the network automatically.
Launch the cluster
One command rsyncs the codebase to every Pi, installs deps, and starts everything in tmux:
bash scripts/launch.sh
This starts:
- syncps_api: FastAPI server on the master (port 8000)
- syncps_watcher: Watcher daemon on the master
- syncps_worker_N: TCP worker on each Pi (port 5001+)
Useful launch flags
| Flag | What it does |
|---|---|
| --dry-run | Print what would happen, no SSH or launches |
| --api-only | Heartbeat-check workers, start API only |
| --workers 1,3 | Launch only specific worker ranks |
| --ext .safetensors,.pth | Override file extensions the watcher monitors |
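One way to use these flags together is to preview a launch before running it for real. The wrapper below is a sketch under two assumptions that the table does not confirm: that --dry-run can be combined with the other flags, and that it exits non-zero when something is wrong. `preview_then_launch` and `LAUNCH_SH` are hypothetical names.

```shell
# Path to the launcher; override LAUNCH_SH to point at a different copy.
LAUNCH_SH=${LAUNCH_SH:-scripts/launch.sh}

# preview_then_launch [FLAGS...]
# Runs the launcher with --dry-run first, then launches with the same
# flags only if the preview exited successfully.
# (Assumes --dry-run composes with other flags; not confirmed by the docs.)
preview_then_launch() {
  bash "$LAUNCH_SH" --dry-run "$@" || {
    echo "dry run failed; not launching" >&2
    return 1
  }
  bash "$LAUNCH_SH" "$@"
}

# Example: preview, then relaunch only worker ranks 1 and 3.
# preview_then_launch --workers 1,3
```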
Watch logs
# API / watcher
tmux attach -t syncps_api
tmux attach -t syncps_watcher

# Worker (attach over SSH; -t allocates the terminal that tmux needs)
ssh -t pi4-1 tmux attach -t syncps_worker_1

# All logs
tail -f logging/cluster-logs/*.log
Pi worker auto-start
Install a systemd service on each Pi so workers restart automatically after a Pi reboot — without waiting for the server:
# All 4 workers
bash scripts/install_worker_service.sh

# Specific ranks only
bash scripts/install_worker_service.sh --workers 1,3

# Remove from all
bash scripts/install_worker_service.sh --uninstall
launch.sh also kills and re-launches workers via tmux on every run — systemd and tmux are independent. If both are running, systemd's process will fail to bind the port and retry after 5 s (harmless).
Server auto-start at boot
Register macOS LaunchDaemons so the entire cluster comes up after a server reboot — no manual intervention:
bash scripts/launch.sh --daemons
This registers two system daemons:
- com.smoltorrent.startup: waits for the network (pings the first worker every 5 s), then runs launch.sh
- com.node-exporter: keeps node_exporter running for Grafana system stats
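The wait-for-network behaviour described for the startup daemon can be sketched as a simple polling loop. This is an illustration of the idea, not the contents of the installed startup script; `wait_for_host`, its arguments, and the `PROBE_CMD` override are all hypothetical.

```shell
# wait_for_host HOST [INTERVAL_SECONDS] [MAX_TRIES]
# Polls HOST until one probe succeeds. INTERVAL defaults to 5 seconds.
# MAX_TRIES=0 (the default) means wait forever.
# PROBE_CMD is a hypothetical override so the loop can be exercised offline;
# by default a single ping is sent per attempt.
wait_for_host() {
  host=$1
  interval=${2:-5}
  max_tries=${3:-0}
  tries=0
  until ${PROBE_CMD:-ping -c 1} "$host" >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$max_tries" -gt 0 ] && [ "$tries" -ge "$max_tries" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Example: block until the first worker answers, then launch.
# wait_for_host pi4-1 5 && bash scripts/launch.sh
```

Bounding the loop with MAX_TRIES is optional; a boot-time daemon may prefer to wait indefinitely, while an interactive script usually wants to give up after a while.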
Verify & check logs
# Check both are registered
sudo launchctl print system/com.smoltorrent.startup
sudo launchctl print system/com.node-exporter

# Startup log (after reboot)
cat /tmp/smoltorrent-startup.log
launchctl print may show last exit code = 1 and state = not running — this is expected. The daemon runs once at boot, launches everything into tmux, then exits. The cluster keeps running in tmux independently.
Remove / uninstall
sudo launchctl bootout system/com.smoltorrent.startup 2>/dev/null || true
sudo rm -f /Library/LaunchDaemons/com.smoltorrent.startup.plist
sudo rm -f /usr/local/bin/smoltorrent_startup.sh
sudo launchctl bootout system/com.node-exporter 2>/dev/null || true
sudo rm -f /Library/LaunchDaemons/com.node-exporter.plist
Monitoring (Prometheus + Grafana + Loki)
Each Pi worker serves its metrics on port 9200+rank. The monitoring stack itself runs in Docker on the server only.
1. Install Docker (via colima on macOS)
brew install colima docker docker-compose
colima start
2. Configure & start the stack
# Copy and fill in credentials
cp monitoring/.env.example monitoring/.env
# Edit monitoring/.env: set the Gmail app password for alert emails

# Start Prometheus + Grafana + Loki
cd monitoring && docker compose up -d

# Grafana → http://<master-ip>:3000 (admin / smoltorrent)
Metrics endpoints
| Source | URL | What |
|---|---|---|
| Master API | <master-ip>:8000/metrics | FastAPI + transfer metrics |
| Pi worker N | <pi-ip>:920N/metrics | Per-worker shard/transfer metrics |
| All nodes | <node-ip>:9100/metrics | System stats (node_exporter) |
See monitoring/README.md in the repo for the full metrics reference and dashboard panel guide.
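The per-worker metrics port follows the 9200+rank rule, so the scrape URL for any worker can be derived from its rank. A minimal sketch, assuming plain HTTP (the table does not state a scheme); `metrics_port` and `metrics_url` are hypothetical helper names.

```shell
# metrics_port RANK
# Prints the worker metrics port, computed as 9200 + RANK.
metrics_port() {
  echo $((9200 + $1))
}

# metrics_url HOST RANK
# Prints the scrape URL for that worker.
# (The http:// scheme is an assumption, not stated in the table.)
metrics_url() {
  printf 'http://%s:%s/metrics\n' "$1" "$(metrics_port "$2")"
}

# Example: fetch the metrics of the worker with rank 1.
# curl -fsS "$(metrics_url pi4-1 1)"
```

Note that the 920N shorthand in the table only reads literally for single-digit ranks; by the same rule, rank 10 would map to port 9210.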
Requirements summary
| Dependency | Where | How |
|---|---|---|
| Python ≥ 3.13 | All nodes | Manual on Pis; already on macOS |
| uv | All nodes | Auto by launch.sh |
| tmux ≥ 3.0 | All nodes | Auto by launch.sh |
| yq | Server only | brew install yq |
| node_exporter | All nodes | Auto by launch.sh |
| zeroconf | All nodes | Auto by launch.sh (mDNS discovery) |
| Network (LAN/VPN) | All nodes | Nodes must reach each other over TCP |
| SSH key auth | Server → Pis | ssh-copy-id (step 2) |
| Docker + colima | Server only | Monitoring only — no SSH needed |
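The minimum versions in the table (Python ≥ 3.13, tmux ≥ 3.0) need a numeric comparison, since a plain string comparison would rank 3.9 above 3.13. The following is a sketch of one way to check this in plain shell; `version_ge` is a hypothetical helper and compares only the MAJOR.MINOR parts.

```shell
# version_ge HAVE WANT
# Succeeds if version HAVE is >= WANT, comparing MAJOR and MINOR numerically.
# Both arguments must contain at least MAJOR.MINOR. Anything after the minor
# number is ignored, so "3.13.2" and "3.3a" are accepted.
version_ge() {
  have_major=${1%%.*}
  have_minor=${1#*.}
  have_minor=${have_minor%%.*}
  have_minor=${have_minor%%[!0-9]*}   # drop suffixes such as the "a" in 3.3a
  want_major=${2%%.*}
  want_minor=${2#*.}
  want_minor=${want_minor%%.*}
  want_minor=${want_minor%%[!0-9]*}
  if [ "$have_major" -gt "$want_major" ]; then return 0; fi
  if [ "$have_major" -lt "$want_major" ]; then return 1; fi
  [ "$have_minor" -ge "$want_minor" ]
}

# Example: tmux -V prints something like "tmux 3.3a".
# version_ge "$(tmux -V | awk '{print $2}')" 3.0 || echo "tmux is too old"
```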