Complete server deployment config

Josh Bicking bbdc4905fd no fs resource limits, image updates 6 months ago
backup bbdc4905fd no fs resource limits, image updates 6 months ago
elasticsearch 17b77eed0c add mastodon and sonarr 2 years ago
examples 17b77eed0c add mastodon and sonarr 2 years ago
immich bbdc4905fd no fs resource limits, image updates 6 months ago
monitoring bf82e30c70 fix & document traefik & nextcloud 7 months ago
nextcloud bbdc4905fd no fs resource limits, image updates 6 months ago
postgres c3eb2abc4a update tags, add run-one cronjobs to scripts 1 year ago
redis ede6421bba add redis 2 years ago
rook bbdc4905fd no fs resource limits, image updates 6 months ago
.gitignore 326976f29b finishing touches to restore script 1 year ago
README.md bbdc4905fd no fs resource limits, image updates 6 months ago
bazarr-pvc.yaml a32b479aa2 add diun, immich, syncthing 11 months ago
bazarr.yaml 80f3e5e987 add lan entires for internal service access 6 months ago
blog.yaml 17b77eed0c add mastodon and sonarr 2 years ago
cloudflared.yaml bf82e30c70 fix & document traefik & nextcloud 7 months ago
data-pv.yaml 326976f29b finishing touches to restore script 1 year ago
data-pvc.yaml dadd05a339 add ntfy, start on backup strategy 1 year ago
diun-pvc.yaml a32b479aa2 add diun, immich, syncthing 11 months ago
diun.yaml a32b479aa2 add diun, immich, syncthing 11 months ago
duplicati-pvc.yaml dadd05a339 add ntfy, start on backup strategy 1 year ago
duplicati.yaml 80f3e5e987 add lan entires for internal service access 6 months ago
gogs-pvc.yaml bbdc4905fd no fs resource limits, image updates 6 months ago
gogs.yaml bbdc4905fd no fs resource limits, image updates 6 months ago
homeassistant-pvc.yaml 7cfa4556cf update services 8 months ago
homeassistant.yaml bbdc4905fd no fs resource limits, image updates 6 months ago
inotify-consumers.sh 4bcbe4d51d add inotify watchers script 1 year ago
jellyfin-pvc.yaml 1440bee64a gpu sharing & object storage working 2 years ago
jellyfin.yaml bf82e30c70 fix & document traefik & nextcloud 7 months ago
lidarr-pvc.yaml 34f822e12a nextcloud real IPs (but not locally) 1 year ago
lidarr.yaml 80f3e5e987 add lan entires for internal service access 6 months ago
lidarr_empty_folders.py c3eb2abc4a update tags, add run-one cronjobs to scripts 1 year ago
makemkv.sh bf82e30c70 fix & document traefik & nextcloud 7 months ago
mastodon.yaml 17b77eed0c add mastodon and sonarr 2 years ago
matrix-pvc.yaml c671a0f368 k3s only 2 years ago
matrix.yaml 2f60ae93f9 add vaultwarden 2 years ago
miniflux.yaml 7cfa4556cf update services 8 months ago
ntfy-pvc.yaml dadd05a339 add ntfy, start on backup strategy 1 year ago
ntfy.yaml dadd05a339 add ntfy, start on backup strategy 1 year ago
nvidia-device-plugin-config.yaml 1440bee64a gpu sharing & object storage working 2 years ago
plex-pvc.yaml dadd05a339 add ntfy, start on backup strategy 1 year ago
plex.yaml bf82e30c70 fix & document traefik & nextcloud 7 months ago
prowlarr-pvc.yaml a32b479aa2 add diun, immich, syncthing 11 months ago
prowlarr.yaml 80f3e5e987 add lan entires for internal service access 6 months ago
radarr-pvc.yaml a32b479aa2 add diun, immich, syncthing 11 months ago
radarr.yaml 80f3e5e987 add lan entires for internal service access 6 months ago
secret-example.yaml dadd05a339 add ntfy, start on backup strategy 1 year ago
seedbox_sync.py bf82e30c70 fix & document traefik & nextcloud 7 months ago
selfoss-pvc.yaml c671a0f368 k3s only 2 years ago
selfoss.yaml c671a0f368 k3s only 2 years ago
sonarr-pvc.yaml a32b479aa2 add diun, immich, syncthing 11 months ago
sonarr.yaml bbdc4905fd no fs resource limits, image updates 6 months ago
syncthing-pvc.yaml a32b479aa2 add diun, immich, syncthing 11 months ago
syncthing.yaml bf82e30c70 fix & document traefik & nextcloud 7 months ago
temp-pvc-pod.yaml 30507da839 updates and cephfs docs 1 year ago
traefik-configmap.yaml 80f3e5e987 add lan entires for internal service access 6 months ago
traefik-dashboard.yaml c671a0f368 k3s only 2 years ago
traefik-helmchartconfig.yaml 80f3e5e987 add lan entires for internal service access 6 months ago
vaultwarden-pvc.yaml 2f60ae93f9 add vaultwarden 2 years ago
vaultwarden.yaml 7cfa4556cf update services 8 months ago
whoami.yaml c3eb2abc4a update tags, add run-one cronjobs to scripts 1 year ago

README.md

k3s + rook Homelab

Writeup still a WIP, please pardon the dust.

Below is mostly braindumps & rough commands for creating/tweaking these services. Formal writeup coming soon!

k3s

installing k3s

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --cluster-init" sh -
export NODE_TOKEN=$(cat /var/lib/rancher/k3s/server/node-token)
curl -sfL https://get.k3s.io | K3S_TOKEN=$NODE_TOKEN INSTALL_K3S_EXEC="server --server https://192.168.122.87:6443" INSTALL_K3S_VERSION=v1.23.6+k3s1 sh -

upgrading k3s

TODO

purging k3s image cache

$ sudo crictl rmi --prune

limiting log size

k3s logs a lot.

In /etc/systemd/journald.conf, set "SystemMaxUse=100M"

In /etc/logrotate.conf, set "size 100M"

purging containerd snapshots

https://github.com/containerd/containerd/blob/main/docs/content-flow.md

containerd really doesn't want you batch-deleting snapshots.

https://github.com/k3s-io/k3s/issues/1905#issuecomment-820554037

for sha in $(sudo k3s ctr snapshot usage | awk '{print $1}'); do sudo k3s ctr snapshot rm $sha && echo $sha; done

Run this a few times until it stops returning results.
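
The repeated runs are presumably needed because a snapshot that still has children cannot be removed until the children are gone, so each pass only peels off the current leaves. A minimal sketch of the same loop in Python, run until a pass removes nothing: the function names are hypothetical, and skipping a `KEY` header row in the `usage` output is an assumption about its format.

```python
import subprocess
from typing import Callable, Iterable


def purge_until_stable(
    list_keys: Callable[[], Iterable[str]],
    remove: Callable[[str], bool],
    max_passes: int = 20,
) -> int:
    """Run removal passes until one pass removes nothing; return total removed."""
    total = 0
    for _ in range(max_passes):
        # Snapshot the key list first, since removals change it.
        removed = sum(1 for key in list(list_keys()) if remove(key))
        if removed == 0:
            break
        total += removed
    return total


def k3s_snapshot_keys() -> list[str]:
    """First column of `k3s ctr snapshot usage` (assumed header row skipped)."""
    out = subprocess.run(
        ["sudo", "k3s", "ctr", "snapshot", "usage"],
        capture_output=True, text=True, check=True,
    ).stdout
    keys = [line.split()[0] for line in out.splitlines() if line.strip()]
    return [k for k in keys if k != "KEY"]


def k3s_snapshot_rm(key: str) -> bool:
    """True if containerd agreed to remove the snapshot."""
    result = subprocess.run(
        ["sudo", "k3s", "ctr", "snapshot", "rm", key],
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

Calling `purge_until_stable(k3s_snapshot_keys, k3s_snapshot_rm)` on a node does the same thing as re-running the one-liner by hand.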

ingress

Uses traefik, the k3s default.

externalTrafficPolicy: Local is used to preserve forwarded IPs.

A cluster-ingress=true label is given to the node my router is pointing to. Some services use a nodeAffinity to request it.

For traefik, this is a harmless optimization to reduce traffic hairpinning. For pods with hostNetwork: true, this ensures they run on the node with the right IP.
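
As a rough illustration of what requesting the labeled node can look like, here is a sketch that builds a merge patch adding a nodeAffinity for cluster-ingress=true. Using a required (rather than preferred) affinity is an assumption, and the function names are hypothetical; the manifests in this repo may do it differently.

```python
import json


def ingress_affinity(label_key: str = "cluster-ingress", label_value: str = "true") -> dict:
    """Affinity stanza that only allows nodes carrying the ingress label."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": label_key, "operator": "In", "values": [label_value]}
                        ]
                    }
                ]
            }
        }
    }


def deployment_patch() -> dict:
    """Merge patch that sets the affinity on a Deployment's pod template."""
    return {"spec": {"template": {"spec": {"affinity": ingress_affinity()}}}}


# Print the patch as JSON so it can be passed to kubectl.
print(json.dumps(deployment_patch()))
```

If saved as a script (say affinity_patch.py, a made-up name), the output can be applied with kubectl patch deployment some-service --type merge -p "$(python3 affinity_patch.py)".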

rook

installing rook

KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade --install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph:1.9.2 -f rook-ceph-values.yaml

KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster:1.9.2 -f rook-ceph-cluster-values.yaml

upgrading rook

TODO

Finding the physical device for an OSD

ceph osd metadata <id> | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'

$ ceph osd metadata osd.1 | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'
    "bluestore_bdev_dev_node": "/dev/sdd",
    "hostname": "node1",

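To map every OSD at once instead of grepping one at a time, the JSON can be parsed directly. This is a sketch that assumes `ceph osd metadata --format json` (with no id) returns a list of objects, each with an "id" field alongside the two keys grepped above; `osd_devices` is a hypothetical helper name.

```python
import json


def osd_devices(metadata_json: str) -> dict[int, tuple[str, str]]:
    """Map OSD id -> (hostname, device node) from `ceph osd metadata` JSON."""
    entries = json.loads(metadata_json)
    if isinstance(entries, dict):
        # A single-OSD query returns one object rather than a list.
        entries = [entries]
    return {
        int(entry["id"]): (entry["hostname"], entry["bluestore_bdev_dev_node"])
        for entry in entries
        # Skip entries without a bluestore device (e.g. other backends).
        if "bluestore_bdev_dev_node" in entry
    }


sample = '[{"id": 1, "hostname": "node1", "bluestore_bdev_dev_node": "/dev/sdd"}]'
for osd_id, (host, dev) in osd_devices(sample).items():
    print(f"osd.{osd_id}: {host} {dev}")
```

In practice the sample string would be replaced by the command's real output, for example via subprocess.
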
tolerations

My setup divides k8s nodes into ceph & non-ceph nodes (using the label storage-node=true).

Ensure labels & a toleration are set properly, so non-rook nodes can still run the PV plugin DaemonSets. I accomplished this with a storage-node=false label on non-rook nodes, with a toleration checking for storage-node.

Otherwise, any pod scheduled on a non-ceph node won't be able to mount ceph-backed PVCs.

See rook-ceph-cluster-values.yaml->cephClusterSpec->placement for an example.

CephFS

EC backing pool

EC-backed filesystems require a regular replicated pool as a default.

https://lists.ceph.io/hyperkitty/list/[email protected]/thread/QI42CLL3GJ6G7PZEMAD3CXBHA5BNWSYS/

https://tracker.ceph.com/issues/42450

Then setfattr a directory on the filesystem with an EC-backed pool. Any new data written to the folder will go to the EC-backed pool.

setfattr -n ceph.dir.layout.pool -v cephfs-erasurecoded /mnt/cephfs/my-erasure-coded-dir

https://docs.ceph.com/en/quincy/cephfs/file-layouts/

Sharing 1 CephFS instance between multiple PVCs

https://github.com/rook/rook/blob/677d3fa47f21b07245e2e4ab6cc964eb44223c48/Documentation/Storage-Configuration/Shared-Filesystem-CephFS/filesystem-storage.md

  • Create CephFilesystem
  • Create SC backed by Filesystem & Pool
  • Ensure the CSI subvolumegroup was created. If not, ceph fs subvolumegroup create <fsname> csi
  • Create PVC without a specified PV: PV will be auto-created
  • Super important: Set created PV to ReclaimPolicy: Retain
  • Create a new, better-named PVC

Resizing a CephFS PVC

  • Grow resources->storage on PV
  • Grow resources->storage on PVC

Verify the new limit: getfattr -n ceph.quota.max_bytes /mnt/volumes/csi/csi-vol-<uuid>/<uuid>
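
The xattr is reported in plain bytes while the PVC uses Kubernetes quantities such as 10Gi, so comparing them takes a conversion. A small sketch, assuming only the common integer/decimal quantities with binary (Ki..Pi) or decimal (k..P) suffixes; both function names are hypothetical.

```python
import re
from decimal import Decimal

_SUFFIXES = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a Kubernetes storage quantity like '10Gi' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(Decimal(number) * _SUFFIXES[suffix])


def quota_matches(pvc_storage: str, max_bytes: str) -> bool:
    """True if the ceph.quota.max_bytes value equals the PVC's requested size."""
    return quantity_to_bytes(pvc_storage) == int(max_bytes)


print(quota_matches("10Gi", "10737418240"))
```

On a machine with the filesystem mounted, the second argument could come from os.getxattr(path, "ceph.quota.max_bytes").decode() instead of being pasted from getfattr.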

Crush rules for each pool

for i in $(ceph osd pool ls); do echo "$i: $(ceph osd pool get $i crush_rule)"; done

On EC-backed pools, device class information is in the erasure code profile, not the crush rule. https://docs.ceph.com/en/latest/dev/erasure-coded-pool/

for i in $(ceph osd erasure-code-profile ls); do echo "$i: $(ceph osd erasure-code-profile get $i)"; done

ObjectStore

If hostNetwork is enabled on the cluster, ensure rook-ceph-operator is not running with hostNetwork enabled. The operator doesn't need host network access to orchestrate the cluster, & running it with hostNetwork impedes orchestration of objectstores & associated resources.

public s3-interface bucket listing w/ HTML

This is great for setting up easy public downloads.

  • Create a user (see rook/buckets/user-josh.yaml)
  • kubectl -n rook-ceph get secret rook-ceph-object-user-ceph-objectstore-josh -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}'
  • Create bucket (rook/buckets/bucket.py::create_bucket)
  • Set policy (rook/buckets/bucket.py::set_public_read_policy)
  • Upload file

    from bucket import *
    conn = connect()
    conn.upload_file('path/to/s3-bucket-listing/index.html', 'public', 'index.html', ExtraArgs={'ContentType': 'text/html'})
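
For reference, a public-read policy for the "Set policy" step generally has the shape below. This is a sketch of a standard S3-style policy document, not the contents of rook/buckets/bucket.py (which is not shown here); the `public_read_policy` name and the `{"AWS": ["*"]}` principal form are assumptions.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Build a policy allowing anonymous listing and reads on one bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": {"AWS": ["*"]},
                # ListBucket applies to the bucket, GetObject to its objects.
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }
    return json.dumps(policy)


print(public_read_policy("public"))
```

If conn is a boto3 S3 client (the upload_file call above suggests so, but that is an assumption), the result could be applied with conn.put_bucket_policy(Bucket='public', Policy=public_read_policy('public')).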
    

nvidia driver (on debian)

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey |   sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list |   sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list

wget https://developer.download.nvidia.com/compute/cuda/11.6.2/local_installers/cuda-repo-debian11-11-6-local_11.6.2-510.47.03-1_amd64.deb
sudo dpkg -i cuda-repo-debian11-11-6-local_11.6.2-510.47.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-debian11-11-6-local/7fa2af80.pub
sudo apt-get update

install kernel headers

sudo apt install cuda nvidia-container-runtime nvidia-kernel-dkms

sudo apt install --reinstall nvidia-kernel-dkms

verify dkms is actually running

sudo vi /etc/modprobe.d/blacklist-nvidia-nouveau.conf

blacklist nouveau
options nouveau modeset=0

sudo update-initramfs -u

configure containerd to use nvidia by default

Copy https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl (substitute your k3s version)

Edit the file to add a [plugins.cri.containerd.runtimes.runc.options] section:

<... snip>
  conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
{{end}}
[plugins.cri.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins.cri.containerd.runtimes.runc.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"

{{ if .PrivateRegistryConfig }}
<... snip>

& then systemctl restart k3s

Label your GPU-capable nodes: kubectl label nodes <node name> gpu-node=true

& then install the nvidia device plugin:

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade -i nvdp nvdp/nvidia-device-plugin --version=0.12.2 --namespace nvidia-device-plugin --create-namespace --set-string nodeSelector.gpu-node=true

Ensure the pods in the nvidia-device-plugin namespace are Running.

Test GPU passthrough by applying examples/cuda-pod.yaml, then exec-ing into it & running nvidia-smi.

Sharing GPU

https://github.com/NVIDIA/k8s-device-plugin#shared-access-to-gpus-with-cuda-time-slicing

version: v1
sharing:
  timeSlicing:
    renameByDefault: false
    failRequestsGreaterThanOne: false
    resources:
    - name: nvidia.com/gpu
      replicas: 5

$ helm upgrade -i nvdp nvdp/nvidia-device-plugin ... --set-file config.map.config=nvidia-device-plugin-config.yaml

ceph client for cephfs volumes

New method

https://docs.ceph.com/en/latest/man/8/mount.ceph/

sudo mount -t ceph user@<cluster FSID>.<filesystem name>=/ /mnt/ceph -o secret=<secret key>,x-systemd.requires=ceph.target,x-systemd.mount-timeout=5min,_netdev,mon_addr=192.168.1.1

Older method (stopped working for me around Pacific)

sudo vi /etc/fstab

192.168.1.1,192.168.1.2:/    /ceph   ceph    name=admin,secret=<secret key>,x-systemd.mount-timeout=5min,_netdev,mds_namespace=data

disable mitigations

https://unix.stackexchange.com/questions/554908/disable-spectre-and-meltdown-mitigations

Monitoring

https://rpi4cluster.com/monitoring/monitor-intro/, + what's in the monitoring folder.

Tried https://github.com/prometheus-operator/kube-prometheus. The only way to persist dashboards is to add them to Jsonnet & apply the generated configmap. I'm not ready for that kind of IaC commitment in a homelab.

Exposing internal services

kubectl expose svc/some-service --name=some-service-external --port 1234 --target-port 1234 --type LoadBalancer

Service will then be available on port 1234 of any k8s node.

Backups

My backup target is a machine running:

  • k3s
  • minio
  • velero

Important services are backed up with velero to the remote minio instance. These backups are restored to the remote k3s instance to ensure functionality.

installing velero

KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero install \
 --provider aws \
 --plugins velero/velero-plugin-for-aws:v1.0.0 \
 --bucket velero  \
 --secret-file ./credentials-velero  \
 --use-volume-snapshots=true \
 --default-volumes-to-fs-backup \
 --use-node-agent \
 --backup-location-config region=default,s3ForcePathStyle="true",s3Url=http://172.16.69.234:9000  \
 --snapshot-location-config region="default"

Had to remove resources: from the daemonset.

Change s3 target after install

kubectl -n velero edit backupstoragelocation default

Using a local storage storageClass in the target

https://velero.io/docs/v1.3.0/restore-reference/

Velero does not support hostPath PVCs, but works just fine with the openebs-hostpath storageClass.

KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install openebs --namespace openebs openebs/openebs --create-namespace --set localprovisioner.basePath=/k3s-storage/openebs

This is a nice PVC option for simpler backup target setups.

libvirtd

TODO. This would be nice for one-off Windows game servers.

Still to do

  • bittorrent + VPN
  • gogs ssh ingress?
    • can't go through cloudflare without cloudflared on the client
    • cloudflared running in the gogs pod?
    • do gitea or gitlab have better options?