Josh Bicking add61fa72b remove old manifests		2 days ago

k3s + rook Homelab

Writeup still a WIP, please pardon the dust.

Below is mostly braindumps & rough commands for creating/tweaking these services. Formal writeup coming soon!

Applications

Public facing services

Service	Uptime (1mo)	ArgoCD
copyparty
gogs
plex
homeassistant
jellyfin
miniflux
ntfy

Infra

Application	ArgoCD
argocd
rook
cloudflared
media-automation
traefik
monitoring
upgrade-plan
duplicati

Why?

TODO

k3s

installing k3s

# First node
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.34.3+k3s1 INSTALL_K3S_EXEC="server --cluster-init" sh -
export NODE_TOKEN=$(cat /var/lib/rancher/k3s/server/node-token)

# Remaining nodes
curl -sfL https://get.k3s.io | K3S_TOKEN=$NODE_TOKEN INSTALL_K3S_VERSION=v1.34.3+k3s1 INSTALL_K3S_EXEC="server --server https://<server node ip>:6443 --kubelet-arg=allowed-unsafe-sysctls=net.ipv4.*,net.ipv6.conf.all.forwarding" sh -

# All nodes
# /etc/sysctl.d/01-kube.conf
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 4096

upgrading k3s

https://docs.k3s.io/upgrades/automated

Ensure you account for any node taints. Anecdotal, but I had one node fail to run upgrade pods due to a taint, & it appeared upgrades were postponed across the entire cluster.
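A quick way to spot taints before an upgrade is to list them per node. Below is a minimal sketch, assuming kubectl is on PATH and pointed at the cluster; the function names are made up for illustration and are not part of this repo.

```python
import json
import subprocess


def tainted_nodes(nodes_json: str) -> dict:
    """Map node name -> list of 'key=value:effect' taints.

    Expects the JSON printed by `kubectl get nodes -o json`.
    """
    result = {}
    for node in json.loads(nodes_json).get("items", []):
        taints = node.get("spec", {}).get("taints") or []
        if taints:
            result[node["metadata"]["name"]] = [
                f"{t['key']}={t.get('value', '')}:{t['effect']}" for t in taints
            ]
    return result


def fetch_tainted_nodes() -> dict:
    """Hypothetical wrapper: shell out to kubectl and parse the result."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return tainted_nodes(out)
```

Any node that shows up here probably needs a matching toleration in the upgrade Plan, or the taint removed for the duration of the upgrade.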

purging k3s image cache

$ sudo crictl rmi --prune

purging containerd snapshots

https://github.com/containerd/containerd/blob/main/docs/content-flow.md

containerd really doesn't want you batch-deleting snapshots.

https://github.com/k3s-io/k3s/issues/1905#issuecomment-820554037

Run the below command a few times until it stops returning results:

sudo k3s ctr -n k8s.io i rm $(sudo k3s ctr -n k8s.io i ls -q)
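The "run it a few times" step can be wrapped in a loop. This is only a sketch: it shells out to the same k3s ctr commands as above, the function names are hypothetical, and the round limit of 10 is an arbitrary safety cap.

```python
import subprocess

# Same command prefix as the one-liner above.
CTR = ["sudo", "k3s", "ctr", "-n", "k8s.io", "i"]


def list_images() -> list:
    """Return image refs from `k3s ctr -n k8s.io i ls -q`."""
    out = subprocess.run(
        CTR + ["ls", "-q"], capture_output=True, text=True, check=True
    ).stdout
    return out.split()


def remove_images(refs: list) -> None:
    """Remove the given refs; failures are tolerated since the next round retries."""
    subprocess.run(CTR + ["rm", *refs], check=False)


def purge_until_empty(lister=list_images, remover=remove_images, max_rounds=10) -> int:
    """Repeat list + remove until nothing is listed.

    Returns the number of removal rounds that ran (max_rounds if it gave up).
    """
    for rounds in range(max_rounds):
        refs = lister()
        if not refs:
            return rounds
        remover(refs)
    return max_rounds
```

Calling purge_until_empty() with no arguments runs against the local node. The lister and remover are parameters so the loop can be tried out without touching containerd.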

This other command has given me problems before, but may purge more images:

for sha in $(sudo k3s ctr snapshot usage | awk 'NR>1 {print $1}'); do sudo k3s ctr snapshot rm "$sha" && echo "$sha"; done

Beware of errors like this one afterwards (if it's not found... redownload it??):

error unpacking image: failed to extract layer sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68: failed to get reader from content store: content digest sha256:fbe1a72f5dcd08ba4ca3ce3468c742786c1f6578c1f6bb401be1c4620d6ff705: not found

ingress

Uses traefik, the k3s default.

externalTrafficPolicy: Local is used to preserve forwarded IPs.

A cluster-ingress=true label is given to the node my router is pointing to. Some services use a nodeAffinity to request it. (ex: for pods with hostNetwork: true, this ensures they run on the node with the right IP)
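For reference, a nodeAffinity that requests the labelled node could look roughly like the output of the sketch below. Assumptions: a hard "required" rule (the actual manifests in this repo may use a preferred rule or a plain nodeSelector instead), and a Deployment-shaped patch. The function name is hypothetical.

```python
import json


def ingress_node_affinity(label: str = "cluster-ingress", value: str = "true") -> dict:
    """Build a pod `affinity` stanza that requires a node carrying label=value."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": label, "operator": "In", "values": [value]}
                        ]
                    }
                ]
            }
        }
    }


# Print a merge patch for a Deployment's pod template.
patch = {"spec": {"template": {"spec": {"affinity": ingress_node_affinity()}}}}
print(json.dumps(patch, indent=2))
```

The node itself gets the label with kubectl label node <node name> cluster-ingress=true.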

argocd

bootstrap

https://argo-cd.readthedocs.io/en/stable/getting_started/

kubectl create namespace argocd
kubectl apply -n argocd --server-side --force-conflicts -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.3.1/manifests/install.yaml

& install the CLI: https://argo-cd.readthedocs.io/en/stable/cli_installation/

webhooks

The default admin account does not have the ability to generate API keys, so make a dedicated webhook user:

$ kubectl -n argocd edit configmap argocd-cm

...
data:
  accounts.webhook: apiKey
...

Generate a token for the user:

argocd account generate-token --account webhook

rook

installing rook

See rook/rook-ceph-operator-values.yaml and rook/rook-ceph-cluster-values.yaml.

upgrading rook

https://rook.io/docs/rook/latest-release/Upgrade/rook-upgrade/?h=upgrade

upgrading ceph

https://rook.io/docs/rook/latest-release/Upgrade/ceph-upgrade/?h=upgrade

Finding the physical device for an OSD

ceph osd metadata <id> | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'

$ ceph osd metadata osd.1 | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'
    "bluestore_bdev_dev_node": "/dev/sdd",
    "hostname": "node1",
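If grep gets unwieldy (say, for every OSD at once), the JSON can be parsed instead. A small sketch, assuming the output of ceph osd metadata is fed in as a string. As far as I know, running it without an id returns a list of all OSDs, which is why both shapes are handled. The function name is hypothetical.

```python
import json


def osd_devices(metadata_json: str) -> dict:
    """Map OSD id -> (hostname, device node) from `ceph osd metadata` JSON.

    Accepts either a single OSD object or a list of them.
    """
    data = json.loads(metadata_json)
    entries = data if isinstance(data, list) else [data]
    return {
        e.get("id"): (e.get("hostname"), e.get("bluestore_bdev_dev_node"))
        for e in entries
    }
```

Feeding it the full list gives one line per OSD to check against the drive bays.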

tolerations

My setup divides k8s nodes into ceph & non-ceph nodes (using the label storage-node=true).

Ensure labels & a toleration are set properly, so non-rook nodes can still run PV plugin DaemonSets. I accomplished this with a storage-node=false label on non-rook nodes, with a toleration checking for storage-node.

Otherwise, any pod scheduled on a non-ceph node won't be able to mount ceph-backed PVCs.

See rook-ceph-cluster-values.yaml->cephClusterSpec->placement for an example.

CephFS

EC backing pool

EC-backed filesystems require a regular replicated pool as the default data pool.

https://lists.ceph.io/hyperkitty/list/[email protected]/thread/QI42CLL3GJ6G7PZEMAD3CXBHA5BNWSYS/

https://tracker.ceph.com/issues/42450

Then use setfattr to point a directory on the filesystem at an EC-backed pool. Any new data written to that directory will go to the EC-backed pool.

setfattr -n ceph.dir.layout.pool -v cephfs-erasurecoded /mnt/cephfs/my-erasure-coded-dir

https://docs.ceph.com/en/quincy/cephfs/file-layouts/

Sharing 1 CephFS instance between multiple PVCs

https://github.com/rook/rook/blob/677d3fa47f21b07245e2e4ab6cc964eb44223c48/Documentation/Storage-Configuration/Shared-Filesystem-CephFS/filesystem-storage.md

Create the CephFilesystem.
Create a StorageClass backed by the Filesystem & pool.
Ensure the CSI subvolumegroup was created. If not: ceph fs subvolumegroup create <fsname> csi
Create a PVC without a specified PV; the PV will be auto-created.
Super important: set the created PV's persistentVolumeReclaimPolicy to Retain.
Save the PV YAML & remove any extra information (see rook/data/data-static-pv.yaml for an example of what's required). Give it a more descriptive name.
Delete the PVC & the PV.
Apply your new PV YAML.
Create a new PVC pointing at this new PV.
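The "save the PV & remove extra information" step can be scripted. This is a rough sketch only: the list of fields to drop is my guess at the usual server-populated ones, not a copy of what rook/data/data-static-pv.yaml keeps, so compare the result against that file. The function name is hypothetical.

```python
import copy

# Assumption: server-populated metadata that shouldn't be re-applied by hand.
SERVER_METADATA = (
    "uid", "resourceVersion", "creationTimestamp", "managedFields",
    "annotations", "finalizers", "selfLink", "generation",
)


def static_pv_from(pv: dict, new_name: str) -> dict:
    """Return a cleaned copy of a dynamically provisioned PV."""
    out = copy.deepcopy(pv)
    out.pop("status", None)              # status is owned by the cluster
    meta = out.setdefault("metadata", {})
    for key in SERVER_METADATA:
        meta.pop(key, None)
    meta["name"] = new_name              # the more descriptive name
    spec = out.setdefault("spec", {})
    spec.pop("claimRef", None)           # lets a new PVC bind to this PV
    spec["persistentVolumeReclaimPolicy"] = "Retain"
    return out
```

Typical use would be loading kubectl get pv <name> -o json with json.loads, passing it through static_pv_from, & piping the json.dumps output to kubectl apply -f - once the old PVC & PV are deleted.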

Resizing a CephFS PVC

Grow capacity->storage on the PV
Grow resources->requests->storage on the PVC

Verify the new limit: getfattr -n ceph.quota.max_bytes /mnt/volumes/csi/csi-vol-<uuid>/<uuid>

Deleting a CephFS instance

Removing a CephFS instance with a subvolume group requires deleting the group's contents (subvolumes + all snapshots) first.

Simply deleting the CephFilesystem resource may result in this error appearing in operator logs:

2026-02-08 17:27:15.558449 E | ceph-file-controller: failed to reconcile CephFilesystem "rook-ceph/data" will not be deleted until all dependents are removed: filesystem subvolume groups that contain subvolumes (could be from CephFilesystem PVCs or CephNFS exports): [csi]

Trying to remove the subvolumegroup may show that it still contains subvolumes or retained snapshots:

$ kubectl rook-ceph ceph fs subvolumegroup rm data csi
Info: running 'ceph' command with args: [fs subvolumegroup rm data csi]
Error ENOTEMPTY: subvolume group csi contains subvolume(s) or retained snapshots of deleted subvolume(s)
Error: . failed to run command. command terminated with exit code 39

$ kubectl rook-ceph ceph fs subvolume ls data csi
Info: running 'ceph' command with args: [fs subvolume ls data csi]
[
    {
        "name": "csi-vol-42675a4d-052f-11ed-8662-4a986e7745e3"
    }
]

$ kubectl rook-ceph ceph fs subvolume rm data csi-vol-42675a4d-052f-11ed-8662-4a986e7745e3 csi
Info: running 'ceph' command with args: [fs subvolume rm data csi-vol-42675a4d-052f-11ed-8662-4a986e7745e3 csi]

After this, CephFilesystem deletion should proceed normally.

Crush rules for each pool

for i in $(ceph osd pool ls); do echo "$i: $(ceph osd pool get "$i" crush_rule)"; done

On EC-backed pools, device class information is in the erasure code profile, not the crush rule.

https://docs.ceph.com/en/latest/dev/erasure-coded-pool/

for i in $(ceph osd erasure-code-profile ls); do echo "$i: $(ceph osd erasure-code-profile get "$i")"; done

ObjectStore

If hostNetwork is enabled on the cluster, ensure rook-ceph-operator is not running with hostNetwork enabled. It doesn't need host network access to orchestrate the cluster, & host networking impedes orchestration of objectstores & associated resources.

public s3-interface bucket listing w/ HTML

This is great for setting up easy public downloads.

Create a user (see rook/buckets/user-josh.yaml)
kubectl -n rook-ceph get secret rook-ceph-object-user-ceph-objectstore-josh -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}'
Create bucket (rook/buckets/bucket.py::create_bucket)
Set policy (rook/buckets/bucket.py::set_public_read_policy)
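For a rough idea of what a public-read policy looks like, here is a standalone sketch. It is not the contents of rook/buckets/bucket.py. The function name is hypothetical, and the anonymous "*" principal plus list & get actions are my assumption of what a public HTML listing needs.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Return an S3 bucket policy (as JSON) allowing anonymous list + get."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",
                # ListBucket applies to the bucket, GetObject to its objects.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }
    return json.dumps(policy)
```

Assuming conn is a boto3 S3 client, the policy could be applied with conn.put_bucket_policy(Bucket='public', Policy=public_read_policy('public')).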

Upload file

from bucket import *
conn = connect()
conn.upload_file('path/to/s3-bucket-listing/index.html', 'public', 'index.html', ExtraArgs={'ContentType': 'text/html'})

Imbalance of PGs across OSDs

https://github.com/TheJJ/ceph-balancer

See the README for how this balancing strategy compares to ceph's balancer module.

TLDR:

kubectl -n rook-ceph cp placementoptimizer.py $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}'):/tmp/

kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- bash -c 'python3 /tmp/placementoptimizer.py -v balance --max-pg-moves 50 | tee /tmp/balance-upmaps'

kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- bash /tmp/balance-upmaps

ceph client for cephfs volumes

Kernel driver

New method

https://docs.ceph.com/en/latest/man/8/mount.ceph/

sudo mount -t ceph user@<cluster FSID>.<filesystem name>=/ /mnt/ceph -o secret=<secret key>,x-systemd.requires=ceph.target,x-systemd.mount-timeout=5min,_netdev,mon_addr=192.168.1.1

Older method (stopped working for me around Pacific)

sudo vi /etc/fstab

192.168.1.1,192.168.1.2:/    /ceph   ceph    name=admin,secret=<secret key>,x-systemd.mount-timeout=5min,_netdev,mds_namespace=data

FUSE

# /etc/ceph/ceph.conf
[global]
        fsid = <my cluster uuid>
        mon_host = [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0] [v2:192.168.1.2:3300/0,v1:192.168.1.2:6789/0]

# /etc/ceph/ceph.client.admin.keyring
[client.admin]
        key = <my key>
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow *"
        caps osd = "allow *"

# /etc/fstab
none /ceph fuse.ceph ceph.id=admin,ceph.client_fs=data,x-systemd.requires=ceph.target,x-systemd.mount-timeout=5min,_netdev 0 0

nfs client

192.168.1.1:/seedbox /nfs/seedbox nfs rw,soft 0 0

disable mitigations

https://unix.stackexchange.com/questions/554908/disable-spectre-and-meltdown-mitigations

Monitoring

The monitoring folder is mostly the manifests from https://rpi4cluster.com/monitoring/monitor-intro/.

I tried https://github.com/prometheus-operator/kube-prometheus, & when I did, the only way to persist dashboards was to add them to Jsonnet & apply the generated ConfigMap. I don't need that kind of IaC commitment for monitoring personal-use dashboards.

Exposing internal services

kubectl expose

kubectl expose svc/some-service --name=some-service-external --port 1234 --target-port 1234 --type LoadBalancer

Service will then be available on port 1234 of any k8s node.

using a lan-only domain

An A record for lan.jibby.org & *.lan.jibby.org points to an internal IP.

To be safe, a middleware is included to filter out source IPs outside of the LAN & k3s CIDR. See traefik/middleware-lanonly.yaml.
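The check the middleware performs boils down to "is the source IP inside an allowed range". A sketch of that logic, for reasoning about which ranges to list. The ranges are assumptions: 192.168.1.0/24 is guessed from the example IPs in this README, and 10.42.0.0/16 & 10.43.0.0/16 are the k3s default pod & service CIDRs. The real list lives in traefik/middleware-lanonly.yaml.

```python
import ipaddress

# Assumed ranges, see the note above.
ALLOWED = [
    ipaddress.ip_network(n)
    for n in ("192.168.1.0/24", "10.42.0.0/16", "10.43.0.0/16")
]


def is_allowed(source_ip: str, allowed=ALLOWED) -> bool:
    """True if source_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed)
```

For example, is_allowed("192.168.1.50") passes while a public address like 203.0.113.9 is rejected.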

Then, internal services can be exposed with an Ingress, as a subdomain of lan.jibby.org. See examples/nginx's Ingress.

Backups

TODO: k3s, argocd, rook

These backups can be restored to the remote k3s instance to ensure functionality, or function as a secondary service instance.

velero

Using a local storage storageClass in the target

https://velero.io/docs/v1.3.0/restore-reference/

Velero does not support hostPath PVCs, but works just fine with the openebs-hostpath storageClass.

KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install openebs --namespace openebs openebs/openebs --create-namespace --set localprovisioner.basePath=/k3s-storage/openebs

This is a nice PVC option for simpler backup target setups.

proxmox networking issues

Using docker on a proxmox host can break LXC & VM networking.

https://forum.proxmox.com/threads/proxmox-vm-networking-is-bust-after-restart.166637/post-773461

TODO

[X] move to https://argo-workflows.readthedocs.io/en/latest/quick-start/
https://external-secrets.io/latest/introduction/getting-started/
upgrade rook https://rook.io/docs/rook/v1.14/Upgrade/rook-upgrade/
rook CSI snapshots https://rook.io/docs/rook/v1.19/Storage-Configuration/Ceph-CSI/ceph-csi-snapshot/
velero CSI snapshots https://velero.io/docs/v1.17/csi/
- Way to use cephfs shallow snapshots in velero: https://github.com/ceph/ceph-csi/blob/devel/docs/design/proposals/cephfs-snapshot-shallow-ro-vol.md Alternatively: ensure cephfs vols aren't included, & just let duplicati handle it.
redo backup target
- argocd + lan ui domain
- I think about my backup target way less often, IaC would be very helpful for it
- single host ceph
- removes openebs & minio requirement, plus self-healing
- external-secrets
- weekly restore + validation
redo paperless, with dedicated postgres cluster (applicationset)
- https://github.com/bitnami/charts/blob/main/bitnami/postgresql/README.md
- look at paperless-ai
Use https://github.com/dgzlopes/cdk8s-on-argocd to deduplicate main/backup manifests
write up: node affinity + eviction, how i limit non-rook pods running on rook nodes
- PreferNoSchedule taint on rook nodes
write up: seedbox setup & sharing the disk w/ NFS
finish this writeup + blog post
try https://kubevirt.io/
metallb failover, or cilium?
logs
- https://old.reddit.com/r/kubernetes/comments/y3ze83/lightweight_logging_tool_for_k3s_cluster_with/
backup over tailscale?
