update readme

Josh Bicking, 3 months ago
commit 26b2d6a9e5
1 changed file with 27 additions and 13 deletions

+ 27 - 13
README.md

@@ -34,17 +34,23 @@ TODO
 
 ## Finding the physical device for an OSD
 
- ceph osd metadata <id>
+`ceph osd metadata <id> | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'`
 
+```
+$ ceph osd metadata osd.1 | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'
+    "bluestore_bdev_dev_node": "/dev/sdd",
+    "hostname": "node1",
+```
 
+## tolerations
 
+My setup divides k8s nodes into ceph & non-ceph nodes (using the label `storage-node=true`).
 
-## tolerations
-If your setup divides k8s nodes into ceph & non-ceph nodes (using a label, like `storage-node=true`), ensure labels & a toleration are set properly (`storage-node=false`, with a toleration checking for `storage-node`) so non-ceph nodes still run PV plugin Daemonsets.
+Ensure labels & a toleration are set properly, so non-rook nodes can still run PV plugin Daemonsets. I accomplished this with a `storage-node=false` label on non-rook nodes, with a toleration checking for `storage-node`.
 
 Otherwise, any pod scheduled on a non-ceph node won't be able to mount ceph-backed PVCs.
 
-See rook-ceph-cluster-values.yaml->cephClusterSpec->placement for an example.
+See `rook-ceph-cluster-values.yaml->cephClusterSpec->placement` for an example.
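
For reference, a hedged sketch of the general shape of that placement block, assuming the `storage-node` label/toleration described above (the real values live in `rook-ceph-cluster-values.yaml`):

```
cephClusterSpec:
  placement:
    all:
      # Keep ceph daemons on the labelled storage nodes...
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: storage-node
              operator: In
              values:
              - "true"
      # ...and tolerate a storage-node taint, if one is set.
      tolerations:
      - key: storage-node
        operator: Exists
```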
 
 ## CephFS
 
@@ -96,7 +102,7 @@ If hostNetwork is enabled on the cluster, ensure rook-ceph-operator is not runni
 
 This is great for setting up easy public downloads.
 
-- Create a user (rook/buckets/user-josh.yaml)
+- Create a user (see `rook/buckets/user-josh.yaml`)
 - `kubectl -n rook-ceph get secret rook-ceph-object-user-ceph-objectstore-josh -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}`
 - Create bucket (`rook/buckets/bucket.py::create_bucket`)
 - Set policy (`rook/buckets/bucket.py::set_public_read_policy`)
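
If you'd rather skip `bucket.py`, a rough equivalent with the aws CLI might look like this (the endpoint, bucket name, and policy are illustrative; the credentials are the `AccessKey`/`SecretKey` values dumped from the secret above):

```
export AWS_ACCESS_KEY_ID=<AccessKey>
export AWS_SECRET_ACCESS_KEY=<SecretKey>
# Example RGW endpoint; use whatever service/ingress exposes the object store.
ENDPOINT=http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc

# Create the bucket.
aws --endpoint-url "$ENDPOINT" s3api create-bucket --bucket public-downloads

# Allow anonymous reads of objects in the bucket.
aws --endpoint-url "$ENDPOINT" s3api put-bucket-policy --bucket public-downloads --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::public-downloads/*"]
  }]
}'
```
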
@@ -142,9 +148,9 @@ sudo update-initramfs -u
 
 ## configure containerd to use nvidia by default
 
-Copy https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl (substitute your k3s version)
+Copy https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into `/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl` (substitute your k3s version)
 
-Edit the file:
+Edit the file to add a `[plugins.cri.containerd.runtimes.runc.options]` section:
 
 ```
 <... snip>
@@ -176,7 +182,7 @@ KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade -i nvdp nvdp/nvidia-device-plu
 
 Ensure the pods on the namespace are Running.
 
-Test GPU passthrough by applying examples/cuda-pod.yaml, then exec-ing into it & running `nvidia-smi`.
+Test GPU passthrough by applying `examples/cuda-pod.yaml`, then exec-ing into it & running `nvidia-smi`.
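
If you're building your own test pod instead, a minimal sketch of one (names and image tag are placeholders, not necessarily what `examples/cuda-pod.yaml` contains):

```
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.7.1-base-ubuntu22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1    # requests one GPU from the device plugin
```

Then `kubectl exec -it cuda-test -- nvidia-smi` should list the GPU.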
 
 ## Sharing GPU
 
@@ -215,7 +221,6 @@ sudo vi /etc/fstab
 192.168.1.1,192.168.1.2:/    /ceph   ceph    name=admin,secret=<secret key>,x-systemd.mount-timeout=5min,_netdev,mds_namespace=data
 ```
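
Once the entry is in place, a quick sanity check (assuming the `/ceph` mountpoint above):

```
sudo mkdir -p /ceph
sudo mount /ceph   # picks up the options from the new fstab entry
df -h /ceph
```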
 
-
 # disable mitigations
 https://unix.stackexchange.com/questions/554908/disable-spectre-and-meltdown-mitigations
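
The short version of the linked answer is adding `mitigations=off` to the kernel command line. On a Debian/Ubuntu-style node that looks roughly like:

```
# add mitigations=off to GRUB_CMDLINE_LINUX_DEFAULT
sudo vi /etc/default/grub
sudo update-grub
sudo reboot
```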
 
@@ -235,7 +240,14 @@ Service will then be available on port 1234 of any k8s node.
 
 # Backups
 
-## velero
+My backup target is a machine running:
+- k3s
+- minio
+- velero
+
+Important services are backed up with velero to the remote minio instance. These backups are restored to the remote k3s instance to verify they're actually restorable.
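
Once velero is installed (next section), the day-to-day flow is roughly this; the namespace and backup name are just examples:

```
# On the main cluster: back up a namespace to the minio storage location.
KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero backup create gogs-backup --include-namespaces gogs

# On the backup machine's k3s: restore it, to prove the backup is usable.
velero restore create --from-backup gogs-backup
```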
+
+## installing velero
 ```
 KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero install \
  --provider aws \
@@ -251,7 +263,7 @@ KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero install \
 
 Had to remove `resources:` from the daemonset.
 
-### Change s3 target:
+### Change s3 target after install
 ```
 kubectl -n velero edit backupstoragelocation default
 ```
@@ -271,12 +283,14 @@ This is a nice PVC option for simpler backup target setups.
 
 # libvirtd
 
-...
+TODO. This would be nice for one-off Windows game servers.
 
 # Still to do
 
-- deluge
+- bittorrent + VPN
 - gogs ssh ingress?
   - can't go through cloudflare without cloudflared on the client
   - cloudflared running in the gogs pod?
+  - do gitea or gitlab have better options?
 - Something better than `expose` for accessing internal services
+  - short term, capture the resource definition YAML & save it alongside the service