update readme

Josh Bicking, 3 months ago
commit 26b2d6a9e5
1 changed file with 27 additions and 13 deletions

+ 27 - 13
README.md

@@ -34,17 +34,23 @@ TODO
 
 ## Finding the physical device for an OSD
 
- ceph osd metadata <id>
+`ceph osd metadata <id> | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'`
 
+```
+$ ceph osd metadata osd.1 | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'
+    "bluestore_bdev_dev_node": "/dev/sdd",
+    "hostname": "node1",
+```
 
+## tolerations
 
+My setup divides k8s nodes into ceph & non-ceph nodes (using the label `storage-node=true`).
 
-## tolerations
-If your setup divides k8s nodes into ceph & non-ceph nodes (using a label, like `storage-node=true`), ensure labels & a toleration are set properly (`storage-node=false`, with a toleration checking for `storage-node`) so non-ceph nodes still run PV plugin Daemonsets.
+Ensure labels & a toleration are set properly, so non-rook nodes can still run PV plugin Daemonsets. I accomplished this with a `storage-node=false` label on non-rook nodes, with a toleration checking for `storage-node`.
 
 Otherwise, any pod scheduled on a non-ceph node won't be able to mount ceph-backed PVCs.
 
-See rook-ceph-cluster-values.yaml->cephClusterSpec->placement for an example.
+See `rook-ceph-cluster-values.yaml->cephClusterSpec->placement` for an example.
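
For reference, a hedged sketch of the general shape of that placement block, assuming the `storage-node` label/toleration described above (the real values live in `rook-ceph-cluster-values.yaml`):

```
cephClusterSpec:
  placement:
    all:
      # Keep ceph daemons on the labelled storage nodes...
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: storage-node
              operator: In
              values:
              - "true"
      # ...and tolerate a storage-node taint, if one is set.
      tolerations:
      - key: storage-node
        operator: Exists
```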
 
 ## CephFS
 
@@ -96,7 +102,7 @@ If hostNetwork is enabled on the cluster, ensure rook-ceph-operator is not runni
 
 This is great for setting up easy public downloads.
 
-- Create a user (rook/buckets/user-josh.yaml)
+- Create a user (see `rook/buckets/user-josh.yaml`)
 - `kubectl -n rook-ceph get secret rook-ceph-object-user-ceph-objectstore-josh -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}`
 - Create bucket (`rook/buckets/bucket.py::create_bucket`)
 - Set policy (`rook/buckets/bucket.py::set_public_read_policy`)
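
If you'd rather skip `bucket.py`, a rough equivalent with the aws CLI might look like this (the endpoint, bucket name, and policy are illustrative; the credentials are the `AccessKey`/`SecretKey` values dumped from the secret above):

```
export AWS_ACCESS_KEY_ID=<AccessKey>
export AWS_SECRET_ACCESS_KEY=<SecretKey>
# Example RGW endpoint; use whatever service/ingress exposes the object store.
ENDPOINT=http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc

# Create the bucket.
aws --endpoint-url "$ENDPOINT" s3api create-bucket --bucket public-downloads

# Allow anonymous reads of objects in the bucket.
aws --endpoint-url "$ENDPOINT" s3api put-bucket-policy --bucket public-downloads --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::public-downloads/*"]
  }]
}'
```
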
@@ -142,9 +148,9 @@ sudo update-initramfs -u
 
 ## configure containerd to use nvidia by default
 
-Copy https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl (substitute your k3s version)
+Copy https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into `/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl` (substitute your k3s version)
 
-Edit the file:
+Edit the file to add a `[plugins.cri.containerd.runtimes.runc.options]` section:
 
 ```
 <... snip>
@@ -176,7 +182,7 @@ KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade -i nvdp nvdp/nvidia-device-plu
 
 Ensure the pods on the namespace are Running.
 
-Test GPU passthrough by applying examples/cuda-pod.yaml, then exec-ing into it & running `nvidia-smi`.
+Test GPU passthrough by applying `examples/cuda-pod.yaml`, then exec-ing into it & running `nvidia-smi`.
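
If you're building your own test pod instead, a minimal sketch of one (names and image tag are placeholders, not necessarily what `examples/cuda-pod.yaml` contains):

```
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.7.1-base-ubuntu22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1    # requests one GPU from the device plugin
```

Then `kubectl exec -it cuda-test -- nvidia-smi` should list the GPU.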
 
 ## Sharing GPU
 
@@ -215,7 +221,6 @@ sudo vi /etc/fstab
 192.168.1.1,192.168.1.2:/    /ceph   ceph    name=admin,secret=<secret key>,x-systemd.mount-timeout=5min,_netdev,mds_namespace=data
 ```
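
Once the entry is in place, a quick sanity check (assuming the `/ceph` mountpoint above):

```
sudo mkdir -p /ceph
sudo mount /ceph   # picks up the options from the new fstab entry
df -h /ceph
```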
 
-
 # disable mitigations
 https://unix.stackexchange.com/questions/554908/disable-spectre-and-meltdown-mitigations
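
The short version of the linked answer is adding `mitigations=off` to the kernel command line. On a Debian/Ubuntu-style node that looks roughly like:

```
# add mitigations=off to GRUB_CMDLINE_LINUX_DEFAULT
sudo vi /etc/default/grub
sudo update-grub
sudo reboot
```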
 
@@ -235,7 +240,14 @@ Service will then be available on port 1234 of any k8s node.
 
 # Backups
 
-## velero
+My backup target is a machine running:
+- k3s
+- minio
+- velero
+
+Important services are backed up with velero to the remote minio instance. These backups are restored to the remote k3s instance to verify they're actually restorable.
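
Once velero is installed (next section), the day-to-day flow is roughly this; the namespace and backup name are just examples:

```
# On the main cluster: back up a namespace to the minio storage location.
KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero backup create gogs-backup --include-namespaces gogs

# On the backup machine's k3s: restore it, to prove the backup is usable.
velero restore create --from-backup gogs-backup
```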
+
+## installing velero
 ```
 KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero install \
  --provider aws \
@@ -251,7 +263,7 @@ KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero install \
 
 Had to remove `resources:` from the daemonset.
 
-### Change s3 target:
+### Change s3 target after install
 ```
 kubectl -n velero edit backupstoragelocation default
 ```
@@ -271,12 +283,14 @@ This is a nice PVC option for simpler backup target setups.
 
 # libvirtd
 
-...
+TODO. This would be nice for one-off Windows game servers.
 
 # Still to do
 
-- deluge
+- bittorrent + VPN
 - gogs ssh ingress?
   - can't go through cloudflare without cloudflared on the client
   - cloudflared running in the gogs pod?
+  - do gitea or gitlab have better options?
 - Something better than `expose` for accessing internal services
+  - short term, capture the resource definition YAML & save it alongside the service