@@ -34,17 +34,23 @@ TODO
## Finding the physical device for an OSD
- ceph osd metadata <id>
+`ceph osd metadata <id> | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'`
+```
+$ ceph osd metadata osd.1 | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'
+ "bluestore_bdev_dev_node": "/dev/sdd",
+ "hostname": "node1",
+```
+## tolerations
+My setup divides k8s nodes into ceph & non-ceph nodes (using the label `storage-node=true`).
-## tolerations
-If your setup divides k8s nodes into ceph & non-ceph nodes (using a label, like `storage-node=true`), ensure labels & a toleration are set properly (`storage-node=false`, with a toleration checking for `storage-node`) so non-ceph nodes still run PV plugin Daemonsets.
+Ensure labels & a toleration are set properly, so non-rook nodes can still run PV plugin Daemonsets. I accomplished this with a `storage-node=false` label on non-rook nodes, with a toleration checking for `storage-node`.
Otherwise, any pod scheduled on a non-ceph node won't be able to mount ceph-backed PVCs.
-See rook-ceph-cluster-values.yaml->cephClusterSpec->placement for an example.
+See `rook-ceph-cluster-values.yaml->cephClusterSpec->placement` for an example.
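+
+A rough sketch of the labeling side (node names here are placeholders; the toleration itself is configured in that placement block):
+
+```
+kubectl label node storage-1 storage-node=true    # runs the ceph daemons
+kubectl label node worker-1 storage-node=false    # non-rook node, still needs the csi plugin pods
+```
+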
## CephFS
@@ -96,7 +102,7 @@ If hostNetwork is enabled on the cluster, ensure rook-ceph-operator is not runni
This is great for setting up easy public downloads.
-- Create a user (rook/buckets/user-josh.yaml)
+- Create a user (see `rook/buckets/user-josh.yaml`)
- `kubectl -n rook-ceph get secret rook-ceph-object-user-ceph-objectstore-josh -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}`
- Create bucket (`rook/buckets/bucket.py::create_bucket`)
- Set policy (`rook/buckets/bucket.py::set_public_read_policy`)
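+
+The bucket and policy steps can also be done with the AWS CLI instead of `bucket.py` (sketch only — the RGW endpoint, bucket name, and policy file below are placeholders, and the credentials come from the secret above):
+
+```
+export AWS_ACCESS_KEY_ID=<AccessKey> AWS_SECRET_ACCESS_KEY=<SecretKey>
+aws --endpoint-url http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc s3api create-bucket --bucket downloads
+aws --endpoint-url http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc s3api put-bucket-policy --bucket downloads --policy file://public-read-policy.json
+```
+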
@@ -142,9 +148,9 @@ sudo update-initramfs -u
## configure containerd to use nvidia by default
-Copy https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl (substitute your k3s version)
+Copy https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into `/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl` (substitute your k3s version)
-Edit the file:
+Edit the file to add a `[plugins.cri.containerd.runtimes.runc.options]` section:
```
<... snip>
@@ -176,7 +182,7 @@ KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade -i nvdp nvdp/nvidia-device-plu
Ensure the pods in the namespace are Running.
-Test GPU passthrough by applying examples/cuda-pod.yaml, then exec-ing into it & running `nvidia-smi`.
+Test GPU passthrough by applying `examples/cuda-pod.yaml`, then exec-ing into it & running `nvidia-smi`.
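+
+For example (assuming the pod in that manifest is named `cuda-pod`):
+
+```
+kubectl apply -f examples/cuda-pod.yaml
+kubectl exec -it cuda-pod -- nvidia-smi
+```
+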
## Sharing GPU
@@ -215,7 +221,6 @@ sudo vi /etc/fstab
192.168.1.1,192.168.1.2:/ /ceph ceph name=admin,secret=<secret key>,x-systemd.mount-timeout=5min,_netdev,mds_namespace=data
```
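+
+To pick the new entry up without a reboot (mount point per the fstab line above):
+
+```
+sudo mkdir -p /ceph
+sudo mount /ceph
+df -h /ceph
+```
+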
-
# disable mitigations
https://unix.stackexchange.com/questions/554908/disable-spectre-and-meltdown-mitigations
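+
+One common way to do this on a grub-based Debian/Ubuntu host (an assumption about the node OS — see the link for the security trade-offs):
+
+```
+# append mitigations=off to GRUB_CMDLINE_LINUX_DEFAULT
+sudoedit /etc/default/grub
+sudo update-grub
+sudo reboot
+```
+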
@@ -235,7 +240,14 @@ Service will then be available on port 1234 of any k8s node.
# Backups
-## velero
+My backup target is a machine running:
+- k3s
+- minio
+- velero
+
+Important services are backed up with velero to the remote minio instance. These backups are restored to the remote k3s instance to ensure functionality.
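+
+The loop looks roughly like this (the namespace and backup names are placeholders):
+
+```
+# on the main cluster: back up a namespace to the minio-backed location
+velero backup create gogs-backup --include-namespaces gogs
+# on the backup k3s machine: restore it to prove it's usable
+velero restore create --from-backup gogs-backup
+```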
+
+## installing velero
```
KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero install \
--provider aws \
@@ -251,7 +263,7 @@ KUBECONFIG=/etc/rancher/k3s/k3s.yaml velero install \
Had to remove `resources:` from the daemonset.
-### Change s3 target:
+### Change s3 target after install
```
kubectl -n velero edit backupstoragelocation default
```
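+
+To sanity-check the new target, list the location afterwards (recent velero versions report whether it is Available):
+
+```
+velero backup-location get
+```
+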
@@ -271,12 +283,14 @@ This is a nice PVC option for simpler backup target setups.
# libvirtd
-...
+TODO. This would be nice for one-off Windows game servers.
# Still to do
-- deluge
+- bittorrent + VPN
- gogs ssh ingress?
- can't go through cloudflare without cloudflared on the client
- cloudflared running in the gogs pod?
+ - do gitea or gitlab have better options?
- Something better than `expose` for accessing internal services
+ - short term, capture the resource definition YAML & save it alongside the service