Complete server deployment config
Latest commit: Josh Bicking, 4bcbe4d51d, "add inotify watchers script", 1 year ago

Name | Last updated
---|---
elasticsearch | 2 years ago
examples | 2 years ago
monitoring | 2 years ago
nextcloud | 2 years ago
postgres | 2 years ago
redis | 2 years ago
rook | 2 years ago
.env.example | 3 years ago
.gitignore | 1 year ago
README.md | 1 year ago
blog.yaml | 2 years ago
cloudflared.yaml | 2 years ago
gogs-pvc.yaml | 2 years ago
gogs.yaml | 2 years ago
inotify-consumers.sh | 1 year ago
jellyfin-pvc.yaml | 2 years ago
jellyfin.yaml | 2 years ago
mastodon.yaml | 2 years ago
matrix-pvc.yaml | 2 years ago
matrix.yaml | 2 years ago
miniflux.yaml | 2 years ago
nvidia-device-plugin-config.yaml | 2 years ago
plex-pvc.yaml | 2 years ago
plex.yaml | 1 year ago
prowlarr-pvc.yaml | 1 year ago
prowlarr.yaml | 1 year ago
radarr-pvc.yaml | 1 year ago
radarr.yaml | 1 year ago
seedbox_sync.py | 1 year ago
selfoss-pvc.yaml | 2 years ago
selfoss.yaml | 2 years ago
sonarr-pvc.yaml | 2 years ago
sonarr.yaml | 1 year ago
temp-pvc-pod.yaml | 2 years ago
traefik-dashboard.yaml | 2 years ago
traefik-helmchartconfig.yaml | 2 years ago
vaultwarden-pvc.yaml | 2 years ago
vaultwarden.yaml | 2 years ago
On the first server node:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --cluster-init" sh -
export NODE_TOKEN=$(cat /var/lib/rancher/k3s/server/node-token)
On each additional server node, with NODE_TOKEN set to the first node's token:
curl -sfL https://get.k3s.io | K3S_TOKEN=$NODE_TOKEN INSTALL_K3S_EXEC="server --server https://192.168.122.87:6443" INSTALL_K3S_VERSION=v1.23.6+k3s1 sh -
KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade --install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph --version 1.9.2 -f rook-ceph-values.yaml
KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster --version 1.9.2 -f rook-ceph-cluster-values.yaml
Create CephFilesystem
Create SC backed by Filesystem & Pool
Ensure the CSI subvolumegroup was created. If not, run: ceph fs subvolumegroup create <fsname> csi
Create PVC without a specified PV: PV will be auto-created
Set created PV to ReclaimPolicy: Retain
Create a new, better-named PVC
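The dynamic-PVC and Retain steps above can be sketched as the objects involved. This is a minimal sketch, not the repo's own manifests: the names `scratch-pvc` and `ceph-filesystem`, the size, and the `ReadWriteMany` access mode are hypothetical placeholders. The output is JSON, which `kubectl apply -f -` accepts as well as YAML.

```python
import json


def dynamic_pvc(name, storage_class, size):
    """PVC with no volumeName set, so the provisioner auto-creates a PV."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],  # assumption: shared CephFS volume
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def retain_patch():
    """Merge patch that switches a PV's reclaim policy to Retain."""
    return {"spec": {"persistentVolumeReclaimPolicy": "Retain"}}


def patch_command(pv_name):
    """kubectl argument list that applies the Retain patch to one PV."""
    return ["kubectl", "patch", "pv", pv_name, "-p", json.dumps(retain_patch())]


# Hypothetical names, for illustration only.
print(json.dumps(dynamic_pvc("scratch-pvc", "ceph-filesystem", "1Gi"), indent=2))
print(" ".join(patch_command("<pv name>")))
```

The name of the auto-created PV can be read with `kubectl get pvc` once the claim is Bound.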
If important data is on CephBlockPool-backed PVCs, don't forget to set the PV's persistentVolumeReclaimPolicy to Retain.
If your setup divides k8s nodes into ceph & non-ceph nodes (using a label, like storage-node=true), ensure labels & a toleration are set properly (storage-node=false, with a toleration checking for storage-node) so non-ceph nodes still run PV plugin DaemonSets.
EC-backed filesystems require a regular replicated pool as the default data pool:
https://lists.ceph.io/hyperkitty/list/[email protected]/thread/Y6T7OVTC4XAAWMFTK3MYGC7TB6G47OCH/
https://tracker.ceph.com/issues/42450
If hostNetwork is enabled on the cluster, ensure rook-ceph-operator is not running with hostNetwork enabled. The operator doesn't need host network access to orchestrate the cluster, & host networking impedes its orchestration of objectstores & associated resources.
A publicly readable object store bucket is great for setting up easy public downloads.
kubectl -n rook-ceph get secret rook-ceph-object-user-ceph-objectstore-josh -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}'
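An alternative to the go-template is to fetch the secret with `-o json` and decode it in Python. The sketch below assumes only the standard Kubernetes Secret layout (a `data` map of base64-encoded values); the `AccessKey` field and its value in the sample payload are made up for the demonstration.

```python
import base64
import json


def decode_secret(secret_json):
    """Return the secret's data fields with base64 decoded to text."""
    secret = json.loads(secret_json)
    data = secret.get("data") or {}  # a secret may carry no data at all
    return {
        key: base64.b64decode(value).decode("utf-8") if value else ""
        for key, value in data.items()
    }


# Hypothetical payload; real input would come from `kubectl ... get secret <name> -o json`.
sample = json.dumps(
    {"data": {"AccessKey": base64.b64encode(b"demo-access").decode("ascii")}}
)
print(decode_secret(sample))
```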
Create the bucket (rook/buckets/bucket.py::create_bucket)
Set the public read policy (rook/buckets/bucket.py::set_public_read_policy)
Upload file
from bucket import *
conn = connect()
conn.upload_file('path/to/s3-bucket-listing/index.html', 'public', 'index.html', ExtraArgs={'ContentType': 'text/html'})
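The contents of `rook/buckets/bucket.py` are not shown here, so the following is only a sketch of what a public-read policy for the `public` bucket could look like, not the repo's `set_public_read_policy`. It assumes the standard S3 bucket-policy JSON format; the `s3:ListBucket` statement is an assumption, included because a bucket-listing index page needs to list objects anonymously.

```python
import json


def public_read_policy(bucket):
    """Build an S3 bucket policy granting anonymous list and read access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicList",
                "Effect": "Allow",
                "Principal": {"AWS": ["*"]},
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": {"AWS": ["*"]},
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy_json = json.dumps(public_read_policy("public"))
print(policy_json)
```

With a boto3 S3 client, the resulting string would be passed as `client.put_bucket_policy(Bucket="public", Policy=policy_json)`.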
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
wget https://developer.download.nvidia.com/compute/cuda/11.6.2/local_installers/cuda-repo-debian11-11-6-local_11.6.2-510.47.03-1_amd64.deb
sudo dpkg -i cuda-repo-debian11-11-6-local_11.6.2-510.47.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-debian11-11-6-local/7fa2af80.pub
sudo apt-get update
sudo apt install cuda nvidia-container-runtime nvidia-kernel-dkms
sudo apt install --reinstall nvidia-kernel-dkms
sudo vi /etc/modprobe.d/blacklist-nvidia-nouveau.conf
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
Copy the containerd config template from https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl (substitute your k3s version)
Edit the file:
<... snip>
conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
{{end}}
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes.runc.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
{{ if .PrivateRegistryConfig }}
<... snip>
& then systemctl restart k3s
Label your GPU-capable nodes: kubectl label nodes <node name> gpu-node=true
& then install the nvidia device plugin:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade -i nvdp nvdp/nvidia-device-plugin --version=0.12.2 --namespace nvidia-device-plugin --create-namespace --set-string nodeSelector.gpu-node=true
Ensure the pods in the nvidia-device-plugin namespace are Running.
Test GPU passthrough by applying examples/cuda-pod.yaml, then exec-ing into it & running nvidia-smi.
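The contents of `examples/cuda-pod.yaml` are not reproduced here, so this is a hedged sketch of a comparable test pod rather than the repo's file. The pod name `cuda-test` is hypothetical, and the image is left as a parameter because the right CUDA image tag depends on your driver version. It requests one `nvidia.com/gpu` and pins itself to nodes labeled `gpu-node=true`.

```python
import json


def cuda_test_pod(image, name="cuda-test"):
    """Pod that idles on a GPU node so nvidia-smi can be run via kubectl exec."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "nodeSelector": {"gpu-node": "true"},  # label applied to GPU nodes
            "containers": [
                {
                    "name": "cuda",
                    "image": image,
                    "command": ["sleep", "infinity"],
                    # Resource advertised by the nvidia device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


# Placeholder image; substitute a CUDA image that matches your driver.
print(json.dumps(cuda_test_pod("<cuda image>"), indent=2))
```

The JSON output can be piped to `kubectl apply -f -`, after which `kubectl exec -it cuda-test -- nvidia-smi` should list the GPU.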
https://github.com/NVIDIA/k8s-device-plugin#shared-access-to-gpus-with-cuda-time-slicing
version: v1
sharing:
  timeSlicing:
    renameByDefault: false
    failRequestsGreaterThanOne: false
    resources:
    - name: nvidia.com/gpu
      replicas: 5
$ helm upgrade -i nvdp nvdp/nvidia-device-plugin ... --set-file config.map.config=nvidia-device-plugin-config.yaml
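As a quick check on what `replicas` means: with time slicing, each physical GPU is advertised as that many schedulable `nvidia.com/gpu` resources. The helper below is a small illustrative sketch of that arithmetic, with hypothetical function and parameter names.

```python
def advertised_gpus(physical_gpus, replicas):
    """Number of nvidia.com/gpu resources a node advertises with time slicing."""
    if physical_gpus < 0 or replicas < 1:
        raise ValueError("physical_gpus must be >= 0 and replicas must be >= 1")
    return physical_gpus * replicas


# One physical GPU with replicas: 5 is advertised as five schedulable GPUs.
print(advertised_gpus(1, 5))
```

Time slicing shares the GPU without memory or fault isolation between the pods, so the replicas are scheduling slots rather than partitions.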
sudo apt install ceph-fuse
sudo vi /etc/fstab
192.168.1.1,192.168.1.2:/ /ceph ceph name=admin,secret=<secret key>,x-systemd.mount-timeout=5min,_netdev,mds_namespace=data
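The fstab entry has a fixed shape: a comma-separated monitor list followed by `:/`, the mount point, the `ceph` filesystem type, and a comma-separated option list. A minimal sketch that assembles such a line, assuming the same options as the entry above (the function and parameter names are hypothetical):

```python
def cephfs_fstab_line(monitors, mount_point, name, secret, fs_name):
    """Assemble an /etc/fstab entry for a kernel CephFS mount."""
    if not monitors:
        raise ValueError("at least one monitor address is required")
    source = ",".join(monitors) + ":/"
    options = ",".join(
        [
            f"name={name}",
            f"secret={secret}",
            "x-systemd.mount-timeout=5min",
            "_netdev",  # wait for the network before mounting
            f"mds_namespace={fs_name}",
        ]
    )
    return f"{source} {mount_point} ceph {options}"


print(
    cephfs_fstab_line(
        ["192.168.1.1", "192.168.1.2"], "/ceph", "admin", "<secret key>", "data"
    )
)
```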
https://unix.stackexchange.com/questions/554908/disable-spectre-and-meltdown-mitigations
https://rpi4cluster.com/monitoring/k3s-grafana/
Tried https://github.com/prometheus-operator/kube-prometheus. The only way to persist dashboards is to add them to Jsonnet & apply the generated configmap.
kubectl expose svc/some-service --name=some-service-external --port 1234 --target-port 1234 --type LoadBalancer
Service will then be available on port 1234 of any k8s node.
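For a version-controlled equivalent of the `kubectl expose` command, the same Service can be written as a manifest. This is a sketch under assumptions: `kubectl expose` copies the selector from the existing service, so the selector `{"app": "some-service"}` used here is a hypothetical stand-in for whatever the original service selects.

```python
import json


def external_service(name, selector, port, target_port):
    """LoadBalancer Service mirroring `kubectl expose ... --type LoadBalancer`."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{name}-external"},
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


# Hypothetical selector; copy the real one from the existing service.
manifest = external_service("some-service", {"app": "some-service"}, 1234, 1234)
print(json.dumps(manifest, indent=2))
```

The JSON output can be applied with `kubectl apply -f -`.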
... expose for accessing internal services