
cleanup backup, update README

Josh Bicking, 2 weeks ago
Commit 5276782663
4 files changed, 44 additions and 220 deletions
  1. README.md (+43 -103)
  2. backup/minio-pvc.yaml (+0 -21)
  3. backup/minio.yaml (+0 -88)
  4. backup/rook/rook-ceph-cluster-values.yaml (+1 -8)

README.md (+43 -103)

@@ -33,9 +33,7 @@ duplicati | ![App status](https://argocd.jibby.org/api/badge?name=duplicati&show
 
 # Why?
 
-## argocd
-
-## k3s
+TODO
 
 # k3s
 
@@ -68,14 +66,6 @@ Ensure you account for any node taints. Anecdotal, but I had one node fail to ru
 $ sudo crictl rmi --prune
 ```
 
-## limiting log size
-
-(Shouldn't be a problem on newer Debian, where rsyslog is not in use.)
-
-In /etc/systemd/journald.conf, set "SystemMaxUse=100M"
-
-In /etc/logrotate.conf, set "size 100M"
-
 ## purging containerd snapshots
 
 https://github.com/containerd/containerd/blob/main/docs/content-flow.md
@@ -105,6 +95,40 @@ externalTrafficPolicy: Local is used to preserve forwarded IPs.
 
 A `cluster-ingress=true` label is given to the node my router is pointing to. Some services use a nodeAffinity to request it. (ex: for pods with `hostNetwork: true`, this ensures they run on the node with the right IP)
 
+# argocd
+
+## bootstrap
+
+https://argo-cd.readthedocs.io/en/stable/getting_started/
+
+```
+kubectl create namespace argocd
+kubectl apply -n argocd --server-side --force-conflicts -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.3.1/manifests/install.yaml
+```
+
+& install the CLI:
+https://argo-cd.readthedocs.io/en/stable/cli_installation/
+
+## webhooks
+
+The default admin account cannot generate API keys, so create a dedicated webhook user:
+
+```
+$ kubectl -n argocd edit configmap argocd-cm
+
+...
+data:
+  accounts.webhook: apiKey
+...
+```
+
+Generate a token for the user:
+
+```
+argocd account generate-token --account webhook
+```
+
+
 # rook
 
 ## installing rook
@@ -250,95 +274,6 @@ $ python3 /tmp/placementoptimizer.py -v balance --max-pg-moves 10 | tee /tmp/bal
 $ bash /tmp/balance-upmaps
 ```
 
-# NVIDIA
-
-## nvidia driver (on debian)
-
-```
-curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey |   sudo apt-key add -
-distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
-curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list |   sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
-
-wget https://developer.download.nvidia.com/compute/cuda/11.6.2/local_installers/cuda-repo-debian11-11-6-local_11.6.2-510.47.03-1_amd64.deb
-sudo dpkg -i cuda-repo-debian11-11-6-local_11.6.2-510.47.03-1_amd64.deb
-sudo apt-key add /var/cuda-repo-debian11-11-6-local/7fa2af80.pub
-sudo apt-get update
-```
-
-### install kernel headers
-
-```
-sudo apt install cuda nvidia-container-runtime nvidia-kernel-dkms
-
-sudo apt install --reinstall nvidia-kernel-dkms
-```
-
-### verify dkms is actually running
-
-```
-sudo vi /etc/modprobe.d/blacklist-nvidia-nouveau.conf
-
-blacklist nouveau
-options nouveau modeset=0
-
-sudo update-initramfs -u
-```
-
-## configure containerd to use nvidia by default
-
-Copy https://github.com/k3s-io/k3s/blob/v1.24.2%2Bk3s2/pkg/agent/templates/templates_linux.go into `/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl` (substitute your k3s version)
-
-Edit the file to add a `[plugins.cri.containerd.runtimes.runc.options]` section:
-
-```
-<... snip>
-  conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
-{{end}}
-[plugins.cri.containerd.runtimes.runc]
-  runtime_type = "io.containerd.runc.v2"
-
-[plugins.cri.containerd.runtimes.runc.options]
-  BinaryName = "/usr/bin/nvidia-container-runtime"
-
-{{ if .PrivateRegistryConfig }}
-<... snip>
-```
-
-& then `systemctl restart k3s`
-
-Label your GPU-capable nodes: `kubectl label nodes <node name> gpu-node=true`
-
-& then install the nvidia device plugin:
-
-```
-helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
-helm repo update
-KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade -i nvdp nvdp/nvidia-device-plugin --version=0.12.2 --namespace nvidia-device-plugin --create-namespace --set-string nodeSelector.gpu-node=true
-```
-
-Ensure the pods on the namespace are Running.
-
-Test GPU passthrough by applying `examples/cuda-pod.yaml`, then exec-ing into it & running `nvidia-smi`.
-
-### Share NVIDIA GPU
-
-https://github.com/NVIDIA/k8s-device-plugin#shared-access-to-gpus-with-cuda-time-slicing
-
-```yaml
-version: v1
-sharing:
-  timeSlicing:
-    renameByDefault: false
-    failRequestsGreaterThanOne: false
-    resources:
-    - name: nvidia.com/gpu
-      replicas: 5
-```
-
-```
-$ helm upgrade -i nvdp nvdp/nvidia-device-plugin ... --set-file config.map.config=nvidia-device-plugin-config.yaml
-```
-
 # ceph client for cephfs volumes
 
 ## Kernel driver
@@ -454,10 +389,15 @@ This is a nice PVC option for simpler backup target setups.
 # TODO
 
 - [X] move to https://argo-workflows.readthedocs.io/en/latest/quick-start/
-- [ ] https://external-secrets.io/latest/introduction/getting-started/
+- [x] https://external-secrets.io/latest/introduction/getting-started/
+- redo backup target
+  - [ ] argocd + lan ui domain
+    - I think about my backup target way less often, IaC would be very helpful for it
+  - [ ] single host ceph
+    - removes openebs & minio requirement, plus self-healing
+  - [ ] external-secrets
 - [ ] redo paperless, with dedicated postgres cluster (applicationset)
-- [ ] argocd for backup target
-  - I think about my backup target way less often, IaC would be very helpful for it
+- [ ] upgrade rook
 - [ ] Try https://github.com/dgzlopes/cdk8s-on-argocd
 - [ ] explore metallb failover, or cilium
   - https://metallb.universe.tf/concepts/layer2/
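
The webhook token generated in the argocd section above can drive the Argo CD API. A minimal sketch in shell: the server URL reuses argocd.jibby.org from the badge URLs, and the duplicati app name is taken from the status table, but `ARGOCD_TOKEN` and the RBAC policy granting that account sync rights are assumptions.

```shell
# Sync an application using the dedicated webhook account's token.
# ARGOCD_TOKEN comes from: argocd account generate-token --account webhook
ARGOCD_SERVER="https://argocd.jibby.org"
APP="duplicati"
SYNC_URL="$ARGOCD_SERVER/api/v1/applications/$APP/sync"
echo "$SYNC_URL"
# To actually trigger the sync (needs a live server and a valid token):
# curl -sf -X POST -H "Authorization: Bearer $ARGOCD_TOKEN" "$SYNC_URL"
```

The curl line is left commented because it requires a reachable Argo CD instance; the rest only builds the request URL.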

backup/minio-pvc.yaml (+0 -21)

@@ -1,21 +0,0 @@
----
-apiVersion: v1
-kind: Namespace
-metadata:
-    name: minio
----
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: minio-pvc
-  namespace: minio
-  labels:
-    app: minio
-spec:
-  storageClassName: local-path
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 1800Gi
-

backup/minio.yaml (+0 -88)

@@ -1,88 +0,0 @@
----
-apiVersion: v1
-kind: Namespace
-metadata:
-    name: minio
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: minio
-  namespace: minio
-spec:
-  strategy:
-    type: Recreate
-  selector:
-    matchLabels:
-      app: minio
-  replicas: 1
-  template:
-    metadata:
-      labels:
-        app: minio
-    spec:
-      containers:
-      - name: minio
-        image: "quay.io/minio/minio:RELEASE.2024-01-16T16-07-38Z"
-        command: ["minio", "server", "/data", "--console-address", ":9090"]
-        ports:
-        - containerPort: 9000
-          name: http-web-svc
-        - containerPort: 9090
-          name: http-con-svc
-        envFrom:
-        - secretRef:
-            name: minio-secret
-        env:
-        volumeMounts:
-        - mountPath: "/data"
-          name: data
-        livenessProbe:
-          httpGet:
-            path: /minio/health/live
-            port: 9000
-          failureThreshold: 10
-          initialDelaySeconds: 30
-          periodSeconds: 10
-        resources:
-          limits:
-            memory: 7Gi
-      volumes:
-      - name: data
-        persistentVolumeClaim:
-          claimName: minio-pvc
----
-apiVersion: v1
-kind: Service
-metadata:
-  name: minio-service
-  namespace: minio
-spec:
-  selector:
-    app: minio
-  type: ClusterIP
-  ports:
-  - name: minio-web-port
-    protocol: TCP
-    port: 9000
-    targetPort: http-web-svc
-  - name: minio-con-port
-    protocol: TCP
-    port: 9090
-    targetPort: http-con-svc
----
-apiVersion: traefik.containo.us/v1alpha1
-kind: IngressRoute
-metadata:
-  name: minio
-  namespace: minio
-spec:
-  entryPoints:
-  - websecure
-  routes:
-  - kind: Rule
-    match: Host(`s3.bnuuy.org`)
-    services:
-    - kind: Service
-      name: minio-service
-      port: 9000

backup/rook/rook-ceph-cluster-values.yaml (+1 -8)

@@ -133,16 +133,9 @@ cephObjectStores:
         priorityClassName: system-cluster-critical
     storageClass:
       enabled: false
-      #name: ceph-bucket
-      #reclaimPolicy: Delete
-      #volumeBindingMode: "Immediate"
-      #annotations: {}
-      #labels: {}
-      #parameters:
-      #  # note: objectStoreNamespace and objectStoreName are configured by the chart
-      #  region: us-east-1
     ingress:
       enabled: false
     route:
       enabled: false
+
 cephFileSystems: []
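
The comments deleted from this values file were the chart's example settings for bucket provisioning. If the planned single-host-ceph backup target ever wants a bucket StorageClass back, re-enabling it would look roughly like this; this is a sketch reconstructed from the removed comments, not configuration that is currently applied:

```yaml
cephObjectStores:
  - name: ceph-objectstore
    storageClass:
      enabled: true
      name: ceph-bucket
      reclaimPolicy: Delete
      volumeBindingMode: "Immediate"
      parameters:
        # objectStoreNamespace and objectStoreName are configured by the chart
        region: us-east-1
```

With `storageClass.enabled: false`, as committed here, no ObjectBucketClaims can be provisioned against this object store.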