Rafael Rodrigues Santana
11/30/2022, 10:44 PM
Unable to attach or mount volumes: unmounted volumes=[userdir-pvc], unattached volumes=[container-runtime-socket kube-api-access-jh5mn userdir-pvc]: timed out waiting for the condition
We sshed into the worker machine and the userdir volume is attached.
Most pods are failing with the above message.
Any thoughts on what may be the cause of this issue?

Jacopo
12/01/2022, 8:51 AM

Rafael Rodrigues Santana
12/01/2022, 11:54 AM

Jacopo
12/01/2022, 12:01 PM

Rafael Rodrigues Santana
12/01/2022, 12:01 PM

Jacopo
12/01/2022, 12:03 PM
ebs-csi-controller or ebs-csi-node pods?

Rafael Rodrigues Santana
12/01/2022, 12:12 PM
➜ ~ kubectl logs ebs-csi-controller-d54f948bf-frmlb -n kube-system
Defaulted container "ebs-plugin" out of: ebs-plugin, csi-provisioner, csi-attacher, csi-snapshotter, csi-resizer, liveness-probe
➜ ~ kubectl logs ebs-csi-controller-d54f948bf-xc5n2 -n kube-system
Defaulted container "ebs-plugin" out of: ebs-plugin, csi-provisioner, csi-attacher, csi-snapshotter, csi-resizer, liveness-probe
➜ ~ kubectl logs ebs-csi-node-llc5c -n kube-system
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I1201 12:09:40.089632 1 node.go:98] regionFromSession Node service
I1201 12:09:40.089731 1 metadata.go:85] retrieving instance data from ec2 metadata
I1201 12:09:40.091852 1 metadata.go:92] ec2 metadata is available
I1201 12:09:40.092874 1 metadata_ec2.go:25] regionFromSession
I1201 12:09:40.095856 1 mount_linux.go:207] Detected OS without systemd
Jacopo
12/01/2022, 12:21 PM
kubectl logs defaults to a container chosen by kubectl.

Rafael Rodrigues Santana
12/01/2022, 12:25 PM
➜ ~ kubectl logs ebs-csi-node-llc5c -n kube-system --all-containers=true
I1201 12:09:40.020121 1 main.go:166] Version: v2.5.1-1-g9ad99c33
I1201 12:09:40.020195 1 main.go:167] Running node-driver-registrar in mode=registration
I1201 12:09:40.089891 1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I1201 12:09:41.091326 1 main.go:198] Calling CSI driver to discover driver name
I1201 12:09:41.092671 1 main.go:208] CSI driver name: "ebs.csi.aws.com"
I1201 12:09:41.092734 1 node_register.go:53] Starting Registration Server at: /registration/ebs.csi.aws.com-reg.sock
I1201 12:09:41.093061 1 node_register.go:62] Registration Server started at: /registration/ebs.csi.aws.com-reg.sock
I1201 12:09:41.093294 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I1201 12:09:41.805505 1 main.go:102] Received GetInfo call: &InfoRequest{}
E1201 12:09:41.805788 1 main.go:107] "Failed to create registration probe file" err="mkdir /var/lib/kubelet: read-only file system" registrationProbePath="/var/lib/kubelet/plugins/ebs.csi.aws.com/registration"
I1201 12:09:41.805829 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/ebs.csi.aws.com/registration"
I1201 12:09:41.844721 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
I1201 12:09:40.190990 1 main.go:149] calling CSI driver to discover driver name
I1201 12:09:40.193136 1 main.go:155] CSI driver name: "ebs.csi.aws.com"
I1201 12:09:40.193150 1 main.go:183] ServeMux listening at "0.0.0.0:9808"
I1201 12:09:40.089632 1 node.go:98] regionFromSession Node service
I1201 12:09:40.089731 1 metadata.go:85] retrieving instance data from ec2 metadata
I1201 12:09:40.091852 1 metadata.go:92] ec2 metadata is available
I1201 12:09:40.092874 1 metadata_ec2.go:25] regionFromSession
I1201 12:09:40.095856 1 mount_linux.go:207] Detected OS without systemd
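The node plugin registers successfully in the logs above, so a useful next check is whether the attach/mount is actually stuck on the Kubernetes side; a minimal sketch, using a placeholder for one of the failing pods in the orchest namespace:
kubectl get volumeattachments
kubectl -n orchest describe pod <failing-pod-name> | grep -A 15 Events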
Alexsander Pereira
12/01/2022, 12:35 PM

Jacopo
12/01/2022, 12:37 PM
kubectl get volumeattachments

Alexsander Pereira
12/01/2022, 12:39 PM

Jacopo
12/01/2022, 12:42 PM

Rafael Rodrigues Santana
12/01/2022, 12:46 PM

Alexsander Pereira
12/01/2022, 2:28 PM

Jacopo
12/01/2022, 2:30 PM

Alexsander Pereira
12/01/2022, 2:33 PM

Jacopo
12/01/2022, 2:33 PM

Alexsander Pereira
12/01/2022, 2:36 PM
kind: Pod
apiVersion: v1
metadata:
name: environment-shell-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6-0cdvr98s
generateName: environment-shell-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6-0cde0a-69b66685bd-
namespace: orchest
uid: 93b8bc9a-c5b8-4e51-9d42-4f59828cf506
resourceVersion: '7796216'
creationTimestamp: '2022-12-01T14:25:48Z'
labels:
app: environment-shell
pod-template-hash: 69b66685bd
project_uuid: 8495ece7-ec3a-4581-aa73-999e38b27c63
session_uuid: 8495ece7-ec3a-4581fb5467fc-7db6-41c2
shell_uuid: 268bec2c-0a2f-4b5a-ab69-4df9d5e514c6-0cde0a
annotations:
kubernetes.io/psp: eks.privileged
ownerReferences:
- apiVersion: apps/v1
kind: ReplicaSet
name: environment-shell-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6-0cde0a-69b66685bd
uid: 167abb73-ce7e-4cc7-9a1f-757d518c0060
controller: true
blockOwnerDeletion: true
managedFields:
- manager: kube-controller-manager
operation: Update
apiVersion: v1
time: '2022-12-01T14:25:48Z'
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:generateName: {}
f:labels:
.: {}
f:app: {}
f:pod-template-hash: {}
f:project_uuid: {}
f:session_uuid: {}
f:shell_uuid: {}
f:ownerReferences:
.: {}
k:{"uid":"167abb73-ce7e-4cc7-9a1f-757d518c0060"}: {}
f:spec:
f:affinity:
.: {}
f:nodeAffinity:
.: {}
f:requiredDuringSchedulingIgnoredDuringExecution: {}
f:containers:
k:{"name":"environment-shell-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6-0cde0a"}:
.: {}
f:args: {}
f:command: {}
f:env:
.: {}
k:{"name":"ORCHEST_PIPELINE_PATH"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"ORCHEST_PIPELINE_UUID"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"ORCHEST_PROJECT_UUID"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"ORCHEST_SESSION_TYPE"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"ORCHEST_SESSION_UUID"}:
.: {}
f:name: {}
f:value: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:ports:
.: {}
k:{"containerPort":22,"protocol":"TCP"}:
.: {}
f:containerPort: {}
f:protocol: {}
f:resources:
.: {}
f:requests:
.: {}
f:cpu: {}
f:startupProbe:
.: {}
f:exec:
.: {}
f:command: {}
f:failureThreshold: {}
f:periodSeconds: {}
f:successThreshold: {}
f:timeoutSeconds: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:volumeMounts:
.: {}
k:{"mountPath":"/data"}:
.: {}
f:mountPath: {}
f:name: {}
f:subPath: {}
k:{"mountPath":"/pipeline.json"}:
.: {}
f:mountPath: {}
f:name: {}
f:subPath: {}
k:{"mountPath":"/project-dir"}:
.: {}
f:mountPath: {}
f:name: {}
f:subPath: {}
f:dnsConfig:
.: {}
f:options: {}
f:dnsPolicy: {}
f:enableServiceLinks: {}
f:initContainers:
.: {}
k:{"name":"image-puller"}:
.: {}
f:command: {}
f:env:
.: {}
k:{"name":"CONTAINER_RUNTIME"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"IMAGE_TO_PULL"}:
.: {}
f:name: {}
f:value: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:resources: {}
f:securityContext:
.: {}
f:privileged: {}
f:runAsUser: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:volumeMounts:
.: {}
k:{"mountPath":"/var/run/runtime.sock"}:
.: {}
f:mountPath: {}
f:name: {}
f:restartPolicy: {}
f:schedulerName: {}
f:securityContext:
.: {}
f:fsGroup: {}
f:runAsGroup: {}
f:runAsUser: {}
f:terminationGracePeriodSeconds: {}
f:volumes:
.: {}
k:{"name":"container-runtime-socket"}:
.: {}
f:hostPath:
.: {}
f:path: {}
f:type: {}
f:name: {}
k:{"name":"userdir-pvc"}:
.: {}
f:name: {}
f:persistentVolumeClaim:
.: {}
f:claimName: {}
- manager: kube-scheduler
operation: Update
apiVersion: v1
time: '2022-12-01T14:25:48Z'
fieldsType: FieldsV1
fieldsV1:
f:status:
f:conditions:
.: {}
k:{"type":"PodScheduled"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
subresource: status
spec:
volumes:
- name: userdir-pvc
persistentVolumeClaim:
claimName: userdir-pvc
- name: container-runtime-socket
hostPath:
path: /var/run/docker.sock
type: Socket
- name: kube-api-access-cghb7
projected:
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
name: kube-root-ca.crt
items:
- key: ca.crt
path: ca.crt
- downwardAPI:
items:
- path: namespace
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
defaultMode: 420
initContainers:
- name: image-puller
image: orchest/image-puller:v2022.10.5
command:
- /pull_image.sh
env:
- name: IMAGE_TO_PULL
value: >-
10.100.0.2/orchest-env-8495ece7-ec3a-4581-aa73-999e38b27c63-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6:8
- name: CONTAINER_RUNTIME
value: docker
resources: {}
volumeMounts:
- name: container-runtime-socket
mountPath: /var/run/runtime.sock
- name: kube-api-access-cghb7
readOnly: true
mountPath: /var/run/secrets/kubernetes.io/serviceaccount
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
runAsUser: 0
containers:
- name: environment-shell-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6-0cde0a
image: >-
10.100.0.2/orchest-env-8495ece7-ec3a-4581-aa73-999e38b27c63-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6:8
command:
- /orchest/bootscript.sh
args:
- shell
ports:
- containerPort: 22
protocol: TCP
env:
- name: ORCHEST_PROJECT_UUID
value: 8495ece7-ec3a-4581-aa73-999e38b27c63
- name: ORCHEST_PIPELINE_UUID
value: fb5467fc-7db6-41c2-bcc1-941631f90b3a
- name: ORCHEST_PIPELINE_PATH
value: /pipeline.json
- name: ORCHEST_SESSION_UUID
value: 8495ece7-ec3a-4581fb5467fc-7db6-41c2
- name: ORCHEST_SESSION_TYPE
value: interactive
resources:
requests:
cpu: 1m
volumeMounts:
- name: userdir-pvc
mountPath: /project-dir
subPath: projects/demo-advanced-elections-first-round
- name: userdir-pvc
mountPath: /data
subPath: data
- name: userdir-pvc
mountPath: /pipeline.json
subPath: projects/demo-advanced-elections-first-round/main.orchest
- name: kube-api-access-cghb7
readOnly: true
mountPath: /var/run/secrets/kubernetes.io/serviceaccount
startupProbe:
exec:
command:
- echo
- '1'
timeoutSeconds: 1
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
restartPolicy: Always
terminationGracePeriodSeconds: 5
dnsPolicy: ClusterFirst
serviceAccountName: default
serviceAccount: default
securityContext:
runAsUser: 0
runAsGroup: 1
fsGroup: 1
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- ip-13-0-1-185.ec2.internal
schedulerName: default-scheduler
tolerations:
- key: node.kubernetes.io/not-ready
operator: Exists
effect: NoExecute
tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
tolerationSeconds: 300
priority: 0
dnsConfig:
options:
- name: timeout
value: '10'
- name: attempts
value: '5'
enableServiceLinks: true
preemptionPolicy: PreemptLowerPriority
status:
phase: Pending
conditions:
- type: PodScheduled
status: 'False'
lastProbeTime: null
lastTransitionTime: '2022-12-01T14:25:48Z'
reason: Unschedulable
message: >-
0/1 nodes are available: 1 node(s) didn't match Pod's node
affinity/selector.
qosClass: Burstable
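The status block above reports the pod as Unschedulable because no node matches its required node affinity, which pins it to ip-13-0-1-185.ec2.internal. A quick way to verify whether that node is still part of the cluster, using the node name taken from the manifest:
kubectl get nodes
kubectl get node ip-13-0-1-185.ec2.internal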
Jacopo
12/01/2022, 2:38 PM
kubectl -n orchest describe deployment
Alexsander Pereira
12/01/2022, 2:39 PM
kind: Deployment
apiVersion: apps/v1
metadata:
name: environment-shell-f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c
namespace: orchest
uid: 703f3c79-b919-4e2d-a9c5-03bbe242189f
resourceVersion: '7787664'
generation: 1
creationTimestamp: '2022-12-01T13:30:47Z'
labels:
app: environment-shell
project_uuid: dfac15de-be36-4521-a7f7-4869923bccc3
session_uuid: dfac15de-be36-452174c195b0-910d-4aab
shell_uuid: f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c
annotations:
deployment.kubernetes.io/revision: '1'
managedFields:
- manager: OpenAPI-Generator
operation: Update
apiVersion: apps/v1
time: '2022-12-01T13:30:47Z'
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:labels:
.: {}
f:app: {}
f:project_uuid: {}
f:session_uuid: {}
f:shell_uuid: {}
f:spec:
f:progressDeadlineSeconds: {}
f:replicas: {}
f:revisionHistoryLimit: {}
f:selector: {}
f:strategy:
f:rollingUpdate:
.: {}
f:maxSurge: {}
f:maxUnavailable: {}
f:type: {}
f:template:
f:metadata:
f:labels:
.: {}
f:app: {}
f:project_uuid: {}
f:session_uuid: {}
f:shell_uuid: {}
f:name: {}
f:spec:
f:affinity:
.: {}
f:nodeAffinity:
.: {}
f:requiredDuringSchedulingIgnoredDuringExecution: {}
f:containers:
k:{"name":"environment-shell-f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c"}:
.: {}
f:args: {}
f:command: {}
f:env:
.: {}
k:{"name":"ORCHEST_PIPELINE_PATH"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"ORCHEST_PIPELINE_UUID"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"ORCHEST_PROJECT_UUID"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"ORCHEST_SESSION_TYPE"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"ORCHEST_SESSION_UUID"}:
.: {}
f:name: {}
f:value: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:ports:
.: {}
k:{"containerPort":22,"protocol":"TCP"}:
.: {}
f:containerPort: {}
f:protocol: {}
f:resources:
.: {}
f:requests:
.: {}
f:cpu: {}
f:startupProbe:
.: {}
f:exec:
.: {}
f:command: {}
f:failureThreshold: {}
f:periodSeconds: {}
f:successThreshold: {}
f:timeoutSeconds: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:volumeMounts:
.: {}
k:{"mountPath":"/data"}:
.: {}
f:mountPath: {}
f:name: {}
f:subPath: {}
k:{"mountPath":"/pipeline.json"}:
.: {}
f:mountPath: {}
f:name: {}
f:subPath: {}
k:{"mountPath":"/project-dir"}:
.: {}
f:mountPath: {}
f:name: {}
f:subPath: {}
f:dnsConfig:
.: {}
f:options: {}
f:dnsPolicy: {}
f:initContainers:
.: {}
k:{"name":"image-puller"}:
.: {}
f:command: {}
f:env:
.: {}
k:{"name":"CONTAINER_RUNTIME"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"IMAGE_TO_PULL"}:
.: {}
f:name: {}
f:value: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:resources: {}
f:securityContext:
.: {}
f:privileged: {}
f:runAsUser: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:volumeMounts:
.: {}
k:{"mountPath":"/var/run/runtime.sock"}:
.: {}
f:mountPath: {}
f:name: {}
f:restartPolicy: {}
f:schedulerName: {}
f:securityContext:
.: {}
f:fsGroup: {}
f:runAsGroup: {}
f:runAsUser: {}
f:terminationGracePeriodSeconds: {}
f:volumes:
.: {}
k:{"name":"container-runtime-socket"}:
.: {}
f:hostPath:
.: {}
f:path: {}
f:type: {}
f:name: {}
k:{"name":"userdir-pvc"}:
.: {}
f:name: {}
f:persistentVolumeClaim:
.: {}
f:claimName: {}
- manager: kube-controller-manager
operation: Update
apiVersion: apps/v1
time: '2022-12-01T13:30:47Z'
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:deployment.kubernetes.io/revision: {}
f:status:
f:conditions:
.: {}
k:{"type":"Available"}:
.: {}
f:lastTransitionTime: {}
f:lastUpdateTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"Progressing"}:
.: {}
f:lastTransitionTime: {}
f:lastUpdateTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
f:observedGeneration: {}
f:replicas: {}
f:unavailableReplicas: {}
f:updatedReplicas: {}
subresource: status
spec:
replicas: 1
selector:
matchLabels:
app: environment-shell
project_uuid: dfac15de-be36-4521-a7f7-4869923bccc3
session_uuid: dfac15de-be36-452174c195b0-910d-4aab
shell_uuid: f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c
template:
metadata:
name: environment-shell-f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c
creationTimestamp: null
labels:
app: environment-shell
project_uuid: dfac15de-be36-4521-a7f7-4869923bccc3
session_uuid: dfac15de-be36-452174c195b0-910d-4aab
shell_uuid: f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c
spec:
volumes:
- name: userdir-pvc
persistentVolumeClaim:
claimName: userdir-pvc
- name: container-runtime-socket
hostPath:
path: /var/run/docker.sock
type: Socket
initContainers:
- name: image-puller
image: orchest/image-puller:v2022.10.5
command:
- /pull_image.sh
env:
- name: IMAGE_TO_PULL
value: >-
10.100.0.2/orchest-env-dfac15de-be36-4521-a7f7-4869923bccc3-f1929b0d-c0b3-4153-a67e-474e3e9c8b61:1
- name: CONTAINER_RUNTIME
value: docker
resources: {}
volumeMounts:
- name: container-runtime-socket
mountPath: /var/run/runtime.sock
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
runAsUser: 0
containers:
- name: environment-shell-f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c
image: >-
10.100.0.2/orchest-env-dfac15de-be36-4521-a7f7-4869923bccc3-f1929b0d-c0b3-4153-a67e-474e3e9c8b61:1
command:
- /orchest/bootscript.sh
args:
- shell
ports:
- containerPort: 22
protocol: TCP
env:
- name: ORCHEST_PROJECT_UUID
value: dfac15de-be36-4521-a7f7-4869923bccc3
- name: ORCHEST_PIPELINE_UUID
value: 74c195b0-910d-4aab-a7fb-fdd5aa975804
- name: ORCHEST_PIPELINE_PATH
value: /pipeline.json
- name: ORCHEST_SESSION_UUID
value: dfac15de-be36-452174c195b0-910d-4aab
- name: ORCHEST_SESSION_TYPE
value: interactive
resources:
requests:
cpu: 1m
volumeMounts:
- name: userdir-pvc
mountPath: /project-dir
subPath: projects/dev-demo-advanced-retail
- name: userdir-pvc
mountPath: /data
subPath: data
- name: userdir-pvc
mountPath: /pipeline.json
subPath: projects/dev-demo-advanced-retail/demo_amazon_retail_new.orchest
startupProbe:
exec:
command:
- echo
- '1'
timeoutSeconds: 1
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
restartPolicy: Always
terminationGracePeriodSeconds: 5
dnsPolicy: ClusterFirst
securityContext:
runAsUser: 0
runAsGroup: 1
fsGroup: 1
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- ip-13-0-1-185.ec2.internal
schedulerName: default-scheduler
dnsConfig:
options:
- name: timeout
value: '10'
- name: attempts
value: '5'
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 25%
maxSurge: 25%
revisionHistoryLimit: 10
progressDeadlineSeconds: 600
status:
observedGeneration: 1
replicas: 1
updatedReplicas: 1
unavailableReplicas: 1
conditions:
- type: Available
status: 'False'
lastUpdateTime: '2022-12-01T13:30:47Z'
lastTransitionTime: '2022-12-01T13:30:47Z'
reason: MinimumReplicasUnavailable
message: Deployment does not have minimum availability.
- type: Progressing
status: 'False'
lastUpdateTime: '2022-12-01T13:40:48Z'
lastTransitionTime: '2022-12-01T13:40:48Z'
reason: ProgressDeadlineExceeded
message: >-
ReplicaSet
"environment-shell-f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c-898bd56bf"
has timed out progressing.
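To see which pod the ReplicaSet above created and why it is not becoming available, an illustrative check using the label selectors from this manifest (the pod name in the second command is a placeholder):
kubectl -n orchest get pods -l app=environment-shell,shell_uuid=f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8adc2c
kubectl -n orchest describe pod <pod-name-from-previous-command>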
Jacopo
12/01/2022, 2:44 PM
kubectl get nodes?

Alexsander Pereira
12/01/2022, 2:47 PM

Jacopo
12/01/2022, 2:49 PM

Alexsander Pereira
12/01/2022, 2:49 PM

Jacopo
12/01/2022, 2:57 PM

Alexsander Pereira
12/01/2022, 3:05 PM

Jacopo
12/01/2022, 3:05 PM

Alexsander Pereira
12/01/2022, 3:11 PM

Jacopo
12/01/2022, 3:12 PM
kubectl exec -it -n orchest deploy/orchest-database -- psql -U postgres -d orchest_api
then run: select * from cluster_nodes;
Here we are interested in the nodes that are actually in the cluster and the nodes that are not there anymore. We'll need to manipulate Orchest state so that Orchest doesn't schedule anything on those nodes anymore.

Alexsander Pereira
12/01/2022, 3:14 PM

Jacopo
12/01/2022, 3:15 PM
With kubectl get nodes you should find out which nodes we want to remove.

Alexsander Pereira
12/01/2022, 3:15 PM

Jacopo
12/01/2022, 3:16 PM
select * from environment_images where not stored_in_registry;
It's a safety check.

Alexsander Pereira
12/01/2022, 3:16 PM

Jacopo
12/01/2022, 3:16 PM
cluster_nodes, environment_image_on_nodes, jupyter_image_on_nodes
Alexsander Pereira
12/01/2022, 3:20 PM

Jacopo
12/01/2022, 3:21 PM
select node_name, count(*) from environment_image_on_nodes group by node_name;
select node_name, count(*) from jupyter_image_on_nodes group by node_name;
Here we want to know whether the newer nodes have >= images compared to the ones that have been cycled out.
kubectl exec -it -n orchest deploy/orchest-database -- <your command>
To create backups of the database and/or specific tables you can run pg_dump with the appropriate options:
kubectl exec -it -n orchest deploy/orchest-database -- pg_dump <rest of flags>
Alexsander Pereira
12/01/2022, 3:23 PM
kubectl exec -it -n orchest deploy/orchest-database -- pg_dump -U postgres -d orchest_api -t cluster_nodes > cluster_nodes.sql
kubectl exec -it -n orchest deploy/orchest-database -- pg_dump -U postgres -d orchest_api -t environment_image_on_nodes > environment_image_on_nodes.sql
kubectl exec -it -n orchest deploy/orchest-database -- pg_dump -U postgres -d orchest_api -t jupyter_image_on_nodes > jupyter_image_on_nodes.sql
Jacopo
12/01/2022, 3:31 PM
You can use kubectl cp to get it out to your machine.
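An illustrative kubectl cp invocation, assuming a dump had been written to /tmp/cluster_nodes.sql inside the database pod (the pod name is a placeholder):
kubectl cp orchest/<orchest-database-pod>:/tmp/cluster_nodes.sql ./cluster_nodes.sql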
Please run
SELECT node_name,
count(*)
FROM environment_image_on_nodes ein
JOIN environment_images ei ON ein.project_uuid=ei.project_uuid
AND ein.environment_uuid=ei.environment_uuid
AND ein.environment_image_tag = ei.tag
WHERE NOT ei.marked_for_removal
GROUP BY node_name;
Alexsander Pereira
12/01/2022, 3:32 PM

Jacopo
12/01/2022, 3:32 PM

Alexsander Pereira
12/01/2022, 3:32 PM

Jacopo
12/01/2022, 3:35 PM
delete from cluster_nodes where name = '<your_node_that_does_not_exist_anymore>';
kubectl get nodes
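A sketch of that cross-check, with the stale node name as a placeholder: list the node names Kubernetes still knows about, compare them with the cluster_nodes table, and delete only the rows whose node no longer exists.
kubectl get nodes -o name
kubectl exec -it -n orchest deploy/orchest-database -- psql -U postgres -d orchest_api -c "select name from cluster_nodes;"
kubectl exec -it -n orchest deploy/orchest-database -- psql -U postgres -d orchest_api -c "delete from cluster_nodes where name = '<stale-node-name>';"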
Alexsander Pereira
12/01/2022, 3:38 PM

Jacopo
12/01/2022, 3:39 PM

Alexsander Pereira
12/01/2022, 3:41 PM

Jacopo
12/01/2022, 3:41 PM

Alexsander Pereira
12/01/2022, 3:42 PM

Jacopo
12/01/2022, 3:44 PM

Alexsander Pereira
12/01/2022, 3:45 PM

Jacopo
12/01/2022, 3:46 PM

Alexsander Pereira
12/01/2022, 3:47 PM

Jacopo
12/01/2022, 3:49 PM

Alexsander Pereira
12/01/2022, 3:49 PM

Jacopo
12/01/2022, 3:50 PM
kubectl logs --follow -n orchest deploy/orchest-api

Alexsander Pereira
12/01/2022, 3:52 PM

Jacopo
12/01/2022, 3:59 PM
Exception on /api/sessions/ [GET], along with other stuff that makes it look like there is something wrong 🤔. It might be leftovers from the sessions trying to start.

Alexsander Pereira
12/01/2022, 4:00 PM

Jacopo
12/01/2022, 4:01 PM

Alexsander Pereira
12/01/2022, 4:02 PM

Jacopo
12/01/2022, 4:03 PM

Alexsander Pereira
12/01/2022, 4:03 PM

Jacopo
12/01/2022, 4:08 PM
kubectl get volumeattachments?

Alexsander Pereira
12/01/2022, 4:08 PM

Jacopo
12/01/2022, 4:09 PM

Alexsander Pereira
12/01/2022, 4:10 PM
With df -h I see the EBS volumes.

Jacopo
12/01/2022, 4:16 PM
kubectl get events?

Alexsander Pereira
12/01/2022, 4:17 PM

Jacopo
12/01/2022, 4:20 PM

Alexsander Pereira
12/01/2022, 4:21 PM

Jacopo
12/01/2022, 4:23 PM

Alexsander Pereira
12/01/2022, 4:24 PM
Name: userdir-pvc
Namespace: orchest
StorageClass: gp2
Status: Bound
Volume: pvc-7efa04de-55f2-4b7d-9a2a-8588152e1f4f
Labels: controller.orchest.io/component=userdir-pvc
controller.orchest.io/owner=cluster-1
controller.orchest.io/part-of=orchest
orchest.io/orchest-hash=3
Annotations: controller.orchest.io/deploy-ingress: false
controller.orchest.io/k8s: eks
pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
volume.kubernetes.io/selected-node: ip-13-0-1-113.ec2.internal
volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 150Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: celery-worker-7db6d7c6d8-68qbk
data-app-f6690730-6896-4e7fb6c1ef3f-da79-4486-6bdb74b77d-v4gx8
environment-shell-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6-0cdvr98s
environment-shell-268bec2c-0a2f-4b5a-ab69-4df9d5e514c6-eaaj4sb5
environment-shell-3099aa6b-0e6b-4c51-954c-5f6fefbf8fe3-201kbx9w
environment-shell-3099aa6b-0e6b-4c51-954c-5f6fefbf8fe3-3e5p2b9r
environment-shell-9c96381f-d12c-46bd-9214-7021b33984eb-169s8kr4
environment-shell-b6b9af33-1531-44c1-bc66-3b9bf7999d29-c49ghzqf
environment-shell-c56ab762-539c-4cce-9b1e-c4b00300ec6f-2da7kw5v
environment-shell-de413fa4-dbc7-4d6a-9008-ca5faf512fe7-1ea2bkmn
environment-shell-f1929b0d-c0b3-4153-a67e-474e3e9c8b61-8addlfkb
fast-api-f6690730-6896-4e7fb6c1ef3f-da79-4486-5866f95f46-hvb56
jupyter-eg-3113449b-3d31-442b97ef5146-99d7-4e82-8467cd467dwkmnh
jupyter-eg-4634f845-a98d-482311e7b51d-d877-48e5-7fb7c875df6grv7
jupyter-eg-8495ece7-ec3a-45819c759df9-278e-48ab-7b7d478d4d29rmq
jupyter-eg-8495ece7-ec3a-4581fb5467fc-7db6-41c2-57b797c58b5lwrf
jupyter-eg-85414ec5-acc7-491c17ff1121-e5ac-4b5c-6bc8778bc7nsb4q
jupyter-eg-8647bf67-0b9c-4d5a07a9d9e0-cdfd-4944-7457954c8897l6q
jupyter-eg-9ac69ab1-c85e-41ff0915b350-b929-4cbd-74bff56dfcvvrds
jupyter-eg-dfac15de-be36-452174c195b0-910d-4aab-6d465db65c8p5vl
jupyter-eg-f6690730-6896-4e7f8b1d2370-a3e1-44a3-f7d5697c8-d5vwk
jupyter-eg-f6690730-6896-4e7fb6c1ef3f-da79-4486-5f784d6bb4rmsct
jupyter-server-3113449b-3d31-442b97ef5146-99d7-4e82-bf7df9bzn9d
jupyter-server-4634f845-a98d-482311e7b51d-d877-48e5-9bdfd59pgv6
jupyter-server-8495ece7-ec3a-45819c759df9-278e-48ab-548cd7knql9
jupyter-server-8495ece7-ec3a-4581fb5467fc-7db6-41c2-5df457qcn5q
jupyter-server-85414ec5-acc7-491c17ff1121-e5ac-4b5c-7b7d846qj47
jupyter-server-8647bf67-0b9c-4d5a07a9d9e0-cdfd-4944-7f487f5m9f2
jupyter-server-9ac69ab1-c85e-41ff0915b350-b929-4cbd-5dccf67n7w7
jupyter-server-dfac15de-be36-452174c195b0-910d-4aab-54dd67mkwbx
jupyter-server-f6690730-6896-4e7f8b1d2370-a3e1-44a3-84bd4865n6b
jupyter-server-f6690730-6896-4e7fb6c1ef3f-da79-4486-647d78s42mh
orchest-api-74dbf8f9cf-jcr7v
orchest-database-68676bc697-xws94
orchest-webserver-7db6586774-qcq8q
rabbitmq-server-57b9fd578c-5q2dg
session-sidecar-f6690730-6896-4e7fb6c1ef3f-da79-4486-569f55dzjr
Events: <none>
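Worth noting when reading this output: the claim is EBS-backed (gp2) with access mode RWO, so the disk can only be attached to one node at a time, which means every pod in the Used By list has to be scheduled onto the same node. The actual attachment can be confirmed with, for example:
kubectl get volumeattachments
kubectl describe volumeattachment <attachment-name>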
Jacopo
12/01/2022, 4:27 PM

Alexsander Pereira
12/01/2022, 4:28 PM

Jacopo
12/01/2022, 4:29 PM
cluster_nodes, so the issue with environment shells shouldn't be there anymore.

Alexsander Pereira
12/01/2022, 4:36 PM

Jacopo
12/01/2022, 4:38 PM

Alexsander Pereira
12/01/2022, 4:39 PM

Rafael Rodrigues Santana
12/01/2022, 4:39 PM

Alexsander Pereira
12/01/2022, 4:40 PM

Jacopo
12/01/2022, 4:50 PM
In the orchest.io/v1alpha1/orchestclusters CRD (e.g. kubectl -n orchest get orchestclusters, etc.) you can find what behavior can be parameterized.
"One question, is there any way to use a userdir volume that is not an isolated EBS? Couldn't I use the node's own file system? I don't know why Orchest creates this additional disk and uses mount with EBS CSI"
Currently, the storage class that is used for the volumes is the default storage class of the cluster, in this case EBS. About NFS and more specific/flexible support for storage, we need to chat internally about a couple of points before we can come back with a reply @Yannick @Rick Lamers
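An illustrative way to confirm which storage class the userdir volume was provisioned with, using the resource names from this thread:
kubectl -n orchest get pvc userdir-pvc -o jsonpath='{.spec.storageClassName}'
kubectl get storageclass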
Alexsander Pereira
12/01/2022, 5:08 PM

Yannick
Alexsander Pereira
12/01/2022, 6:02 PM

Rick Lamers
Rafael Rodrigues Santana
12/02/2022, 2:36 PM

Jacopo
12/02/2022, 2:37 PM
"We started to eliminate files inside that disk and we discovered that one of our users had created something like a million files from a dataset that he had downloaded and unpacked."
That's super interesting to know, will keep an eye out for this. Thanks for the update, glad things are solved 🙂
Alexsander Pereira
12/02/2022, 2:40 PM

Jacopo
12/02/2022, 2:41 PM

Rick Lamers
Alexsander Pereira
12/02/2022, 3:08 PM

Rick Lamers
Alexsander Pereira
12/02/2022, 3:22 PM

Jacopo
12/02/2022, 3:23 PM

Alexsander Pereira
12/02/2022, 3:39 PM

Rick Lamers
"About EBS with provisioned IOPS, wouldn't it just be a matter of setting the StorageClass in the cluster manifest?"
I don't see why in a single-node setup this wouldn't work, indeed.
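For reference, a minimal sketch of such a StorageClass for the AWS EBS CSI driver with provisioned IOPS/throughput; the name and numbers are placeholders, and it would need to be the cluster default for Orchest to pick it up:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3-provisioned
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "4000"
  throughput: "250"
volumeBindingMode: WaitForFirstConsumer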
"I see, let's map this limitation then. About RDS, is there any space for us to contribute?"
We'd welcome a PR that makes the database configurable. Something like "use a postgres container managed by the orchest-controller (default, what it is now) or accept RDS credentials and use that".
Alexsander Pereira
12/02/2022, 6:17 PM

Rick Lamers
"Can we have support if we have any questions?"
Glad to answer specific questions about how things work in the current implementation to help you navigate implementing the RDS support 👍