Robert Sun

08/24/2022, 10:12 PM
I've been trying to set up Orchest on microk8s (multi-node, with NFS-backed storage). One part is really flaky and doesn't seem to work: the environment building step.
Starting worker...
Starting image build.............
Copying context.....There was a problem building the image. The building script had a non 0 exit code, build failed.
The logs from the image-build-task container are:
init time="2022-08-24T22:11:21.922Z" level=info msg="Starting Workflow Executor" executorType
init time="2022-08-24T22:11:21.924Z" level=info msg="Creating a emissary executor"
init time="2022-08-24T22:11:21.924Z" level=info msg="Executor initialized" deadline="0001-01-
init time="2022-08-24T22:11:21.952Z" level=info msg="Start loading input artifacts..."
init time="2022-08-24T22:11:21.952Z" level=info msg="Alloc=4910 TotalAlloc=9295 Sys=74065 Num
main STEP 1/6: FROM docker.io/orchest/base-kernel-py:v2022.08.8
main STEP 2/6: LABEL _orchest_project_uuid=f5f62dca-b5ad-40e3-a522-a29220d966f2
main --> Using cache 438dc990cbb2581f4d42e4c852b0630acc0ee6c016865ea762e37eb332af237b
main --> 438dc990cbb
main STEP 3/6: LABEL _orchest_environment_uuid=1bfd8c08-5b24-4a66-9a84-a50dd5e16652
main --> Using cache 823aad6fe66e4bcfba5da3946e8acae6a8eb36df47922c7eb49b46cc7461e816
main --> 823aad6fe66
main STEP 4/6: WORKDIR /project-dir
main --> Using cache 1593801dc92911eadd946ad8cf53027fe64a7d893896a251bd70307b6f49bd0a
main --> 1593801dc92
main STEP 5/6: COPY . .
main error committing container for step {Env:[PATH=/opt/conda/bin:/usr/local/sbin:/usr/local
main Error: exit status 125
Stream closed EOF for orchest/image-build-task-4a159dc2-a4b3-4050-816e-976dfbe7858e (init)
Stream closed EOF for orchest/image-build-task-4a159dc2-a4b3-4050-816e-976dfbe7858e (main)
wait time="2022-08-24T22:11:22.437Z" level=info msg="Starting Workflow Executor" executorType
wait time="2022-08-24T22:11:22.440Z" level=info msg="Creating a emissary executor"
wait time="2022-08-24T22:11:22.440Z" level=info msg="Executor initialized" deadline="0001-01-
wait time="2022-08-24T22:11:22.440Z" level=info msg="Starting deadline monitor"
wait time="2022-08-24T22:11:26.440Z" level=info msg="Main container completed"
wait time="2022-08-24T22:11:26.441Z" level=info msg="No Script output reference in workflow.
wait time="2022-08-24T22:11:26.441Z" level=info msg="No output parameters"
wait time="2022-08-24T22:11:26.441Z" level=info msg="No output artifacts"
wait time="2022-08-24T22:11:26.441Z" level=info msg="Killing sidecars []"
wait time="2022-08-24T22:11:26.441Z" level=info msg="Alloc=4742 TotalAlloc=9333 Sys=74065 Num
Stream closed EOF for orchest/image-build-task-4a159dc2-a4b3-4050-816e-976dfbe7858e (wait)
Has anyone run into this? Any help or clue as to where the configuration is wrong?

juanlu

08/24/2022, 10:15 PM
hello @Robert Sun, sorry you're having a rough experience. my colleagues will have a look at this tomorrow. we are in the process of changing how environments are built, so there is a chance that in a few days or weeks the problem goes away by itself. but we'd still like to understand what's going on. I see you're using Orchest v2022.08.8. could you also share your setup script just in case?

Robert Sun

08/24/2022, 10:25 PM
What do you mean by setup script? My controller and orchest cluster charts?
Thanks for the quick response btw!

Yannick

08/25/2022, 6:00 AM
Hi @Robert Sun 👋. With “setup script” @juanlu means the script that is part of Environments, where you can specify the dependencies to install. @Navid H I remember we used to have NFS support, what is the current state of this and what would be the best way for Robert to set this up with microk8s?
I suddenly remembered we have an open issue for a similar problem: https://github.com/orchest/orchest/issues/1171. It lists some temporary workarounds. For example, run
curl -X POST http://localorchest.io/catch/api-proxy/api/ctl/cleanup-builder-cache
once and then try the build again.
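For anyone scripting this workaround, the same call can be sketched in Python with only the standard library. This is a minimal sketch, not an official Orchest client: the host `http://localorchest.io` and the endpoint path are taken from the curl command above, while the function names and the `timeout` parameter are hypothetical choices made for illustration.

```python
import urllib.request

# Assumption: Orchest is reachable at the host used in the curl command above.
ORCHEST_URL = "http://localorchest.io"
# Endpoint path copied from the curl command in the thread.
CLEANUP_PATH = "/catch/api-proxy/api/ctl/cleanup-builder-cache"


def build_cleanup_request(base_url: str = ORCHEST_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that clears the builder cache."""
    url = base_url.rstrip("/") + CLEANUP_PATH
    # No request body, mirroring `curl -X POST <url>`.
    return urllib.request.Request(url, method="POST")


def cleanup_builder_cache(base_url: str = ORCHEST_URL, timeout: float = 30.0) -> int:
    """Send the cleanup request and return the HTTP status code.

    Raises urllib.error.HTTPError / URLError if the call fails.
    """
    request = build_cleanup_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `cleanup_builder_cache()` once and then retrying the environment build mirrors the manual step; pass a different `base_url` if your Orchest instance is exposed under another hostname.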

Navid H

08/25/2022, 11:11 AM
We used to deploy NFS as part of Orchest, but we no longer do, because we relied on Rook NFS and Rook has deprecated its NFS support. @Robert Sun, you can follow these instructions to deploy an NFS server inside your cluster and make its storage class the default one. After that you can use Orchest in a multi-node setup.
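As a rough illustration of the "make that storage class the default one" step: Kubernetes marks a default storage class with the `storageclass.kubernetes.io/is-default-class` annotation, which can be set with `kubectl patch`. The sketch below only builds that command. The storage class name `nfs-csi` and the helper names are hypothetical, and it assumes the cluster is driven through `microk8s kubectl`.

```python
import json
from typing import List, Sequence

# Standard Kubernetes annotation that marks a StorageClass as the cluster default.
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_class_patch(is_default: bool = True) -> str:
    """Return the JSON patch body that sets or clears the default-class annotation."""
    value = "true" if is_default else "false"
    return json.dumps({"metadata": {"annotations": {DEFAULT_ANNOTATION: value}}})


def patch_command(
    storage_class: str,
    is_default: bool = True,
    kubectl: Sequence[str] = ("microk8s", "kubectl"),  # assumption: microk8s-bundled kubectl
) -> List[str]:
    """Build the argument list for `kubectl patch storageclass <name> -p <patch>`."""
    return [
        *kubectl,
        "patch",
        "storageclass",
        storage_class,
        "-p",
        default_class_patch(is_default),
    ]
```

To apply it, something like `subprocess.run(patch_command("nfs-csi"), check=True)` would run the patch, with `nfs-csi` replaced by the name of your own storage class. If another class is currently the default, it would need `is_default=False` applied to it first so that only one default remains.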

Robert Sun

08/25/2022, 8:44 PM
My setup script is completely empty. I am already using nfs.csi.k8s.io for my default storage class. It seems to work, and volumes are created successfully. Other services like MySQL seem to work fine too. If you don't recommend NFS as a storage class for Orchest, which multi-node storage class do you recommend? I can try that too to see if it fixes the problem.

Yannick

08/26/2022, 7:08 AM
I think your problem isn't directly with NFS but a flaky issue we are seeing on other storage classes as well (as per this Slack comment). If you follow that thread then you should be able to side-step the issue you are seeing.

Robert Sun

08/26/2022, 9:17 PM
I tried this. The build fails in every case: on the first build after a fresh install of Orchest, after subsequent installs with caching, and after cleaning up the builder cache. It's strangely repeatable. I've reinstalled Orchest about 10 times in various ways and I still get the same issue.

Yannick

08/29/2022, 9:54 AM
Hmmm, I am sorry to hear that (and a bit saddened that it doesn't work :/). We are currently reworking how Environment building is done in Orchest, and the new approach should fix this issue. We hope to release it as soon as possible!

Robert Sun

08/29/2022, 7:36 PM
Awesome! I'll try it after I see the release!

Yannick

08/30/2022, 6:47 PM
@Robert Sun Yesterday we released a temporary fix for Environment building (at the cost of performance) to make sure builds are working again. Of course you can also wait for the improved version 😇
👀 1
@Robert Sun I forgot to mention it in this thread (😓), but as of release v2022.09.2 the Environments issue should be solved, and builds are much, much faster!

Robert Sun

09/28/2022, 9:54 PM
Awesome! Thanks for updating me.
😸 1