#announcements
Robert Sun

08/24/2022, 10:12 PM
I've been trying to set up Orchest on multi-node microk8s with NFS-backed storage. One part is really flaky and doesn't seem to work: the environment building step.
Starting worker...
Starting image build.............
Copying context.....
There was a problem building the image. The building script had a non 0 exit code, build failed.
The logs from the image-build-task container are:
init time="2022-08-24T22:11:21.922Z" level=info msg="Starting Workflow Executor" executorType
init time="2022-08-24T22:11:21.924Z" level=info msg="Creating a emissary executor"
init time="2022-08-24T22:11:21.924Z" level=info msg="Executor initialized" deadline="0001-01-
init time="2022-08-24T22:11:21.952Z" level=info msg="Start loading input artifacts..."
init time="2022-08-24T22:11:21.952Z" level=info msg="Alloc=4910 TotalAlloc=9295 Sys=74065 Num
main STEP 1/6: FROM docker.io/orchest/base-kernel-py:v2022.08.8
main STEP 2/6: LABEL _orchest_project_uuid=f5f62dca-b5ad-40e3-a522-a29220d966f2
main --> Using cache 438dc990cbb2581f4d42e4c852b0630acc0ee6c016865ea762e37eb332af237b
main --> 438dc990cbb
main STEP 3/6: LABEL _orchest_environment_uuid=1bfd8c08-5b24-4a66-9a84-a50dd5e16652
main --> Using cache 823aad6fe66e4bcfba5da3946e8acae6a8eb36df47922c7eb49b46cc7461e816
main --> 823aad6fe66
main STEP 4/6: WORKDIR /project-dir
main --> Using cache 1593801dc92911eadd946ad8cf53027fe64a7d893896a251bd70307b6f49bd0a
main --> 1593801dc92
main STEP 5/6: COPY . .
main error committing container for step {Env:[PATH=/opt/conda/bin:/usr/local/sbin:/usr/local
main Error: exit status 125
Stream closed EOF for orchest/image-build-task-4a159dc2-a4b3-4050-816e-976dfbe7858e (init)
Stream closed EOF for orchest/image-build-task-4a159dc2-a4b3-4050-816e-976dfbe7858e (main)
wait time="2022-08-24T22:11:22.437Z" level=info msg="Starting Workflow Executor" executorType
wait time="2022-08-24T22:11:22.440Z" level=info msg="Creating a emissary executor"
wait time="2022-08-24T22:11:22.440Z" level=info msg="Executor initialized" deadline="0001-01-
wait time="2022-08-24T22:11:22.440Z" level=info msg="Starting deadline monitor"
wait time="2022-08-24T22:11:26.440Z" level=info msg="Main container completed"
wait time="2022-08-24T22:11:26.441Z" level=info msg="No Script output reference in workflow.
wait time="2022-08-24T22:11:26.441Z" level=info msg="No output parameters"
wait time="2022-08-24T22:11:26.441Z" level=info msg="No output artifacts"
wait time="2022-08-24T22:11:26.441Z" level=info msg="Killing sidecars []"
wait time="2022-08-24T22:11:26.441Z" level=info msg="Alloc=4742 TotalAlloc=9333 Sys=74065 Num
Stream closed EOF for orchest/image-build-task-4a159dc2-a4b3-4050-816e-976dfbe7858e (wait)
Has anyone run into this? Any help or clue as to where the configuration is wrong?
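A small helper can pull the failing step out of a build log like the one above. This is only a sketch: the function name `last_step_before_error` is made up for illustration, and it assumes the builder prints `STEP n/m: ...` lines and that a failure shows up as a later line containing "error", as it does in this log.

```python
import re
from typing import Optional, Tuple

# Matches builder lines such as "main STEP 5/6: COPY . ."
STEP_RE = re.compile(r"STEP (\d+)/(\d+): (.*)")


def last_step_before_error(log: str) -> Optional[Tuple[int, int, str]]:
    """Return (step, total, instruction) for the last STEP line seen
    before the first line that mentions an error, or None if no such
    line is found."""
    last = None
    for line in log.splitlines():
        match = STEP_RE.search(line)
        if match:
            # Strip padding and any terminal border characters.
            instruction = match.group(3).strip(" │")
            last = (int(match.group(1)), int(match.group(2)), instruction)
        elif "error" in line.lower() and last is not None:
            return last
    return None
```

Run against the log above, this points at step 5 of 6, `COPY . .`, which is the step whose commit failed.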
juanlu

08/24/2022, 10:15 PM
hello @Robert Sun, sorry you're having a rough experience. my colleagues will have a look at this tomorrow. we are in the process of changing how environments are built, so there is a chance that in a few days or weeks the problem goes away by itself. but we'd still like to understand what's going on. I see you're using Orchest v2022.08.8. could you also share your setup script just in case?
Robert Sun

08/24/2022, 10:25 PM
What do you mean by setup script? My controller and orchest cluster charts?
10:25 PM
Thanks for the quick response btw!
Yannick

08/25/2022, 6:00 AM
Hi @Robert Sun 👋. With “setup script” @juanlu means the script that is part of Environments, where you can specify the dependencies to install. @Navid H I remember we used to have NFS support, what is the current state of this and what would be the best way for Robert to set this up with microk8s?
7:50 AM
I suddenly remembered we have an open issue for a similar problem: https://github.com/orchest/orchest/issues/1171. It lists some temporary workarounds, for example running
curl -X POST http://localorchest.io/catch/api-proxy/api/ctl/cleanup-builder-cache
once and then trying again.
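For scripting the cache cleanup, the same call can be sketched in Python with only the standard library. The URL is the one quoted above; the function names and the timeout value are illustrative assumptions, and the host will differ depending on how Orchest is exposed in your cluster.

```python
import urllib.request

# Endpoint quoted in the message above; adjust the host for your install.
CLEANUP_URL = "http://localorchest.io/catch/api-proxy/api/ctl/cleanup-builder-cache"


def build_cleanup_request(url: str = CLEANUP_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request with an empty body,
    roughly what `curl -X POST <url>` sends."""
    return urllib.request.Request(url, data=b"", method="POST")


def cleanup_builder_cache(url: str = CLEANUP_URL, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.
    Raises urllib.error.URLError if the host cannot be reached."""
    with urllib.request.urlopen(build_cleanup_request(url), timeout=timeout) as resp:
        return resp.status
```

Calling `cleanup_builder_cache()` once and then retrying the environment build mirrors the workaround described in the issue.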
Navid H

08/25/2022, 11:11 AM
We used to deploy NFS as part of Orchest, but we no longer do, because we relied on Rook NFS and Rook deprecated its NFS support. @Robert Sun, you can follow these instructions to deploy an NFS server inside your cluster and make its storage class the default one. Then you can use Orchest in a multi-node setup.
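Marking a storage class as the default comes down to setting the standard Kubernetes annotation `storageclass.kubernetes.io/is-default-class` on it. A minimal sketch of that step follows, assuming `kubectl` is on the PATH and configured for the cluster (on microk8s the command is usually `microk8s kubectl`). The helper names and the storage class name `nfs-csi` used in the note below are hypothetical.

```python
import json
import subprocess
from typing import List

# Standard Kubernetes annotation that marks a StorageClass as the default.
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_class_patch(is_default: bool = True) -> str:
    """Return the JSON patch that sets or clears the default-class annotation."""
    value = "true" if is_default else "false"
    return json.dumps({"metadata": {"annotations": {DEFAULT_ANNOTATION: value}}})


def kubectl_patch_args(storage_class: str, is_default: bool = True) -> List[str]:
    """Build the argument list for `kubectl patch storageclass <name> -p <patch>`."""
    return ["kubectl", "patch", "storageclass", storage_class,
            "-p", default_class_patch(is_default)]


def set_default_storage_class(storage_class: str) -> None:
    """Apply the patch; raises CalledProcessError if kubectl fails."""
    subprocess.run(kubectl_patch_args(storage_class), check=True)
```

For example, `set_default_storage_class("nfs-csi")` would mark a class with that name as the default. If another class is already marked as default, patch it with `is_default=False` first so the cluster has only one default.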
Robert Sun

08/25/2022, 8:44 PM
My setup script is completely empty. I am already using nfs.csi.k8s.io for my default storage class. It seems to work, and volumes are created successfully. Other services like MySQL seem to work fine too. If you don't recommend NFS as a storage class for Orchest, which multi-node storage class do you recommend? I can try that too to see if it fixes the problem.
Yannick

08/26/2022, 7:08 AM
I think your problem isn't directly with NFS but a flaky issue we are seeing on other storage classes as well (as per this Slack comment). If you follow that thread then you should be able to side-step the issue you are seeing.
Robert Sun

08/26/2022, 9:17 PM
I tried this. The build fails in every case: on the first build after a fresh install of Orchest, after subsequent installs with caching, and after cleaning up the builder cache. It's strangely repeatable: I've reinstalled Orchest about 10 times in various ways and I still get the same issue.
Yannick

08/29/2022, 9:54 AM
Hmmm I am sorry to hear that (and a bit saddened that it doesn't work 😕). We are currently reworking how Environment building works in Orchest, which should fix this issue (because of the different approach). We hope to release this as soon as possible!
Robert Sun

08/29/2022, 7:36 PM
Awesome! I'll try it after I see the release!
Yannick

08/30/2022, 6:47 PM
@Robert Sun Yesterday we released a temporary fix for Environment building (at the cost of performance) to make sure builds are working again. Of course you can also wait for the improved version 😇
7:41 AM
@Robert Sun I forgot to mention it in this thread (😓), but as of release v2022.09.2 the Environments issue should be solved, and builds are much, much faster!
Robert Sun

09/28/2022, 9:54 PM
Awesome! Thanks for updating me.