
Serhii

10/13/2022, 7:01 AM
@Jacopo Super weird behavior. Since I have restarted a job 3 times on orchest.io for a client because of aborts, and there is no performance management, I went to self-hosting Orchest to identify the problems. So I created a minikube cluster with driver 'none' and Kubernetes 1.23, then put netdata on the instance to observe. What I found is that jobs don't seem to reclaim memory after the first run 0_o. I might be wrong with the instance config or something, but after abortion the memory was restored. You can see clearly when the second job run starts on the 13th. I guess there is a lot more to this than just memory stuff, but this is disturbing.

Jacopo

10/13/2022, 7:34 AM
Hi @Serhii, are you able to reproduce this issue with minikube using the docker driver? We currently do not support bare metal so I'd like to first confirm that the issue is not caused by the underlying platform
And what I found is that jobs don't seem to reclaim memory after the first run 0_o. I might be wrong with the instance config or something, but after abortion the memory was restored.
What exactly do you mean by abortion here?

Serhii

10/13/2022, 8:12 AM
"ingress" was not enabling on the docker driver, so I went to "none"
@Jacopo Similar to what happened on orchest.io: the job failed (aborted) and I cannot tell why 🙂
I would say solving "not knowing why" should be a top priority for Orchest, since it has become much less stable.

Jacopo

10/13/2022, 8:16 AM
"ingress" was not enabling on the docker driver, so I went to "none"
Could you clarify what's happening here? We routinely perform installations of Orchest using minikube + docker driver so I'm wondering if the issue you are facing here is a symptom related to the memory leak you are also observing
"ingress" was not enabling on the docker driver, so I went to "none"
I'm afraid getting the minikube ingress addon to work is not covered by Orchest, and the issue might be unrelated, but I'm happy to help. Did you use the provided convenience script to install Orchest on minikube, or did you install minikube yourself? If so, could you tell me what operations you performed?

Serhii

10/13/2022, 8:19 AM
Well, from what I looked up on the web, it's a fairly common issue that ingress does not enable 🙂
Yep, I just used the script to install Orchest and copied the job data over.

Jacopo

10/13/2022, 8:20 AM
On what kind of machine are you running?

Serhii

10/13/2022, 8:21 AM
It's a Hetzner virtual machine with shared CPU and 32 GB RAM, not bare metal :)
It's not your job to solve this for different providers, though. I might try this on the "docker" version if I can make it work. Just raising this, as it might be related to the issues I had on orchest.io
(and I don't know if it is, because there is no log of the "abortion" that I can see easily 🙂 )

Jacopo

10/13/2022, 8:27 AM
It's not your job to solve this for different providers, though.
I was actually hoping you were deploying on a provider we already use for ease of debugging ahah 😛
(and I don't know if it is, because there is no log of the "abortion" that I can see easily 🙂 )
We are working on a solution for this, but our first priority is to minimize the number of times things can go wrong in the cloud due to high resource contention, to fix the issue upstream. I'll try to install Orchest with minikube with driver none on my end later today or (more probably) tomorrow to see if I get the same issue. It might indeed lead to some interesting findings, so thank you for reporting the issue regardless.
👍 1

Rick Lamers

10/13/2022, 11:50 AM
Thanks for reporting this @Serhii, let us know if the Docker driver deployment of minikube ends up working. Just to describe at a high level what the desired behavior is re: memory usage (and what we are observing on our managed Orchest Cloud instances): once a job run completes, its containers are automatically removed by Argo, and this frees the system resources (e.g. memory) again.

Serhii

10/13/2022, 11:56 AM
Will do @Jacopo. Btw, on the self-installed instance, for some reason drag-and-dropping of steps in the UI didn't work (steps "teleported" out of the view), so I had to use auto-layout. Not sure what the reason was.

Jacopo

10/13/2022, 11:58 AM
@Serhii I'll call in the FE cavalry, thanks for reporting the issue 🙂
👀 1

Serhii

10/13/2022, 10:49 PM
that was fast 🙂
🚀 1
👍 2