
Eshwaran Venkat

06/05/2022, 5:06 PM
I’m having a weird issue where Python libraries are found and run fine within JupyterLab, but in pipeline view the same libraries are not found. (cc: @Shaleen Bengani)

Rick Lamers

06/05/2022, 9:01 PM
Did you pip install pandas or similar in the environment?
Similar to eg:

juanlu

06/05/2022, 9:42 PM
I think it might be a case of https://github.com/orchest/orchest/issues/425 - @Eshwaran Venkat could you share your setup script (where you installed the dependencies) so we can try to reproduce the problem?

Yannick

06/06/2022, 7:24 AM
The way it works internally is that running code in JupyterLab (inside a kernel) is different from running code in a pipeline run; a different code path is invoked (link to source code). As Juanlu alluded to in his comment, your issue is likely caused by something "custom" in your environment's set-up script that leads to a wrong use/initialization of the underlying conda environment.
could you share your setup script
This would indeed be a great start for us to debug your problem :)
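One way to see the difference described above is to run the same small check in a JupyterLab cell and in a pipeline step, then compare the output. This is a minimal sketch using only the standard library; the function name environment_report and the example module list are assumptions for illustration, not part of Orchest.

```python
import importlib.util
import sys


def environment_report(module_names):
    """Return the interpreter path, its prefix, and which modules can be found.

    Only top-level module names are supported here (e.g. "pandas").
    """
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "importable": {
            name: importlib.util.find_spec(name) is not None
            for name in module_names
        },
    }


if __name__ == "__main__":
    # Hypothetical list; replace with the libraries that fail in the pipeline.
    print(environment_report(["pandas", "geoalchemy2"]))
```

If the executable or prefix differs between the two runs, the kernel and the pipeline step are not using the same Python environment.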

Eshwaran Venkat

06/06/2022, 7:26 AM
Hey guys! Thanks for the quick turnaround. Yes, we have a custom setup for Git and also an internal git package we use within the org. Let me share the setup script
#!/bin/bash
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y git

pip install orchest
pip install git+https://{ACCESS_KEY}@github.com/dotlas/swarm.git -q
pip install geoalchemy2
(I have hidden the Access key)
👍 1
All libraries are recognized correctly in JupyterLab, as shown

Yannick

06/06/2022, 7:56 AM
Thank you for sharing! What version of Orchest are you running on?

Eshwaran Venkat

06/06/2022, 8:07 AM
How do I check the version?

juanlu

06/06/2022, 8:18 AM
@Eshwaran Venkat you can go to the /settings page, there should be something like this:
image.png

Eshwaran Venkat

06/06/2022, 8:18 AM
Ah apologies, here you go!
This is on the cloud hosted version btw, we haven’t self hosted yet
👍🏼 1

juanlu

06/06/2022, 8:20 AM
I'll try to reproduce without installing your private package
👍 1

Eshwaran Venkat

06/06/2022, 8:20 AM
Sure, thanks!

juanlu

06/06/2022, 8:25 AM
I tried this setup script:
#!/bin/bash
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y git

pip install orchest
# pip install git+https://{ACCESS_KEY}@github.com/dotlas/swarm.git -q
pip install geoalchemy2
and everything seems to work fine on JupyterLab ✔️, interactive pipeline runs ✔️, and one-off jobs ✔️. The only explanation is that some dependency of swarm is causing some disruption. Let's take this privately.

Eshwaran Venkat

06/06/2022, 8:26 AM
Got it. Thanks!

juanlu

06/06/2022, 8:48 AM
I can reproduce the issue with this setup script:
#!/bin/bash
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y git

pip install orchest
pip install "ipython==8.2.0" "ipykernel==6.12.1" "jupyter-client==7.2.1" "numpy==1.22.3" "SQLAlchemy==1.4.34"
pip install geoalchemy2
to fix this in the short term, I'm afraid that you will have to control some of these dependencies. many look like dev deps (like pre-commit, mkdocs and such) so maybe you can tweak swarm package metadata to use extras to avoid pulling all these
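As a rough illustration of the extras idea, development-only tools can be moved out of the required dependencies in the package's setup.py. This is a sketch under stated assumptions: the real dependency lists of swarm are not shown in this thread, so the package contents below are hypothetical, and setup_kwargs is an illustrative helper. install_requires and extras_require are the standard setuptools keywords.

```python
# Hypothetical dependency split for a package like swarm; the actual
# runtime and development dependencies are not shown in this thread.
INSTALL_REQUIRES = [
    "numpy",
    "SQLAlchemy",
]

EXTRAS_REQUIRE = {
    # Tools only needed when developing the package itself.
    "dev": ["pre-commit", "mkdocs"],
}


def setup_kwargs():
    """Build the keyword arguments that setup.py would pass to setuptools.setup()."""
    return {
        "name": "swarm",
        "install_requires": INSTALL_REQUIRES,
        "extras_require": EXTRAS_REQUIRE,
    }


# In setup.py this would be used as:
#     from setuptools import setup
#     setup(**setup_kwargs())
```

With a split like this, a plain pip install of the package pulls only the runtime dependencies, while developers opt in to the rest by requesting the dev extra (for example pip install -e ".[dev]" from a checkout).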
👍 1
in the meantime, I will try to narrow down the issue even more and see if we can provide better docs that explain what to avoid, or even a mechanism to prevent this from happening
this is the root cause:
pip install "ipykernel==6.12.1"
👀 1

Yannick

06/06/2022, 9:10 AM
That partially makes sense to me 🤔 I don't understand why running inside a Jupyter kernel works, but inside a pipeline run it does not. Could it be that nbformat (which we use to execute Notebooks in pipeline runs) has an incompatibility with that specific ipykernel version?
🤔 1
👀 1

juanlu

06/06/2022, 9:32 AM
for completeness, the latest Orchest version is also affected by this issue

Eshwaran Venkat

06/06/2022, 9:43 AM
Thank you @juanlu, @Shaleen Bengani appears to have found out that using pipreqs instead of pip freeze works for our private package.
👍 1
👍🏼 1
But I agree with @Yannick, it’s weird that it automatically worked on JupyterLab

juanlu

06/07/2022, 8:08 AM
Hi @Eshwaran Venkat, we noticed that adding
python -m ipykernel install --sys-prefix
at the end of the setup script fixes the problem, in case you want to try it out (if you have already simplified the dependencies and have a working solution, there is no need to do anything)
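To check what the command above writes, the resulting kernel spec can be inspected from Python. This is a minimal sketch, assuming the default kernel name python3 and the usual share/jupyter/kernels layout under sys.prefix that --sys-prefix targets; read_kernel_argv is a hypothetical helper, and the thread itself does not explain why the fix works.

```python
import json
import sys
from pathlib import Path


def read_kernel_argv(prefix=sys.prefix, kernel_name="python3"):
    """Return the argv list from <prefix>/share/jupyter/kernels/<name>/kernel.json.

    Returns None when no such kernel spec file exists.
    """
    spec_file = (
        Path(prefix) / "share" / "jupyter" / "kernels" / kernel_name / "kernel.json"
    )
    if not spec_file.is_file():
        return None
    with spec_file.open(encoding="utf-8") as fh:
        return json.load(fh).get("argv")


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("kernel argv:", read_kernel_argv())
```

Comparing the first argv entry with the interpreter path, before and after adding the command to the setup script, shows whether the kernel spec points at the environment where the packages were installed.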