#announcements
Alexsander Pereira

10/05/2022, 5:47 PM
What would this error be?
[W 2022-10-05 17:45:33.160 ServerApp] HTTP 500: Internal Server Error (Error attempting to connect to Gateway server url 'http://jupyter-eg-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e:8888/jupyter-server-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e'. Ensure gateway url is valid and the Gateway instance is running.)
Rick Lamers

10/05/2022, 5:49 PM
This means that the Jupyter Enterprise Gateway container could not be reached. This container is started as part of a session. The container image is orchest/jupyter-enterprise-gateway.
5:50 PM
On the cluster where you have Orchest installed, do you have access to k9s?
Alexsander Pereira

10/05/2022, 5:52 PM
I'm using EKS (AWS)
5:52 PM
The gateway is running
Rick Lamers

10/05/2022, 5:53 PM
When debugging these issues, having access to the EKS cluster with k9s can be very useful. For example, you could quickly check the logs of the jupyter-enterprise-gateway to see whether it reports an error.
5:54 PM

https://www.youtube.com/watch?v=hwMevai3_wQ

This might be helpful if you haven't used k9s + EKS before
Alexsander Pereira

10/05/2022, 5:54 PM
I can see the logs with Kubernetes Dashboard.
Rick Lamers

10/05/2022, 5:55 PM
(Kubernetes Dashboard should be sufficient too 👍 just a tip) Do you see any errors in the gateway log? You can also dump the log here and we can scan it for potential issues
Alexsander Pereira

10/05/2022, 6:01 PM
Is there any way to recreate a Jupyter notebook?
6:04 PM
Some Jupyter notebooks were deleted to free up space in the cluster.
Rick Lamers

10/05/2022, 6:05 PM
recreate a Jupyter notebook
Do you mean restore a Jupyter notebook file (e.g. my-notebook.ipynb)?
Alexsander Pereira

10/05/2022, 6:06 PM
No, I would like to recreate the deployment and pods that were deleted.
Rick Lamers

10/05/2022, 6:07 PM
If you start a new session (in the Orchest UI) it should create the deployment/pods automatically. They're managed by the orchest-controller.
6:08 PM
A session is started by opening a pipeline file (e.g. main.orchest) in the pipeline editor.
Alexsander Pereira

10/05/2022, 6:11 PM
It worked, but I still have a problem with the gateway.
HTTP 500: Internal Server Error (Error attempting to connect to Gateway server url 'http://jupyter-eg-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e:8888/jupyter-server-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e'. Ensure gateway url is valid and the Gateway instance is running.)
Rick Lamers

10/05/2022, 6:14 PM
Can you paste the logs of the gateway?
Alexsander Pereira

10/05/2022, 6:18 PM
I just found out I don't have enough pods.
Rick Lamers

10/05/2022, 6:18 PM
Do you mean nodes? That it couldn't schedule the pod on a node? (What error did you find?)
6:21 PM
From the log I can see that the orchest-api could not be reached by the jupyter-enterprise-gateway. This means that when the gateway tried to start the kernel by calling orchest-api (a POST request), it failed to reach orchest-api. Can you confirm the orchest-api pod is running without errors (as reported by the k8s dashboard)?
6:36 PM
I'll be AFK for a bit. Please leave any notes here if you still have issues and we'll help you out tomorrow 🙌
Alexsander Pereira

10/05/2022, 6:39 PM
The instance type I'm using in EKS only allows up to 29 pods.
7:24 PM
@Rick Lamers Any ideas on how to get around this pod limit? https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt
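For context, the values in eni-max-pods.txt follow from the instance's network interface limits when the AWS VPC CNI assigns one secondary IP address per pod: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. Below is a minimal Python sketch of that arithmetic. The function name is illustrative, and the ENI figures in the usage line are assumptions chosen to reproduce the 29-pod limit mentioned above.

```python
def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Pod limit when each pod gets one secondary IP from an ENI."""
    # The primary address of each ENI is not handed out to pods;
    # the +2 covers pods that use host networking.
    return enis * (ipv4_per_eni - 1) + 2


# Assumed example: 3 ENIs with 10 IPv4 addresses each.
print(max_pods(3, 10))  # 29
```

Under this formula, a higher limit needs either an instance type with more ENIs or addresses, or a different IP assignment mode for the CNI (prefix delegation, for example).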
Yannick

10/06/2022, 8:06 AM
As you can read in the Kubernetes docs (link and link) it is possible to set ResourceQuotas that determine the number of Pods someone is allowed to run. I am not sure whether EKS uses ResourceQuotas to enforce these limits, nor whether it would be possible to change them. I guess your best bet would be to upgrade the instance type so that you can spawn more Pods. Given the resources that Pods consume, EKS probably came up with (somewhat) sensible limits to prevent you from overloading the instance/node.
Rick Lamers

10/06/2022, 8:33 AM
Looks like a workaround could be https://stackoverflow.com/a/69715615
Alexsander Pereira

10/06/2022, 1:34 PM
Now all pods are OK, but the communication error with the Jupyter gateway persists: HTTP 500: Internal Server Error (Error attempting to connect to Gateway server url 'http://jupyter-eg-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e:8888/jupyter-server-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e'. Ensure gateway url is valid and the Gateway instance is running.)
Rick Lamers

10/06/2022, 1:47 PM
From the log I can see the gateway is failing to reach the orchest-api (it's getting a 404). Running wget -O - http://orchest-api/api/sessions in the gateway container would show whether the Orchest API is reachable from the gateway. Worth checking that quickly to establish a baseline.
Alexsander Pereira

10/06/2022, 1:58 PM
The request works.
(base) root@jupyter-eg-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e-b6fd6b89-qpb4s:/usr/local/bin# wget -O - http://orchest-api/api/sessions
--2022-10-06 13:56:18-- http://orchest-api/api/sessions
Resolving orchest-api (orchest-api)... 10.100.209.91
Connecting to orchest-api (orchest-api)|10.100.209.91|:80...connected.
HTTP request sent, awaiting response... 308 PERMANENT REDIRECT
Location: http://orchest-api/api/sessions/ [following]
--2022-10-06 13:56:18-- http://orchest-api/api/sessions/
Reusing existing connection to orchest-api:80.
HTTP request sent, awaiting response... 200 OK
Length: 2135 (2.1K) [application/json]
Saving to: 'STDOUT'
Rick Lamers

10/06/2022, 2:02 PM
Then it could be that the specific project/pipeline UUID it's trying to start the kernel for doesn't have an active session. The gateway container has two environment variables, ORCHEST_PROJECT_UUID and ORCHEST_PIPELINE_UUID. Can you confirm that the sessions endpoint (you truncated the output of that wget command) returns a session whose project and pipeline UUIDs match the gateway's environment variables? (You can inspect the gateway environment variables e.g. by running env or checking the k8s manifest.)
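The comparison described above can be sketched in Python. This is only an illustration: it assumes the sessions response has already been parsed into a list of dicts with project_uuid and pipeline_uuid keys, the function name is hypothetical, and the sample values are made up.

```python
def has_matching_session(sessions, project_uuid, pipeline_uuid):
    """Return True if any session carries both UUIDs."""
    return any(
        s.get("project_uuid") == project_uuid
        and s.get("pipeline_uuid") == pipeline_uuid
        for s in sessions
    )


# Made-up sample data; in practice the UUIDs would come from the
# ORCHEST_PROJECT_UUID and ORCHEST_PIPELINE_UUID environment variables.
sample_sessions = [
    {"project_uuid": "proj-1", "pipeline_uuid": "pipe-1", "status": "RUNNING"},
]
print(has_matching_session(sample_sessions, "proj-1", "pipe-1"))
print(has_matching_session(sample_sessions, "proj-1", "pipe-2"))
```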
Alexsander Pereira

10/06/2022, 2:07 PM
{
  "project_uuid": "e4fd61e1-cfee-426a-83aa-1b0230681b16",
  "pipeline_uuid": "0bee97bf-8b41-4d2e-a502-45fd3f139f5d",
  "status": "RUNNING",
  "base_url": "/jupyter-server-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e",
  "user_services": {
    "data-app": {
      "name": "data-app",
      "image": "environment@d839e5bc-ce97-479b-841f-2403e6e86a33",
      "order": 1,
      "ports": [
        8000
      ],
      "scope": [
        "interactive",
        "noninteractive"
      ],
      "exposed": false,
      "requires_authentication": true
    },
    "fast-api": {
      "name": "fast-api",
      "image": "environment@d839e5bc-ce97-479b-841f-2403e6e86a33",
      "order": 2,
      "ports": [
        8000
      ],
      "scope": [
        "interactive",
        "noninteractive"
      ],
      "exposed": false,
      "requires_authentication": true
    },
    "tensorboard-demo": {
      "name": "tensorboard-demo",
      "image": "environment@d839e5bc-ce97-479b-841f-2403e6e86a33",
      "order": 3,
      "ports": [
        8000
      ],
      "scope": [
        "interactive",
        "noninteractive"
      ],
      "exposed": false,
      "requires_authentication": true
    }
  }
}
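One observation from the session object above: the suffix of its base_url equals the first 18 characters of project_uuid followed by the first 18 characters of pipeline_uuid, and the same suffix appears in the gateway hostname from the error message. The Python sketch below reproduces that check. The 18-character truncation is inferred from this single example, not taken from Orchest documentation, and the function name is hypothetical.

```python
def session_suffix(project_uuid: str, pipeline_uuid: str, length: int = 18) -> str:
    """Concatenate the truncated UUIDs (truncation length is an inferred assumption)."""
    return project_uuid[:length] + pipeline_uuid[:length]


# Values copied from the session object posted above.
suffix = session_suffix(
    "e4fd61e1-cfee-426a-83aa-1b0230681b16",
    "0bee97bf-8b41-4d2e-a502-45fd3f139f5d",
)
base_url = "/jupyter-server-e4fd61e1-cfee-426a0bee97bf-8b41-4d2e"
print(base_url.endswith(suffix))
```

If this holds, the RUNNING session shown here is the one the failing gateway URL refers to.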
Yannick

10/06/2022, 3:15 PM
@Alexsander Pereira You could try to trigger a build of the JupyterLab image: Settings > Configure JupyterLab > Build. Although we haven't changed the jupyter-server and gateway dependencies in some time, it's still worth a shot.
Rick Lamers

10/07/2022, 10:26 AM
@Alexsander Pereira did you verify that the ORCHEST_PROJECT_UUID and ORCHEST_PIPELINE_UUID environment variables match one of the session objects you get from the /api/sessions/ endpoint?
Alexsander Pereira

10/07/2022, 12:51 PM
Yes, they match. And the build didn't work either.
Yannick

10/07/2022, 2:02 PM
And the build didn't work either.
Okay, that is strange... Could you share the OrchestCluster object so that we can see which parts of the system you've configured? Thanks
Alexsander Pereira

10/07/2022, 2:15 PM
apiVersion: orchest.io/v1alpha1
kind: OrchestCluster
metadata:
  name: cluster-1
  namespace: orchest
  annotations:
    controller.orchest.io/deploy-ingress: "false"
spec:
  singleNode: true
  orchest:
    version: v2022.10.0
    authServer:
      image: public.ecr.aws/u5k1d2l0/orchest-auth-server:v2022.10.0-1.0.2
    orchestWebServer:
      image: public.ecr.aws/u5k1d2l0/orchest-webserver:v2022.10.0-1.0.2
Yannick

10/07/2022, 2:51 PM
And so when freshly installing the above OrchestCluster, you can't build JupyterLab and you can't connect to the Gateway? (Asking again to make sure I have everything correct.)
Alexsander Pereira

10/07/2022, 2:52 PM
Yes!
2:53 PM
Orchest Controller YAML
Yannick

10/07/2022, 2:55 PM
And you are running on a single-node EKS cluster? Nothing "special" there?
Alexsander Pereira

10/07/2022, 2:55 PM
Yes, it is a single-node EKS cluster: r5.xlarge
Yannick

10/07/2022, 3:11 PM
Hmmm... doesn't ring a bell for me whatsoever. @Navid H Any ideas?
Navid H

10/07/2022, 3:14 PM
@Alexsander Pereira: Could you kill the docker-registry pod (so a new one will be created) and try again? Never mind.
3:19 PM
@Alexsander Pereira: Can you check the jupyter-egw and the associated service labels and see if they match?
3:23 PM
It could also be a lack of resources for the CoreDNS pod.
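On the label check suggested above: a Kubernetes Service routes traffic to a pod only when every key/value pair in the Service's selector is present in the pod's labels. A minimal Python sketch of that rule follows. The function name and the label values are hypothetical and are not taken from the Orchest manifests.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True if every selector key/value pair appears in the pod's labels."""
    # A Service without a selector does not select pods on its own.
    return bool(selector) and all(
        pod_labels.get(key) == value for key, value in selector.items()
    )


# Hypothetical labels for illustration only.
service_selector = {"app": "jupyter-eg"}
print(selector_matches(service_selector, {"app": "jupyter-eg", "tier": "session"}))
print(selector_matches(service_selector, {"app": "jupyter-server"}))
```

A mismatch here would leave the Service without endpoints, which is one way the gateway URL could fail while the pod itself is running.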
Alexsander Pereira

10/07/2022, 5:05 PM
I will do these checks. The DNS is up and running. I'm creating a new cluster to see if the problem can only be in this one.
Yannick

10/10/2022, 8:06 AM
I'm creating a new cluster to see if the problem can only be in this one.
Any luck? 🤞