Rafael Rodrigues Santana

01/27/2023, 8:27 PM
Hi guys, I've created a new Orchest cluster; however, the celery worker is unable to access the celery_result_backend database while trying to build new images. Any thoughts on why this may happen?
psycopg2.OperationalError: FATAL:  database "celery_result_backend" does not exist


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/celery/backends/database/__init__.py", line 47, in _inner
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/celery/backends/database/__init__.py", line 115, in _store_result
    session = self.ResultSession()
  File "/usr/local/lib/python3.9/site-packages/celery/backends/database/__init__.py", line 106, in ResultSession
    return session_manager.session_factory(
  File "/usr/local/lib/python3.9/site-packages/celery/backends/database/session.py", line 88, in session_factory
    self.prepare_models(engine)
  File "/usr/local/lib/python3.9/site-packages/celery/backends/database/session.py", line 72, in prepare_models
    ResultModelBase.metadata.create_all(engine)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 4864, in create_all
    bind._run_ddl_visitor(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3122, in _run_ddl_visitor
    with self.begin() as conn:
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3038, in begin
    conn = self.connect(close_with_result=close_with_result)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3210, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3289, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3259, in _wrap_pool_connect
    Connection._handle_dbapi_exception_noconnection(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2106, in _handle_dbapi_exception_noconnection
    util.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3256, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 476, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 256, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 256, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 371, in __init__
    self.__connect()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 666, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/create.py", line 590, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 597, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  database "celery_result_backend" does not exist

(Background on this error at: <https://sqlalche.me/e/14/e3q8>)
The problem was solved by creating the database. Not sure why the database celery_result_backend was not created on Orchest startup, though.
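
For anyone hitting the same error, here is a minimal sketch of the "create it if it's missing" workaround. It is not Orchest's own code: the function name `ensure_database`, the host `orchest-database`, and the user `postgres` are assumptions for illustration. The function accepts any DB-API connection to PostgreSQL that is already in autocommit mode, since CREATE DATABASE cannot run inside a transaction block.

```python
import re


def ensure_database(conn, name):
    """Create database `name` unless pg_database already lists it.

    `conn` must be a DB-API connection to PostgreSQL in autocommit mode.
    Returns True if the database was created, False if it already existed.
    """
    # Identifiers cannot be passed as query parameters, so only allow
    # plain names before interpolating into the CREATE statement.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsafe database name: {name!r}")

    cur = conn.cursor()
    try:
        cur.execute("SELECT 1 FROM pg_database WHERE datname = %s", (name,))
        if cur.fetchone() is not None:
            return False
        cur.execute(f'CREATE DATABASE "{name}"')
        return True
    finally:
        cur.close()


def connect_and_ensure():
    """Example wiring with psycopg2; host and user are assumed values."""
    import psycopg2  # imported lazily so the helper above has no hard dependency

    # Connect to the default maintenance database, not the missing one.
    conn = psycopg2.connect(host="orchest-database", user="postgres", dbname="postgres")
    conn.autocommit = True
    try:
        return ensure_database(conn, "celery_result_backend")
    finally:
        conn.close()
```

Connecting to the `postgres` maintenance database matters here: connecting straight to `celery_result_backend` would fail with the same "does not exist" error shown in the traceback.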

Rick Lamers

01/29/2023, 4:02 PM
https://github.com/orchest/orchest/blob/20c8b1555875de4ea93ec4246714d634241ba1df/services/orchest-api/app/app/celery_app.py#L9 The celery_result_backend database should be created if it doesn't exist. Maybe the deployment hadn't started the celery app yet, which would have attempted creation?
@Jacopo can dig deeper. Potentially a race condition where the DB was unavailable when celery came up, which typically shouldn't happen when orchest-controller manages the services.
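
To illustrate the kind of race described here, this is a small sketch of a startup guard that retries a readiness check before attempting database creation. It is a generic pattern, not what Orchest or orchest-controller does; the name `wait_until_ready` and its parameters are hypothetical.

```python
import time


def wait_until_ready(check, attempts=10, delay=1.0, sleep=time.sleep):
    """Call `check()` until it stops raising or the attempts run out.

    Returns the number of attempts used. Raises TimeoutError, chained to
    the last failure, if `check()` never succeeds.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            check()
            return attempt
        except Exception as exc:  # narrow this to the driver's error type in real code
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    raise TimeoutError(f"not ready after {attempts} attempts") from last_error
```

In this scenario `check` would open and close a connection to the server's default `postgres` database. Note that psycopg2 reports both "connection refused" and "database does not exist" as `OperationalError`, so a readiness check aimed at `celery_result_backend` itself could not tell the two situations apart.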

Jacopo

01/30/2023, 10:07 AM
This is fairly surprising, since on every release we automatically deploy multiple Orchest instances on different platforms, e.g. EKS, minikube, etc., and we haven't had the issue so far. Could you provide more details about how the deployment is being performed? As Rick mentioned, this should indeed not happen.

Rafael Rodrigues Santana

02/16/2023, 2:00 PM
Sorry for the delay. The same problem occurred today in another environment. The deploy is being done using the following process:
1. Create EKS cluster / node group using terraform
2. Install calico (to overcome the EKS limitation on network interfaces)
3. Create iamserviceaccount
4. Install ebs-csi-controller
5. Create namespace orchest
6. Install nginx in the EKS cluster
7. Deploy orchest-controller.yaml using kubectl apply
8. Deploy orchest-cluster.yaml using kubectl apply

Jacopo

02/17/2023, 8:13 AM
Anything special (i.e. any changes w.r.t. the original yaml) when it comes to the database? Or any chance that some interaction with calico is going wrong? Both the orchest-api and celery-worker will attempt to create the database if it doesn't exist when they start (link). The following logs would be interesting (the important part of the logs is likely the head rather than the tail, given that the logic which creates the db runs on start):
• orchest-api
• celery-worker
• orchest-database
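
One possible way to gather the head of those three logs is sketched below. It assumes each service runs as a Kubernetes Deployment with the same name in the `orchest` namespace, which is a guess based on this thread and not confirmed here; the helper names are hypothetical.

```python
import subprocess

# Service names as listed above; treating them as Deployment names is an assumption.
SERVICES = ["orchest-api", "celery-worker", "orchest-database"]


def kubectl_logs_cmd(service, namespace="orchest"):
    """Build the kubectl command that prints logs for one Deployment."""
    return ["kubectl", "logs", f"deployment/{service}", "--namespace", namespace]


def head(text, n=200):
    """Keep only the first `n` lines, where the startup logic is logged."""
    return "\n".join(text.splitlines()[:n])


def collect_startup_logs(n=200):
    """Run kubectl for each service and return the first `n` log lines of each."""
    logs = {}
    for service in SERVICES:
        result = subprocess.run(
            kubectl_logs_cmd(service), capture_output=True, text=True, check=False
        )
        logs[service] = head(result.stdout, n)
    return logs
```

If a pod has restarted since the failure, its earlier startup lines may no longer be in the current container's log, so this only captures what the running containers still hold.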