• Eshwaran Venkat

    1 month ago
    A couple of feature questions:
    • Is there a provision for some kind of Orchest API that allows one to sync a git repo with Orchest in some way? That is, is it possible to develop on a computer other than Orchest's instances and then commit the changes to test / deploy directly on Orchest's execution environment? Perhaps this could be achieved through some form of GitHub Actions.
    • Is it possible to use some sort of simple DAG syntax (like Airflow) that Orchest can then use to auto-construct pipelines?
    • Can we consider a "library" / "marketplace" of commonly used Orchest pipeline nodes available for general use? For example:
      ◦ A node that reads data from any database based on the pipeline / job parameters.
      ◦ Common text transformations like uppercase / lowercase / removing nulls, etc.
      ◦ Integrations with libraries like Great Expectations, etc. by default, without having to initialize the container with them.
  • juanlu

    1 month ago
    these are all great suggestions. we have in fact talked recently about a library of common steps, although it will take us some time to get there. about an airflow-like DAG syntax, what exactly do you have in mind? a Python script that can be used to generate an Orchest pipeline?
  • Eshwaran Venkat

    1 month ago
    That's great! And yes, for the DAG syntax it could be a Python script, YAML, or JSON file where one can specify the definitions, say locally on my system. Then, upon pushing this definition to Orchest via an API of sorts, the pipeline would become available in an instance that we specify.
  • Rick Lamers

    1 month ago
    The JSON format is fairly straightforward and could be generated by a script. It's specified here: https://docs.orchest.io/en/stable/development/how_orchest_works.html#pipeline-json-schema Any .orchest file in a project directory will automatically be detected by Orchest. We might create a Python module that allows one to easily specify and generate DAGs that generate this JSON under the hood. We're also working on an Orchest REST API that will allow you to programmatically trigger most of the things you can do in the Orchest UI. These roadmap items are actually going live on our GitHub roadmap today (we wanted to make it easier for users to know what's coming in the future).
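To make the "could be generated by a script" idea concrete, here is a minimal sketch of turning a small local DAG description into a pipeline-like JSON document. The input format (`file`, `after`) and the helper `build_pipeline` are hypothetical, and the output field names (`steps`, `title`, `file_path`, `incoming_connections`) are assumptions for illustration only; the linked pipeline JSON schema is the authority on what Orchest actually expects.

```python
import json
import uuid


def build_pipeline(name, steps):
    """Turn a small local DAG description into a pipeline-like dict.

    `steps` maps a step title to {"file": ..., "after": [titles...]}.
    NOTE: the output field names are assumptions, not the verified
    Orchest schema; adjust them to match the official documentation.
    """
    # Give every step its own UUID so that edges can refer to it.
    ids = {title: str(uuid.uuid4()) for title in steps}

    pipeline = {"name": name, "uuid": str(uuid.uuid4()), "steps": {}}
    for title, spec in steps.items():
        parents = spec.get("after", [])
        for parent in parents:
            if parent not in ids:
                raise ValueError(
                    f"step {title!r} depends on unknown step {parent!r}"
                )
        pipeline["steps"][ids[title]] = {
            "title": title,
            "uuid": ids[title],
            "file_path": spec["file"],
            # Edges are stored on the child as a list of parent UUIDs.
            "incoming_connections": [ids[p] for p in parents],
        }
    return pipeline


# A three-step linear DAG: load -> clean -> report (hypothetical files).
dag = {
    "load": {"file": "load.ipynb"},
    "clean": {"file": "clean.py", "after": ["load"]},
    "report": {"file": "report.ipynb", "after": ["clean"]},
}

pipeline = build_pipeline("example", dag)
print(json.dumps(pipeline, indent=2))
```

Writing the result with `json.dump` to a file ending in `.orchest` inside a project directory would be the natural next step, provided the fields have first been aligned with the real schema.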
    "Is there a provision for some kind of Orchest API that allows one to sync a git repo with Orchest in some way?" This is actually an interesting feature for the REST API. Right now we allow you to use git inside Orchest using the JupyterLab git GUI client or the JupyterLab integrated terminal. We could make a REST endpoint available that changes the checked-out version of the project in Orchest (e.g. to a certain branch or commit).
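Until such an endpoint exists, the same effect can be approximated by hand from the integrated terminal with plain git. The sketch below only wraps standard `git fetch` and `git checkout` calls; the function names, the `/path/to/project` placeholder, and the dry-run default are hypothetical choices for illustration, not part of Orchest.

```python
import subprocess


def checkout_commands(ref, remote="origin"):
    """Return the git commands that move a working copy to `ref`."""
    # Reject empty refs and anything git could mistake for an option.
    if not ref or ref.startswith("-"):
        raise ValueError(f"refusing suspicious ref: {ref!r}")
    return [
        ["git", "fetch", remote],
        ["git", "checkout", ref],
    ]


def sync_project(project_dir, ref, remote="origin", dry_run=True):
    """Run (or, in dry-run mode, only return) the commands in `project_dir`."""
    commands = checkout_commands(ref, remote)
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if git exits non-zero.
            subprocess.run(cmd, cwd=project_dir, check=True)
    return commands


# Dry run: shows what would be executed without touching any repository.
print(sync_project("/path/to/project", "main"))
```

Note that checking out a local branch name does not by itself pull in new remote commits; for that, one would check out a specific commit or follow up with `git pull`.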
  • Yannick

    1 month ago
    Great suggestions! ❤️ For completeness I wanted to list the existing GH issues we have open for these:
    • I would describe this as a form of GitOps, which we actually don't have an issue open for (but it is similar enough to Orchest API)
    • Similar but not identical to: Ability to dynamically change the pipeline definition through code
    • Extended SQL Nodes in Orchest Pipelines to have more of these templates.