I saw an ad for orchest.io about an hour ago and have been reading the docs since... prob need to keep reading but maybe I can get a good answer here.
I run a rather large data pipeline in R that scrapes some government websites, and ultimately this data ends up in an AWS Aurora database. How can I securely get this data there? Also, do I pay for the idle time of my orchest.io instances?
I've done 3 things in the past: 1) Buy a large, expensive on-demand EC2 instance and start and stop it whenever scripts need to execute. This always works but is too expensive. 2) Move code to AWS Lambda. This works for some of my scripts, but others have runtimes longer than 15 mins. 3) Cloud containers using Google Cloud Build + Google Cloud Run + Google Cloud Scheduler. This is pretty good but overly complex at times. I've also run into issues with some scripts needing very different compute / memory.
09/28/2022, 6:37 PM
Hi Sean! Great use case. Orchest Cloud has built-in auto start and stop that will follow your job schedule and start/stop the instance automatically.
You can try it out in the free tier 👍🏻
All Orchest Cloud instances are single tenant, TLS encrypted and password protected by default.
Welcome to the Slack channel by the way, we’re here to help.
09/30/2022, 1:56 AM
Yeah I've been trying it out, very cool, and easy to get started.
Still wondering, though, how I can push this data. Currently I connect to my DB using just a username / password, and I have security groups that whitelist connections from my EC2 instances to the DB. Is there any way I can fetch the IP of my Orchest instance, or maybe a couple of IPs Orchest operates out of that I can whitelist? Preferably the latter.
09/30/2022, 8:48 AM
We’re working on making static IPs an Orchest Cloud feature you can manage yourself.
For now you can DM your instance URL to @Yannick and he’ll assign your instance a static IP. 🙌
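Once a static IP is assigned, whitelisting it in the security group in front of the database could look roughly like the sketch below. This is only an illustration under assumptions: the IP `203.0.113.10`, the security group ID, and the `build_ingress_rule` helper are all hypothetical, and port 5432 assumes Aurora PostgreSQL (Aurora MySQL would typically use 3306). The actual AWS call is left as a comment because it needs `boto3` and configured credentials.

```python
import ipaddress


def build_ingress_rule(static_ip: str, port: int = 5432,
                       description: str = "Orchest instance") -> dict:
    """Build one IpPermissions entry that allows a single IPv4 address on one TCP port.

    Hypothetical helper for illustration; not part of Orchest or boto3.
    """
    # Raises ValueError if the string is not a valid IPv4 address.
    addr = ipaddress.IPv4Address(static_ip)
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # /32 restricts the rule to exactly this one address.
        "IpRanges": [{"CidrIp": f"{addr}/32", "Description": description}],
    }


# Placeholder IP from the documentation range; replace with the assigned static IP.
rule = build_ingress_rule("203.0.113.10", port=5432)
print(rule)

# With boto3 installed and AWS credentials configured, the rule could be applied
# to a (hypothetical) security group like this:
#
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(
#       GroupId="sg-0123456789abcdef0",
#       IpPermissions=[rule],
#   )
```

The database username / password stays as it is today; the rule only narrows which source address may open a connection.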