Rick Lamers

12/05/2020, 8:33 AM
In addition, for very large data sets you could also manage reads/writes manually by writing to /data. That mounted directory is shared between all pipeline steps. You could then use data passing to send the file path to the next step, so you don't hardcode where the files can be found.
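A minimal sketch of that pattern: the producer step writes the large dataset to the shared mount and passes only the path forward, and the consumer step reads it back. The sketch uses a temp directory in place of /data and a direct function call in place of the SDK's data passing so it runs anywhere; the `orchest.output` call mentioned in the comment is an assumption about the SDK's API, not verified here.

```python
import json
import os
import tempfile

# Simulating the shared /data mount with a temp dir so this sketch
# runs anywhere; inside a real pipeline step you would use "/data".
SHARED_DIR = tempfile.mkdtemp()

def producer_step():
    """Write a large dataset to the shared mount; return only its path."""
    path = os.path.join(SHARED_DIR, "large_dataset.json")
    with open(path, "w") as f:
        json.dump({"rows": list(range(1000))}, f)
    # In a real pipeline you would hand `path` to the next step via
    # data passing (e.g. something like orchest.output(path, ...) --
    # an assumed call name), instead of returning it directly.
    return path

def consumer_step(path):
    """Next step: receive the path and read the data from the mount."""
    with open(path) as f:
        return json.load(f)

dataset_path = producer_step()
data = consumer_step(dataset_path)
print(len(data["rows"]))
```

Passing the path instead of the data itself keeps the inter-step payload tiny while the heavy bytes stay on the shared volume.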