DEV Community

liortal1


Scientists - Instant and free Data Pipeline

A company that develops molecular models recently asked us to help build an automation pipeline for testing an algorithm on data. This is a common use case, and the steps of the pipeline are detailed below.

This pipeline can be useful for any company that needs to run algorithms on data. It is currently configured to run on Oracle Cloud but can easily be adapted to another cloud. Notifications are configured for Microsoft Teams but can also be sent to Slack, email, PagerDuty or other platforms.

This pipeline can be set up in 10-15 minutes, completely free of charge. We are happy to help create an environment with this pipeline for anyone who is interested. Please send a private message.

Pipeline steps:

  1. Create an instance using a predefined image. The placement and type of instance will be defined in advance.

  2. If the requested placement or instance type is not available, try the next option in a predefined list.

  3. Run a pre-written script on the instance.

  4. Check when the job is completed (by checking the presence of a file or keyword in a log file).

  5. Check if the job was properly completed. If not, send an error message to the user (Email/Teams).

  6. If the job did not run, rerun it up to a predefined number of attempts.

  7. If the instance quota has been reached, let the job wait and rerun it when capacity becomes available again.

  8. Upon completion, save a zipped directory with all results into a bucket. The directory should have an easy-to-identify name (user/date etc.).

  9. Make sure that the data was transferred properly (by comparing the original and copied sizes of the data).

  10. Close the instance.

  11. Notify the user (Email/Teams), including all the identifying details of the completed job.
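The control flow behind steps 1, 2, 5, 6, 7 and 8 can be sketched in plain Python. This is only an illustrative sketch, not the actual pipeline: the names `CapacityError`, `QuotaError`, `Option`, `launch_with_fallback`, `run_with_retries` and `archive_name` are all hypothetical, and the cloud-specific work (launching an instance, running the script, sending a message) is passed in as plain callables so that no particular cloud SDK or notification API is assumed.

```python
import time
from dataclasses import dataclass
from datetime import date
from typing import Callable, Sequence


class CapacityError(Exception):
    """Hypothetical: the requested placement/instance type is unavailable."""


class QuotaError(Exception):
    """Hypothetical: the instance quota has been reached."""


@dataclass(frozen=True)
class Option:
    """One entry of the predefined placement/type list (steps 1 and 2)."""
    placement: str
    instance_type: str


def launch_with_fallback(
    launch: Callable[[Option], str],
    options: Sequence[Option],
    quota_wait_s: float = 60.0,
    max_quota_waits: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each option in order; on a quota error, wait and start over."""
    waits = 0
    while True:
        try:
            for opt in options:
                try:
                    return launch(opt)  # step 1: create the instance
                except CapacityError:
                    continue  # step 2: fall back to the next option
            raise CapacityError("no option in the predefined list is available")
        except QuotaError:
            if waits >= max_quota_waits:
                raise
            waits += 1
            sleep(quota_wait_s)  # step 7: wait until capacity may have returned


def run_with_retries(
    run_job: Callable[[], None],
    succeeded: Callable[[], bool],
    max_attempts: int = 3,
    notify: Callable[[str], None] = print,
) -> int:
    """Run the job, notify on each failure, and retry (steps 5 and 6).

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        run_job()
        if succeeded():
            return attempt
        notify(f"Job failed on attempt {attempt} of {max_attempts}")
    raise RuntimeError(f"job failed after {max_attempts} attempts")


def archive_name(user: str, day: date) -> str:
    """Build an easy-to-identify name for the zipped results (step 8)."""
    return f"{user}_{day.isoformat()}_results.zip"
```

In a real setup, `launch` would wrap the cloud provider's instance-creation call and translate its errors into the two exception types, and `notify` would post to Teams, Slack or email. Keeping those as parameters is what makes it straightforward to move the same flow to a different cloud or notification platform.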
