# Pipeline in a Box

Containerizing a next-generation sequencing (NGS) pipeline
### In Collaboration With

- [Elizabeth Bartom](http://www.feinberg.northwestern.edu/faculty-profiles/az/profile.html?xid=32311)
- [Janna Nugent](http://www.it.northwestern.edu/research/about/rcs-staff.html)
### Ceto

Ceto is the name given to the NGS pipeline developed by Elizabeth and her collaborators.

[https://github.com/ebartom/NGSbartom/](https://github.com/ebartom/NGSbartom/)
### Ceto and the Bioinformatics community

Elizabeth wished to make Ceto available to the wider bioinformatics community, namely to people without Quest access.
### Containers are popular for a reason

There are benefits to putting code in containers:

- easy distribution
- easy scaling
- encourages good architecture

Let's capture those benefits for Ceto!
### Ceto, described

- A Perl script that writes out multiple shell scripts, which are (optionally) submitted to the Quest scheduler
- A set of R and Perl scripts, called via the shell scripts, that do post-alignment analysis of samples
- Documentation

(This is how I view it, not how a bioinformatician views it.)
### Ceto, its assumptions

- Quest modules
- Quest scheduler
- Quest storage / filesystem
### Mapping Ceto to Docker

- Execution environment must match Quest.
- Must replicate the scheduler in some way.

"Quest in a box".
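
To match the execution environment, it helps to know which modules the pipeline expects. A minimal sketch, assuming the generated shell scripts request software through `module load` lines; the function name and the module names in the usage note are hypothetical, not taken from Ceto:

```python
import re

# Matches lines such as "module load name/version", ignoring a trailing comment.
MODULE_LOAD = re.compile(r"^\s*module\s+load\s+(.+?)\s*(?:#.*)?$")


def modules_in_script(text):
    """Return the modules named on `module load` lines, in order of first use."""
    found = []
    for line in text.splitlines():
        match = MODULE_LOAD.match(line)
        if not match:
            continue
        # One `module load` line may name several modules.
        for name in match.group(1).split():
            if name not in found:
                found.append(name)
    return found
```

Given a script containing `module load toolA/1.0` and `module load toolB toolA/1.0` (made-up names), this would return `["toolA/1.0", "toolB"]`, which becomes the list of software the image has to provide.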

### Containerizing Ceto

This meant reading through `buildPipelineScripts.pl` and the documentation in order to:

- find module dependencies
- determine the order of execution of the shell scripts
- write a wrapper to simulate the scheduler
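
One simple way to stand in for a scheduler inside a container is to run the generated scripts one after another, stopping at the first failure. The sketch below illustrates that idea only; it is not the wrapper actually written for Ceto. The function name `run_in_order` and its parameters are hypothetical, and the caller is assumed to already know the execution order.

```python
import subprocess
import sys


def run_in_order(scripts, shell="bash", run=subprocess.run):
    """Run each script to completion before starting the next.

    Returns (completed, failed): the scripts that exited with status 0,
    and the first script that did not (or None if all succeeded).
    Later scripts are not started after a failure, much as a dependent
    job would not start after the job it depends on has failed.
    """
    completed = []
    for script in scripts:
        result = run([shell, str(script)])
        if result.returncode != 0:
            return completed, script
        completed.append(script)
    return completed, None


if __name__ == "__main__":
    # Script paths are passed on the command line, already in execution order.
    done, failed = run_in_order(sys.argv[1:])
    print(f"completed: {len(done)} script(s)")
    if failed is not None:
        sys.exit(f"stopped at {failed}")
```

This trades away parallelism for simplicity: nothing runs concurrently, so it only approximates what a real scheduler does with independent jobs.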