- BS or MS with 2 to 3 years of experience in computer science, software engineering, or data science.
- Strong programming skills in a variety of languages, such as Python (strongly preferred), R, and Perl, and familiarity with the command line interface (CLI).
- Expertise in bioinformatics and genomics, including knowledge of commonly used tools and pipelines for processing and analyzing genomic data.
- Experience with Nextflow or other workflow management systems (e.g., CWL, WDL), including building and deploying bioinformatics pipelines.
- Familiarity with common bioinformatics file formats, such as FASTQ, BAM, and VCF, and experience with data wrangling and manipulation.
- Experience with cloud infrastructure providers such as AWS, Google Cloud, or Microsoft Azure, and familiarity with deploying and managing bioinformatics pipelines in the cloud.
- Knowledge of DevOps practices, including continuous integration, continuous deployment, and automated testing, and experience with version control systems such as Git.
- Experience with software validation and quality control, and familiarity with regulatory and quality standards such as CLIA, CAP, and HIPAA.
- Excellent communication and collaboration skills, with the ability to work effectively in a remote or distributed team environment.
Responsibilities:
- Use Nextflow to build a bioinformatics pipeline that takes FASTQ files as input and processes them using bioinformatics tools.
- Integrate common bioinformatics tools such as BWA-MEM, GATK, or FreeBayes into the pipeline.
- Write Python/R scripts to process, summarize, and visualize outputs created by other tools.
- Ensure that the pipeline is modular and flexible, with the ability to add or remove tools as needed.
- Implement quality control checks, using tools such as FastQC, to verify that the input data meets the required quality standards.
- Implement data preprocessing steps such as trimming, filtering, and adapter removal to prepare the data for downstream analysis.
- Implement variant calling and annotation tools to identify and annotate variants in the data.
- Implement filtering and prioritization steps to identify clinically relevant variants and exclude non-pathogenic variants.
- Implement a reporting system to generate a clinic-ready report that summarizes the findings and provides actionable recommendations.
- Ensure that the pipeline is reproducible, with the ability to generate the same results from the same input data.
- Provide clear and concise documentation on how to use and manage the pipeline, including instructions on how to install and configure the necessary software and tools.