How replacing a CSV-join pipeline with Apache Iceberg and a long-format data model cut ETL runtime from ~15 minutes to a minute.
How I deployed MLflow as an authenticated experiment-tracking server on AWS and integrated it into a reusable ML toolkit.
How I built a CI/CD system for a Nextflow methylation sequencing pipeline: from pre-commit linting through four layers of testing to automated promotion across development, staging, and production, all backed by reusable GitHub Actions and container image promotion via ECR.
A Python package with a YAML-driven pipeline builder and a prediction CLI.
How parallelising hyperparameter tuning on SageMaker turned a single-instance grid search into a 100x faster training workflow.
Designing and building a Python CLI that lets data scientists create, manage, and safely shut down cloud research environments without needing to know Terraform or the AWS console.
How I built a maximal exact matching algorithm in C++ with two-bit encoding to discover novel mobile genetic elements from metagenomic sequencing data, and how it found applications from antimicrobial resistance surveillance to gene therapy manufacturing.