<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Home on Victoria Dyster, PhD</title><link>http://victoriadyster.com/</link><description>Recent content in Home on Victoria Dyster, PhD</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 Victoria Dyster, PhD</copyright><lastBuildDate>Sun, 14 Dec 2025 00:00:00 +0000</lastBuildDate><atom:link href="http://victoriadyster.com/index.xml" rel="self" type="application/rss+xml"/><item><title>From CSVs to Iceberg: Scaling a Genomics ETL Pipeline for ML Training on a budget</title><link>http://victoriadyster.com/projects/csv-to-iceberg-methylation-analytics/</link><pubDate>Sun, 14 Dec 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/projects/csv-to-iceberg-methylation-analytics/</guid><description>How replacing a CSV-join pipeline with Apache Iceberg and a long-format data model cut an ETL pipeline from ~15 minutes to a minute</description></item><item><title>Building a Private MLOps Platform on AWS</title><link>http://victoriadyster.com/projects/private-mlops-platform-aws/</link><pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/projects/private-mlops-platform-aws/</guid><description>How I deployed MLflow as an authenticated experiment tracking server on AWS and integrated it into a reusable ML toolkit.</description></item><item><title>CI/CD for a Genomic Data Pipeline: Testing, Security, and Multi-Environment Deployment</title><link>http://victoriadyster.com/projects/ci-cd-genomic-data-pipeline/</link><pubDate>Mon, 23 Jun 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/projects/ci-cd-genomic-data-pipeline/</guid><description>How I built a CI/CD system for a Nextflow methylation sequencing pipeline: from pre-commit linting through four layers of testing to automated promotion across development, staging, and production, all backed by reusable GitHub 
Actions and container image promotion via ECR.</description></item><item><title>Using Learning Curves to Know Whether More Data Will Help</title><link>http://victoriadyster.com/blog/learning-curves-sample-size/</link><pubDate>Mon, 10 Mar 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/blog/learning-curves-sample-size/</guid><description>Learning curves won&amp;rsquo;t tell you exactly how many samples to collect, but they will tell you whether collecting more is worth it at all. In domains where each sample costs real money, that&amp;rsquo;s the question that actually matters.</description></item><item><title>Building a Reusable ML Toolkit for Genomic Models</title><link>http://victoriadyster.com/projects/ml-toolkit/</link><pubDate>Sat, 01 Mar 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/projects/ml-toolkit/</guid><description>A Python package with a YAML-driven pipeline builder and a prediction CLI.</description></item><item><title>When Random Features Work Just as Well</title><link>http://victoriadyster.com/blog/when-random-features-work-just-as-well/</link><pubDate>Mon, 20 Jan 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/blog/when-random-features-work-just-as-well/</guid><description>On the counterintuitive finding that randomly selecting features from high-dimensional genomic data often matches the performance of careful feature engineering and why that makes mathematical sense.</description></item><item><title>Scaling ML Training for Epigenetic Age Prediction</title><link>http://victoriadyster.com/projects/epigenetic-age-prediction/</link><pubDate>Fri, 15 Nov 2024 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/projects/epigenetic-age-prediction/</guid><description>How parallelising hyperparameter tuning on SageMaker turned a single-instance grid search into a 100x faster training workflow.</description></item><item><title>Building a Self-Service Analysis Environment for Data 
Scientists</title><link>http://victoriadyster.com/projects/self-service-ec2-platform/</link><pubDate>Fri, 15 Mar 2024 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/projects/self-service-ec2-platform/</guid><description>Designing and building a Python CLI that lets data scientists create, manage, and safely shut down cloud research environments without needing to know Terraform or the AWS console.</description></item><item><title>Palidis: A C++ Algorithm for Discovering Insertion Sequences in Metagenomic Data</title><link>http://victoriadyster.com/projects/palidis-insertion-sequence-discovery/</link><pubDate>Tue, 10 Oct 2023 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/projects/palidis-insertion-sequence-discovery/</guid><description>How I built a maximal exact matching algorithm in C++ with two-bit encoding to discover novel mobile genetic elements from metagenomic sequencing data and how it found applications from antimicrobial resistance surveillance to gene therapy manufacturing.</description></item><item><title>Glorified Excel: The Dashboard Feature Request You Should Push Back On</title><link>http://victoriadyster.com/blog/glorified-excel/</link><pubDate>Fri, 01 Sep 2023 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/blog/glorified-excel/</guid><description>In biotech, the most common dashboard feature request is a filterable, sortable table. 
The fastest solution is usually a CSV download and the spreadsheet software people already know.</description></item><item><title>FormulAI: Using LangChain to Generate Skincare Formulations</title><link>http://victoriadyster.com/blog/formulai-langchain-formulations/</link><pubDate>Sun, 11 Jun 2023 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/blog/formulai-langchain-formulations/</guid><description>A hackathon project that chains LLM calls with a product ingredient database and Wikipedia to generate skincare formulations — ingredients, assembly protocols, and allergen warnings.</description></item><item><title>How to Impress Someone with a Bioinformatics Pipeline</title><link>http://victoriadyster.com/blog/how-to-impress-someone-with-a-bioinformatics-pipeline/</link><pubDate>Sun, 02 Apr 2023 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/blog/how-to-impress-someone-with-a-bioinformatics-pipeline/</guid><description>What makes a good bioinformatics pipeline? A short, non-technical take on the things that matter — from user requirements to reproducibility to knowing when good enough is good enough.</description></item></channel></rss>