Genomics on Victoria Dyster, PhD

http://victoriadyster.com/tags/genomics/
Recent content in Genomics on Victoria Dyster, PhD
© 2026 Victoria Dyster, PhD
Last updated: Sun, 14 Dec 2025 00:00:00 +0000

From CSVs to Iceberg: Scaling a Genomics ETL Pipeline for ML Training on a Budget
http://victoriadyster.com/projects/csv-to-iceberg-methylation-analytics/
Sun, 14 Dec 2025 00:00:00 +0000
How replacing a CSV-join pipeline with Apache Iceberg and a long-format data model cut an ETL pipeline's runtime from ~15 minutes to a minute.

Using Learning Curves to Know Whether More Data Will Help
http://victoriadyster.com/blog/learning-curves-sample-size/
Mon, 10 Mar 2025 00:00:00 +0000
Learning curves won’t tell you exactly how many samples to collect, but they will tell you whether collecting more is worth it at all. In domains where each sample costs real money, that’s the question that actually matters.

When Random Features Work Just as Well
http://victoriadyster.com/blog/when-random-features-work-just-as-well/
Mon, 20 Jan 2025 00:00:00 +0000
On the counterintuitive finding that randomly selecting features from high-dimensional genomic data often matches the performance of careful feature engineering, and why that makes mathematical sense.