<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Genomics on Victoria Dyster, PhD</title><link>http://victoriadyster.com/tags/genomics/</link><description>Recent content in Genomics on Victoria Dyster, PhD</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 Victoria Dyster, PhD</copyright><lastBuildDate>Sun, 14 Dec 2025 00:00:00 +0000</lastBuildDate><atom:link href="http://victoriadyster.com/tags/genomics/index.xml" rel="self" type="application/rss+xml"/><item><title>From CSVs to Iceberg: Scaling a Genomics ETL Pipeline for ML Training on a Budget</title><link>http://victoriadyster.com/projects/csv-to-iceberg-methylation-analytics/</link><pubDate>Sun, 14 Dec 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/projects/csv-to-iceberg-methylation-analytics/</guid><description>How replacing a CSV-join pipeline with Apache Iceberg and a long-format data model cut ETL runtime from ~15 minutes to a minute.</description></item><item><title>Using Learning Curves to Know Whether More Data Will Help</title><link>http://victoriadyster.com/blog/learning-curves-sample-size/</link><pubDate>Mon, 10 Mar 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/blog/learning-curves-sample-size/</guid><description>Learning curves won&amp;rsquo;t tell you exactly how many samples to collect, but they will tell you whether collecting more is worth it at all.
In domains where each sample costs real money, that&amp;rsquo;s the question that actually matters.</description></item><item><title>When Random Features Work Just as Well</title><link>http://victoriadyster.com/blog/when-random-features-work-just-as-well/</link><pubDate>Mon, 20 Jan 2025 00:00:00 +0000</pubDate><guid>http://victoriadyster.com/blog/when-random-features-work-just-as-well/</guid><description>On the counterintuitive finding that randomly selecting features from high-dimensional genomic data often matches the performance of careful feature engineering, and why that makes mathematical sense.</description></item></channel></rss>