From CSVs to Iceberg: Scaling a Genomics ETL Pipeline for ML Training on a Budget

14 December 2025 · 7 mins

Data-Engineering, Apache-Iceberg, Aws-Athena, Parquet, Genomics

How replacing a CSV-join pipeline with Apache Iceberg and a long-format data model cut an ETL pipeline from ~15 minutes to a minute.