How I deployed MLflow as a authenticated experiment tracking server on AWS and integrated it into a reusable ML toolkit.
Learning curves won’t tell you exactly how many samples to collect, but they will tell you whether collecting more is worth it at all. In domains where each sample costs real money, that’s the question that actually matters.
A Python package with a YAML-driven pipeline builder and a prediction CLI.
On the counterintuitive finding that randomly selecting features from high-dimensional genomic data often matches the performance of careful feature engineering and why that makes mathematical sense.
How parallelising hyperparameter tuning on SageMaker turned a single-instance grid search into a 100x faster training workflow.