Blog

Thoughts, learnings, and notes on ML engineering, data infrastructure, and computational biology.

Using Learning Curves to Know Whether More Data Will Help

10 March 2025·4 mins

Machine-Learning Sample-Size Genomics Methodology

Learning curves won’t tell you exactly how many samples to collect, but they will tell you whether collecting more is worth it at all. In domains where each sample costs real money, that’s the question that actually matters.

When Random Features Work Just as Well

20 January 2025·3 mins

Machine-Learning Feature-Selection Genomics Dimensionality

On the counterintuitive finding that randomly selecting features from high-dimensional genomic data often matches the performance of careful feature engineering, and why that makes mathematical sense.

Glorified Excel: The Dashboard Feature Request You Should Push Back On

1 September 2023·2 mins

Software-Engineering Biotech Dashboards Product

In biotech, the most common dashboard feature request is a filterable, sortable table. The fastest solution is usually a CSV download and the spreadsheet software people already know.

FormulAI: Using LangChain to Generate Skincare Formulations

11 June 2023·2 mins

LLM LangChain Python Streamlit Hackathon

A hackathon project that chains LLM calls with a product ingredient database and Wikipedia to generate skincare formulations — ingredients, assembly protocols, and allergen warnings.

How to Impress Someone with a Bioinformatics Pipeline

2 April 2023·4 mins

Bioinformatics Pipelines Software-Engineering Reproducibility

What makes a good bioinformatics pipeline? A short, non-technical take on the things that matter — from user requirements to reproducibility to knowing when good enough is good enough.
