Engineering Journal

How we built this

What worked, what didn’t, what it cost. Full transparency — because genomics is too important to build behind closed doors.

This research is open source

All tools, validation scripts, and pipeline code are freely available.

View on GitHub
Entry 1 · March 2026

Building Accurate Polygenic Risk Scores from Consumer DNA Data

TL;DR: We scored 3,550 disease risk models from a consumer DNA file. We spent weeks trying to make scores “better” — Bayesian weight recomputation, Ridge regression corrections, GPU-accelerated validation. Every improvement made things worse. The scores were already good. We just needed to be honest about which ones to trust.

3,550+
PGS Models
2.37B
Variant Weights
4,257
Validation Samples
28M
Imputed Variants
$400–800
Total Cost
Phase 1 — The Naive Version

Start Simple, See What Breaks

The PGS Catalog publishes 3,550+ peer-reviewed polygenic score models. Each one is a list of genetic variants with effect weights. The math is straightforward: look up a user’s genotype at each variant, multiply by the weight, sum it up, compare against a reference population.
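The arithmetic in that paragraph fits in a few lines. A minimal sketch, where the rsids, weights, and reference scores are made-up toy values rather than any real PGS Catalog model:

```python
import numpy as np

def prs_score(genotypes, weights):
    """Sum of allele dosage * effect weight over variants present in
    both the user's genotype calls and the model's weight list."""
    return sum(genotypes[rsid] * w for rsid, w in weights.items()
               if rsid in genotypes)

def percentile_vs_reference(score, reference_scores):
    """Percent of reference-population scores at or below this score."""
    return float(np.mean(reference_scores <= score) * 100)

# Toy model: two variants with illustrative effect weights.
weights = {"rs123": 0.30, "rs456": -0.10}
user = {"rs123": 2, "rs456": 1}               # allele dosages 0/1/2
ref = np.array([0.1, 0.2, 0.3, 0.4, 0.5])     # stand-in for 2,504 reference scores

score = prs_score(user, weights)              # 2*0.30 + 1*(-0.10) = 0.5
pct = percentile_vs_reference(score, ref)
```

In production the reference array is one distribution per ancestry population, which is exactly where the naive version below went wrong.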

Our first implementation scored every model and produced percentiles using 1000 Genomes Phase 3 as the reference (2,504 samples, 5 ancestry populations). It ran in under 2 minutes.

Three Compounding Errors

  • No ancestry matching. A single EUR reference distribution for all users.
  • Allele alignment errors. Some models use alternate allele as effect, others use reference. Variants scored backwards.
  • Strand-ambiguous inflation. A/T and C/G SNP pairs complement-flipped instead of directly matched. Height PRS had a z-score of 14+.
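The second and third errors come down to one matching routine. This is an illustrative sketch, not the project's actual code; real pipelines often rescue ambiguous A/T and C/G SNPs using allele frequencies instead of dropping them outright:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_strand_ambiguous(a1, a2):
    """A/T and C/G pairs look identical after a strand flip, so
    complementing alone can't recover the effect direction."""
    return COMPLEMENT[a1] == a2

def align_effect_allele(model_effect, model_other, geno_ref, geno_alt):
    """+1 if the model's effect allele matches the genotype ALT,
    -1 if the weight must be sign-flipped (effect allele is REF),
    None if ambiguous or mismatched (drop the variant)."""
    if is_strand_ambiguous(model_effect, model_other):
        return None
    if (model_effect, model_other) == (geno_alt, geno_ref):
        return +1
    if (model_effect, model_other) == (geno_ref, geno_alt):
        return -1
    # Try the complement strand before giving up.
    ce, co = COMPLEMENT[model_effect], COMPLEMENT[model_other]
    if (ce, co) == (geno_alt, geno_ref):
        return +1
    if (ce, co) == (geno_ref, geno_alt):
        return -1
    return None

# A/T SNP: ambiguous, cannot be complement-matched safely.
assert align_effect_allele("A", "T", "A", "T") is None
# Effect allele matches ALT after a strand flip (G/A -> C/T).
assert align_effect_allele("G", "A", "T", "C") == 1
```

Skipping the `is_strand_ambiguous` check and complement-flipping anyway is precisely how a height PRS ends up at z = 14+.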
Phase 2 — Scale Up

174GB Database, GPU-Accelerated Scoring

We built a 174GB SQLite database containing all 2.375 billion variant weights across 3,550 models. We constructed a 6.2GB sparse weight matrix for GPU-accelerated batch scoring.
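A sketch of what such a weight table can look like. The schema and column names here are hypothetical stand-ins for the project's actual layout, shown in-memory rather than as the 174GB on-disk file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real DB was a file on disk
conn.executescript("""
CREATE TABLE weights (
    pgs_id        TEXT NOT NULL,   -- e.g. 'PGS000001'
    rsid          TEXT NOT NULL,
    effect_allele TEXT NOT NULL,
    other_allele  TEXT,
    weight        REAL NOT NULL
);
-- Scoring pulls all weights for one model at a time, so index by model.
CREATE INDEX idx_weights_model ON weights (pgs_id);
""")
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?, ?, ?)",
    [("PGS000001", "rs123", "A", "G", 0.30),
     ("PGS000001", "rs456", "C", "T", -0.10)],
)
rows = conn.execute(
    "SELECT rsid, effect_allele, weight FROM weights "
    "WHERE pgs_id = ? ORDER BY rsid",
    ("PGS000001",),
).fetchall()
```

At 2.375 billion rows the per-model index is what keeps a lookup from degenerating into a full-table scan.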

On a Vast.ai build server (256 vCPUs, 503GB RAM, RTX 3060 Ti), we scored all 4,257 QC’d OpenSNP genomes in 7.4 minutes using chunked sparse CSR tensor multiplication via PyTorch.
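The chunked CSR multiplication is easy to show at toy scale. scipy stands in here for the PyTorch sparse tensors used in production; the equivalence check at the end is the point: chunking over samples changes nothing numerically, it only bounds peak memory.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy scale: 4 models x 6 variants, scored against 3 genomes in chunks.
rng = np.random.default_rng(0)
dense = rng.normal(size=(4, 6)) * (rng.random((4, 6)) < 0.3)  # mostly zeros
W = csr_matrix(dense)                    # models x variants, sparse CSR

genomes = rng.integers(0, 3, size=(6, 3)).astype(np.float64)  # dosages 0/1/2

chunk = 2
scores = np.zeros((4, 3))
for start in range(0, genomes.shape[1], chunk):
    # One sparse-dense matmul per chunk of samples.
    scores[:, start:start + chunk] = W @ genomes[:, start:start + chunk]

assert np.allclose(scores, dense @ genomes)  # chunking is exact
```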

Validation Results

Trait        Metric      Ours    UK Biobank
Height       Pearson r   0.107   r ≈ 0.45–0.50
Red hair     AUC         0.67    –
Black hair   AUC         0.63    –
Eye colour   AUC         0.54    ~0.95
Phase 3 — PRS-CSx Experiment (Failed)

Bayesian Weight Recomputation

We ran PRS-CSx on 33 quantitative traits using UK Biobank GWAS summary statistics. After 2–3 weeks of continuous GPU compute: the posterior weights did not improve our validation metrics. In some cases, they made things worse.

Phase 4 — PGP Batch Imputation (Failed)

943 Genomes, 95% Failure Rate

Of the 569 PGP genomes we tried to impute, 538 failed: heterogeneous chip formats, Beagle memory exhaustion, bcftools timeouts. Only 84 survived.

Phase 5 — Correction Model Trap (Failed)

Ridge Regression Pulled Everything to the Mean

We trained Ridge regression correction models. Every condition showed as 40th–60th percentile. Nothing stood out. Nothing was actionable.

Extreme scores are not bugs. They’re the whole point. A 99th percentile Type 1 Diabetes PRS from a model validated at AUC > 0.80 is genuinely meaningful. The problem was never that scores were “too extreme.” It was that we presented low-confidence and high-confidence scores identically.
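The trap is easy to reproduce with closed-form ridge on a small synthetic cohort: as the penalty grows relative to the data, predictions collapse toward the mean. Everything below is synthetic and illustrative, not our actual correction model:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 200                            # tiny cohort, many features
X = rng.normal(size=(n, p))
y = X[:, 0] * 2.0 + rng.normal(size=n)    # one truly informative feature

def ridge_fit_predict(X, y, lam):
    # Closed-form ridge on centered targets: w = (X'X + lam*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                        X.T @ (y - y.mean()))
    return X @ w + y.mean()

spread_small = np.std(ridge_fit_predict(X, y, lam=1.0))
spread_big = np.std(ridge_fit_predict(X, y, lam=1e4))
assert spread_big < spread_small  # heavier shrinkage hugs the mean
```

With a validation cohort this small, the cross-validated penalty lands high, and every prediction ends up in the 40th–60th percentile band.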

The Bill

What It Cost

Item           Detail                     Cost
GPU compute    Vast.ai instances          $200–400
Claude API     6 parallel domain agents   $100–300
VPS hosting    Production server          $20/mo
Domain + SSL   helixsequencing.com        $15/yr
Total                                     $400–800
Current State

Where We Are Now

  • 3,550 PGS Catalog models with proper allele alignment and ancestry-matched distributions
  • Beagle 5.5 imputation expanding ~700K chip SNPs to ~28M variants
  • Ancestry detection with per-population model selection
  • Raw percentiles preserved — extreme scores kept when model is well-powered
  • Zero data retention — all user data deleted after report generation

Lessons Learned

More data doesn’t automatically mean better scores. Adding PRS-CSx weights, Ridge regression corrections, and ensemble methods actively degraded accuracy when our validation cohort was small.

Extreme percentiles are features, not bugs. The mistake is presenting all models with equal confidence.

The validation bottleneck is the real constraint. We have 3,550 models and 2.37 billion variant weights. What we lack is ground truth.

Go slowly. Validate each improvement individually. If it doesn’t measurably improve predictions, it doesn’t ship.

Entry 2 · March 2026

Pipeline V2: Haiku Collectors + Opus Narrators

TL;DR: We redesigned the agent pipeline from a single-phase system into a two-phase architecture: fast Haiku models collect and rank genetic data, then Opus models write detailed clinical narratives from pre-analyzed briefings. Result: 7x more variants analyzed (3,500+ vs 500), 5x longer narratives, and better cross-domain signal detection.

3,587
Variants Analyzed
7
Haiku Collectors
5
Opus Narrators

The original pipeline ran 6-8 agents, each making multiple MCP tool calls to query genetic data. This meant expensive models spent tokens on data retrieval instead of analysis. Worse, each agent only saw ~500 pre-filtered variants — a static cap that missed convergence signals.

Phase 1: Haiku Collectors. Six specialized collectors (cardiovascular, cancer/immune, neuro, metabolic, pharmacogenomics, traits) receive the full enriched variant set — every ClinVar pathogenic, every SNPedia annotation, no cap. They rank findings by clinical significance, identify pathway convergences, and flag cross-domain signals. A seventh “synthesizer” agent reads all collector outputs and finds what they missed individually.

Phase 2: Opus Narrators. Five Opus agents receive only the condensed collector briefings — no MCP tools, no data retrieval. Clean context, focused entirely on writing detailed, personalized health narratives. Each section went from ~3,000 characters to 10,000-17,000 characters.

Phase 3: Finalization. Health index scoring, supplement protocols, and category summaries run on Sonnet with full domain context from both phases.
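The three-phase flow can be sketched as plain functions. All names here (collect_domain, synthesize, write_narrative), the toy significance ranking, and the variant records are illustrative stand-ins for the actual model calls:

```python
DOMAINS = ["cardiovascular", "cancer_immune", "neuro",
           "metabolic", "pharmacogenomics", "traits"]

def collect_domain(domain, variants):
    # Phase 1 (cheap model): rank the full uncapped variant set.
    ranked = sorted(variants, key=lambda v: v["significance"], reverse=True)
    return {"domain": domain, "top": ranked[:3]}

def synthesize(briefings):
    # Seventh collector: flag variants surfacing in more than one domain.
    seen = {}
    for b in briefings:
        for v in b["top"]:
            seen.setdefault(v["rsid"], []).append(b["domain"])
    return {rsid: doms for rsid, doms in seen.items() if len(doms) > 1}

def write_narrative(briefing):
    # Phase 2 (expensive model): briefing only, no tool calls.
    return f"[{briefing['domain']}] narrative over {len(briefing['top'])} findings"

variants = [{"rsid": f"rs{i}", "significance": i % 5} for i in range(10)]
briefings = [collect_domain(d, variants) for d in DOMAINS]
cross = synthesize(briefings)
narratives = [write_narrative(b) for b in briefings]
```

The structural win is visible even in the sketch: the expensive phase never touches raw variant data, only condensed briefings.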

Key Improvements

  • 7x more variants — raised cap from 500 to uncapped (typically 3,000-4,000 per genome)
  • 5x richer narratives — Opus narrators produce 10-17K chars per section vs ~3K before
  • Cross-domain synthesis — dedicated synthesizer catches multi-pathway convergences individual collectors miss
  • Imputation source tagging — every variant now labeled “chip” or “imputed” with GP confidence score
  • Cost neutral — Haiku collectors are cheap; Opus narrators get cleaner context so they’re more efficient
Entry 3 · March 2026

iOS App, SEO Foundation, and Going Public

TL;DR: Helix Sequencing now has a native iOS app (upload DNA, watch the pipeline, read your report — all in-app), 7 SEO articles targeting high-intent search terms, proper Google indexing, and a TikTok presence. The site was accidentally telling Google not to index it. Fixed.

iOS App (Expo/React Native). Upload a zip file from your phone, watch all 22 chromosomes impute in real-time, see Haiku collectors and Opus narrators progress with color-coded phases, then read your full report natively — Overview with Health Index, PRS scores with category filtering, variant explorer, and complete protocol with supplements, diet, and monitoring schedule. No Safari redirect.

SEO: From Invisible to Indexed. Discovered the site had <meta name="robots" content="noindex"/> — Google was completely ignoring us. Fixed with proper robots meta, sitemap.xml, and robots.txt. Then wrote 7 long-form articles targeting high-intent keywords:
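The fix itself is one line of markup. A sketch of the before and after (the site's actual head markup may differ):

```html
<!-- Before: shipped by accident, telling every crawler to skip the site -->
<meta name="robots" content="noindex"/>

<!-- After: allow indexing (omitting the tag entirely also defaults to index) -->
<meta name="robots" content="index, follow"/>
```

Pairing this with a sitemap.xml reference in robots.txt gives crawlers both permission and a map.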

New Pricing Page. Converted from standalone HTML to Next.js — now shares the site’s nav, footer, and theme system. Two tiers: a one-time single report, or a living-report subscription with monthly re-analysis.

Pipeline Demo. Cinematic redesign of the pipeline visualization for social media content — floating background orbs, DNA base pair stream, enhanced neural network canvas with comet trail particles, glass morphism panels, and terminal-style activity log.

What’s next for April: App Store submission, couples DNA comparison as a flagship feature (the only platform that reads two genomes together), and a video-review campaign to bootstrap social proof.

helixsequencing.com · Privacy-first DNA analysis · Zero data retention · Open Research