Towards a future free of disease (& looking for collaborators)

7 points by ammon, about 2 months ago
Hello HN! Michael Poon and I are starting a bioinformatics company (Tabula) to improve complex disease diagnosis. We’re just getting started, and we’d love feedback (to harden our hypotheses), new ideas, and anyone interested in collaborating.

Many of the most devastating human diseases are heritable. Whether an individual develops schizophrenia, obesity, diabetes, or autism depends more on their genes (and epigenome) than it does on any other factor. However, current genetic models do not explain all of this heritability. Twin studies show that schizophrenia, for example, is 80% heritable (over a broad cohort of Americans), but our best genetic model only explains ~9% of variance in cases. I selected a dramatic example here (models for other diseases perform better). Still, the gap between heritability and prediction stands in the way of personalized genetic medicine. We are launching Tabula Bio to close this gap. We have a three-part thesis on how to approach this.

1. *The path forward is machine learning.* The human genome is staggeringly complex. In the 20 years since the Human Genome Project, much progress has been made, but we are still far short of a mechanistic, bottom-up model that would allow anything like disease prediction. Instead, we have to rely on statistical modeling. And statistical methods are winning over expert systems across domains. Expert-system chess AIs have fallen to less-opinionated ML, syntax-aware NLP models were left in the dust by LLMs, and more recently constraint-based robotics is being replaced by pixel-to-control machine learning. We are betting on the same trend extending to biology.

2. *The core problem is limited data and large genomes.* The human genome contains more than 3 billion base pairs, while labeled biobank datasets (genomes and disease diagnoses) number in the low hundreds of thousands. Complex models thus hopelessly overfit and fail to generalize. Additionally, the human genome is highly repetitive, and as much as 60% of it likely has no relation to phenotype. Because of these problems, we can’t currently train high-parameter black-box models (or treat DNA like language in a language model).

3. *Given 1 and 2, novel ML architectures will be required.* This is consistent with other breakthroughs in AI. Different problems require different inductive biases. An ML architecture for disease prediction should:

a) Make use of unlabeled genomics data (human and non-human) as well as homogeneous biobank data. Most genomes available lack labels. And much genetic coding is conserved across species. Ignoring this is leaving data on the table.

b) Include priors from human expert research. The idea of throwing data at a complex model is appealing (to ML people). But, for example, we know that DNA is a 3D molecule and that the distance (in 3D space) between genes and regulatory sequences matters. There are approximately 1000 hard-learned research results like this. A genetics ML architecture needs to incorporate these priors, not rediscover them (most blank-slate ML efforts to date only succeed at rediscovering a portion of the research).

c) Include epigenetic data. We know it’s part of the story (maybe a large part).

We’re interested in probabilistic programming as a method to build such a model.

This is not going to be easy. But if we look into the future, to a world where humans have closed the heritability gap and personalized genetic medicine has eradicated great swaths of disease, it’s hard for me to imagine we did not get there via an effort like this.

Our team is currently me (Ammon Bartram), previously cofounder of Triplebyte, and Michael Poon, who has spent the past several years working on polygenic screening and studied CS at MIT. We’re honored to be backed by Michael Seibel and Emmett Shear.

Please reach out to us at team@tabulabio.com. We’re especially interested in who you think we should talk to.
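
To make point 2 concrete, here is a toy sketch of why many features and few labeled samples lead to overfitting. It is only an illustration, not Tabula's method: the function name `best_noise_feature`, the sample sizes, and the idea of modeling each variant as a random bit are all assumptions chosen for the demo. Every "variant" is pure noise, yet with enough of them to pick from, one will match a small training set well by chance and then fall back to coin-flip accuracy on new people.

```python
import random


def best_noise_feature(n_train=40, n_test=2000, n_features=20000, seed=0):
    """Pick the noise feature that best matches the training labels, then test it.

    All sizes are hypothetical and far smaller than a real genome or biobank.
    """
    rng = random.Random(seed)

    # Labels are coin flips packed into an integer bitmask (bit i = person i).
    y_train = rng.getrandbits(n_train)
    y_test = rng.getrandbits(n_test)

    best_train_acc = 0.0
    for _ in range(n_features):
        # A random "variant" that has no relation to the labels.
        feature = rng.getrandbits(n_train)
        mismatches = bin(feature ^ y_train).count("1")
        acc = (n_train - mismatches) / n_train
        best_train_acc = max(best_train_acc, acc)

    # The winning feature is still noise, so for new people it is just
    # fresh random bits, independent of their labels.
    feature_test = rng.getrandbits(n_test)
    test_mismatches = bin(feature_test ^ y_test).count("1")
    test_acc = (n_test - test_mismatches) / n_test
    return best_train_acc, test_acc


if __name__ == "__main__":
    train_acc, test_acc = best_noise_feature()
    print(f"best training accuracy: {train_acc:.2f}")
    print(f"held-out accuracy:      {test_acc:.2f}")
```

The gap between the two printed numbers is the overfitting problem in miniature: selecting among many candidate features on a small cohort rewards chance agreement, which is one reason the post argues for stronger priors rather than a bigger black-box model.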

1 comment

mpoon, about 2 months ago
There's a bit of discussion happening over here: https://www.lesswrong.com/posts/SsLkxCxmkbBudLHQr/tabula-bio-towards-a-future-free-of-disease-and-looking-for