About

967
Publications
132,136
Reads
86,179
Citations
Research Experience
Position
  • Harvard
Position
  • HMS
Position
  • HSPH
Education
September 1972 - August 1974
Duke University
Field of study
  • Chemistry & Zoology

Publications (967)
Article
Full-text available
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Preprint
Full-text available
Techniques that can both spatially map out molecular features and discriminate many targets would be highly valued for their utility in studying fundamental nanoscale processes. In spite of decades of development, no current technique can achieve both nanoscale resolution and discriminate hundreds of targets. Here, we report the development of a no...
Preprint
Full-text available
The promise of biotechnology is tempered by its potential for accidental or deliberate misuse. Reliably identifying telltale signatures characteristic to different genetic designers, termed genetic engineering attribution, would deter misuse, yet is still considered unsolved. Here, we show that recurrent neural networks trained on DNA motifs and ba...
Preprint
Neuron-derived extracellular vesicles (NDEVs) present a tremendous opportunity to learn about the biochemistry of brain cells in living patients. L1CAM is a transmembrane protein expressed in neurons that is presumed to be found on NDEVs in human biofluids. Previous studies have used L1CAM immuno-isolation from human plasma to isolate NDEVs for neu...
Article
The endangered whale shark (Rhincodon typus) is the largest fish on Earth and a long-lived member of the ancient Elasmobranchii clade. To characterize the relationship between genome features and biological traits, we sequenced and assembled the genome of the whale shark and compared its genomic and physiological features to those of 83 animals a...
Article
Full-text available
There is a need for methods that can image chromosomes with genome-wide coverage, as well as greater genomic and optical resolution. We introduce OligoFISSEQ, a suite of three methods that leverage fluorescence in situ sequencing (FISSEQ) of barcoded Oligopaint probes to enable the rapid visualization of many targeted genomic regions. Applying Olig...
Article
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Preprint
Full-text available
T-box riboswitches constitute a large family of tRNA-binding leader sequences that play a central role in gene regulation in many gram-positive bacteria. Accurate inference of the tRNA binding to T-boxes is critical to predict their cis-regulatory activity. However, there is no central repository of information on the tRNA binding specificities of...
Article
Exploiting bacteriophage-derived homologous recombination processes has enabled precise, multiplex editing of microbial genomes and the construction of billions of customized genetic variants in a single day. The techniques that enable this, multiplex automated genome engineering (MAGE) and directed evolution with random genomic mutations (DIvERGE)...
Preprint
Full-text available
The CRISPR RNA-guided endonucleases Cas9, and Cas9-derived adenine/cytosine base editors (ABE/CBE), have been used in both research and therapeutic applications. However, broader use of this gene editing toolbox is hampered by the great variability of efficiency among different target sites. Here we present TRAP-seq, a versatile and scalable approa...
Preprint
Methods for highly multiplexed RNA imaging are limited in spatial resolution, and thus in their ability to localize transcripts to nanoscale and subcellular compartments. We adapt expansion microscopy, which physically expands biological specimens, for long-read untargeted and targeted in situ RNA sequencing. We applied untargeted expansion sequenc...
Article
Full-text available
We present the initial phase of the Korean Genome Project (Korea1K), including 1094 whole genomes (sequenced at an average depth of 31×), along with data of 79 quantitative clinical traits. We identified 39 million single-nucleotide variants and indels of which half were singleton or doubleton and detected Korean-specific patterns based on several...
Article
Fluorescent spatial sequencing brings next-generation sequencing into a new realm capable of identifying nucleic acids in the cell's natural environment. For the first time, scientists are able to multiplex the assignment of specific locations to hundreds of transcriptional targets and lay the foundation for understanding how genetic changes contro...
Preprint
Full-text available
Biological tissues contain thousands of different proteins yet conventional antibody staining can only assay a few at a time because of the limited number of spectrally distinct fluorescent labels. The capacity to map the location of hundreds or thousands of proteins within a single sample would allow for an unprecedented investigation of the spati...
Article
Full-text available
To extend the frontier of genome editing and enable editing of repetitive elements of mammalian genomes, we made use of a set of dead-Cas9 base editor (dBE) variants that allow editing at tens of thousands of loci per cell by overcoming the cell death associated with DNA double-strand breaks and single-strand breaks. We used a set of gRNAs targetin...
Preprint
Full-text available
Bacterial genome editing methods are used to engineer strains for biotechnology and fundamental research. Homologous recombination (HR) is the most versatile method of genome editing, but traditional techniques using endogenous RecA-mediated pathways are inefficient and laborious. Phage encoded RecT proteins can improve HR over 1000-fold, but these...
Article
The extreme shortage of human donor organs for treatment of patients with end-stage organ failures is well known. Xenotransplantation, which might provide unlimited organ supply, is a most promising strategy to solve this problem. Domestic pigs are regarded as ideal organ-source animals owing to similarity in anatomy, physiology and organ size to h...
Article
Due to the rapid emergence of antibiotic-resistant bacteria, there is a growing need to discover new antibiotics. To address this challenge, we trained a deep neural network capable of predicting molecules with antibacterial activity. We performed predictions on multiple chemical libraries and discovered a molecule from the Drug Repurposing Hub—hal...
Preprint
Full-text available
DNA polymerases have revolutionized the biotechnology field due to their ability to precisely replicate stored genetic information. Screening variants of these enzymes for unique properties gives the opportunity to identify polymerases with novel features. We have previously developed a single-molecule DNA sequencing platform by coupling a DNA poly...
Preprint
Full-text available
Tremendous genetic variation exists in nature, but our ability to create and characterize individual genetic variants remains far more limited in scale. Likewise, engineering proteins and phenotypes requires the introduction of synthetic variants, but design of variants outpaces experimental measurement of variant effect. Here, we optimize efficien...
Preprint
Full-text available
Segmental duplications are important for understanding human diseases and evolution. The challenge to distinguish allelic and duplication sequences has hindered their phased assembly as well as characterization of structural variant calls. Here we have developed a novel graph-based approach that leverages single nucleotide differences in overlappin...
Preprint
Full-text available
New storage technologies are needed to keep up with the global demands of data generation. DNA is an ideal storage medium due to its stability, information density and ease of readout with advanced sequencing techniques. However, progress in writing DNA is stifled by the continued reliance on chemical synthesis methods. The enzymatic synthesis of D...
Preprint
Full-text available
We have exploited the repetitive nature of transposable elements of the human genome to generate synthetic circuits. Transposable elements such as LINE-1 and Alu have successfully replicated in mammalian genomes throughout evolution to reach a copy number ranging from thousands to more than a million. Targeting these repetitive elements with progra...
Preprint
Full-text available
Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are consistent with the design goal and sufficiently high-throughput to find rare, enhanced variants. Here we introduce a machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences...
Preprint
Full-text available
Exploiting bacteriophage-derived homologous recombination processes has enabled precise, multiplex editing of microbial genomes and the construction of billions of customized genetic variants in a single day. The techniques that enable this, Multiplex Automated Genome Engineering (MAGE) and directed evolution with random genomic mutations (DIvERGE)...
Preprint
Over the past decade, studies of the human genome and microbiome have deepened our understanding of the connections between human genes, environments, microbes, and disease. For example, the sheer number of indicators of the microbiome and human genetic common variants associated with disease has been immense, but clinical utility has been elusive....
Article
Motivation: Reconstructing high-quality haplotype-resolved assemblies for related individuals has important applications in Mendelian diseases and population genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP), and the Genome in a Bottle project (GIAB), a variety of sequencin...
Preprint
Xenotransplantation, specifically the use of porcine organs for human transplantation, has long been sought after as an alternative for patients suffering from organ failure. However, clinical application of this approach has been impeded by two main hurdles: 1) risk of transmission of porcine endogenous retroviruses (PERVs) and 2) molecular incomp...
Article
Full-text available
DNA is an emerging medium for digital data and its adoption can be accelerated by synthesis processes specialized for storage applications. Here, we describe a de novo enzymatic synthesis strategy designed for data storage which harnesses the template-independent polymerase terminal deoxynucleotidyl transferase (TdT) in kinetically controlled condi...
Article
Full-text available
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models...
Article
Adeno-associated virus (AAV) capsids can deliver transformative gene therapies, but our understanding of AAV biology remains incomplete. We generated the complete first-order AAV2 capsid fitness landscape, characterizing all single-codon substitutions, insertions, and deletions across multiple functions relevant for in vivo delivery. We discovered...
Article
Full-text available
There is increasing demand for single-stranded DNA (ssDNA) of lengths >200 nucleotides (nt) in synthetic biology, biological imaging and bionanotechnology. Existing methods to produce high-purity long ssDNA face limitations in scalability, complexity of protocol steps and/or yield. We present a rapid, high-yielding and user-friendly method for in v...
Preprint
Full-text available
Motivation: DNA has been reported as a promising medium of data storage for its remarkable durability and space-efficient storage capacity. Here, we propose a robust DNA-based data storage method based on a new codec algorithm, namely 'Yin-Yang'. Results: Using this strategy, we successfully stored different formats of files in one synthetic DNA ol...
Article
Full-text available
Comorbidity is common as age increases, and currently prescribed treatments often ignore the interconnectedness of the involved age-related diseases. The presence of any one such disease usually increases the risk of having others, and new approaches will be more effective at increasing an individual’s health span by taking this systems-level view...
Preprint
Full-text available
Haplotype-resolved or phased sequence assembly provides a complete picture of genomes and complex genetic variations. However, current phased assembly algorithms either fail to generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method that leverages long accurate reads and long-range conf...
Preprint
Full-text available
Multi-cellular organisms originate from a single cell, ultimately giving rise to mature organisms of heterogeneous cell type composition in complex structures. Recent work in the areas of stem cell biology and tissue engineering has laid major groundwork in the ability to convert certain types of cells into other types, but there has been limited...
Preprint
Full-text available
The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal-genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a novel approach that combines crypto...
Article
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Article
Full-text available
The mechanisms that extend lifespan in humans are poorly understood. Here we show that extended longevity in humans is associated with a distinct transcriptome signature in the cerebral cortex that is characterized by downregulation of genes related to neural excitation and synaptic function. In Caenorhabditis elegans, neural excitation increases w...
Article
Spatial mapping of proteins in tissues is hindered by limitations in multiplexing, sensitivity and throughput. Here we report immunostaining with signal amplification by exchange reaction (Immuno-SABER), which achieves highly multiplexed signal amplification via DNA-barcoded antibodies and orthogonal DNA concatemers generated by primer exchange rea...
Preprint
Full-text available
Ageing is a degenerative process leading to tissue dysfunction and death. A proposed cause of ageing is the accumulation of epigenetic noise, which disrupts youthful gene expression patterns that are required for cells to function optimally and recover from damage. Changes to DNA methylation patterns over time form the basis of an 'ageing clock', b...
Article
Full-text available
The fast-growing Gram-negative bacterium Vibrio natriegens is an attractive microbial system for molecular biology and biotechnology due to its remarkably short generation time [1,2] and metabolic prowess [3,4]. However, efforts to uncover and utilize the mechanisms underlying its rapid growth are hampered by the scarcity of functional genomic data. Here...
Article
Full-text available
The human gut microbiome is linked to many states of human health and disease [1]. The metabolic repertoire of the gut microbiome is vast, but the health implications of these bacterial pathways are poorly understood. In this study, we identify a link between members of the genus Veillonella and exercise performance. We observed an increase in Veillon...
Article
Clustered regularly interspaced short palindromic repeat (CRISPR)/associated protein (CRISPR/Cas) system is an adaptable immune mechanism used by many bacteria to protect themselves from invading nucleic acids, and it has been recently exploited as an efficient tool for site-specific, programmable genome editing in both single cells and whole organ...
Article
Recombinant adeno-associated virus (rAAV)-mediated gene delivery can efficiently target muscle tissues to serve as "biofactories" for secreted proteins in prophylactic and therapeutic scenarios. Nevertheless, efficient rAAV-mediated gene delivery is often limited by host immune responses against the transgene product. The development of strategies...
Preprint
Full-text available
Recording biological signals can be difficult in three-dimensional matrices, such as tissue. We present a DNA polymerase-based strategy that records temporal biosignals locally into DNA to be read out later, which could obviate the need to extract information from tissue on the fly. We use a template-independent DNA polymerase, terminal deoxynucleo...
Article
If they are able to spread in wild populations, CRISPR-based gene-drive elements would provide new ways to address ecological problems by altering the traits of wild organisms, but the potential for uncontrolled spread tremendously complicates ethical development and use. Here, we detail a self-exhausting form of CRISPR-based drive system comprisin...
Article
Full-text available
In this Review, the year of publication of reference 54 should be 2005, not 2015. In Box 2, “1982: GenBank (https://www.ncbi.nlm.nih.gov/genbank/statistics/)” should read “1982: Genbank/ENA/DDBJ” and “2007: NCBI Short Read Archive” should read “2007: NCBI and ENA Short Read Archives”; this is because the launches of these American, European and Jap...
Preprint
Full-text available
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabelled amino acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily, and biophysically grounded. We show that the simplest model...
Preprint
Motivation: Reconstructing high-quality haplotype-resolved assemblies for related individuals of various species has important applications in understanding Mendelian diseases along with evolutionary and comparative genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP), the Eart...
Preprint
Full-text available
To extend the frontier of genome editing and enable the radical redesign of mammalian genomes, we developed a set of dead-Cas9 base editor (dBE) variants that allow editing at tens of thousands of loci per cell by overcoming the cell death associated with DNA double-strand breaks (DSBs) and single-strand breaks (SSBs). We used a set of gRNAs targe...
Article
Full-text available
The marine bacterium Vibrio natriegens has garnered considerable attention as an emerging microbial host for biotechnology due to its fast growth rate. A general protocol is described for the preparation of V. natriegens crude cell extracts using common laboratory equipment. This high yielding protocol has been specifically optimized for user acces...
Article
We present ampliCan, an analysis tool for genome editing that unites highly precise quantification and visualization of genuine genome editing events. ampliCan features nuclease-optimized alignments, filtering of experimental artifacts, event-specific normalization, and off-target read detection and quantifies insertions, deletions, HDR repair, as...
Article
Full-text available
Blockchain is a shared distributed digital ledger technology that can better facilitate data management, provenance and security, and has the potential to transform healthcare. Importantly, blockchain represents a data architecture, whose application goes far beyond Bitcoin – the cryptocurrency that relies on blockchain and has popularized the tech...
Article
Full-text available
Bacteriophage λ encodes a DNA recombination system that includes a 5'-3' exonuclease (λ Exo) and a single strand annealing protein (Redβ). The two proteins form a complex that is thought to mediate loading of Redβ directly onto the single-stranded 3'-overhang generated by λ Exo. Here, we present a 2.3 Å crystal structure of the λ Exo trimer bound t...