Selected Publications

(See also Google Scholar; * for equal contribution; † for corresponding authors; bold typeface for lab members/interns)

  • Li H (2024) BWT construction and search at the terabase scale. Bioinformatics. [PMID: 39607778]
  • Li H, Marin M, Farhat MR (2024) Exploring gene content with pangene graphs. Bioinformatics. [PMID: 39041615]
  • Tan K-T, Slevin MK, Leibowitz ML, Garrity-Janger M, Shan J, Li H, Meyerson M (2024) Neotelomeres and telomere-spanning chromosomal arm fusions in cancer genomes revealed by long-read sequencing. Cell Genom. [PMID: 38917803]
  • Zhou Y, Song L, Li H (2024) Full resolution HLA and KIR gene annotations for human genome assemblies. Genome Res. [PMID: 38839374]
  • Chu J, Rong J, Feng X, Li H (2024) ntsm: an alignment-free, ultra-low-coverage, sequencing technology agnostic, intraspecies sample comparison tool for sample swap detection. Gigascience, 13:giae024. [PMID: 38832466]
  • Cheng H, Asri M, Lucas J, Koren S, Li H (2024) Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat Methods. [PMID: 38730258]
  • Li H, Durbin R (2024) Genome assembly in the telomere-to-telomere era. Nat Rev Genet. [PMID: 38649458]
  • Feng X, Li H (2024) Evaluating and improving the representation of bacterial contents in long-read metagenome assemblies. Genome Biol, 25:92. [PMID: 38605401]
  • Zhang Y, Chu J, Cheng H, Li H (2023) De novo reconstruction of satellite repeat units from sequence data. Genome Res, 33:1994-2001. [PMID: 37918962]
  • Guo Y, Feng X, Li H (2023) Evaluation of haplotype-aware long-read error correction with hifieval. Bioinformatics, 39:btad631. [PMID: 37851384]
  • Huang N, Li H (2023) Compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics, 39:btad595. [PMID: 37758247]
  • Song L, Bai G, Liu XS, Li B, Li H (2023) Efficient and accurate KIR and HLA genotyping with next-generation sequencing data. Genome Res, 33:923-931. [PMID: 37169596]
  • Liao W-W*, Asri M*, Ebler J*, …, Garrison E, Marschall T, Hall IM, Li H, Paten B (2023) A draft human pangenome reference. Nature, 617:312-324. [PMID: 37165242]
  • Deorowicz S, Danek A, Li H (2023) AGC: compact representation of assembled genomes with fast queries and updates. Bioinformatics, 39:btad097. [PMID: 36864624]
  • Li H (2023) Protein-to-genome alignment with miniprot. Bioinformatics, 39:btad014. [PMID: 36648328]
  • Zhang H, Wu S, Aluru S, Li H (2022) Fast sequence to graph alignment using the graph wavefront algorithm. arXiv:2206.13574 (preprint).
  • Tan K-T, Slevin MK, Meyerson M, Li H (2022) Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres. Genome Biol, 23:180. [PMID: 36028900]
  • Feng X, Cheng H, Portik D, Li H (2022) Metagenome assembly of high-fidelity long reads with hifiasm-meta. Nat Methods, 19:671-674. [PMID: 35534630]
  • Cheng H, Jarvis ED, Fedrigo O, Koepfli K-P, Urban L, Gemmell NJ, Li H (2022) Haplotype-resolved assembly of diploid individuals without parental data. Nat Biotechnol, published online. [PMID: 35332338]
  • Kokot M, Gudyś A, Li H, Deorowicz S (2022) CoLoRd: compressing long reads. Nat Methods, 19:441-444. [PMID: 35347321]
  • Zhang H*, Song L*, Wang X, Cheng H, Wang C, Meyer CA, Liu T, Tang M, Aluru S, Yue F, Liu XS, Li H (2021) Fast alignment and preprocessing of chromatin profiles with Chromap. Nat Commun, 12:6566. [PMID: 34772935]
  • Li H (2021) New strategies to improve minimap2 alignment accuracy. Bioinformatics, 37:4572-4574. [PMID: 34623391]
  • Zhang H, Li H, Jain C, Cheng H, Au KF, Li H, Aluru S (2021) Real-time mapping of nanopore raw signals. Bioinformatics, 37:i477-i483. [PMID: 34252938]
  • Feng X, Li H (2021) Higher rates of processed pseudogene acquisition in humans and three great apes revealed by long read assemblies. Mol Biol Evol, 38:2958-2966. [PMID: 33681998]
  • Li H, Rong J (2021) Bedtk: Finding Interval Overlap with Implicit Interval Tree. Bioinformatics, 37:1315-1316. [PMID: 32966548]
  • Garg S, Fungtammasan A, Carroll A, Chou M, Schmitt A, …, Chin C-S, Church GM, Li H (2021) Chromosome-scale haplotype-resolved assembly of human genomes. Nat Biotechnol, 39:309-312. [PMID: 33288905]
  • Xing D*, Tan L*, Chang C-H, Li H, Xie XS (2021) Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands. Proc Natl Acad Sci, 118:e2013106118. [PMID: 33593904]
  • Cheng H, Concepcion GT, Feng X, Zhang H, Li H (2021) Haplotype-resolved de novo assembly with phased assembly graphs with hifiasm. Nat Methods, 18:170-175. [PMID: 33526886]
  • Li H, Feng X, Chu C (2020) The design and construction of reference pangenome graphs with minigraph. Genome Biol, 21:265. [PMID: 33066802]
  • Ruan J, Li H (2020) Fast and accurate long-read assembly with wtdbg2. Nat Methods, 17:155-158. [PMID: 31819265]
  • Wenger AM*, Peluso P*, Rowell WJ, Chang PC, Hall RJ, …, Li H, …, Rank DR, Hunkapiller MW (2019) Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol, 37:1155-1162. [PMID: 31406327]
  • Li H (2019) Identifying centromeric satellites with dna-brnn. Bioinformatics, 35:4408-4410. [PMID: 30989183]
  • Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. [PMID: 29750242]
  • Tan L*, Xing D*, Chang CH, Li H, Xie XS (2018) Three-dimensional genome structures of single diploid human cells. Science, 361:924-928. [PMID: 30166492]
  • Li H, Bloom JM, Farjoun Y, Fleharty M, Gauthier L, Neale B, MacArthur D (2018) A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat Methods, 15:595-597. [PMID: 30013044]
  • Chen C*, Xing D*, Tan L*, Li H*, Zhou G, Huang L, Xie XS (2017) Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI). Science, 356:189-194. [PMID: 28408603]
  • Mallick S*, Li H*, Lipson M*, Mathieson I*, Gymrek M, Racimo F, …, Reich D (2016) The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature, 538:201-206. [PMID: 27654912]
  • Li H (2016) Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 32:2103-2110. [PMID: 27153593]
  • Li H (2016) BGT: efficient and flexible genotype query across many samples. Bioinformatics, 32:590-592. [PMID: 26500154]
  • The 1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature, 526:68-74. [PMID: 26432245]
  • Li H (2015) FermiKit: assembly-based variant calling for Illumina resequencing data. Bioinformatics, 31:3694-3696. [PMID: 26220959]
  • Li H (2015) BFC: correcting Illumina sequencing errors. Bioinformatics, 31:2885-2887. [PMID: 25953801]
  • Palkopoulou E, Mallick S, Skoglund P, Enk J, Rohland N, Li H, Omrak A, Vartanyan S, Poinar H, Götherström A, …, Dalén L (2015) Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Curr Biol, 25:1395-400. [PMID: 25913407]
  • Do R, Balick D, Li H, Adzhubei I, Sunyaev S, Reich D (2014) No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat Genet., 47:126-31. [PMID: 25581429]
  • Fu Q, Li H, Moorjani P, Jay F, Slepchenko SM, …, Reich D, Kelso J, Viola TB, Pääbo S (2014) Genome sequence of a 45,000-year-old modern human from western Siberia. Nature, 514:445-449. [PMID: 25341783]
  • Li H (2014) Fast construction of FM-index for long sequence reads. Bioinformatics, 30:3274-3275. [PMID: 25107872]
  • Li H (2014) Towards Better Understanding of Artifacts in Variant Calling from High-Coverage Samples. Bioinformatics, 30:2843-2851. [PMID: 24974202]
  • Prüfer K, Racimo F, Patterson N, Jay F, …, Li H, …, Slatkin M, Reich D, Kelso J, Pääbo S (2014) The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505:43-49. [PMID: 24352235]
  • Li H (2012) Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics, 28:1838-1844. [PMID: 22569178]
  • Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27:2987-2993. [PMID: 21903627]
  • Li H, Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature, 475:493-496. [PMID: 21753753]
  • Li H (2011) Improving SNP discovery by base alignment quality. Bioinformatics, 27:1157-1158. [PMID: 21320865]
  • Li H (2011) Tabix: Fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics, 27:718-719. [PMID: 21208982]
  • Li H, Homer N (2010) A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform, 11:473-83. [PMID: 20460430]
  • Green RE*,†, Krause J*, Briggs AW*, Maricic T*, Stenzel U*, Kircher M*, Patterson N*, Li H, …, Reich D, Pääbo S (2010) A draft sequence of the Neandertal genome. Science, 328:680-684. [PMID: 20448178]
  • Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, 26:589-95. [PMID: 20080505]
  • Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-1760. [PMID: 19451168]
  • Li H*, Handsaker B*, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25:2078-2079. [PMID: 19505943]
  • Li H, Ruan J, Durbin R (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res, 18:1851-8. [PMID: 18714091]
  • Ruan J*, Li H*, Chen Z, Coghlan A, Coin LJ, …, Wang J, Durbin R (2008) TreeFam: 2008 Update. Nucleic Acids Res, 36:D735-740. [PMID: 18056084]
  • Li H*, Guan L*, Liu T*, Zheng W, Wong G, Wang J (2007) A cross-species alignment tool (CAT). BMC Bioinformatics, 8:439. [PMID: 17880681]
  • Li H, Coghlan A, Ruan J, Coin LJ, Hériché JK, …, Durbin R (2006) TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res., 34:D572-580. [PMID: 16381935]
  • Li H*, Liu J*, Xu Z*, Jin J, Fang L, …, Hao B-L (2005) Test data sets and evaluation of gene prediction programs on the rice genome. J Comput Sci & Technol., 20:446-53.