Software and Resources
Our lab has developed several alignment and assembly tools that are critical to high-throughput sequence analysis.
These include samtools, BWA, minimap2 and hifiasm,
each cited more than 1,000 times per year.
We also explore a variety of algorithms related to
variant calling (e.g. longcallR and longcallD),
pangenome analysis (e.g. minigraph and pangene),
protein alignment (e.g. miniprot),
full-text indexing (e.g. ropebwt3),
immunology (e.g. Immuannot and T1K),
evolution (e.g. psmc and compleasm)
and high-performance data structures in general (e.g. bedtk and BGT).
Most of our tools remain functional years after their initial publication and are generally well received.
Software
Current
- myloasm: metagenome assembler for PacBio HiFi and Nanopore R10 reads, unpublished

- minisplice: splice site scoring for improving spliced alignment, unpublished

- longcallD: small and large variant calling from long genomic reads, unpublished

- longcallR: SNP calling and haplotype-specific analysis for long RNA-seq reads, preprinted in Huang et al (2025)

- colorSV: somatic SV calling via tumor-normal co-assembly, preprinted in Le et al (2024)

- ropebwt3: constructing and using the BWT of DNA string sets, published in Li (2014) and Li (2024).

- Immuannot: annotating HLA and KIR genes in phased assemblies, published in Zhou et al (2024).

- pangene: constructing pangenome gene graphs, published in Li (2024).

- compleasm: a reimplementation of BUSCO for evaluating the gene completeness of an assembly, published in Huang and Li (2023).

- srf: assembling satellite DNA, published in Zhang et al (2023).

- miniprot: protein-to-genome alignment allowing splicing and frameshift, published in Li (2023).

- bedtk and cgranges: a fast toolkit and library for working with BED files, published in Li and Rong (2020).

- yak: k-mer counting and assembly evaluation, developed for Cheng et al (2021).

- gwfa: graph wavefront alignment with edit distance, preprinted in Zhang et al (2022).
Merged into gfatools and used by minigraph.

- minigraph: pangenome construction and sequence-to-graph alignment, published in Li et al (2020).

- dipcall: variant calling for phased diploid assemblies, developed for Li et al (2019).

- minimap2: widely used long-read aligner, published in Li (2018) and improved in Li (2021)

- miniasm: a simple long-read assembler, published in Li (2016).
Useful for assembly at small scale; not recommended for production.

- BWA: widely used short-read aligner,
published in Li and Durbin (2009), Li and Durbin (2010) and Li (2013).

- minipileup: simple pileup-based variant caller, unpublished

- seqtk: a small toolkit for manipulating sequences in FASTA/FASTQ, unpublished

- gfatools: a toolkit for working with graphs in the GFA format, unpublished

- miniwfa: a low-memory reimplementation of the wavefront alignment algorithm. Unpublished but used in minigraph.

- jstreeview: interactive phylogenetic tree viewer/editor in JavaScript, unpublished

Developed by past members or maintained by others
- ntsm: detecting sample swaps, published in Chu and Li (2024).

- hifiasm: genome assembly with PacBio HiFi, Nanopore and Hi-C data,
published in Cheng et al (2021), Cheng et al (2022) and Cheng et al (2024).
Maintained by Haoyu Cheng.

- hifiasm-meta: metagenome assembly with PacBio HiFi,
published in Feng et al (2022) and Feng et al (2024).
Maintained by Xiaowen Feng.

- T1K: HLA and KIR genotyping with short reads, published in Song et al (2023).
Maintained by Li Song.

- chromap: aligning short ChIP-seq, ATAC-seq or Hi-C reads, published in Zhang et al (2021).
Maintained by Haowen Zhang and Li Song.

- hifieval: evaluating error correction accuracy for HiFi data, published in Guo et al (2023).

- tabix: indexing and querying coordinate-sorted formats such as VCF and BED,
published in Li (2011).
Now part of the samtools project.

- samtools: utilities for manipulating alignments in the SAM format.
Initially published in Li et al (2009), Li (2011a) and Li (2011b).
Maintained by Sanger since 2013.

- TreeBeST: the core engine behind TreeFam for tree building.
Some components are described in the PI’s thesis.
Maintained by Ensembl Compara.

Old but functional
- dna-nn: model and predict short DNA sequence features with neural networks, published in Li (2019).

- hickit: 3D modeling for single-cell Hi-C, developed for Tan et al (2018).
It was not used in that paper but was used in Longzhi Tan’s later work.

- BGT: fast and lightweight genotype query across many samples, published in Li (2016).

- fermi, fermi2 and FermiKit: short-read assemblers,
published in Li (2012) and Li (2015).

- fermi-lite: a library in C for short-read assembly in small regions, adapted from FermiKit

- BFC: correcting sequencing errors in short reads, published in Li (2015).

- bioawk: BWK awk modified for biological data, unpublished

- psmc: inferring historical population sizes from a diploid genome, published in Li and Durbin (2011).

Graveyard
- MAQ: short-read aligner, published in Li et al (2008).
It still works, but there is no reason to use it now.
Resources
Updated resources since publication
Unpublished resources
- notable genomic regions in T2T-CHM13 and GRCh38

- portable binaries for samtools v1.14 and for GCC v10.3.0 on CentOS 7

- pantree VCFs generated from HPRC graphs

- human reference genome analysis sets, including BWA and Bowtie2 indices

- easy genomic regions for short-read variant calling

- haplotype-resolved PGP1 assembly
Graveyard