Software and Resources
Our lab developed several alignment and assembly algorithms critical to high-throughput sequence analysis.
These include samtools, BWA, minimap2 and hifiasm,
each cited more than 1,000 times per year.
We also explore a variety of algorithms related to
variant calling (e.g. longcallR and longcallD),
pangenome analysis (e.g. minigraph and pangene),
protein alignment (e.g. miniprot),
full-text indexing (e.g. ropebwt3),
immunology (e.g. Immuannot and T1K),
evolution (e.g. psmc and compleasm)
and high-performance data structures in general (e.g. bedtk and BGT).
Most of our tools still work years after their initial publication and are often well received.
Software
Current
- myloasm: metagenome assembler for PacBio HiFi and Nanopore R10 reads, unpublished

- minisplice: splice site scoring for improving spliced alignment, unpublished

- longcallD: small and large variant calling from long genomic reads, unpublished

- longcallR: SNP calling and haplotype-specific analysis for long RNA-seq reads, preprinted in Huang et al (2025)

- colorSV: somatic SV calling via tumor-normal co-assembly, preprinted in Le et al (2024)

- ropebwt3: construction and utility of BWT for DNA string sets, published in Li (2014) and Li (2024).

- Immuannot: annotating HLA and KIR genes in phased assemblies, published in Zhou et al (2024).

- pangene: constructing pangenome gene graphs, published in Li (2024).

- compleasm: a reimplementation of BUSCO for evaluating the gene completeness of an assembly, published in Huang and Li (2023).

- srf: assembling satellite DNA, published in Zhang et al (2023).

- miniprot: protein-to-genome alignment allowing splicing and frameshifts, published in Li (2023).

- bedtk and cgranges: a fast toolkit and library for working with BED files, published in Li and Rong (2020).

- yak: k-mer counting and assembly evaluation, developed for Cheng et al (2021).

- gwfa: graph wavefront alignment with edit distance, preprinted in Zhang et al (2022).
Merged into gfatools and used by minigraph.

- minigraph: pangenome construction and sequence-to-graph alignment, published in Li et al (2020).

- dipcall: variant calling for phased diploid assemblies, developed for Li et al (2019).

- minimap2: widely used long-read aligner, published in Li (2018) and improved in Li (2021).

- miniasm: a simple long-read assembler, published in Li (2016).
Useful for assembly at small scale; not recommended for production.

- BWA: widely used short-read aligner,
published in Li and Durbin (2009), Li and Durbin (2010) and Li (2013).

- minipileup: simple pileup-based variant caller, unpublished

- seqtk: a small toolkit for manipulating sequences in FASTA/FASTQ, unpublished

- gfatools: a toolkit for working with graphs in the GFA format, unpublished

- miniwfa: a low-memory reimplementation of the wavefront alignment algorithm. Unpublished but used in minigraph.

- jstreeview: interactive phylogenetic tree viewer/editor in JavaScript, unpublished

Developed by past members or maintained by others
- ntsm: detecting sample swaps, published in Chu and Li (2024).

- hifiasm: genome assembly with PacBio HiFi, Nanopore and Hi-C data,
published in Cheng et al (2021), Cheng et al (2022) and Cheng et al (2024).
Maintained by Haoyu Cheng.

- hifiasm-meta: metagenome assembly with PacBio HiFi,
published in Feng et al (2022) and Feng et al (2024).
Maintained by Xiaowen Feng.

- T1K: HLA and KIR genotyping with short reads, published in Song et al (2023).
Maintained by Li Song.

- chromap: aligning short ChIP-seq, ATAC-seq or Hi-C reads, published in Zhang et al (2021).
Maintained by Haowen Zhang and Li Song.

- hifieval: evaluating error correction accuracy for HiFi data, published in Guo et al (2023).

- tabix: indexing and querying coordinate-sorted formats such as VCF and BED,
published in Li (2011).
Now part of the samtools project.

- samtools: utilities for manipulating alignments in the SAM format.
Initially published in Li et al (2009), Li (2011a) and Li (2011b).
Maintained by Sanger since 2013.

- TreeBeST: the core engine behind TreeFam for tree building.
Some components are described in the PI’s thesis.
Maintained by Ensembl Compara.

Old but functional
- dna-nn: modeling and predicting short DNA sequence features with neural networks, published in Li (2019).

- hickit: 3D modeling for single-cell Hi-C, developed for Tan et al (2018).
It was not used in that paper but was used in Longzhi Tan’s later work.

- BGT: fast and lightweight genotype query across many samples, published in Li (2016).

- fermi, fermi2 and FermiKit: short-read assemblers,
published in Li (2012) and Li (2015).

- fermi-lite: a library in C for short-read assembly in small regions, adapted from FermiKit

- BFC: correcting sequencing errors in short reads, published in Li (2015).

- bioawk: BWK awk modified for biological data, unpublished

- psmc: inferring historical population sizes from a diploid genome, published in Li and Durbin (2011).

Graveyard
- MAQ: short-read aligner, published in Li et al (2008).
It still works, but there is no point in using it now.
Resources
Updated resources since publication
Unpublished resources
Graveyard