HapCHAT

Adaptive haplotype assembly for efficiently leveraging high coverage in long reads

Citation

A description of the algorithm, together with a detailed experimental comparison with other haplotype assembly tools, is presented in:

Stefano Beretta, Murray Patterson, Simone Zaccaria, Gianluca Della Vedova and Paola Bonizzoni. HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads. BMC Bioinformatics, 19(1):252 (2018). *Joint first authors

DOI: 10.1186/s12859-018-2253-8

To replicate the experiments of this paper, go here

Quick Install

If you have Docker installed, installing and running HapCHAT takes a single command: docker run -v DATADIR:/data algolab/hapchat, where DATADIR is a directory containing the input data files genome.fasta, file.bam and file.vcf. If DATADIR is empty, HapCHAT is run on the example files in the example directory (the example genome.fasta is too large to be included, so it is downloaded if necessary).
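Before launching the container on your own data, it can help to confirm that the data directory is laid out as described above. The following is a minimal sketch and not part of HapCHAT: the helper name check_datadir is hypothetical, and a POSIX shell is assumed. It treats an empty directory as valid, since that case triggers the example run.

```shell
# check_datadir DIR (hypothetical helper, not shipped with HapCHAT)
# Succeeds if DIR is empty (HapCHAT then falls back to its example files)
# or if DIR contains all three expected input files.
check_datadir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: the example files will be used"
        return 0
    fi
    missing=0
    for f in genome.fasta file.bam file.vcf; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use (commented out so that sourcing this snippet has no side effects):
# DATADIR="$(cd path/to/data && pwd)"   # absolute path, the safe choice for -v
# check_datadir "$DATADIR" && docker run -v "$DATADIR":/data algolab/hapchat
```

Passing an absolute path to -v avoids Docker interpreting the argument as a named volume rather than a host directory.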

Input and output files

Installation for Experts

HapCHAT has been developed and tested on Ubuntu Linux, but should work on any system that has Python 3, a C++ compiler supporting C++11 or later, and the other utilities found on most *nix-based systems (such as bash, awk, git, cmake and make).

Some more specific dependencies that may not be installed by default are python3-dev, python3-networkx and virtualenv. On Ubuntu, these can be installed with apt, e.g., apt install python3-dev.
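A quick way to see which of the command-line tools listed above are absent is to probe the PATH before running anything. This is a sketch only: missing_tools is a hypothetical helper, a POSIX shell is assumed, and python3 as the executable name is an assumption about how Python 3 is installed on your system. It cannot detect python3-dev or python3-networkx, which are packages rather than commands.

```shell
# missing_tools NAME... (hypothetical helper, not shipped with HapCHAT)
# Prints, one per line, each named command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Typical use (commented out so that sourcing this snippet has no side effects):
# missing_tools python3 bash awk git cmake make virtualenv
# On Ubuntu, the packages named in the text can then be installed
# (as root or via sudo) with, e.g.:
# apt install python3-dev python3-networkx virtualenv
```

An empty output means every probed command was found.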

Then, in principle, one simply needs to execute setup.sh, which is located in the same directory as this README; HapCHAT can then be run by executing HapCHAT.py (located in this directory as well).

Note that setup.sh simply checks out a HapCHAT-specific git branch of WhatsHap, installs it in a virtual environment, and then builds the C++ code located in src/ with cmake. While this should work automagically on most *nix-based systems with bash, it can be adapted to other systems by slightly modifying setup.sh.

The main advantage of the expert installation is that you can adapt the Snakefile in the example directory to run on your own files, using custom filenames and a custom directory tree.