Introduction

ParaBWT is a new, practical, parallel algorithm for constructing the Burrows-Wheeler transform (BWT) and suffix array of big genome data, with linear space complexity and a small constant factor. ParaBWT has been evaluated on two sequences generated from two human genome assemblies (the Ensembl Homo sapiens assembly and the human reference genome), using a workstation with two Intel Xeon X5650 hex-core CPUs and 96 GB RAM running the Ubuntu 12.04 LTS operating system. Our performance comparison with FMD-index and Bwt-disk shows that, on 12 CPU cores, ParaBWT runs up to 2.2 times faster than FMD-index, reducing the runtime from 26.56 hours to 12.34 hours for a sequence of about 60 billion nucleotides, and up to 99.0 times faster than Bwt-disk.
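For readers new to the two structures ParaBWT builds, the following Python sketch shows how the BWT is read directly off a suffix array: BWT[i] is the character immediately preceding the suffix that starts at SA[i]. It assumes a single "$" sentinel that is unique and sorts before A, C, G and T, and it uses naive sorting. It is a textbook illustration only, not ParaBWT's parallel, linear-space construction.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive sketch: sorting explicit suffix slices costs O(n^2 log n) time and
    up to O(n^2) memory, so it is only suitable for tiny inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: list[int]) -> str:
    """BWT[i] = text[SA[i] - 1]; when SA[i] == 0, text[-1] wraps to the sentinel."""
    return "".join(text[i - 1] for i in sa)


def inverse_bwt(bwt: str, sentinel: str = "$") -> str:
    """Recover the text by repeatedly prepending the BWT column and re-sorting (naive)."""
    rows = [""] * len(bwt)
    for _ in range(len(bwt)):
        rows = sorted(c + row for c, row in zip(bwt, rows))
    # The rows are now the sorted rotations; the one ending with the sentinel is the text.
    return next(row for row in rows if row.endswith(sentinel))


if __name__ == "__main__":
    text = "GATTACA$"  # '$' sorts before A, C, G and T in ASCII
    sa = suffix_array(text)
    bwt = bwt_from_suffix_array(text, sa)
    print(sa)   # [7, 6, 4, 1, 5, 0, 3, 2]
    print(bwt)  # ACTGA$TA
    assert inverse_bwt(bwt) == text
```

The round trip through `inverse_bwt` shows why the BWT is useful for genome indexing: it is a reversible permutation of the input that tends to group equal characters together.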


Downloads


Citation

Other related papers


Parameters


Installation and Usage

Prerequisites

  1. Intel Threading Building Blocks (TBB) library, which is available from the TBB project website.

Download and Compilation

  1. Download the binary tarball and uncompress it using the "tar" command.

Typical Usage

The program accepts input in FASTA or FASTQ format. The typical usage is "ParaBWT [options] infile.fa [infile1.fa]".
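As a rough illustration of what distinguishes the two accepted input formats, here is a simplified Python reader. It is a sketch under stated assumptions (FASTQ records occupy exactly four lines, headers are kept verbatim) and says nothing about how ParaBWT itself parses or concatenates multiple records; the function names are hypothetical.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; a sequence may be wrapped over several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # lines before the first '>' are ignored
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, assuming simple four-line FASTQ records."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 4:
        raise ValueError("FASTQ input is not a whole number of four-line records")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        yield header[1:], seq


def read_sequences(text: str) -> list[tuple[str, str]]:
    """Dispatch on the first non-blank character: '>' means FASTA, '@' means FASTQ."""
    lines = text.splitlines()
    first = next((line.lstrip() for line in lines if line.strip()), "")
    if first.startswith(">"):
        return list(read_fasta(lines))
    if first.startswith("@"):
        return list(read_fastq(lines))
    raise ValueError("input is neither FASTA nor FASTQ")


if __name__ == "__main__":
    print(read_sequences(">chr1 test\nACGT\nACGT\n>chr2\nGGCC\n"))
    print(read_sequences("@read1\nACGT\n+\nIIII\n"))
```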

  1. Display the command-line options

    type command line: parabwt

  2. Build the Burrows-Wheeler transform and suffix array

    type command line: parabwt -o parabwt_test -s 3 -a 1 -t 4 genome.fa


Change Log


Contact

If you have any questions or suggestions for improvement, please contact Liu Yongchao (Email: yliu860 (at) gatech (dot) edu).