We have moved!
Lars Hahn left the working group of Prof. Morgenstern (gobics.de) in December 2017. While version 1.4.0 is still available here, future versions of rasbhari will be published on a new project website:
Introduction
Many algorithms for sequence analysis use patterns or spaced seeds consisting of match and don’t-care positions, such that only the characters at the match positions are considered when sub-words of the sequences are counted or compared. The performance of these approaches depends on the underlying patterns. rasbhari is a tool which generates optimized sets of patterns for database searching, read mapping and alignment-free sequence comparison. rasbhari uses an improved hill-climbing algorithm which produces patterns with slightly higher sensitivity than seeds calculated with other tools. rasbhari is described in this paper:
Hahn L., Leimeister C.-A., Ounit R., Lonardi S., Morgenstern B. (2016)
rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.
PLoS Comput Biol 12(10): e1005107
doi:10.1371/journal.pcbi.1005107
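To make the notion of match and don't-care positions concrete, the following is a minimal Python sketch, not code taken from rasbhari. It assumes that a pattern is written as a binary string in which '1' marks a match position and '0' a don't-care position; the function names spaced_words and spaced_match are hypothetical.

```python
def spaced_words(sequence, pattern):
    """Return the spaced words of `sequence` under a binary `pattern`.

    Assumption: '1' marks a match position, '0' a don't-care position.
    Only the characters at match positions are kept.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    words = []
    # Slide the pattern over every window of the sequence.
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        words.append("".join(window[i] for i in match_positions))
    return words


def spaced_match(window_a, window_b, pattern):
    """True if two equally long windows agree at every match position."""
    return all(a == b
               for a, b, c in zip(window_a, window_b, pattern)
               if c == "1")


print(spaced_words("ACGTACGT", "1011"))
# ['AGT', 'CTA', 'GAC', 'TCG', 'AGT']
print(spaced_match("ACGT", "ATGT", "1011"))  # True: mismatch is at a don't-care position
print(spaced_match("ACGT", "ACCT", "1011"))  # False: mismatch is at a match position
```

The second call shows why the choice of pattern matters: a mismatch is tolerated only if it falls on a don't-care position.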
Related Approaches
Spaced-Words (Leimeister et al., 2014) is an approach to fast alignment-free sequence comparison. The Spaced-Words approach calculates distances between pairs of sequences based on spaced-word frequencies. rasbhari is integrated in the Spaced-Words software to generate sets of patterns which reduce the variance of the number of word matches.
rasbhari can be downloaded here
In addition, the download is available at the rasbhari GitHub repository
Usage
To compile rasbhari, change to the directory containing the archive and extract it. Then use the following commands:
cd rasbhari-[version]
make
After compiling, the program can be run with
./rasbhari [options]
Options - Extract
Some of the available options are the following. For more information please have a look at the README.
Options for the algorithms
--variance: Calculate the variance instead of Overlap Complexity.
--permut [int]: Select a pattern [int] times and try to modify it to improve the pattern set.
Options for the pattern
-m [int]: Number of patterns (default: m=10).
-d [int]: Number of don't care positions.
-w [int]: Number of match positions, the pattern weight.
Options for the variance
-S [int]: Sequence length of a theoretical dataset.
-p [double]: Background probability for all 4 nucleotides (A,C,G,T)
An example program usage could be
./rasbhari -m 10 -w 8 -d 6-15 --permut 25000
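As a rough illustration of what the pattern options describe, the Python sketch below builds a random set of m patterns with weight w and a given number of don't-care positions. It is not rasbhari's optimization algorithm and produces unoptimized patterns only. It rests on three assumptions that are not stated above: that '1' denotes a match position and '0' a don't-care position, that every pattern starts and ends with a match position, and that "-d 6-15" means each pattern has between 6 and 15 don't-care positions. The function names are hypothetical.

```python
import random


def random_pattern(weight, dont_care, rng):
    """Build one random pattern with `weight` ones and `dont_care` zeros.

    Assumption: the first and last positions are match positions.
    """
    if weight < 2:
        raise ValueError("weight must be at least 2")
    inner = ["1"] * (weight - 2) + ["0"] * dont_care
    rng.shuffle(inner)
    return "1" + "".join(inner) + "1"


def random_pattern_set(m, weight, min_dont_care, max_dont_care, seed=0):
    """Build m random patterns; the pattern length is weight + don't-care count."""
    rng = random.Random(seed)
    return [
        random_pattern(weight, rng.randint(min_dont_care, max_dont_care), rng)
        for _ in range(m)
    ]


# Parameters mirroring the example call: -m 10 -w 8 -d 6-15 (assumed meaning).
for pattern in random_pattern_set(10, 8, 6, 15):
    print(pattern, "length", len(pattern), "weight", pattern.count("1"))
```

Under these assumptions, each pattern has length between 14 and 23, since the length is the weight plus the number of don't-care positions.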
Output
The output of the program is a set of patterns, e.g.:
10110001
10100101
11001001
10011001
10000111
11100001
10101001
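The example output above can be inspected with a few lines of Python. This is a minimal sketch that assumes '1' denotes a match position and '0' a don't-care position, and that the output contains one pattern per line; describe_pattern is a hypothetical helper, not part of rasbhari.

```python
def describe_pattern(pattern):
    """Return the length, weight and don't-care count of a binary pattern.

    Assumption: '1' is a match position, '0' a don't-care position.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError(f"not a binary pattern: {pattern!r}")
    weight = pattern.count("1")
    return {
        "length": len(pattern),
        "weight": weight,
        "dont_care": len(pattern) - weight,
    }


example_output = """10110001
10100101
11001001
10011001
10000111
11100001
10101001"""

for line in example_output.split():
    print(line, describe_pattern(line))
```

For the seven example patterns, this reports a length of 8, a weight of 4 and 4 don't-care positions each.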
Contact
For comments, or if you encounter any technical issues, please send an email to: lhahn(at)data-learning.de
References
Scientific publications using rasbhari should cite:
Hahn L., Leimeister C.-A., Ounit R., Lonardi S., Morgenstern B. (2016)
rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.
PLoS Comput Biol 12(10): e1005107
doi:10.1371/journal.pcbi.1005107