Bloom Filter Trie
Interface containing all functions to use a BFT.
Data Structures | |
struct | BFT_annotation |
Annotation associated with a BFT_kmer. | |
Typedefs | |
typedef BFT_Root | BFT |
Root vertex of a BFT. | |
typedef size_t(* | BFT_func_ptr) (BFT_kmer *bft_kmer, BFT *bft, va_list args) |
Pointer to a function used by iterate_over_kmers() and v_iterate_over_kmers(). | |
Functions | |
Annotation functions | |
These functions manipulate annotations (color sets). | |
uint8_t | intersection_annots (const uint8_t a, const uint8_t b) |
uint8_t | union_annots (const uint8_t a, const uint8_t b) |
uint8_t | sym_difference_annots (const uint8_t a, const uint8_t b) |
BFT_annotation * | create_BFT_annotation () |
Function creating an empty BFT_annotation. | |
void | free_BFT_annotation (BFT_annotation *bft_annot) |
Function freeing a BFT_annotation. | |
BFT_annotation * | get_annotation (BFT_kmer *bft_kmer) |
Function extracting the annotation (set of colors) associated with a k-mer of a BFT. | |
bool | presence_genome (uint32_t id_genome, BFT_annotation *bft_annot, BFT *bft) |
Function testing if a k-mer occurred in a genome. | |
BFT_annotation * | intersection_annotations (BFT *bft, uint32_t nb_annotations,...) |
Function computing the intersection of a set of annotations. | |
BFT_annotation * | union_annotations (BFT *bft, uint32_t nb_annotations,...) |
Function computing the union of a set of annotations. | |
BFT_annotation * | sym_difference_annotations (BFT *bft, uint32_t nb_annotations,...) |
Function computing the symmetric difference of a set of annotations. | |
uint32_t * | get_list_id_genomes (BFT_annotation *bft_annot, BFT *bft) |
Function extracting a list of genome identifiers from an annotation. | |
uint32_t | get_count_id_genomes (BFT_annotation *bft_annot, BFT *bft) |
Function counting the number of genome identifiers in an annotation. | |
uint32_t * | intersection_list_id_genomes (uint32_t *list_a, uint32_t *list_b) |
Graph functions | |
These functions manipulate a colored de Bruijn graph stored in a BFT. | |
BFT * | create_cdbg (int k, int treshold_compression) |
Function creating a colored de Bruijn graph stored in a BFT. | |
void | free_cdbg (BFT *bft) |
Free an allocated colored de Bruijn graph stored in a BFT. | |
Insertion functions | |
These functions insert genomes in a colored de Bruijn graph stored in a BFT. | |
void | insert_genomes_from_files (int nb_files, char **paths, BFT *bft, char *prefix_bft_filename) |
Function inserting genomes (k-mer file) in a BFT. | |
void | insert_kmers_new_genome (int nb_kmers, char **kmers, char *genome_name, BFT *bft) |
Function inserting k-mers of a new genome in a BFT. | |
void | insert_kmers_last_genome (int nb_kmers, char **kmers, BFT *bft) |
Function inserting k-mers of the last inserted genome in a BFT. | |
K-mer functions | |
These functions manipulate k-mers. | |
BFT_kmer * | create_kmer (const char *kmer, int k) |
Function creating a BFT_kmer object from a k-mer encoded as an ASCII string (char*). | |
BFT_kmer * | create_empty_kmer () |
Function creating an empty BFT_kmer object (all its components are NULL). | |
void | free_BFT_kmer (BFT_kmer *bft_kmer, int nb_bft_kmer) |
Function freeing allocated BFT_kmers. | |
void | free_BFT_kmer_content (BFT_kmer *bft_kmer, int nb_bft_kmer) |
Function freeing the content of allocated BFT_kmers. | |
void | extract_kmers_to_disk (BFT *bft, char *filename_output, bool compressed_output) |
Function extracting the k-mers of a BFT in a file. | |
size_t | write_kmer_ascii_to_disk (BFT_kmer *bft_kmer, BFT *bft, va_list args) |
Function writing an ASCII k-mer in a file. | |
size_t | write_kmer_comp_to_disk (BFT_kmer *bft_kmer, BFT *bft, va_list args) |
Function writing a 2-bit encoded k-mer in a file. | |
Query functions | |
These functions query for k-mers or sequences. | |
BFT_kmer * | get_kmer (const char *kmer, BFT *bft) |
Function searching for a k-mer in a BFT. | |
bool | is_kmer_in_cdbg (BFT_kmer *bft_kmer) |
Function testing if a k-mer is in a BFT. | |
uint32_t * | query_sequence (BFT *bft, char *sequence, double threshold, bool canonical_search) |
Function querying a BFT for a sequence. | |
Pattern matching functions | |
These functions provide pattern matching functionalities over the k-mers or paths of a colored de Bruijn graph stored as a BFT. | |
bool | prefix_matching (BFT *bft, char *prefix, BFT_func_ptr f,...) |
Function for prefix matching over the k-mers of a BFT. | |
Marking functions | |
These functions allow marking k-mers of a colored de Bruijn graph with flags. | |
void | set_marking (BFT *bft) |
Function locking and preparing the graph for vertices marking (no insertion can happen before unlocking). | |
void | unset_marking (BFT *bft) |
Function unlocking a graph locked for vertices marking. | |
void | set_flag_kmer (uint8_t flag, BFT_kmer *bft_kmer, BFT *bft) |
Function marking a k-mer of a BFT with a flag. | |
uint8_t | get_flag_kmer (BFT_kmer *bft_kmer, BFT *bft) |
Function getting the flag of a k-mer of a BFT. | |
Traversal functions | |
These functions allow traversing a colored de Bruijn graph stored as a BFT. | |
void | set_neighbors_traversal (BFT *bft) |
Function locking the graph for traversal. | |
void | unset_neighbors_traversal (BFT *bft) |
Function unlocking a graph locked for traversal. | |
BFT_kmer * | get_neighbors (BFT_kmer *bft_kmer, BFT *bft) |
Function extracting the neighbors of a k-mer. | |
BFT_kmer * | get_predecessors (BFT_kmer *bft_kmer, BFT *bft) |
Function extracting the predecessors of a k-mer. | |
BFT_kmer * | get_successors (BFT_kmer *bft_kmer, BFT *bft) |
Function extracting the successors of a k-mer. | |
Iteration functions | |
These functions iterate over the k-mers of a colored de Bruijn graph stored as a BFT. | |
void | iterate_over_kmers (BFT *bft, BFT_func_ptr f,...) |
Function iterating over the k-mers of a BFT. | |
void | v_iterate_over_kmers (BFT *bft, BFT_func_ptr f, va_list args) |
Function iterating over the k-mers of a BFT. | |
Disk I/O functions | |
These functions write and load a BFT from disk. | |
void | write_BFT (BFT *bft, char *filename, bool compress_annotations) |
Function writing a BFT to disk. | |
BFT * | load_BFT (char *filename) |
Function loading a BFT from disk. | |
Interface containing all functions to use a BFT.
Code snippets using this interface are provided in snippets.h.
Root vertex of a BFT.
A BFT_Root contains the k-mer size as well as the number and name of the inserted genomes. Other contained structures and variables are for internal use only and must not be modified.
Pointer to a function used by iterate_over_kmers() and v_iterate_over_kmers().
Such a user-written function is called on every k-mer of a BFT.
bft_kmer | is a k-mer from a BFT. |
bft | is the BFT that bft_kmer comes from. |
args | contains all additional parameters given to iterate_over_kmers() / v_iterate_over_kmers(). |
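The callback contract described above can be illustrated without the library. The following is a minimal sketch under stated assumptions: `demo_func_ptr`, `count_with_limit`, `demo_v_iterate` and `demo_iterate` are hypothetical stand-ins, not part of the BFT interface, and a k-mer is reduced to a plain string. Only the shape follows this file: the callback reads its extra arguments from a va_list, and returning 0 stops the iteration.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for BFT_func_ptr: the k-mer is a plain string here. */
typedef size_t (*demo_func_ptr)(const char *kmer, va_list args);

/* Callback: increments a caller-supplied counter and asks the iteration
 * to stop (returns 0) once `limit` k-mers have been seen. */
static size_t count_with_limit(const char *kmer, va_list args)
{
    size_t *counter = va_arg(args, size_t *);
    size_t limit = va_arg(args, size_t);

    (void)kmer; /* a real callback would inspect the k-mer */
    *counter += 1;
    return (*counter < limit) ? 1 : 0;
}

/* Hypothetical driver: calls f on each k-mer until f returns 0.
 * The list is copied for every call so that each call starts reading
 * from the first extra argument. */
static void demo_v_iterate(const char **kmers, size_t nb_kmers,
                           demo_func_ptr f, va_list args)
{
    for (size_t i = 0; i < nb_kmers; i++) {
        va_list copy;
        va_copy(copy, args);
        size_t keep_going = f(kmers[i], copy);
        va_end(copy);
        if (keep_going == 0)
            break;
    }
}

/* Variadic front end that forwards its extra arguments to the va_list form. */
static void demo_iterate(const char **kmers, size_t nb_kmers,
                         demo_func_ptr f, ...)
{
    va_list args;
    va_start(args, f);
    demo_v_iterate(kmers, nb_kmers, f, args);
    va_end(args);
}
```

With this sketch, `demo_iterate(kmers, 4, count_with_limit, &counter, (size_t)2)` stops after the second k-mer. The cast matters: variadic arguments must have exactly the type the callback reads with va_arg.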
BFT_annotation* create_BFT_annotation | ( | ) | inline |
Function creating an empty BFT_annotation.
BFT* create_cdbg | ( | int | k, |
int | treshold_compression | ||
) |
Function creating a colored de Bruijn graph stored in a BFT.
k | is the length of k-mers. |
treshold_compression | indicates how often the color compression is triggered (once every treshold_compression inserted genomes). |
BFT_kmer* create_empty_kmer | ( | ) |
BFT_kmer* create_kmer | ( | const char * | kmer, |
int | k | ||
) |
void extract_kmers_to_disk | ( | BFT * | bft, |
char * | filename_output, | ||
bool | compressed_output | ||
) |
Function extracting the k-mers of a BFT in a file.
bft | is a BFT containing the k-mers to iterate over. |
filename_output | is the name of a file to which the k-mers are written. File is overwritten if it already exists. |
compressed_output | is a boolean indicating if the k-mers should be written in their 2-bit form (true) or ASCII form (false). |
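To make the difference between the ASCII and 2-bit forms concrete, here is a minimal sketch of a 2-bit packing. It is an assumption for illustration only: the mapping A=0, C=1, G=2, T=3, the bit order and the helper names `pack_kmer_2bits` and `unpack_kmer_2bits` are hypothetical and are not taken from the BFT sources, so the on-disk layout written by extract_kmers_to_disk() may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mapping (not taken from the BFT sources): A=0, C=1, G=2, T=3. */
static int nucleotide_to_2bits(char c)
{
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1; /* not a plain nucleotide */
    }
}

/* Packs the first k characters of kmer into out, 4 nucleotides per byte,
 * first nucleotide in the low bits. out must hold (k + 3) / 4 bytes.
 * Returns the number of bytes written, or 0 on invalid input. */
static size_t pack_kmer_2bits(const char *kmer, size_t k, uint8_t *out)
{
    size_t nb_bytes = (k + 3) / 4;

    memset(out, 0, nb_bytes);
    for (size_t i = 0; i < k; i++) {
        int code = nucleotide_to_2bits(kmer[i]);
        if (code < 0)
            return 0;
        out[i / 4] |= (uint8_t)(code << (2 * (i % 4)));
    }
    return nb_bytes;
}

/* Inverse of pack_kmer_2bits: writes k characters plus a terminating NUL. */
static void unpack_kmer_2bits(const uint8_t *in, size_t k, char *kmer)
{
    static const char alphabet[] = "ACGT";

    for (size_t i = 0; i < k; i++)
        kmer[i] = alphabet[(in[i / 4] >> (2 * (i % 4))) & 3];
    kmer[k] = '\0';
}
```

Under this packing a k-mer of length k takes (k + 3) / 4 bytes instead of k, which is the usual motivation for a compressed output.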
void free_BFT_annotation | ( | BFT_annotation * | bft_annot | ) |
Function freeing a BFT_annotation.
bft_annot | is a pointer to the BFT_annotation to free. |
void free_BFT_kmer | ( | BFT_kmer * | bft_kmer, |
int | nb_bft_kmer | ||
) |
void free_BFT_kmer_content | ( | BFT_kmer * | bft_kmer, |
int | nb_bft_kmer | ||
) |
void free_cdbg | ( | BFT * | bft | ) |
Free an allocated colored de Bruijn graph stored in a BFT.
bft | is an allocated BFT. |
BFT_annotation* get_annotation | ( | BFT_kmer * | bft_kmer | ) |
Function extracting the annotation (set of colors) associated with a k-mer of a BFT.
bft_kmer | is a k-mer obtained via search or iteration over a BFT (via get_kmer() for example). |
uint32_t get_count_id_genomes | ( | BFT_annotation * | bft_annot, |
BFT * | bft | ||
) |
Function counting the number of genome identifiers in an annotation.
bft_annot | is an annotation. |
bft | is a BFT from which the annotation was extracted. |
uint8_t get_flag_kmer | ( | BFT_kmer * | bft_kmer, |
BFT * | bft | ||
) |
Function getting the flag of a k-mer of a BFT.
bft_kmer | is a k-mer, obtained via search/iteration over a BFT, whose flag is returned. |
bft | is a BFT locked for vertices marking. |
BFT_kmer* get_kmer | ( | const char * | kmer, |
BFT * | bft | ||
) |
Function searching for a k-mer in a BFT.
kmer | is an ASCII-encoded k-mer string (char*) to search for in the BFT. |
bft | is the BFT in which kmer is searched. |
uint32_t* get_list_id_genomes | ( | BFT_annotation * | bft_annot, |
BFT * | bft | ||
) |
Function extracting a list of genome identifiers from an annotation.
bft_annot | is an annotation from which the ids must be extracted. |
bft | is a BFT from which the annotation was extracted. |
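The three read-only queries on an annotation (presence of one genome, count of genomes, list of genome identifiers) can be pictured on a simplified color set. This is a sketch only: `demo_colors` is a hypothetical 64-bit bitmask in which bit i stands for genome i, whereas BFT_annotation is an opaque structure whose real representation is not described here. The helper names are likewise invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical color set: bit i is set when genome i contains the k-mer. */
typedef uint64_t demo_colors;

/* Tests whether genome id_genome is in the set. */
static bool demo_presence(uint32_t id_genome, demo_colors colors)
{
    return id_genome < 64 && ((colors >> id_genome) & 1u) != 0;
}

/* Counts the genomes in the set. */
static uint32_t demo_count(demo_colors colors)
{
    uint32_t count = 0;

    for (; colors != 0; colors &= colors - 1) /* clears the lowest set bit */
        count++;
    return count;
}

/* Returns a malloc'ed array of demo_count(colors) identifiers in increasing
 * order, or NULL if the set is empty or the allocation fails.
 * The caller frees the array. */
static uint32_t *demo_list(demo_colors colors)
{
    uint32_t count = demo_count(colors);
    uint32_t next = 0;
    uint32_t *ids;

    if (count == 0)
        return NULL;
    ids = malloc(count * sizeof *ids);
    if (ids == NULL)
        return NULL;
    for (uint32_t id = 0; id < 64; id++)
        if (demo_presence(id, colors))
            ids[next++] = id;
    return ids;
}
```

In this sketch the list carries no length of its own, so it is paired with the count, in the same way get_list_id_genomes() and get_count_id_genomes() appear side by side in this interface.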
BFT_kmer* get_neighbors | ( | BFT_kmer * | bft_kmer, |
BFT * | bft | ||
) |
Function extracting the neighbors of a k-mer.
bft_kmer | is a k-mer obtained via search/iteration over a BFT. |
bft | is the BFT from which bft_kmer was extracted. |
BFT_kmer* get_predecessors | ( | BFT_kmer * | bft_kmer, |
BFT * | bft | ||
) |
Function extracting the predecessors of a k-mer.
bft_kmer | is a k-mer obtained via search/iteration over a BFT. |
bft | is the BFT from which bft_kmer was extracted. |
BFT_kmer* get_successors | ( | BFT_kmer * | bft_kmer, |
BFT * | bft | ||
) |
Function extracting the successors of a k-mer.
bft_kmer | is a k-mer obtained via search/iteration over a BFT. |
bft | is the BFT from which bft_kmer was extracted. |
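In a de Bruijn graph, the successors of a k-mer share its last k-1 characters as a prefix, and its predecessors share its first k-1 characters as a suffix. The sketch below enumerates the four candidates in each direction on plain strings. The names `candidate_successors`, `candidate_predecessors` and `DEMO_MAX_K` are hypothetical, and the sketch does not check which candidates are actually stored in a graph, which is the part a real traversal has to add.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_K 63 /* arbitrary bound chosen for this sketch */

/* Writes the 4 candidate successors of kmer into out: the k-mer shifted
 * left by one character, followed by A, C, G or T.
 * Returns 0 on success, -1 if the k-mer is empty or too long. */
static int candidate_successors(const char *kmer, char out[4][DEMO_MAX_K + 1])
{
    static const char alphabet[] = "ACGT";
    size_t k = strlen(kmer);

    if (k == 0 || k > DEMO_MAX_K)
        return -1;
    for (int i = 0; i < 4; i++) {
        memcpy(out[i], kmer + 1, k - 1); /* shared (k-1)-mer overlap */
        out[i][k - 1] = alphabet[i];
        out[i][k] = '\0';
    }
    return 0;
}

/* Writes the 4 candidate predecessors of kmer into out: A, C, G or T
 * followed by the k-mer without its last character. */
static int candidate_predecessors(const char *kmer, char out[4][DEMO_MAX_K + 1])
{
    static const char alphabet[] = "ACGT";
    size_t k = strlen(kmer);

    if (k == 0 || k > DEMO_MAX_K)
        return -1;
    for (int i = 0; i < 4; i++) {
        out[i][0] = alphabet[i];
        memcpy(out[i] + 1, kmer, k - 1); /* shared (k-1)-mer overlap */
        out[i][k] = '\0';
    }
    return 0;
}
```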
void insert_genomes_from_files | ( | int | nb_files, |
char ** | paths, | ||
BFT * | bft, | ||
char * | prefix_bft_filename | ||
) |
Function inserting genomes (k-mer file) in a BFT.
nb_files | is the number of files to insert. |
paths | is an array of nb_files strings (char*). Each string is the name of a file to insert, optionally including its path. |
bft | is a BFT where the genomes are inserted. |
prefix_bft_filename | is a prefix filename (including path) where temporary data can be written to. The prefix must be unique in its directory. |
void insert_kmers_last_genome | ( | int | nb_kmers, |
char ** | kmers, | ||
BFT * | bft | ||
) |
Function inserting k-mers of the last inserted genome in a BFT.
nb_kmers | is the number of k-mers to insert. |
kmers | is a pointer to an array of strings (char *) that are the k-mers to insert. The array is of length nb_kmers. |
bft | is a colored de Bruijn graph stored as a BFT. |
void insert_kmers_new_genome | ( | int | nb_kmers, |
char ** | kmers, | ||
char * | genome_name, | ||
BFT * | bft | ||
) |
Function inserting k-mers of a new genome in a BFT.
nb_kmers | is the number of k-mers to insert. |
kmers | is a pointer to an array of strings (char *) that are the k-mers to insert. The array is of length nb_kmers. |
genome_name | is the name of the new genome from which the inserted k-mers come. |
bft | is a colored de Bruijn graph stored as a BFT. |
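The kmers parameter is an array of nb_kmers strings. One way to build such an array from a longer sequence is sketched below. `split_into_kmers` and `free_kmers` are hypothetical helpers, not part of the BFT interface. Whether the insertion functions expect NUL-terminated strings or tolerate duplicate k-mers is not stated in this file, so both are assumptions of the sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Splits sequence into its overlapping k-mers.
 * Returns a malloc'ed array of *nb_kmers NUL-terminated strings, or NULL
 * (with *nb_kmers set to 0) if k is 0, the sequence is shorter than k,
 * or an allocation fails. */
static char **split_into_kmers(const char *sequence, size_t k, int *nb_kmers)
{
    size_t length = strlen(sequence);
    size_t count;
    char **kmers;

    *nb_kmers = 0;
    if (k == 0 || length < k)
        return NULL;

    count = length - k + 1;
    kmers = malloc(count * sizeof *kmers);
    if (kmers == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        kmers[i] = malloc(k + 1);
        if (kmers[i] == NULL) {
            while (i > 0)
                free(kmers[--i]); /* undo the partial allocation */
            free(kmers);
            return NULL;
        }
        memcpy(kmers[i], sequence + i, k);
        kmers[i][k] = '\0';
    }
    *nb_kmers = (int)count;
    return kmers;
}

/* Frees an array returned by split_into_kmers. */
static void free_kmers(char **kmers, int nb_kmers)
{
    for (int i = 0; i < nb_kmers; i++)
        free(kmers[i]);
    free(kmers);
}
```

A sequence of length n yields n - k + 1 k-mers, so "ACGTAC" with k = 4 gives ACGT, CGTA and GTAC.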
BFT_annotation* intersection_annotations | ( | BFT * | bft, |
uint32_t | nb_annotations, | ||
... | |||
) |
Function computing the intersection of a set of annotations.
bft | is the BFT from which the input annotations originate. |
nb_annotations | indicates how many annotations must be included in the intersection. |
... | is a list of nb_annotations BFT_annotation pointers of which the intersection is computed. |
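The calling convention, a count followed by that many operands, can be sketched on simplified color sets. The code below assumes a hypothetical `demo_colors` bitmask (bit i stands for genome i) in place of BFT_annotation pointers, and `demo_intersection` and `demo_union` are invented names. It shows the variadic pattern only, not how the library combines its own annotations.

```c
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical color set: bit i is set when genome i contains the k-mer. */
typedef uint64_t demo_colors;

/* Intersection of nb_sets color sets passed as extra arguments.
 * Every extra argument must have type demo_colors exactly. */
static demo_colors demo_intersection(uint32_t nb_sets, ...)
{
    va_list args;
    demo_colors result = 0;

    va_start(args, nb_sets);
    for (uint32_t i = 0; i < nb_sets; i++) {
        demo_colors set = va_arg(args, demo_colors);
        result = (i == 0) ? set : (result & set);
    }
    va_end(args);
    return result;
}

/* Union of nb_sets color sets passed as extra arguments. */
static demo_colors demo_union(uint32_t nb_sets, ...)
{
    va_list args;
    demo_colors result = 0;

    va_start(args, nb_sets);
    for (uint32_t i = 0; i < nb_sets; i++)
        result |= va_arg(args, demo_colors);
    va_end(args);
    return result;
}
```

As with any variadic function, the count must match the number of operands actually passed. Passing fewer operands than announced is undefined behavior, which is worth keeping in mind when calling the annotation functions of this interface.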
bool is_kmer_in_cdbg | ( | BFT_kmer * | bft_kmer | ) |
Function testing if a k-mer is in a BFT.
bft_kmer | is a k-mer obtained via search or iteration over a BFT (via get_kmer() for example). |
void iterate_over_kmers | ( | BFT * | bft, |
BFT_func_ptr | f, | ||
... | |||
) |
Function iterating over the k-mers of a BFT.
bft | is a BFT containing the k-mers to iterate over. |
f | is a pointer to a function that will be called on each k-mer. If f returns 0, the iteration stops and iterate_over_kmers() returns. |
... | are the additional arguments that must be transmitted to f. They can be extracted in f via its parameter of type va_list. |
BFT* load_BFT | ( | char * | filename_and_path | ) |
Function loading a BFT from disk.
filename_and_path | is the path and name of the file in which the BFT to load is stored. |
bool prefix_matching | ( | BFT * | bft, |
char * | prefix, | ||
BFT_func_ptr | f, | ||
... | |||
) |
Function for prefix matching over the k-mers of a BFT.
bft | is a BFT containing the k-mers to match. |
prefix | is a string containing the prefix the k-mers must match. |
f | is a pointer to a function that will be called on each k-mer matching the prefix. If f returns 0, the matching stops and prefix_matching() returns. |
... | are the additional arguments that must be transmitted to f. They can be extracted in f via its parameter of type va_list. |
bool presence_genome | ( | uint32_t | id_genome, |
BFT_annotation * | bft_annot, | ||
BFT * | bft | ||
) |
Function testing if a k-mer occurred in a genome.
id_genome | is the genome identifier. |
bft_annot | is the annotation of the k-mer whose presence in the genome is tested. |
bft | is the BFT in which the k-mer is stored. |
uint32_t* query_sequence | ( | BFT * | bft, |
char * | sequence, | ||
double | threshold, | ||
bool | canonical_search | ||
) |
Function querying a BFT for a sequence.
bft | is a BFT to be queried. |
sequence | is a string to query. |
threshold | is a value (0 < threshold <= 1) giving the minimum fraction of k-mers from the queried sequence that must be present in a genome for the queried sequence to be reported as present in this genome. |
canonical_search | is a boolean indicating if the searched k-mers of the queried sequence must be canonical (lexicographically smaller one between a k-mer and its reverse-complement) or not. |
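Two notions in these parameters lend themselves to a short sketch: the canonical form of a k-mer, and the threshold test. `canonical_kmer` and `reaches_threshold` are hypothetical helpers written for illustration. In particular, the comparison used in `reaches_threshold` (greater than or equal, no rounding) is an assumption, since this file does not specify how query_sequence() evaluates the threshold.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Watson-Crick complement; other characters are returned unchanged. */
static char complement(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return c;
    }
}

/* Writes into out (k + 1 bytes, not overlapping kmer) the canonical form
 * of kmer: the lexicographically smaller of the k-mer and its
 * reverse-complement. */
static void canonical_kmer(const char *kmer, size_t k, char *out)
{
    for (size_t i = 0; i < k; i++)
        out[i] = complement(kmer[k - 1 - i]); /* reverse-complement */
    out[k] = '\0';
    if (strncmp(kmer, out, k) <= 0)
        memcpy(out, kmer, k); /* the k-mer itself is the smaller one */
}

/* Assumed reading of the threshold: a genome is reported when it contains
 * at least threshold * nb_queried of the queried k-mers. */
static bool reaches_threshold(size_t nb_found, size_t nb_queried,
                              double threshold)
{
    if (nb_queried == 0)
        return false;
    return (double)nb_found >= threshold * (double)nb_queried;
}
```

For example, the reverse-complement of TTGA is TCAA, which is lexicographically smaller, so TCAA is the canonical form of both.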
void set_flag_kmer | ( | uint8_t | flag, |
BFT_kmer * | bft_kmer, | ||
BFT * | bft | ||
) |
Function marking a k-mer of a BFT with a flag.
flag | is the mark to add to a k-mer. It can have value 0, 1, 2 or 3. |
bft_kmer | is a k-mer, obtained via search/iteration over a BFT, that must be marked. |
bft | is a BFT locked for vertices marking. |
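A flag restricted to the values 0 to 3 fits in two bits. The sketch below shows one generic way to keep such flags, four per byte, indexed by a vertex number. It is purely illustrative: `demo_set_flag` and `demo_get_flag` are hypothetical, and how the BFT actually stores flags while the graph is locked for marking is not described in this file.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores flag (0..3) for vertex index i; flags holds 4 entries per byte.
 * A zero-initialized array gives every vertex the flag 0. */
static void demo_set_flag(uint8_t *flags, size_t i, uint8_t flag)
{
    unsigned shift = 2u * (unsigned)(i % 4);
    unsigned cleared = flags[i / 4] & ~(3u << shift);

    flags[i / 4] = (uint8_t)(cleared | ((flag & 3u) << shift));
}

/* Reads back the flag of vertex index i. */
static uint8_t demo_get_flag(const uint8_t *flags, size_t i)
{
    unsigned shift = 2u * (unsigned)(i % 4);

    return (uint8_t)((flags[i / 4] >> shift) & 3u);
}
```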
void set_marking | ( | BFT * | bft | ) |
Function locking and preparing the graph for vertices marking (no insertion can happen before unlocking).
By default, all k-mers of the graph are initialized with a 0 flag value.
bft | is a BFT to lock and prepare for vertices marking. |
void set_neighbors_traversal | ( | BFT * | bft | ) |
Function locking the graph for traversal.
Locking the graph for traversal is not required, but traversing a locked graph is faster than traversing an unlocked one. No insertion can happen while the graph is locked.
bft | is a BFT to lock for traversal. |
BFT_annotation* sym_difference_annotations | ( | BFT * | bft, |
uint32_t | nb_annotations, | ||
... | |||
) |
Function computing the symmetric difference of a set of annotations.
bft | is the BFT from which the input annotations originate. |
nb_annotations | indicates how many annotations must be included in the symmetric difference. |
... | is a list of nb_annotations BFT_annotation pointers of which the symmetric difference is computed. |
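For more than two sets, the symmetric difference is usually read as a left-to-right fold of the binary operation, which keeps exactly the elements present in an odd number of sets. The sketch below shows that reading on hypothetical 64-bit color sets (bit i stands for genome i). `demo_sym_difference` is an invented name, and whether sym_difference_annotations() follows the same convention for nb_annotations greater than 2 is an assumption, not something this file states.

```c
#include <stddef.h>
#include <stdint.h>

/* Folds XOR over nb_sets color sets: a genome stays in the result when it
 * appears in an odd number of the input sets. */
static uint64_t demo_sym_difference(const uint64_t *sets, size_t nb_sets)
{
    uint64_t result = 0;

    for (size_t i = 0; i < nb_sets; i++)
        result ^= sets[i];
    return result;
}
```

With the sets {0, 1}, {1, 2} and {2}, genomes 1 and 2 each appear twice and drop out, so only genome 0 remains.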
BFT_annotation* union_annotations | ( | BFT * | bft, |
uint32_t | nb_annotations, | ||
... | |||
) |
Function computing the union of a set of annotations.
bft | is the BFT from which the input annotations originate. |
nb_annotations | indicates how many annotations must be included in the union. |
... | is a list of nb_annotations BFT_annotation pointers of which the union is computed. |
void unset_marking | ( | BFT * | bft | ) |
Function unlocking a graph locked for vertices marking.
bft | is a BFT locked for vertices marking. |
void unset_neighbors_traversal | ( | BFT * | bft | ) |
Function unlocking a graph locked for traversal.
bft | is a BFT locked for traversal that must be unlocked. |
void v_iterate_over_kmers | ( | BFT * | bft, |
BFT_func_ptr | f, | ||
va_list | args | ||
) |
Function iterating over the k-mers of a BFT.
This function should be used only when called from a function with a variable number of arguments. Otherwise, use iterate_over_kmers().
bft | is a BFT containing the k-mers to iterate over. |
f | is a pointer to a function that will be called on each k-mer. If f returns 0, the iteration stops and v_iterate_over_kmers() returns. |
args | should contain all additional arguments to pass to f. They can be extracted in f via its parameter of type va_list. |
void write_BFT | ( | BFT * | bft, |
char * | filename, | ||
bool | compress_annotations | ||
) |
Function writing a BFT to disk.
bft | is the BFT to write on disk. |
filename | is the name of the file in which bft will be written. |
compress_annotations | is a boolean indicating if the annotations of the BFT must be compressed before writing to disk. |
size_t write_kmer_ascii_to_disk | ( | BFT_kmer * | bft_kmer, |
BFT * | bft, | ||
va_list | args | ||
) |
Function writing an ASCII k-mer in a file.
This function is of type BFT_func_ptr and is intended to be a parameter of iterate_over_kmers() or v_iterate_over_kmers().
bft_kmer | is a k-mer to write to disk. |
bft | is a BFT from which bft_kmer was extracted. |
args | is a variable list of arguments. It contains a pointer to the file to which bft_kmer is written. |
size_t write_kmer_comp_to_disk | ( | BFT_kmer * | bft_kmer, |
BFT * | bft, | ||
va_list | args | ||
) |
Function writing a 2-bit encoded k-mer in a file.
This function is of type BFT_func_ptr and is intended to be a parameter of iterate_over_kmers() or v_iterate_over_kmers().
bft_kmer | is a k-mer to write to disk. |
bft | is a BFT from which bft_kmer was extracted. |
args | is a variable list of arguments. It contains a pointer to the file to which bft_kmer is written. |