Golden is a suite of tools developed at the Institut Pasteur by Nicolas Joly a few years ago.
It contains 2 tools:
goldin: for indexing databanks,
golden: for retrieving information.
New version of Golden dedicated to optimizing IO.
The golden tool is used to process blast results.
Until now, a single golden call could only search for 1 element in 1 databank.
The problem:
Users need to look up more and more elements in different databanks (golden commonly has to process 100 000 bank:AC requests).
With one golden call per request, this causes high IO traffic on the cluster, slowing down other applications.
Higher network throughput only solves the problem temporarily: databanks keep growing, so more and more requests will have to be processed.
To address this, the new version lets the user pass several requests as arguments in one golden call. Requests are processed databank by databank, which avoids repeatedly opening and closing the index files. Results are still returned to the user in the order in which the AC/locus were requested.
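The grouping idea can be illustrated with a short Python sketch. This is not golden's actual code: `parse_request`, `batch_lookup` and the `lookup_in_bank` callback are hypothetical names introduced only for the example, and the callback is assumed to open the index of one databank once and return the entries it found.

```python
from collections import defaultdict


def parse_request(request):
    """Split a 'bank:AC' request into (bank, accession)."""
    bank, sep, accession = request.partition(":")
    if not sep or not bank or not accession:
        raise ValueError(f"malformed request: {request!r}")
    return bank, accession


def batch_lookup(requests, lookup_in_bank):
    """Resolve requests databank by databank, keeping the request order.

    lookup_in_bank(bank, accessions) is a hypothetical callback: it is
    assumed to open the index of `bank` once and to return a dict
    {accession: entry} for the accessions it found.
    """
    # Group requests by databank, remembering where each one came from.
    by_bank = defaultdict(list)  # bank -> [(position, accession), ...]
    for position, request in enumerate(requests):
        bank, accession = parse_request(request)
        by_bank[bank].append((position, accession))

    # One lookup per databank, however many requests target it.
    results = [None] * len(requests)
    for bank, items in by_bank.items():
        found = lookup_in_bank(bank, [accession for _, accession in items])
        for position, accession in items:
            results[position] = found.get(accession)  # None if not found
    return results
```

With this layout, each databank index is opened once per call regardless of how many requests target it, and the saved positions put every result back in the slot of the request that asked for it.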
Where can I get the new distrib?
It will be available in git and installed on the cluster as soon as possible. A .tar.zip file containing the distrib is available as an attachment.
New version of goldin for databank indexing. Work in progress.
The goldin tool is used for indexing databanks.
It is used by biomaj.
It is a "sequential" tool and its code is "monolithic".
It doesn't support the FASTA format.
The problem:
As databanks are getting bigger and bigger, indexing them takes more and more time.
Since biomaj was deployed, an up-to-date version of each bank must be provided to the end user every week.
Some databanks are so big that indexing them takes more than 1 week (e.g. wgs).
To address this, the new version aims to parallelize several steps of the indexing.
It also aims to make goldin a modular tool, so that the indexing steps can be performed individually and in parallel.
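As an illustration only, here is one way such a split could look in Python. It does not describe goldin's real design: the function names are hypothetical, the flat files are assumed to use EMBL/Swiss-Prot style records (an `AC   ` line per entry, `//` as terminator), and a thread pool stands in for whatever parallel mechanism is eventually chosen.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def index_file(path):
    """Step 1 (one file): return a sorted list of (accession, path, offset).

    Assumes EMBL/Swiss-Prot style flat files; only the first accession
    of each 'AC' line is kept, with the byte offset of its entry.
    """
    records = []
    entry_start = 0
    with open(path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b"AC   "):
                accession = line[5:].split(b";")[0].strip().decode("ascii")
                records.append((accession, path, entry_start))
            elif line.startswith(b"//"):
                # The next entry starts right after the terminator line.
                entry_start = handle.tell()
    records.sort()
    return records


def merge_indexes(partial_indexes):
    """Step 2: merge the sorted per-file indexes into one sorted index."""
    return list(heapq.merge(*partial_indexes))


def build_index(paths, max_workers=None):
    """Run step 1 on all files in parallel, then step 2 once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_indexes = list(pool.map(index_file, paths))
    return merge_indexes(partial_indexes)
```

Because `index_file` and `merge_indexes` are independent steps, each one can be run on its own (for example one job per file on the cluster), which is the kind of modularity described above. If parsing turns out to be CPU-bound rather than IO-bound, a process pool would be a more suitable choice than threads.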
Please keep in mind that this is still a work in progress.