Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds

This webpage provides improved and additional spaced seeds. For the main datasets and scripts used in the paper, please see the original index.html first.

These results were obtained with the help of the computing power of the FIL teaching computers; they would probably never have been obtained without the strong support provided by Mickael Carlier on BOINC.

6.bis Experiments

(a) Comparative analysis of Table 3

Before running the full parameter-free analysis (see part b), we spent one week optimizing the results given in Table 3 of the original article (itself taken from the rasbhari article), in order to test the computing capacity of the FIL BOINC cluster.

The cluster consists of about 120 to 373 (at peak) running computers (see the BOINC CPU stats). The iedera tool was run through the BOINC wrapper, together with an additional script layer and an ad hoc job creator/results merger, adapted to run, usually during day hours, on computers that are already started but unused (jobs are stopped when a user logs in).

Note that, in this part, iedera performed numerical (not symbolic) evaluations to speed up the search.

No animals (penguins, ants, ...) were harmed during this optimisation process ☺.

w   p     SpEED    AcoSeeD  FastHC   MuteHC   rasbhari  current sensitivity
SHRiMP2: 4 patterns (ℓ=50)
    0.80  94.9861  95.037   94.9453  95.0194  95.0386   95.0586
16  0.85  84.8212  84.9829  84.6558  84.8764  84.969    85.0422
18  0.85  73.1664  73.27    72.9558  -        73.2209   73.4258
    0.90  93.7120  93.7778  93.6030  -        93.78     93.8452
    0.95  99.7500  99.7599  99.7399  -        99.7557   99.7664
PatternHunterII: 16 patterns (ℓ=64)
11  0.70  93.2526  -        93.0585  -        93.4653   93.4105
    0.75  98.6882  -        98.6352  -        98.7573   98.7520
    0.80  99.8820  -        99.8750  -        99.8907   99.8876
BFAST: 10 patterns (ℓ=50)
22  0.85  60.8127  -        60.0943  -        60.9919   61.0746
    0.90  88.5969  -        88.0426  -        88.8005   88.9067
    0.95  99.3659  -        99.2923  -        99.4099   99.4413

(b) Full parameter-free analysis

Please first refer to the original 6. Experiments part of the main webpage and to the Experiments section of the article for more details. The results are summarized in this set of plots (pdf [sep 8, 2017] or html [sep 8, 2017]). Apart from the fact that the tools were run on a cluster, the main difference with the previous 6. Experiments lies in the parameters used:

  1. Seeds of weight w in [3..16] or in [18,20,22,24], span sw + 20, with a number of seed patterns n in [1..4], have been only locally optimized by the hill-climbing process of iedera,
  2. Alignment lengths in [w+1 .. 64] have also been fully enumerated.
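
The parameter space described by the two items above can be enumerated as in the following Python sketch. The names `WEIGHTS`, `SEED_COUNTS`, `MAX_LENGTH` and `parameter_grid` are hypothetical and are not part of iedera or of the cluster scripts; the span bound is deliberately left out, and the sketch assumes that every alignment length is combined with every (w, n) pair.

```python
from itertools import product

# Seed weights w: every value in [3..16], then 18, 20, 22 and 24.
WEIGHTS = list(range(3, 17)) + [18, 20, 22, 24]
# Number of seed patterns n in [1..4].
SEED_COUNTS = range(1, 5)
# Largest alignment length considered.
MAX_LENGTH = 64


def parameter_grid():
    """Yield every (w, n, length) triple with length in [w+1 .. 64]."""
    for w, n in product(WEIGHTS, SEED_COUNTS):
        for length in range(w + 1, MAX_LENGTH + 1):
            yield w, n, length
```

Under these assumptions, `list(parameter_grid())` contains 3740 triples, each of which would correspond to one seed optimization and sensitivity evaluation.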

See the final set of Pareto plots (pdf [sep 8, 2017] or html [sep 8, 2017]), data in txt format (tar.gz [sep 8, 2017]).

Tools :

Data & Results :