For MSR-CA, our recipes were as follows. Running the assembler amounts to specifying, in the configuration file, the locations of the input files for each input data type, such as short paired-end Illumina reads and jump library mate pairs. (A brief manual for the MSR-CA assembler is available here.) Running runSRCA.pl from the assembler bin directory, with the configuration file as the only command-line parameter, produces the assemble.sh script, which contains the complete set of commands to run the assembly. The core assembly engine for MSR-CA is the Celera Assembler (CABOG). CABOG runs in the CA folder in the assembly directory. The final products of the assembly, such as the contig and scaffold FASTA files, along with additional assembly information, can be found under CA/9-terminator/.
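The two-step workflow described above can be sketched as a small Python wrapper. This is only an illustration: the function names, the example paths, and the assumption that assemble.sh is written to the working directory are ours, not part of MSR-CA.

```python
from pathlib import Path
import subprocess


def build_commands(bin_dir, config_file):
    """Return the two commands of the workflow without running them."""
    run_srca = str(Path(bin_dir) / "runSRCA.pl")
    return [
        # Step 1: runSRCA.pl takes the configuration file as its only argument
        # and generates assemble.sh.
        [run_srca, str(config_file)],
        # Step 2: run the generated script (assumed to be in the working directory).
        ["bash", "assemble.sh"],
    ]


def run_assembly(bin_dir, config_file, work_dir="."):
    """Run both steps in work_dir and return the expected output directory."""
    for cmd in build_commands(bin_dir, config_file):
        subprocess.run(cmd, cwd=work_dir, check=True)
    # Contig and scaffold FASTA files are expected under CA/9-terminator/.
    return Path(work_dir) / "CA" / "9-terminator"
```

Calling run_assembly("/path/to/MSR-CA/bin", "config.txt") would run both steps and return the path where the final contigs and scaffolds should appear; both arguments here are placeholders.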
We specified the following parameters in the config file:
CA_PARAMETERS= cgwErrorRate=0.25 merylMemory=8192 ovlMemory=4GB
EXTEND_JUMP_READS=1
The EXTEND_JUMP_READS parameter triggered an additional step that extends the jump library reads at their 3’ ends to make them compatible with CA. The original jump library reads were 37 bases long, whereas CA requires reads to be at least 64 bases long. We also used only part of the jump library reads: the first 400,000 reads.
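Taking the first 400,000 reads of a library can be done with any tool that copies the leading records of a FASTQ file. A minimal sketch follows, assuming the reads are stored in FASTQ with exactly four lines per record; the function name and file names are hypothetical, and this is not necessarily how we produced the subset.

```python
from itertools import islice


def head_fastq(src_path, dst_path, n_reads=400_000):
    """Copy the first n_reads records of a 4-line-per-record FASTQ file.

    Returns the number of complete records written.
    """
    lines_written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        # Each FASTQ record is header, sequence, '+', quality: 4 lines.
        for line in islice(src, 4 * n_reads):
            dst.write(line)
            lines_written += 1
    return lines_written // 4
```

If the two mates of each pair are kept in separate files in the same order, applying the function with the same n_reads to both files keeps the mates in sync.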
We specified the following parameters to CA in the config file:
CA_PARAMETERS= ovlMerSize=30 cgwErrorRate=0.25 merylMemory=8192 ovlMemory=4GB
We kept all other parameters at their default values. We also used only part of the jump library reads, the first 400,000 reads, because CA is not designed to handle data sets with over 100x clone coverage.
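Clone coverage here can be estimated as the number of mate pairs times the mean clone (insert) length, divided by the genome size. The sketch below shows that arithmetic; the function names are ours, and the numbers in the usage note are purely illustrative rather than properties of the data sets discussed here.

```python
def clone_coverage(n_pairs, mean_insert_size, genome_size):
    """Clone coverage = total length spanned by the clones / genome size."""
    return n_pairs * mean_insert_size / genome_size


def max_pairs_for_coverage(target_coverage, mean_insert_size, genome_size):
    """Largest number of mate pairs that keeps clone coverage at or below the target."""
    return int(target_coverage * genome_size // mean_insert_size)
```

For example, with a hypothetical 3,000,000-base genome and a 3,000-base mean insert size, max_pairs_for_coverage(100, 3000, 3_000_000) gives the number of pairs that stays within a 100x clone coverage limit.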
Human Chromosome 14:
We specified the following parameters in the config file:
CA_PARAMETERS= cgwErrorRate=0.25 merylMemory=8192 ovlMemory=4GB utgErrorRate=0.03
We also manually reverse complemented the long jump library before the assembly, because the MSR-CA assembler assumes that jump libraries are “outties”, that is, the 3’ ends of the mated reads are on the fragment ends. We used all reads.
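Reverse complementing a library means replacing every read with its reverse complement and, for FASTQ input, reversing the quality string so that each score stays attached to its base. A minimal sketch, assuming FASTQ records and the A/C/G/T/N alphabet; the function names are hypothetical and this is not the exact script we used.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_record(header, seq, plus, qual):
    """Reverse complement one FASTQ record, reversing qualities to match."""
    return header, reverse_complement(seq), plus, qual[::-1]
```

Applying reverse_complement_record to every read of both mates flips the orientation of the pairs, which is the conversion described above.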
We specified the following parameters in the config file:
CA_PARAMETERS= cgwErrorRate=0.15 merylMemory=8192 ovlMemory=4GB
We used all reads and kept all other parameters at their default values.