Dongqing Yang1, Christian P. Kubicek2, Youzhi Miao1, Igor Grigoriev3, Alexey Kopchinskiy2, Jian Zhang1, Lea Atanasova2, Ruifu Zhang1, Qirong Shen1, Irina S. Druzhinina2
1Jiangsu Key Lab for Organic Waste Utilization and National Engineering Research Center for Organic-based Fertilizers, Nanjing Agricultural University, Nanjing, China
2Microbiology Group, Research Area Biotechnology and Microbiology, Institute of Chemical Engineering, Vienna University of Technology, Vienna, Austria
3JGI, Walnut Creek, CA, USA
email: dongqingyang7@gmail.com, miaoyouzhi@gmail.com, irina.druzhinina@tuwien.ac.at
trichoCODE is a modular genome annotation pipeline that includes a high-quality training set for ab initio gene calling in Trichoderma. It is based on the fungal genome annotation protocol of the Broad Institute and comprises four stages: (i) preparation of the training set; (ii) training and prediction; (iii) combination and fusion of multiple outputs; and (iv) updating of gene structures with UTRs (if EST/RNA-Seq data are provided). The novelty of the pipeline lies in its semi-automated design: it allows the user to set up the best possible training set and to evaluate the annotation at several checkpoints.
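To illustrate what stage (iii) involves, the sketch below fuses the outputs of several ab initio predictors by simple majority vote. This is a hypothetical, simplified stand-in, not trichoCODE's own combiner: it assumes each gene model is reduced to a tuple of contig, strand and exon coordinates, and it keeps only structures predicted identically by at least `min_support` tools. The predictor names and the `fuse_by_vote` function are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical gene model representation: (contig, strand, ((exon_start, exon_end), ...))
GeneModel = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def fuse_by_vote(predictions: Dict[str, List[GeneModel]], min_support: int = 2) -> List[GeneModel]:
    """Keep gene structures predicted identically by at least `min_support` predictors."""
    votes: Counter = Counter()
    for models in predictions.values():
        # set() ensures one predictor cannot vote twice for the same structure
        votes.update(set(models))
    fused = [model for model, n in votes.items() if n >= min_support]
    # Sort by contig, then by the start of the first exon, for stable output
    return sorted(fused, key=lambda m: (m[0], m[2][0][0]))


predictions = {
    "predictor_a": [("scaffold_1", "+", ((100, 400), (500, 900)))],
    "predictor_b": [
        ("scaffold_1", "+", ((100, 400), (500, 900))),
        ("scaffold_2", "-", ((50, 300),)),
    ],
    "predictor_c": [("scaffold_2", "-", ((60, 300),))],
}
print(fuse_by_vote(predictions))
```

Only the two-exon model on `scaffold_1` is kept, because the two `scaffold_2` predictions disagree on the exon start. Production combiners usually weight evidence sources and can merge partially agreeing structures rather than requiring exact matches.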
Comparison of the annotations of the standard Trichoderma genomes produced by the JGI pipeline, MAKER2 and trichoCODE revealed that the JGI pipeline predicted the largest number of gene models, MAKER2 was the most conservative, and trichoCODE fell in between. Nevertheless, trichoCODE called 607 new gene models in T. reesei, 506 in T. virens and 468 in T. atroviride.
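Counting "new" gene models requires a rule for deciding when a predicted locus is absent from an existing annotation. The sketch below assumes one such rule, which the abstract does not specify: a trichoCODE locus counts as new if it overlaps no reference locus on the same contig and strand. The `Locus` tuple layout and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical locus representation: (contig, strand, start, end), 1-based inclusive
Locus = Tuple[str, str, int, int]


def overlaps(a: Locus, b: Locus) -> bool:
    """True if two loci share contig and strand and at least one base."""
    return a[0] == b[0] and a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]


def new_models(candidate: List[Locus], reference: List[Locus]) -> List[Locus]:
    """Return candidate loci that overlap no reference locus (naive O(n*m) scan)."""
    return [c for c in candidate if not any(overlaps(c, r) for r in reference)]


reference = [("chr1", "+", 100, 900)]
candidate = [
    ("chr1", "+", 850, 1200),  # overlaps the reference locus, so not new
    ("chr1", "-", 100, 900),   # opposite strand, counted as new
    ("chr2", "+", 10, 50),     # different contig, counted as new
]
print(len(new_models(candidate, reference)))
```

For whole genomes, an interval tree or sorted sweep would replace the quadratic scan, and a stricter definition (for example, requiring no shared exons) would yield different counts.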