By SR 10/17/13 6:02 PM

Reading the genome, two years of hard work

A litter of lynx cubs from 2012.

The Lynxgenomics team opted to apply new sequencing technologies to assemble the Iberian lynx genome. Sequencing is the key step in building the genetic map of any living being, which consists of a sequence of letters (called bases) arranged in a specific order. There are so many letters to put in order, about 3 billion, and the genome is organised in such a complex way that conventional analysis cannot cope with it. The DNA must be cut into manageable pieces, and large computers running complex algorithms must then analyse those pieces and put each one in its right place.
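To make the "cut and reassemble" idea concrete, here is a toy sketch in Python. It is not the project's pipeline, which relied on far more sophisticated assemblers. It cuts a short made-up sequence into overlapping reads and then greedily merges the two pieces with the longest suffix-prefix overlap until no overlaps remain. The sequence, the function names `shotgun`, `overlap` and `greedy_assemble`, and all parameters are hypothetical; real reads start at random positions, contain errors and come from both DNA strands, all of which this sketch ignores.

```python
def shotgun(genome: str, read_len: int, step: int) -> list[str]:
    """Cut a sequence into overlapping reads (evenly spaced here for simplicity;
    real sequencing samples positions at random)."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`,
    provided it is at least `min_len` long; otherwise 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the suffix in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Toy greedy assembler: repeatedly merge the two pieces with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads contained inside another read: they add no new information.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:  # no overlaps left: return the pieces we have
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


GENOME = "ATGCGTACCTTAGGCAAT"  # made-up 18-base sequence
reads = shotgun(GENOME, read_len=8, step=5)
print(reads)                                      # ['ATGCGTAC', 'TACCTTAG', 'TAGGCAAT']
print(greedy_assemble(reads[::-1], min_len=3))    # ['ATGCGTACCTTAGGCAAT']
```

The reads are passed in reverse order to show that the input order does not matter here. This only works because no short stretch of the toy sequence occurs twice; if a stretch longer than a read were repeated elsewhere, overlaps alone could not tell which copy a read came from. That is the repeat problem that the paired reads described below help to solve.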

Scientists face the challenge of assembling a huge puzzle, or of reconstructing a novel whose pages have been through a paper shredder. The task is one of great complexity and a high degree of difficulty. To achieve this goal, the Lynxgenomics team decided to use the two benchmark sequencing technologies on the market, Illumina and Roche 454, combined with two complementary strategies.

The first strategy consists of generating millions of pairs of 100-base-pair reads, with the two reads of each pair separated by a predefined distance and the pairs randomly distributed throughout the genome. In the novel analogy, knowing the distance between two sentences lets us start arranging sentences into larger paragraphs, even if the same sentence appears several times in the novel. The project has used distances of 500, 4,000, 5,200 and 40,000 bases, which helps to deal with repeats of different sizes. The reads for the longest distances were obtained from the ends of clones in libraries produced by José Luis Garcia at CIB-CSIC in Madrid.
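A minimal sketch of how such read pairs can be used, under simplifying assumptions of our own: the "distance" is taken to be the exact total length of the DNA fragment, both reads lie on the same strand, and every library uses 100-base reads. If one read of a pair falls near the end of an assembled piece A and its mate near the start of piece B, the known distance tells us that B follows A and roughly how large the gap between them is. The second function shows why several distances help: a pair can only bridge a repeat if both reads sit in the unique sequence on either side, so long repeats need the long-distance libraries. The function names and the example positions are hypothetical.

```python
# Distances between paired reads used by the project, in bases
# (treated here as total fragment lengths, a simplifying assumption).
INSERT_SIZES = (500, 4000, 5200, 40000)


def estimate_gap(len_a: int, pairs: list[tuple[int, int]], read_len: int, insert_size: int) -> float:
    """Estimate the unknown gap between the end of piece A and the start of piece B.

    Each pair is (pos_in_a, pos_in_b): where the first read starts in A and where
    its mate starts in B. With an exact fragment length,
        insert_size = (len_a - pos_in_a) + gap + (pos_in_b + read_len),
    so each pair gives one gap estimate; we return their average.
    """
    gaps = [insert_size - (len_a - pa) - (pb + read_len) for pa, pb in pairs]
    return sum(gaps) / len(gaps)


def libraries_spanning(repeat_len: int, read_len: int = 100) -> list[int]:
    """Distances long enough for a pair to have both reads outside a repeat
    of length `repeat_len` (one read on each side)."""
    return [d for d in INSERT_SIZES if d >= repeat_len + 2 * read_len]


pairs = [(700, 60), (650, 0), (700, 40)]  # hypothetical read positions
print(estimate_gap(1000, pairs, read_len=100, insert_size=500))  # 50.0
print(libraries_spanning(3000))  # [4000, 5200, 40000]
```

A negative estimate would suggest that A and B actually overlap. Real scaffolders also have to cope with a spread of fragment lengths, reads on opposite strands and pairs that map to the wrong copy of a repeat.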

The second strategy, according to Lynxgenomics coordinator José Antonio Godoy, is very different from the first because "in the first one the computer had a huge number of small elements, fragments of sentences, that could go anywhere in our novel, whereas now they come from only 1,200 pages taken at random, so the complexity of the problem is reduced by orders of magnitude. In a first stage we would rebuild dozens of batches of 1,200 pages, and then, in a second phase, try to reconstruct the order of the pages."
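Why working batch by batch shrinks the problem can be illustrated with a rough count. This is our own simplification, not the project's actual method: it assumes a naive assembler that checks every read against every other read, and reads split evenly into independent pools. Splitting the reads into k pools then cuts the number of checks by roughly a factor of k. The read count and the number of pools below are hypothetical.

```python
def pairwise_comparisons(n_reads: int, n_pools: int = 1) -> int:
    """Ordered read-vs-read overlap checks when reads are only compared within
    their own pool (assumes n_reads splits evenly; n_pools=1 means all vs all)."""
    per_pool = n_reads // n_pools
    return n_pools * per_pool * (per_pool - 1)


n = 1_000_000                          # hypothetical number of reads
whole = pairwise_comparisons(n)        # 999,999,000,000 checks
pooled = pairwise_comparisons(n, 100)  # 9,999,000,000 checks
print(f"{whole / pooled:.1f}x fewer comparisons")  # 100.0x fewer comparisons
```

Pooling also means that a read rarely meets reads from other copies of the same repeat, because those copies usually sit in different batches. Putting the rebuilt batches in order is left to the second phase.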

All of these operations took two years of hard work, which culminated at the end of 2012. With all this information at hand, there remains the difficult task of assembling the whole puzzle, a process that began in early 2013.

Once the genome assembly is complete, we will have a 'blank' map of the genome, with no legend. The last phase will be to add the street names, that is, to 'annotate' the genome by adding labels that show the location of genes and other functional and structural elements within this complex genome map.
