10/27/14 3:41 PM

Lynx genome project reaches a new milestone, identifying 20,000 genes, and enters its final phase

José Antonio Godoy, of the Estación Biológica de Doñana.

LynxGenomics, the Iberian lynx genome project, has reached a new milestone with the identification of about 20,000 genes in the feline's DNA, a number similar to that of other mammal species. This is the culmination of the annotation process and leads to the final phase, interpretation, in which scientists make comparisons with other genomes, investigate the species' history and draw the first conclusions.

Q: You are now facing the 'last mile' in this project. How did it start?

José A. Godoy: It has been a long process. The first idea dates back to 2008, and it was perhaps a crazy idea back then: scientists had started talking about new methods to sequence whole genomes, but not a single genome had yet been sequenced relying on these new technologies. So it was rather daring to propose sequencing a genome from scratch, and even more so for a species like the Iberian lynx, which has no economic interest in itself. We didn't get funding.

Q: Did it take you a lot of time to get funding?

José A. Godoy: Yes. In fact, it was hard to fit a proposal combining genomics and conservation into a single call, so we focussed on trying to get private funding. We didn't succeed until Fundación General CSIC created a call specifically targeting endangered species, with a focus on the use of new technologies, so our project fitted perfectly. We got funds in 2010, and that was how the project started. The funds were limited, but we thought it was possible, although we still had to prove it, to get a reliable, high-quality draft genome.

Q: Did you reach your goal?

José A. Godoy: Well, it has been a long process, but after three years we came up with a first draft, the first assembly of the Iberian lynx genome, based exclusively on new technologies and with a very limited budget. It was a challenge; it took us several years of hard work, trying different approaches to improve the quality of the draft. But we are finally satisfied with the outcome; it can be considered a high-quality draft.

Q: This is an entirely Spanish project, which is unusual in genome research.

José A. Godoy: We designed the project to be performed within Spain. There were a few teams that had already participated in the sequencing and annotation of the human genome and had a good background and wide expertise in bioinformatics. We relied on the CRG Institute, the CNAG, the CIB and the EBD. Later we were able to attract other groups, and the consortium has expanded.

Q: With the team assembled, you needed to pick a lynx to start the project. How did you do it?

José A. Godoy: We used several criteria to select the lynx. One was that it had to be a male, which was a brave option. Other projects have preferentially used females, because they have two X chromosomes, while males have an X and a Y. Having an extra chromosome makes the assembly of the sex chromosomes especially complicated, but we opted to include this extra information.

In addition, it had to be an individual with a medium level of inbreeding. The more inbred the individual, the easier the assembly would be, but we opted for a middle level to represent more genetic variation within one individual.

The other thing we took into account is that it is better to have an animal already available for extra sampling and for extra information (about behaviour, illness, etc.). That's why we opted for a captive individual.

Candiles, a six-year-old male from the Captive Program in Jaén, was the one that satisfied these criteria, so he was selected to provide the reference genome.

But from the start, one of the main aims of the project was to describe the genetic variation within the species, and a single individual is not enough to provide a good picture of that variation.

So we proposed to sequence 10 additional individuals, selected on the same criteria as Candiles: males and, preferentially, captive individuals. It was important that those 10 lynxes represented the two remaining lynx populations in order to obtain a global perspective of the variation in the species, because we know that the two populations are genetically different.

Additionally, we sequenced one Eurasian lynx, the sister species of the Iberian lynx, to investigate the relationship between the two species, their origin and why they became different. It will also provide a reference point to see what is unique in the Iberian lynx genome.

We used blood samples to obtain the required DNA. The CNAG took on all the responsibility for generating the data, and we took advantage of the new sequencing technologies, machines with huge capabilities. So the next step was producing the raw data. That was the easy part: taking the animal, getting the sample and sequencing its DNA.

Q: What was the most challenging part?

José A. Godoy: The most challenging part by far was trying to assemble that information. The new technologies are cheap and they produce a lot of data, but the data come in a form that is not very convenient: short stretches, called reads, of only one hundred bases each. We generated millions and millions of reads from Candiles. That leaves you with the problem of assembling those reads, arranging them in order to reconstruct the contiguous sequence along the genome.
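To make the idea of arranging reads in order concrete, here is a toy sketch in Python that repeatedly merges the two reads with the longest suffix-to-prefix overlap. It is only an illustration of the general principle: the function names, the example sequence and the minimum overlap of 3 bases are all made up here, and real assemblers (including whatever the project used) rely on far more sophisticated methods that cope with sequencing errors, repeats and billions of bases.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by best overlap until no overlaps remain.

    Toy version: ignores sequencing errors, repeats and reads that are
    fully contained in other reads.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; several pieces remain
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical 14-base "genome" cut into three overlapping reads.
toy_reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
print(greedy_assemble(toy_reads))  # ['ATGCGTACGTTAGC']
```

With only three short reads the puzzle is trivial; the difficulty Godoy describes comes from doing this with hundred-base reads against a genome of thousands of millions of bases.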

Q: Is the lynx genome very big?

José A. Godoy: As big as the human genome. We are talking about 3,000 million bases (letters) in the genome, and you only have stretches of one hundred letters each, so putting them in order is a real challenge.
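The scale of the problem can be put into numbers with a short calculation. The genome size and read length below are the figures quoted in the interview; the sequencing depth (how many times, on average, each base is covered) is not given in the text, so the value of 30 is purely a hypothetical illustration.

```python
GENOME_SIZE = 3_000_000_000  # bases, as quoted in the interview
READ_LENGTH = 100            # bases per read, as quoted in the interview


def reads_needed(genome_size, read_length, depth=1):
    """Minimum number of reads to cover the genome `depth` times on average."""
    total_bases = genome_size * depth
    return (total_bases + read_length - 1) // read_length  # ceiling division


# Covering every base exactly once, with perfectly placed reads:
print(reads_needed(GENOME_SIZE, READ_LENGTH))            # 30000000

# At a hypothetical average depth of 30 (an assumption, not a project figure):
print(reads_needed(GENOME_SIZE, READ_LENGTH, depth=30))  # 900000000
```

Even the idealised single-coverage case already needs 30 million reads, which is consistent with the "millions and millions of reads" mentioned above.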

It took us a couple of years or more to generate the first draft of the lynx genome with the quality we demanded. In the process we produced more than 20 versions of the genome, but we were not satisfied until version 23, which met our quality standards. It required a lot of time, a lot of effort and a lot of imagination. The people at CNAG involved in the assembly did very nice work.

Q: How did they do it?

José A. Godoy: In the process of assembling the genome, we came up with novel ways of approaching the problem. We had short sequences, and we needed various libraries, which are collections of DNA fragments of different sizes, to guide us in fitting the puzzle together. That approach was used by other ongoing projects at the time. We used it too, but we didn't have the budget for many different library sizes, so we opted for a different approach that proved convenient and cost-effective.

The most useful approach for the Iberian lynx was the use of collections of clones (fragments of DNA preserved in bacteria) to reduce the complexity of the task. We sequenced separately different pools containing many of those fragments, reducing the task to the assembly of each one of those pools instead of targeting the whole genome at once.

Another novelty that also proved very useful was to use RNA sequences to assemble the genome. We took advantage of the fact that the genes within the genome are interrupted. I mean, coding sequences (exons) are interrupted by non-coding sequences (introns), and we can estimate the distance that separates two contiguous coding sequences, i.e. the corresponding intron length. We used the length observed in cats, after showing that it was highly correlated with that observed in lynx.
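As a rough illustration of the intron-based estimate: if RNA data show that two consecutive exons of the same gene lie on two different assembled pieces, the expected intron length (taken from the cat, as described in the interview) suggests how far apart those pieces sit. The function name, its parameters and the numbers below are hypothetical; this is a sketch of the idea under simple assumptions, not the project's actual pipeline.

```python
def estimate_gap(intron_length, exon_a_to_piece_end, piece_start_to_exon_b):
    """Estimate the unassembled gap between two pieces linked by one intron.

    intron_length: expected intron length in bases (e.g. the cat value).
    exon_a_to_piece_end: intron bases already assembled after the first exon,
        up to the end of the first piece.
    piece_start_to_exon_b: intron bases already assembled at the start of the
        second piece, before the second exon.

    Returns the estimated number of missing bases, never below zero.
    """
    gap = intron_length - exon_a_to_piece_end - piece_start_to_exon_b
    return max(gap, 0)


# Hypothetical numbers: a 5,000-base intron in the cat, of which 1,200 and
# 800 bases are already present at the edges of the two lynx pieces.
print(estimate_gap(5000, 1200, 800))  # 3000
```

Because intron lengths in the two species are only correlated, not identical, an estimate like this is best read as approximate ordering and spacing information rather than an exact distance.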

Q: So, once you finished this task, you had the genome?

José A. Godoy: Yes, you have the genome, but it's useless, I have to say. What could you do with an encyclopedia of 3,000 million letters with no key to what it means? The next step was to put flags on that text pointing to where the important information is. Basically, we try to locate genes, the units that allow the synthesis of proteins or RNA.

Genes occupy only 30% of the genome, and the coding sequences, those that code for proteins, make up even less, only 1-2% of the genome. So finding where the genes are was also a challenge, and the CNAG and CRG teams did a great job.
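These percentages translate into absolute sizes with a little arithmetic. The sketch below uses only figures quoted in this article (a genome of 3,000 million bases, 30% genic, 1-2% coding, about 20,000 genes); the per-gene averages are a back-of-the-envelope simplification that assumes genes do not overlap, not a result reported by the project.

```python
GENOME_SIZE = 3_000_000_000  # bases, as quoted in the interview
GENE_COUNT = 20_000          # approximate gene count reported by the project

genic_bases = GENOME_SIZE * 30 // 100       # 30% of the genome
coding_bases_low = GENOME_SIZE * 1 // 100   # 1% of the genome
coding_bases_high = GENOME_SIZE * 2 // 100  # 2% of the genome

print(genic_bases)                          # 900000000
print(coding_bases_low, coding_bases_high)  # 30000000 60000000

# Rough averages per gene (simplification: genes assumed not to overlap).
print(genic_bases // GENE_COUNT)            # 45000
print(coding_bases_low // GENE_COUNT,
      coding_bases_high // GENE_COUNT)      # 1500 3000
```

In other words, the protein-coding text is a few tens of millions of letters scattered across thousands of millions, which is why locating it is a task in its own right.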

Q: How many genes did you find within the lynx genome?

José A. Godoy: The number of genes they came out with was around 20,000. This number is similar to that of other mammalian genomes and very close to the number of genes in the human genome. It should not be taken as a definitive number, because annotation is a dynamic process: it can be refined again and again as we learn more about genes.

Apart from genes, there are other functional and structural elements within the genome that have also been annotated: we know where transposable elements and different families of repeats are. RNAs that do not code for proteins have been specific targets for annotation as well, so we now have a reasonably complete picture of the lynx genome. We have a whole encyclopedia, but we also have chapters and some annotations that help us make some sense of this incredible amount of text.

The next step is to try to identify the genes and to find out what functions they are involved in. For this task we use information that is available from other deeply studied species, like the human. That's the third phase, which is called functional annotation.

This is an important milestone because we have now provided a high-quality reference genome, which is an incredible resource for learning about the species and for helping to conserve it.

Q: What do you want to learn about the Iberian lynx?

José A. Godoy: We have generated several resources. One is the reference genome; we also have gene expression data, because we have sequenced RNA, and variation data, because we have sequenced 10 additional lynxes at a lower depth. The teams are now extracting different information from these data on issues such as the evolution of the species, its unique adaptations, the impact of the decline on its genetic variation and the reconstruction of the historical demography of the lynx.

Comparative genomics is an important research area. How does the lynx genome compare to those of other species and, more importantly, what is unique in it: which genes have evolved in a different way? Those genes must be related to adaptations. For example, by comparing it to the Eurasian lynx, which lives in a different climate, maybe we could find the genes that needed to change for the species to become adapted to the warmer environment of the Iberian Peninsula.

Another goal is to analyse genetic variation to assess the impact of the decline that the species went through in the last century. It is also an open question whether the lynx has always been a species with a small population size, or whether it has gone through previous bottlenecks in its recent or deep past. We are most concerned about the amount of genetic variation lost and the inbreeding accumulated in the species. The capability to adapt to future environmental challenges depends heavily on the amount of genetic diversity remaining.
