Genomes are increasingly used as data in the study of the biology and evolution of organisms, and the number of genomes available in GenBank has doubled over the last few years as a consequence of advances in sequencing technology. The development of these technologies has popularized the field of genomics, primarily because of the substantial drop in the cost of sequencing.
Genome reconstruction is carried out in three steps: sequencing, assembly, and annotation, and several different approaches exist for each step. Genome assembly is the most computationally demanding and time-consuming step and is one of the main focuses of research in genomics. Three methods are currently available for genome assembly (mapping, de novo, and hybrid), and the choice of method depends on several main parameters, including the species under study, the coverage of the genomic data, the availability of a reference sequence, the number of samples, the availability of a computing server for running the analyses, etc. Consequently, every genome project is unique, and it is difficult to select a single method that will give the best results, especially when non-model species are studied.
Because of its distribution and proposed systematics, the chamois (Rupicapra spp.) is a good model for studying the effects of historical and evolutionary events. In this dissertation, several methods were used to assemble and annotate the mitochondrial and nuclear genomes of chamois, and the results obtained were compared. Based on comparisons of the results of the assembly methods for mitochondrial and nuclear DNA, the suitability of different genome assembly and annotation methods was assessed, the effect of using eight genomes of species related to chamois as references in the mapping method was compared, and phylogenetic relationships were reconstructed to better understand the relationships among the taxonomic units of the tribe Caprini and the genus Rupicapra. In addition, the accuracy of the newly assembled chamois genomes was tested by comparing isolated intron fragments with chamois intron sequences available in GenBank.
The results of this research will contribute to better knowledge of the diversity and evolution of the chamois genome and to the clarification of taxonomic relationships among subspecies, and the assembled genomes will provide a good reference basis for future population and genomic analyses of chamois and its relatives.
|Abstract (english)|| |
The genome is the collection of all biological information necessary for the functioning of an organism, and in humans and animals it consists of the mitochondrial genome (mtDNA) and the nuclear genome (nDNA). With the development of genomic technologies, genome data are increasingly being used to study the biology and evolution of organisms, as shown by the fact that the number of available genomes in GenBank has doubled in recent years. In addition, the development of these technologies has popularized the field of genomics, mainly because of the significant decrease in the cost of sequencing. Genomic analysis can be used to identify genes responsible for inherited diseases or adaptations to the environment, to study structural changes, to identify common conserved sites, to find genes specific to a group of organisms, etc. To perform any of these analyses, the genome of the species under study must first be reconstructed.
Reconstruction of a genome involves three steps: sequencing, assembly, and annotation. There are different approaches for each step, and the choice of method depends primarily on whether the reconstruction is of the mitochondrial or nuclear genome. Genome assembly is a computationally intensive and time-consuming step, and there are currently three available methods for genome assembly (mapping, de novo, and hybrid methods). The choice of method depends on several key parameters that include: the type of organism studied, the coverage of genomic data, the availability of reference sequences, the number of samples, the availability of a computer server for analysis, etc. Therefore, each genome project is unique and it is difficult to determine the method that will be most successful, especially when studying non-model species.
The chamois (Rupicapra spp.) is a good model for studying the effects of historical and evolutionary events because of its wide distribution and proposed systematics. In this dissertation, 12 completely sequenced chamois samples were used, and their genomes were reconstructed using different methods for assembling and annotating mtDNA and nDNA. Mapping and de novo methods were used for mtDNA assembly, while the GeSeq and MITOS annotators were used for annotation. All sequences obtained with both assembly methods were compared and validated with the BLAST web application. In this process, each sequence was compared to the chamois mtDNA reference sequences. For nDNA assembly, eight available genomes of closely related species were used as references for the mapping method, followed by an SNP calling procedure. Based on the filtered SNPs and references, 56 combinations (newly assembled chamois genomes) were identified, which were validated and annotated using BUSCO tools. Then, three smaller sets of genes were defined from the common set of annotated genes, distance matrices were calculated from these sets, and the relationships were visualized using the multidimensional scaling (MDS) method. Newly assembled genomes generated by mapping chamois samples against the domestic goat genome were checked for structural accuracy by comparing them with the chamois sequences (23 sets of introns) available in GenBank. Phylogenetic analyses of mtDNA (maximum likelihood and Bayesian methods) were performed using a data set containing 10 mtDNA sequences from this dissertation, 5 chamois mtDNA sequences from GenBank, and two related sequences as an outgroup. Phylogenetic analyses (maximum likelihood and Bayesian methods) of the subfamily Caprinae were performed on a data set containing 40 sequences of Caprinae and 5 sequences of the family Bovidae as an outgroup. The final phylogenetic analysis of chamois was performed using the program BEAST on a common alignment of an intron data set consisting of 21 chamois sequences and three sequences representing an outgroup.
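The distance-matrix and MDS step described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned, uses a simple p-distance (proportion of differing sites), and implements classical (Torgerson) MDS with plain power iteration; the abstract does not state which distance measure or MDS variant was actually used, and the function names and toy sequences below are hypothetical.

```python
import math


def p_distance(a, b, missing="-?"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a gap or missing symbol are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in missing or y in missing:
            continue
        compared += 1
        diffs += (x != y)
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def distance_matrix(seqs):
    """Return (names, square matrix of pairwise p-distances)."""
    names = list(seqs)
    matrix = [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]
    return names, matrix


def classical_mds(dist, dims=2, iters=1000):
    """Classical MDS: double-center the squared distances, then take the
    leading eigenvectors (found here by power iteration with deflation).

    Assumes a symmetric distance matrix. Stops early if the dominant
    remaining eigenvalue is not positive; unused dimensions stay at zero.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # B = -1/2 * J * D^2 * J (row means equal column means by symmetry)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + k + 1.0) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if lam <= 1e-12:
            break
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Toy aligned sequences, invented for illustration only
toy = {
    "sample_on_ref1": "ACGTACGTAC",
    "sample_on_ref2": "ACGTACGTAT",
    "sample_on_ref3": "ACGAACGTTT",
}
names, matrix = distance_matrix(toy)
points = classical_mds(matrix)
for name, (x, y) in zip(names, points):
    print(f"{name}\t{x:.4f}\t{y:.4f}")
```

For protein alignments the same functions apply unchanged; for nucleotide data one might also pass `missing="-?N"` to skip ambiguous bases.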
After the complete mtDNA sequences were reconstructed with two methods (mapping and de novo) and the resulting sequences were compared, it was clear that both methods were suitable for reconstruction. The de novo method proved to be the better choice because of its speed and simpler procedure. In addition, the de novo method successfully isolated complete mtDNA sequences from samples that had failed quality control (Gams53, Gams85, OSIL-06). As a result, all mtDNA analyses could be performed on a larger number of samples. In other words, if only the mapping method had been used, these three samples could not have been included in the further analysis.
The mtDNA annotation tools MITOS and GeSeq gave very similar results for all 10 samples, and variations in START and STOP codons were detected in four genes (ND1, ND2, ND3, ND5). The variations involve two or three bases in the START and STOP codons. Their occurrence can be interpreted as a consequence of the different algorithms used by the annotators and of the greater variation in these genes.
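A comparison of this kind can be sketched as follows, assuming each annotator's output has already been parsed into a mapping from gene name to (start, end) coordinates. The parsing step, the function name, and all coordinates below are hypothetical and do not come from the dissertation.

```python
def compare_annotations(first, second):
    """Report genes whose start or end coordinate differs between two annotations.

    Returns {gene: (start_offset, end_offset)} for genes present in both
    inputs, where each offset is the second coordinate minus the first, in bases.
    """
    differences = {}
    for gene in sorted(first.keys() & second.keys()):
        start_a, end_a = first[gene]
        start_b, end_b = second[gene]
        offsets = (start_b - start_a, end_b - end_a)
        if offsets != (0, 0):
            differences[gene] = offsets
    return differences


# Invented coordinates for illustration only
annot_a = {"ND1": (100, 1055), "ND2": (1200, 2241), "COX1": (2500, 4044)}
annot_b = {"ND1": (100, 1057), "ND2": (1197, 2241), "COX1": (2500, 4044)}

diffs = compare_annotations(annot_a, annot_b)
for gene, (d_start, d_end) in diffs.items():
    print(f"{gene}: start offset {d_start}, end offset {d_end}")
```

Genes reported with small offsets at only one boundary are natural candidates for manual inspection of the START or STOP codon.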
Phylogenetic analyses (maximum likelihood and Bayesian methods) performed on ten chamois mtDNA sequences reconstructed in this dissertation, in combination with five mtDNA sequences from GenBank, yielded identical phylogenetic trees for the genus Rupicapra. These results confirmed previous research in which chamois were divided into three mtDNA clusters (W, C, and E). The phylogenetic analyses (maximum likelihood and Bayesian methods) performed on 40 mtDNA sequences of the subfamily Caprinae (including the four sequences obtained in this dissertation) and five sequences of the family Bovidae also revealed the same topology as presented in previous studies.
From the comparison of the newly assembled chamois genomes with the genomes of related species, it was concluded that almost all of the related genomes used can serve as good references. Although all the species used were non-model species, the best results were obtained with the genomes of domestic goat and domestic sheep, which was to be expected since these species are extremely important in agriculture and are often the focus of research. The number of genes found for most combinations of chamois and related species was very high, confirming that these genomes can be used for mapping. However, during mapping it was found that some of the genomes used were of low quality, while others had irregularities in the information available in GenBank. This confirms once again that not all available genomes are of good quality. In other words, any sequence available in GenBank should be verified before use.
From the similarity analyses, it can be concluded that the relationships between all combinations depend primarily on the gene or genome fragments used for these analyses. Although the number of polymorphisms found had a greater impact on the results when single gene fragments were used, this number was negligible when longer portions of the genome (100 and 500 genes) were used, with differences between samples of approximately 1 %. In other words, larger distances were calculated between combinations from shorter alignments. The MDS results for the sets of 100 and 500 genes clearly showed that chamois samples mapped to different references were more similar to each other, while still exhibiting some differences in amino acid composition. Smaller differences between samples were found for combinations with domestic sheep and American mountain goat (about 1 % and 0.5 %, respectively).
The comparisons of the intron regions of the newly assembled chamois genomes with the introns available in GenBank suggest that the intron sequences obtained from the newly assembled genomes are of satisfactory quality; they grouped with other chamois samples at the species and subspecies level.
The results of this research will contribute to a better knowledge of the diversity and evolution of the chamois genome and help elucidate the taxonomic relationships among subspecies, and the assembled genomes will provide a good reference base for future population and genome analyses of chamois and its relatives.