Sex identification in rainbow trout using genomic information and machine learning
Kudinov, Andrei A.; Kause, Antti (2024)
Publication series: Genetics Selection Evolution
Volume: 56
Issue: 1
Pages: 8
Publisher: BioMed Central
Year: 2024
How to cite: Kudinov, A.A., Kause, A. Sex identification in rainbow trout using genomic information and machine learning. Genet Sel Evol 56, 79 (2024). https://doi.org/10.1186/s12711-024-00944-0
Permanent address of the publication:
http://urn.fi/URN:NBN:fi-fe202501102371
Abstract
Sex identification in farmed fish is important for managing fish stocks and breeding programs, but identification based on visual characteristics is typically difficult or impossible in juvenile or immature fish. The amount of genomic data obtained from farmed fish is growing rapidly as genomic selection is implemented in aquaculture. Compared with mammals and birds, ray-finned fishes exhibit a greater diversity of sex determination systems and lack conserved sex-determining genomic regions. A group of markers on a standard genotyping array has been reported as potentially linked to sex determination in rainbow trout. However, the set of markers suitable for sex identification may vary between populations. Sex identification from genomic data is usually performed with probabilistic methods that require the suitable markers to be known beforehand. In our study, we demonstrated the use of Extreme Gradient Boosting (XGBoost), a supervised machine learning method from the gradient boosting framework, to predict sex from unimputed genomic data when the suitability of the markers was not known a priori. The accuracy of the method was assessed using four simulated datasets with different genotyping error rates and one real dataset from the Finnish Rainbow Trout Breeding Program. The method showed high prediction quality on both simulated and real datasets. For the simulated datasets with low (5%) and high (50%) genotyping error rates, the accuracies were 1.0 and 0.60, respectively. On the real data, the method achieved a prediction accuracy of 0.98 (98%), which is suitable for routine use.
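As a rough illustration of the idea in the abstract, and not the authors' pipeline (which used XGBoost itself), the sketch below simulates toy genotypes and fits second-order (Newton) gradient boosting with depth-1 trees on logistic loss, using a simplified form of the gain and leaf-weight formulas that XGBoost is built on. Everything in it is an assumption for illustration: the function names `simulate` and `fit_boosted_stumps`, the marker indices, the rule that sex-linked markers are heterozygous in males and homozygous in females, and the error model in which an erroneous call is replaced by a uniformly random genotype. Missing calls are left unimputed and coded as -1; because splits are of the form `x <= t`, missing values always go to the left branch, which is cruder than XGBoost's learned default direction for missing values.

```python
import math
import random

CODES = (-1, 0, 1, 2)  # -1 = missing call left unimputed; 0/1/2 = allele dosage


def simulate(n_fish, n_markers, sex_linked, error_rate, missing_rate, seed=0):
    """Hypothetical toy model: markers in `sex_linked` are heterozygous (1)
    in males and homozygous (0) in females; all other markers are random
    with allele frequency 0.5."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_fish):
        sex = rng.randint(0, 1)  # 1 = male, 0 = female
        row = []
        for j in range(n_markers):
            if j in sex_linked:
                g = sex
            else:
                g = (rng.random() < 0.5) + (rng.random() < 0.5)
            if rng.random() < error_rate:  # genotyping error: random call
                g = rng.choice((0, 1, 2))
            if rng.random() < missing_rate:  # call is missing, not imputed
                g = -1
            row.append(int(g))
        X.append(row)
        y.append(sex)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_boosted_stumps(X, y, rounds=20, eta=0.3, lam=1.0):
    """Newton gradient boosting with depth-1 trees on logistic loss."""
    n, m = len(X), len(X[0])
    base = math.log((sum(y) + 1) / (n - sum(y) + 1))  # smoothed log-odds of male
    F = [base] * n
    stumps = []
    for _ in range(rounds):
        p = [sigmoid(f) for f in F]
        g = [pi - yi for pi, yi in zip(p, y)]  # gradient of log loss
        h = [pi * (1 - pi) for pi in p]        # Hessian of log loss
        G, H = sum(g), sum(h)
        best = None
        for j in range(m):
            # Sum gradients/Hessians per genotype code, then scan thresholds.
            gs = {c: 0.0 for c in CODES}
            hs = {c: 0.0 for c in CODES}
            for i in range(n):
                gs[X[i][j]] += g[i]
                hs[X[i][j]] += h[i]
            GL = HL = 0.0
            for t in CODES[:-1]:  # rows with x <= t go to the left leaf
                GL += gs[t]
                HL += hs[t]
                GR, HR = G - GL, H - HL
                gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
                if best is None or gain > best[0]:
                    best = (gain, j, t, -GL / (HL + lam), -GR / (HR + lam))
        _, j, t, wl, wr = best
        stumps.append((j, t, eta * wl, eta * wr))  # store shrunken leaf weights
        for i in range(n):
            F[i] += eta * (wl if X[i][j] <= t else wr)
    return base, stumps


def predict(model, row):
    base, stumps = model
    f = base + sum(wl if row[j] <= t else wr for j, t, wl, wr in stumps)
    return 1 if f > 0 else 0  # 1 = male, 0 = female


def accuracy(model, X, y):
    return sum(predict(model, r) == yi for r, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    sex_linked = {3, 17, 42}  # hypothetical marker indices
    for err in (0.05, 0.5):
        X, y = simulate(600, 50, sex_linked, err, 0.02, seed=1)
        model = fit_boosted_stumps(X[:400], y[:400])
        print(f"error rate {err:.2f}: test accuracy {accuracy(model, X[400:], y[400:]):.2f}")
```

The model is never told which markers are sex-linked; the boosting rounds find them by gain, which mirrors the abstract's point that marker suitability need not be known a priori. Accuracies from this invented simulation are not comparable to the 1.0 and 0.60 reported for the authors' simulated datasets.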