MTT is publishing its research findings in two series of publications: MTT Science and MTT Growth. The MTT Science series includes scientific presentations and abstracts from conferences arranged by MTT Agrifood Research Finland. Doctoral dissertations by MTT research scientists will also be published in this series. The topics range from agricultural and food research to environmental research in the field of agriculture. MTT, FI-31600 Jokioinen, Finland. email julkaisut@mtt.fi 30 Variance component estimation exploiting Monte Carlo methods and linearization with complex models and large data in animal breeding Doctoral Dissertation Kaarina Matilainen MTT CREATES VITALITY THROUGH SCIENCE www.mtt.fi/julkaisut Variance component estimation exploiting Monte Carlo methods and linearization with complex models and large data in animal breeding Doctoral Dissertation Kaarina Matilainen Academic dissertation To be presented, with the permission of the Faculty of Agriculture and Forestry of the University of Helsinki, for public examination in Auditorium XIV, Main building, Fabianinkatu 33, on December 5th 2014, at 12 noon. ISBN 978-952-487-568-4 (Print) ISBN 978-952-487-569-1 (Electronic) ISSN 1798-1824 (Printed version) ISSN 1798-1840 (Electronic version) URN http://urn.fi/URN:ISBN:978-952-487-569-1 www.mtt.fi/mtttiede/pdf/mtttiede30.pdf Copyright MTT Agrifood Research Finland Kaarina Matilainen Distribution and sale MTT Agrifood Research Finland, Media and Information Services, FI-31600 Jokioinen, e-mail julkaisut@mtt.fi Printing year 2014 Cover image Jaakko Suvala/Rodeo / MTT archive, graph/Kaarina Matilainen Printing house Tampereen Yliopistopaino Juvenes Print Oy Custos: Professor Asko Mäki-Tanila, Department of Agricul- tural Sciences, P.O. Box 28, University of Helsinki, Finland Supervisors: Docent Ismo Strandén, MTT Agrifood Research Finland, Biotechnology and Food Research, Humppilantie 7, 31600 Jokioinen, Finland Professor Esa A. Mäntysaari, MTT Agrifood Re- search Finland, Biotechnology and Food Research, Humppilantie 7, 31600 Jokioinen, Finland Reviewers: Professor Just Jensen, Aarhus University, Depart- ment of Molecular Biology and Genetics, Blichers Allé 20, 8830 Tjele, Denmark Dr Bjørg Heringstad, Norwegian University of Life Sciences, Department of Animal and Aquacultural Sciences, P.O. Box 5003, 1432 Ås, Norway Opponent: Dr John Hickey, The Roslin Institute, The Univer- sity of Edinburgh, Easter Bush, Midlothian, EH25 9RG, UK MTT SCIENCE 30 3 Variance component estimation exploiting Monte Carlo methods and linearization with complex models and large data in animal breeding Kaarina Matilainen MTT Agrifood Research Finland, Biotechnology and Food Research, Humppilantie 7, FI-31600 Jokioinen, Finland, kaarina.matilainen@mtt.fi Abstract Inference for the variance components in linear mixed effects models is theoretically well un- derstood. Many methods have also been pre- sented for nonlinear models. Genetic evalua- tions in animal breeding are however character- ized by the enormous size of the models and data. This means that the methods in estima- tion have to be computationally efficient. The purpose of this study was to find efficient methods for the estimation of the variance components for large data sets and complex mixed effects models in animal breeding. The focus of the study was, first, on the re- stricted maximum likelihood (REML) estima- tion applied to a linearized model of nonlinear mixed effects model and, second, on the REML estimation of large linear mixed effects models by the Monte Carlo (MC) method. Performance of the methods were mostly stud- ied using simulated data sets, but the feasibility of the MC based expectation maximization (EM) REML was also studied using dairy cattle field data. The analyses of a data set mimicking pig live weights showed that linearization works mod- erately well when the data is good, but estima- tion of parameters related to adult weight be- comes unstable when weight observations from the right tail of the animals’ growth curve were missing. However, the simulation study showed that having even a small proportion of animals with adult weights improved the results when compared to the estimates based on observations from prematurely slaughtered animals only. The MC based EM REML method converged to the same solutions as the analytical EM REML, and a small number of MC samples did not introduce systematic bias to the esti- mates of genetic parameters in the analysis of simulated dairy cattle data set. Furthermore, analyses of field data proved the MC EM REML to be superior to the analytical EM REML both in computing time and in the memory needed. Compared to MC EM REML, the MC Newton-type methods con- verged faster, but sampling variation of the estimates increased. Sampling variation differed somewhat also between the Newton-type methods. Developing a fast algorithm for MC based REML estimation requires a convergence crite- rion that is robust for sampling variation. A stopping rule that can be calculated during the analysis was introduced. The applied conver- gence criterion monitored the progress of con- vergence and was only a little influenced by MC noise. It also worked reasonably well with small number of MC samples, which is a prop- erty that may be useful for analyzing large scale and complex models. Keywords: mixed effects model, variance com- ponents, linearization, Monte Carlo, REML, EM- algorithm, Newton’s method 4 MTT SCIENCE 30 Monte Carlo -menetelmät ja linearisointi varianssikomponenttien arvioinnissa analysoitaessa suuria aineistoja vaativilla jalostusarvostelumalleilla Kaarina Matilainen MTT, Biotekniikka- ja elintarviketutkimus, Humppilantie 7, 31600 Jokioinen, kaarina.matilainen@mtt.fi Tiivistelmä Keskeinen osa eläinjalostusta on geneettisistä tekijöistä johtuvan vaihtelun erottaminen ha- vaintojen kokonaisvaihtelusta. Geneettistä vaih- telua arvioidaan tilastotieteestä tutuilla sekamal- leilla. Sekamallien sisältämien varianssikompo- nenttien arviointiin liittyvää teoriaa on tutkittu paljon lineaarisilla sekamalleilla. Myös epälineaa- risille malleille on esitetty monia arviointimene- telmiä. Koska jalostusarvosteluissa käytettävät aineistot ja mallit ovat usein suuria, arviointime- netelmien yksi tärkeä ominaisuus on laskennalli- nen tehokkuus. Tämän tutkimuksen tarkoituksena oli löytää tehokas menetelmä varianssikomponenttien arviointiin analysoitaessa eläinjalostuksessa käy- tettäviä suuria aineistoja monimutkaisilla seka- malleilla. Erityisesti tutkimus keskittyi 1) epä- lineaaristen sekamallien varianssikomponenttien arviointiin mallin linearisoinnilla ja REML- tyyppisellä menetelmällä ja 2) Monte Carlo (MC) –menetelmän hyödyntämiseen sekä EM (expectation maximization) että Newtonin tyyppisissä REML-analyyseissä lineaarisille se- kamalleille. Menetelmiä tutkittiin simuloiduilla aineistoilla. MC-menetelmää hyödyntävän EM REML:n soveltuvuutta testattiin myös todellisel- la lypsykarja-aineistolla. Linearisointi toimi kohtuullisesti simuloidulla aineistolla, joka kuvasi eläinten painon kehitystä syntymästä aikuispainoon asti. Aikuispainoon liittyvien varianssikomponenttien arvioiden luotettavuus heikkeni, kun aineisto sisälsi ha- vaintoja ainoastaan aikuispainoa edeltävältä ajalta. Pienikin aikuispainohavaintojen lisäys kuitenkin paransi luotettavuutta. MC-menetelmää käyttävä EM REML konver- goi samoihin varianssikomponenttien arvioihin kuin analyyttinen menetelmä, eikä pieni MC- otosten määrä näkynyt systemaattisena harhana arvioissa. Todellisen aineiston analyysit osoittivat MC EM REML -menetelmän olevan parempi sekä laskenta-ajaltaan että tietokoneen muistitar- peeltaan kuin analyyttinen EM REML -menetelmä. MC-menetelmän soveltaminen Newtonin tyyppisiin menetelmiin sai REML- ratkaisut konvergoimaan nopeammin kuin MC EM REML:llä, mutta arvioiden otosvaihtelu oli suurempaa. Otosvaihtelun suuruus vaihteli käytetyn Newtonin menetelmän mukaan. Jokaisessa MC REML -analyysissä tarkasteltiin myös uutta konvergenssikriteeriä. Uudella kri- teerillä pystyttiin vähentämään MC- menetelmästä johtuvaa vaihtelua konvergenssin seurannassa. Kriteeri toimi melko hyvin myös pienillä MC-otosmäärillä, mikä on toivottava ominaisuus analysoitaessa suuria aineistoja ja monimutkaisia malleja. Avainsanat: sekamalli, varianssikomponent- ti, linearisointi, Monte Carlo, REML, EM- algoritmi, Newtonin menetelmä MTT SCIENCE 30 5 Acknowledgements I wish to express my sincere gratitude to my supervisors Ismo Strandén and Esa Mänty- saari for introducing me into the research field of animal breeding and their expert guidance in this work. Also I wish to express gratitude to co-authors Martin Lidauer, Robin Thompson and Marja-Liisa Sevón- Aimonen for their fruitful collaboration. You contributed to a pleasant working team and I am most deeply grateful for your sup- port and encouragement during my work. The research work for this thesis has been done at MTT Agrifood Research Finland and financial support was provided by Rothamsted Research. Data was provided by Faba, Hollola, Finland. These organiza- tions are greatly acknowledged. Thanks to Pekka Uimari for always helping with questions related to my studies. You kept the wheels in motion at the final crunch. Further, my thanks go to thesis reviewers, Just Jensen and Bjørg Heringstad, for valuable suggestions. Thanks also go to my custos Asko Mäki-Tanila. I warmly thank Kirsi Muuttoranta for ad- vice in so many official and unofficial mat- ters that came up against. My gratitude goes also to current and former colleagues at the biometrical genetics team. In addition to all the advice and assistant I have got, I wish to thank you for a fun-filled moments and shared sport hours. Thanks to all of you. Keep going! I would like to express my deepest apprecia- tion to my family for providing the re- quirements for everyday life. To my parents, my sister and brother and their families, for listening if needed but not asking too much about my work; to my mother- , father- and sister-in-law not least for being babysitters on so many nights during the last months; But foremost to my sons Esko and Niilo for their patience when I have been an absent and absent-minded mother and my hus- band Jukka for his patience, love and sup- port. Jokioinen, October 2014 Kaarina Matilainen 6 MTT SCIENCE 30 List of original publications This thesis is based on the following publications: I Vuori K., Strandén I., Sevón-Aimonen M.-L., Mäntysaari E.A. (2006). Estimation of non-linear growth models by linearization: a simulation study using a Gompertz function. Genetics, Selection, Evolution, 38, 343-358. doi:10.1051/gse:2006008 II Matilainen K., Mäntysaari E.A., Lidauer M.H., Strandén I., Thompson R. (2012). Employing a Monte Carlo algorithm in expectation maximization restricted maxi- mum likelihood estimation of the linear mixed model. Journal of Animal Breeding and Genetics, 129, 457-468. doi:10.1111/j.1439-0388.2012.01000.x III Matilainen K., Mäntysaari E.A., Lidauer M.H., Strandén I., Thompson R. (2013). Employing a Monte Carlo algorithm in Newton-type methods for restricted maxi- mum likelihood estimation of genetic parameters. PLoS ONE, 8, e80821. doi:10.1371/journal.pone.0080821 The publications are referred to in the text by their Roman numerals. Reprints of the origi- nal articles are published with the kind permission of BioMed Central Ltd., John Wiley & Sons Ltd. and PLOS. The author (née Vuori) participated in planning the studies, performed the data simulations and all analyses, interpreted the results with the co-authors and was the main writer of all the articles. MTT SCIENCE 30 7 Abbreviations AI average information BM Broyden’s method EBLUP empirical best linear unbiased prediction EM expectation maximization MC Monte Carlo MCMC Markov chain Monte Carlo ML maximum likelihood MME mixed model equations NR Newton-Raphson PEV prediction error variance PX-EM parameter expansion version of expectation maximization REML restricted maximum likelihood SAEM stochastic approximation expectation maximization TD test day VC variance component 8 MTT SCIENCE 30 Contents 1 Introduction .............................................................................................................................. 9 1.1 Genetic Parameters ......................................................................................................... 9 1.2 Estimation methods of genetic parameters ..................................................................... 9 1.2.1 Exploiting Monte Carlo methods ........................................................................ 11 1.2.2 Exploiting linearization ....................................................................................... 11 1.3 Objectives of the study ................................................................................................. 12 2 Materials and methods ........................................................................................................... 13 2.1 Likelihood of linear mixed effects model ..................................................................... 13 2.2 Likelihood of nonlinear mixed effects model by linearization ..................................... 13 2.3 REML ........................................................................................................................... 14 2.3.1 REML by EM algorithm ..................................................................................... 15 2.3.2 REML by Newton-type methods ........................................................................ 16 2.4 REML utilizing MC ..................................................................................................... 17 2.4.1 MC EM REML ................................................................................................... 17 2.4.2 MC Newton-type REML .................................................................................... 18 2.4.3 Convergence criterion ......................................................................................... 19 2.5 Testing of methods ....................................................................................................... 19 2.5.1 Data sets and analyses ......................................................................................... 20 2.5.2 Software .............................................................................................................. 21 3 Main results ............................................................................................................................ 23 3.1 Linearization ................................................................................................................. 23 3.2 MC EM REML ............................................................................................................. 23 3.3 MC Newton-type REML .............................................................................................. 24 3.4 Assessment of convergence .......................................................................................... 24 4 Discussion .............................................................................................................................. 25 4.1 Linearization ................................................................................................................. 25 4.2 MC EM REML ............................................................................................................. 27 4.3 MC Newton-type REML .............................................................................................. 28 4.4 Recommendations for further research ......................................................................... 29 References .................................................................................................................................. 31 MTT SCIENCE 30 9 1 Introduction 1.1 Genetic parameters Estimation of genetic parameters is one of the essential steps necessary for reliable ani- mal evaluations. Genetic parameters reveal the amount of genetic variation in traits, the heritability of traits and possible repeatabil- ity of traits as well as correlations between traits. The heritability of a trait indicates the ability to change the appearance of the trait in a population when selective breeding is applied to the trait. The higher the herita- bility the more accurately breeding values can be estimated with the same number of observations. Because genetic gain is af- fected by accuracy together with genetic variance, selection intensity and generation interval, more genetic change can be achieved within the same time period using more accurate breeding values given the other parameters are kept unchanged. In addition to heritability, breeding programs pay attention to genetic and phenotypic correlations between traits so that the posi- tive correlation between two traits can be exploited or undesired co-response due to selection can be decreased. Estimation of genetic parameters is based on correlated phenotypes between relatives. Therefore, analyses require statistical models which describe environmental components and the sources of variation between obser- vations. Estimation of variance components (VC) tries to quantify the extent to which variation between the observations is due to genetic and non-genetic environmental differences. Hereby the values of genetic parameters are population and data depend- ent. Also the different models used may lead to different estimates about the true state of the population investigated. Therefore the VC need to be re-estimated when some of these components change. 1.2 Estimation methods of genetic parameters One of the most traditional methods to estimate VC is analysis of variance (ANOVA). Because it is applicable for a very limited data structure, Henderson (1953) presented three different methods. The most versatile method is known as Henderson’s method III. The disadvantage of the method was its inability to take into account relationships between animals, i.e., it could not account for the effect of selec- tion. Interest in maximum likelihood (ML) approaches increased in 1970’s (Harville, 1977). The ML approach uses probability distributions as likelihood functions: While data have probability distribution given the parameters, ML maximizes the likelihood function of parameters given the data. ML is a flexible modeling technique, which allows users to account for more assump- tions in the model and patterns of missing traits in multivariate analysis. It is also able to take into account relationships between animals through the numerator relationship matrix. VC estimates by the ML method are, however, dependent on the number of levels within the fixed effects. In theory, when simple models allowed by ANOVA are used, ML estimation leads to biased variance estimates whereas ANOVA esti- mates are unbiased. This deficiency is cor- rected in the restricted maximum likelihood (REML) method, which takes into account 10 MTT SCIENCE 30 the loss of information due to simultaneous estimation of fixed effects (Patterson and Thompson, 1971). A competing estimation approach is based on Bayesian inference which accounts for the loss of information due to estimation of all parameters other than the one of interest. In practice, REML and Bayesian approaches often lead to simi- lar inferences. The REML approach may lead to massive computations and many methods have been presented over the years to calculate REML estimates. The simplest methods use values of the REML likelihood only. More effi- cient methods use first derivatives of the REML likelihood. These are the so-called first-order methods. Even better is to use both the first and the second derivatives of the REML likelihood. These are called second-order methods. Because the matrix of second derivatives can be difficult to calculate, interesting alternatives are the quasi-Newton methods. They rely on ap- proximations of the second derivatives based on the direction of the most recent step. Quasi-Newton methods usually result in faster convergence compared to the first- order methods but slower convergence compared to the true second-order methods because the information matrix is replaced by an approximation. A good summary of the history of VC estimation is given e.g. by Hofer (1998) and Thompson et al. (2005). After decades of development work, infer- ence for VC in linear mixed effects models is theoretically well understood. However, animal breeding evaluations are often char- acterized by a large number of observations and complex models which may lead to computational problems. This underlines the fact that the methods in estimation, especially in model development tools, have to be computationally effective. For VC estimation, existing computer packages are often prohibitively slow (Bayesian ap- proaches) or require huge computer mem- ory capacity (REML approaches). Notable challenges are random regression test day (TD) models, international multivariate comparisons and individual gene models which all require up-to-date statistical and numerical techniques. Linear models are flexible. They can be used, for example, to model the lactation curves of cows by random regression mod- els. Sometimes animal growth curves are modeled by a sigmoid or S-shaped function. As an example, Gompertz function has been shown to fit pig growth data, such as live weight and protein retention (Wellock et al., 2004; Whittemore and Green, 2002; Whittemore et al., 1988) well. Models us- ing S-shaped function are called nonlinear because the fixed and random effects do not enter the model linearly. An advantage of the Gompertz function is that predictions beyond the observed data range can be made more realistic. This is a useful prop- erty that is needed, for example, to predict the adult weight of prematurely slaughtered pigs. Nonlinear models are, however, more complicated to solve than linear models (Davidian and Giltinan, 1995). In animal evaluations, the Bayesian framework has been popular due to the use of Markov chain Monte Carlo (MCMC) methods that allow the solution for the required numeri- cally complicated integration (e.g. Blasco et al., 2003; Rekaya et al., 2001). Another possibility is to approximate the likelihood function using linearization (Davidian and Giltinan, 1995; Pinheiro and Bates, 1995; Wolfinger, 1993; Wolfinger and Lin, 1997) or numerical integration (Pinheiro and MTT SCIENCE 30 11 Bates, 1995). However, all these methods are computationally intensive. 1.2.1 Exploiting Monte Carlo methods Monte Carlo (MC) simulation is used to estimate the integral needed for calculation of expectations of a random variable. MC methods are often used when closed-form solutions are difficult or impossible to ob- tain. Wei and Tanner (1990) introduced the MC expectation maximization (EM) algorithm for cases where maximization for complete data is simple and the expectation can be approximated by MC simulations. This has mainly been used within the classi- cal likelihood framework in the analysis of complex models, like generalized linear mixed models, for which expectations can- not be calculated analytically (e.g. Walker, 1996; McCulloch, 1997; Booth and Hobert, 1999). It can also be utilized in VC estimation of linear mixed effects models via Gibbs sampling to avoid insurmountable computations in the analysis of large data sets and models by the analytical EM REML method (e.g. Guo and Thompson, 1991; Thompson, 1994; García-Cortés and Sorensen, 2001; Harville, 2004). García- Cortés et al. (1992) applied MC EM to VC estimation in a different way. To estimate the prediction error variances (PEV) within each REML round, independent data sets, corresponding to the original data, are gen- erated using the assumed linear model. Fixed and random effects are then estimated from the simulated data sets. This enables the calculation of PEV without inversion or decomposition of the coefficient matrix and leads to memory requirements equal to those used in solving the mixed model equations (MME). Because EM REML may converge slowly, improvements have been offered by second- order methods. Such methods include for example the Newton-Raphson (NR) method using an observed information matrix and Fisher’s method of scoring with observed information matrix replaced by an expected information matrix. Klassen and Smith (1990) presented a sampling scheme for Fisher’s method of scoring. Interestingly it does not need any derivatives, but may be inapplicable for large and complex models as is often the case with derivative free methods in general. At the same time as Wei and Tanner (1990) introduced MC EM, they proposed an improvement for convergence via MC approximation of Louis’s method (Louis, 1982). Again, the use has mainly been in analyses involving complex likelihoods as is the case e.g. in utilization of MC NR method for general- ized linear mixed models in Kuk and Cheng (1997). However, use of the simple sam- pling method presented in García-Cortés et al. (1992) has not been applied with sec- ond-order methods. 1.2.2 Exploiting linearization Analytical solutions for the integral of the nonlinear likelihood function can only rarely be found. Therefore approximation methods based on Taylor-series expansion or Laplacian approximation have been dis- cussed extensively. Early methods used first- order Taylor series expansion of nonlinear functions around expectation of the random effects (Davidian and Giltinan, 1995). Lindstrom and Bates (1990) suggested a more accurate method where the expansion is made around current estimates of the random effects. This is recommended espe- cially for cases where the VCs are large and substantial inter-individual variation exists (Lindstrom and Bates, 1990). Subsequent research has focused more on the second- 12 MTT SCIENCE 30 order Taylor series expansion of integrals invoked by the Laplacian approximation (Pinheiro and Bates, 1995; Wolfinger, 1993; Wolfinger and Lin, 1997). Wolfinger and Lin (1997) presented an interesting choice of approximation for the integrant because of general and familiar formulation. They gave two alternative approaches to select points of expansion: a zero expansion method using expected val- ues, and an EBLUP-expansion method using the empirical best linear unbiased predictors (EBLUP) of the random effects. Both approximations lead to algorithms that iteratively fit linear mixed effects mod- els to the suitably transformed data using either ML or REML. Therefore, they allow the use of commonly applied methods for linear mixed effects models, and they are called linearization methods here. Because conditions for the functionality of the ap- proximation methods are difficult to iden- tify, Wolfinger and Lin (1997) recom- mended simulation studies for assessing the performance of the methods in diverse kinds of nonlinear models and data sets. To our knowledge, the appropriateness of the EBLUP REML method has prior to this study not been verified for large animal breeding data sets. 1.3 Objectives of the study The purpose of this study was to find an efficient method for estimation of the VC of large data and complex mixed effects models in animal breeding. The focus was on the REML estimation of large nonlinear mixed effects models utilizing linearization and the REML estimation of large linear mixed effects models by the MC method. More specifically aims were: 1. To study the applicability of a method based on Taylor series expansion (Wolf- inger and Lin, 1997) in animal breeding with data simulated and analyzed using a Gompertz function growth model. 2. To show the feasibility of the MC EM REML method proposed in García- Cortés et al. (1992) for large data sets and complex linear models by compar- ing it with analytical EM REML using simulated and field data. 3. To study the performance of the MC method in different Newton-type meth- ods for VC estimation of linear mixed effects models by analyzing small simu- lated data. 4. To study a new convergence criterion applicable for the MC methods. MTT SCIENCE 30 13 2 Materials and methods In the following, the linear and nonlinear mixed effects models are introduced and the likelihood based estimation is considered. The ML and REML methods are explained, and some methods to maximize the REML likelihood function by analytical and MC methods are described. Also assessment of the convergence in case of the MC based estimation methods is discussed. Finally, data and programs used in the analyses are given. 2.1 Likelihood of linear mixed effects model Consider a simple single trait linear mixed effects model (1) ,eZuXby ++= where y is the vector of n observations, b is the vector of p fixed effects, u is the vector of q random breeding values, and e is the vector of n random residuals. Fur- thermore, X and Z are known design ma- trices, which relate records to fixed effects and random genetic effects, respectively. Usually, the random effects u and e are assumed to be independent of each other and to follow Gaussian distribution: ( )G0u ,~N and ( )R0e ,~N , where 2uσAG = , A is qq× numerator relationship matrix, 2 eσIR = and I is nn× identity matrix. The unknown VC are genetic variance 2uσ in G and residual variance 2eσ in R . Denote the vector of unknowns by the parameter vector θ . The assumptions of the model result in Gaussian distribution for observations: ( )VXby ,~N , where RZGZV += T . The like- lihood function for the Gaussian distributed data in y is (2) ( ) ( ) ( ) ( ) ( )( ) , exp 22|, 1 2 1 1 2 1 2 1 22 1 2 uuGu XbyRXby GRyθb d L T T qn − − − − − − − −−− =  ππ which leads to the logarithmic likelihood function (3) ( ) ( ) ( ) ( ) ( ). ln2ln |,log|, 1 2 1 2 1 2 XbyVXby V yθbyθb −−− −−= = − T n Ll π This function is used in the ML framework (Searle et al, 1992). The aim is to find pa- rameter values θ that maximizes the likeli- hood function given the observed data y . This can be obtained by taking derivatives of the log likelihood function with respect to θ , equating to zero and solving for θ . Often this approach requires the use of so- called profile likelihood of θ constructed by substituting the current estimates of the fixed effects bˆ , (4) ( ) ,ˆˆˆ 111 yVXXVXb −−−= TT for b . 2.2 Likelihood of nonlinear mixed effects model by linearization Consider a single trait nonlinear model. Let the model be ( ) euZbXy += ,,,f , where y is the vector of n observations, f is the nonlinear function and e is the vector of n random residuals. Assume further that each 14 MTT SCIENCE 30 of the nonlinear model parameters can be modeled by a linear mixed effects model using terms X , b , Z and u as defined earlier. For example, the nonlinear function could be a three parametric Gompertz func- tion ))exp(exp( tf κβα −−= where the pa- rameters α , β and κ can be modeled by linear mixed effects model, i.e., (5) ( ) ( )(( )( )) .exp exp eZuXb ZuXbZuXby ++− +−+= tκκ ββαα Now the fixed effects are [ ]TTTT κβα bbbb = , and the random effects are [ ]TTTT κβα uuuu = and e . Again the random effects u and e are assumed to be inde- pendent from each other and to follow Gaussian distribution: ( )G0u ,~N and ( )R0e ,~N , where AGG ⊗= 0 and 2eσIR = . Because the Gompertz function results in three levels for each fixed and random effect in the model, covariance matrix 0G is 33× matrix with unique elements of 2 α σu , 2βσu , 2 κ σu , αβσu , ακσu and βκσu . For the nonlinear model considered here, the profile likelihood function is (6) ( ) ( ) ( ) ( )( ) ( )( ) ) .ˆ ˆexp 22| 1 2 1 1 2 1 2 1 22 1 2 uuGuZubXy RZubXy GRyθ df f L T T qn − − − − − − −+−   +−− =  ππ Calculation of this likelihood requires calcu- lation of the integral. When the function f is nonlinear, a closed form can only rarely be found and the integral must be solved numerically. One possibility is to approxi- mate the integral by quadratic Taylor-series expansion of the exponent (Wolfinger and Lin, 1997). The second-order expansion with respect to the random effects gives (7) ( ) ( ) ( ) ( ) ( ),ˆˆ ln2ln |log| *1** 2 1 * 2 1 2 * bXYVbXY V yθyθ −−− −−= = − T n Ll π where (8) ( ) uu bb u Z b X uZbXuZbXyY RGZZV ˆ * ˆ * ** *** ˆˆˆˆ = = ∂ ∂ = ∂ ∂ = +++−= += T T T f f f are calculated at bˆ as the current estimate of the fixed effects and uˆ as the current EBLUP of the random effects. Conse- quently, linearization leads to a log likeli- hood function that iteratively fits linear mixed effects models to the suitably trans- formed data Y . Therefore, it allows the use of commonly applied methods for linear mixed effects models. 2.3 REML The ML estimates tend to be biased down- wards for VC (Patterson and Thompson, 1974). Instead of the ML estimates, REML estimates proposed by Patterson and Thompson (1971) are widely preferred in practice. These estimates account for loss in degrees of freedom caused by the estimation of fixed effects b . In principle the REML method is based on linear transformation of the data y : yKz T= , where K is ( )pnn −× matrix of full rank for which 0XK =T , and n and p are numbers of observations and fixed effects, respectively. Elements in z may be referred to as error contrasts and they follow the multivariate normal distri- bution ( )VKK0z TN ,~ . Log likelihood func- tion for error contrasts will be MTT SCIENCE 30 15 (9) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ),ˆˆ ln ln2ln ln2ln| 1 2 1 1 2 1 2 1 2 1 2 1 2 1 2 bXyVbXy XVX V yKVKKyK VKKyθ −−− − −−= − −−= − − − T T n TTTT Tn REMLl π π and the REML estimates are parameter val- ues θ that maximize the REML likelihood function given the observed data y . Similarly, REML for the approximated log likelihood function of the nonlinear model is possible. The error contrasts YKz T= , where Y is transformed data and K is a full rank matrix such that 0XK =*T , follow the multivariate normal distribution ( )KVK0z *,~ TN . The REML solutions for the approximated log likelihood function are obtained by maximizing the equation (10) ( ) ( ) ( ) ( ),ˆˆ ln ln2ln| *1** 2 1 *1** 2 1 * 2 1 2 * bXYVbXY XVX Vyθ −−− − −−= − − T T n REMLl π where transformed data Y , covariance matrix *V and the working incidence ma- trices *X and *Z are obtained by lineariza- tion (Wolfinger and Lin, 1997). In general, maximization of the REML likelihood requires use of numerical meth- ods. The EM algorithm and three Newton- type methods are considered in the follow- ing sections. The approaches are shown for a simple linear mixed effects model. Maxi- mization of the approximated REML likeli- hood of the nonlinear model corresponds to the maximization of the REML likelihood of a random regression model as presented for the EM algorithm in II. For multivariate models maximization by EM REML is considered e.g. in Henderson (1984) and in Mäntysaari and Van Vleck (1989). There is no single method that will be best for every application and some guidelines were given by Misztal (2008). 2.3.1 REML by EM algorithm The EM algorithm was presented by Dempster et al. (1977) for a variety of ex- amples including the VC estimation. Cen- tral to the EM algorithm is that by complet- ing observations with unobserved data the maximization becomes easy. This can be achieved by completing the observations with values of random effects u and residu- als e . The EM algorithm iterates between two steps called the E-step and the M-step. The E-step computes a Q-function, defined as an expectation of the logarithmic likeli- hood function for the complete data given the current parameters: (11) ( ) ( )( ) ( )( ),ˆˆln ˆˆln | 1 2 1 2 1 1 2 1 2 1 uuT TT k tr tr constQ CuuGG WCWeeRR θθ +−− +−− = − − where [ ]ZXW = , C is the inverse of the full MME coefficient matrix, uuC is a sub- matrix of C corresponding to random ef- fects, and uˆ and eˆ are solutions of the MME using current values of variance components. The M-step maximizes the Q- function. Taking derivatives with respect to 2 uσ in 0G and 2eσ in 0R and equating them to zero gives the following estimates: (12) ( ) q tr uuT u CAuAu 112 ˆˆˆ −− +=σ and (13) ( ) .ˆˆˆ 2 n tr TT e WCWee + =σ 16 MTT SCIENCE 30 The EM algorithm is based on first deriva- tives only and it does not give standard errors for parameters as a by-product. The EM algorithm is known to be relatively stable although slow to converge. 2.3.2 REML by Newton-type methods Methods relying on the first and second derivatives of the REML log likelihood function ( )yθ |REMLl with respect to θ are called Newton-type methods. The principle in all Newton-type methods is to find a solution vector where the first derivative of the function to be maximized is zero. The first derivatives of the log-likelihood form the gradient vector (14) ( ) ( ) ( ) ( )               + −−     + −− =    ∂ ∂ − ∂ ∂ = ∂ ∂ = −− 42 4 11 2 2 1 2 1 ˆˆ 2 1 ˆˆ 2 1 | e TT e u uuT u T trn trq tr l σσ σσ WCWee CAuAu θ VPPyθ VPy θ yθθJ and the second derivatives of the log- likelihood with respect to all parameters yield the observed information matrix (15) ( ) ( ) , | 2 1 2    ∂ ∂ ∂ ∂ − ∂ ∂ ∂ ∂ = ∂∂ ∂ −= θ VPθ VPPyθ VPθ VPy θθ yθθH tr l T where 1111 )( −−−−− −= VXXVXXVVP TT . The NR method uses the observed information matrix and the gradient vector in calculating new estimates of parameters kθˆ at iteration round k : (16) ( ) ( ),ˆˆˆˆ 1111 −−−− −= kkkk θJθHθθ where information matrix and gradient vector are computed with current VC esti- mates 1ˆ −kθ . A variant of the NR method, named Fisher scoring, replaces the observed information matrix with the expected information ma- trix: (17) ( )( ) .21    ∂ ∂ ∂ ∂ −= θ VPθ VPθH trE Another Newton-type method is the aver- age information (AI) method (Johnson and Thompson (1995) and Gilmour et al. (1995)), which utilizes the average of the observed and expected information matri- ces: (18) ( ) ( ) ( )( )( ) . 2 1 Pyθ VPθ VPy θHθHθAI ∂ ∂ ∂ ∂ = += T E The AI method is attractive because compu- tation of the average of the observed and expected information matrices is easier than either of the components. It removes the complicated trace calculation and the AI matrix can be computed by solving the MME once for each VC parameter with data replaced by a suitable working vector (Jensen et al., 1997). AI REML is currently the most common VC estimation method used in animal breeding. Newton-type methods reach fast convergence in the neighborhood of the maximum. Further- more, the observed information matrix, as well as its good approximations, the ex- pected information matrix and the AI ma- trix, can be used to calculate standard errors for the parameters. Approximation of second derivatives may also be based on the direction of the most recent estimation step (e.g. in Nocedal and MTT SCIENCE 30 17 Wright, 2006). These quasi-Newton meth- ods usually result in faster convergence compared to linear methods but slower convergence compared to Newton-type methods because the information matrix is replaced by an approximation. Broyden’s method (BM) is a quasi-Newton method for numerical solution of nonlinear equa- tions (Broyden, 1965). BM updates the inverse of the information matrix (instead of the information matrix itself) within each round: (19) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ,1 11 1 1 θJθBθ θBθθJθBθ θB θBθH ΔΔ ΔΔ−Δ + = ≈ − −− − − kT kTk k k where 21 ˆˆ −− −=Δ kk θθθ and ( ) ( )−=Δ −1ˆ kθJθJ( )2ˆ −kθJ . Instead of true gradients, round-to- round changes in the EM estimates can be used for both the update of the inverse in- formation matrix and the update of new estimates: ( ) ( ) ( −−−≈Δ −−− 211 ˆˆˆ kkEMk θθθθJ )2ˆ −kEMθ . Apart from scaling, these changes are rela- tive to the original gradients (Jamshidian and Jennrich, 1993). BM leads to superlin- ear convergence although sequence ( ) }{ kθB does not converge to the observed informa- tion matrix at the maximum (Dennis and Moré, 1974). A disadvantage is that BM does not give estimates of the standard er- rors for the parameters. 2.4 REML utilizing MC The computational problem in VC estima- tion for animal breeding is not necessarily the complexity of the model, but rather the need to analyze large-scale data sets to ob- tain sufficiently accurate genetic parameter estimates. Both EM and Newton-type methods use the first derivative of the likeli- hood in the maximization. Calculation of the first derivative requires the elements of the inverse of the coefficient matrix of MME. When the data are large and the model has many random effects, it is often impossible to calculate the exact inverse of the coefficient matrix of MME using direct methods. Instead, the inverse can be esti- mated by MC sampling methods. The MC sampling scheme presented by García- Cortés et al. (1992) is used both in the EM and the Newton-type REML methods in the following. 2.4.1 MC EM REML The idea in García-Cortés et al. (1992) is to estimate the elements of the inverse coeffi- cient matrix C by generating samples from the same distribution as the original data: hhh euZy ~~~ += , sh ,,1= . Here hu~ denotes the hth sample drawn, such that ( )G0u ,~~ Nh . After solving the MME for huˆ using the sampled data as observations, we can calcu- late average PEV over the samples. As a matter of fact, a variety of methods can be used to calculate the PEV (Hickey et al., 2009). Two of the methods in Hickey et al. (2009) seem to be useful in many models and are here called methods 1 and 2 by García-Cortés et al. (1995). In linear mixed effects model theory, the true variance of random effects is the variance of predicted values plus the variance of prediction errors. On this basis, method 1 by García-Cortés et al. (1995) defines that approximation of the trace part for the genetic random effect in equation (12) can be obtained via (20) ( ) ( )hThuh qSS u uAu ˆˆ 122 −−= σσ and similarly approximation of the trace part for the residual effect in equation (13) can be obtained via 18 MTT SCIENCE 30 (21) ( ) ( ).ˆˆ22 hTheh nSS e ee−= σ σ In method 2 by García-Cortés et al. (1995) the trace of the inverse of the coefficient matrix is estimated based on the actual pre- diction errors of the simulated effects and estimated effects solved from the model given the simulated data: (22) ( ) ( )hhThhh u SS uuAuu ˆ~ˆ~ 12 −−= −σ and (23) ( )( ) ( )( ) .ˆ~~ˆ~~2 hhhThhhh e SS yyeyye −−−−= σ Now the new MC EM REML estimates are (24)    +=  = − s h h s T qu u SS1 1112 2ˆˆˆ σσ uAu and (25) .ˆˆˆ 1 112 2    +=  = s h h s T ne e SS σ σ ee The larger the s , the better is the approxi- mation of the trace. 2.4.2 MC Newton-type REML The simple sampling method to approxi- mate PEV presented in García-Cortés et al. (1992) can be utilized in Newton-type methods as well. Recall that the Newton- type methods are based on first and either exact or approximated second derivatives of the log likelihood function. The first deriva- tives of the log likelihood function contain the same trace elements as the EM algo- rithm and the same sampling based ap- proximation can hence be used: (26) ( )         + −− = ∂ ∂  = − 4 1 11 2 2 2ˆˆ 2 1 | u s h h s T u u u SSq l σσ σ σ uAu yθ and (27) ( ) . ˆˆ 2 1 | 4 1 1 2 2 2         + −− = ∂ ∂  = e s h h s T e e e SSn l σσ σ σ ee yθ By definition, the expected information matrix at convergence is ( )( ) ( ) ( )( )TEE θJθJθH = . Use of the MC method with independent and identically distributed samples enables the approximation of the information ma- trix by the variances of the gradients over the samples within each NR REML round. However, method 1 using equations (20) and (21) for estimation of trace needs to be used in computation of the sampling vari- ance of the gradients, because method 2 using equations (22) and (23) only gives the variances of PEV. In particular, the infor- mation matrix ( )θH for the single trait model in (1) is approximated by the follow- ing elements: (28) ( ) ( ) ( )         ==     = ∂ ∂ − shVar shlVar u hTh u h ,,1; ˆˆ ,,1;| 4 1 2 1 2   σ σ uAu yθ (29) ( ) ( ) ( )         ==     = ∂ ∂ shVar shlVar e hTh e h ,,1; ˆˆ ,,1;| 4 2 1 2   σ σ ee yθ and MTT SCIENCE 30 19 (30) ( ) ( ) ( ) ( ) ( ) ( ) .,,1;ˆˆ , ˆˆ ,,1;|,| 4 2 1 4 1 2 1 22     =     =     = ∂ ∂ ∂ ∂ − sh Cov shllCov e hTh u hTh e h u h   σ σ σσ ee uAu yθyθ Interestingly, MC sampling is needed only for the estimation of first derivatives in AI and BM methods. Both the AI matrix in MC AI REML and the approximated in- verse of information matrix in MC BM REML can be calculated using the current VC estimates. 2.4.3 Convergence criterion Definition of the convergence criterion is not straightforward in methods based on MC sampling because of the sampling variation. For example, a criterion like the relative round-to-round change in consecu- tive VC estimates becomes unreliable. Re- ducing the MC noise by increasing the number of MC samples is impractical be- cause that makes the estimation computa- tionally inefficient (Booth and Hobert, 1999). To reduce the effect of sampling variation, a linear regression on the latest rounds of estimates may be fitted. An idea of this approach is predicting VC estimates by using linear regression on estimates of previous REML rounds and replacing the change in consecutive VC estimates by the changes in linear predictions. The greater the number of rounds used in a linear re- gression, the smaller is the effect of MC noise on the convergence criterion. For the Newton-type methods, convergence may be checked by a similar method. Hence the approach is the same but the prediction method is applied to the gradients instead of the estimates. 2.5 Testing of methods Two simulated data sets and one field data were used to illustrate the linearization and the MC REML methods described here. The first simulated data contained growth curve data and was used to illustrate the linearization method, whereas the other simulated data and the field data were used to compare the VC estimation by different REML methods. The outlines of the data sets and models are described below and displayed in Tables 1 and 2, respectively. Lastly, the programs used in the analyses are described. 20 MTT SCIENCE 30 Table 1. Structure of the data sets used in the studies I, II and III. Response No. of observations No. of ao1 No. of ap2 No. of eq3 To study performance of Simulated growth data Live weight 144,000 4,800 210 15,036 Linearization for full data (I) Live weight 54,405 4,800 210 15,036 Linearization for truncated data(I) Live weight 58,799 4,800 210 15,036 Linearization for partly truncated data (I) Simulated 305-day data Milk yield 3,000 Fat yield 3,000 3,000 3,150 6,400 MC EM REML and new conver- gence criterion (II) Milk yield 569 Fat yield 569 569 715 1,450 MC Newton-type methods and new convergence criterion (III) Field TD data Milk yield 51,004 Fat yield 25,316 Protein yield 25,316 5,399 10,822 160,221 MC EM REML and new conver- gence criterion (II) 1animals with observations 2animals in pedigree 3equations in MME Table 2. Description of models used in the studies I, II and III. Response Fixed effects Random effects No. of VC Study Simulated live weights Sex Additive genetic sire + Non-genetic animal 13 Applicability of lineariza- tion method (I) Simulated 305-day milk and fat yield Herd Additive genetic animal 6 Performance of MC based methods and new convergence criterion (II and III) Field TD data for milk, protein and fat Herd and TD interac- tion + lactation curve Additive genetic animal lactation curve + Non- genetic animal lactation curve 96 Feasibility of MC EM REML and new conver- gence criterion for large data sets (II) 2.5.1 Data sets and analyses To study the effectiveness of the lineariza- tion method, the first simulated data con- tained live weights based on the nonlinear Gompertz function (I). The simulated data included growth records from 4800 tested animals and the pedigree included 210 sires (Table 1). The statistical model included MTT SCIENCE 30 21 one fixed effect, random additive genetic sire effects, random non-genetic animal effects other than the sire and random re- siduals. Hence, the total number of variance components to be estimated was 13 (Table 2). The method was examined through the analysis of full data and two different sub- sets of the data. The general performance of the linearization method was tested with the full data. The second data was a truncated time trajectory set, and it was used to test the performance of the method for incom- plete data (Table 1). In practice incomplete data are common e.g. in pig production, where adult weights are unavailable due to early slaughter ages. The third data com- bined the first two data sets by having some animals with complete data and most ani- mals with incomplete data (Table 1). The VC estimates were only solved by analytical EM REML because of its stability. The results are based on 50 simulation replicates. The performance was measured by relative bias and relative standard deviation, as per- centage from the true value, for the VC estimates. The second simulated data was used to compare different MC REML and analyti- cal REML methods (II and III). Because large data sets appear mostly in dairy cattle, the data structure resembled the dairy cattle observations for 305-day milk and fat yields. For the EM REML comparisons, records were simulated for 3000 cows with a pedigree comprising 3150 animals (Table 1). A subset of this data, with 569 animals with records and a total of 715 animals in the pedigree, was used in comparisons of the Newton-type REML methods (Table 1). Phenotypic records were simulated by a bivariate linear model with fixed herd ef- fects, random additive genetic animal effects and random residuals. The total number of estimated VC was 6 (Table 2). MC EM REML, as well as each MC Newton-type REML method, was tested with 20, 100 and 1000 MC samples per REML iteration round. The computing efficiency of MC EM REML was illustrated by a multiple-trait random regression TD model applied to true Finnish Ayrshire TD data (II). A subset from a large data set was also taken to en- able calculation of analytical EM REML estimates. The data comprised 5399 ani- mals with records and 10822 animals in the pedigree. There were 51004 TD records for milk yield, of which approximately half were associated with observations for pro- tein and fat yield (Table 1). The multiple- trait model consists of a vector of fixed herd and TD interactions, a vector of fixed lacta- tion curve regression effects, a vector of random non-genetic regression effects, a vector of random additive genetic regression effects, and a vector of random residuals. The total number of estimated VC for this model was 96 (Table 2). Three alternatives for the number of MC samples within an MC EM REML round (20, 5 and 1) were tested. 2.5.2 Software The general computing software MiX99 (Lidauer et al., 2011) and DMU (Madsen and Jensen, 2012) were used for solving large scale MME and VC estimates when possible. New methods required some modifications however. Implementation of the linearization procedure required the Gompertz function formulas to be included in MiX99. Only MME for linearized model was solved by MiX99. The pseudo data and 22 MTT SCIENCE 30 working variables were written to an exter- nal file and read by DMU for the estima- tion of VC. Thus, there was no need to make changes to DMU. Also MC EM REML was implemented in MiX99. New modules needed in the software were data generation and calculation of quadratic forms. The available routines to solve MME were used as they had been implemented in MiX99. However, all three different MC based Newton-type REML methods ap- peared to be too complicated to be included in MiX99 for this study. These methods were implemented by R software (R Core Team, 2014) instead. Consequently, the analyzed data had to be of moderate size. MTT SCIENCE 30 23 3 Main results The results from the linearization study are presented first (I). Then, the efficiency of the MC method in EM REML and appli- cability of the MC method in Newton-type methods are explored (II and III). Finally, assessment of convergence is considered (II and III). 3.1 Linearization The performance of the linearization method was investigated based on analysis of 50 simulation replicates of pig growth. The summary of the results in Table 1 and Table 2 in Article I are given here. The full data with observations from the entire growth period worked satisfactorily. The least accurate estimates were obtained for the covariance components of genetic ef- fects (the relative bias and the relative stan- dard deviation were on average 12.5% and 61.8% respectively). Truncated time trajec- tory increased the relative bias and standard deviation of the VC estimates for all effects. Analyses of the truncated data set were most unstable for adult weight parameters (ߙ) but they were also unstable for parameters related to the exponential decay of the ini- tial growth rate (ߢ). However, having even a small proportion of animals with weights from birth to adult weight improved the results when compared to the estimates produced from completely truncated data. An improvement was especially seen in the estimates of genetic covariance components (relative bias and standard deviation from on average 33.4% and 118.7% to 6.6% and 74.0% respectively), which were the most unstable parameters in the truncated time trajectory data. 3.2 MC EM REML VC estimates for linear mixed effects mod- els by MC EM REML and analytical EM REML were monitored along the iteration process (Figure 1 and Figure 3 in Article II). When the number of MC samples was large, the estimates by MC EM REML followed the corresponding analytical esti- mates well. Variation around the analytical estimates was larger when the number of MC samples was reduced. Although devia- tions from the analytical EM REML esti- mates can be large with one MC sample, on average VC estimates by MC EM REML did stay at the same level as with more sam- ples. Furthermore, heritability as well as genetic and phenotypic correlations corre- sponded well with the analytical estimates despite the differences in the VC parameter estimates (Table 2 in Article II). For the field data set, MC EM REML proved to be far superior to analytical EM REML both in computing time and memory need. Calcu- lation of 1000 EM REML rounds using the analytical EM REML method needed 56 days while the MC EM REML method required 65, 20 and 7 hours with 20, 5 and 1 MC samples per MC EM REML round, respectively. Although the data used for the TD model were small, the large number of VC led to time-consuming analysis because of the slow convergence of the EM algo- rithm. 24 MTT SCIENCE 30 3.3 MC Newton-type REML The MC method in different Newton-type methods for estimation of VC was found to be feasible. The approximated information matrix for MC NR REML was based on variance over MC samples within each REML round. Hence, the sampling varia- tion depended on the number of MC sam- ples per REML round. In contrast, the approximated information matrices in MC AI REML and MC BM REML had sam- pling variation only through the current estimates and gradients used to update the information matrix. Compared to MC EM REML, convergence of second derivative methods was fast but sampling variation of estimates along the iteration process was large for all Newton-type MC methods (Figure 1 in Article III). The number of MC samples needed was found to be de- pendent on the Newton-type method used. Based on the analyses with the number of MC samples 20, 100 and 1000, the mini- mum number of MC samples to obtain sufficiently accurate information matrix estimates for the small simulated data was 100, 20 and 1000 for MC NR REML, MC AI REML and MC BM REML, respec- tively. 3.4 Assessment of convergence A convergence criterion that takes into ac- count the sampling variation was developed. First the new criterion was tested with the analytical EM REML. The values based on the new convergence criterion stayed above those of the traditional criterion measuring round to round change (Figure 2 and Fig- ure 4 in Article II). When the linear predic- tion convergence criterion was applied to the MC EM REML estimates calculated from large number of samples, the criterion followed the corresponding values based on the analytical EM REML estimates. A de- crease in the number of MC samples in- creased the fluctuation in the convergence criterion values (Figure 2 and Figure 4 in Article II). The larger MC variation in the Newton-type methods resulted in extra challenges in defining the convergence. However, a convergence criterion similar to MC EM REML was also found to be feasi- ble for MC AI REML estimates (III). MTT SCIENCE 30 25 4 Discussion The aim of this study was to find an effi- cient method for estimation of VC of nonlinear and large linear mixed effects models. In this chapter we discuss our find- ings concerning linearization, MC EM REML and MC Newton-type REML. Fi- nally, recommendations for future studies are considered. 4.1 Linearization An advantage of the method presented by Wolfinger and Lin (1997) is its generality. Their linearization approach can be used for different types of models because it was developed for general nonlinear mixed ef- fects models. For example, generalization of the simple Gompertz model used in this study on multiple effects and traits is straightforward. As also discussed by Pin- heiro and Bates (1995), the main advantage of the linearization is a possibility to use efficient programs that maximize the re- stricted maximum likelihood for the linear mixed effects models, even though the two- step iterative procedure with each step itself being iterative can be regarded as computa- tionally intensive. Linearization enables the use of linear mixed effects model procedures if the approxima- tion is valid. In order to arrive at linear mixed model equations, the dependency of the VC on the fixed effects through the working incidence matrices has to be ig- nored. Both Lindstrom and Bates (1991) and Wolfinger and Lin (1997) justified this by appealing to intrinsic nonlinearity in- stead of the nonlinearity of the parameters. Either way, asymptotic correlations between the estimators of the VC and the fixed ef- fects were not found in several data sets analyzed by Pinheiro and Bates (1995). The full data set in this simulation study shows that linearization also works moderately well for the Gompertz function. Furthermore, the method was successfully used for real pig growth data (Koivula et al., 2008). However, statistical properties of estimation methods for linear models do not transmit to nonlinear models due to the approxima- tion. For example, in the simulation study of Pinheiro and Bates (1995), the bias cor- rection ability of REML depended on the nonlinear model used. However, the success of the approximation method to analyze the nonlinear model greatly depends on the amount and nature of available information (Vonesh, 1996; Wolfinger and Lin, 1997). As expected, the truncated data analysis in this study showed that if observations are missing from the tails of growth curves of all the animals, uncertainty increases and the estimation method can become distorted. This distor- tion diminished considerably when at least some of the animals had observations up to or close to their mature weights. Close prox- imity of slaughter time and the inflection point is common in pig field data. Due to selection of tested pigs for breeding, there may be pigs that have observations up to their adult weights. Our simulation study showed that even a small fraction of fully observed animals improved the results when compared to the estimates produced from 26 MTT SCIENCE 30 completely truncated data. General recom- mendations about the proportion of ani- mals with full data were not made, because the results may be influenced by the popula- tion structure. Convergence of the linearization method has been reported to be difficult by Meza et al. (2007) among others. First of all, con- vergence depends on the starting values for the fixed effects. Also in our study, good starting values were found to be important. General and simple equations to provide good starting values may be difficult to define. One choice could be to fit a simpler model without random effects first (Lind- strom and Bates, 1990). To improve con- vergence, one possibility is a different parameterization of the Gompertz model. For example, a multiplicative model using log-transformed data can be analyzed (Vuori et al., 2006; Koivula et al., 2008). This takes into account the common nature of the residuals in real growth data, but also removes the dependence of the derivative of the adult weight parameter on the others. Instead of the approximation of the likeli- hood function by linearization, Walker (1996) presented MC EM for nonlinear random models. Studies about exact maxi- mization methods for ML and REML in- volving numerical integration techniques or MC methods have continued (e.g. stochas- tic approximation EM (SAEM) in Kuhn and Lavielle (2005) and REML SAEM in Meza et al. (2007)). A comparison of the linearization procedure with the SAEM method in Kuhn and Lavielle (2005) re- vealed that SAEM was better in terms of robustness to starting values for VC and the accuracy of the estimates. Similar to linear mixed effects models, REML SAEM im- proved the accuracy of the VC estimates even more, especially in unbalanced cases (Meza et al., 2007). In addition, Meza et al. (2007) found REML SAEM to be very stable. The main disadvantage of the lin- earization methods in their study was non- convergence for large number of data sets while REML SAEM converged in all cases. The EM algorithms may be very slow to converge for some problems. For example when variance parameters are small or the dimensions of random effects are huge. The parameter expansion version of EM (PX- EM) accelerates EM while maintaining its stability (Liu et al., 1998; Foulley and van Dyk, 2000). Also MC based PX-EM has been presented for nonlinear mixed effects models. Wu (2004) recommended it for both exact and approximate inferences, although the benefit depends on the model used and the analyzed data. In Lavielle and Meza (2007), a PX version of SAEM (PX- SAEM) for nonlinear model converged in all runs, while plain SAEM failed to con- verge in 20 out of 100 runs. The PX-SAEM method improved the convergence particu- larly during the first iterations when the parameters were highly correlated. This would be the case also in the growth curve analysis between the three parameters of the Gompertz function. The good properties found for MC based methods in analysis of nonlinear mixed effects models make the approach interest- ing. However, MC EM for nonlinear model is somewhat different than MC EM REML studied in this thesis. The main difference is that there was no need to sample fixed ef- fects in the VC estimation of linear mixed effects models, while it is needed for estima- tion of nonlinear models. Because of the MTT SCIENCE 30 27 more complicated simulation step, this is likely to lead to the use of MCMC instead of the MC method (Kuhn and Lavielle, 2005). 4.2 MC EM REML In general, MC EM REML converged to the same solutions as analytical EM REML, and a small number of MC samples did not introduce systematic bias to the estimates of genetic parameters. However, the required number of MC samples to obtain estimates with an acceptably low MC error depended on the size of the data. For the field data, MC EM REML gave reliable estimates with five MC samples per round and reasonable results with just one MC sample within a round. Thus, analysis of the field data dem- onstrated the potential of MC EM REML for large and complex models. VC estimates were obtained with relatively fast comput- ing times compared to analytical EM REML and also with low memory require- ments corresponding to those needed for breeding value estimation. Recommendations for increasing the num- ber of MC samples presented in the MC EM literature tend to suggest numbers that may become too high for our purposes in the context of large data sets and many VC (e.g. in Booth and Hobert (1999), Ripatti et al. (2002) and Levine and Fan (2004)). Delyon et al. (1999) used SAEM method which eliminates the need of increasing the number of MC samples as it accounts for estimates from previous REML rounds. The method requires specifying a smoothing parameter that however is not simple to choose. The commonly used recommenda- tion (e.g. Jaffrézic et al., 2006) leads to fast reduction of MC noise. Therefore the VC estimates need to be close to the final values when the SAEM method is started – a situa- tion that may not necessarily be viable with only one MC sample within an MC EM REML round. In addition to the number of MC samples, the time required to solve MME is the most critical point affecting the computing time by the MC EM REML method. An effi- cient solving of the MME is vitally impor- tant for the presented MC EM REML method, because it requires solutions for the location parameters from the data and from a number of simulated samples of data within each MC EM REML round. We used preconditioned conjugate gradient iteration where a block diagonal precondi- tioner matrix was used to approximate the MME (Strandén and Lidauer, 1999). How- ever, the time needed in simulation of the samples and computing quadratics proved to be negligible. Typically, in MCMC analyses, it is recom- mended to check the convergence by using multiple chains and plotting the parameter estimates along the iteration process. In studies of the MC EM REML method, an automated procedure has been suggested to avoid postprocessing of the chain of esti- mates (e.g. in Booth and Hobert (1999) and Ripatti et al. (2002)). Also in this study a stopping rule that can be calculated dur- ing the analysis was introduced. Basically this alternative convergence criterion for MC EM REML was based on the relative differences in the linear regression coeffi- cients from zero. The applied convergence criterion monitored the progress of conver- gence and was only little influenced by MC noise. The criterion also worked reasonably well with a small number of MC samples – a property that may be useful when analyz- 28 MTT SCIENCE 30 ing complex models with many VC to be estimated. Applying the same value for the MC EM REML analysis as for the analyti- cal EM REML analysis unnecessarily in- creased the number of MC EM REML rounds, although it protects against uncer- tainty due to sampling variation. 4.3 MC Newton-type REML The use of MC Newton-type methods instead of the MC EM algorithm in REML speeded up the convergence, i.e., led to a lower number of iterations until conver- gence. However, the sampling variation of the estimates increased compared to the MC EM REML analysis. This is due to multiplication of the gradient by the inverse of the information matrix. Due to different styles for forming an approximation of the information matrix, sampling variation differed between the Newton-type methods. The MC NR REML method is easy to implement, but may require a large number of MC samples to produce sufficiently accu- rate approximations of the variances of the first derivatives over samples. MC AI REML, in contrast, works better even with small number of MC samples, because the AI matrix has no extra sampling noise as it depends only on variance parameters esti- mated in the previous round. With the same number of MC samples, iterations in MC AI REML are computationally more demanding than in MC NR REML because the MME system needs to be solved at each MC AI REML round as many times as there are VC parameters to be estimated. An advantage of the MC NR and the MC AI REML methods is the possibility to obtain easily standard errors of estimates. MC BM REML is computationally the least expensive of the considered methods when the number of REML rounds and the number of MC samples are kept the same. To circumvent evaluation of the informa- tion matrix, BM REML corrects the ap- proximation of the inverse of information matrix from round to round based on the gradients. Analytical BM REML worked reasonably well with the small data set in our study, whereas the MC BM REML method required a large number of MC samples. This indicates the sensitivity of the MC BM REML method to changes in gradients from round to round. Further- more, extra computations are needed for standard errors after convergence has been reached. If each round of iteration in the studied Newton-type methods requires many more samples than in MC EM REML, the overall solving time will only be reduced in cases where the Newton method can enhance the convergence dramatically. Obtaining a fast algorithm for MC based REML estimation requires the development of a practical convergence criterion for the Newton-type methods. Although convergence is the same regardless of the number of MC samples, MC variation affects the values of the con- vergence criteria. The criterion presented by Booth and Hobert (1999) is based on a change in consecutive parameter estimates relative to their standard errors whereas the criterion by Kuk and Cheng (1997) relies on the gradient vector and its standard er- rors. Both of these may need an enormous number of MC samples to work properly. Our convergence criterion for the MC Newton-type methods is based on relative differences in the linear predictions of gra- MTT SCIENCE 30 29 dients. This seemed to be a suitable and simple approach for large-scale genetic evaluations. Kuk and Cheng (1997) and Gauderman and Navidi (2001) presented an MC NR method which was based on the use of MC approximation for the observed information matrix used by Wei and Tanner (1990). However, the estimator of the information matrix was often not positive definite. In both studies, modified estimator of the observed information matrix was presented to guarantee positive definiteness. The new estimator in Gauderman and Navidi (2001) approximates the expected information matrix, like the one presented here. It re- quires only the first derivatives of the log- likelihood. However, they made no com- parisons with the MC EM algorithm. In- stead in an example by Kuk and Cheng (1997), their improved MC NR method was indeed found to be faster than the MC EM algorithm. One interesting approxima- tion of the information matrix is called the empirical Fisher information matrix (Meilijson, 1989; Scott, 2002), where the idea is to calculate the covariance over indi- viduals’ gradients instead of the sampled gradients. 4.4 Recommendations for further research Simulation study about the linearization method was done only using the Gompertz function with a model, data structure and parameters mimicking a pig breeding data set. For other studies, like Gompertz growth curve in chicken, or perhaps for Brody’s function, the linearization method may work differently. In all cases the two-step iterative algorithm studied here would need careful examination with regard to the ini- tial values and convergence criteria. For linear mixed effects models, the MC based EM algorithm is superior only for very large data sets. Therefore, number of MC samples can be chosen to be small, e.g. 1-3. I would suggest further study to focus on detection of convergence instead of in- creasing the sample size along the iteration. In practice, it would be computationally efficient to use SAEM or another averaging method after some (perhaps loose) conver- gence is met. Suitable critical values for convergence criterion should be studied with different kinds of models and data sets both for MC EM and MC Newton-type REML methods. Our results show that the use of MC meth- ods in different Newton-type methods for VC estimation is feasible, although there was variation in efficiency between the im- plementations. However, analysis of our small simulated data implies that the num- ber of MC samples needed for accurate estimation depends on the method used. This work encourages testing the perform- ance of the presented methods in analysis of large-scale problems. As models grow larger and more complex, the efficiency of differ- ent MC Newton-type methods becomes more difficult to predict. Further experience is especially needed on the behavior of MC BM REML in VC estimation of complex models. The estimates of the MC Newton-type REML analyses presented here were re- gressed towards corresponding EM REML estimates whenever they were outside the parameter space. Yet, this does not guaran- tee convergence to the true solutions, espe- cially with respect to the MC BM REML 30 MTT SCIENCE 30 method. One way to increase the robustness of estimation methods is reparameterization of the VC matrices by Cholesky decomposi- tion (Groeneveld, 1994). For some reason however, Cholesky decomposition of the VC matrix did not work in Wolfinger and Lin (1997) for nonlinear mixed effects model. Anyway the performance of this option is worth considering in future VC estimation studies. In conclusion, all the methods studied gave positive results with respect to efficient VC estimation for large scale linear and nonlin- ear mixed effects models. A recommenda- tion for further studies about MC EM REML for linear mixed effects models is related to the assessment of convergence. For the MC Newton-type methods and the linearization above all more experience would be needed with different kinds of models and data sets. Therefore modifica- tions in efficient software are needed in order to use MC based Newton-type meth- ods. The literature considered here presents many proposals for stability, like reparame- terization and PX-EM, which could be considered in the future. MTT SCIENCE 30 31 References Blasco A., Piles M. and Varona L. 2003. A Bayesian analysis of the effect of selection for growth curves in rabbits. Genetics Se- lection Evolution, 35, 21–41. Booth J.G. and Hobert J.P. 1999. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algo- rithm. Journal of the Royal Statistical So- ciety: Series B, 61, 265–285. Broyden C.G. 1965. A class of methods for solving nonlinear simultaneous equations. Mathematics of Computation, 19, 577– 593. Davidian M. and Giltinan D.M. 1995. Nonline- ar Models for Repeated Measurement Da- ta. Chapman & Hall, London. Delyon B., Lavielle M. and Moulines E. 1999. Convergence of a stochastic approxima- tion version of the EM algorithm. The An- nals of Statistics, 27, 94–128. Dempster A.P., Laird N.M. and Rubin D.B. 1977. Maximum likelihood from incom- plete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38. Dennis J.E. and Moré J.J. 1974. A characteri- zation of superlinear convergence and its application to quasi-Newton methods. Mathematics of Computation, 28, 549– 560. Foulley J.-L. and van Dyk D.A. 2000. The PX- EM algorithm for fast stable fitting of Hen- derson’s mixed model. Genetics Selection Evolution, 32, 143–163. Gauderman W.J. and Navidi W. 2001. A Mon- te Carlo Newton-Raphson procedure for maximizing complex likelihoods on pedi- gree data. Computational Statistics & Data Analysis, 35, 395–415. García-Cortés L.A. and Sorensen D. 2001. Alternative implementations of Monte Car- lo EM algorithms for likelihood inference. Genetics Selection Evolution, 33, 443– 452. García-Cortés L.A., Moreno C., Varona L. and Altarriba J. 1992. Variance component es- timation by resampling. Journal of Animal Breeding and Genetics, 109, 358–363. García-Cortés L.A., Moreno C., Varona L. and Altarriba J. 1995. Estimation of prediction- error variances by resampling. Journal of Animal Breeding and Genetics, 112, 176– 182. Gilmour A.R., Thompson R. and Cullis B.R. 1995. Average information REML: An effi- cient algorithm for variance parameter es- timation in linear mixed models. Biomet- rics, 51, 1440–1450. Groeneveld E. 1994. A reparamete-rization to improve numerical optimization in multi- variate REML (co)variance component es- timation. Genetics Selection Evolution, 26, 537–545. Guo S.W. and Thompson E.A. 1991. Monte Carlo estimation of variance component models for large complex pedigrees. IMA Journal of Mathematics Applied in Medi- cine and Biology, 8, 171–189. Harville D.A. 1977. Maximum likelihood ap- proaches to variance component estima- tion and to related problems. Journal of the American Statistical Association, 72, 320–338. Harville D.A. 2004. Making REML computa- tionally feasible for large data sets: Use of Gibbs sampler. Journal of Statistical Computation and Simulation, 74, 135– 153. Henderson C.R. 1953. Estimation of variance and covariance components, Biometrics, 9, 226–252. Henderson C.R. 1984. Estimation of vari- ances and covariances under multiple trait model. Journal of Dairy Science, 67, 1581–1589. Hickey J.M., Veerkamp R.F., Calus M.P.L., Mulder H.A. and Thompson R. 2009. Es- timation of prediction error variances via 32 MTT SCIENCE 30 Monte Carlo sampling methods using dif- ferent formulations of the prediction error variances. Genetics Selection Evolution, 41, 23. Hofer A. 1998. Variance component estima- tion in animal breeding: a review. Journal of Animal Breeding and Genetics, 115, 247–265. Jaffrézic F., Meza C., Lavielle M. and Foulley J.-L. 2006. Genetic analysis of growth curves using the SAEM algorithm. Genet- ics Selection Evolution, 38, 583–600. Jamshidian M. and Jennrich R.I. 1993. Con- jugate gradient acceleration of the EM al- gorithm. Journal of the American Statisti- cal Association, 88, 221–228. Jensen J., Mäntysaari E.A., Madsen P. and Thompson R. 1997. Residual maximum likelihood estimation of (co)variance com- ponents in multivariate mixed linear mod- els using average information. Journal of the Indian Society of Agricultural Statis- tics, 49, 215–236. Johnson D.L. and Thompson R. 1995. Re- stricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. Journal of Dairy Science, 78, 449–456. Klassen D.J. and Smith S.P. 1990. Animal model estimation using simulated REML. In: Proceedings 4th World Congress on Genetics Applied to Livestock Production XIII: 472–475. Edinburgh, 23–27 July 1990. Koivula M., Sevón-Aimonen M.-L., Strandén I., Matilainen K., Serenius T., Stalder K.J. and Mäntysaari E.A. 2008. Genetic (co)variances and breeding value estima- tion of Gompertz growth curve parameters in Finnish Yorkshire boars, gilts and bar- rows. Journal of Animal Breeding and Ge- netics, 125, 168–175. Kuhn E. and Lavielle M. 2005. Maximum likelihood estimation in nonlinear mixed ef- fects models. Computational Statistics & Data Analysis, 49, 1020–1038. Kuk A.Y.C. and Cheng Y.W. 1997. The Monte Carlo Newton-Raphson algorithm. Journal of Statistical Computation and Simulation, 59, 233–250. Lavielle M. and Meza C. 2007 A parameter expansion version of the SAEM algorithm. Statistics and Computing, 17, 121–130. Levine R.A. and Fan J. 2004. An automated (Markov chain) Monte Carlo EM algorithm. Journal of Statistical Computation & Simu- lation, 74, 349–360. Lidauer M.H., Matilainen K., Mäntysaari E.A. and Strandén I. 2011. Technical reference guide for MiX99. Release VI/2011. MTT Agrifood Research Finland. URL http://www.mtt.fi/BGE/Software/MiX9. Lindstrom M.J. and Bates D.M. 1990. Nonlin- ear mixed effects models for repeated measures data. Biometrics, 46, 673–687. Liu C., Rubin D.B. and Wu Y.N. 1998. Pa- rameter expansion to accelerate EM: The PX-EM algorithm, Biometrika, 85, 755– 770. Louis T.A. 1982. Finding the observed infor- mation matrix when using the EM algo- rithm. Journal of the Royal Statistical So- ciety: Series B, 44, 226–233. Madsen P. and Jensen J. 2012. A user’s guide to DMU. A package for analysing multivariate mixed models. Version 6, re- lease 5.1. University of Aarhus, Denmark. McCulloch C.E. 1997. Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statisti- cal Association, 92, 162–170. Meilijson I. 1989. A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society: Series B, 51, 127–138. Meza C., Jaffrézic F. and Foulley J.-L. 2007. REML estimation of variance parameters in nonlinear mixed effects models using the SAEM algorithm. Biometrical Journal, 49, 876–888. Misztal I. 2008. Reliable computing in estima- tion of variance components. Journal of Animal Breeding and Genetics, 125, 363– 370. Mäntysaari E. and Van Vleck L.D. 1989. Re- stricted maximum likelihood estimates of variance components from multitrait sire MTT SCIENCE 30 33 models with large number of fixed effects. Journal of Animal Breeding and Genetics, 106, 409–422. Nocedal J. and Wright S.J. 2006. Numerical optimization. Springer, New York, 2nd edi- tion. Patterson H.D. and Thompson R. 1971. Re- covery of inter-block information when block sizes are unequal. Biometrika, 58, 545–554. Patterson H.D. and Thompson R. 1974. Max- imum likelihood estimation of components of variance. Proceedings of the 8th inter- national biometric conference, 197–207. Pinheiro J.C. and Bates D.M. 1995. Approxi- mations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12–35. R Core Team. 2014. R: A language and envi- ronment for statistical computing. R Foun- dation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/. Rekaya R., Weigel K.A. and Gianola D. 2001. Hierarchical nonlinear model for persis- tency of milk yield in the first three lacta- tions of Holsteins. Livestock Production Science, 68, 181–187. Ripatti S., Larsen K. and Palmgren J. 2002. Maximum likelihood inference for multivar- iate frailty models using an automated Monte Carlo EM algorithm. Lifetime Data Analysis, 8, 349–360. Scott W.A. 2002. Maximum likelihood estima- tion using the empirical Fisher information matrix. Journal of Statistical Computation and Simulation, 72, 599–611. Searle S.R., Casella G. and McCulloch C.E. 1992. Variance components. John Wiley & Sons, Inc., New York. Strandén I. and Lidauer M. 1999. Solving large mixed linear models using precondi- tioned conjugate gradient iteration. Jour- nal of Dairy Science, 82, 2779–2787. Thompson R. 1994. Integrating best linear unbiased prediction and maximum likeli- hood estimation. In: Proceedings 5th World Congress on Genetics Applied to Livestock Production XVIII: 337–340. Guelph, 7–12 August 1994. Thompson R., Brotherstone S. and White I.M.S. 2005. Estimation of quantitative ge- netic parameters. Philosophical Transac- tions of the Royal Society B, 360, 1469– 1477. Vonesh E.F. 1996. A note on the use of La- place’s approximation for nonlinear mixed- effects models. Biometrika, 83, 447–452. Vuori K., Strandén I. and Mäntysaari E.A. 2006. Solving large scale nonlinear mixed models. 8th World Congress on Genetics Applied to Livestock Production, Belo Horizonte, MG, Brazil. Walker S. 1996. An EM algorithm for nonline- ar random effects models. Biometrics, 52, 934–944. Wei G. and Tanner M. 1990. A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algo- rithms. Journal of the American Statistical Association, 85, 699–704. Wellock I.J., Emmans G.C. and Kyriazakis I. 2004. Describing and predicting potential growth in pig. Animal Science, 78, 379– 388. Whittemore C.T. and Green D.M. 2002. The description of the rate of protein and lipid growth in pigs in relation to live weight. The Journal of Agricultural Science, 138, 415–423. Whittemore C.T., Tullis J.B. and Emmans G.C. 1988. Protein growth in pigs. Animal Production, 46, 437–445. Wolfinger R.D. 1993. Laplace’s approximation for nonlinear mixed models. Biometrika, 80, 791–795. Wolfinger R.D. and Lin X. 1997. Two Taylor- series approximation methods for nonline- ar mixed models. Computational Statistics & Data Analysis, 25, 465–490. Wu L. 2004. Exact and approximate infer- ences for nonlinear mixed-effects models with missing covariates. Journal of the American Statistical Association, 99, 700– 709. MTT is publishing its research findings in two series of publications: MTT Science and MTT Growth. The MTT Science series includes scientific presentations and abstracts from conferences arranged by MTT Agrifood Research Finland. Doctoral dissertations by MTT research scientists will also be published in this series. The topics range from agricultural and food research to environmental research in the field of agriculture. MTT, FI-31600 Jokioinen, Finland. email julkaisut@mtt.fi 30 Variance component estimation exploiting Monte Carlo methods and linearization with complex models and large data in animal breeding Doctoral Dissertation Kaarina Matilainen MTT CREATES VITALITY THROUGH SCIENCE www.mtt.fi/julkaisut