MTT is publishing its research findings in two series of publications:
MTT Science and MTT Growth.
The MTT Science series includes scientific presentations and abstracts from 
conferences arranged by MTT Agrifood Research Finland.
Doctoral dissertations by MTT research scientists
will also be published in this series.
The topics range from agricultural and food research
to environmental research in the field of agriculture. 
MTT, FI-31600 Jokioinen, Finland.
email julkaisut@mtt.fi
30
Variance component estimation 
exploiting Monte Carlo methods 
and linearization with complex 
models and large data in animal 
breeding
Doctoral Dissertation
Kaarina Matilainen
MTT CREATES VITALITY THROUGH SCIENCE
www.mtt.fi/julkaisut
  
 
 
Variance component estimation  
exploiting Monte Carlo methods  
and linearization with complex models 
and large data in animal breeding 
 
Doctoral Dissertation 
 
Kaarina Matilainen 
 
 
 
 
 
 
 
 
 
 
 
Academic dissertation 
To be presented, with the permission of the Faculty of Agriculture and Forestry of 
the University of Helsinki, for public examination in Auditorium XIV, Main building, 
Fabianinkatu 33, on December 5th 2014, at 12 noon. 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ISBN 978-952-487-568-4  (Print) 
ISBN 978-952-487-569-1  (Electronic) 
ISSN 1798-1824  (Printed version) 
ISSN 1798-1840  (Electronic version) 
URN http://urn.fi/URN:ISBN:978-952-487-569-1 
www.mtt.fi/mtttiede/pdf/mtttiede30.pdf     
Copyright  MTT Agrifood Research Finland  
Kaarina Matilainen 
Distribution and sale  
MTT Agrifood Research Finland,  
Media and Information Services,  
FI-31600 Jokioinen,  
e-mail julkaisut@mtt.fi 
Printing year  2014  
Cover image Jaakko Suvala/Rodeo / MTT 
archive, graph/Kaarina Matilainen 
Printing house  Tampereen Yliopistopaino 
Juvenes Print Oy 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Custos:  
Professor Asko Mäki-Tanila, Department of Agricul-
tural Sciences, P.O. Box 28, University of Helsinki, 
Finland 
 
Supervisors: 
Docent Ismo Strandén, MTT Agrifood Research 
Finland, Biotechnology and Food Research, 
Humppilantie 7, 31600 Jokioinen, Finland 
 
Professor Esa A. Mäntysaari, MTT Agrifood Re-
search Finland, Biotechnology and Food Research, 
Humppilantie 7, 31600 Jokioinen, Finland 
 
Reviewers:  
Professor Just Jensen, Aarhus University, Depart-
ment of Molecular Biology and Genetics, Blichers 
Allé 20, 8830 Tjele, Denmark 
 
Dr Bjørg Heringstad, Norwegian University of Life 
Sciences, Department of Animal and Aquacultural 
Sciences, P.O. Box 5003, 1432 Ås, Norway 
 
Opponent:  
Dr John Hickey, The Roslin Institute, The Univer-
sity of Edinburgh, Easter Bush, Midlothian, EH25 
9RG, UK 
MTT SCIENCE 30  3
Variance component estimation exploiting 
Monte Carlo methods and linearization with 
complex models and large data in animal 
breeding 
Kaarina Matilainen 
MTT Agrifood Research Finland, Biotechnology and Food Research, Humppilantie 7,  
FI-31600 Jokioinen, Finland, kaarina.matilainen@mtt.fi 
 
Abstract 
Inference for the variance components in linear 
mixed effects models is theoretically well un-
derstood. Many methods have also been pre-
sented for nonlinear models. Genetic evalua-
tions in animal breeding are however character-
ized by the enormous size of the models and 
data. This means that the methods in estima-
tion have to be computationally efficient. The 
purpose of this study was to find efficient 
methods for the estimation of the variance 
components for large data sets and complex 
mixed effects models in animal breeding.  
The focus of the study was, first, on the re-
stricted maximum likelihood (REML) estima-
tion applied to a linearized model of nonlinear 
mixed effects model and, second, on the 
REML estimation of large linear mixed effects 
models by the Monte Carlo (MC) method. 
Performance of the methods were mostly stud-
ied using simulated data sets, but the feasibility 
of the MC based expectation maximization 
(EM) REML was also studied using dairy cattle 
field data.  
The analyses of a data set mimicking pig live 
weights showed that linearization works mod-
erately well when the data is good, but estima-
tion of parameters related to adult weight be-
comes unstable when weight observations from 
the right tail of the animals’ growth curve were 
missing. However, the simulation study 
showed that having even a small proportion of 
animals with adult weights improved the results 
when compared to the estimates based on 
observations from prematurely slaughtered 
animals only. 
The MC based EM REML method converged 
to the same solutions as the analytical EM 
REML, and a small number of MC samples 
did not introduce systematic bias to the esti-
mates of genetic parameters in the analysis of 
simulated dairy cattle data set. Furthermore, 
analyses of field data proved the MC EM 
REML to be superior to the analytical EM 
REML both in computing time and in the 
memory needed. Compared to MC EM 
REML, the MC Newton-type methods con-
verged faster, but sampling variation of the 
estimates increased. Sampling variation differed 
somewhat also between the Newton-type 
methods.  
Developing a fast algorithm for MC based 
REML estimation requires a convergence crite-
rion that is robust for sampling variation. A 
stopping rule that can be calculated during the 
analysis was introduced. The applied conver-
gence criterion monitored the progress of con-
vergence and was only a little influenced by 
MC noise. It also worked reasonably well with 
small number of MC samples, which is a prop-
erty that may be useful for analyzing large scale 
and complex models.  
Keywords: mixed effects model, variance com-
ponents, linearization, Monte Carlo, REML, EM-
algorithm, Newton’s method 
4  MTT SCIENCE 30 
Monte Carlo -menetelmät ja linearisointi 
varianssikomponenttien arvioinnissa 
analysoitaessa suuria aineistoja vaativilla 
jalostusarvostelumalleilla 
Kaarina Matilainen 
MTT, Biotekniikka- ja elintarviketutkimus, Humppilantie 7, 31600 Jokioinen,  
kaarina.matilainen@mtt.fi 
 
Tiivistelmä 
Keskeinen osa eläinjalostusta on geneettisistä 
tekijöistä johtuvan vaihtelun erottaminen ha-
vaintojen kokonaisvaihtelusta. Geneettistä vaih-
telua arvioidaan tilastotieteestä tutuilla sekamal-
leilla. Sekamallien sisältämien varianssikompo-
nenttien arviointiin liittyvää teoriaa on tutkittu 
paljon lineaarisilla sekamalleilla. Myös epälineaa-
risille malleille on esitetty monia arviointimene-
telmiä. Koska jalostusarvosteluissa käytettävät 
aineistot ja mallit ovat usein suuria, arviointime-
netelmien yksi tärkeä ominaisuus on laskennalli-
nen tehokkuus.  
Tämän tutkimuksen tarkoituksena oli löytää 
tehokas menetelmä varianssikomponenttien 
arviointiin analysoitaessa eläinjalostuksessa käy-
tettäviä suuria aineistoja monimutkaisilla seka-
malleilla. Erityisesti tutkimus keskittyi 1) epä-
lineaaristen sekamallien varianssikomponenttien 
arviointiin mallin linearisoinnilla ja REML-
tyyppisellä menetelmällä ja 2) Monte Carlo 
(MC) –menetelmän hyödyntämiseen sekä EM 
(expectation maximization) että Newtonin 
tyyppisissä REML-analyyseissä lineaarisille se-
kamalleille. Menetelmiä tutkittiin simuloiduilla 
aineistoilla. MC-menetelmää hyödyntävän EM 
REML:n soveltuvuutta testattiin myös todellisel-
la lypsykarja-aineistolla. 
Linearisointi toimi kohtuullisesti simuloidulla 
aineistolla, joka kuvasi eläinten painon kehitystä 
syntymästä aikuispainoon asti. Aikuispainoon 
liittyvien varianssikomponenttien arvioiden 
luotettavuus heikkeni, kun aineisto sisälsi ha-
vaintoja ainoastaan aikuispainoa edeltävältä 
ajalta. Pienikin aikuispainohavaintojen lisäys 
kuitenkin paransi luotettavuutta.  
MC-menetelmää käyttävä EM REML konver-
goi samoihin varianssikomponenttien arvioihin 
kuin analyyttinen menetelmä, eikä pieni MC-
otosten määrä näkynyt systemaattisena harhana 
arvioissa. Todellisen aineiston analyysit osoittivat 
MC EM REML -menetelmän olevan parempi 
sekä laskenta-ajaltaan että tietokoneen muistitar-
peeltaan kuin analyyttinen EM REML  
-menetelmä. MC-menetelmän soveltaminen 
Newtonin tyyppisiin menetelmiin sai REML-
ratkaisut konvergoimaan nopeammin kuin MC 
EM REML:llä, mutta arvioiden otosvaihtelu oli 
suurempaa. Otosvaihtelun suuruus vaihteli 
käytetyn Newtonin menetelmän mukaan.  
Jokaisessa MC REML -analyysissä tarkasteltiin 
myös uutta konvergenssikriteeriä. Uudella kri-
teerillä pystyttiin vähentämään MC-
menetelmästä johtuvaa vaihtelua konvergenssin 
seurannassa. Kriteeri toimi melko hyvin myös 
pienillä MC-otosmäärillä, mikä on toivottava 
ominaisuus analysoitaessa suuria aineistoja ja 
monimutkaisia malleja. 
Avainsanat: sekamalli, varianssikomponent-
ti, linearisointi, Monte Carlo, REML, EM-
algoritmi, Newtonin menetelmä 
MTT SCIENCE 30  5 
Acknowledgements 
 
I wish to express my sincere gratitude to my 
supervisors Ismo Strandén and Esa Mänty-
saari for introducing me into the research 
field of animal breeding and their expert 
guidance in this work. Also I wish to express 
gratitude to co-authors Martin Lidauer, 
Robin Thompson and Marja-Liisa Sevón-
Aimonen for their fruitful collaboration. 
You contributed to a pleasant working team 
and I am most deeply grateful for your sup-
port and encouragement during my work.  
The research work for this thesis has been 
done at MTT Agrifood Research Finland 
and financial support was provided by 
Rothamsted Research. Data was provided 
by Faba, Hollola, Finland. These organiza-
tions are greatly acknowledged. 
Thanks to Pekka Uimari for always helping 
with questions related to my studies. You 
kept the wheels in motion at the final 
crunch. Further, my thanks go to thesis 
reviewers, Just Jensen and Bjørg Heringstad, 
for valuable suggestions. Thanks also go to 
my custos Asko Mäki-Tanila. 
I warmly thank Kirsi Muuttoranta for ad-
vice in so many official and unofficial mat-
ters that came up against. My gratitude goes 
also to current and former colleagues at the 
biometrical genetics team. In addition to all 
the advice and assistant I have got, I wish to 
thank you for a fun-filled moments and 
shared sport hours. Thanks to all of you. 
Keep going! 
I would like to express my deepest apprecia-
tion to my family for providing the re-
quirements for everyday life. To my parents, 
my sister and brother and their families, for 
listening if needed but not asking too much 
about my work; to my mother- , father- and 
sister-in-law not least for being babysitters 
on so many nights during the last months; 
But foremost to my sons Esko and Niilo for 
their patience when I have been an absent 
and absent-minded mother and my hus-
band Jukka for his patience, love and sup-
port. 
Jokioinen, October 2014 
Kaarina Matilainen 
 
 
  
6  MTT SCIENCE 30 
List of original publications 
 
This thesis is based on the following publications: 
I Vuori K., Strandén I., Sevón-Aimonen M.-L., Mäntysaari E.A. (2006). Estimation of 
non-linear growth models by linearization: a simulation study using a Gompertz 
function. Genetics, Selection, Evolution, 38, 343-358. doi:10.1051/gse:2006008 
II Matilainen K., Mäntysaari E.A., Lidauer M.H., Strandén I., Thompson R. (2012). 
Employing a Monte Carlo algorithm in expectation maximization restricted maxi-
mum likelihood estimation of the linear mixed model. Journal of Animal Breeding 
and Genetics, 129, 457-468. doi:10.1111/j.1439-0388.2012.01000.x 
III Matilainen K., Mäntysaari E.A., Lidauer M.H., Strandén I., Thompson R. (2013). 
Employing a Monte Carlo algorithm in Newton-type methods for restricted maxi-
mum likelihood estimation of genetic parameters. PLoS ONE, 8, e80821. 
doi:10.1371/journal.pone.0080821 
 
The publications are referred to in the text by their Roman numerals. Reprints of the origi-
nal articles are published with the kind permission of BioMed Central Ltd., John Wiley & 
Sons Ltd. and PLOS. 
The author (née Vuori) participated in planning the studies, performed the data simulations 
and all analyses, interpreted the results with the co-authors and was the main writer of all the 
articles. 
 
  
MTT SCIENCE 30  7 
Abbreviations 
 
 
AI average information 
BM Broyden’s method 
EBLUP empirical best linear unbiased prediction 
EM expectation maximization 
MC Monte Carlo 
MCMC Markov chain Monte Carlo 
ML maximum likelihood 
MME mixed model equations 
NR Newton-Raphson 
PEV prediction error variance 
PX-EM parameter expansion version of expectation maximization 
REML restricted maximum likelihood 
SAEM stochastic approximation expectation maximization 
TD test day 
VC variance component 
 
 
  
8  MTT SCIENCE 30 
Contents 
1 Introduction .............................................................................................................................. 9 
1.1 Genetic Parameters ......................................................................................................... 9 
1.2 Estimation methods of genetic parameters ..................................................................... 9 
1.2.1 Exploiting Monte Carlo methods ........................................................................ 11 
1.2.2 Exploiting linearization ....................................................................................... 11 
1.3 Objectives of the study ................................................................................................. 12 
2 Materials and methods ........................................................................................................... 13 
2.1 Likelihood of linear mixed effects model ..................................................................... 13 
2.2 Likelihood of nonlinear mixed effects model by linearization ..................................... 13 
2.3 REML ........................................................................................................................... 14 
2.3.1 REML by EM algorithm ..................................................................................... 15 
2.3.2 REML by Newton-type methods ........................................................................ 16 
2.4 REML utilizing MC ..................................................................................................... 17 
2.4.1 MC EM REML ................................................................................................... 17 
2.4.2 MC Newton-type REML .................................................................................... 18 
2.4.3 Convergence criterion ......................................................................................... 19 
2.5 Testing of methods ....................................................................................................... 19 
2.5.1 Data sets and analyses ......................................................................................... 20 
2.5.2 Software .............................................................................................................. 21 
3 Main results ............................................................................................................................ 23 
3.1 Linearization ................................................................................................................. 23 
3.2 MC EM REML ............................................................................................................. 23 
3.3 MC Newton-type REML .............................................................................................. 24 
3.4 Assessment of convergence .......................................................................................... 24 
4 Discussion .............................................................................................................................. 25 
4.1 Linearization ................................................................................................................. 25 
4.2 MC EM REML ............................................................................................................. 27 
4.3 MC Newton-type REML .............................................................................................. 28 
4.4 Recommendations for further research ......................................................................... 29 
References .................................................................................................................................. 31 
 
 
MTT SCIENCE 30  9 
1 Introduction  
1.1 Genetic parameters  
Estimation of genetic parameters is one of 
the essential steps necessary for reliable ani-
mal evaluations. Genetic parameters reveal 
the amount of genetic variation in traits, the 
heritability of traits and possible repeatabil-
ity of traits as well as correlations between 
traits. The heritability of a trait indicates the 
ability to change the appearance of the trait 
in a population when selective breeding is 
applied to the trait. The higher the herita-
bility the more accurately breeding values 
can be estimated with the same number of 
observations. Because genetic gain is af-
fected by accuracy together with genetic 
variance, selection intensity and generation 
interval, more genetic change can be 
achieved within the same time period using 
more accurate breeding values given the 
other parameters are kept unchanged. In 
addition to heritability, breeding programs 
pay attention to genetic and phenotypic 
correlations between traits so that the posi-
tive correlation between two traits can be 
exploited or undesired co-response due to 
selection can be decreased.  
Estimation of genetic parameters is based 
on correlated phenotypes between relatives. 
Therefore, analyses require statistical models 
which describe environmental components 
and the sources of variation between obser-
vations. Estimation of variance components 
(VC) tries to quantify the extent to which 
variation between the observations is due to 
genetic and non-genetic environmental 
differences. Hereby the values of genetic 
parameters are population and data depend-
ent. Also the different models used may lead 
to different estimates about the true state of 
the population investigated. Therefore the 
VC need to be re-estimated when some of 
these components change. 
1.2 Estimation methods of 
genetic parameters 
One of the most traditional methods to 
estimate VC is analysis of variance 
(ANOVA). Because it is applicable for a 
very limited data structure, Henderson 
(1953) presented three different methods. 
The most versatile method is known as 
Henderson’s method III. The disadvantage 
of the method was its inability to take into 
account relationships between animals, i.e., 
it could not account for the effect of selec-
tion. Interest in maximum likelihood (ML) 
approaches increased in 1970’s (Harville, 
1977). The ML approach uses probability 
distributions as likelihood functions: While 
data have probability distribution given the 
parameters, ML maximizes the likelihood 
function of parameters given the data. ML 
is a flexible modeling technique, which 
allows users to account for more assump-
tions in the model and patterns of missing 
traits in multivariate analysis. It is also able 
to take into account relationships between 
animals through the numerator relationship 
matrix. VC estimates by the ML method 
are, however, dependent on the number of 
levels within the fixed effects. In theory, 
when simple models allowed by ANOVA 
are used, ML estimation leads to biased 
variance estimates whereas ANOVA esti-
mates are unbiased. This deficiency is cor-
rected in the restricted maximum likelihood 
(REML) method, which takes into account 
10  MTT SCIENCE 30 
the loss of information due to simultaneous 
estimation of fixed effects (Patterson and 
Thompson, 1971). A competing estimation 
approach is based on Bayesian inference 
which accounts for the loss of information 
due to estimation of all parameters other 
than the one of interest. In practice, REML 
and Bayesian approaches often lead to simi-
lar inferences.  
The REML approach may lead to massive 
computations and many methods have been 
presented over the years to calculate REML 
estimates. The simplest methods use values 
of the REML likelihood only. More effi-
cient methods use first derivatives of the 
REML likelihood. These are the so-called 
first-order methods. Even better is to use 
both the first and the second derivatives of 
the REML likelihood. These are called 
second-order methods. Because the matrix 
of second derivatives can be difficult to 
calculate, interesting alternatives are the 
quasi-Newton methods. They rely on ap-
proximations of the second derivatives 
based on the direction of the most recent 
step. Quasi-Newton methods usually result 
in faster convergence compared to the first-
order methods but slower convergence 
compared to the true second-order methods 
because the information matrix is replaced 
by an approximation. A good summary of 
the history of VC estimation is given e.g. by 
Hofer (1998) and Thompson et al. (2005). 
After decades of development work, infer-
ence for VC in linear mixed effects models 
is theoretically well understood. However, 
animal breeding evaluations are often char-
acterized by a large number of observations 
and complex models which may lead to 
computational problems. This underlines 
the fact that the methods in estimation, 
especially in model development tools, have 
to be computationally effective. For VC 
estimation, existing computer packages are 
often prohibitively slow (Bayesian ap-
proaches) or require huge computer mem-
ory capacity (REML approaches). Notable 
challenges are random regression test day 
(TD) models, international multivariate 
comparisons and individual gene models 
which all require up-to-date statistical and 
numerical techniques. 
Linear models are flexible. They can be 
used, for example, to model the lactation 
curves of cows by random regression mod-
els. Sometimes animal growth curves are 
modeled by a sigmoid or S-shaped function. 
As an example, Gompertz function has been 
shown to fit pig growth data, such as live 
weight and protein retention (Wellock et 
al., 2004; Whittemore and Green, 2002; 
Whittemore et al., 1988) well. Models us-
ing S-shaped function are called nonlinear 
because the fixed and random effects do not 
enter the model linearly. An advantage of 
the Gompertz function is that predictions 
beyond the observed data range can be 
made more realistic. This is a useful prop-
erty that is needed, for example, to predict 
the adult weight of prematurely slaughtered 
pigs. Nonlinear models are, however, more 
complicated to solve than linear models 
(Davidian and Giltinan, 1995). In animal 
evaluations, the Bayesian framework has 
been popular due to the use of Markov 
chain Monte Carlo (MCMC) methods that 
allow the solution for the required numeri-
cally complicated integration (e.g. Blasco et 
al., 2003; Rekaya et al., 2001). Another 
possibility is to approximate the likelihood 
function using linearization (Davidian and 
Giltinan, 1995; Pinheiro and Bates, 1995; 
Wolfinger, 1993; Wolfinger and Lin, 1997) 
or numerical integration (Pinheiro and 
MTT SCIENCE 30  11 
Bates, 1995). However, all these methods 
are computationally intensive. 
1.2.1 Exploiting Monte Carlo methods  
Monte Carlo (MC) simulation is used to 
estimate the integral needed for calculation 
of expectations of a random variable. MC 
methods are often used when closed-form 
solutions are difficult or impossible to ob-
tain. Wei and Tanner (1990) introduced 
the MC expectation maximization (EM) 
algorithm for cases where maximization for 
complete data is simple and the expectation 
can be approximated by MC simulations. 
This has mainly been used within the classi-
cal likelihood framework in the analysis of 
complex models, like generalized linear 
mixed models, for which expectations can-
not be calculated analytically (e.g. Walker, 
1996; McCulloch, 1997; Booth and 
Hobert, 1999). It can also be utilized in VC 
estimation of linear mixed effects models via 
Gibbs sampling to avoid insurmountable 
computations in the analysis of large data 
sets and models by the analytical EM 
REML method (e.g. Guo and Thompson, 
1991; Thompson, 1994; García-Cortés and 
Sorensen, 2001; Harville, 2004). García-
Cortés et al. (1992) applied MC EM to VC 
estimation in a different way. To estimate 
the prediction error variances (PEV) within 
each REML round, independent data sets, 
corresponding to the original data, are gen-
erated using the assumed linear model. 
Fixed and random effects are then estimated 
from the simulated data sets. This enables 
the calculation of PEV without inversion or 
decomposition of the coefficient matrix and 
leads to memory requirements equal to 
those used in solving the mixed model 
equations (MME). 
Because EM REML may converge slowly, 
improvements have been offered by second-
order methods. Such methods include for 
example the Newton-Raphson (NR) 
method using an observed information 
matrix and Fisher’s method of scoring with 
observed information matrix replaced by an 
expected information matrix. Klassen and 
Smith (1990) presented a sampling scheme 
for Fisher’s method of scoring. Interestingly 
it does not need any derivatives, but may be 
inapplicable for large and complex models 
as is often the case with derivative free 
methods in general. At the same time as 
Wei and Tanner (1990) introduced MC 
EM, they proposed an improvement for 
convergence via MC approximation of 
Louis’s method (Louis, 1982). Again, the 
use has mainly been in analyses involving 
complex likelihoods as is the case e.g. in 
utilization of MC NR method for general-
ized linear mixed models in Kuk and Cheng 
(1997). However, use of the simple sam-
pling method presented in García-Cortés et 
al. (1992) has not been applied with sec-
ond-order methods. 
1.2.2 Exploiting linearization 
Analytical solutions for the integral of the 
nonlinear likelihood function can only 
rarely be found. Therefore approximation 
methods based on Taylor-series expansion 
or Laplacian approximation have been dis-
cussed extensively. Early methods used first-
order Taylor series expansion of nonlinear 
functions around expectation of the random 
effects (Davidian and Giltinan, 1995). 
Lindstrom and Bates (1990) suggested a 
more accurate method where the expansion 
is made around current estimates of the 
random effects. This is recommended espe-
cially for cases where the VCs are large and 
substantial inter-individual variation exists 
(Lindstrom and Bates, 1990). Subsequent 
research has focused more on the second-
12  MTT SCIENCE 30 
order Taylor series expansion of integrals 
invoked by the Laplacian approximation 
(Pinheiro and Bates, 1995; Wolfinger, 
1993; Wolfinger and Lin, 1997). 
Wolfinger and Lin (1997) presented an 
interesting choice of approximation for the 
integrant because of general and familiar 
formulation. They gave two alternative 
approaches to select points of expansion: a 
zero expansion method using expected val-
ues, and an EBLUP-expansion method 
using the empirical best linear unbiased 
predictors (EBLUP) of the random effects. 
Both approximations lead to algorithms 
that iteratively fit linear mixed effects mod-
els to the suitably transformed data using 
either ML or REML. Therefore, they allow 
the use of commonly applied methods for 
linear mixed effects models, and they are 
called linearization methods here. Because 
conditions for the functionality of the ap-
proximation methods are difficult to iden-
tify, Wolfinger and Lin (1997) recom-
mended simulation studies for assessing the 
performance of the methods in diverse 
kinds of nonlinear models and data sets. To 
our knowledge, the appropriateness of the 
EBLUP REML method has prior to this 
study not been verified for large animal 
breeding data sets. 
1.3 Objectives of the study 
The purpose of this study was to find an 
efficient method for estimation of the VC 
of large data and complex mixed effects 
models in animal breeding. The focus was 
on the REML estimation of large nonlinear 
mixed effects models utilizing linearization 
and the REML estimation of large linear 
mixed effects models by the MC method. 
More specifically aims were: 
1. To study the applicability of a method 
based on Taylor series expansion (Wolf-
inger and Lin, 1997) in animal breeding 
with data simulated and analyzed using a 
Gompertz function growth model. 
2. To show the feasibility of the MC EM 
REML method proposed in García-
Cortés et al. (1992) for large data sets 
and complex linear models by compar-
ing it with analytical EM REML using 
simulated and field data. 
3. To study the performance of the MC 
method in different Newton-type meth-
ods for VC estimation of linear mixed 
effects models by analyzing small simu-
lated data.  
4. To study a new convergence criterion 
applicable for the MC methods. 
 
 
MTT SCIENCE 30  13 
2 Materials and methods
In the following, the linear and nonlinear 
mixed effects models are introduced and the 
likelihood based estimation is considered. 
The ML and REML methods are explained, 
and some methods to maximize the REML 
likelihood function by analytical and MC 
methods are described. Also assessment of 
the convergence in case of the MC based 
estimation methods is discussed. Finally, 
data and programs used in the analyses are 
given. 
2.1 Likelihood of linear mixed 
effects model 
Consider a simple single trait linear mixed 
effects model  
(1) ,eZuXby ++=  
where y  is the vector of n  observations, b  
is the vector of p  fixed effects, u  is the 
vector of q  random breeding values, and e  
is the vector of n  random residuals. Fur-
thermore, X  and Z  are known design ma-
trices, which relate records to fixed effects 
and random genetic effects, respectively. 
Usually, the random effects u  and e  are 
assumed to be independent of each other 
and to follow Gaussian distribution: 
( )G0u ,~N  and ( )R0e ,~N , where 2uσAG = , 
A  is qq×  numerator relationship matrix, 
2
eσIR =  and I  is nn×  identity matrix. The 
unknown VC are genetic variance 2uσ  in G  
and residual variance 2eσ  in R . Denote the 
vector of unknowns by the parameter vector 
θ . 
The assumptions of the model result in 
Gaussian distribution for observations: 
( )VXby ,~N , where RZGZV += T . The like-
lihood function for the Gaussian distributed 
data in y  is 
(2) 
( ) ( ) ( )
( ) ( )(
) ,
exp
22|,
1
2
1
1
2
1
2
1
22
1
2
uuGu
XbyRXby
GRyθb
d
L
T
T
qn
−
−
−
−
−
−
−
−−−
=

ππ
 
which leads to the logarithmic likelihood 
function  
(3) 
( ) ( )
( )
( ) ( ).
ln2ln
|,log|,
1
2
1
2
1
2
XbyVXby
V
yθbyθb
−−−
−−=
=
−
T
n
Ll
π  
This function is used in the ML framework 
(Searle et al, 1992). The aim is to find pa-
rameter values θ  that maximizes the likeli-
hood function given the observed data y . 
This can be obtained by taking derivatives 
of the log likelihood function with respect 
to θ , equating to zero and solving for θ . 
Often this approach requires the use of so-
called profile likelihood of θ  constructed by 
substituting the current estimates of the 
fixed effects bˆ , 
(4) ( ) ,ˆˆˆ 111 yVXXVXb −−−= TT  
for b . 
2.2 Likelihood of nonlinear 
mixed effects model by 
linearization 
Consider a single trait nonlinear model. Let 
the model be ( ) euZbXy += ,,,f , where y  is 
the vector of n  observations, f  is the 
nonlinear function and e  is the vector of n  
random residuals. Assume further that each 
14  MTT SCIENCE 30 
of the nonlinear model parameters can be 
modeled by a linear mixed effects model 
using terms X , b , Z  and u  as defined 
earlier. For example, the nonlinear function 
could be a three parametric Gompertz func-
tion ))exp(exp( tf κβα −−=  where the pa-
rameters α , β  and κ  can be modeled by 
linear mixed effects model, i.e., 
(5) ( ) ( )(( )( )) .exp
exp
eZuXb
ZuXbZuXby
++−
+−+=
tκκ
ββαα
 
Now the fixed effects are
 
[ ]TTTT κβα bbbb = , 
and the random effects are 
[ ]TTTT κβα uuuu =  and e . Again the random 
effects u  and e  are assumed to be inde-
pendent from each other and to follow 
Gaussian distribution: ( )G0u ,~N  and 
( )R0e ,~N , where AGG ⊗= 0  and 2eσIR = . 
Because the Gompertz function results in 
three levels for each fixed and random effect 
in the model, covariance matrix 0G  is 33×  
matrix with unique elements of 2
α
σu , 2βσu , 
2
κ
σu , αβσu , ακσu  and βκσu . 
For the nonlinear model considered here, 
the profile likelihood function is 
(6) 
( ) ( ) ( )
( )( )
( )( ) ) .ˆ
ˆexp
22|
1
2
1
1
2
1
2
1
22
1
2
uuGuZubXy
RZubXy
GRyθ
df
f
L
T
T
qn
−
−
−
−
−
−
−+−


+−−
=

ππ
 
Calculation of this likelihood requires calcu-
lation of the integral. When the function f  
is nonlinear, a closed form can only rarely 
be found and the integral must be solved 
numerically. One possibility is to approxi-
mate the integral by quadratic Taylor-series 
expansion of the exponent (Wolfinger and 
Lin, 1997). The second-order expansion 
with respect to the random effects gives  
(7) 
( ) ( )
( )
( ) ( ),ˆˆ
ln2ln
|log|
*1**
2
1
*
2
1
2
*
bXYVbXY
V
yθyθ
−−−
−−=
=
−
T
n
Ll
π  
where 
(8) 
( )
uu
bb
u
Z
b
X
uZbXuZbXyY
RGZZV
ˆ
*
ˆ
*
**
***
ˆˆˆˆ
=
=
∂
∂
=
∂
∂
=
+++−=
+=
T
T
T
f
f
f
 
are calculated at bˆ  as the current estimate 
of the fixed effects and uˆ  as the current 
EBLUP of the random effects. Conse-
quently, linearization leads to a log likeli-
hood function that iteratively fits linear 
mixed effects models to the suitably trans-
formed data Y . Therefore, it allows the use 
of commonly applied methods for linear 
mixed effects models. 
2.3 REML 
The ML estimates tend to be biased down-
wards for VC (Patterson and Thompson, 
1974). Instead of the ML estimates, REML 
estimates proposed by Patterson and 
Thompson (1971) are widely preferred in 
practice. These estimates account for loss in 
degrees of freedom caused by the estimation 
of fixed effects b . In principle the REML 
method is based on linear transformation of 
the data y : yKz T= , where K  is ( )pnn −×  
matrix of full rank for which 0XK =T , and 
n  and p  are numbers of observations and 
fixed effects, respectively. Elements in z  
may be referred to as error contrasts and 
they follow the multivariate normal distri-
bution ( )VKK0z TN ,~ . Log likelihood func-
tion for error contrasts will be 
MTT SCIENCE 30  15 
(9) 
( ) ( )
( ) ( ) ( )
( )
( ) ( ),ˆˆ
ln
ln2ln
ln2ln|
1
2
1
1
2
1
2
1
2
1
2
1
2
1
2
bXyVbXy
XVX
V
yKVKKyK
VKKyθ
−−−
−
−−=
−
−−=
−
−
−
T
T
n
TTTT
Tn
REMLl
π
π
 
and the REML estimates are parameter val-
ues θ  that maximize the REML likelihood 
function given the observed data y . 
Similarly, REML for the approximated log 
likelihood function of the nonlinear model 
is possible. The error contrasts YKz T= , 
where Y  is transformed data and K  is a 
full rank matrix such that 0XK =*T , follow 
the multivariate normal distribution 
( )KVK0z *,~ TN . The REML solutions for 
the approximated log likelihood function 
are obtained by maximizing the equation 
(10) 
( ) ( )
( ) ( ),ˆˆ
ln
ln2ln|
*1**
2
1
*1**
2
1
*
2
1
2
*
bXYVbXY
XVX
Vyθ
−−−
−
−−=
−
−
T
T
n
REMLl π
 
where transformed data Y , covariance 
matrix *V  and the working incidence ma-
trices *X  and *Z  are obtained by lineariza-
tion (Wolfinger and Lin, 1997). 
In general, maximization of the REML 
likelihood requires use of numerical meth-
ods. The EM algorithm and three Newton-
type methods are considered in the follow-
ing sections. The approaches are shown for 
a simple linear mixed effects model. Maxi-
mization of the approximated REML likeli-
hood of the nonlinear model corresponds to 
the maximization of the REML likelihood 
of a random regression model as presented 
for the EM algorithm in II. For multivariate 
models maximization by EM REML is 
considered e.g. in Henderson (1984) and in 
Mäntysaari and Van Vleck (1989). There is 
no single method that will be best for every 
application and some guidelines were given 
by Misztal (2008). 
2.3.1 REML by EM algorithm 
The EM algorithm was presented by 
Dempster et al. (1977) for a variety of ex-
amples including the VC estimation. Cen-
tral to the EM algorithm is that by complet-
ing observations with unobserved data the 
maximization becomes easy. This can be 
achieved by completing the observations 
with values of random effects u  and residu-
als e . The EM algorithm iterates between 
two steps called the E-step and the M-step. 
The E-step computes a Q-function, defined 
as an expectation of the logarithmic likeli-
hood function for the complete data given 
the current parameters: 
(11) 
( ) ( )( )
( )( ),ˆˆln
ˆˆln
|
1
2
1
2
1
1
2
1
2
1
uuT
TT
k
tr
tr
constQ
CuuGG
WCWeeRR
θθ
+−−
+−−
=
−
−  
where [ ]ZXW = , C  is the inverse of the 
full MME coefficient matrix, uuC  is a sub-
matrix of C  corresponding to random ef-
fects, and uˆ  and eˆ  are solutions of the 
MME using current values of variance 
components. The M-step maximizes the Q-
function. Taking derivatives with respect to 
2
uσ  in 0G  and 2eσ  in 0R  and equating 
them to zero gives the following estimates: 
(12) ( )
q
tr uuT
u
CAuAu 112 ˆˆˆ −− +=σ  
and 
(13) ( ) .ˆˆˆ 2
n
tr TT
e
WCWee +
=σ  
16  MTT SCIENCE 30 
The EM algorithm is based on first deriva-
tives only and it does not give standard 
errors for parameters as a by-product. The 
EM algorithm is known to be relatively 
stable although slow to converge. 
2.3.2 REML by Newton-type methods 
Methods relying on the first and second 
derivatives of the REML log likelihood 
function ( )yθ |REMLl  with respect to θ  are 
called Newton-type methods. The principle 
in all Newton-type methods is to find a 
solution vector where the first derivative of 
the function to be maximized is zero. The 
first derivatives of the log-likelihood form 
the gradient vector 
(14) 
( ) ( )
( )
( )













 +
−−



 +
−−
=



∂
∂
−
∂
∂
=
∂
∂
=
−−
42
4
11
2
2
1
2
1
ˆˆ
2
1
ˆˆ
2
1
|
e
TT
e
u
uuT
u
T
trn
trq
tr
l
σσ
σσ
WCWee
CAuAu
θ
VPPyθ
VPy
θ
yθθJ
 
and the second derivatives of the log-
likelihood with respect to all parameters 
yield the observed information matrix 
(15) 
( ) ( )
,
|
2
1
2



∂
∂
∂
∂
−
∂
∂
∂
∂
=
∂∂
∂
−=
θ
VPθ
VPPyθ
VPθ
VPy
θθ
yθθH
tr
l
T
 
where 1111 )( −−−−− −= VXXVXXVVP TT . The 
NR method uses the observed information 
matrix and the gradient vector in calculating 
new estimates of parameters kθˆ  at iteration 
round k : 
(16) ( ) ( ),ˆˆˆˆ 1111 −−−− −= kkkk θJθHθθ  
where information matrix and gradient 
vector are computed with current VC esti-
mates 1ˆ −kθ . 
A variant of the NR method, named Fisher 
scoring, replaces the observed information 
matrix with the expected information ma-
trix: 
(17) ( )( ) .21 


∂
∂
∂
∂
−= θ
VPθ
VPθH trE  
Another Newton-type method is the aver-
age information (AI) method (Johnson and 
Thompson (1995) and Gilmour et al. 
(1995)), which utilizes the average of the 
observed and expected information matri-
ces: 
(18) 
( ) ( ) ( )( )( )
.
2
1
Pyθ
VPθ
VPy
θHθHθAI
∂
∂
∂
∂
=
+=
T
E
 
The AI method is attractive because compu-
tation of the average of the observed and 
expected information matrices is easier than 
either of the components. It removes the 
complicated trace calculation and the AI 
matrix can be computed by solving the 
MME once for each VC parameter with 
data replaced by a suitable working vector 
(Jensen et al., 1997). AI REML is currently 
the most common VC estimation method 
used in animal breeding. Newton-type 
methods reach fast convergence in the 
neighborhood of the maximum. Further-
more, the observed information matrix, as 
well as its good approximations, the ex-
pected information matrix and the AI ma-
trix, can be used to calculate standard errors 
for the parameters. 
Approximation of second derivatives may 
also be based on the direction of the most 
recent estimation step (e.g. in Nocedal and 
MTT SCIENCE 30  17 
Wright, 2006). These quasi-Newton meth-
ods usually result in faster convergence 
compared to linear methods but slower 
convergence compared to Newton-type 
methods because the information matrix is 
replaced by an approximation. Broyden’s 
method (BM) is a quasi-Newton method 
for numerical solution of nonlinear equa-
tions (Broyden, 1965). BM updates the 
inverse of the information matrix (instead of 
the information matrix itself) within each 
round: 
(19) 
( ) ( )
( )
( ) ( )( ) ( )
( ) ( ) ,1
11
1
1
θJθBθ
θBθθJθBθ
θB
θBθH
ΔΔ
ΔΔ−Δ
+
=
≈
−
−−
−
−
kT
kTk
k
k
 
where 21 ˆˆ −− −=Δ kk θθθ  and ( ) ( )−=Δ −1ˆ kθJθJ( )2ˆ −kθJ . Instead of true gradients, round-to-
round changes in the EM estimates can be 
used for both the update of the inverse in-
formation matrix and the update of new 
estimates: ( ) ( ) ( −−−≈Δ −−− 211 ˆˆˆ kkEMk θθθθJ )2ˆ −kEMθ . 
Apart from scaling, these changes are rela-
tive to the original gradients (Jamshidian 
and Jennrich, 1993). BM leads to superlin-
ear convergence although sequence ( ) }{ kθB  
does not converge to the observed informa-
tion matrix at the maximum (Dennis and 
Moré, 1974). A disadvantage is that BM 
does not give estimates of the standard er-
rors for the parameters. 
2.4 REML utilizing MC 
The computational problem in VC estima-
tion for animal breeding is not necessarily 
the complexity of the model, but rather the 
need to analyze large-scale data sets to ob-
tain sufficiently accurate genetic parameter 
estimates. Both EM and Newton-type 
methods use the first derivative of the likeli-
hood in the maximization. Calculation of 
the first derivative requires the elements of 
the inverse of the coefficient matrix of 
MME. When the data are large and the 
model has many random effects, it is often 
impossible to calculate the exact inverse of 
the coefficient matrix of MME using direct 
methods. Instead, the inverse can be esti-
mated by MC sampling methods. The MC 
sampling scheme presented by García-
Cortés et al. (1992) is used both in the EM 
and the Newton-type REML methods in 
the following. 
2.4.1 MC EM REML 
The idea in García-Cortés et al. (1992) is to 
estimate the elements of the inverse coeffi-
cient matrix C  by generating samples from 
the same distribution as the original data: 
hhh euZy ~~~ += , sh ,,1= . Here hu~  denotes 
the hth sample drawn, such that ( )G0u ,~~ Nh
. After solving the MME for huˆ  using the 
sampled data as observations, we can calcu-
late average PEV over the samples. As a 
matter of fact, a variety of methods can be 
used to calculate the PEV (Hickey et al., 
2009). Two of the methods in Hickey et al. 
(2009) seem to be useful in many models 
and are here called methods 1 and 2 by 
García-Cortés et al. (1995). In linear mixed 
effects model theory, the true variance of 
random effects is the variance of predicted 
values plus the variance of prediction errors. 
On this basis, method 1 by García-Cortés et 
al. (1995) defines that approximation of the 
trace part for the genetic random effect in 
equation (12) can be obtained via 
(20) ( ) ( )hThuh qSS
u
uAu ˆˆ 122 −−= σσ  
and similarly approximation of the trace 
part for the residual effect in equation (13) 
can be obtained via 
18  MTT SCIENCE 30 
(21) ( ) ( ).ˆˆ22 hTheh nSS
e
ee−= σ
σ
 
In method 2 by García-Cortés et al. (1995) 
the trace of the inverse of the coefficient 
matrix is estimated based on the actual pre-
diction errors of the simulated effects and 
estimated effects solved from the model 
given the simulated data:  
(22) ( ) ( )hhThhh
u
SS uuAuu ˆ~ˆ~ 12 −−= −σ  
and  
(23) ( )( ) ( )( ) .ˆ~~ˆ~~2 hhhThhhh
e
SS yyeyye −−−−=
σ
 
Now the new MC EM REML estimates are 
(24) 


+= 
=
−
s
h
h
s
T
qu u
SS1
1112
2ˆˆˆ σσ uAu  
and  
(25) .ˆˆˆ 1
112
2 


+= 
=
s
h
h
s
T
ne e
SS
σ
σ ee  
The larger the s , the better is the approxi-
mation of the trace. 
2.4.2 MC Newton-type REML 
The simple sampling method to approxi-
mate PEV presented in García-Cortés et al. 
(1992) can be utilized in Newton-type 
methods as well. Recall that the Newton-
type methods are based on first and either 
exact or approximated second derivatives of 
the log likelihood function. The first deriva-
tives of the log likelihood function contain 
the same trace elements as the EM algo-
rithm and the same sampling based ap-
proximation can hence be used: 
(26) 
( )








+
−−
=
∂
∂

=
−
4
1
11
2
2
2ˆˆ
2
1
|
u
s
h
h
s
T
u
u
u
SSq
l
σσ
σ
σ
uAu
yθ
 
and  
(27) 
( )
.
ˆˆ
2
1
|
4
1
1
2
2
2








+
−−
=
∂
∂

=
e
s
h
h
s
T
e
e
e
SSn
l
σσ
σ
σ
ee
yθ
 
By definition, the expected information 
matrix at convergence is ( )( ) ( ) ( )( )TEE θJθJθH =
. Use of the MC method with independent 
and identically distributed samples enables 
the approximation of the information ma-
trix by the variances of the gradients over 
the samples within each NR REML round. 
However, method 1 using equations (20) 
and (21) for estimation of trace needs to be 
used in computation of the sampling vari-
ance of the gradients, because method 2 
using equations (22) and (23) only gives the 
variances of PEV. In particular, the infor-
mation matrix ( )θH  for the single trait 
model in (1) is approximated by the follow-
ing elements: 
(28) 
( )
( ) ( )








==




=
∂
∂
−
shVar
shlVar
u
hTh
u
h
,,1;
ˆˆ
,,1;|
4
1
2
1
2


σ
σ
uAu
yθ
 
(29) 
( )
( ) ( )








==




=
∂
∂
shVar
shlVar
e
hTh
e
h
,,1;
ˆˆ
,,1;|
4
2
1
2


σ
σ
ee
yθ
 
and  
MTT SCIENCE 30  19 
(30) 
( ) ( )
( ) ( )
( ) ( ) .,,1;ˆˆ
,
ˆˆ
,,1;|,|
4
2
1
4
1
2
1
22




=




=




=
∂
∂
∂
∂
−
sh
Cov
shllCov
e
hTh
u
hTh
e
h
u
h


σ
σ
σσ
ee
uAu
yθyθ
 
Interestingly, MC sampling is needed only 
for the estimation of first derivatives in AI 
and BM methods. Both the AI matrix in 
MC AI REML and the approximated in-
verse of information matrix in MC BM 
REML can be calculated using the current 
VC estimates. 
2.4.3 Convergence criterion 
Definition of the convergence criterion is 
not straightforward in methods based on 
MC sampling because of the sampling 
variation. For example, a criterion like the 
relative round-to-round change in consecu-
tive VC estimates becomes unreliable. Re-
ducing the MC noise by increasing the 
number of MC samples is impractical be-
cause that makes the estimation computa-
tionally inefficient (Booth and Hobert, 
1999). To reduce the effect of sampling 
variation, a linear regression on the latest 
rounds of estimates may be fitted. An idea 
of this approach is predicting VC estimates 
by using linear regression on estimates of 
previous REML rounds and replacing the 
change in consecutive VC estimates by the 
changes in linear predictions. The greater 
the number of rounds used in a linear re-
gression, the smaller is the effect of MC 
noise on the convergence criterion. For the 
Newton-type methods, convergence may be 
checked by a similar method. Hence the 
approach is the same but the prediction 
method is applied to the gradients instead of 
the estimates. 
2.5 Testing of methods 
Two simulated data sets and one field data 
were used to illustrate the linearization and 
the MC REML methods described here. 
The first simulated data contained growth 
curve data and was used to illustrate the 
linearization method, whereas the other 
simulated data and the field data were used 
to compare the VC estimation by different 
REML methods. The outlines of the data 
sets and models are described below and 
displayed in Tables 1 and 2, respectively. 
Lastly, the programs used in the analyses are 
described. 
 
 
 
 
 
 
 
 
 
 
 
 
 
20  MTT SCIENCE 30 
Table 1. Structure of the data sets used in the studies I, II and III. 
Response No. of observations No. of 
ao1 
No. of 
ap2 
No. of 
eq3 
To study performance of 
Simulated growth 
data  
Live weight 144,000 4,800 210 15,036 Linearization for full data (I) 
Live weight 54,405 4,800 210 15,036 Linearization for truncated data(I) 
Live weight 58,799 4,800 210 15,036 Linearization for partly truncated 
data (I) 
Simulated  
305-day data 
Milk yield 3,000 
Fat yield 3,000 
3,000 3,150 6,400 MC EM REML and new conver-
gence criterion (II) 
Milk yield 569
Fat yield 569 
569 715 1,450 MC Newton-type methods and 
new convergence criterion (III) 
Field TD data Milk yield 51,004
Fat yield 25,316 
Protein yield 25,316 
5,399 10,822 160,221 MC EM REML and new conver-
gence criterion (II) 
1animals with observations 
2animals in pedigree 
3equations in MME 
 
 
 
 
Table 2. Description of models used in the studies I, II and III. 
Response Fixed effects Random effects No. of VC Study 
Simulated live weights Sex Additive genetic sire + 
Non-genetic animal  
13 Applicability of lineariza-
tion method (I) 
Simulated 305-day milk 
and fat yield 
Herd Additive genetic animal 6 Performance of MC 
based methods and new 
convergence criterion (II 
and III) 
Field TD data for milk, 
protein and fat 
Herd and TD interac-
tion + lactation curve
Additive genetic animal 
lactation curve + Non-
genetic animal lactation 
curve 
96 Feasibility of MC EM 
REML and new conver-
gence criterion for large 
data sets (II) 
 
 
 
2.5.1 Data sets and analyses 
To study the effectiveness of the lineariza-
tion method, the first simulated data con-
tained live weights based on the nonlinear 
Gompertz function (I). The simulated data 
included growth records from 4800 tested 
animals and the pedigree included 210 sires 
(Table 1). The statistical model included 
MTT SCIENCE 30  21 
one fixed effect, random additive genetic 
sire effects, random non-genetic animal 
effects other than the sire and random re-
siduals. Hence, the total number of variance 
components to be estimated was 13 (Table 
2). The method was examined through the 
analysis of full data and two different sub-
sets of the data. The general performance of 
the linearization method was tested with the 
full data. The second data was a truncated 
time trajectory set, and it was used to test 
the performance of the method for incom-
plete data (Table 1). In practice incomplete 
data are common e.g. in pig production, 
where adult weights are unavailable due to 
early slaughter ages. The third data com-
bined the first two data sets by having some 
animals with complete data and most ani-
mals with incomplete data (Table 1). The 
VC estimates were only solved by analytical 
EM REML because of its stability. The 
results are based on 50 simulation replicates. 
The performance was measured by relative 
bias and relative standard deviation, as per-
centage from the true value, for the VC 
estimates. 
The second simulated data was used to 
compare different MC REML and analyti-
cal REML methods (II and III). Because 
large data sets appear mostly in dairy cattle, 
the data structure resembled the dairy cattle 
observations for 305-day milk and fat 
yields. For the EM REML comparisons, 
records were simulated for 3000 cows with 
a pedigree comprising 3150 animals (Table 
1). A subset of this data, with 569 animals 
with records and a total of 715 animals in 
the pedigree, was used in comparisons of 
the Newton-type REML methods (Table 
1). Phenotypic records were simulated by a 
bivariate linear model with fixed herd ef-
fects, random additive genetic animal effects 
and random residuals. The total number of 
estimated VC was 6 (Table 2). MC EM 
REML, as well as each MC Newton-type 
REML method, was tested with 20, 100 
and 1000 MC samples per REML iteration 
round. 
The computing efficiency of MC EM 
REML was illustrated by a multiple-trait 
random regression TD model applied to 
true Finnish Ayrshire TD data (II). A subset 
from a large data set was also taken to en-
able calculation of analytical EM REML 
estimates. The data comprised 5399 ani-
mals with records and 10822 animals in the 
pedigree. There were 51004 TD records for 
milk yield, of which approximately half 
were associated with observations for pro-
tein and fat yield (Table 1). The multiple-
trait model consists of a vector of fixed herd 
and TD interactions, a vector of fixed lacta-
tion curve regression effects, a vector of 
random non-genetic regression effects, a 
vector of random additive genetic regression 
effects, and a vector of random residuals. 
The total number of estimated VC for this 
model was 96 (Table 2). Three alternatives 
for the number of MC samples within an 
MC EM REML round (20, 5 and 1) were 
tested. 
2.5.2 Software 
The general computing software MiX99 
(Lidauer et al., 2011) and DMU (Madsen 
and Jensen, 2012) were used for solving 
large scale MME and VC estimates when 
possible. New methods required some 
modifications however. Implementation of 
the linearization procedure required the 
Gompertz function formulas to be included 
in MiX99. Only MME for linearized model 
was solved by MiX99. The pseudo data and 
22  MTT SCIENCE 30 
working variables were written to an exter-
nal file and read by DMU for the estima-
tion of VC. Thus, there was no need to 
make changes to DMU. Also MC EM 
REML was implemented in MiX99. New 
modules needed in the software were data 
generation and calculation of quadratic 
forms. The available routines to solve MME 
were used as they had been implemented in 
MiX99. However, all three different MC 
based Newton-type REML methods ap-
peared to be too complicated to be included 
in MiX99 for this study. These methods 
were implemented by R software (R Core 
Team, 2014) instead. Consequently, the 
analyzed data had to be of moderate size. 
 
MTT SCIENCE 30  23 
3 Main results
The results from the linearization study are 
presented first (I). Then, the efficiency of 
the MC method in EM REML and appli-
cability of the MC method in Newton-type 
methods are explored (II and III). Finally, 
assessment of convergence is considered (II 
and III). 
3.1 Linearization 
The performance of the linearization 
method was investigated based on analysis 
of 50 simulation replicates of pig growth. 
The summary of the results in Table 1 and 
Table 2 in Article I are given here. The full 
data with observations from the entire 
growth period worked satisfactorily. The 
least accurate estimates were obtained for 
the covariance components of genetic ef-
fects (the relative bias and the relative stan-
dard deviation were on average 12.5% and 
61.8% respectively). Truncated time trajec-
tory increased the relative bias and standard 
deviation of the VC estimates for all effects. 
Analyses of the truncated data set were most 
unstable for adult weight parameters (ߙ) 
but they were also unstable for parameters 
related to the exponential decay of the ini-
tial growth rate (ߢ). However, having even a 
small proportion of animals with weights 
from birth to adult weight improved the 
results when compared to the estimates 
produced from completely truncated data. 
An improvement was especially seen in the 
estimates of genetic covariance components 
(relative bias and standard deviation from 
on average 33.4% and 118.7% to 6.6% and 
74.0% respectively), which were the most 
unstable parameters in the truncated time 
trajectory data. 
3.2 MC EM REML 
VC estimates for linear mixed effects mod-
els by MC EM REML and analytical EM 
REML were monitored along the iteration 
process (Figure 1 and Figure 3 in Article II). 
When the number of MC samples was 
large, the estimates by MC EM REML 
followed the corresponding analytical esti-
mates well. Variation around the analytical 
estimates was larger when the number of 
MC samples was reduced. Although devia-
tions from the analytical EM REML esti-
mates can be large with one MC sample, on 
average VC estimates by MC EM REML 
did stay at the same level as with more sam-
ples. Furthermore, heritability as well as 
genetic and phenotypic correlations corre-
sponded well with the analytical estimates 
despite the differences in the VC parameter 
estimates (Table 2 in Article II). For the 
field data set, MC EM REML proved to be 
far superior to analytical EM REML both in 
computing time and memory need. Calcu-
lation of 1000 EM REML rounds using the 
analytical EM REML method needed 56 
days while the MC EM REML method 
required 65, 20 and 7 hours with 20, 5 and 
1 MC samples per MC EM REML round, 
respectively. Although the data used for the 
TD model were small, the large number of 
VC led to time-consuming analysis because 
of the slow convergence of the EM algo-
rithm. 
 
24  MTT SCIENCE 30 
3.3 MC Newton-type REML 
The MC method in different Newton-type 
methods for estimation of VC was found to 
be feasible. The approximated information 
matrix for MC NR REML was based on 
variance over MC samples within each 
REML round. Hence, the sampling varia-
tion depended on the number of MC sam-
ples per REML round. In contrast, the 
approximated information matrices in MC 
AI REML and MC BM REML had sam-
pling variation only through the current 
estimates and gradients used to update the 
information matrix. Compared to MC EM 
REML, convergence of second derivative 
methods was fast but sampling variation of 
estimates along the iteration process was 
large for all Newton-type MC methods 
(Figure 1 in Article III). The number of 
MC samples needed was found to be de-
pendent on the Newton-type method used. 
Based on the analyses with the number of 
MC samples 20, 100 and 1000, the mini-
mum number of MC samples to obtain 
sufficiently accurate information matrix 
estimates for the small simulated data was 
100, 20 and 1000 for MC NR REML, MC 
AI REML and MC BM REML, respec-
tively. 
3.4 Assessment of 
convergence 
A convergence criterion that takes into ac-
count the sampling variation was developed. 
First the new criterion was tested with the 
analytical EM REML. The values based on 
the new convergence criterion stayed above 
those of the traditional criterion measuring 
round to round change (Figure 2 and Fig-
ure 4 in Article II). When the linear predic-
tion convergence criterion was applied to 
the MC EM REML estimates calculated 
from large number of samples, the criterion 
followed the corresponding values based on 
the analytical EM REML estimates. A de-
crease in the number of MC samples in-
creased the fluctuation in the convergence 
criterion values (Figure 2 and Figure 4 in 
Article II). The larger MC variation in the 
Newton-type methods resulted in extra 
challenges in defining the convergence. 
However, a convergence criterion similar to 
MC EM REML was also found to be feasi-
ble for MC AI REML estimates (III). 
 
MTT SCIENCE 30  25 
4 Discussion
The aim of this study was to find an effi-
cient method for estimation of VC of 
nonlinear and large linear mixed effects 
models. In this chapter we discuss our find-
ings concerning linearization, MC EM 
REML and MC Newton-type REML. Fi-
nally, recommendations for future studies 
are considered. 
4.1 Linearization 
An advantage of the method presented by 
Wolfinger and Lin (1997) is its generality. 
Their linearization approach can be used for 
different types of models because it was 
developed for general nonlinear mixed ef-
fects models. For example, generalization of 
the simple Gompertz model used in this 
study on multiple effects and traits is 
straightforward. As also discussed by Pin-
heiro and Bates (1995), the main advantage 
of the linearization is a possibility to use 
efficient programs that maximize the re-
stricted maximum likelihood for the linear 
mixed effects models, even though the two-
step iterative procedure with each step itself 
being iterative can be regarded as computa-
tionally intensive. 
Linearization enables the use of linear mixed 
effects model procedures if the approxima-
tion is valid. In order to arrive at linear 
mixed model equations, the dependency of 
the VC on the fixed effects through the 
working incidence matrices has to be ig-
nored. Both Lindstrom and Bates (1991) 
and Wolfinger and Lin (1997) justified this 
by appealing to intrinsic nonlinearity in-
stead of the nonlinearity of the parameters. 
Either way, asymptotic correlations between 
the estimators of the VC and the fixed ef-
fects were not found in several data sets 
analyzed by Pinheiro and Bates (1995). The 
full data set in this simulation study shows 
that linearization also works moderately well 
for the Gompertz function. Furthermore, 
the method was successfully used for real 
pig growth data (Koivula et al., 2008). 
However, statistical properties of estimation 
methods for linear models do not transmit 
to nonlinear models due to the approxima-
tion. For example, in the simulation study 
of Pinheiro and Bates (1995), the bias cor-
rection ability of REML depended on the 
nonlinear model used. 
However, the success of the approximation 
method to analyze the nonlinear model 
greatly depends on the amount and nature 
of available information (Vonesh, 1996; 
Wolfinger and Lin, 1997). As expected, the 
truncated data analysis in this study showed 
that if observations are missing from the 
tails of growth curves of all the animals, 
uncertainty increases and the estimation 
method can become distorted. This distor-
tion diminished considerably when at least 
some of the animals had observations up to 
or close to their mature weights. Close prox-
imity of slaughter time and the inflection 
point is common in pig field data. Due to 
selection of tested pigs for breeding, there 
may be pigs that have observations up to 
their adult weights. Our simulation study 
showed that even a small fraction of fully 
observed animals improved the results when 
compared to the estimates produced from 
26  MTT SCIENCE 30 
completely truncated data. General recom-
mendations about the proportion of ani-
mals with full data were not made, because 
the results may be influenced by the popula-
tion structure.  
Convergence of the linearization method 
has been reported to be difficult by Meza et 
al. (2007) among others. First of all, con-
vergence depends on the starting values for 
the fixed effects. Also in our study, good 
starting values were found to be important. 
General and simple equations to provide 
good starting values may be difficult to 
define. One choice could be to fit a simpler 
model without random effects first (Lind-
strom and Bates, 1990). To improve con-
vergence, one possibility is a different 
parameterization of the Gompertz model. 
For example, a multiplicative model using 
log-transformed data can be analyzed 
(Vuori et al., 2006; Koivula et al., 2008). 
This takes into account the common nature 
of the residuals in real growth data, but also 
removes the dependence of the derivative of 
the adult weight parameter on the others.  
Instead of the approximation of the likeli-
hood function by linearization, Walker 
(1996) presented MC EM for nonlinear 
random models. Studies about exact maxi-
mization methods for ML and REML in-
volving numerical integration techniques or 
MC methods have continued (e.g. stochas-
tic approximation EM (SAEM) in Kuhn 
and Lavielle (2005) and REML SAEM in 
Meza et al. (2007)). A comparison of the 
linearization procedure with the SAEM 
method in Kuhn and Lavielle (2005) re-
vealed that SAEM was better in terms of 
robustness to starting values for VC and the 
accuracy of the estimates. Similar to linear 
mixed effects models, REML SAEM im-
proved the accuracy of the VC estimates 
even more, especially in unbalanced cases 
(Meza et al., 2007). In addition, Meza et al. 
(2007) found REML SAEM to be very 
stable. The main disadvantage of the lin-
earization methods in their study was non-
convergence for large number of data sets 
while REML SAEM converged in all cases. 
The EM algorithms may be very slow to 
converge for some problems. For example 
when variance parameters are small or the 
dimensions of random effects are huge. The 
parameter expansion version of EM (PX-
EM) accelerates EM while maintaining its 
stability (Liu et al., 1998; Foulley and van 
Dyk, 2000). Also MC based PX-EM has 
been presented for nonlinear mixed effects 
models. Wu (2004) recommended it for 
both exact and approximate inferences, 
although the benefit depends on the model 
used and the analyzed data. In Lavielle and 
Meza (2007), a PX version of SAEM (PX-
SAEM) for nonlinear model converged in 
all runs, while plain SAEM failed to con-
verge in 20 out of 100 runs. The PX-SAEM 
method improved the convergence particu-
larly during the first iterations when the 
parameters were highly correlated. This 
would be the case also in the growth curve 
analysis between the three parameters of the 
Gompertz function.  
The good properties found for MC based 
methods in analysis of nonlinear mixed 
effects models make the approach interest-
ing. However, MC EM for nonlinear model 
is somewhat different than MC EM REML 
studied in this thesis. The main difference is 
that there was no need to sample fixed ef-
fects in the VC estimation of linear mixed 
effects models, while it is needed for estima-
tion of nonlinear models. Because of the 
MTT SCIENCE 30  27 
more complicated simulation step, this is 
likely to lead to the use of MCMC instead 
of the MC method (Kuhn and Lavielle, 
2005). 
4.2 MC EM REML 
In general, MC EM REML converged to 
the same solutions as analytical EM REML, 
and a small number of MC samples did not 
introduce systematic bias to the estimates of 
genetic parameters. However, the required 
number of MC samples to obtain estimates 
with an acceptably low MC error depended 
on the size of the data. For the field data, 
MC EM REML gave reliable estimates with 
five MC samples per round and reasonable 
results with just one MC sample within a 
round. Thus, analysis of the field data dem-
onstrated the potential of MC EM REML 
for large and complex models. VC estimates 
were obtained with relatively fast comput-
ing times compared to analytical EM 
REML and also with low memory require-
ments corresponding to those needed for 
breeding value estimation. 
Recommendations for increasing the num-
ber of MC samples presented in the MC 
EM literature tend to suggest numbers that 
may become too high for our purposes in 
the context of large data sets and many VC 
(e.g. in Booth and Hobert (1999), Ripatti et 
al. (2002) and Levine and Fan (2004)). 
Delyon et al. (1999) used SAEM method 
which eliminates the need of increasing the 
number of MC samples as it accounts for 
estimates from previous REML rounds. The 
method requires specifying a smoothing 
parameter that however is not simple to 
choose. The commonly used recommenda-
tion (e.g. Jaffrézic et al., 2006) leads to fast 
reduction of MC noise. Therefore the VC 
estimates need to be close to the final values 
when the SAEM method is started – a situa-
tion that may not necessarily be viable with 
only one MC sample within an MC EM 
REML round.  
In addition to the number of MC samples, 
the time required to solve MME is the most 
critical point affecting the computing time 
by the MC EM REML method. An effi-
cient solving of the MME is vitally impor-
tant for the presented MC EM REML 
method, because it requires solutions for the 
location parameters from the data and from 
a number of simulated samples of data 
within each MC EM REML round. We 
used preconditioned conjugate gradient 
iteration where a block diagonal precondi-
tioner matrix was used to approximate the 
MME (Strandén and Lidauer, 1999). How-
ever, the time needed in simulation of the 
samples and computing quadratics proved 
to be negligible. 
Typically, in MCMC analyses, it is recom-
mended to check the convergence by using 
multiple chains and plotting the parameter 
estimates along the iteration process. In 
studies of the MC EM REML method, an 
automated procedure has been suggested to 
avoid postprocessing of the chain of esti-
mates (e.g. in Booth and Hobert (1999) 
and Ripatti et al. (2002)). Also in this study 
a stopping rule that can be calculated dur-
ing the analysis was introduced. Basically 
this alternative convergence criterion for 
MC EM REML was based on the relative 
differences in the linear regression coeffi-
cients from zero. The applied convergence 
criterion monitored the progress of conver-
gence and was only little influenced by MC 
noise. The criterion also worked reasonably 
well with a small number of MC samples – 
a property that may be useful when analyz-
28  MTT SCIENCE 30 
ing complex models with many VC to be 
estimated. Applying the same value for the 
MC EM REML analysis as for the analyti-
cal EM REML analysis unnecessarily in-
creased the number of MC EM REML 
rounds, although it protects against uncer-
tainty due to sampling variation. 
4.3 MC Newton-type REML 
The use of MC Newton-type methods 
instead of the MC EM algorithm in REML 
speeded up the convergence, i.e., led to a 
lower number of iterations until conver-
gence. However, the sampling variation of 
the estimates increased compared to the 
MC EM REML analysis. This is due to 
multiplication of the gradient by the inverse 
of the information matrix. Due to different 
styles for forming an approximation of the 
information matrix, sampling variation 
differed between the Newton-type methods. 
The MC NR REML method is easy to 
implement, but may require a large number 
of MC samples to produce sufficiently accu-
rate approximations of the variances of the 
first derivatives over samples. MC AI 
REML, in contrast, works better even with 
small number of MC samples, because the 
AI matrix has no extra sampling noise as it 
depends only on variance parameters esti-
mated in the previous round. With the 
same number of MC samples, iterations in 
MC AI REML are computationally more 
demanding than in MC NR REML because 
the MME system needs to be solved at each 
MC AI REML round as many times as 
there are VC parameters to be estimated. 
An advantage of the MC NR and the MC 
AI REML methods is the possibility to 
obtain easily standard errors of estimates. 
MC BM REML is computationally the 
least expensive of the considered methods 
when the number of REML rounds and the 
number of MC samples are kept the same. 
To circumvent evaluation of the informa-
tion matrix, BM REML corrects the ap-
proximation of the inverse of information 
matrix from round to round based on the 
gradients. Analytical BM REML worked 
reasonably well with the small data set in 
our study, whereas the MC BM REML 
method required a large number of MC 
samples. This indicates the sensitivity of the 
MC BM REML method to changes in 
gradients from round to round. Further-
more, extra computations are needed for 
standard errors after convergence has been 
reached. 
If each round of iteration in the studied 
Newton-type methods requires many more 
samples than in MC EM REML, the overall 
solving time will only be reduced in cases 
where the Newton method can enhance the 
convergence dramatically. Obtaining a fast 
algorithm for MC based REML estimation 
requires the development of a practical 
convergence criterion for the Newton-type 
methods. Although convergence is the same 
regardless of the number of MC samples, 
MC variation affects the values of the con-
vergence criteria. The criterion presented by 
Booth and Hobert (1999) is based on a 
change in consecutive parameter estimates 
relative to their standard errors whereas the 
criterion by Kuk and Cheng (1997) relies 
on the gradient vector and its standard er-
rors. Both of these may need an enormous 
number of MC samples to work properly. 
Our convergence criterion for the MC 
Newton-type methods is based on relative 
differences in the linear predictions of gra-
MTT SCIENCE 30  29 
dients. This seemed to be a suitable and 
simple approach for large-scale genetic 
evaluations.  
Kuk and Cheng (1997) and Gauderman 
and Navidi (2001) presented an MC NR 
method which was based on the use of MC 
approximation for the observed information 
matrix used by Wei and Tanner (1990). 
However, the estimator of the information 
matrix was often not positive definite. In 
both studies, modified estimator of the 
observed information matrix was presented 
to guarantee positive definiteness. The new 
estimator in Gauderman and Navidi (2001) 
approximates the expected information 
matrix, like the one presented here. It re-
quires only the first derivatives of the log-
likelihood. However, they made no com-
parisons with the MC EM algorithm. In-
stead in an example by Kuk and Cheng 
(1997), their improved MC NR method 
was indeed found to be faster than the MC 
EM algorithm. One interesting approxima-
tion of the information matrix is called the 
empirical Fisher information matrix 
(Meilijson, 1989; Scott, 2002), where the 
idea is to calculate the covariance over indi-
viduals’ gradients instead of the sampled 
gradients. 
4.4 Recommendations for 
further research 
Simulation study about the linearization 
method was done only using the Gompertz 
function with a model, data structure and 
parameters mimicking a pig breeding data 
set. For other studies, like Gompertz growth 
curve in chicken, or perhaps for Brody’s 
function, the linearization method may 
work differently. In all cases the two-step 
iterative algorithm studied here would need 
careful examination with regard to the ini-
tial values and convergence criteria. 
For linear mixed effects models, the MC 
based EM algorithm is superior only for 
very large data sets. Therefore, number of 
MC samples can be chosen to be small, e.g. 
1-3. I would suggest further study to focus 
on detection of convergence instead of in-
creasing the sample size along the iteration. 
In practice, it would be computationally 
efficient to use SAEM or another averaging 
method after some (perhaps loose) conver-
gence is met. Suitable critical values for 
convergence criterion should be studied 
with different kinds of models and data sets 
both for MC EM and MC Newton-type 
REML methods. 
Our results show that the use of MC meth-
ods in different Newton-type methods for 
VC estimation is feasible, although there 
was variation in efficiency between the im-
plementations. However, analysis of our 
small simulated data implies that the num-
ber of MC samples needed for accurate 
estimation depends on the method used. 
This work encourages testing the perform-
ance of the presented methods in analysis of 
large-scale problems. As models grow larger 
and more complex, the efficiency of differ-
ent MC Newton-type methods becomes 
more difficult to predict. Further experience 
is especially needed on the behavior of MC 
BM REML in VC estimation of complex 
models.  
The estimates of the MC Newton-type 
REML analyses presented here were re-
gressed towards corresponding EM REML 
estimates whenever they were outside the 
parameter space. Yet, this does not guaran-
tee convergence to the true solutions, espe-
cially with respect to the MC BM REML 
30  MTT SCIENCE 30 
method. One way to increase the robustness 
of estimation methods is reparameterization 
of the VC matrices by Cholesky decomposi-
tion (Groeneveld, 1994). For some reason 
however, Cholesky decomposition of the 
VC matrix did not work in Wolfinger and 
Lin (1997) for nonlinear mixed effects 
model. Anyway the performance of this 
option is worth considering in future VC 
estimation studies. 
In conclusion, all the methods studied gave 
positive results with respect to efficient VC 
estimation for large scale linear and nonlin-
ear mixed effects models. A recommenda-
tion for further studies about MC EM 
REML for linear mixed effects models is 
related to the assessment of convergence. 
For the MC Newton-type methods and the 
linearization above all more experience 
would be needed with different kinds of 
models and data sets. Therefore modifica-
tions in efficient software are needed in 
order to use MC based Newton-type meth-
ods. The literature considered here presents 
many proposals for stability, like reparame-
terization and PX-EM, which could be 
considered in the future. 
 
 
  
MTT SCIENCE 30  31 
References
Blasco A., Piles M. and Varona L. 2003. A 
Bayesian analysis of the effect of selection 
for growth curves in rabbits. Genetics Se-
lection Evolution, 35, 21–41. 
Booth J.G. and Hobert J.P. 1999. Maximizing 
generalized linear mixed model likelihoods 
with an automated Monte Carlo EM algo-
rithm. Journal of the Royal Statistical So-
ciety: Series B, 61, 265–285. 
Broyden C.G. 1965. A class of methods for 
solving nonlinear simultaneous equations. 
Mathematics of Computation, 19, 577–
593. 
Davidian M. and Giltinan D.M. 1995. Nonline-
ar Models for Repeated Measurement Da-
ta. Chapman & Hall, London.  
Delyon B., Lavielle M. and Moulines E. 1999. 
Convergence of a stochastic approxima-
tion version of the EM algorithm. The An-
nals of Statistics, 27, 94–128. 
Dempster A.P., Laird N.M. and Rubin D.B. 
1977. Maximum likelihood from incom-
plete data via the EM algorithm. Journal of 
the Royal Statistical Society: Series B, 39, 
1–38. 
Dennis J.E. and Moré J.J. 1974. A characteri-
zation of superlinear convergence and its 
application to quasi-Newton methods. 
Mathematics of Computation, 28, 549–
560. 
Foulley J.-L. and van Dyk D.A. 2000. The PX-
EM algorithm for fast stable fitting of Hen-
derson’s mixed model. Genetics Selection 
Evolution, 32, 143–163. 
Gauderman W.J. and Navidi W. 2001. A Mon-
te Carlo Newton-Raphson procedure for 
maximizing complex likelihoods on pedi-
gree data. Computational Statistics & Data 
Analysis, 35, 395–415. 
García-Cortés L.A. and Sorensen D. 2001. 
Alternative implementations of Monte Car-
lo EM algorithms for likelihood inference. 
Genetics Selection Evolution, 33, 443–
452.  
García-Cortés L.A., Moreno C., Varona L. and 
Altarriba J. 1992. Variance component es-
timation by resampling. Journal of Animal 
Breeding and Genetics, 109, 358–363. 
García-Cortés L.A., Moreno C., Varona L. and 
Altarriba J. 1995. Estimation of prediction-
error variances by resampling. Journal of 
Animal Breeding and Genetics, 112, 176–
182. 
Gilmour A.R., Thompson R. and Cullis B.R. 
1995. Average information REML: An effi-
cient algorithm for variance parameter es-
timation in linear mixed models. Biomet-
rics, 51, 1440–1450. 
Groeneveld E. 1994. A reparamete-rization to 
improve numerical optimization in multi-
variate REML (co)variance component es-
timation. Genetics Selection Evolution, 26, 
537–545. 
Guo S.W. and Thompson E.A. 1991. Monte 
Carlo estimation of variance component 
models for large complex pedigrees. IMA 
Journal of Mathematics Applied in Medi-
cine and Biology, 8, 171–189. 
Harville D.A. 1977. Maximum likelihood ap-
proaches to variance component estima-
tion and to related problems. Journal of 
the American Statistical Association, 72, 
320–338. 
Harville D.A. 2004. Making REML computa-
tionally feasible for large data sets: Use of 
Gibbs sampler. Journal of Statistical 
Computation and Simulation, 74, 135–
153. 
Henderson C.R. 1953. Estimation of variance 
and covariance components, Biometrics, 
9, 226–252. 
Henderson C.R. 1984. Estimation of vari-
ances and covariances under multiple trait 
model. Journal of Dairy Science, 67, 
1581–1589. 
Hickey J.M., Veerkamp R.F., Calus M.P.L., 
Mulder H.A. and Thompson R. 2009. Es-
timation of prediction error variances via 
32  MTT SCIENCE 30 
Monte Carlo sampling methods using dif-
ferent formulations of the prediction error 
variances. Genetics Selection Evolution, 
41, 23. 
Hofer A. 1998. Variance component estima-
tion in animal breeding: a review. Journal 
of Animal Breeding and Genetics, 115, 
247–265.  
Jaffrézic F., Meza C., Lavielle M. and Foulley 
J.-L. 2006. Genetic analysis of growth 
curves using the SAEM algorithm. Genet-
ics Selection Evolution, 38, 583–600. 
Jamshidian M. and Jennrich R.I. 1993. Con-
jugate gradient acceleration of the EM al-
gorithm. Journal of the American Statisti-
cal Association, 88, 221–228. 
Jensen J., Mäntysaari E.A., Madsen P. and 
Thompson R. 1997. Residual maximum 
likelihood estimation of (co)variance com-
ponents in multivariate mixed linear mod-
els using average information. Journal of 
the Indian Society of Agricultural Statis-
tics, 49, 215–236. 
Johnson D.L. and Thompson R. 1995. Re-
stricted maximum likelihood estimation of 
variance components for univariate animal 
models using sparse matrix techniques 
and average information. Journal of Dairy 
Science, 78, 449–456. 
Klassen D.J. and Smith S.P. 1990. Animal 
model estimation using simulated REML. 
In: Proceedings 4th World Congress on 
Genetics Applied to Livestock Production 
XIII: 472–475. Edinburgh, 23–27 July 
1990. 
Koivula M., Sevón-Aimonen M.-L., Strandén 
I., Matilainen K., Serenius T., Stalder K.J. 
and Mäntysaari E.A. 2008. Genetic 
(co)variances and breeding value estima-
tion of Gompertz growth curve parameters 
in Finnish Yorkshire boars, gilts and bar-
rows. Journal of Animal Breeding and Ge-
netics, 125, 168–175. 
Kuhn E. and Lavielle M. 2005. Maximum 
likelihood estimation in nonlinear mixed ef-
fects models. Computational Statistics & 
Data Analysis, 49, 1020–1038. 
Kuk A.Y.C. and Cheng Y.W. 1997. The Monte 
Carlo Newton-Raphson algorithm. Journal 
of Statistical Computation and Simulation, 
59, 233–250. 
Lavielle M. and Meza C. 2007 A parameter 
expansion version of the SAEM algorithm. 
Statistics and Computing, 17, 121–130. 
Levine R.A. and Fan J. 2004. An automated 
(Markov chain) Monte Carlo EM algorithm. 
Journal of Statistical Computation & Simu-
lation, 74, 349–360. 
Lidauer M.H., Matilainen K., Mäntysaari E.A. 
and Strandén I. 2011. Technical reference 
guide for MiX99. Release VI/2011. MTT 
Agrifood Research Finland. URL 
http://www.mtt.fi/BGE/Software/MiX9. 
Lindstrom M.J. and Bates D.M. 1990. Nonlin-
ear mixed effects models for repeated 
measures data. Biometrics, 46, 673–687. 
Liu C., Rubin D.B. and Wu Y.N. 1998. Pa-
rameter expansion to accelerate EM: The 
PX-EM algorithm, Biometrika, 85, 755–
770. 
Louis T.A. 1982. Finding the observed infor-
mation matrix when using the EM algo-
rithm. Journal of the Royal Statistical So-
ciety: Series B, 44, 226–233. 
Madsen P. and Jensen J. 2012. A user’s 
guide to DMU. A package for analysing 
multivariate mixed models. Version 6, re-
lease 5.1. University of Aarhus, Denmark. 
McCulloch C.E. 1997. Maximum likelihood 
algorithms for generalized linear mixed 
models. Journal of the American Statisti-
cal Association, 92, 162–170. 
Meilijson I. 1989. A fast improvement to the 
EM algorithm on its own terms. Journal of 
the Royal Statistical Society: Series B, 51, 
127–138. 
Meza C., Jaffrézic F. and Foulley J.-L. 2007. 
REML estimation of variance parameters 
in nonlinear mixed effects models using 
the SAEM algorithm. Biometrical Journal, 
49, 876–888. 
Misztal I. 2008. Reliable computing in estima-
tion of variance components. Journal of 
Animal Breeding and Genetics, 125, 363–
370. 
Mäntysaari E. and Van Vleck L.D. 1989. Re-
stricted maximum likelihood estimates of 
variance components from multitrait sire 
MTT SCIENCE 30  33 
models with large number of fixed effects. 
Journal of Animal Breeding and Genetics, 
106, 409–422. 
Nocedal J. and Wright S.J. 2006. Numerical 
optimization. Springer, New York, 2nd edi-
tion. 
Patterson H.D. and Thompson R. 1971. Re-
covery of inter-block information when 
block sizes are unequal. Biometrika, 58, 
545–554. 
Patterson H.D. and Thompson R. 1974. Max-
imum likelihood estimation of components 
of variance. Proceedings of the 8th inter-
national biometric conference, 197–207. 
Pinheiro J.C. and Bates D.M. 1995. Approxi-
mations to the log-likelihood function in 
the nonlinear mixed-effects model. Journal 
of Computational and Graphical Statistics, 
4, 12–35. 
R Core Team. 2014. R: A language and envi-
ronment for statistical computing. R Foun-
dation for Statistical Computing, Vienna, 
Austria. URL http://www.R-project.org/. 
Rekaya R., Weigel K.A. and Gianola D. 2001. 
Hierarchical nonlinear model for persis-
tency of milk yield in the first three lacta-
tions of Holsteins. Livestock Production 
Science, 68, 181–187. 
Ripatti S., Larsen K. and Palmgren J. 2002. 
Maximum likelihood inference for multivar-
iate frailty models using an automated 
Monte Carlo EM algorithm. Lifetime Data 
Analysis, 8, 349–360. 
Scott W.A. 2002. Maximum likelihood estima-
tion using the empirical Fisher information 
matrix. Journal of Statistical Computation 
and Simulation, 72, 599–611. 
Searle S.R., Casella G. and McCulloch C.E. 
1992. Variance components. John Wiley & 
Sons, Inc., New York. 
Strandén I. and Lidauer M. 1999. Solving 
large mixed linear models using precondi-
tioned conjugate gradient iteration.  Jour-
nal of Dairy Science, 82, 2779–2787. 
Thompson R. 1994. Integrating best linear 
unbiased prediction and maximum likeli-
hood estimation. In: Proceedings 5th 
World Congress on Genetics Applied to 
Livestock Production XVIII: 337–340. 
Guelph, 7–12 August 1994. 
Thompson R., Brotherstone S. and White 
I.M.S. 2005. Estimation of quantitative ge-
netic parameters. Philosophical Transac-
tions of the Royal Society B, 360, 1469–
1477. 
Vonesh E.F. 1996. A note on the use of La-
place’s approximation for nonlinear mixed-
effects models. Biometrika, 83, 447–452. 
Vuori K., Strandén I. and Mäntysaari E.A. 
2006. Solving large scale nonlinear mixed 
models. 8th World Congress on Genetics 
Applied to Livestock Production, Belo 
Horizonte, MG, Brazil. 
Walker S. 1996. An EM algorithm for nonline-
ar random effects models. Biometrics, 52, 
934–944. 
Wei G. and Tanner M. 1990. A Monte Carlo 
implementation of the EM algorithm and 
the poor man’s data augmentation algo-
rithms. Journal of the American Statistical 
Association, 85, 699–704. 
Wellock I.J., Emmans G.C. and Kyriazakis I. 
2004. Describing and predicting potential 
growth in pig. Animal Science, 78, 379–
388. 
Whittemore C.T. and Green D.M. 2002. The 
description of the rate of protein and lipid 
growth in pigs in relation to live weight. 
The Journal of Agricultural Science, 138, 
415–423. 
Whittemore C.T., Tullis J.B. and Emmans 
G.C. 1988. Protein growth in pigs. Animal 
Production, 46, 437–445. 
Wolfinger R.D. 1993. Laplace’s approximation 
for nonlinear mixed models. Biometrika, 
80, 791–795. 
Wolfinger R.D. and Lin X. 1997. Two Taylor-
series approximation methods for nonline-
ar mixed models. Computational Statistics 
& Data Analysis, 25, 465–490. 
Wu L. 2004. Exact and approximate infer-
ences for nonlinear mixed-effects models 
with missing covariates. Journal of the 
American Statistical Association, 99, 700–
709.
 
MTT is publishing its research findings in two series of publications:
MTT Science and MTT Growth.
The MTT Science series includes scientific presentations and abstracts from 
conferences arranged by MTT Agrifood Research Finland.
Doctoral dissertations by MTT research scientists
will also be published in this series.
The topics range from agricultural and food research
to environmental research in the field of agriculture. 
MTT, FI-31600 Jokioinen, Finland.
email julkaisut@mtt.fi
30
Variance component estimation 
exploiting Monte Carlo methods 
and linearization with complex 
models and large data in animal 
breeding
Doctoral Dissertation
Kaarina Matilainen
MTT CREATES VITALITY THROUGH SCIENCE
www.mtt.fi/julkaisut