MiX99
Solving Large Mixed Model Equations
Release XI/2019
Command Language Interface Manual
(CLIM)
Last update: Nov 2019
©Copyright 2019
Preface
Development of MiX99 was initiated to allow more sophisticated models in estimation
of breeding values for dairy cattle. In the first versions the emphasis was on computational
efficiency and the target users were experts on genetic evaluations. Therefore,
the logic of model definitions was more from an animal breeding perspective. The
foremost application of this software is solving large-scale genetic and genomic
evaluations for national dairy cattle evaluations. Nevertheless, we have tried to keep
the software as a general tool, where many models can be used. As a result, besides
cattle, MiX99 is used in genetic evaluation of other species such as pigs, horses, sheep,
goats, fish, foxes, and poultry, and for many types of research work.
Disclaimer
MiX99 software is owned by Natural Resources Institute Finland (Luke). When using
this program you agree with the following terms. You are not allowed to distribute,
copy, give or transfer MiX99, neither under the same nor under a different name. Any
decisions based on information given by MiX99 are made at your own responsibility
and risk. Only limited technical support can be provided, but vital questions on its use
can be directed to the authors (mix99@luke.fi). Please report any bugs to the authors.
MiX99 can be referenced by (MiX99 Development Team, 2019). If you would like to
use MiX99, please contact Animal Genetics at Natural Resources Institute Finland [1].
MiX99 new (NEW) and development (DEV) features
NEW: New MiX99 features are indicated in the documentation by a colored vertical bar and
the note “NEW” in the right margin.
DEV: Some of the newest MiX99 features currently in development are not yet available in
the official MiX99 release. These development features are indicated in
the documentation by a colored vertical bar and the note “DEV” in the right margin.
Authors
Ismo Strandén
Natural Resources Institute Finland (Luke),
FI-31600 Jokioinen, Finland
firstname.lastname@luke.fi
http://www.luke.fi/mix99
[1] MiX99 Development Team, Animal Genetics, Natural Resources Institute Finland (Luke), FI-31600
Jokioinen, Finland.
Contents
1 Introduction
  1.1 Supported statistical models and beta testing features
  1.2 Organization of the manual
  1.3 Invoking CLIM and command line options
  1.4 Simple example
2 Theory and notation
  2.1 Single trait model
  2.2 Multiple trait model
  2.3 Solving mixed model equations
    2.3.1 MiX99 solver: PCG
    2.3.2 Preconditioner
    2.3.3 Iteration on data
    2.3.4 Ordering of equations by blocks
3 MiX99 files
  3.1 Data file
    3.1.1 Example: Multiple trait data
  3.2 Pedigree file
    3.2.1 Example: Pedigree file for the data
    3.2.2 Phantom parent groups
    3.2.3 Block code
  3.3 Variance components file
    3.3.1 Example: Variance component file
    3.3.2 Multiple residual (co)variances
4 Using the MiX99 solver
  4.1 Solution files
5 Single trait models
  5.1 Naming of model components
    5.1.1 Example: Animal model
    5.1.2 Example: Phantom parent groups in animal model
    5.1.3 Example: Inbreeding in animal model
    5.1.4 Example: Repeatability animal model
    5.1.5 Example: Repeatability animal model in detail
    5.1.6 Example: Simple sire model
    5.1.7 Example: Sire model
    5.1.8 Example: Weights in a model
  5.2 Random regression and nested effects
    5.2.1 Nested regression effects
    5.2.2 Covariable tables
    5.2.3 Example: Random regression model
    5.2.4 Example: Covariable table and random regression model
    5.2.5 Example: Heterogeneous residual variance in test day model
  5.3 Maternal effect models
    5.3.1 Example: Animal model for a maternal trait
6 Multiple trait models
  6.1 All traits have the same effects
    6.1.1 Example: Simple multiple trait model
  6.2 Traits have different effects
    6.2.1 Example: Multiple trait model with different effects by trait
    6.2.2 Example: Different effects by trait using CLIM beta features
  6.3 Multiple trait random regression model
    6.3.1 Example: Multiple trait random regression model
  6.4 Combining of trait estimates
    6.4.1 Example: Repeatability model by multiple trait model
    6.4.2 Example: Reduced rank random regression model
    6.4.3 Example: Finnish test-day model
  6.5 Multiple trait maternal effects model
7 Genomic data models
  7.1 SNP-BLUP or genomic effect model
    7.1.1 Example: simple genomic marker BLUP
    7.1.2 Enhanced formatting of SNP marker information
  7.2 Example: simple G-BLUP
    7.2.1 Example: G-BLUP with polygenic effect
  7.3 Example: single-step method
8 Special topics
  8.1 Trait groups for single trait analysis
    8.1.1 Example: Multiple single trait analysis
    8.1.2 Example: MACE or Sire model with weights and trait groups
  8.2 Deregression
    8.2.1 Example: Single trait deregression
    8.2.2 Example: Multiple trait deregression
9 Summary of all commands
  9.1 Required commands
    9.1.1 DATAFILE
    9.1.2 INTEGER
    9.1.3 PARFILE
    9.1.4 PEDFILE
    9.1.5 PEDIGREE
    9.1.6 REAL
  9.2 Optional commands
    9.2.1 AR
    9.2.2 DATASORT
    9.2.3 IA22FILE
    9.2.4 IGFILE
    9.2.5 IHPRECON
    9.2.6 INBREEDING
    9.2.7 INBRFILE
    9.2.8 NORANSOL
    9.2.9 MISSING
    9.2.10 RESTARTSOL
    9.2.11 SCALE
    9.2.12 PARALLEL
    9.2.13 PRECON
    9.2.14 RANDOM
    9.2.15 REGFILE
    9.2.16 REGMATRIX
    9.2.17 REGPARFILE
    9.2.18 RESIDFILE
    9.2.19 RESIDUAL
    9.2.20 TABLEFILE
    9.2.21 TABLEINDEX
    9.2.22 TAFILE
    9.2.23 TEFILE
    9.2.24 TITLE
    9.2.25 TMPDIR
    9.2.26 TRAITGROUP
    9.2.27 WITHINBLOCKORDER
    9.2.28 DEFINE
10 Appendix: Quick reference card
11 References
Index
1 Introduction
MiX99 is a software suite for breeding value evaluation. The suite has three types of
programs: preprocessor (mix99i), solver (mix99s, mix99p), and reliability calculator
(apax99, apax99p). Some other programs assist in the use of these
programs, for instance imake99 for the parallel computing programs mix99p and
apax99p. The main purpose of this manual is to describe the command language
interface (CLIM) to the preprocessor mix99i. Some use of mix99s is described as
well.
The preprocessing program mix99i of MiX99 accepts instructions in two ways:
the original interface and the command language interface. In the original interface,
a directive file is given to mix99i. The directive file answers questions on the data
and the statistical model. The command language interface for MiX99 is called CLIM.
CLIM helps the user in using the MiX99 preprocessing program mix99i. CLIM has the
following advantages over the directive file:
• Commands can be given in any order. Thus, the restriction on the strict order on
giving commands has been lifted.
• Some commands have default values. Thus, not all commands need to be given.
• Commands have English language names. This makes the command instruction
file somewhat easier to read than a directive file.
• All information on the statistical model is given in the same model area, not
divided into several sections as in a directive file.
The current version of CLIM assumes that model effects are given in the same order as
explained for the MiX99 directive file. Thus, fixed regression effects without nesting are
given first, then fixed classification or nested fixed effects, and then random effects in
order of their random effect number.
1.1 Supported statistical models and beta testing features
The MiX99 software can handle many kinds of statistical models. Many of the models
have been implemented in CLIM, but not all. In general, the following models are in
CLIM:
• linear mixed effect multiple trait model
• random regression (covariable table file)
• reduced rank (combining of traits)
• multiple residual variances
• weights
• least squares models
• large genome information models
There are some specific models, however, that have not been implemented into CLIM.
For example, random regression model with both maternal and additive genetic effects
has not been implemented. Currently not implemented in CLIM:
• random regression with multiple nesting in additive genetics
• MAS-BLUP (combining of effects)
• non-linear models: threshold and Gompertz models
In beta testing:
• the order of effects on the model line is free, unlike in a directive file.
1.2 Organization of the manual
This manual is organized in the following way.
• This chapter gives some introductory remarks on the use of CLIM.
• The second chapter gives some theoretical background and explains how it relates to the
way models are presented in this manual. In addition, some remarks are given
on computational implementation issues in MiX99 that are needed to understand some
of the commands.
• MiX99 files are briefly described in the third chapter.
• The fourth chapter has a brief description of the MiX99 solution files.
• The fifth chapter has single trait examples. The section on basic models is required
reading because basic concepts of the CLIM interface are explained there.
• The sixth, seventh, and eighth chapters give multiple trait models, genomic data
models, and some special topics.
• The ninth chapter has the syntax of all commands.
The manual concentrates on how different statistical models are given to MiX99 using
CLIM. Some options are covered only briefly; please study them in Chapter 9,
the summary of all commands. Some of these commands are quite important,
for example MISSING and SCALE. Some options can be important when using
the programs, such as TMPDIR and TITLE, whereas others are seldom needed, such as
NORANSOL.
1.3 Invoking CLIM and command line options
CLIM is called by the preprocessing program mix99i. CLIM reads a command
file, e.g., mix99.clm, and translates these instructions into a directive file
MiX99_DIR.DIR to be read by mix99i. CLIM is used by mix99i when an instruction file is
given to it:
mix99i mix99.clm
Note that the old directive files are read from the standard input. Thus, executing
mix99i < mix99.dir
would expect the old mix99i directive file in the file mix99.dir.
It is possible to give some options on the command line of mix99i (see Table 1.1).

Table 1.1: Command line options to CLIM, given on the mix99i command line.
Option                Effect
-d                    CLIM is executed, no preprocessing part in mix99i.
                      File MiX99_DIR.DIR is produced.
-b                    Allows use of beta version feature(s), see list above.
-h                    Help.
-l                    Long listing option of mix99i.
--nproc NPROC         Number of parallel processes.
--datafile DATAFILE   Data file name.
--pedfile PEDFILE     Pedigree file name.
--parfile PARFILE     Variance file name.
--checkdata           Enhanced checking of data file.

When CLIM instruction information is used, the preprocessor program mix99i makes
a file MiX99_DIR.DIR. This file has the old directive file format produced from the
CLIM instructions. Thus, if CLIM cannot make exactly the model you have in
mind, a similar model may be feasible. Then, by using the ’-d’ option (Table 1.1),
a directive file is made, and this directive file can be used as a template:
mix99i -d mix99.clm
In any case, it is useful to check that the CLIM generated directive file is correct.
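When mix99i is driven from a script, the two calls shown above can be assembled programmatically. The following is a minimal Python sketch under stated assumptions: the helper name build_mix99i_command is hypothetical, only the ’-d’ option from Table 1.1 is used, and the sketch only builds the argument list (running it requires a MiX99 installation).

```python
def build_mix99i_command(clim_file, directive_only=False):
    """Return the argument list for calling mix99i with a CLIM file.

    build_mix99i_command is a hypothetical helper for illustration;
    only the '-d' option documented in Table 1.1 is used.
    """
    command = ["mix99i"]
    if directive_only:
        command.append("-d")      # make MiX99_DIR.DIR only, no preprocessing
    command.append(clim_file)
    return command


print(" ".join(build_mix99i_command("mix99.clm")))
print(" ".join(build_mix99i_command("mix99.clm", directive_only=True)))
```

The resulting list can be passed to Python's subprocess.run on a machine where mix99i is on the search path.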
1.4 Simple example
Here is a simple example just to introduce CLIM. Some notes on the example:
• Everything after a ’#’ sign on a line is ignored and is considered a comment.
• All information for a command is on one line, which can be continued with the
continuation symbol ’&’.
• The parameter file has the old MiX99 format and assumes the same numbering.
Because the random effects of the given model are animal genetic and random
residual, numbering is 1 for animal genetics and 2 for the residual.
A simple animal model with one fixed effect (mean) and random effect (animal):
DATAFILE simple.dat # Name of data file
INTEGER animal mean # Integer column names in the data file
REAL y # Real number column name in the data file
PEDFILE simple.ped # Name of pedigree file
PEDIGREE animal am # Genetics associated with animal code
# am=animal model
PARFILE simple.var # Name of variance components file
MODEL
y = mean animal # Model
The commands can be shortened from the full command names. In addition, the
command names are not case sensitive, although in this manual all keywords are
written in capital letters. Thus, for example, the command INTEGER can be written
int. However, all other names are case sensitive, e.g., the column names animal and
mean in the above example. Thus, the integer number column names must be written on the model
lines exactly as they were given for the INTEGER command.
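The comment and continuation rules noted for the example can be sketched in a few lines of Python. This is only an illustration, not the CLIM parser: the function name logical_lines is hypothetical, and it is assumed that the continuation symbol ’&’ is the last character before any comment on the line.

```python
def logical_lines(text):
    """Strip '#' comments and join lines continued with '&'.

    Assumption: '&' is the last non-comment character of a continued line.
    """
    result = []
    pending = ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop the comment part
        if not line:
            continue                          # empty or comment-only line
        if line.endswith("&"):
            pending += line[:-1].strip() + " "
            continue
        result.append(pending + line)
        pending = ""
    if pending:
        result.append(pending.strip())
    return result


example = """
DATAFILE simple.dat     # Name of data file
INTEGER animal &        # continued on the next line
        mean
REAL y
"""
for line in logical_lines(example):
    print(line)
```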
2 Theory and notation
Here we introduce some notation and theory for an animal model. The mixed model
equations are not described in detail. In particular, differences in concepts such as
animal and sire model, or maternal and paternal effects are not defined. For a clear
presentation on these and many other models in animal breeding with examples, see
Mrode and Thompson (2006). This manual uses examples from this book.
2.1 Single trait model
A simple single trait animal model has the form
$$y = Xb + Za + e$$
where
y is an n × 1 vector of observations,
b is a p × 1 vector of fixed effects,
X is an n × p design matrix linking observations to the appropriate fixed effects,
a is a q × 1 vector of random additive genetic effects,
Z is an n × q design matrix linking observations to the appropriate random effects,
e is an n × 1 random residual vector.
Hence, there are q animals with n observations, and there are p fixed effects.
In a simple animal model, the matrices X and Z are incidence matrices. In other
words, these matrices have zeros and ones to indicate which effect corresponds to
which observation. However, if model has regression coefficients, these matrices have
regression coefficients. Thus, we call these matrices design matrices to indicate
wider model possibilities.
In MiX99 it is possible to have both regression and classification variables (categories).
Classification variables are sometimes called class effects. For example, a herd effect
is a typical class effect: an observation belongs to only one herd, and observations
from the same herd share the same herd estimate. Regression effects
are not classification effects. For example, a linear function has the linear coefficient in
the design matrix. However, a regression effect can be nested within a classification. In
practice, the difference between regression and classification effects in MiX99 is that a
class effect number is in an integer number input column, whereas a regression coefficient is in
a real number input column of the data file.
Common linear mixed effects assumptions for the expectations are
$$E(a) = 0, \qquad \mathrm{Var}(a) = A\sigma_a^2,$$
$$E(e) = 0, \qquad \mathrm{Var}(e) = I\sigma_e^2,$$
$$E(y) = Xb, \qquad \mathrm{Cov}(a, e) = 0,$$
where A is the numerator relationship matrix.
For convenience of presentation, it is common to denote the residual covariance matrix
by R. Also, it is common to denote the genetic covariance matrix by G. With this
notation, the mixed model equations to solve are
$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$
where ′ denotes transpose.
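As a worked special case, consider the single trait assumptions given earlier, Var(e) = Iσ²e and Var(a) = Aσ²a, so that R⁻¹ = I/σ²e and G⁻¹ = A⁻¹/σ²a. Multiplying both sides of the mixed model equations by σ²e gives a form that depends on the variances only through their ratio. The symbol λ is introduced here for illustration and is not used elsewhere in this manual.

```latex
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \lambda A^{-1} \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \end{bmatrix},
\qquad \lambda = \frac{\sigma_e^2}{\sigma_a^2}
```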
The model can be described by giving its effects. For example, if the above model had
fixed herd effect (herd), and animal effect (a) for the additive genetic effects, then it can
be written as
$$y = herd + a + e$$
where e is the residual term. This can be considered as model for one individual
record, although subscripting to indicate this was not used.
2.2 Multiple trait model
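The single trait model notation can be used to describe a multiple trait model as well:
$$y = Xb + Za + e$$
However, for T traits the matrices and vectors have a traitwise structure. Thus, we can
write
$$y' = [\, y_1' \;\; y_2' \;\; \cdots \;\; y_T' \,], \qquad e' = [\, e_1' \;\; e_2' \;\; \cdots \;\; e_T' \,]$$
$$b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_T \end{bmatrix}, \qquad X = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_T \end{bmatrix}$$
$$a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_T \end{bmatrix}, \qquad Z = \begin{bmatrix} Z_1 & 0 & \cdots & 0 \\ 0 & Z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_T \end{bmatrix}$$
where the vectors and matrices have the same meaning as before and the subscripts
denote the trait.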
Multiple trait linear mixed effects assumptions are
$$E(a) = 0, \qquad \mathrm{Var}(a) = G_0 \otimes A,$$
$$E(e) = 0, \qquad \mathrm{Var}(e) = R_0 \otimes I,$$
$$E(y) = Xb, \qquad \mathrm{Cov}(a, e) = 0,$$
where matrix G0 is the T by T genetic covariance matrix, and R0 is the T by T residual
covariance matrix. The mixed model equations are
$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G_0^{-1} \otimes A^{-1} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$
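The Kronecker product in Var(a) = G0 ⊗ A can be illustrated with a small Python sketch. The function kron and all numbers are illustrative assumptions: G0 is a 2 by 2 genetic covariance matrix, and A is the relationship matrix of a non-inbred parent and its offspring. With effects ordered by trait, as in the vectors above, each T by T block position of the result holds one element of G0 multiplied by A.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


G0 = [[3.0, 2.5], [2.5, 2.5]]   # illustrative genetic (co)variances, T = 2
A = [[1.0, 0.5], [0.5, 1.0]]    # illustrative: parent and offspring

var_a = kron(G0, A)             # 4 x 4 matrix Var(a) = G0 (x) A
for row in var_a:
    print(row)
```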
2.3 Solving mixed model equations
MiX99 solves mixed model equations using preconditioned conjugate gradient (PCG)
iteration. The following need to be considered when using the program:
• iterative method
• iteration on data
• ordering of equations by blocks
• preconditioner matrix
2.3.1 MiX99 solver: PCG
PCG is an iterative method for solving systems of linear equations. Thus, the coefficient matrix is
not inverted or decomposed. In practice, iterative methods solve the system of
equations by repeatedly updating the latest solutions by some procedure.
An iterative solving method continues updating solutions until convergence is determined.
The convergence criterion value is given before starting the iteration. Often this value is
a small positive number. In addition, a maximum number of iterations is set in order to
ensure termination of the iterative method. It has been proved that, in exact arithmetic, the number of iterations
PCG needs is at most the number of unknowns. This has no practical meaning for large
problems, but for small problems it is good to know that this limit is used in MiX99.
It is not possible to give convergence criteria or maximum number of iterations by
CLIM. This information is given to the solver program mix99s or mix99p, not the
preprocessor (see Chapter 4).
2.3.2 Preconditioner
PCG iteration is a flexible method. The CG, or conjugate gradient, part of the algorithm
is the same (with small variations), but the preconditioner part depends on the implementation.
Preconditioning usually means transforming the coefficient matrix to be as
diagonal as possible. This is done by approximating the inverse of the coefficient matrix.
If the approximation is exact, PCG finds the correct solution in one step. Usually the
approximation is not exact.
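The structure of a PCG iteration can be sketched in plain Python. This is a textbook sketch, not the MiX99 implementation: the diagonal (Jacobi) preconditioner, the relative residual convergence criterion, and the small 3 by 3 system are all assumptions chosen for illustration. The function only needs the product of the coefficient matrix and a vector, which it receives as the function matvec.

```python
def pcg(matvec, rhs, diag, tol=1e-10, max_iter=None):
    """Solve C x = rhs by PCG with a diagonal (Jacobi) preconditioner.

    matvec(v) returns C v; diag holds the diagonal elements of C.
    rhs is assumed to be non-zero and C symmetric positive definite.
    """
    n = len(rhs)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(rhs)                                  # residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]       # preconditioned residual
    p = list(z)                                    # search direction
    rz = sum(ri * zi for ri, zi in zip(r, z))
    rhs_norm = sum(b * b for b in rhs) ** 0.5
    for iteration in range(1, max_iter + 1):
        cp = matvec(p)
        alpha = rz / sum(pi * ci for pi, ci in zip(p, cp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ci for ri, ci in zip(r, cp)]
        if sum(ri * ri for ri in r) ** 0.5 <= tol * rhs_norm:
            return x, iteration                    # converged
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


C = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # illustrative system
rhs = [1.0, 2.0, 3.0]


def matvec(v):
    return [sum(c * vj for c, vj in zip(row, v)) for row in C]


solution, iterations = pcg(matvec, rhs, diag=[C[i][i] for i in range(3)])
print(solution, iterations)
```

For this system the exact solution is (2/9, 1/9, 13/9), which the iteration reproduces to within the tolerance.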
MiX99 has different preconditioners available. The reason for having different preconditioners
is memory and time. For very large systems of equations, it is not possible to use the
most memory intensive preconditioners implemented. In addition, when only reliabilities
are calculated, no preconditioner is needed. Making the preconditioner may take
as much as half of the computing time in the preprocessing program.
2.3.3 Iteration on data
Iteration on data (IOD) means that MiX99 does not make or store the coefficient matrix
of the mixed model equations in memory. The PCG method needs the product of the coefficient matrix
and a vector. This product can be computed using the model matrices X and Z, the
pedigree list, and the variance component information. In practice, IOD means that
reliabilities must be calculated by a separate program because the coefficient matrix is
never made explicitly.
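The idea of iteration on data can be sketched as follows. The sketch computes only the least squares part of the product, (W′W)v with W = [X Z], for a hypothetical model with one herd effect and one animal effect; the pedigree and variance component contributions are left out. The function name, the record layout, and the data are illustrative assumptions.

```python
def lhs_times_vector(records, n_herds, v):
    """Return (W'W) v by one pass over the records, W = [X Z].

    Each record is (herd, animal) with 1-based codes. Equations are
    ordered herds first, then animals. The matrix W'W is never formed.
    """
    result = [0.0] * len(v)
    for herd, animal in records:
        rows = (herd - 1, n_herds + animal - 1)   # equations of this record
        fitted = sum(v[i] for i in rows)          # w' v for this record
        for i in rows:
            result[i] += fitted                   # add w (w' v)
    return result


# Hypothetical data: 2 herds, 5 animals, one record per animal.
records = [(1, 1), (1, 2), (2, 3), (2, 4), (2, 5)]
v = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
product = lhs_times_vector(records, n_herds=2, v=v)
print(product)
```

Memory use depends on the number of equations, not on the number of non-zero elements of the coefficient matrix, because each record is processed and then discarded.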
2.3.4 Ordering of equations by blocks
We described the mixed model equations in a way that is typical in the literature: equations
are ordered by effect. This is convenient when presenting mixed model equations or
theory, but it is not always computationally optimal. It is better to order the equations of an
animal and its herd close to each other because IOD proceeds one record at a time, with a record
having animal and herd related classification information. In MiX99, equations can be
ordered by common family blocks, e.g., herd. For more information see Chapter 3.2.3.
MiX99 orders equations in a different way than by effect even when block code has not
been used to order equations. In multiple trait model, the different trait equations for an
animal are always ordered next to each other. This will ensure efficient performance
of the solver.
CLIM has command WITHINBLOCKORDER (Chapter 9.2.27). It can be used to indicate
which effects are within block equations. For example, if herd is block code, then it is
natural that animal effects (such as animal genetics and permanent environment), and
herd contemporary effects (such as herd-year-season) are within block. This is not
very important when solving the mixed model equations using mix99s. However, the
parallel computing implementation (mix99p) depends on good block ordering. When
reliabilities are estimated by ApaX (apax99 or apax99p), block information is used to
determine level of approximation. Only effects within blocks are considered in reliability
calculations.
3 MiX99 files
Information on data, pedigree, and variance components for MiX99 are in files. Ani-
mals in the data and pedigree files must be in the same order. In other words, when
data records have been sorted by animal id (or sire id for sire models), then individuals
must be in the same order in the pedigree file. The pedigree can be ordered using
RelaX2 program, a separate program for pedigree analysis from Luke (Strandén and
Vuori, 2006).
The MiX99 input files have a quite simple format. In the following, the formats of
these files are described briefly. For a more complete explanation, see the MiX99 preprocessor
manual, Technical reference guide for MiX99 pre-processor.
3.1 Data file
The data file has the observed data to be analyzed. This means observations, and
model effect information such as classification effect numbers and regression coefficients.
The default format is ’text’, which means a text format data file where columns
are separated by spaces. A rarely used alternative is a binary format data file.
Each record, i.e., line in a free format file, has two parts:
• Integer numbers: The integer number part consists of positive integer numbers
for all class variables in the model. In addition, it may contain sorting variables
and indices such as an index for heterogeneous residual variance.
• Real numbers: These are observations, regression coefficients, and weights.
The data file can have columns that are not used in a particular run of MiX99. Because
MiX99 accepts only numerical data, alphanumeric data are allowed on the record only
after the real number columns in a free format text data file.
All integer numbers are coded using the default machine integer type. Hence, on 32-bit
platforms, integer numbers in the data file must be positive and at most 2 147 483 647 (2^31 − 1).
Missing integer numbers must be coded by number zero (0). Missing real numbers
can be coded with an arbitrary real value which is specified to MiX99 by command
MISSING.
3.1.1 Example: Multiple trait data
Consider a data set with two traits. The file has six columns: 4 integer number columns and 2
real number columns. Note that the real number columns have integer values. Still,
for MiX99 these are real number columns because observations can have any real
values. The file named example.dat is shown below. The header rows give the column
names and the column numbers within the integer and real parts:
Integer columns                    Real columns
animal  sire  herd×year  ones      trait 1  trait 2
(1)     (2)   (3)        (4)       (1)      (2)
4       1     1          1         90       200
6       3     1          1         110      190
8       5     2          1         120      140
9       5     2          1         130      120
10      7     2          1         120      130
This data file can be described with the following CLIM commands:
DATAFILE example.dat # Name of the data file
INTEGER animal sire season ones # Integer column names
REAL tr1 tr2 # Real column names
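How the INTEGER and REAL commands map onto a free format record can be sketched in Python. The function parse_record is a hypothetical helper, not part of MiX99; it reads the integer columns first, then the real columns, and ignores anything after them, in line with the rule that alphanumeric data are allowed only after the real number columns.

```python
def parse_record(line, n_integer, n_real):
    """Split one free format record into integer and real parts.

    Fields after the real number columns are ignored.
    """
    fields = line.split()
    integers = [int(f) for f in fields[:n_integer]]
    reals = [float(f) for f in fields[n_integer:n_integer + n_real]]
    return integers, reals


example_dat = """\
4 1 1 1 90 200
6 3 1 1 110 190
8 5 2 1 120 140
9 5 2 1 130 120
10 7 2 1 120 130
"""
int_names = ["animal", "sire", "season", "ones"]   # as in the INTEGER command
real_names = ["tr1", "tr2"]                        # as in the REAL command

for line in example_dat.splitlines():
    integers, reals = parse_record(line, len(int_names), len(real_names))
    print(dict(zip(int_names, integers)), dict(zip(real_names, reals)))
```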
3.2 Pedigree file
All pedigree information is given in the pedigree file. Each animal in the pedigree must
have a record in the pedigree file with four integers, of which the fourth integer is optional.
The columns of the pedigree file are
Column  Content
1       animal code
2       sire code
3       dam code (or maternal grand sire code in case of a sire model)
4       block code (optional)
The integers must be separated by at least one space.
When a block code is given, the pedigree and observation files need to have the same
order by block. The main sort key is the block code (e.g., herd), within which sorting is by
animal code. When an animal has observations in several data file blocks, in the pedigree file
the animal must appear in only one of the blocks. This special case will be considered
in a separate section on parallel computing.
3.2.1 Example: Pedigree file for the data
Let the pedigree file for the multiple trait data of the example in Chapter 3.1.1 be
(columns 1 to 3: animal, sire, dam)
animal sire dam
1 0 0
2 0 0
3 1 2
4 1 2
5 3 4
6 3 4
7 5 6
8 5 6
9 5 6
10 7 8
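The requirement that animals are in the same order in the data and pedigree files can be checked before running the preprocessor. The following Python sketch is a hypothetical helper, not part of MiX99. It assumes that "same order" means the animals with records appear in the same relative order as in the pedigree file, and that every animal with a record has a pedigree record.

```python
def same_order(data_animals, pedigree_animals):
    """True if animals with records appear in pedigree file order."""
    position = {animal: i for i, animal in enumerate(pedigree_animals)}
    last = -1
    previous = None
    for animal in data_animals:
        if animal == previous:
            continue                      # repeated records of one animal
        if animal not in position or position[animal] < last:
            return False
        last = position[animal]
        previous = animal
    return True


data_animals = [4, 6, 8, 9, 10]           # animal column of example.dat
pedigree_animals = list(range(1, 11))     # animal column of the pedigree file
print(same_order(data_animals, pedigree_animals))
```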
3.2.2 Phantom parent groups
Missing parents can be replaced by phantom parent groups. Then, a phantom parent
group code is in place of the missing parent. This group code must be a negative
integer number in order to distinguish it from an animal code. A phantom parent group
code must not have its own record in the pedigree file.
For example, the pedigree above (Chapter 3.2.1) had two animals (1 and 2) that had
unknown parents. The parents could be phantom parent groups (-1 for unknown sire,
-2 for unknown dam). The pedigree file remains the same for all other animals. The
changed part of the pedigree file is:
animal sire dam
1 -1 -2
2 -1 -2
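The recoding above can be sketched in Python. The function add_phantom_groups is a hypothetical helper; it assumes that a missing parent is coded 0, as in the pedigree example, and that one group is used for all unknown sires and another for all unknown dams.

```python
def add_phantom_groups(pedigree, sire_group=-1, dam_group=-2):
    """Replace missing parents (code 0) by negative phantom group codes."""
    return [
        (animal,
         sire if sire != 0 else sire_group,
         dam if dam != 0 else dam_group)
        for animal, sire, dam in pedigree
    ]


pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 2)]   # first four animals
for record in add_phantom_groups(pedigree):
    print(*record)
```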
3.2.3 Block code
There are some benefits from having a block code in the data and pedigree files.
Block code is essential in the calculation of reliabilities by ApaX, and when parallel computing
is used by mix99p. Otherwise, block code brings very little benefit to computations
and can be omitted.
Block code of an animal is given in column 4 of the pedigree file. When the pedigree
file has a block code column, then every animal must have a block code. In addition,
the block code needs to be the same in the data file as well. Animals with records
in different data blocks (e.g., in different herds) have to be coded with the code of
one of the data blocks where they have observations, e.g., the block with most of their
observations.
If animal does not have an observation, but is parent to an animal having observations
in the data file (e.g. pedigree animal of a particular herd), then it is best that parent
without observation and its offspring have the same block code. In dairy cattle, this is
most suitable for a cow without observations. It should be assigned to a block having
most of its daughters.
When an animal does not belong to any equation family (no observations to give a block
code), or it is in many different families through relationship information (e.g., dairy sires
have progeny in many herds), an extra block code should be given. We recommend
a separate block code for animals with links to many different equation family blocks.
For example, sires in a dairy cattle population can be assigned to one group. Note
that a group should never be too large. It is advisable to split a large block into several
smaller blocks. The solver program reads as many animal blocks at a time as possible,
and the largest animal block dictates memory requirements.
3.3 Variance components file
The variance components file has variances and covariances for all the random ef-
fects in the model. The matrices can be of different size depending on the model
specification. The matrices are numbered in the same order as in the RANDOM command.
However, no RANDOM command is needed when the only random effects in the
model are the additive genetic and residual effects. The residual effect always has the
highest random effect number, and the additive genetic effect the second highest number.
The variance components file has a line for each (co)variance. Each line has three integers
followed by a real number: the random effect number, the row and the column of the
element in the (co)variance matrix, and the (co)variance value. Only the lower (or upper)
triangle of the matrix needs to be given.
Order of lines in the file is irrelevant. It is easy to know the random effect number. Cor-
rect numbering of (co)variances, i.e., the row-column number, can be more difficult.
For example, the row-column numbers have to be checked carefully when random re-
gression effects are missing in a multiple trait random regression model. See examples
on multiple trait random regression effects for better explanation (Chapter 5.2).
3.3.1 Example: Variance component file
We illustrate the variance components file for the multiple trait model data in example
Chapters 3.1.1 and 3.2.1. Let the genetic and residual (co)variance matrices be
\[
G = \begin{bmatrix} 3.0 & 2.5 \\ 2.5 & 2.5 \end{bmatrix}
\quad\text{and}\quad
R = \begin{bmatrix} 7.0 & 2.0 \\ 2.0 & 7.0 \end{bmatrix},
\]
respectively. Genetic correlation between the traits is about 91%, and residual corre-
lation about 29%. Heritability of the first trait is 30%, and the second trait about 26%.
The parameter file is
Random effect1 Row2 Column3 Covariance1
1 1 1 3.0
1 1 2 2.5
1 2 2 2.5
2 1 1 7.0
2 1 2 2.0
2 2 2 7.0
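The correlations and heritabilities quoted above can be checked from the file lines with a
small sketch. The reader function below is hypothetical (it is not a MiX99 tool) and assumes
whitespace-separated lines with one triangle of each matrix, as described in this chapter.

```python
import math

def read_covariances(lines):
    """Build full symmetric matrices from 'effect row column value' lines."""
    entries = {}
    for line in lines:
        effect, row, col, value = line.split()[:4]
        entries.setdefault(int(effect), {})[(int(row), int(col))] = float(value)
    matrices = {}
    for effect, cells in entries.items():
        size = max(max(rc) for rc in cells)
        m = [[0.0] * size for _ in range(size)]
        for (r, c), v in cells.items():
            m[r - 1][c - 1] = v
            m[c - 1][r - 1] = v  # mirror the given triangle
        matrices[effect] = m
    return matrices

lines = ["1 1 1 3.0", "1 1 2 2.5", "1 2 2 2.5",
         "2 1 1 7.0", "2 1 2 2.0", "2 2 2 7.0"]
mats = read_covariances(lines)
G, R = mats[1], mats[2]

genetic_corr = G[0][1] / math.sqrt(G[0][0] * G[1][1])
residual_corr = R[0][1] / math.sqrt(R[0][0] * R[1][1])
h2 = [G[i][i] / (G[i][i] + R[i][i]) for i in range(2)]
```

Here `genetic_corr` is 2.5/sqrt(3.0 · 2.5), `residual_corr` is 2.0/7.0, and `h2` holds the
ratios 3.0/10.0 and 2.5/9.5, in agreement with the percentages given in the text.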
3.3.2 Multiple residual (co)variances
When multiple residual (co)variances are present, an additional residual (co)variance
file has to be given. The format of this file is similar to the regular (co)variance file
explained above. However, the first number on each line is not the random effect number
but the number of the residual (co)variance class. The classes must be numbered from
one (1) up to the total number of residual (co)variance classes.
Each observation has its residual (co)variance class number in the INTEGER column
fields of the data file. Note that a residual (co)variance matrix must still be given in the
variance components file. Its values are ignored by the solver program (mix99s/mix99p)
but are used by the reliability calculation program (apax99/apax99p).
4 Using the MiX99 solver
The MiX99 solver (mix99s/mix99p) expects the user to give instructions on the
iteration method, the output files to be produced, and any special computations to be
made (see Chapter 8 on special topics). The instructions can be given
either from the standard input or using the command line options.
In this manual, the command line option method is usually used. This method
is possible only when calculating breeding values. The easiest way to execute the solver
is to give option -s, which uses default values in solving breeding values and produces
standard output files. Thus, you give mix99s -s. Examples in this manual have been
produced with this option unless otherwise mentioned.
The other command line options are
• -n or -N for number of iterations
• -ca or -Ca for the Ca convergence criterion
• -cd or -Cd for the Cd convergence criterion
• -cr or -Cr for the Cr convergence criterion
• -cm or -Cm for the Cm convergence criterion (NEW)
For example, giving mix99s -n 100 -cr 1e-8 would limit the maximum number of
iterations to 100, and set the Cr convergence criterion value to 10^-8.
Instructions can be given to the solver in the standard input. This allows a much wider
set of options and methods than the command line options. Note, however,
that giving command line options will by default lead to not reading the standard input.
It is sometimes more convenient to have the instructions in a file than type them every
time to the program. This can be achieved by reading them from the standard input,
e.g., mix99s < solver_option_file.slv. Again, giving command line options
will by default lead to not reading the file. Thus, giving
mix99s -n 100 -cr 1e-8 < solver_option_file.slv
will not read commands from the file solver_option_file.slv but proceed with
the command line options only.
By specifying command line option -i, the solver option file (or standard input) is read
first, and options given on the command line override the corresponding solver option
file values:
mix99s -i -n 100 -cr 1e-8 < solver_option_file.slv
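The precedence between command line options and the solver option file can be summarised
with a small model. This is a simplified sketch of the behaviour described above, not solver
code: the function and the dictionary keys are hypothetical, and solver defaults for options
that are not given are not modelled.

```python
def effective_options(cmdline, stdin_options, use_i=False):
    """Return the options the solver would act on in this simplified model.

    cmdline       -- options given on the command line, e.g. {"n": 100}
    stdin_options -- options from the solver option file / standard input
    use_i         -- True when -i was given on the command line
    """
    if not cmdline:
        # No command line options: the standard input is read as usual.
        return dict(stdin_options)
    if not use_i:
        # Command line options without -i: the standard input is not read.
        return dict(cmdline)
    # With -i: the standard input is read first, command line values override.
    merged = dict(stdin_options)
    merged.update(cmdline)
    return merged

file_opts = {"n": 2000, "cr": 1e-6, "resid": "N"}   # hypothetical file content
cmd_opts = {"n": 100, "cr": 1e-8}                   # mix99s -n 100 -cr 1e-8

without_i = effective_options(cmd_opts, file_opts)
with_i = effective_options(cmd_opts, file_opts, use_i=True)
```

In this model `without_i` contains only the two command line values, while `with_i` keeps
the extra file setting and takes the iteration limit and criterion from the command line.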
An example of an instruction file for breeding value evaluation is
H # RAM: RAM demand: L=large (mix99p only), H=high, M=medium, L=low
# Max. no. iter., Convergence_criterion, Criterion (A/R/M/D)
2000 1.0e-8 R F
N # RESID: Calculate residuals? (Y/N)
N # VALID: N=no
N # VAROPT: adjust for HV? (N)o
Y # SOLTYP: Solution files? (N)o, (Y)es
The first letter H requests the high memory version, which is the usual choice. The
medium and low memory versions are rarely used because even the high memory version
uses memory efficiently.
The most important line is the second line, where the PCG iteration information is given:
• the number of iterations in the PCG method is limited to 2000
• the convergence value is set to 10^-8
• the convergence criterion is set to "R"
• the above values are "F"orced to be used.
If "F" is not given, default values are used. The default values are
• a limit of 5000 iterations in the PCG method
• a convergence value of 10^-4
• convergence criterion "D"
The three options after the PCG information are not that important for typical breeding
value evaluation, and their values are "N" for no. The chapter on special topics
(Chapter 8) will consider some of these options. The last option, "Y", is important: if it is
"N", only a binary format solution file is produced.
4.1 Solution files
The solver will write solution files depending on the model. Different types of solutions
are written to different files. The solution files are:
• Solani: Solutions for animal effects. (Sol_mn in case of a LS-model.)
• Solfix: Solutions for all across blocks fixed effects.
• Solfnn: Solutions for the nth within block fixed effect. For instance, Solf02 is
the solution file for the second within block fixed effect.
• Solrnn: Solutions for the nth random effect in the model. For example, Solr03
is the solution file for the random effects with the random effect number 3.
• Solreg: Solutions for the regression effects of the first regression effect group
(applied across all observations).
The structure of the text solution files depends on the model. The general form of a
particular solution file is always the same, but, for example, the number of columns in a
Solani file depends on the number of traits. Therefore, a detailed explanation of the
content of the files is given in the printout of the particular run of the program.
Below are descriptions of the two most common solution file formats. The other so-
lution files have similar formats. Please check solver output for explanation. In this
manual column titles are given for Solani although they are not present in the files.
In general, the Solani file has the following columns:
1) Animal ID
2) Number of offspring
3) Number of observations
4) Solution for trait 1
5) Solution for trait 2
6) . . .
When there are random regression effects, solutions are in the numbering order of the
random regression effects.
In general, the Solfix file has the following columns
1) Factor number
2) Trait number
3) Level code
4) Number of observations
5) Solution
6) Name of factor (integer number column)
7) Name of trait
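The two layouts above can be read with a few lines of code. The sketch below is not part of
MiX99; it assumes whitespace-separated text files, integer animal codes, and a Solfix file
with exactly the seven columns listed. The sample lines follow the format of the examples
in this manual.

```python
def parse_solani(lines):
    """Parse Solani lines: animal ID, offspring count, observation count, solutions."""
    animals = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        animal, n_desc, n_obs = (int(x) for x in fields[:3])
        # All remaining columns are solutions (one per trait or regression term).
        solutions = [float(x) for x in fields[3:]]
        animals[animal] = {"n_desc": n_desc, "n_obs": n_obs, "solutions": solutions}
    return animals

def parse_solfix(lines):
    """Parse Solfix lines with the seven columns listed above."""
    rows = []
    for line in lines:
        f = line.split()
        if not f:
            continue
        rows.append({"factor": int(f[0]), "trait": int(f[1]), "level": int(f[2]),
                     "n_obs": int(f[3]), "solution": float(f[4]),
                     "factor_name": f[5], "trait_name": f[6]})
    return rows

solani = parse_solani(["3 2 0 0.92308", "4 2 1 -.92308", "8 1 1 0.64447E-01"])
solfix = parse_solfix(["1 1 1 2 99.538 herd_yea tr1", "1 1 2 3 122.69 herd_yea tr1"])
```

Python's `float` accepts values written without a leading zero and with an exponent, such
as `-.92308` and `0.64447E-01`, so no extra conversion is needed for these files.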
5 Single trait models
5.1 Naming of model components
Data file has integer and real number columns. Columns are given names by INTEGER
and REAL commands. The names can have any alphanumeric characters, i.e., letters
and numbers. Many other characters such as underscore (_) are allowed as well.
However, there are some reserved characters not allowed in names: =, (, ), @, |, !,
<, & and #. These characters have special meaning. For example, # starts a comment,
and & marks line continuation. Others are model component separators, and will
be discussed in due course in this manual.
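A quick check of column names against the reserved characters could look like the sketch
below. The function is hypothetical and only tests the characters listed above; rejecting
empty names and whitespace is an additional assumption, and CLIM may apply further rules.

```python
RESERVED = set("=()@|!<&#")

def is_valid_name(name):
    """True if `name` is non-empty and has no reserved or whitespace characters."""
    return bool(name) and not any(ch in RESERVED or ch.isspace() for ch in name)

names = ["herd_year", "blk-var", "G(animal)", "a|b"]
valid = {name: is_valid_name(name) for name in names}
```

Names such as `herd_year` and `blk-var` pass the check, whereas `G(animal)` and `a|b` are
rejected because parentheses and the bar are model component separators.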
A statistical model has effects. The data column names can be used as effect names.
If an effect name is an integer number column name, it is assumed to be an effect with
classes. If it is a real number column name, it is assumed to be a regression effect.
CLIM expects that all effect names on a model line are different. When several model
effects refer to the same data column, component names can be used. See the
repeatability model examples (Chapters 5.1.4 and 5.1.5).
5.1.1 Example: Animal model
We consider a simple animal model
tr1 = herd × year + a + e
where
herd × year is fixed herd times year interaction effect,
a is random additive genetic effect, and
e is random residual.
Neither CLIM nor MiX99 performs multiplication operations between effects on the
model line. Thus, the herd × year interaction has to be coded in the data as a single class
effect.
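Coding the interaction as one class effect amounts to giving every distinct herd and year
combination its own integer code before the data file is written. A minimal sketch, with
hypothetical herd and year values and a hypothetical helper function:

```python
def code_interaction(pairs):
    """Assign one integer class code to each distinct (herd, year) pair.

    Codes start from 1 and follow the order of first appearance.
    """
    codes = {}
    coded = []
    for pair in pairs:
        if pair not in codes:
            codes[pair] = len(codes) + 1
        coded.append(codes[pair])
    return coded, codes

# Hypothetical records: (herd, year) of each observation.
herd_year_pairs = [(1, 2018), (1, 2018), (2, 2018), (2, 2019), (1, 2018)]
column, lookup = code_interaction(herd_year_pairs)
```

The list `column` is what would be written to the herd_year integer column of the data
file, and `lookup` keeps the link back to the original herd and year.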
The complete CLIM instruction file (named amodel.clm) is
DATAFILE example.dat
INTEGER animal sire herd_year ones
REAL tr1 tr2
PEDFILE AM.ped # Pedigree file
PEDIGREE animal am # Genetics associated with animal code
# am=animal model
DATASORT PEDIGREECODE=animal
PARFILE AM.var
MODEL
tr1 = herd_year animal
The example.dat is the same as given earlier (Chapter 3.1.1). The AM.ped is the
same as given earlier (Chapter 3.2.1). The variance components file (AM.var) is for
the first trait:
Random effect1 Row2 Column3 Variance1
1 1 1 3.0
2 1 1 7.0
First the preprocessor is executed: mix99i amodel.clm. Next the solver is exe-
cuted: mix99s -s. The solver will produce solution files Solfix having fixed effects,
and Solani having the breeding values. The Solfix file is
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 2 99.538 herd_yea tr1
1 1 2 3 122.69 herd_yea tr1
The Solani file is (column names have been added)
Animal N-Desc N-Obs Solution
1 2 0 -.18406E-14
2 2 0 -.18406E-14
3 2 0 0.92308
4 2 1 -.92308
5 3 0 -.37713E-14
6 3 1 1.8462
7 1 0 0.65421
8 1 1 0.64447E-01
9 0 1 2.0506
10 0 1 -.17840
Solutions may differ somewhat due to computing precision when the example is run
on another computer. For instance, the solutions close to zero (breeding values for
animals 1, 2, and 5) are likely to be different.
5.1.2 Example: Phantom parent groups in animal model
Phantom parent groups are as easy to give in CLIM as in the MiX99 directive method.
Phantom parent groups are signaled in the pedigree file by having them as negative
numbers. In the CLIM file, a model with phantom parent groups is used when the
PEDIGREE command has ’+p’ for phantom parent groups. For example, the previous
CLIM instruction file needs only one change:
PEDIGREE animal am+p
in order to have phantom parent groups. Note that no space is allowed between
the characters in am+p.
Notes:
• If the pedigree has negative parent numbers and the model instruction file does
not have ’+p’, all negative parent numbers are considered to indicate an
unknown parent, and are effectively the same as zero.
• If ’+p’ was given but an animal has a zero (0) parent (instead of a negative number),
MiX99 assigns this parent to genetic group -99999999.
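The two notes can be restated as a small decision function. This is only a sketch of the
rules as written above, with a hypothetical function name; it does not show how MiX99
handles parent codes internally.

```python
UNKNOWN = 0
DEFAULT_GROUP = -99999999

def interpret_parent(parent_code, phantom_groups):
    """Return how a parent code in the pedigree file is treated.

    phantom_groups -- True when the PEDIGREE command has '+p'
    """
    if parent_code > 0:
        return parent_code            # a real, identified parent
    if phantom_groups:
        # Negative numbers name a phantom parent group; zero goes to the
        # default genetic group.
        return parent_code if parent_code < 0 else DEFAULT_GROUP
    return UNKNOWN                    # without '+p' every such parent is unknown

cases = [(7, True), (-3, True), (0, True), (-3, False), (0, False)]
treated = [interpret_parent(code, flag) for code, flag in cases]
```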
5.1.3 Example: Inbreeding in animal model
The relationship matrix in MiX99 does not account for non-zero inbreeding coefficients
by default. However, it is possible to use precalculated inbreeding coefficients in the
additive relationship matrix. MiX99 does not calculate inbreeding coefficients; a separate
program such as RelaX2 (Strandén and Vuori, 2006) needs to be used.
Consider the example for animal model (Chapter 5.1.1). Inbreeding coefficients calcu-
lated using RelaX2 are (file AM.inbr):
1 1 0.00000
2 2 0.00000
3 3 0.00000
4 4 0.00000
5 5 0.25000
6 6 0.25000
7 7 0.37500
8 8 0.37500
9 9 0.37500
10 10 0.50000
In order to read this file, two additional lines are needed in the CLIM code:
INBRFILE AM.inbr
INBREEDING PEDIGREECODE=1 FINBR=3
The solutions will be slightly different. Fixed effect solutions in the Solfix file are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 2 99.538 herd_yea tr1
1 1 2 3 122.61 herd_yea tr1
Breeding values in the Solani file are
Animal N-Desc N-Obs Solution
1 2 0 -.13834E-13
2 2 0 -.13834E-13
3 2 0 0.92308
4 2 1 -.92308
5 3 0 -.12953E-13
6 3 1 1.8462
7 1 0 0.70457
8 1 1 0.24590
9 0 1 1.8188
10 0 1 0.11106
5.1.4 Example: Repeatability animal model
A repeatability animal model usually has two effects with the same incidence matrix but
different covariance structures. Hence, on the model line the same integer number
column in the data file is referred to by two different effects: permanent environment and
direct genetic. However, the same name cannot appear twice on the model line. Because
of this, component names can be used. Component names are user-defined (renamed)
names of one or more components on the model line. Basically, any classification effect
can be renamed. For example, if there is a column id but we want to rename this effect
to be animal, this is achieved by giving animal(id) on the model line.
Consider a repeatability animal model
y = herd × year + p + a + e
where herd × year is fixed herd times year interaction effect, p is random permanent
environment effect, a is additive genetic effect, and e is random residual. Both p and
a have the same design matrix relating observations to animals. However, they have
different covariance structures. The usual repeatability model assumptions are
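\[
\begin{aligned}
E(p) &= 0, & \mathrm{Var}(p) &= I\sigma_p^2 \\
E(a) &= 0, & \mathrm{Var}(a) &= A\sigma_a^2 \\
E(e) &= 0, & \mathrm{Var}(e) &= I\sigma_e^2
\end{aligned}
\]
where $\sigma_p^2$ is permanent environment variance, $\sigma_a^2$ is additive genetic
variance, and $\sigma_e^2$ is residual variance.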
The model has two random effects which refer to the same class name, named animal,
that is present in the data file. The following model statement is unacceptable (note
that only commands relevant to the model line are given):
INTEGER animal sire herd_year ones
REAL tr12
PEDIGREE animal am
MODEL
tr12 = herd_year animal animal
because animal appears twice as an effect.
An alternative would be to have two identical columns with animal id number in both of
them. Thus, our model line would be
INTEGER animal pe_animal sire herd_year ones
REAL tr12
PEDIGREE animal am # animal for animal genetic
RANDOM pe_animal # permanent environment
MODEL
tr12 = herd_year pe_animal animal
This model is correct. However, the data file is now larger, and one has to
remember to make two columns having the same content. Instead, a component name
can be used to name model effects.
The preferred way to give a repeatability model in CLIM is to refer to an effect by a
component name. In the following, the name ’G’ is given to the animal genetic effect.
INTEGER animal sire herd_year ones
REAL tr12
PEDIGREE G am # G for animal genetics
RANDOM animal # permanent environment
MODEL
tr12 = herd_year animal G(animal)
This is just one way to give the repeatability model. Two alternatives are (only the
changed commands are given):
1:
PEDIGREE G am # G for animal genetics
RANDOM PE # PE for permanent environment
MODEL
tr12 = herd_year PE(animal) G(animal)
2:
PEDIGREE animal am # animal for animal genetics
RANDOM PE # PE for permanent environment
MODEL
tr12 = herd_year PE(animal) animal
All these versions will produce the same instructions for mix99i. Note that component
names can be given to fixed effects as well. See the next chapter.
5.1.5 Example: Repeatability animal model in detail
We use the multiple trait model data already presented (Chapter 3.1.1) but modify it
for repeatability model. The model is
tr12 = herd × year + p + a + e
where herd×year is fixed herd-year effect, p is random permanent environment effect,
a is random additive genetic effect, and e is random residual.
Variance components are: permanent environment $\sigma_p^2 = 2.0$, genetic
$\sigma_a^2 = 3.0$, and residual $\sigma_e^2 = 5.0$. The parameter file RM.var is
Random effect1 Row2 Column3 Covariance1 Comment
1 1 1 2.0 permanent environment
2 1 1 3.0 additive genetic
3 1 1 5.0 residual variance
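As a worked check of what these variance components imply, heritability and repeatability
can be computed with the usual definitions (genetic variance, and genetic plus permanent
environment variance, each divided by the total variance). The variable names are
illustrative only.

```python
# Variance components from the parameter file above.
sigma2_p, sigma2_a, sigma2_e = 2.0, 3.0, 5.0

total = sigma2_p + sigma2_a + sigma2_e
heritability = sigma2_a / total                 # share of additive genetic variance
repeatability = (sigma2_a + sigma2_p) / total   # share of all animal-specific variance
```

With these values the total variance is 10.0, so heritability is 3.0/10.0 and repeatability
is 5.0/10.0.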
The multiple trait model data file is modified for the repeatability model (file
example_repeat.dat):
animal1 sire2 herd×year3 ones4 tr121
4 1 11 1 90
4 1 21 1 200
6 3 11 1 110
6 3 21 1 190
8 5 12 1 120
8 5 22 1 140
9 5 12 1 130
9 5 22 1 120
10 7 12 1 120
10 7 22 1 130
CLIM code for the repeatability model is
DATAFILE example_repeat.dat
INTEGER animal sire herd_year ones
REAL tr12
DATASORT PEDIGREECODE=animal
PEDFILE AM.ped
PEDIGREE G am # G for animal genetics
RANDOM PE # PE for permanent environment
PARFILE RM.var
MODEL
tr12 = herd_year PE(animal) G(animal)
The fixed effect solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 11 2 99.833 herd_yea tr12
1 1 12 3 123.01 herd_yea tr12
1 1 21 2 194.83 herd_yea tr12
1 1 22 3 129.68 herd_yea tr12
Permanent environment solutions (Solr01) are
Animal N-Obs Solution
4 2 -.88889
6 2 0.88889
8 2 1.1867
9 2 -.55846
10 2 -.62828
The breeding value estimates (Solani) are
Animal N-Desc N-Obs Solution
1 2 0 -.58526E-06
2 2 0 -.58526E-06
3 2 0 0.33333
4 2 2 -.33333
5 3 0 -.81159E-06
6 3 2 0.66667
7 1 0 0.97735E-01
8 1 2 0.98778
9 0 2 -.85516E-01
10 0 2 0.71556E-01
5.1.6 Example: Simple sire model
We consider a simple sire model using the data introduced for animal model (Chap-
ter 5.1.1). The sire model is
tr1 = herd × year + s + e
where
herd× year is fixed herd-year effect,
s is random sire effect, and
e is random residual.
In a sire model, records are associated with the sire of the animal having the record. The
data file is the same as that used for the animal model (example.dat in Chapter 3.1.1).
In this simple sire model, all sires are assumed to be unrelated. Thus, the pedigree
file (SM.ped) is
animal1 sire2 maternal grand sire3
1 0 0
3 0 0
5 0 0
7 0 0
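The variance components file needs to be changed from the animal model to the sire
model. The sire variance is only a quarter of the additive genetic variance (3.0/4 = 0.75),
and the remaining part of the total variance goes to the residual (3.0 + 7.0 - 0.75 = 9.25).
Thus, the variance components file (SM.var) is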
Random effect1 Row2 Column3 Variance1
1 1 1 0.75
2 1 1 9.25
CLIM code for the sire model is
DATAFILE example.dat
INTEGER animal sire herd_year ones
REAL tr1 tr2
PEDFILE SM.ped
PEDIGREE G sm
PARFILE SM.var
MODEL
tr1 = herd_year G(sire)
The fixed effect solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 2 100.00 herd_yea tr1
1 1 2 3 123.25 herd_yea tr1
The breeding value estimates (Solani) are
Bull N-Desc N-Obs Solution
1 0 1 -.75000
3 0 1 0.75000
5 0 2 0.24390
7 0 1 -.24390
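Because this example is small and the sires are unrelated, the solutions can be checked by
building the mixed model equations directly and solving them. The sketch below is not how
MiX99 works (the solver uses PCG iteration); it uses plain Gauss-Jordan elimination, takes
the records from the example data, and assumes an identity relationship matrix for the sires.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# (herd_year, sire, tr1) for the five records of the example data
records = [(1, 1, 90.0), (1, 3, 110.0), (2, 5, 120.0), (2, 5, 130.0), (2, 7, 120.0)]
herd_years, sires = [1, 2], [1, 3, 5, 7]
ratio = 9.25 / 0.75   # residual variance divided by sire variance

n_fixed = len(herd_years)
size = n_fixed + len(sires)
lhs = [[0.0] * size for _ in range(size)]
rhs = [0.0] * size
for hy, sire, y in records:
    i = herd_years.index(hy)
    j = n_fixed + sires.index(sire)
    for p in (i, j):
        rhs[p] += y
        for q in (i, j):
            lhs[p][q] += 1.0          # X'X, X'Z, Z'X and Z'Z contributions
for j in range(n_fixed, size):
    lhs[j][j] += ratio                # unrelated sires: add ratio times identity

solution = solve(lhs, rhs)
fixed = dict(zip(herd_years, solution[:n_fixed]))
sire_ebv = dict(zip(sires, solution[n_fixed:]))
```

The dictionaries `fixed` and `sire_ebv` should agree with the Solfix and Solani values
above to the printed precision.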
5.1.7 Example: Sire model
The previous sire model example can also be analyzed by a sire model where a sire-maternal
grandsire relationship matrix is used. The command file does not change apart from the
PEDFILE name, but the pedigree file is different.
The pedigree file (smgs.ped) is
animal1 sire2 maternal grand sire3
1 0 0
3 1 0
5 3 1
7 5 3
As before the solver will produce solution file Solfix having fixed effects
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 2 99.976 herd_yea tr1
1 1 2 3 123.19 herd_yea tr1
The Solani file, which has breeding values for the sires, is
Bull N-Desc N-Obs Solution
1 2 1 -.35736
3 2 1 0.40621
5 1 2 0.20334
7 0 1 0.24076E-01
5.1.8 Example: Weights in a model
A weight can be used to indicate that an observation is actually a mean of many records.
The weight is a real number column in the data file. It is marked as a weight in CLIM by
the WEIGHT option on the model line. Model options for a trait come after the "!" sign.
For example, when column ’weight’ has the weights, the option is ’!WEIGHT=weight’.
Consider the sire model example of the previous chapter again, but with weights. The data
file (example_w.dat) is now:
animal1 sire2 herd×year3 ones4 trait 11 trait 22 weight3
4 1 1 1 90 200 50
6 3 1 1 110 190 100
8 5 2 1 120 140 60
9 5 2 1 130 120 20
10 7 2 1 120 130 30
CLIM code for weighted sire model is
DATAFILE example_w.dat
INTEGER animal sire herd_year ones
REAL tr1 tr2 weight
PEDFILE smgs.ped
PEDIGREE G sm
PARFILE SM.var
MODEL
tr1 = herd_year G(sire) ! WEIGHT=weight
Fixed effect solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 2 100.68 herd_yea tr1
1 1 2 3 119.33 herd_yea tr1
Breeding value estimates (Solani) are
Bull N-Desc N-Obs Solution
1 2 1 -7.0567
3 2 1 7.5018
5 1 2 2.8028
7 0 1 1.6447
5.2 Random regression and nested effects
Random regression models have regression effects nested within a classification effect.
Typical random regression models are test-day models where a lactation curve is
fitted to test-day observations.
Class effect has an integer number column name or a component name (Chapter 5.1).
In the following, we extend the use of component name as a class effect. Earlier we
renamed a class effect such as the additive genetic effect G(animal) and G(sire).
The same notation will be used for nested regression effects.
5.2.1 Nested regression effects
A regression effect can be nested within a class. This is similar to the component
name concept introduced for repeatability model. However, we can go even further
and combine several effects to a component with the same nesting class. For example,
assume a fixed lactation curve is nested within season. The model could be
y = fixed_curve(1 linear quadratic cubic | season) animal
where season and animal are integer number column names in the data file, and
linear, quadratic, and cubic are real number column names in the data file.
The number 1 above means intercept term, i.e., season effect, in this example. The
’fixed_curve’ is a component name with the common nesting class season applied
to all its regression effects.
The intercept in the fixed_curve can be moved to be a separate class effect. Thus,
the above model line can be written also as
y = season fixed_curve(linear quadratic cubic | season) animal
Moving the season effect to be a separate effect in this way is fine for fixed effects.
However, for a random effect this cannot always be done, because component names are
used to distinguish correlated random regression effects. See the random regression
model examples below.
5.2.2 Covariable tables
Regression models in dairy cattle are usually so-called test-day models, where repeated
observations of a cow are modeled during lactation. The regression effects are
functions of days in milk, which can have only certain values, e.g., integer values from 1
(one) to 350. Because dairy cattle data sets can be very large, MiX99 allows reducing
data file size by using days in milk as an index to a regression coefficient table.
Assume the same fixed effects regression curve as given above. A covariable table
file can be given by command TABLEFILE. The data file must have an integer index
column such as days in milk (DIM) which is used to indicate regression function
coefficients in the table. In our example, the file will have five columns: DIM, intercept
(just ones), linear (equal to DIM), quadratic (DIM^2), and cubic (DIM^3). The model line
can now be written as
y = fixed_curve(t1 t2 t3 t4 | season) animal
where t1, t2, t3, t4 refer to columns two, three, four, five, respectively, in the coefficient
table file. In the covariable table file, column one has the table index. Data file has
an integer number column that has the index to pick the correct line in the coefficient
table file. The input column in the data file is indicated by command TABLEINDEX.
See example in Chapter 5.2.4.
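The five-column table described above can be generated with a few lines of code. The sketch
below is an illustration only: it assumes a whitespace-separated text file and raw powers of
DIM for days 1 to 350, and the function names are hypothetical.

```python
def covariable_table(max_dim=350):
    """Rows of (index, intercept, linear, quadratic, cubic) for DIM = 1..max_dim."""
    return [(dim, 1, dim, dim ** 2, dim ** 3) for dim in range(1, max_dim + 1)]

def format_rows(rows):
    """Format rows as whitespace-separated text lines."""
    return "\n".join(" ".join(str(v) for v in row) for row in rows)

table = covariable_table()
text = format_rows(table[:3])   # first three lines of the table file
```

With this layout t1 is the intercept column, t2 the linear, t3 the quadratic and t4 the
cubic column, because the index column is not counted.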
5.2.3 Example: Random regression model
We consider a single trait random regression animal model. The example is from
Schaeffer and Dekkers (1994).
Cows have repeated observations of milk yield. The model is
milk = DIM + log(305/DIM) + HTD + f(a, DIM) + e
where
milk is milk yield observation,
DIM is fixed days in milk linear regression effect,
log(305/DIM) is fixed logarithm of days in milk regression effect,
HTD is fixed herd test-day effect,
f(a,DIM) is random additive genetic regression function, and
e is random residual effect.
The random regression function f for animal i has the form
\[ f(a, DIM) = a_{i,1} + DIM \cdot a_{i,2} + \log(305/DIM) \cdot a_{i,3} \]
Thus, there are three random regression breeding values by animal.
Variance components are: residual variance $\sigma_e^2 = 100$, and random regression
effect covariance matrix
\[
G_0 = \begin{bmatrix}
44.791 & -0.133 & 0.351 \\
-0.133 & 0.073 & -0.010 \\
0.351 & -0.010 & 1.068
\end{bmatrix}
\]
The parameter file RRM.var is
Random effect1 Row2 Column3 Covariance1 Comment
2 1 1 44.791 additive genetic: intercept
2 2 1 -0.133 intercept, DIM linear
2 3 1 0.351 intercept, ln(305/DIM)
2 2 2 0.073 DIM, DIM
2 2 3 -0.010 DIM, ln(305/DIM)
2 3 3 1.068 ln(305/DIM), ln(305/DIM)
3 1 1 100.000 residual variance
The pedigree and data files for the random regression model example are given below.
The pedigree file (RRM.ped) is
animal1 sire2 dam3 block4
1 9 7 1
2 10 8 1
3 9 2 2
4 10 8 3
5 11 7 3
6 11 1 4
7 0 0 8
8 0 0 8
9 0 0 8
10 0 0 8
11 0 0 8
The data file (RRM.dat) is
HTD1 animal2 block3 DIM1 ln(305/DIM)2 milk3
1 1 1 73.0 1.4298500 26.0
2 1 1 123.0 0.9081270 23.0
3 1 1 178.0 0.5385280 21.0
1 2 1 34.0 2.1939499 29.0
2 2 1 84.0 1.2894900 18.0
3 2 1 139.0 0.7858380 8.0
4 2 1 184.0 0.5053760 1.0
1 3 2 8.0 3.6408701 37.0
2 3 2 58.0 1.6598700 25.0
3 3 2 113.0 0.9929240 19.0
4 3 2 158.0 0.6577170 15.0
5 3 2 218.0 0.3358170 11.0
6 3 2 268.0 0.1293250 7.0
2 4 3 5.0 4.1108699 44.0
3 4 3 60.0 1.6259700 29.0
4 4 3 105.0 1.0663500 22.0
5 4 3 165.0 0.6143660 14.0
6 4 3 215.0 0.3496740 8.0
4 5 3 14.0 3.0812500 35.0
5 5 3 74.0 1.4162500 23.0
6 5 3 124.0 0.9000300 17.0
5 6 4 31.0 2.2863200 28.0
6 6 4 81.0 1.3258600 22.0
CLIM code for the random regression model is
DATAFILE RRM.dat
INTEGER HTD animal blk-var # Integer column names
REAL DIM ln305DIM & # Covariables
milk_yd # Milk yield
PEDFILE RRM.ped # Pedigree file
PEDIGREE G am # animal model
PARFILE RRM.var
MODEL
milk_yd = Lact_curve(DIM ln305DIM) HTD G(1 DIM ln305DIM| animal)
Note that the component name Lact_curve is informative for the user only, not for
MiX99, because it is used for fixed regression effects and there is no nesting. The name
Lact_curve reminds the user that these regression effects model the lactation curve.
The fixed effect solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 3 19.950 HTD milk_yd
1 1 2 4 20.373 HTD milk_yd
1 1 3 4 20.610 HTD milk_yd
1 1 4 4 19.728 HTD milk_yd
1 1 5 4 18.605 HTD milk_yd
1 1 6 4 17.852 HTD milk_yd
The Lact_curve regression effect solutions (Solreg) are
Trt Reg-No Solution Trait Covariable
1 1 -.49839E-01 milk_yd DIM
1 2 5.2910 milk_yd ln305DIM
The random regression effect estimates (Solani) for each animal are
Animal N-Desc N-Obs Intercept DIM ln305DIM
1 1 3 -.44256 0.36869E-01 -.36961E-01
2 1 4 0.26977 -.66036E-01 0.32266E-01
3 0 6 -.72875 0.68317E-02 -.47899E-01
4 0 5 1.1019 -.53652E-02 0.76755E-01
5 0 3 -.16240 0.69360E-02 -.14924E-01
6 0 2 -.48256 0.16641E-01 -.37788E-01
7 2 0 -.98533E-01 0.13337E-01 -.10336E-01
8 2 0 0.45724 -.23800E-01 0.36344E-01
9 2 0 -.62847 0.35030E-01 -.47914E-01
10 2 0 0.45724 -.23800E-01 0.36344E-01
11 2 0 -.18720 -.76675E-03 -.14550E-01
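The three solutions per animal define a breeding value curve over days in milk through the
function f given earlier. A minimal sketch of evaluating it, using the Solani solutions of
animal 4 and a hypothetical helper function (the natural logarithm is used, as in the data
file):

```python
import math

def breeding_value(coefficients, dim):
    """Evaluate a1 + DIM * a2 + log(305 / DIM) * a3 for one animal."""
    a1, a2, a3 = coefficients
    return a1 + dim * a2 + math.log(305.0 / dim) * a3

# Solani solutions of animal 4: intercept, DIM, ln305DIM
animal_4 = (1.1019, -0.53652e-02, 0.76755e-01)
curve = {dim: breeding_value(animal_4, dim) for dim in (5, 105, 305)}
```

At day 305 the logarithmic term vanishes, so the value reduces to the intercept plus 305
times the DIM coefficient.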
5.2.4 Example: Covariable table and random regression model
MiX99 allows use of covariable table files. This means using an index in the data file
that indicates a row in a covariable table file having regression coefficients. The table
numbers in model lines refer to column numbers in this covariable table file instead of
the data file. Note that the first column (index) is not counted as a column in the co-
variable table file. Use of covariable table file can reduce size of the data file. Typically
covariable table files are small because number of possible indices is small. For ex-
ample, days in milk in a data file can be from 1 to 350 days, and, so, only 350 different
regression coefficients are needed for one regression effect in the covariable table file.
CLIM commands needed are TABLEFILE and TABLEINDEX. TABLEFILE indicates
name of the covariable table file. TABLEINDEX has the integer number column name
having the index in the data file. Only one table file and index is allowed. The indices
must be positive integers and be consecutively numbered. For example, the table file
can have indices 1, 2, 3, 4, but having only 1, 2, 4 is unacceptable.
The covariable table file regression coefficients need to be referenced in the model
differently from the other regression coefficients. They are referenced by the letter t and
a column number: tn, where n is the column number in the covariable table file, not
counting the index column. For example, t3 means the third covariable column in the
covariable table file.
We consider again the single trait random regression animal model example by Scha-
effer and Dekkers (1994) (Chapter 5.2.3). We use the covariable table approach to
reduce number of columns in the data file. The idea is that the data file has an index
to indicate which regression coefficients are used. A natural indicator in dairy cattle
test-day models is days in milk. For the purpose of this example, an artificial index
was used instead. This was due to MiX99 requiring that the index is consecutively
numbered. Thus, the covariable index table file has to have index numbers from some
number, say 1, consecutively to a high number, say 305. This would give 305 lines.
Use of covariable table file leads to changes in the CLIM instruction file, and data file.
In addition, there has to be a covariable table file.
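An artificial, consecutively numbered index of this kind can be derived from the days in
milk found in the data. The sketch below shows one way to do it; the function name is
hypothetical, and the DIM values are those of the random regression example data.

```python
import math

def build_index(dims):
    """Map each distinct DIM to a consecutive index starting from 1.

    Returns the mapping and the covariable table rows
    (index, DIM, log(305 / DIM)).
    """
    unique = sorted(set(dims))
    index_of = {dim: i for i, dim in enumerate(unique, start=1)}
    table = [(index_of[d], d, math.log(305.0 / d)) for d in unique]
    return index_of, table

# DIM of the 23 records in the example data file
dims = [73, 123, 178, 34, 84, 139, 184, 8, 58, 113, 158, 218, 268,
        5, 60, 105, 165, 215, 14, 74, 124, 31, 81]
index_of, table = build_index(dims)
```

Each record then carries `index_of[DIM]` in its index column instead of the two
covariables, and `table` provides the lines of the covariable table file.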
The data file (RRM_table.dat) is
HTD1 animal2 block3 index4 milk1
1 1 1 8 26.0
2 1 1 14 23.0
3 1 1 19 21.0
1 2 1 5 29.0
2 2 1 11 18.0
3 2 1 16 8.0
4 2 1 20 1.0
1 3 2 2 37.0
2 3 2 6 25.0
3 3 2 13 19.0
4 3 2 17 15.0
5 3 2 22 11.0
6 3 2 23 7.0
2 4 3 1 44.0
3 4 3 7 29.0
4 4 3 12 22.0
5 4 3 18 14.0
6 4 3 21 8.0
4 5 3 3 35.0
5 5 3 9 23.0
6 5 3 15 17.0
5 6 4 4 28.0
6 6 4 10 22.0
The covariable table file (RRM_table.cov) is
index1 DIM1 log(305/DIM)2
1 5 4.1108699
2 8 3.6408701
3 14 3.0812500
4 31 2.2863200
5 34 2.1939499
6 58 1.6598700
7 60 1.6259700
8 73 1.4298500
9 74 1.4162500
10 81 1.3258600
11 84 1.2894900
12 105 1.0663500
13 113 0.9929240
14 123 0.9081270
15 124 0.9000300
16 139 0.7858380
17 158 0.6577170
18 165 0.6143660
19 178 0.5385280
20 184 0.5053760
21 215 0.3496740
22 218 0.3358170
23 268 0.1293250
CLIM code for the covariable table model is
DATAFILE RRM_table.dat
INTEGER HTD animal blk-var index
REAL milk_yd
TABLEFILE RRM_table.cov
TABLEINDEX index
PEDFILE RRM.ped
PEDIGREE G am
PARFILE RRM.var
MODEL SCALE
milk_yd = Lact_curve(t1 t2) HTD G(1 t1 t2| animal)
The solution files will be the same. However, there is a small difference in the Solreg
file. The file is now
Trt Reg-No Solution Trait Covariable
1 1 -.49839E-01 milk_yd T1
1 2 5.2910 milk_yd T2
Thus, instead of the covariable names DIM and ln305DIM, there are the table covari-
able column names T1 and T2.
5.2.5 Example: Heterogeneous residual variance in test day model
Consider the random regression model example by Schaeffer and Dekkers (1994)
(Chapter 5.2.4). However, assume now that residual variance is different according to
the block. There are four blocks. Let residual variance be 100, 110, 105, and 90 in
blocks 1, 2, 3, and 4, respectively.
Important commands in CLIM for use of heterogeneous residual variance are RESIDFILE
and RESIDUAL. Command RESIDFILE has the name of the residual variance file.
Command RESIDUAL indicates the integer number column having the residual vari-
ance number in the data file.
Our example data stays the same. However, heterogeneous residual variance file
(RRM_res.var) is needed:
1 1 1 100.0
2 1 1 110.0
3 1 1 105.0
4 1 1 90.0
The first column is the residual variance block number. The second and third columns
refer to the matrix position, here a scalar. Thus, in our example, the matrix position is
always (1,1). The last column has the variances.
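The layout of the residual variance file can be illustrated with a short Python sketch.
It is only an illustration under the single-trait assumption of this example; the
function name resid_file_lines is hypothetical and not a MiX99 tool.

```python
def resid_file_lines(block_variances):
    """Format one line per residual variance block: block, row, column, variance."""
    lines = []
    for block in sorted(block_variances):
        # Single trait: the residual "matrix" is a scalar at position (1,1).
        lines.append(f"{block} 1 1 {block_variances[block]:.1f}")
    return lines


# Residual variances of blocks 1 to 4 as given in the example.
lines = resid_file_lines({1: 100.0, 2: 110.0, 3: 105.0, 4: 90.0})
print("\n".join(lines))
```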
CLIM code for analysis is
DATAFILE RRM_table.dat
INTEGER HTD animal blk-var index
REAL milk_yd
TABLEFILE RRM_table.cov
TABLEINDEX index
PEDFILE RRM.ped
PEDIGREE G am
PARFILE RRM.var # regular variance file
RESIDFILE RRM_res.var # the residual variances
RESIDUAL blk-var # index for residual variance
MODEL SCALE
milk_yd = Lact_curve(t1 t2) HTD G(1 t1 t2| animal)
Fixed effect solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 3 19.954 HTD milk_yd
1 1 2 4 20.369 HTD milk_yd
1 1 3 4 20.589 HTD milk_yd
1 1 4 4 19.689 HTD milk_yd
1 1 5 4 18.484 HTD milk_yd
1 1 6 4 17.755 HTD milk_yd
Fixed regression effect solutions (Solreg) are
Trt Reg-No Solution Trait Covariable
1 1 -.49291E-01 milk_yd T1
1 2 5.2951 milk_yd T2
Random regression effect solutions (Solani) are
Bull N-Desc N-Obs Intercept DIM ln305DIM
1 1 3 -.43705 0.36362E-01 -.36590E-01
2 1 4 0.26551 -.66395E-01 0.32095E-01
3 0 6 -.69442 0.64368E-02 -.45801E-01
4 0 5 1.0687 -.52282E-02 0.74740E-01
5 0 3 -.15415 0.72484E-02 -.14431E-01
6 0 2 -.48062 0.17272E-01 -.37930E-01
7 2 0 -.97642E-01 0.13161E-01 -.10194E-01
8 2 0 0.44474 -.23875E-01 0.35618E-01
9 2 0 -.60770 0.34699E-01 -.46705E-01
10 2 0 0.44474 -.23875E-01 0.35618E-01
11 2 0 -.18370 -.11773E-03 -.14523E-01
5.3 Maternal effect models
The random regression effect models allow quite flexible model description. However,
random regression effects have the same covariance structure, e.g., numerator rela-
tionship matrix. Random maternal and paternal effects with correlated animal effect
have a different structure. Nesting within a component is now by a different class
variable. Thus, we have multiple correlated factors within the genetic effect.
A random effect may have multiple class effects. For example, the genetic component
has both a maternal and a direct genetic effect. A component name is again needed.
A simple model with a fixed herd effect and random maternal and animal effects is
PEDIGREE G am
MODEL
y = herd G(dam animal)
Although G(dam animal) looks similar to the random regression models, there is a
notable difference. Here dam and animal are different class effects, not regression
coefficients within the same class. This model specification requires a 2 by 2 genetic
covariance matrix for the maternal and animal effects.
Note that the maternal genetic model is different from the model
PEDIGREE G am
RANDOM dam
MODEL
y = herd dam G(animal)
Here dam is a common dam environment effect for all of its progeny. There is no
relationship matrix involved in this dam effect.
5.3.1 Example: Animal model for a maternal trait
Consider model
$$tr_1 = herd \times year + p_m + a_m + a_g + e$$
where $herd \times year$ is fixed herd-year effect, $p_m$ is random common dam permanent
environment effect, $a_m$ is random additive maternal genetic effect, $a_g$ is random
additive individual genetic effect, and $e$ is random residual.
The variance components are: maternal permanent environment variance $\sigma^2_p = 1$,
residual variance $\sigma^2_e = 7$, and genetic covariance matrix
$$G_0 = \begin{bmatrix} 2.0 & 1.0 \\ 1.0 & 3.0 \end{bmatrix}$$
The parameter file mat.var is
Random effect   Row   Column   Covariance   Comment
1 1 1 1.0 maternal permanent env.
2 1 1 2.0 maternal genetic
2 1 2 1.0 cov(maternal, animal)
2 2 2 3.0 animal genetic
3 1 1 7.0 residual variance
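The rows of such a parameter file follow directly from the covariance matrices. As a
hedged illustration (the helper parfile_rows is hypothetical, and listing the upper
triangle with row <= column simply mirrors the file shown above), the rows can be
produced in Python as:

```python
def parfile_rows(effect_no, matrix):
    """List (effect, row, column, value) for the upper triangle, row <= column."""
    rows = []
    n = len(matrix)
    for i in range(n):
        for j in range(i, n):
            rows.append((effect_no, i + 1, j + 1, matrix[i][j]))
    return rows


pe = [[1.0]]                   # maternal permanent environment variance
g0 = [[2.0, 1.0], [1.0, 3.0]]  # maternal and animal genetic (co)variances
res = [[7.0]]                  # residual variance

rows = parfile_rows(1, pe) + parfile_rows(2, g0) + parfile_rows(3, res)
for row in rows:
    print(*row)
```

The random effect numbers 1, 2, and 3 are taken from the file above; the residual is
the last effect.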
We use the previously introduced data (Chapter 3). The pedigree file (Chapter 3.2.1)
can be kept the same. For the purposes of this example, we modify the data (Chap-
ter 3.1.1) to have the dam column instead of the sire column (example_mat.dat):
animal   dam   herd×year   ones   trait 1   trait 2
4 2 1 1 90 200
6 4 1 1 110 190
8 6 2 1 120 140
9 6 2 1 130 120
10 8 2 1 120 130
CLIM code is
DATAFILE example_mat.dat
INTEGER animal dam herd_year ones
REAL tr1 tr2
PEDFILE AM.ped
PEDIGREE G am
RANDOM PE
PARFILE mat.var
MODEL SCALE
tr1 = herd_year PE(dam) G(dam animal)
The herd-year solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 2 99.338 herd_yea tr1
1 1 2 3 121.71 herd_yea tr1
The maternal permanent environment solutions (Solr01) are
2 1 -1.0095
4 1 1.0095
6 2 0.24829
8 1 -.24831
The maternal and animal genetic effect solutions (Solani) are
Animal id N-Desc N-Obs Maternal Animal
1 2 0 1.0095 0.50476
2 2 1 -1.0095 -.50475
3 2 0 0.25237 0.75717
4 2 1 0.75714 -.25240
5 3 0 0.38058 0.19030
6 3 2 1.1337 1.8287
7 1 0 0.69506 0.82321
8 1 1 0.22384 0.30441E-01
9 0 0 1.1042 2.0507
10 0 0 0.33530 0.54334E-01
Note that the content of the genetic effect solutions in the Solani file depends on the
order given in the model line. In this example, we gave G(dam animal) with the maternal
effect first and the direct animal effect second. Changing the order of the effects
changes the order in the solution file as well.
6 Multiple trait models
Multiple traits are defined by separate model lines. Traits are numbered in MiX99: the
trait number equals the model line number. Thus, the trait on the first model line is
trait number 1, the second is number 2, etc. This has to be kept in mind when constructing
the covariance matrix in PARFILE. Numbering of variance components can be difficult,
in particular when some (random regression or maternal) effects of one trait are
missing in another trait. The missing components are signaled by a dash (−) sign in
the model lines. In the following, we will consider this in animal models, but sire models
work similarly.
6.1 All traits have the same effects
A simple multiple trait model has several traits that are equal in the sense of having
the same effects. For example, both traits have a herd-year effect and an animal genetic
effect. These effects have different solutions by trait. However, the important fact is
that both traits refer to the same classification column in the data file.
6.1.1 Example: Simple multiple trait model
Consider the multiple trait model data presented in Chapter 3.1.1. First consider a
simple model where both traits have the same effects:
$$tr_1 = herd \times year_1 + a_1 + e_1$$
$$tr_2 = herd \times year_2 + a_2 + e_2$$
where the subscripts 1 and 2 refer to traits 1 and 2. The variance components file
(name mt.var) was already presented in Chapter 3.3.1. CLIM instruction file is:
DATAFILE example.dat
INTEGER animal sire herd_year ones
REAL tr1 tr2
PEDFILE AM.ped
PEDIGREE animal am
DATASORT PEDIGREECODE=animal
PARFILE mt.var
MODEL SCALE
tr1 = herd_year animal
tr2 = herd_year animal
The fixed effect solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 2 99.760 herd_yea tr1
1 2 1 2 194.87 herd_yea tr2
1 1 2 3 122.93 herd_yea tr1
1 2 2 3 129.73 herd_yea tr2
The breeding value estimates (Solani) are
Bull N-Desc N-Obs tr1 tr2
1 2 0 0.44163E-04 0.10214E-04
2 2 0 0.44163E-04 0.10214E-04
3 2 0 0.47974 0.26924
4 2 1 -.47968 -.26923
5 3 0 -.31037E-04 -.40111E-04
6 3 1 0.95937 0.53844
7 1 0 0.23052 0.78139E-01
8 1 1 0.77389 0.86997
9 0 1 0.43458 -.14051
10 0 1 0.40098E-02 0.91964E-01
6.2 Traits have different effects
Multiple trait models having different effects can be easily handled by MiX99. However,
there are some details that need to be remembered when using CLIM. The default
CLIM model line works like the MiX99 directive file: an effect missing in a trait has
to be indicated by a dash (’-’) sign. Consequently, models are column restricted, i.e.,
each effect in the model has a column which is present (effect name given) or missing
(dash sign given) for each trait. Thus, it is important to give effects in a specific order.
The beta testing version of CLIM lifts this restriction, and model effects can be given in
any order without missing dash sign indicator. However, this may lead to models that
are interpreted incorrectly. Thus, it is very important to check that the model generated
by CLIM is correct in the MiX99_DIR.DIR file.
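The column restriction can be checked mechanically before running CLIM. The sketch
below is one reading of the rule, not CLIM's own parser: it assumes that effects are
separated by white space, that a component such as G(dam animal) counts as one
column, and that a column is valid when all traits either name the same effect or give
a dash. The function names are hypothetical.

```python
import re


def split_effects(rhs):
    """Split the right-hand side of a model line into effect tokens.

    A token is a component such as G(dam animal), a bare name, or a dash.
    """
    return re.findall(r"[\w@]+\([^)]*\)(?:@\w+)?|[\w@]+|-", rhs)


def column_conflicts(model_lines):
    """Return (position, names) where traits name different effects in one column."""
    columns = [split_effects(line.split("=", 1)[1]) for line in model_lines]
    conflicts = []
    for pos in range(max(len(c) for c in columns)):
        names = {c[pos].split("(")[0] for c in columns
                 if pos < len(c) and c[pos] != "-"}
        if len(names) > 1:
            conflicts.append((pos + 1, sorted(names)))
    return conflicts


# Dashes keep every effect in its own column: no conflicts are reported.
ok = column_conflicts(["tr1 = - herd_year animal", "tr2 = ones - animal"])
# Without dashes, herd_year and ones would share the first column.
bad = column_conflicts(["tr1 = herd_year animal", "tr2 = ones animal"])
print(ok, bad)
```

A non-empty result only signals that the default (non-beta) CLIM would need dashes;
the generated MiX99_DIR.DIR file remains the authoritative check.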
6.2.1 Example: Multiple trait model with different effects by trait
Consider the multiple trait model data as in Chapter 6.1.1 but use model
$$tr_1 = herd \times year + a_1 + e_1$$
$$tr_2 = \mu + a_2 + e_2$$
The variance components file is the same as before. CLIM instruction file is the same
except for the model lines:
MODEL SCALE
tr1 = - herd_year animal
tr2 = ones - animal
The fixed effect solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 2 1 5 160.28 ones tr2
2 1 1 2 88.759 herd_yea tr1
2 1 2 3 137.73 herd_yea tr1
Breeding value estimates (Solani) are
Bull N-Desc N-Obs tr1 tr2
1 2 0 -.43878E-05 0.21829E-05
2 2 0 -.43878E-05 0.21829E-05
3 2 0 -2.2842 -2.5112
4 2 1 2.2842 2.5112
5 3 0 -5.8968 -5.8969
6 3 1 1.3284 0.87453
7 1 0 -4.2748 -4.4635
8 1 1 -7.6804 -7.6190
9 0 1 -6.6912 -7.2448
10 0 1 -9.9586 -9.9459
6.2.2 Example: Different effects by trait using CLIM beta features
The order of effects is unimportant in the CLIM beta version. The model lines of the
example in Chapter 6.2.1 can be given differently using the CLIM beta version (e.g.,
giving mix99i -b model.clm). A natural way of giving them would be
MODEL SCALE
tr1 = herd_year animal
tr2 = ones animal
The additional space between the effect names is not important for CLIM; it only
makes the model easier to read. A perfectly acceptable model would be
MODEL SCALE
tr1 = animal herd_year
tr2 = ones animal
However, this is more difficult to read.
The breeding value estimates in the Solani solution file would be exactly the same
as before. However, solutions in the Solfix file are printed in different order:
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 2 88.759 herd_yea tr1
1 1 2 3 137.73 herd_yea tr1
2 2 1 5 160.28 ones tr2
The reason for different ordering is the -b option. The -b option leads to ordering by
column number in the data file. Column herd_year is before ones in the data file.
This can also be seen in the MiX99_DIR.DIR file.
6.3 Multiple trait random regression model
Multiple trait random regression models are a natural extension of the single trait ones.
The model lines are as easy to give. However, it is important to get the numbering of
random regression effects correct. Numbering is column-wise from first trait to second
etc.
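Consider a quadratic random regression function of an animal for two traits:
$$\begin{bmatrix} f(a_1, x_1) \\ f(a_2, x_2) \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}$$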
where subscripts 1 and 2 refer to trait number, x1 and x2 are regression coefficients,
and a has random regression effects to be estimated. The functions can be written
also
$$f(a, x_1) = a_{1,1} + x_1 \cdot a_{1,2} + x_1^2 \cdot a_{1,3}$$
$$f(a, x_2) = a_{2,1} + x_2 \cdot a_{2,2} + x_2^2 \cdot a_{2,3}$$
where the first subscript in a is trait number, and the second is random regression
effect number. The random regression effects are numbered
$$\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \end{bmatrix} = \begin{bmatrix} (1) & (3) & (5) \\ (2) & (4) & (6) \end{bmatrix}$$
where the numbers in parenthesis mean effect number.
The random regression effect numbers are used to identify variance components in
the PARFILE. They also determine order of estimates in the solution files such as
Solani.
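The column-wise numbering can be written as a one-line rule. The following Python
sketch is only an illustration of the numbering shown above; the function name
effect_number is hypothetical.

```python
def effect_number(trait, coefficient, n_traits):
    """Column-wise number of random regression effect a[trait, coefficient].

    Numbering runs over traits first, then over regression coefficients.
    """
    return (coefficient - 1) * n_traits + trait


# Two traits, three coefficients, as in the quadratic example.
numbering = [[effect_number(t, k, 2) for k in (1, 2, 3)] for t in (1, 2)]
print(numbering)  # [[1, 3, 5], [2, 4, 6]]
```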
6.3.1 Example: Multiple trait random regression model
Consider the single trait random regression test-day model data presented in Chap-
ter 5.2.3. We expand the data by adding column of observations for a second trait.
The observation column of the single trait is copied to this new column.
The two traits will have the same random regression function. Assume that within trait
the genetic covariance matrix stays the same, and between the traits the correlation
is 95 %. Remember that numbering of regression effects is column-wise. Thus, the
genetic covariance matrix is
$$G = \begin{bmatrix}
44.791 & 42.55145 & -0.133 & -0.12635 & 0.351 & 0.33345 \\
42.55145 & 44.791 & -0.12635 & -0.133 & 0.33345 & 0.351 \\
-0.133 & -0.12635 & 0.073 & 0.06935 & -0.010 & -0.0095 \\
-0.12635 & -0.133 & 0.06935 & 0.073 & -0.0095 & -0.010 \\
0.351 & 0.33345 & -0.010 & -0.0095 & 1.068 & 1.0146 \\
0.33345 & 0.351 & -0.0095 & -0.010 & 1.0146 & 1.068
\end{bmatrix} \tag{1}$$
and let the residual covariance matrix be
$$R = \begin{bmatrix} 100.0 & 50.0 \\ 50.0 & 100.0 \end{bmatrix} \tag{2}$$
Then, the variance parameter file (RRM_mt.var) is
1 1 1 44.791
1 2 2 44.791
1 1 2 42.55145
1 3 1 -0.133
1 4 2 -0.133
1 4 1 -0.12635
1 3 2 -0.12635
1 3 3 0.073
1 4 4 0.073
1 3 4 0.06935
1 3 5 -0.010
1 4 6 -0.010
1 3 6 -0.00950
1 4 5 -0.00950
1 5 1 0.351
1 6 2 0.351
1 5 2 0.33345
1 6 1 0.33345
1 5 5 1.068
1 6 6 1.068
1 5 6 1.01460
2 1 1 100.0
2 2 1 50.0
2 2 2 100.0
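The genetic covariance matrix (1) has the structure of a Kronecker product: each element
of the single trait 3 by 3 matrix is expanded into a 2 by 2 block between the traits.
The Python sketch below rebuilds the matrix and the 21 parameter rows of the genetic
effect. It is an illustration only; the single trait matrix g0 is read from the
within-trait elements of (1), and the helper kron is written here rather than taken
from any library.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    nb = len(b)
    n = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(n)]
            for i in range(n)]


# Within-trait covariances of the three regression coefficients.
g0 = [[44.791, -0.133, 0.351],
      [-0.133, 0.073, -0.010],
      [0.351, -0.010, 1.068]]
# Between-trait structure: correlation 0.95 between the two traits.
c = [[1.0, 0.95], [0.95, 1.0]]

# Coefficients vary slowest and traits fastest, i.e. column-wise numbering.
g = kron(g0, c)

# Upper triangle rows (effect, row, column, value) for the genetic effect.
rows = [(1, i + 1, j + 1, round(g[i][j], 5))
        for i in range(6) for j in range(i, 6)]
for row in rows:
    print(*row)
```

The rows appear in a different order than in RRM_mt.var above, but they describe the
same 21 distinct elements.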
CLIM code is
DATAFILE RRM_mt.dat
INTEGER HTD animal blk-var
REAL DIM ln305DIM milk_1 milk_2
PEDFILE ../data/RRM.ped
PEDIGREE G am
PARFILE RRM_mt.var
MODEL SCALE
milk_1 = Lact_curve(DIM ln305DIM) HTD G(1 DIM ln305DIM| animal)
milk_2 = Lact_curve(DIM ln305DIM) HTD G(1 DIM ln305DIM| animal)
Fixed effects solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 3 20.151 HTD milk_1
1 2 1 3 20.151 HTD milk_2
1 1 2 4 20.488 HTD milk_1
1 2 2 4 20.488 HTD milk_2
1 1 3 4 20.718 HTD milk_1
1 2 3 4 20.718 HTD milk_2
1 1 4 4 19.891 HTD milk_1
1 2 4 4 19.891 HTD milk_2
1 1 5 4 18.753 HTD milk_1
1 2 5 4 18.753 HTD milk_2
1 1 6 4 17.992 HTD milk_1
1 2 6 4 17.992 HTD milk_2
Fixed regression effects (Solreg) are
Trt Reg-No Solution Trait Covariable
1 1 -.50423E-01 milk_1 DIM
1 2 5.2367 milk_1 ln305DIM
2 1 -.50423E-01 milk_2 DIM
2 2 5.2367 milk_2 ln305DIM
Random regression breeding values (Solani) are
id (1) (2) DIM(1) DIM(2) ln305DIM(1)
1 1 3 -.54362 -.54362 0.37788E-01 0.37788E-01 -.43693E-01 ...
2 1 4 0.34392 0.34392 -.67204E-01 -.67204E-01 0.37731E-01 ...
3 0 6 -.85129 -.85129 0.75673E-02 0.75673E-02 -.54810E-01 ...
4 0 5 1.2934 1.2934 -.65255E-02 -.65255E-02 0.89009E-01 ...
5 0 3 -.18533 -.18533 0.70232E-02 0.70233E-02 -.17046E-01 ...
6 0 2 -.57857 -.57857 0.17720E-01 0.17720E-01 -.44841E-01 ...
7 2 0 -.12228 -.12228 0.13506E-01 0.13506E-01 -.12222E-01 ...
8 2 0 0.54578 0.54579 -.24593E-01 -.24593E-01 0.42246E-01 ...
9 2 0 -.75284 -.75284 0.36109E-01 0.36109E-01 -.55656E-01 ...
10 2 0 0.54578 0.54579 -.24593E-01 -.24593E-01 0.42246E-01 ...
11 2 0 -.21549 -.21549 -.46539E-03 -.46540E-03 -.17023E-01 ...
The last column has been omitted due to page width restriction. It is equal to the
second last column. Numbers in parentheses refer to the trait number of the effect.
6.4 Combining of trait estimates
Combining of trait estimates means making effects of different traits the same. By
default, it is assumed that effects in different traits are different and get separate
estimated values. Combining of trait estimates, or in short combining of traits, allows
estimating the same solutions for effects in different traits. In practice, combining of
traits is used in reduced rank random regression models. However, the concept is
illustrated here by a multiple trait model that is equivalent to a repeatability model.
Combining of traits is indicated by the ’@’ sign in the model lines. After the ’@’ sign
a combining group name is given. The name can be any allowed name that has not
already been used. Combining of traits can be instructed for any effects that have
a component name. Thus, it is not possible to use plain integer number column names
when combining traits. The effect names have to be given a component name. For example,
G(animal)@fst.
6.4.1 Example: Repeatability model by multiple trait model
A repeatability model is a multiple trait model where the genetic correlation between
traits is one, residual variances are equal, and residual covariances are equal. Residual
correlations are equal to $\sigma^2_p / (\sigma^2_p + \sigma^2_e)$, where $\sigma^2_p$ is
permanent environment variance and $\sigma^2_e$ is residual variance. We consider the
repeatability model example presented in Chapter 5.1.5.
An equivalent two trait animal model is
$$tr_1 = herd \times year + a + e_1$$
$$tr_2 = herd \times year + a + e_2$$
where the fixed effect $herd \times year$ and random animal genetic effect $a$ are common to
both traits. Genetic variance is as before $\sigma^2_a = 2$. Residual covariance matrix is
$$R = \begin{bmatrix} 7.0 & 2.0 \\ 2.0 & 7.0 \end{bmatrix} \tag{3}$$
Note that the residual variances equal the sum of the permanent environment and residual
variances in the repeatability model, and the covariance equals the permanent environment
variance.
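The relation between the repeatability model parameters and the two trait residual
matrix can be checked with a few lines of Python. This is a sketch under an assumption:
the values 2 for permanent environment variance and 5 for residual variance are inferred
from matrix (3), because the original parameters of Chapter 5.1.5 are not repeated
here. The function name is hypothetical.

```python
def repeat_to_residual(var_pe, var_e, n_traits=2):
    """Residual covariance matrix of the equivalent multiple trait model.

    Diagonal: permanent environment plus residual variance.
    Off-diagonal: permanent environment variance.
    """
    total = var_pe + var_e
    return [[total if i == j else var_pe for j in range(n_traits)]
            for i in range(n_traits)]


r = repeat_to_residual(2.0, 5.0)
residual_correlation = r[0][1] / r[0][0]  # equals var_pe / (var_pe + var_e)
print(r, residual_correlation)
```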
The variance components file (mt_repeat.var) is
Random effect   Row   Column   Covariance   Comment
1 1 1 3.0 animal genetic
2 1 1 7.0 residual variance
2 1 2 2.0 permanent environment
2 2 2 7.0 residual variance
The repeatability data file changed to multiple trait format (mt_repeat.dat):
animal sire herd×year1 herd×year2 ones trait 1 trait 2
4 1 11 21 1 90 200
6 3 11 21 1 110 190
8 5 12 22 1 120 140
9 5 12 22 1 130 120
10 7 12 22 1 120 130
CLIM code
DATAFILE example_mt_repeat.dat
INTEGER animal sire hy_1 hy_2 ones
REAL tr1 tr2
PEDFILE AM.ped
PEDIGREE G am
PARFILE mt_repeat.var
MODEL SCALE
tr1 = hy_1 - G(animal)@1
tr2 = - hy_2 G(animal)@1
Note that the animal genetic effect animal needs a name G for combining. Also, there
has to be a dash (-) to indicate the use of separate integer columns for the herd-year
effects.
Estimated herd-year solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 11 2 99.833 hy_1 tr1
1 1 12 3 123.01 hy_1 tr1
2 2 21 2 194.83 hy_2 tr2
2 2 22 3 129.68 hy_2 tr2
Estimated breeding values (Solani) are
Bull N-Desc N-Obs tr12
1 2 0 -.34993E-13
2 2 0 -.34993E-13
3 2 0 0.33333
4 2 1 -.33333
5 3 0 -.76001E-13
6 3 1 0.66667
7 1 0 0.97731E-01
8 1 1 0.98778
9 0 1 -.85515E-01
10 0 1 0.71553E-01
The solutions are the same as by the repeatability model in Chapter 5.1.5. However,
no solutions for permanent environment are calculated because no permanent envi-
ronment effect was in this two trait model.
6.4.2 Example: Reduced rank random regression model
Rank reduction can be used to make covariance matrices of highly correlated random
regression coefficients more independent. Consequently, the size of the covariance matrix
is reduced. Convergence of the iterative solver becomes faster for at least two reasons:
equations become less correlated, and there are fewer unknowns to solve.
In a reduced rank model, coefficients from two or more traits multiply the same solu-
tions. We consider the two trait random regression example in Chapter 6.3.1.
6.4.3 Example: Finnish test-day model
This example is not a complete presentation of the old Finnish test-day model. No
data are given, and only the first lactation model is shown. It illustrates the potential
of MiX99 for solving the complex test-day models currently used to estimate dairy cattle
breeding values.
This was the Finnish test-day model for first lactation milk, protein, and fat yield. The
full model had second, and third with later, lactations as additional traits. In this subset
model, both permanent environment and additive genetic effects are modeled by a
curve with six coefficients. However, covariance matrices of both of these effects have
size six because all traits are combined to one.
DATAFILE Ter.dat
INTEGER block animal HTM YM SEASON AGE DCC DIM
REAL milk protein fat
PEDFILE miniTDM.pedi
PARFILE TDMpara.in
PEDIGREE G am+p
RANDOM HTM PE
TABLEFILE finTDMpara.cov
TABLEINDEX DIM
MODEL SCALE
milk = Curve(t1 t2 t3 t4 t96| SEASON) AGE DCC YM HTM &
PE(t5 t6 t7 t8 t9 t10| animal)@1st &
G(t59 t60 t61 t62 t63 t64| animal)@FST
protein = Curve(t1 t2 t3 t95 t97| SEASON) AGE DCC YM HTM &
PE(t11 t12 t13 t14 t15 t16| animal)@1st &
G(t65 t66 t67 t68 t69 t70| animal)@FST
fat = Curve(t1 t2 t3 t95 t98| SEASON) AGE DCC YM HTM &
PE(t17 t18 t19 t20 t21 t22| animal)@1st &
G(t71 t72 t73 t74 t75 t76| animal)@FST
DEV: Note that the model lines of the previous example can be shortened using CLIM macro
and range abbreviations (command line option --usemacros is currently needed):
DEFINE CurveMILK Curve(t1:3 t4 t96 | SEASON)
DEFINE CurvePROT Curve(t1:3 t95 t97 | SEASON)
DEFINE CurveFAT Curve(t1:3 t95 t98 | SEASON)
DEFINE Common AGE DCC YM HTM
MODEL SCALE
milk = CurveMILK Common PE( t5:10|animal)@1st G(t59:64|animal)@FST
protein = CurvePROT Common PE(t11:16|animal)@1st G(t65:70|animal)@FST
fat = CurveFAT Common PE(t17:22|animal)@1st G(t71:76|animal)@FST
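The meaning of the range abbreviation can be illustrated by expanding it back to the
long form. The Python sketch below is not how CLIM implements the feature; it only
assumes, from the two versions of the model lines above, that t1:3 stands for
t1 t2 t3. The function name expand_ranges is hypothetical.

```python
import re


def expand_ranges(text):
    """Expand range abbreviations such as t1:3 into t1 t2 t3."""
    def repl(match):
        prefix = match.group(1)
        first, last = int(match.group(2)), int(match.group(3))
        return " ".join(f"{prefix}{k}" for k in range(first, last + 1))

    return re.sub(r"([A-Za-z_]+)(\d+):(\d+)", repl, text)


print(expand_ranges("Curve(t1:3 t4 t96 | SEASON)"))
print(expand_ranges("G(t59:64|animal)@FST"))
```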
6.5 Multiple trait maternal effects model
Consider a three trait maternal effect model where the last trait does not have a maternal
effect. In addition, some fixed effects are different by trait. Note that the spaces between
the effect names are only to help reading the model lines.
DATAFILE Beef.dat
INTEGER BREED HERD id dam sex twin dam_age &
brth_mth HY_brth HY_200d HY_365d
REAL Brth_w 200d_w 365d_w age_200d age_365d
PEDFILE Beef.ped
PEDIGREE G am+p
PARFILE Beef.var
MODEL SCALE
Brth_w = - twin sex - HY_brth - - G(dam id)
200d_w = age_200d twin sex brth_mth - HY_200d - G(dam id)
365d_w = age_365d twin sex brth_mth - - HY_365d G( - id)
WITHINBLOCKORDER G HY_365d HY_200d HY_brth
The version currently in beta testing (option -b) allows this model to be written without
the dashes. Because effects are ordered by their column number, the resulting directive
file is different from the example above. However, the analysis will be the same although
the effects are in a different order on the model lines. It is still important to have - in
the third trait 365d_w within the random effect in order to have correct numbering of
variance components.
MODEL
Brth_w = twin sex HY_brth G(dam id)
200d_w = age_200d twin sex brth_mth HY_200d G(dam id)
365d_w = age_365d twin sex brth_mth HY_365d G( - id)
7 Genomic data models
There are two commonly employed ways to use genomic data in statistical
models for animal breeding. One way has genomic marker effects directly in the model.
The other way uses genomic data to build a (co)variance structure, such as a genomic
relationship matrix, and uses it as the (co)variance structure of the breeding values. Both
kinds of models are supported in MiX99 using CLIM. The genomic marker effect model
will be called the SNP-BLUP model. Models using a genomic relationship matrix include
G-BLUP and the single-step method.
7.1 SNP-BLUP or genomic effect model
Statistical models for genomic selection have often several thousand SNP markers.
In the SNP-BLUP model each marker is a regression effect. It would be tedious to
write a model line which has several thousand regression coefficients. Instead, CLIM
allows use of regression coefficient matrices in files by commands REGMATRIX and
REGFILE. MiX99 allows use of several matrices but CLIM allows currently only one
regression coefficient matrix.
The regression coefficients defined by the regression coefficient matrix can be either
fixed (command REGMATRIX FIXED) or random (command REGMATRIX RANDOM). If
they are random, the marker effects in MiX99 can have either a common variance or,
alternatively, each marker can have its own individual variance.
Relevant commands for regression coefficient matrices are REGFILE for the name of
the file having the regression coefficients, REGMATRIX for defining type of the matrix as
well as coefficient columns, and REGPARFILE for variance component(s). For syntax
and better explanation see Chapter 9.2.
All coefficients on a line in a regression coefficient matrix file belong to only one animal.
Each line corresponds to a line in the data file defined by DATAFILE. It is important
to have the lines in the regression coefficient file in the same order as corresponding
observations in the data file. It is possible to instruct MiX99 to check that the files have
lines in the same order by animal id code. Then, the genetic evaluation is not as error
prone as when no animal id code has been given.
Animal id code can be on different columns in the regression coefficient file and the
data file. Animal id code in the regression coefficient file is instructed by option ID
in command REGMATRIX RANDOM ID=value where value is column number in
the regression coefficient file. In the data file the command for animal id code is
DATASORT PEDIGREECODE=icol where icol is integer number column name (or
number) in the data file. If either piece of information is missing, the order of lines
cannot be checked, and the order is assumed to be correct. Please check the summary of
commands for syntax and better explanation of these commands (Chapters 9.2.2 and 9.2.16).
Use of the ID option in REGMATRIX allows giving genotype files that have more geno-
typed animals than animals with observation. This is convenient when the genotype
data set has genotyped candidate animals without observations, or when the same marker
data is used in the analysis of several traits where some animals do not have observations
for some traits.
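Before running MiX99, the order of the two files can also be checked outside the
program. The Python sketch below is a pre-check under an assumption that is not
confirmed by this text: the animal id codes of the data file should occur in the
regression coefficient file in the same relative order, and extra genotyped animals may
appear anywhere. The function name ids_in_same_order is hypothetical.

```python
def ids_in_same_order(data_ids, geno_ids):
    """True if data_ids occur in geno_ids in the same relative order.

    Extra ids in geno_ids (e.g. candidates without observations) are skipped.
    """
    remaining = iter(geno_ids)
    # Each data id must be found in the not yet consumed part of geno_ids.
    return all(any(g == d for g in remaining) for d in data_ids)


# Four animals with observations, six genotyped animals.
print(ids_in_same_order([1, 2, 3, 4], [1, 2, 3, 4, 5, 6]))
# The data file lists animal 2 before animal 1, unlike the genotype file.
print(ids_in_same_order([2, 1], [1, 2, 3]))
```

The check built into MiX99 (options ID and PEDIGREECODE) remains the reference; this
sketch only helps to find ordering problems early.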
7.1.1 Example: simple genomic marker BLUP
Model is
$$y = \mu + \beta_1 g_1 + \beta_2 g_2 + \beta_3 g_3 + \beta_4 g_4 + \beta_5 g_5 + \beta_6 g_6 + e$$
where the βs are regression coefficients, the gs are random additive marker or allele
effects, and e is random residual. There are six markers, numbered 1, 2, . . . , 6.
The markers have bi-allelic loci. For each marker, the genotypes are coded as 0
for homozygous first allele, 1 for heterozygote, and 2 for homozygous second allele.
Marker effect is additive allele effect of the second allele. Thus, two times the marker
effect solution gives difference between the homozygotes for the first and the second
allele.
The variance components are: common SNP marker genetic variance
$\sigma^2_g = \frac{1}{6} = 0.166666666$, and residual variance $\sigma^2_e = 1$. The
variances are in two separate files.
The file for the marker genotype variance (gs_gen.par) is
Matrix number   Row   Column   Covariance
1 1 1 0.166666666 marker variance
The other parameter file (gs_res.par) has the residual variance:
Random effect   Row   Column   Covariance
1 1 1 1.0 residual variance
The data has been divided into two separate files. The data file (command DATAFILE)
is the standard MiX99 observation file having columns for animal id code, fixed effect
(general mean), and observation. The other file (command REGFILE) has the genomic
information, i.e., regression coefficients defined by genotype. As mentioned, records
in these files must be in the same order. Thus, the first record in data file and in the
regression coefficient file are from the same animal. It is assumed that all genotypes
are known, and have been coded by user.
The data files are
Data file gs_obs.dat:
animal   mean   y
1 1 5
2 1 6
3 1 10
4 1 15
Genotype file gs_geno.dat:
id   1 2 3 4 5 6
1 2 1 0 0 0 0
2 1 1 0 1 0 0
3 1 0 2 2 2 1
4 0 1 1 2 2 2
5 0 0 2 1 1 1
6 0 0 1 2 2 2
CLIM code is
DATAFILE gs_obs.dat
INTEGER animal mean
REAL y
MISSING -99999.
DATASORT PEDIGREECODE=animal
PARFILE gs_res.par # residual variance
REGMATRIX RANDOM SNP ID=1 FIRST=2 LAST=7
REGFILE gs_geno.dat
REGPARFILE gs_gen.par # snp variance
PRECON d
MODEL
y = mean
The fixed general mean solution (Solf01) is
1 4 7.2604
The marker effect solutions (Solreg_mat) are
Trt Matrix Effect Solution Mat-Name
1 1 1 -0.66624 SNP
1 1 2 0.11015 SNP
1 1 3 0.27294 SNP
1 1 4 0.55610 SNP
1 1 5 0.76616 SNP
1 1 6 0.87631 SNP
Estimated genomic breeding values can be calculated as
$$\hat{a} = 1\hat{\mu} + Z\hat{g} \tag{4}$$
where $1$ is vector of ones, $\hat{\mu}$ is estimate of the general mean, $Z$ is the regression
coefficient matrix, and $\hat{g}$ is $6 \times 1$ vector of estimated marker effects. MiX99 (specifically
mix99s) calculates genomic breeding values (â) for this model when option -p is
given. Thus, mix99s -p -s writes file yHat.data0 where each line has a breeding
value for an animal. The values are in the same order as observations in the data file.
Unix command paste gs_obs.dat yHat.data0 shows animal id numbers with
their estimated genomic breeding values. Result is
1 1 5 6.0380783
2 1 6 7.2604141
3 1 10 10.660874
4 1 15 12.040634
Note that with option -p the estimated genomic breeding value is not calculated for
animals without observations.
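Equation (4) can be evaluated by hand from the printed solutions, which also gives
values for the two candidates that have genotypes but no observations. The Python
sketch below uses the rounded solutions shown above, so the results agree with
yHat.data0 only up to rounding; it is an illustration, not a MiX99 feature.

```python
mu_hat = 7.2604  # general mean solution from Solf01
# Marker effect solutions from Solreg_mat, markers 1 to 6.
g_hat = [-0.66624, 0.11015, 0.27294, 0.55610, 0.76616, 0.87631]

# Genotypes from gs_geno.dat, animals 1 to 6 (rows 5 and 6 are candidates).
z = [
    [2, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 2, 2, 2, 1],
    [0, 1, 1, 2, 2, 2],
    [0, 0, 2, 1, 1, 1],
    [0, 0, 1, 2, 2, 2],
]

# a_hat = 1 * mu_hat + Z * g_hat, one value per genotyped animal.
a_hat = [mu_hat + sum(zij * gj for zij, gj in zip(row, g_hat)) for row in z]
for animal, value in enumerate(a_hat, start=1):
    print(animal, round(value, 4))
```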
7.1.2 Enhanced formatting of SNP marker information
NEW: SNP marker information is typically coded with values 0, 1, or 2, and optional
missing marker value with some other one digit integer such as 3 or 9. The coding can
also include separate centering and scaling information so that the possibly very large
marker information can be kept in integer form to save disk space and memory.
REGMATRIX command has optional parameter IMPUTE for specifying missing marker
value. For example:
REGMATRIX ... IMPUTE=3
instructs to replace, or impute, all marker values 3 in the regression coefficient matrix
by averages of the (non-missing) marker values.
The marker value ($Z^{012}_{i,m}$) can be optionally centered and scaled by subtracting a center
value ($\mu_m$) from the marker value and multiplying this by a scaling value ($s_m$):
$$Z_{i,m} = (Z^{012}_{i,m} - \mu_m) \, s_m$$
The centering can be specified with optional CENTER parameter of the REGMATRIX
command. The centering value µm can be either average of the markers, a given con-
stant for all markers, or separate value for each marker stored in a file. For example:
REGMATRIX RANDOM SNP ID=1 FIRST=2 LAST=7 CENTER
centers the markers around the averages,
REGMATRIX RANDOM SNP ID=1 FIRST=2 LAST=7 CENTER=1
uses “-1,0,1 coding” instead of the “0,1,2 coding”, i.e. centers around one (1), and
REGMATRIX RANDOM SNP ID=1 FIRST=2 LAST=7 CENTER=mu.dat
reads six (6 = 7-2+1) separate centering values from file mu.dat.
Similarly, optional scaling can be specified with the SCALE parameter of the REGMATRIX
command. The scaling value sm can be either a given constant for all markers or a separate
value for each marker stored in a file. For example:
REGMATRIX RANDOM SNP ID=1 FIRST=2 LAST=7 SCALE=0.5
scales all markers with a half (0.5) and
REGMATRIX RANDOM SNP ID=1 FIRST=2 LAST=7 SCALE=s.dat
uses six separate scaling values stored in file s.dat.
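The effect of the IMPUTE, CENTER, and SCALE parameters on a small marker matrix
can be sketched in Python. This is an illustration of the formula above, not MiX99
code: the function prepare_markers and its arguments are hypothetical, centering and
scaling values read from files are left out, and imputing before centering is an
assumption.

```python
def prepare_markers(z, impute=None, center=None, scale=1.0):
    """Impute, center, and scale a marker matrix given as nested lists.

    impute: marker code treated as missing and replaced by the marker average.
    center: None (no centering), "mean" (marker averages), or a constant.
    scale:  constant multiplier applied after centering.
    """
    n_markers = len(z[0])
    # Marker averages over the non-missing values of each column.
    means = []
    for m in range(n_markers):
        values = [row[m] for row in z if row[m] != impute]
        means.append(sum(values) / len(values))
    result = []
    for row in z:
        new_row = []
        for m, value in enumerate(row):
            if value == impute:
                value = means[m]
            if center == "mean":
                mu = means[m]
            else:
                mu = 0.0 if center is None else center
            new_row.append((value - mu) * scale)
        result.append(new_row)
    return result


print(prepare_markers([[0, 2], [2, 3]], impute=3))  # code 3 replaced by average
print(prepare_markers([[0, 1, 2]], center=1))       # "-1,0,1 coding"
print(prepare_markers([[0, 1, 2]], scale=0.5))      # all markers halved
```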
By default, the general regression coefficient matrix values are real valued numbers.
The SNP marker information is typically coded with integer values 0, 1, or 2, and an
optional missing marker value. To let MiX99 check that only allowed SNP marker
integer values occur, the file format of the REGFILE file can be specified with the
optional FORMAT parameter of the REGMATRIX command. The default file format is “n”
(or “normal”), and the SNP marker information can be specified with format “m” (or “markers”):
REGMATRIX RANDOM SNP ID=1 FIRST=2 LAST=7 FORMAT=markers
By default, the SNP marker values are assumed to be separated by spaces in the
REGFILE file. The marker values (0, 1, and 2) are only one character wide. Thus, the
spaces separating the markers effectively double the file size. The spaces between
the marker values can be removed by specifying file format “s” (or “squeezed”). For
example, file gs_geno_nospaces.dat could contain marker values without spaces
as:
id   SNPs
1 210000
2 110100
3 102221
4 011222
5 002111
6 001222
and the file format can be specified with:
REGMATRIX RANDOM SNP ID=1 FIRST=2 LAST=7 FORMAT=squeezed
REGFILE gs_geno_nospaces.dat
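A space separated genotype file can be converted to the squeezed layout with a few
lines of Python. The sketch below is a hypothetical helper, not a MiX99 utility; it
assumes the id code is the first field and that every marker value is a single digit.

```python
def squeeze_line(line):
    """Turn 'id g1 g2 ... gn' into 'id g1g2...gn'."""
    fields = line.split()
    animal_id, markers = fields[0], fields[1:]
    # The squeezed format has no separators, so each marker must be one digit.
    if any(len(m) != 1 or not m.isdigit() for m in markers):
        raise ValueError(f"marker values must be single digits: {line!r}")
    return f"{animal_id} {''.join(markers)}"


spaced = ["1 2 1 0 0 0 0", "2 1 1 0 1 0 0", "3 1 0 2 2 2 1"]
squeezed = [squeeze_line(line) for line in spaced]
print("\n".join(squeezed))
```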
7.2 Example: simple G-BLUP
Consider the same data as in the previous example. The SNP-BLUP model using
matrix notation is
$$y = 1_4 \mu + Zg + e$$
where $y$ is $4 \times 1$ vector of observations, $1_4$ is $4 \times 1$ vector of ones, $Z$ is $4 \times 6$ matrix of
SNP marker genotypes, $g$ is $6 \times 1$ vector of random marker effects, and $e$ is random
residual. An equivalent model or G-BLUP model solves breeding value vector $u = Zg$
without need to solve the marker effects $g$. The model is
$$y = 1_4 \mu + Z_u u + e$$
where $Z_u$ is $4 \times 6$ incidence matrix linking breeding values $u$ to the observations. In our
case, $Z_u = [\, I_4 \;\; 0 \,]$ where $0$ is $4 \times 2$ matrix of zeros. In the SNP-BLUP model it was
assumed that $g \sim N(0, I_6 \sigma^2_g)$. In the G-BLUP model, it is assumed that $u \sim N(0, G \sigma^2_g)$
where $G = ZZ'$ is a $6 \times 6$ matrix. Note that $Z$ has marker coefficients of the candidate
animals as well.
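The Z matrix is
$$Z = \begin{bmatrix}
2 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 2 & 2 & 2 & 1 \\
0 & 1 & 1 & 2 & 2 & 2 \\
0 & 0 & 2 & 1 & 1 & 1 \\
0 & 0 & 1 & 2 & 2 & 2
\end{bmatrix}$$
and the covariance matrix is
$$G = ZZ' = \begin{bmatrix}
5 & 3 & 2 & 1 & 0 & 0 \\
3 & 3 & 3 & 3 & 1 & 2 \\
2 & 3 & 14 & 12 & 9 & 12 \\
1 & 3 & 12 & 14 & 8 & 13 \\
0 & 1 & 9 & 8 & 7 & 8 \\
0 & 2 & 12 & 13 & 8 & 13
\end{bmatrix}$$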
The mixed model equations need the inverse of the covariance matrix of each random
effect. In our case, $G^{-1}$ is needed. MiX99 will not compute it; the user has to
calculate it beforehand. In our case the inverse is
$$
G^{-1} = \begin{bmatrix}
0.7500 & -0.5000 & -0.5000 & -0.2500 & 0.3333 & 0.5833 \\
-0.5000 & 2.0000 & -1.0000 & -1.5000 & 1.0000 & 1.5000 \\
-0.5000 & -1.0000 & 2.0000 & 1.5000 & -1.6666 & -2.1666 \\
-0.2500 & -1.5000 & 1.5000 & 2.7500 & -1.3333 & -3.0833 \\
0.3333 & 1.0000 & -1.6666 & -1.3333 & 1.8888 & 1.5555 \\
0.5833 & 1.5000 & -2.1666 & -3.0833 & 1.5555 & 3.9722
\end{bmatrix}
$$
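Because the inverse has to be prepared outside MiX99, it may help to see one way of computing it. The sketch below builds G = ZZ′ from the marker matrix of the example and inverts it exactly with fractions from the Python standard library. It is only an illustration for a tiny matrix; the function names are hypothetical, and a real application would use a numerical library.

```python
from fractions import Fraction

# Marker genotypes of the six animals (rows) at six SNPs (columns), from the example.
Z = [
    [2, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 2, 2, 2, 1],
    [0, 1, 1, 2, 2, 2],
    [0, 0, 2, 1, 1, 1],
    [0, 0, 1, 2, 2, 2],
]


def gram(z):
    """Return Z Z' as a list of lists."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in z] for row_i in z]


def invert(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination on fractions."""
    n = len(matrix)
    # Augment with the identity: [M | I]
    work = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next((r for r in range(col, n) if work[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        work[col], work[pivot] = work[pivot], work[col]
        scale = work[col][col]
        work[col] = [v / scale for v in work[col]]
        # Eliminate the pivot column from all other rows.
        for r in range(n):
            if r != col and work[r][col] != 0:
                factor = work[r][col]
                work[r] = [v - factor * p for v, p in zip(work[r], work[col])]
    return [row[n:] for row in work]


G = gram(Z)
G_inv = invert(G)
for row in G_inv:
    print(" ".join(f"{float(v):8.4f}" for v in row))
```

Exact arithmetic is used here only so that the small example is free of rounding noise.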
The inverse matrix is given to MiX99 in a file stored in either co-ordinate (Yale) sparse
matrix format or lower triangle dense format.
The default inverse matrix format is the co-ordinate sparse matrix format, also called
the Yale format. In this format, each element in the lower triangle of the matrix is given
with its element position. In our case, the matrix in file iG_raw.dat is
1 1 0.75
2 1 -0.5
2 2 2
3 1 -0.5
3 2 -0.999999999999999
3 3 2
4 1 -0.25
4 2 -1.5
4 3 1.5
4 4 2.75
5 1 0.333333333333333
5 2 0.999999999999999
5 3 -1.66666666666667
5 4 -1.33333333333333
5 5 1.88888888888889
6 1 0.583333333333333
6 2 1.5
6 3 -2.16666666666667
6 4 -3.08333333333333
6 5 1.55555555555555
6 6 3.97222222222222
Note that the element positions are the animal id codes. In our case they are from one
to six. In practice, the numbers need not be consecutive or in increasing order.
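The layout of the co-ordinate file can be produced from a dense symmetric matrix as follows. This is a minimal sketch; the helper name is hypothetical, and skipping exact zeros is an assumption based on the format being sparse. The 3 × 3 matrix is just the upper-left corner of the inverse above, used to show the layout.

```python
def lower_triangle_coordinates(ids, matrix):
    """Yield 'row_id col_id value' lines for the lower triangle (diagonal included)."""
    for i, row_id in enumerate(ids):
        for j in range(i + 1):
            value = matrix[i][j]
            if value != 0:  # assumption: zero elements need not be stored
                yield f"{row_id} {ids[j]} {value!r}"


animal_ids = [1, 2, 3]
corner = [
    [0.75, -0.5, -0.5],
    [-0.5, 2.0, -1.0],
    [-0.5, -1.0, 2.0],
]
coordinate_lines = list(lower_triangle_coordinates(animal_ids, corner))
print("\n".join(coordinate_lines))
```

The id codes, not the row positions, are written to the file, which is why the ids need not be consecutive.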
The file iG_raw.dat is given with command PEDFILE as the inverse co-variance
matrix of the breeding values. An important requirement is that all elements of the
given matrix are from the lower triangle only. Option ’MIXED’ can be used to read a file
that mixes lower and upper triangle elements of a symmetric matrix, each element
being given only once, e.g.,
PEDFILE MIXED iG_raw.dat
The MIXED option should be used with caution because MiX99 will not check whether
a matrix element appears as both an upper and a lower triangle element.
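Since MiX99 does not make this check, it can be made before the run. The sketch below is a hypothetical helper, not a MiX99 tool; it reports every element that occurs more than once in a co-ordinate list, treating (i, j) and (j, i) as the same element.

```python
def find_duplicated_elements(triplets):
    """Return sorted (i, j) pairs, i >= j, that occur more than once,
    counting (i, j) and (j, i) as the same element of a symmetric matrix."""
    seen = set()
    duplicates = set()
    for row, col, _value in triplets:
        key = (max(row, col), min(row, col))
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)


# Element (2, 1) is given both as a lower and as an upper triangle element.
entries = [(1, 1, 0.75), (2, 1, -0.5), (1, 2, -0.5), (2, 2, 2.0)]
print(find_duplicated_elements(entries))
```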
[NEW] An altogether different matrix format is the lower triangle dense matrix
format. Command option ’LOWER’ indicates it:
PEDFILE LOWER iG_raw.dat
In the lower triangle dense matrix format, all lower triangle elements of the matrix are
in the file. There are two header rows having the size of the matrix and animal id
codes. The previous matrix in a lower triangle dense format is
6 0
1 2 3 4 5 6
0.75
-0.5 2
-0.5 -1 2
-0.25 -1.5 1.5 2.75
0.33 1 -1.67 -1.33 1.89
0.58 1.5 -2.17 -3.08 1.55 3.97
where the matrix values have been limited to two decimals for presentation. Note
that the order of the id codes is irrelevant to MiX99, i.e., they do not have to be in
increasing order as given here. However, the order of the id codes on the second row
determines the order of the rows and columns of the matrix below it. In other words,
for MiX99 the previous matrix can equally be given as
6 0
3 1 2 4 5 6
2
-0.5 0.75
-1 -0.5 2
-0.25 -1.5 1.5 2.75
0.33 1 -1.67 -1.33 1.89
0.58 1.5 -2.17 -3.08 1.55 3.97
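The effect of the id order on the file contents can be checked with a small script. The sketch below is a hypothetical helper that writes the lower triangle dense layout for any chosen id order; it only writes the second header number and does not build an APY matrix.

```python
def lower_dense_lines(ids, order, matrix, n_core=0):
    """Return the lines of a lower triangle dense file with rows/columns in `order`.
    `ids` gives the id code of each row/column of `matrix`."""
    position = {animal_id: k for k, animal_id in enumerate(ids)}
    lines = [f"{len(order)} {n_core}", " ".join(str(a) for a in order)]
    for i, row_id in enumerate(order):
        values = (
            matrix[position[row_id]][position[col_id]] for col_id in order[: i + 1]
        )
        lines.append(" ".join(f"{v:g}" for v in values))
    return lines


corner = [
    [0.75, -0.5, -0.5],
    [-0.5, 2.0, -1.0],
    [-0.5, -1.0, 2.0],
]
reordered = lower_dense_lines([1, 2, 3], [3, 1, 2], corner)
print("\n".join(reordered))
```

For the first three animals this gives the same first three matrix rows as the reordered file above.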
Note that the first row has two numbers: 6 and 0. The first is the size of the matrix.
The second number (0) can be used to give the number of core animals when an APY
matrix is given. For example, a core of 2 animals would mean the matrix:
6 2
1 2 3 4 5 6
0.75
-0.5 2
-0.5 -1 2
-0.25 -1.5 2.75
0.33 1 1.89
0.58 1.5 3.97
where the same values have been taken as above for presentation purposes. In prac-
tice, APY inverse matrix would have different values.
The variance components file (gs.par) is
Matrix number1 Row2 Column3 Covariance1
1 1 1 0.166666666 marker variance
2 1 1 1.0 residual variance
Note that here the unscaled $G^{-1} = (ZZ')^{-1}$ was used. This allowed using the marker
variance directly. Typically, the genomic relationship matrix is built using one of the
methods by VanRaden.
CLIM code for G-BLUP is
DATAFILE gs_obs.dat
INTEGER animal mean
REAL y
MISSING -99999.
DATASORT PEDIGREECODE=animal
PARFILE gs.par # parameters
PEDFILE iG_raw.dat
PEDIGREE animal FILE
PRECON d d
MODEL
y = mean animal
Note how existence of the co-variance structure of animal genetic effect is indicated
by the PEDIGREE command with option FILE.
The fixed general mean solution (Solfix) is
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 4 7.2604 mean y
The breeding value solutions (Solani) are
1 1 1 -1.2223
2 2 1 -.11178E-05
3 3 1 3.4005
4 4 1 4.7802
5 5 0 2.7444
6 6 0 4.6701
Again giving Unix command paste gs_obs.dat yHat.data0 associates animal
id numbers to their estimated genomic breeding values. Result is
1 1 5 6.0380797
2 1 6 7.2604151
3 1 10 10.660873
4 1 15 12.040632
Note that these values can be calculated by adding the fixed general mean solution
7.2604 to the breeding value solutions in the Solani file.
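This relation is easy to verify from the printed solutions. The sketch below simply copies the numbers from the Solfix, Solani and yHat listings above; small differences are due to the limited number of printed digits.

```python
general_mean = 7.2604  # Solfix solution for "mean"
# Solani solutions of the four animals with observations (animals 1 to 4)
breeding_values = {1: -1.2223, 2: -0.11178e-05, 3: 3.4005, 4: 4.7802}
# Fitted values as printed in yHat.data0
reported = {1: 6.0380797, 2: 7.2604151, 3: 10.660873, 4: 12.040632}

differences = {}
for animal, ebv in breeding_values.items():
    fitted = general_mean + ebv
    differences[animal] = abs(fitted - reported[animal])
    print(animal, round(fitted, 4), reported[animal])
```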
7.2.1 Example: G-BLUP with polygenic effect
Model is
$$y = \mathbf{1}\mu + Z_u u + Z_a a + e$$
where $u$ is the vector of random additive genetic effects from genomic data, $a$ is the
vector of random additive polygenic effects from pedigree information, and $e$ is the
random residual. Matrix $Z_u$ is the incidence matrix as in the G-BLUP example, and
matrix $Z_a$ is the incidence matrix relating the observations in $y$ to the proper breeding
values in $a$. The model is the same as in the previous example except for the
polygenic effect $a$.
The random effects have the following assumptions: $u \sim N(0, G\sigma_g^2)$, $a \sim N(0, A\sigma_a^2)$,
and $e \sim N(0, I\sigma_e^2)$, where $G$ is the genomic co-variance matrix (see previous example)
and $A$ is the pedigree based relationship matrix. The following pedigree for the
relationship matrix $A$ is in file gs_poly.ped:
animal1 sire2 dam3
1 2 3
2 4 3
3 5 6
4 5 6
5 0 0
6 0 0
This small pedigree has some non-zero inbreeding coefficients. We will account for the
inbreeding coefficients when MiX99 builds $A^{-1}$ by giving an inbreeding coefficient
file, named gs_poly.inbr. The file is
animal1 number2 F1
5 1 0.00000
6 2 0.00000
3 3 0.00000
4 4 0.00000
2 5 0.25000
1 6 0.37500
where the first column has original animal id code, and the last column inbreeding
coefficient. When no inbreeding coefficient file is given, MiX99 builds A−1 assuming
all inbreeding coefficients are zero.
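The inbreeding coefficients in the file can be reproduced from the pedigree with the tabular method. The following is a minimal sketch for this small pedigree only, assuming that code 0 means an unknown parent; it is not the way MiX99 or its companion programs compute inbreeding.

```python
from fractions import Fraction

# Pedigree from gs_poly.ped: animal -> (sire, dam); 0 means unknown parent.
pedigree = {1: (2, 3), 2: (4, 3), 3: (5, 6), 4: (5, 6), 5: (0, 0), 6: (0, 0)}
# Parents must be processed before their offspring.
order = [5, 6, 3, 4, 2, 1]

A = {}  # A[(i, j)] = additive relationship between animals i and j


def rel(i, j):
    """Relationship lookup; an unknown parent (0) is unrelated to everyone."""
    if i == 0 or j == 0:
        return Fraction(0)
    return A[(i, j)]


for k, animal in enumerate(order):
    sire, dam = pedigree[animal]
    # Off-diagonals: average of the parents' relationships with older animals.
    for other in order[:k]:
        value = (rel(sire, other) + rel(dam, other)) / 2
        A[(animal, other)] = A[(other, animal)] = value
    # Diagonal: 1 + F, where F is half the relationship between the parents.
    A[(animal, animal)] = 1 + rel(sire, dam) / 2

inbreeding = {animal: A[(animal, animal)] - 1 for animal in order}
for animal in order:
    print(animal, float(inbreeding[animal]))
```

Animals 2 and 1 get 0.25 and 0.375, as in gs_poly.inbr.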
The variance components are: common SNP marker variance $\sigma_g^2 = 1/6$, polygenic
variance $\sigma_a^2 = 1$, and residual variance $\sigma_e^2 = 1$. The variances are in file
gs_poly.par:
Random effect1 Row2 Column3 Covariance1
1 1 1 0.166666666 SNP genetic variance
2 1 1 1.0 polygenic variance
3 1 1 1.0 residual variance
CLIM code is
DATAFILE gs_obs.dat
INTEGER animal mean
REAL y
MISSING -99999.
DATASORT PEDIGREECODE=animal
PARFILE gs_poly.par
COVFILE animal iG_raw.dat
PEDFILE gs_poly.ped
PEDIGREE polygenic am
INBRFILE gs_poly.inbr
INBREEDING PEDIGREECODE=1 FINBR=3
RANDOM genomic polygenic
MODEL
y = mean genomic(animal) polygenic(animal)
Note that the RANDOM command gives numbering of the random effects (other than
residual). The polygenic effect must be the last random effect in this statement. Note
also that the words genomic and polygenic are NOT reserved names but names
just to identify different random effects.
The fixed general mean solution (Solfix) is
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 4 7.9564 mean y
The genomic breeding value estimates (Solr01) are
1 1 -.99014
2 1 -.93297E-06
3 1 3.0507
4 1 4.0358
5 0 2.4286
6 0 3.9911
The polygenic animal solutions (Solani) are
1 0 1 -1.1307
2 1 1 -.79141
3 2 1 -.73879
4 1 1 0.73879
5 2 0 0.12569E-12
6 2 0 0.12569E-12
Again giving Unix command paste gs_obs.dat yHat.data0 associates animal
id numbers to their estimated complete breeding values. Result is
1 1 5 5.8356075
2 1 6 7.1650143
3 1 10 10.268371
4 1 15 12.731008
Note that in practice this model may have convergence problems because the polygenic
and genomic breeding value effects try to estimate the same entity: genetics. The
model must split the genetic breeding value into its polygenic and marker components,
which can be difficult when the two co-variance structures resemble each other. The
genomic breeding values have a genomic relationship matrix, and the polygenic effect
has a pedigree based relationship matrix. In this example, these matrices are very
different, but in practice they can be similar.
An alternative equivalent model is to make a relationship matrix that combines the
genomic and pedigree relationship matrices. For example, let
$$G_A = G + A\,\frac{\sigma_a^2}{\sigma_g^2}.$$
Then, instead of using $G^{-1}$ in the G-BLUP model, use $G_A^{-1}$. Thus, the instructions
will be the same as for the G-BLUP model with the PEDFILE replaced by the new inverse
covariance matrix $G_A^{-1}$, where the pedigree based relationship matrix and the genomic
information have been combined. This will yield the same estimated complete breeding
values. However, in practice, convergence of the iterative method is likely to be better
because two genetic breeding values no longer have to be estimated for every animal.
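With the variance components of this example the combined matrix is easy to work out by hand. The lines below are a worked sketch; they assume that the genomic and polygenic effects are independent, and they take the diagonal element of A for animal 1 as 1 + F with F = 0.375 from the inbreeding file.

```latex
\begin{align*}
\operatorname{Var}(u + a) &= G\sigma_g^2 + A\sigma_a^2
  = \left(G + A\,\frac{\sigma_a^2}{\sigma_g^2}\right)\sigma_g^2 = G_A\,\sigma_g^2, \\
\frac{\sigma_a^2}{\sigma_g^2} &= \frac{1}{1/6} = 6
  \quad\Longrightarrow\quad G_A = G + 6A, \\
(G_A)_{11} &= 5 + 6 \times 1.375 = 13.25 .
\end{align*}
```

The first line shows why the marker variance can still be used as the variance of the combined effect in the parameter file.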
The equivalent G-BLUP model needs $G_A^{-1}$ in a file. In our case, the $G_A^{-1}$ matrix
(lower triangle) in co-ordinate sparse matrix format is (file iGa.dat):
1 1 0.212360759655864
2 1 -0.173168009265463
2 2 0.302462455536769
3 1 -0.0820544909187614
3 2 -0.0175160071940054
3 3 0.284302642414021
4 1 0.0222724767716754
4 2 -0.10603222546145
4 3 0.0440272119204682
4 4 0.265976547255634
5 1 0.0343385639825208
5 2 0.0289958157919094
5 3 -0.169030129597629
5 4 -0.126868150312209
5 5 0.247839050926183
6 1 0.0436056268940923
6 2 0.0386568862841335
6 3 -0.172788965328849
6 4 -0.180933888034668
6 5 0.122875534464136
6 6 0.272614251165761
As mentioned, the CLIM-instructions will be the same as for the G-BLUP example
with only one change: PEDFILE iG_raw.dat changed to PEDFILE iGa.dat. The
predicted breeding values using this approach are as before
1 1 5 5.8356066
2 1 6 7.1650133
3 1 10 10.268371
4 1 15 12.731009
7.3 Example: single-step method
Model is
$$y = \mathbf{1}\mu + Z_a a_g + e$$
where $a_g$ is the vector of random additive genetic effects using pedigree and genomic
information, and $e$ is the random residual. Matrix $Z_a$ is the incidence matrix relating
the observations in $y$ to the correct breeding values in $a_g$, as in the previous example.
The random effects have the following assumptions: $a_g \sim N(0, H\sigma_a^2)$ and
$e \sim N(0, I\sigma_e^2)$, where $H$ is the pedigree relationship matrix blended with genomic
data. MiX99 does not need $H$ itself but its inverse $H^{-1}$. The inverse is
$$
H^{-1} = A^{-1} + \begin{bmatrix} G^{-1} - A_{gg}^{-1} & 0 \\ 0 & 0 \end{bmatrix}
$$
where $A$ is the full pedigree based relationship matrix, $A_{gg}$ is the pedigree based
relationship matrix of the genotyped animals, and $G$ is the genomic relationship matrix.
The user has to provide the matrix $C_{GA} = G^{-1} - A_{gg}^{-1}$ and the full pedigree used to
calculate $A$ to MiX99. In practice, the single-step method in MiX99 is similar to using
an animal model with an additional covariance matrix $C_{GA}$, given in lower triangle
co-ordinate sparse matrix format (Yale format) by command IGFILE.
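The structure of the inverse can be illustrated with sparse matrices stored as dictionaries. The sketch below is hypothetical and not how MiX99 stores its matrices. The A-inverse is that of two unrelated parents (1 and 2) with one offspring (3), and the C_GA values are made up for illustration.

```python
def add_genomic_block(a_inv, c_ga):
    """Return H^{-1} = A^{-1} + C_GA, where both inputs are sparse symmetric matrices
    stored as {(row_id, col_id): value} with row_id >= col_id (lower triangle).
    Animals absent from c_ga (the non-genotyped ones) keep their A^{-1} elements."""
    h_inv = dict(a_inv)
    for key, value in c_ga.items():
        h_inv[key] = h_inv.get(key, 0.0) + value
    return h_inv


# Animals 1 and 2 are genotyped, animal 3 is not.
a_inv = {(1, 1): 1.5, (2, 1): 0.5, (2, 2): 1.5, (3, 1): -1.0, (3, 2): -1.0, (3, 3): 2.0}
c_ga = {(1, 1): 0.25, (2, 1): -0.125, (2, 2): 0.25}  # made-up values
h_inv = add_genomic_block(a_inv, c_ga)
print(h_inv)
```

Only the elements of the genotyped animals change; the rest of the A-inverse is untouched.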
Let the full pedigree for relationship matrix A be in file one_step.ped:
animal1 sire2 dam3
1 2 3
2 4 3
3 5 6
4 5 6
5 0 0
6 0 0
7 5 6
8 2 3
It is the same as in the previous example but with two added animals (numbers 7 and
8). Inbreeding coefficients are in file full_one_step.inbr:
animal1 number2 F1
5 1 0.00000
6 2 0.00000
7 3 0.00000
3 4 0.00000
4 5 0.00000
2 6 0.25000
8 7 0.37500
1 8 0.37500
where the first column has original animal id code, and the last column inbreeding
coefficient.
In the single-step method it is important to use properly made genomic relationship
matrix G such that its scale is the same as the pedigree based relationship matrix A.
A common approach is VanRaden method 1. In the example we will use this method,
and add 20% of the pedigree based relationship matrix $A_{gg}$, which assumes that the
markers explain 80% of the genetic variation and that 20% is polygenic variation not
explained by the available markers. The same pedigree information will be used as in
the previous example. The $C_{GA} = G^{-1} - A_{gg}^{-1}$ matrix is provided in a file. As for the
G-BLUP model, the matrix can have either the co-ordinate sparse matrix format
(default) or the lower triangle dense matrix format.
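For orientation, VanRaden method 1 and the blending step can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce iH.dat. Allele frequencies are estimated from the genotyped animals themselves, the blend is taken as the weighted sum 0.8 G + 0.2 A_gg, and the toy genotypes and the identity matrix standing in for A_gg are hypothetical.

```python
def vanraden_method1(genotypes):
    """Genomic relationship matrix G = W W' / (2 * sum p_j (1 - p_j)),
    where W = M - 2p and p_j is the allele frequency of marker j."""
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    freq = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_markers)]
    scale = 2 * sum(p * (1 - p) for p in freq)
    centered = [[row[j] - 2 * freq[j] for j in range(n_markers)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(wi, wj)) / scale for wj in centered]
        for wi in centered
    ]


def blend(g, a_gg, polygenic_share=0.2):
    """Weighted sum (1 - w) * G + w * A_gg; the exact weighting is an assumption."""
    w = polygenic_share
    return [
        [(1 - w) * g_ij + w * a_ij for g_ij, a_ij in zip(g_row, a_row)]
        for g_row, a_row in zip(g, a_gg)
    ]


# Hypothetical toy data: three animals, two markers coded 0/1/2.
markers = [[2, 0], [0, 2], [1, 1]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
G = vanraden_method1(markers)
G_blended = blend(G, identity)
print(G)
print(G_blended)
```

The blended matrix would then be inverted and combined with the inverse of A_gg to obtain the matrix given to IGFILE.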
Assume the matrix has been stored in the default format, i.e., lower triangle co-ordinate
sparse matrix format, in file iH.dat:
1 1 -0.263096445605446
2 1 -0.659942436053408
2 2 0.244046491072819
3 1 0.870441978527262
3 2 0.998787679967059
3 3 -1.40380798986209
4 1 0.526879611011652
4 2 0.476171586907099
4 3 -0.469768966568885
4 4 -0.0336084181419456
5 1 0.332614778332254
5 2 -0.119449290320484
5 3 0.469298887947602
5 4 1.10970591785976
5 5 -0.351471648428734
6 1 0.595748584829456
6 2 0.150394111665596
6 3 0.405173304876264
6 4 -0.697147095891039
6 5 -0.745222104850557
6 6 1.21570834901989
Command IGFILE is used to indicate single-step method. The command has option
MIXED to allow giving mixed lower/upper triangle form matrix:
IGFILE MIXED iH.dat
The lower triangle dense matrix format has option LOWER and can be given as:
IGFILE LOWER iHL.dat
The MIXED option works as explained for the PEDFILE command in the G-BLUP example:
a mixed co-ordinate sparse matrix format file may have both lower and upper triangle
elements of a symmetric matrix, but each element is assumed to be present only once,
either as a lower or as an upper triangle element. The MIXED option should be used
with caution because MiX99 will not check whether a matrix element is given as both
an upper and a lower triangle element.
Variance components are σ2a = 1 and σ2e = 1. These are in file one_step.par:
Random effect1 Row2 Column3 Covariance1
1 1 1 1.0 polygenic variance
2 1 1 1.0 residual variance
CLIM code is
DATAFILE gs_obs.dat
INTEGER animal mean
REAL y
MISSING -99999.
DATASORT PEDIGREECODE=animal
IGFILE iH.dat
PEDFILE one_step.ped
PEDIGREE animal am
INBRFILE full_one_step.inbr
INBREEDING PEDIGREECODE=1 FINBR=3
PARFILE one_step.par
MODEL
y = mean animal
The fixed general mean solution (Solfix) is
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 4 9.8195 mean y
Estimated breeding values (Solani) are
1 0 1 -4.2873
2 2 1 -2.8378
3 3 1 0.85603
4 1 1 2.9909
5 3 0 0.32826
6 3 0 2.6379
7 0 0 1.4831
8 0 0 -.99089
Again giving Unix command paste gs_obs.dat yHat.data0 associates animal
id numbers to their estimated complete breeding values. Result is
1 1 5 5.5322323
2 1 6 6.9817309
3 1 10 10.675563
4 1 15 12.810474
[NEW] It is possible to give the matrices $G^{-1}$ and $A_{gg}^{-1}$ that form $C_{GA}$ separately
in different files by commands IGFILE and IA22FILE, respectively. In practice, it
can be computationally more efficient to let the solver program compute $A_{gg}^{-1}$ from
the pedigree information. Then, $A_{gg}^{-1}$ need not be computed before calling the
preprocessor. This is done by giving PEDIGREE as the file name to IA22FILE.
An alternative command file to do single-step is
DATAFILE gs_obs.dat
INTEGER animal mean
REAL y
MISSING -99999.
DATASORT PEDIGREECODE=animal
IGFILE LOWER iGL.dat
IA22FILE PEDIGREE
PEDFILE one_step.ped
PEDIGREE animal am
INBRFILE full_one_step.inbr
INBREEDING PEDIGREECODE=1 FINBR=3
PARFILE one_step.par
MODEL
y = mean animal
which will be more efficient than the former code when there are many genotyped
animals. First, the G−1 matrix in file iGL.dat has been stored in the lower triangle
dense format. Second, the A−1gg matrix is not precomputed but the computations are
done using pedigree information by the solver.
8 Special topics
8.1 Trait groups for single trait analysis
8.1.1 Example: Multiple single trait analysis
It is common that several single trait analyses use the same pedigree and data file but
have their observations in different columns. Still, a multiple trait model analysis is
not wanted due to unknown variance components. Thus, although the data can be
presented as for a multiple trait analysis, all covariances between traits are zero. This
can be analyzed as a multiple trait model by MiX99. However, this can be inefficient
because the data file may have many missing observations, and the traits have
different effects.
Trait groups can be used to make the analysis more efficient. Now, the data is given
similarly to the repeatability model. However, instead of a repeatability model, there is
a trait group indicator to indicate which model is used. In the example model, each
trait group has observations from one trait.
We consider again the multiple trait model example with different effects by trait in
Chapter 6.2.1. The model is
$$
\begin{aligned}
\mathit{tr}_1 &= \mathit{herd} \times \mathit{year} + a + e \\
\mathit{tr}_2 &= \mu + a + e
\end{aligned}
$$
However, now the interest is in analyzing these two traits as separate independent
evaluations in the same MiX99 solver run. The multiple trait model way would be to do
as in Chapter 6.2.1 but with a parameter file where all covariances are zero.
The trait group way is to make the data similar to the repeatability data
(Chapter 5.1.5) but with an additional column to indicate the trait. The data file
(example_tr_group.dat) is
animal1 sire2 herd year3 ones4 trait5 tr121
4 1 11 1 1 90
4 1 21 1 2 200
6 3 11 1 1 110
6 3 21 1 2 190
8 5 12 1 1 120
8 5 22 1 2 140
9 5 12 1 1 130
9 5 22 1 2 120
10 7 12 1 1 120
10 7 22 1 2 130
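A data file of this kind can be produced by stacking the observations of each animal, one row per observed trait. The sketch below assumes a hypothetical in-memory layout (trait group number mapped to herd-year code and observation) and a hypothetical missing marker; it only illustrates that a missing trait produces no row.

```python
MISSING = None  # hypothetical marker for "trait not recorded"


def trait_group_rows(animal, sire, records):
    """Return one 'animal sire herd_year ones trait tr' line per observed trait.
    `records` maps trait group number -> (herd_year, observation or MISSING)."""
    rows = []
    for trait in sorted(records):
        herd_year, value = records[trait]
        if value is MISSING:
            continue  # a missing trait simply produces no row
        rows.append(f"{animal} {sire} {herd_year} 1 {trait} {value}")
    return rows


complete = trait_group_rows(4, 1, {1: (11, 90), 2: (21, 200)})
partial = trait_group_rows(6, 3, {1: (11, 110), 2: (21, MISSING)})
print(complete)
print(partial)
```

This is the source of the efficiency gain: no space is spent on missing observations.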
The parameter file (mt_single.var) is the same as would be for the multiple trait
model analysis described above:
Random effect1 Row2 Column3 Covariance1
1 1 1 3.0
1 2 2 2.5
2 1 1 7.0
2 2 2 7.0
Column trait is used to indicate model of the trait. It is referenced by the trait group
number in parenthesis on the model line. Command TRAITGROUP is needed to in-
dicate which column is the trait group column in the data file. The CLIM file would
be
DATAFILE example_tr_group.dat
INTEGER animal sire herd_year ones trait
REAL tr
TRAITGROUP trait
PEDFILE data/AM.ped
PEDIGREE animal am
PARFILE mt_single.var
MODEL SCALE
tr(1) = - herd_year animal
tr(2) = ones - animal
Fixed effect solutions (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 2 1 5 160.73 ones tr
2 1 11 2 99.538 herd_yea tr
2 1 12 3 122.69 herd_yea tr
Breeding values estimates (Solani) are
Bull N-Desc N-Obs tr1 tr2
1 2 0 -.29831E-05 0.29868E-05
2 2 0 -.29831E-05 0.29868E-05
3 2 0 0.92308 -3.2188
4 2 2 -.92308 3.2188
5 3 0 -.12098E-04 -5.8818
6 3 2 1.8462 -.55584
7 1 0 0.65422 -5.0727
8 1 2 0.64449E-01 -7.4451
9 0 2 2.0506 -8.9024
10 0 2 -.17839 -9.9667
Solutions can be compared with those for the tr1 single trait evaluation in
Chapter 5.1.1. The solutions are not exactly the same. The reason is that the multiple
trait model analysis usually makes more iterations. In the example above, the multiple
trait model analysis converged in 18 iterations but the single trait analysis converged
in 12 iterations. There are 12 unknowns in the single trait model, and, in theory, only
12 iterations are needed. The multiple trait model, however, has 24 unknowns, although
in two separate blocks of 12 unknowns each. The PCG method tries to solve the two
separate systems at the same time, which compromises solving and convergence within
the separate blocks.
8.1.2 Example: MACE or Sire model with weights and trait groups
We consider a multiple trait sire model where yields of daughters in different countries
are considered as different traits. This is MACE example as described in Schaeffer
(1994). The model is
y1 = µ1 + g1 + s1 + e1
y2 = µ2 + g2 + s2 + e2
where subscript is for countries 1 and 2, µ is country genetic base, y is bull’s daughter
yield deviation (DYD), g is genetic group effect of phantom parents, s is sire transmit-
ting ability by country, and e is residual.
The sire genetic effects and the residuals have the following assumptions:
E(s) = 0 Var(s) = G0 ⊗A
E(e) = 0 Var(e) = R
These look similar to standard multiple trait model assumptions. However, in the MACE
model the residual covariance matrix R is diagonal, i.e., residual correlations are zero,
and varies by sire. Residual covariance matrix for sire i is
$$
R_i = \begin{bmatrix} d_{1i}\sigma_{e_1}^2 & 0 \\ 0 & d_{2i}\sigma_{e_2}^2 \end{bmatrix} \tag{5}
$$
where $d_{ji}$ equals one over the number of daughters in the DYD of bull $i$ in country $j$,
and $\sigma_{e_j}^2$ is the residual variance for country $j$. The $d_{ji}$ values in the residual
covariance matrix can be considered as weights. Weights can be defined for each trait
separately by option weight after the model line. In Schaeffer (1994), the genetic
covariance matrix is
$$
G_0 = \begin{bmatrix} 100 & 20 \\ 20 & 5 \end{bmatrix}
$$
The residual variances are $\sigma_{e_1}^2 = 1000$ and $\sigma_{e_2}^2 = 80$.
Trait group has one or more traits that can be observed together from an individual but
cannot be observed with any trait belonging to another trait group. Residual correlation
between trait groups is zero. This definition of trait group matches with the MACE
model where country is trait group. In practice, observation belongs to a trait group
specified by an integer number column. The appropriate trait group number is given in
parenthesis after the observation name. For example, trait protein in trait group 1 is
protein(1), but in group 2, protein(2).
Table 8.1: The pedigree and data files for the MACE model example.
Pedigree file MACE.ped (Table 2 in Schaeffer (1994)); MGS = maternal grandsire,
MGD = maternal grandam:
bull sire MGS MGD
1 6 7 −5
2 8 9 −5
3 10 8 −5
4 10 11 −6
5 2 6 −6
6 −1 −2 −6
7 −1 −2 −6
8 −1 −2 −6
9 −3 −4 −6
10 −3 −4 −6
11 −3 −4 −6
Data file MACE.dat (Table 1 in Schaeffer (1994)); protein is the daughter yield
deviation (DYD):
bull country protein weight
1 1 56.0 10.0
2 1 −23.0 20.0
3 1 8.0 50.0
1 2 9.0 100.0
4 2 3.0 40.0
5 2 −11.0 20.0
The parameter file MACE.var is
Random effect1 Row2 Column3 Covariance1 Variance
1 1 1 100.0 genetic
1 2 1 20.0 genetic
1 2 2 5.0 genetic
2 1 1 1000.0 residual
2 2 2 80.0 residual
CLIM code is
TITLE MACE, L.Schaeffer (1994)
DATAFILE MACE.dat
INTEGER bull country
REAL protein weight
TRAITGROUP country
PEDFILE MACE.ped
PEDIGREE bull sm+p 1.0 # sm=sire model
PARFILE MACE.var
MISSING -8192.0
MODEL
protein(1) = country bull ! WEIGHT=weight
protein(2) = country bull ! WEIGHT=weight
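The role of the WEIGHT column in the residual covariance matrix of equation (5) can be made concrete with the example data. The sketch below assumes that the weight column holds the number of daughters behind each DYD, so that d is one over the weight; this interpretation is an assumption and the function name is hypothetical.

```python
# Residual variances by country (trait group), as given in the text.
residual_variance = {1: 1000.0, 2: 80.0}

# (bull, country, DYD, weight) records from MACE.dat; the weight column is
# assumed here to be the number of daughters behind each DYD.
records = [
    (1, 1, 56.0, 10.0),
    (2, 1, -23.0, 20.0),
    (3, 1, 8.0, 50.0),
    (1, 2, 9.0, 100.0),
    (4, 2, 3.0, 40.0),
    (5, 2, -11.0, 20.0),
]


def record_residual_variance(country, n_daughters):
    """d * sigma_e^2 with d = 1 / n_daughters (a diagonal element of R_i)."""
    return residual_variance[country] / n_daughters


variances = [record_residual_variance(country, weight) for _, country, _, weight in records]
for (bull, country, _, _), variance in zip(records, variances):
    print(bull, country, variance)
```

A DYD based on many daughters thus gets a small residual variance and a large influence on the solutions.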
Estimates of the country means (Solfix) are
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 3 10.497 country protein
1 2 2 3 1.2141 country protein
Breeding values estimates (Solani) by country are
Country
Bull N-Desc N-Obs 1 2
1 0 2 31.132 7.0211
2 1 1 -26.538 -5.9504
3 0 1 -2.4056 -.47722
4 0 1 4.3014 1.1245
5 0 1 -29.221 -7.0674
6 2 0 10.936 2.3169
7 1 0 8.9970 2.0117
8 2 0 -12.881 -2.9245
9 1 0 -8.3029 -1.8438
10 2 0 1.4727 0.43745
11 1 0 0.46456E-01 0.69527E-01
-4 3 0 -.49058 -.76316E-01
-3 3 0 -.98117 -.15263
-2 3 0 1.2948 0.27736
-1 3 0 2.5895 0.55472
-5 3 0 1.5631 0.39077
-6 8 0 -3.9756 -.99390
8.2 Deregression
Deregression in MiX99 is based on Jairath et al. (1998) and Schaeffer (2001). Cal-
culation of deregressed proofs means solving a non-linear system of equations. The
system of equations looks the same as regular mixed model equations. However, it is
assumed that solutions for some animals (for which deregressed proofs are needed)
are known but solutions of their ancestors, phantom parent groups, and general mean
in the model are unknown. In addition, the deregressed proofs are unknown. In the
mixed model equations, the deregressed proofs are in the right hand side of the equa-
tion.
In MiX99 (or mix99s), the non-linear deregression problem is solved by a two step
iterative process:
1) solve ancestral and phantom parent group unknowns given current solution for
mean estimate, and known proofs.
2) calculate new general mean estimate
In practice, the first step means solving mixed model equations which is the core work
done in MiX99. The model used to solve ancestral and phantom groups has only one
fixed effect: general mean. Another important aspect of deregression model is that the
phantom parent groups are random.
Many methods can be used in solving non-linear systems. MiX99 has the following
methods
• Gauss-Seidel
• Bisection
• Secant
• Broyden
The default method is the Broyden method, which is often the fastest and most reliable
of the implemented methods. The secant method, however, can be better when solving
many single trait models.
Deregression using mix99s needs to be specifically requested. A directive file for the
program is convenient to make as was described in Chapter 4. Let the directive file
(dereg.slv) be
H # RAM: RAM demand: H=high, M=medium, L=low
# Max. no. iterations, Stopping criterion, Criterion (A/R/M/D)
5000 1.0e-3 D F 1000
N # RESID: Calculate residuals? (Y/N)
R b # R=deregression
# b= Broyden method
# s= Secant method
# i= bisection
# n= Gauss-Seidel
N # adjust for HV? No
Y # Solution files? Yes
There are some important differences to the regular breeding value estimation.
Deregression is requested (letter ’R’) with the Broyden method (letter ’b’). In addition,
there is an extra number, 1000, on the line for the maximum number of iterations and
convergence criteria. The extra number is the maximum number of iterations for the
non-linear solving method, which is the Broyden method in this example. Deregression
is a non-linear problem. Consequently, two iterative methods are used: PCG iteration
to solve the linear problem, and the Broyden method for the non-linear problem. If the
Broyden method does not converge, it stops when the maximum number of iterations is
reached. Then, another method (e.g. the secant method) can be used, or the problem
needs reformulation (e.g. a new definition of genetic groups, or a different random
genetic group value).
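To illustrate what an iteration limit for a non-linear method controls, here is a generic secant iteration on a scalar function. It only illustrates the secant method named above and is not MiX99's implementation; the test function, starting values and tolerance are arbitrary choices.

```python
def secant(f, x0, x1, tol=1e-10, max_iter=100):
    """Find a root of f with the secant method, starting from x0 and x1."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:
            raise ZeroDivisionError("flat secant; choose other starting values")
        # Root of the straight line through (x0, f0) and (x1, f1).
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    raise RuntimeError("secant method did not converge within max_iter iterations")


root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)
```

When the limit is reached without convergence, the sketch raises an error instead of returning a value.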
8.2.1 Example: Single trait deregression
The pedigrees used earlier for the animal and sire model examples are used. However,
they are modified for deregression purposes. The sire model pedigree (sm_dereg.ped) is
bull1 sire2 maternal grand sire3 maternal grand dam group4
1 −2 −2 −10
3 1 −2 −10
5 3 1 −11
7 5 3 −11
There is a fourth column for the maternal grand dam group. The corresponding
pedigree for the animal model (am_dereg.ped) is
animal1 sire2 dam3
1 −2 11
3 1 2
5 3 4
7 5 6
2 −2 −10
4 1 −11
6 3 −11
11 −2 −10
The data is (dereg.dat)
sire1 ones2 proof1 EDC2
1 1 -.35736 50
3 1 0.40621 100
5 1 0.20334 80
7 1 0.24076E-01 20
The variance components file (SM.var) is the same as before
Random effect1 Row2 Column3 Variance1
1 1 1 0.75
2 1 1 9.25
Sire model The model for deregression is very simple: only general mean and the
sire effect. It is important to use random phantom parent groups. CLIM code for
deregression (sm_dereg.clm) is
DATAFILE dereg.dat
INTEGER sire ones
REAL ebv EDC
PEDFILE smgs.ped
PEDIGREE sire sm+p 1.0
PARFILE SM.var
MODEL
ebv = ones sire ! WEIGHT=EDC
Calculating deregressed proofs (mix99s < dereg.slv) will give solution for the
general mean (Solfix)
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 4 0.18958E-01 ones ebv
and the deregressed proofs (Solani):
sire N-Desc N-Obs deregressed proof
1 2 1 -.54488
3 2 1 0.49919
5 1 1 0.24384
7 0 1 -.13403
-11 2 0 0.0000
-10 2 0 0.0000
-2 3 0 0.0000
Solutions for the phantom parent groups (negative id code) can be ignored because
these solutions have been set to zero by MiX99. In general, deregressed proofs are
those in the Solani file that have an observation, i.e., N-Obs is one.
Animal model CLIM code for animal model is very similar to the sire model case
above:
DATAFILE dereg.dat
INTEGER animal ones
REAL ebv EDC
PEDFILE am_dereg.ped
PEDIGREE animal am+p 1.0
PARFILE SM.var
MODEL SCALE
ebv = ones animal ! WEIGHT=EDC
The general mean solution (Solfix) is exactly the same as before. The deregressed
proofs (Solani) are also the same:
animal N-Desc N-Obs deregressed proof
1 2 1 -.54488
3 2 1 0.49919
5 1 1 0.24384
7 0 1 -.13403
2 1 0 0.0000
4 1 0 0.0000
6 1 0 0.0000
11 1 0 0.0000
-11 2 0 0.0000
-10 2 0 0.0000
-2 3 0 0.0000
Naturally there are now more solutions because the pedigree had more animals. As
before, only solutions with observations (N-Obs equal to one) are relevant.
8.2.2 Example: Multiple trait deregression
Multiple trait deregression is done the same way as single trait deregression. We illus-
trate multiple trait deregression by example given in Schaeffer (2001) where detailed
explanation can be found. The example is on multiple trait sire model deregression for
international bull evaluation. We consider only the example data for country A. Country
B proceeds similarly.
Sire model pedigree (sch_sm.ped) is
bull1 sire2 maternal grand sire3 maternal grand dam group4
1 −22 −23 −24
2 −22 −23 −24
3 −22 −23 −24
4 −22 −23 −24
5 −22 −23 −24
6 −25 −26 −27
7 −25 −26 −27
8 −25 −26 −27
9 −25 −26 −27
10 −25 −26 −27
11 −25 −26 −27
12 1 2 −28
13 3 4 −28
14 3 5 −28
15 6 2 −28
16 6 7 −29
17 3 8 −29
18 3 9 −29
19 3 10 −29
20 11 8 −29
21 11 3 −29
As before, there is a fourth column for the maternal grand dam group. The data
(sch_cntry_A.dat) has three lactations:
Lactation 1 Lactation 2 Lactation 3
ones1 sire2 Progeny1 EBV2 Progeny3 EBV4 Progeny5 EBV6
1 12 126 23 0 −999 0 −999
1 12 43 23 43 34 0 −999
1 12 36 23 36 34 36 38
1 13 18 36 0 −999 0 −999
1 13 5 36 5 21 0 −999
1 13 6 36 6 21 6 17
1 14 55 −14 0 −999 0 −999
1 14 21 −14 21 −26 0 −999
1 14 17 −14 17 −26 17 −49
1 15 17 48 0 −999 0 −999
1 15 7 48 7 66 0 −999
1 15 5 48 5 66 5 59
1 16 120 30 0 −999 0 −999
1 16 44 30 44 27 0 −999
1 16 39 30 39 27 39 3
The data has been constructed such that the progeny observed in the third lactation
were also observed in the first and second lactations. Consequently, the number of
progeny is the same in all lactations when the third lactation is observed. Similarly,
the progeny observed in the second lactation were assumed to be observed in the first
lactation as well. This data structure is described in Schaeffer (2001).
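The nested structure can be checked record by record before the analysis. The sketch below is a hypothetical validation helper; it assumes the column order of the data file above (progeny and EBV for lactations 1 to 3) and the missing code −999.

```python
MISSING = -999


def is_nested(record):
    """record = (progeny1, ebv1, progeny2, ebv2, progeny3, ebv3).
    True when the observed lactations are 1, 1-2 or 1-3 and all observed
    lactations have the same number of progeny."""
    progeny = record[0::2]
    ebvs = record[1::2]
    observed = [n > 0 and e != MISSING for n, e in zip(progeny, ebvs)]
    n_observed = sum(observed)
    expected = [True] * n_observed + [False] * (len(observed) - n_observed)
    if n_observed == 0 or observed != expected:
        return False
    return len(set(progeny[:n_observed])) == 1


# The three records of sire 12 from the data file.
sire_12 = [
    (126, 23, 0, -999, 0, -999),
    (43, 23, 43, 34, 0, -999),
    (36, 23, 36, 34, 36, 38),
]
print([is_nested(record) for record in sire_12])
```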
The variance components file (sch_cntry_A.var) is
Random effect1 Row2 Column3 Variance1
1 1 1 96 Lact. 1 genetic
1 1 2 68 Lact. 1,2 genetic
1 1 3 62 Lact. 1,3 genetic
1 2 2 160 Lact. 2 genetic
1 2 3 110 Lact. 2,3 genetic
1 3 3 190 Lact. 3 genetic
2 1 1 1018 Lact. 1 residual
2 1 2 128 Lact. 1,2 residual
2 1 3 67 Lact. 1,3 residual
2 2 2 1625 Lact. 2 residual
2 2 3 170 Lact. 2,3 residual
2 3 3 1792 Lact. 3 residual
As for the single trait model, the model for deregression is very simple: only general
mean and the sire effect. It is important to use random phantom parent groups. CLIM
code for deregression (sch_sm.clm) is
TITLE Multiple trait model
DATAFILE sch_cntry_A_2.dat # Data file
INTEGER ones sire # Integer column names
REAL w_1 e_1 w_2 e_2 w_3 e_3
DATASORT PEDIGREECODE=sire
MISSING -999
PEDFILE sch_sm.ped
PEDIGREE sire sm+p 1.0
PARFILE sch_cntry_A.var
PRECON b f # Preconditioner: b=block
MODEL
e_1 = ones sire ! weight=w_1
e_2 = ones sire ! weight=w_2
e_3 = ones sire ! weight=w_3
Calculating deregressed proofs (mix99s < dereg.slv) will give solution for the
general mean (Solfix)
Fact. Trt Level N-Obs Solution Factor Trait
1 1 1 15 25.011 mean e_1
1 2 1 10 24.971 mean e_2
1 3 1 5 13.586 mean e_3
and the deregressed proofs (Solani):
sire N-Desc N-Obs deregressed proof
...
12 0 3 22.510 34.257 45.390
13 0 3 45.149 9.6328 38.644
14 0 3 -17.035 -29.146 -75.258
15 0 3 50.363 85.822 118.94
16 0 3 30.240 26.829 -3.4327
...
Solutions were given above only for those animals that have observations, i.e.,
N-Obs more than zero. All other solutions have been set to zero by MiX99.
9 Summary of all commands
CLIM has optional commands, and options to the required commands. If an optional
command is not given then default values are used for this command. In general, the
commands are quite self explanatory. Below they are divided into groups. There are
chapter numbers after the short description. Required commands are explained in
Chapter 9.1 and optional commands in Chapter 9.2.
Data file commands
DATAFILE name of data file, 3.1
DATASORT information on how the data file was sorted, 9.2.2 (optional)
INTEGER integer number column names in the data file, 9.1.2
MISSING code for missing observations, 9.2.9 (optional)
REAL real number column names in the data file, 9.1.6
REGFILE regression coefficient matrix file, 9.2.15 (optional)
TABLEFILE name of the separate covariable table file, 9.2.20 (optional)
TABLEINDEX integer number column name of the covariable table file number
in the data file, 9.2.21 (optional)
TRAITGROUP integer number column name of the trait group number, 9.2.26
(optional)
Pedigree file commands
PEDFILE name of pedigree file, 3.2
PEDIGREE effect/component name in the model to which the pedigree is
attached. Also, type of pedigree information, i.e., model type
(animal or sire model). If random genetic groups, then
coefficient value as well, 9.1.5 (optional)
Variance component information
PARFILE MiX99 variance components file, 3.3
RESIDFILE name of residual (co)variance parameter file, 9.2.18 (optional)
RESIDUAL integer number column name of the residual (co)variance
number, 9.2.19 (optional)
REGPARFILE name of random regression matrix parameter file, 9.2.17
(optional)
Model commands
MODEL statistical model, Chapters 5, 6, 7 and 8
RANDOM random effects in the model, 9.2.14 (optional if additive genetic
and residual effects are the only random effects)
REGMATRIX regression coefficient matrix information, 9.2.16 (optional)
Solving
NORANSOL random effects for which no solution files are to be
written, 9.2.8 (optional)
PARALLEL number of processors used in parallel computing,
9.2.12 (optional)
PRECON preconditioning information, 9.2.13 (optional)
TITLE title of the analysis, 9.2.24 (optional)
TMPDIR directory for the MiX99 temporary files, 9.2.25
(optional)
WITHINBLOCKORDER ordering of effects in the blocks, 9.2.27 (optional)
Macros and range abbreviations
DEFINE defines a text macro replacement, 9.2.28 (optional)
9.1 Required commands
The following sections give the explanation and syntax of each required command except
MODEL, which is considered in Chapters 5, 6, 7 and 8.
9.1.1 DATAFILE
Name of the data file. The file type, TEXT or BINARY, can optionally be given. The default
is a text file, so the file type must always be specified for a binary file.
Syntax:
DATAFILE [TEXT | BINARY] <filename>
Example. Data file Beef_MiX.dat has standard text data.
DATAFILE ../data/Beef_MiX.dat
9.1.2 INTEGER
Names of integer number columns in the data file. These are used to give names to
data file columns.
Syntax:
INTEGER <names of integer variables>
Example. The data file has 7 columns of integer data. The first column is
named block, the second is id, the third is trt_group, etc.
INTEGER block id trt_group HTM AGE DCC DIM
9.1.3 PARFILE
Name of the (co)variance parameter file. The file has the format described for mix99i
in the MiX99 manual Technical reference guide for MiX99 pre-processor.
Syntax:
PARFILE <filename>
Example. Name of the parameter file is Beef.par.
PARFILE ../data/Beef.par
9.1.4 PEDFILE
Name of the pedigree file. This pedigree file is read by mix99i. When the pedigree type
in command PEDIGREE is FILE, the file holds the inverse of the covariance matrix of the
breeding values. The default format for the matrix is the lower triangle co-ordinate sparse
matrix format (see Ch. 7.2). Option MIXED relaxes the requirement of a lower
triangle matrix (see Ch. 7.2): the file may then contain an upper triangle element such as
(1,2), but it must not also contain the corresponding lower triangle element (2,1).
(NEW) Another optional format is the lower triangle dense matrix format, selected with
option LOWER (see Ch. 7.2).
Syntax:
PEDFILE [LOWER | MIXED] <filename>
Example 1. Name of the pedigree file is Beef_phantom.ped.
PEDFILE ../data/Beef_phantom.ped
Example 2. Name of co-ordinate format matrix with upper and lower triangle ele-
ments:
PEDFILE MIXED ../data/iG_matrix.dat
(NEW) Example 3. Name of a lower triangle dense matrix file:
PEDFILE LOWER ../data/iGL_matrix.dat
9.1.5 PEDIGREE
Pedigree type and other information. See MiX99 manual Technical reference guide
for MiX99 pre-processor for the pedigree types. Common pedigree types are am for
animal model and sm for sire model. Autoregressive model has type ar. Genetic
groups are indicated by suffix +p, e.g., am+p for animal model with phantom genetic
groups. Pedigree has been stored in file given by command PEDFILE.
When the pedigree type is FILE, the inverse of the relationship covariance structure (e.g.,
$G^{-1}$ in G-BLUP) is in the file specified by the PEDFILE command. See the example in Ch. 7.2.
Syntax:
PEDIGREE <effect name> <pedigree type> &
[<optional number for random genetic groups>]
Example 1. Pedigree is for an animal model with random genetic groups. Pedigree is
associated with model effect name G. The phantom parent groups are random with
genetic variance coefficient 1.0.
PEDIGREE G am+p 1.0
(NEW) Example 2. G-BLUP model where the inverse of the genomic relationship matrix, $G^{-1}$,
is in file iGL.dat, stored in lower triangle dense matrix format.
PEDFILE LOWER iGL.dat
PEDIGREE G FILE
9.1.6 REAL
Names of real number columns in the data file. Integer number columns are always
before real number columns. See MiX99 manual Technical reference guide for MiX99
pre-processor for more information.
Syntax:
REAL <names of real variables>
Example. There are 3 real number columns (after the integer number columns) in the
data file. The first is B_WEIGHT, the second is W_WEIGHT, and the third is W_AGE.
REAL B_WEIGHT W_WEIGHT W_AGE
9.2 Optional commands
The following commands have default values that are used if the command is not given.
9.2.1 AR
Defines the autoregressive values for each trait in an autoregressive model. When this
command is given, the PEDIGREE type must be ar. Default is no autoregressive
model.
Syntax:
AR <values for each trait>
Example. Autoregressive values for 2 traits
AR 0.8 0.9
9.2.2 DATASORT
Names of the integer number columns for the block sorting variable (BLOCK), and the
animal sorting variable (PEDIGREECODE). See MiX99 manual Technical reference
guide for MiX99 pre-processor for more explanation. By default none are needed.
Syntax:
DATASORT BLOCK=<block in INTEGER column> &
PEDIGREECODE=<code in INTEGER column>
Example. The block code is integer number column block, and the pedigree code is
column animal in the data file
DATASORT BLOCK=block PEDIGREECODE=animal
9.2.3 IA22FILE
(NEW) Gives the file having matrix $A_{gg}^{-1}$ used in the single-step method. The single-step
method needs matrix $C_{GA} = G^{-1} - A_{gg}^{-1}$. The two matrices can be given in a single file
using the IGFILE command, or in two separate files, giving $G^{-1}$ to IGFILE and
$A_{gg}^{-1}$ to IA22FILE.
Giving two separate file names for the components of $C_{GA}$ allows the computations
for the two matrices to be separated. In particular, giving file name PEDIGREE for
IA22FILE tells MiX99 to do the computations for $A_{gg}^{-1}$ using pedigree information,
so this matrix need not be given explicitly in a file.
The default format is the co-ordinate (Yale) sparse matrix format. The preprocessor checks
that each row is in the lower triangle part of the matrix, i.e., that the second column
value (j) is less than or equal to the first column value (i). Not all programs produce
this format, so option MIXED allows the file to be read without the check.
Another allowed format is the lower triangle dense format (see Ch. 7.2), which can be
prepared using the hginv program. For this format, option LOWER needs to be used.
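The lower triangle rule can be checked on a file before it is given to the preprocessor. The following Python sketch is only an illustration and not part of MiX99: it assumes a whitespace-separated text file with one `i j value` triplet per line (the value column and the function name `to_lower_triangle` are assumptions). It mirrors upper triangle elements into the lower triangle and rejects input that gives the same element twice, for example both (1,2) and (2,1).

```python
def to_lower_triangle(lines):
    """Return sorted (i, j, value) triplets with j <= i for every row."""
    seen = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        i, j, value = int(fields[0]), int(fields[1]), float(fields[2])
        if j > i:
            i, j = j, i  # mirror an upper triangle element
        if (i, j) in seen:
            raise ValueError(f"element ({i},{j}) is given more than once")
        seen[(i, j)] = value
    return [(i, j, v) for (i, j), v in sorted(seen.items())]


rows = to_lower_triangle(["1 1 2.0", "1 2 -0.5", "2 2 1.5"])
```

A file rewritten this way satisfies the default check, so option MIXED would not be needed for it.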
Syntax:
IA22FILE [LOWER | MIXED] <filename>
If the filename is PEDIGREE then no file of that name is read but instead the solver will
use pedigree information to do the required computations.
Example. Matrix $A_{gg}^{-1}$ is in file iA22L.dat
IA22FILE LOWER iA22L.dat
9.2.4 IGFILE
Gives the file having matrix $C_{GA} = G^{-1} - A_{gg}^{-1}$ used by the single-step method. If
IA22FILE is given as well, the file given by IGFILE contains $G^{-1}$ for the single-step
method and IA22FILE has $A_{gg}^{-1}$.
The default matrix format is the co-ordinate (Yale) sparse matrix format. The preprocessor
checks that each row is in the lower triangle part of the matrix, i.e., that the second
column value (j) is less than or equal to the first column value (i). Not all programs
produce this format, so option MIXED allows the file to be read without the check.
Another allowed format is the lower triangle dense format (see Ch. 7.2), which
can be prepared using the hginv program. For this format, option LOWER needs to be used.
Syntax:
IGFILE [LOWER | MIXED] <filename>
Example. Matrix $C_{GA}$ is in file iH.dat
IGFILE iH.dat
9.2.5 IHPRECON
(NEW) Gives the file having changes to the preconditioner matrix in the diagonal of the inverse
relationship matrix. For example, consider the inverse relationship matrix of the single-step
method:
$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{gg}^{-1} \end{bmatrix}$$
where $A^{-1}$ is the inverse of the pedigree relationship matrix for all animals, $G^{-1}$ is
the inverse genomic relationship matrix, and $A_{gg}^{-1}$ is the inverse pedigree relationship
matrix of the genotyped animals. When command IA22FILE is used with the PEDIGREE option,
no precalculated diagonal element values of $A_{gg}^{-1}$ are available for the
preconditioner. These values can be given in a file using IHPRECON.
One diagonal element of $-A_{gg}^{-1}$ is given on each line: <ID code> <value>. Note that these
are NOT diagonal elements of $A_{gg}$ nor $A_{gg}^{-1}$ but diagonal elements of $-A_{gg}^{-1}$.
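As an illustration of the file layout, the following Python sketch formats one `<ID code> <value>` line per genotyped animal from diagonal elements of the inverse pedigree relationship matrix of the genotyped animals, changing the sign of each value as the note above requires. The function name `ihprecon_lines`, the six-decimal number format and the example values are assumptions made for this sketch only.

```python
def ihprecon_lines(ids, diag_inv_agg):
    """Format one '<ID code> <value>' line per genotyped animal.

    diag_inv_agg holds diagonal elements of the inverse of A_gg;
    the sign is changed because the file expects the negative values.
    """
    if len(ids) != len(diag_inv_agg):
        raise ValueError("ids and diagonal must have the same length")
    return [f"{animal} {-d:.6f}" for animal, d in zip(ids, diag_inv_agg)]


lines = ihprecon_lines([101, 102], [1.5, 2.25])
```

The resulting lines could then be written to the file named in the IHPRECON command.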
Syntax:
IHPRECON <filename>
Example. Diagonal of matrix $-A_{gg}^{-1}$ is in file minus_diA_genotyped.dat
IHPRECON minus_diA_genotyped.dat
9.2.6 INBREEDING
Column numbers of individual id code and inbreeding coefficient in the inbreeding
coefficient file (see 9.2.7). Default is that inbreeding coefficients are all zero. Column
number of the individual id code must be before the inbreeding coefficient.
Syntax:
INBREEDING PEDIGREECODE=<individual id code column> &
FINBR=<inbreeding coefficient column>
Example. The first column has the individual id code, and the third column has the
inbreeding coefficient.
INBREEDING PEDIGREECODE=1 FINBR=3
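The column numbers can be read as 1-based positions on each line of the inbreeding coefficient file. The Python sketch below shows this reading under the assumption that the file is whitespace-separated text and that id codes are integers; the function name `read_inbreeding` and the example lines are hypothetical and not part of MiX99.

```python
def read_inbreeding(lines, pedigreecode, finbr):
    """Map individual id code to inbreeding coefficient (1-based columns)."""
    if pedigreecode >= finbr:
        raise ValueError("the id code column must come before the coefficient column")
    coefficients = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        coefficients[int(fields[pedigreecode - 1])] = float(fields[finbr - 1])
    return coefficients


f = read_inbreeding(["11 2005 0.0", "12 2007 0.125"], pedigreecode=1, finbr=3)
```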
9.2.7 INBRFILE
Name of inbreeding coefficient file. Default is that all inbreeding coefficients are zero,
and, thus, no inbreeding coefficient file is read.
Syntax:
INBRFILE <filename>
Example. Inbreeding coefficient file is AM.inbr.
INBRFILE AM.inbr
9.2.8 NORANSOL
Give random effects for which no solution files are made. Default is that solutions are
written for all random effects.
Syntax:
NORANSOL <random effects>
Example. No solutions are written for effects HTM and PE.
NORANSOL HTM PE
9.2.9 MISSING
Number indicating missing information for data in the real number columns. Default is
zero.
Syntax:
MISSING <number for missing>
Example. Set missing value to -99999.0
MISSING -99999.0
9.2.10 RESTARTSOL
Makes a restart solution file. The restart solution file allows the solver mix99s to start
iteration from old solutions. By default no file is written. RESTARTSOL is
an option of the MODEL command and is given on the same line as
MODEL.
Syntax:
MODEL RESTARTSOL
Example. Restart solution files requested.
MODEL RESTARTSOL
9.2.11 SCALE
Scaling of observations by the residual standard deviation, which can be important for
multiple trait models. Scaling puts observations from different traits on the same
residual scale, which may lead to numerically better behaving computations. Before
any output is generated, all solutions are scaled back to the original units. Default is
no scaling. SCALE is an option of the MODEL command and is given on the same line
as MODEL.
Syntax:
MODEL SCALE
Example. Scaling is requested.
MODEL SCALE
9.2.12 PARALLEL
Information on parallel computing: number of processors and number of common area
blocks. Alternatively, the common area blocks can be defined by giving the block code
of the first common area block followed by option FIRST.
Optional additional information is the method of work load division and the total maximum
size of the I/O buffers. Default is no parallel computing, i.e., number of processors is zero.
Work is divided either by number of records (default) or by number of equations; giving
letter E or e divides the work among the processors by number of equations. The total size
of the I/O buffers is given as an integer number of megabytes; by default it is determined
by the preprocessor program. See PARALLEL in MiX99 manual Technical reference
guide for MiX99 pre-processor for more information.
Syntax:
PARALLEL <N processors> <N common blocks> [<work division> <buffer size>]
PARALLEL <N processors> <first common block> FIRST [...]
Example. Parallel computing with 6 processors. The last 10 blocks in the data file
belong to the common area blocks.
PARALLEL 6 10
9.2.13 PRECON
Information on the preconditioning. The format is the same as given for mix99i in the MiX99
manual Technical reference guide for MiX99 pre-processor. Thus, the first characters
(one for each effect) are for the within block effects, followed by one character common
to all across blocks effects. Default is block diagonal preconditioner for all effects.
effect type            available preconditioners
within block           d=diagonal, b=block diagonal
across blocks fixed    d=diagonal, b=block diagonal,
                       f=full block, m=mixed block
Note that giving PRECON n leads to use of no preconditioner. See the MiX99 manual
for details.
Syntax:
PRECON <preconditioning information>
Example. Block diagonal preconditioner is used for the 4 within block effects. The
across blocks fixed effects have mixed block preconditioner where the first effect is in
block of its own and the others are in another block.
PRECON b b b b m
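The layout of the preconditioning information can be made explicit with a small decoder. The Python sketch below is hypothetical and not part of MiX99: it assumes that the codes are separated by spaces, that every code except the last belongs to a within block effect, and that the last code applies to the across blocks effects. The letter tables are copied from the table above.

```python
WITHIN_BLOCK = {"d": "diagonal", "b": "block diagonal"}
ACROSS_BLOCKS = {"d": "diagonal", "b": "block diagonal",
                 "f": "full block", "m": "mixed block"}


def describe_precon(spec):
    """Decode PRECON codes into preconditioner names."""
    codes = spec.split()
    if not codes:
        raise ValueError("empty PRECON specification")
    if codes == ["n"]:
        return {"within": [], "across": "none"}  # no preconditioner
    *within, across = codes
    unknown = [c for c in within if c not in WITHIN_BLOCK]
    if unknown or across not in ACROSS_BLOCKS:
        raise ValueError(f"unexpected preconditioner code in {spec!r}")
    return {"within": [WITHIN_BLOCK[c] for c in within],
            "across": ACROSS_BLOCKS[across]}


info = describe_precon("b b b b m")
```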
9.2.14 RANDOM
Names of random effects other than the additive genetic effect associated with the pedigree.
If the command is not given, the only random effects are the additive genetic and residual
effects. The order of the effects gives the numbering of the random effects.
Syntax:
RANDOM <effect names>
Example. There are four random effects: HTM, PE, additive genetic, and residual.
Thus, HTM is random effect number 1, PE is number 2, additive genetics is number 3,
and residual is number 4. These numbers are used in the (co)variance file defined by
PARFILE.
RANDOM HTM PE
or alternatively
RANDOM HTM PE animal
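The numbering rule in this example can be written out as a short Python sketch. It is an illustration only: the function name `number_random_effects` is hypothetical, the additive genetic effect is assumed to be called animal as in the example, and the sketch assumes that an unlisted genetic effect is numbered after the listed effects and that the residual is always last.

```python
def number_random_effects(listed, genetic="animal"):
    """Number random effects in the order used by the example above."""
    names = list(listed)
    if genetic not in names:
        names.append(genetic)  # additive genetic effect follows the listed effects
    names.append("residual")   # residual gets the last number
    return {name: k for k, name in enumerate(names, start=1)}


numbers = number_random_effects(["HTM", "PE"])
```

Both forms of the command in the example give the same numbering, which is the one expected in the (co)variance file defined by PARFILE.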
9.2.15 REGFILE
Regression matrix coefficient file. Commonly SNP coefficient matrix is given as a
regression coefficient matrix file. See also commands REGPARFILE and REGMATRIX.
Syntax:
REGFILE <filename>
Example. Regression matrix in file snp.dat
REGFILE snp.dat
9.2.16 REGMATRIX
Regression matrix information. The information given:
• type:
FIXED Fixed coefficient matrix,
RANDOM Random effects with single common variance,
HETEROGENEOUS Each effect has its own variance
• name: name of the effect
• column number information:
ID = a Column number of individual code is a (optional),
FIRST = b First column of regression coefficients is b,
LAST = c Last column number is c.
• (NEW) SNP marker information (optional):
– Imputation:
IMPUTE = i Missing (integer) marker value i to be imputed.
– Centering:
CENTER Average marker value.
CENTER = r Constant real value r.
CENTER = f Separate centering for markers in file f.
– Scaling:
SCALE = r Constant real value r.
SCALE = f Separate scaling for markers in file f.
– File format:
FORMAT = n|m|s File format is n (or normal) for normal real valued re-
gression coefficients with spaces separating columns, m (or markers)
for SNP markers of integer values ’0’, ’1’, ’2’ and optional missing
value of IMPUTE (with spaces separating columns), or s (squeezed)
for “squeezed” SNP marker values without separating spaces.
• preconditioner type (optional):
PRECON = n|d|b Preconditioner type is n for none, d for diagonal (default),
or b for block diagonal. Block diagonal means trait block by regression
coefficient.
See also commands REGFILE and REGPARFILE.
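The squeezed marker format and centering by the average marker value can be illustrated with a short Python sketch. It is not the MiX99 implementation: the function names are hypothetical, the squeezed string is assumed to hold only marker values (no id column), and the sketch assumes that a missing marker value is imputed to the marker average, which the text above does not state.

```python
def parse_squeezed(row):
    """Split a squeezed marker string such as '0129' into integers."""
    return [int(ch) for ch in row.strip()]


def center_markers(genotypes, missing=None):
    """Center each marker by its average over the observed values."""
    n_markers = len(genotypes[0])
    means = []
    for k in range(n_markers):
        observed = [row[k] for row in genotypes if row[k] != missing]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    # Assumption: a missing value is imputed to the marker average,
    # so it becomes 0.0 after centering.
    return [[0.0 if row[k] == missing else row[k] - means[k]
             for k in range(n_markers)] for row in genotypes]


rows = [parse_squeezed("0129"), parse_squeezed("2100")]
centered = center_markers(rows, missing=9)
```

Here the value 9 plays the role of the missing marker code that would be given with IMPUTE.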
Syntax:
REGMATRIX <type> <name> [ID=<column>] FIRST=<column> LAST=<column>
[IMPUTE=<missing>] [CENTER[={<real value>|<filename>}]]
[SCALE={<real value>|<filename>}] [FORMAT=<file format>]
[PRECON=<preconditioner>]
Example. Regression matrix information: common random variance, name of the effect
is snp, regression coefficients in columns 3 to 10. No column for id code. Block
diagonal preconditioner.
REGMATRIX RANDOM snp FIRST=3 LAST=10 PRECON=b
9.2.17 REGPARFILE
Variance components for a random regression matrix. See also commands REGFILE
and REGMATRIX.
Syntax:
REGPARFILE <filename>
Example. Variance components in file reg.par
REGPARFILE reg.par
9.2.18 RESIDFILE
Residual variance covariance matrix file in case of different residuals for different ob-
servations. If residual file is given then data file must have an integer number column
associated with the residual matrix number. See command RESIDUAL. Default is that
no additional residual variance file is used, i.e., the same residual (co)variance defined
in PARFILE is used for all observations.
Syntax:
RESIDFILE <filename>
Example. Residual (co)variances are in file mix99par.respar
RESIDFILE ./data/mix99par.respar
9.2.19 RESIDUAL
Name of integer number column in the data file indicating number of residual variance
used for this observation (see RESIDFILE). Default is that same residual (co)variance
is used for all observations.
Syntax:
RESIDUAL <integer column name>
Example. Residual variance number is on integer data column ResidualNumber.
RESIDUAL ResidualNumber
9.2.20 TABLEFILE
Name of the covariable table file. See the TABLEINDEX command. Default is that no covariable
table file is used.
Syntax:
TABLEFILE <filename>
Example. Covariable table file is FinTDMpara.cov.
TABLEFILE FinTDMpara.cov
9.2.21 TABLEINDEX
Name of the integer number column in the data file that holds the table index.
Default is no table index. See TABLEFILE.
Syntax:
TABLEINDEX <integer column name>
Example. Index for the covariable table file is in the integer number column DIM of
the data file.
TABLEINDEX DIM
9.2.22 TAFILE
(DEV) Gives the file having the $T_A$ matrix in
$$G^{-1} = \left((1-w) ZZ' + w A_{gg}\right)^{-1} = \frac{1}{w} A_{gg}^{-1} - T_A' T_A$$
used by the single-step method, where $w$ is the residual polygenic proportion. The approach
requires the $A_{gg}^{-1}$ matrix to be given separately using the command IA22FILE, or by
option PEDIGREE.
The format of the $T_A$ matrix file is similar to the lower triangle dense format. However, the
$T_A$ matrix is a rectangular matrix. The $T_A$ matrix file needs to be made by a special
preprocessing program.
Syntax:
TAFILE <filename>
Example. Matrix $T_A$ is in file TA.dat
TAFILE TA.dat
9.2.23 TEFILE
(NEW) Gives the file having the $T_e$ matrix in
$$G^{-1} = (ZZ' + \varepsilon I)^{-1} = \frac{1}{\varepsilon} I - T_e' T_e$$
used by the single-step method, where $\varepsilon$ is a small number such as 0.01. The approach
requires the $A_{gg}^{-1}$ matrix to be given separately using the command IA22FILE, or by
option PEDIGREE.
The format of the $T_e$ matrix file is similar to the lower triangle dense format. However, the
$T_e$ matrix is a rectangular matrix. The $T_e$ matrix file needs to be made by a special
preprocessing program such as Teig_make.
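The form of this inverse follows from the Woodbury matrix identity. The derivation below is a sketch that writes $\varepsilon$ for the small number and $I_m$ for the identity matrix whose order is the number of columns of $Z$. The Cholesky-based choice of $T_e$ shown on the last line is only one possible factorization and is an assumption, not necessarily what Teig_make computes.

```latex
\begin{align*}
(ZZ' + \varepsilon I)^{-1}
  &= \frac{1}{\varepsilon} I
   - \frac{1}{\varepsilon} Z \left( Z'Z + \varepsilon I_m \right)^{-1} Z' \\
  &= \frac{1}{\varepsilon} I - T_e' T_e,
  \qquad T_e = \varepsilon^{-1/2} L^{-1} Z', \quad L L' = Z'Z + \varepsilon I_m .
\end{align*}
```

With this choice $T_e$ has one row per column of $Z$ and one column per genotyped animal, which agrees with the statement that the matrix is rectangular.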
Syntax:
TEFILE <filename>
Example. Matrix $T_e$ is in file Te.dat
TEFILE Te.dat
9.2.24 TITLE
Line for title of the analysis. Default title is:
MiX99 analysis time: <current time and date>
Syntax:
TITLE <Title of the analysis>
Example.
TITLE New model for Simmental birth weight
9.2.25 TMPDIR
Directory for temporary files. Default is current directory.
Syntax:
TMPDIR <directory>
Example.
TMPDIR ./tmpMiX
9.2.26 TRAITGROUP
Name of integer number column for the trait group.
Syntax:
TRAITGROUP <integer column name>
Example. Trait group is in integer number column trait.
TRAITGROUP trait
9.2.27 WITHINBLOCKORDER
Ordering of effects within block. The order of the effect names after the command name
gives the ordering. Default is that only the animal genetic effect is within block.
Syntax:
WITHINBLOCKORDER <effect names>
Example. There are three effects within block. Order of effects within block: 1=G,
2=PE, 3=HerdYear.
WITHINBLOCKORDER G PE HerdYear
9.2.28 DEFINE
(DEV) Defines a macro identifier to be used as an abbreviation for its replacement text:
DEFINE <macro name> <replacement text>
All subsequent occurrences of the identifier are replaced with the replacement text.
This shortens repeatedly occurring text, for example, directory names and common
effects in complex CLIM models.
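The replacement rule can be imitated with a few lines of Python. This is a minimal sketch and not the mix99i implementation: the function name `apply_macros` is hypothetical, and the sketch assumes that a macro name is matched as a whole word, so that a name followed by a path separator is still replaced.

```python
import re


def apply_macros(lines):
    """Apply DEFINE macros to the directive lines that follow them."""
    macros = {}
    expanded = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "DEFINE":
            macros[parts[1]] = parts[2].strip()
            continue  # the DEFINE line itself is not emitted
        for name, text in macros.items():
            # Whole-word match, so HomeDIR in "HomeDIR/mydata.dat" is replaced.
            line = re.sub(r"\b" + re.escape(name) + r"\b",
                          lambda m, t=text: t, line)
        expanded.append(line)
    return expanded


result = apply_macros([
    "DEFINE HomeDIR /home/user",
    "DEFINE WorkDIR .",
    "DATAFILE HomeDIR/mydata.dat",
    "TMPDIR WorkDIR/tmpMiX",
])
```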
In addition to macros, a range expansion can be used as an abbreviation. Every
occurrence of
<identifier><N>:<M>
is replaced by a space separated list of
<identifier><N> <identifier><N+1> ... <identifier><M-1> <identifier><M>
where the identifier is replicated M-N+1 times with integers from N to M. For example
het1:5 is replaced by
het1 het2 het3 het4 het5
Range expansion can be used, for example, to name INTEGER and REAL columns, or
to specify covariable table columns in complex models.
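The range expansion rule can be sketched in Python with a regular expression. This is an illustration under stated assumptions, not the mix99i implementation: the function name `expand_ranges` is hypothetical, the identifier is assumed to start with a letter or underscore, and a range whose last number is smaller than its first is left unchanged.

```python
import re

# <identifier><N>:<M>, e.g. het1:5 or t59:64
_RANGE = re.compile(r"\b([A-Za-z_]\w*?)(\d+):(\d+)\b")


def expand_ranges(text):
    """Replace every <identifier><N>:<M> by the space separated list."""
    def repl(match):
        name = match.group(1)
        first, last = int(match.group(2)), int(match.group(3))
        if last < first:
            return match.group(0)  # assumption: leave a reversed range as it is
        return " ".join(f"{name}{k}" for k in range(first, last + 1))
    return _RANGE.sub(repl, text)


columns = expand_ranges("REAL milk protein fat x1:2 het1:5")
```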
Example:
DEFINE HomeDIR /home/user
DEFINE WorkDIR .
DATAFILE HomeDIR/mydata.dat
PEDFILE HomeDIR/mydata.ped
REAL milk protein fat x1:2 het1:5
DEFINE CurveMILK Curve(t1:3 t4 t96 | SEASON)
DEFINE CurvePROT Curve(t1:3 t95 t97 | SEASON)
DEFINE CurveFAT Curve(t1:3 t95 t98 | SEASON)
DEFINE Common AGE DCC YM HTM
MODEL SCALE
milk = CurveMILK Common PE( t5:10|animal)@1st G(t59:64|animal)@FST
protein = CurvePROT Common PE(t11:16|animal)@1st G(t65:70|animal)@FST
fat = CurveFAT Common PE(t17:22|animal)@1st G(t71:76|animal)@FST
TMPDIR WorkDIR/tmpMiX
is replaced by
DATAFILE /home/user/mydata.dat
PEDFILE /home/user/mydata.ped
REAL milk protein fat x1 x2 het1 het2 het3 het4 het5
MODEL SCALE
milk = Curve(t1 t2 t3 t4 t96 | SEASON) AGE DCC YM HTM &
PE(t5 t6 t7 t8 t9 t10 | animal)@1st &
G(t59 t60 t61 t62 t63 t64 | animal)@FST
protein = Curve(t1 t2 t3 t95 t97 | SEASON) AGE DCC YM HTM &
PE(t11 t12 t13 t14 t15 t16 | animal)@1st &
G(t65 t66 t67 t68 t69 t70 | animal)@FST
fat = Curve(t1 t2 t3 t95 t98 | SEASON) AGE DCC YM HTM &
PE(t17 t18 t19 t20 t21 t22 | animal)@1st &
G(t71 t72 t73 t74 t75 t76 | animal)@FST
TMPDIR ./tmpMiX
Currently, the preprocessor (mix99i) command line option --usemacros is needed to
activate the CLIM macro and range expansion.
10 Appendix: Quick reference card
The following commands are required. Options are in square brackets [ ]. The syntax
and a short explanation of the required commands are in Chapter 9.1, except for command
MODEL, which is considered separately in Chapters 5, 6, 7 and 8. Note that in
CLIM and in the following, symbol '&' marks continuation on the next line.
DATAFILE [TEXT | BINARY] <FileName>
INTEGER <column names>
REAL <column names>
PARFILE <FileName>
PEDFILE <FileName>
PEDIGREE <Effect name> [ FILE | <type of relationship matrix> &
[<coefficient for random genetic group>] ]
MODEL [SCALE] [RESTARTSOL]
<model lines>
Optional commands (see “Optional commands” for more details):
DATASORT BLOCK=<block in INTEGER> PEDIGREECODE=<code in INTEGER>
IGFILE [LOWER|MIXED] <filename>
IA22FILE [LOWER|MIXED] <filename>
MISSING <value for missing real number data>
NORANSOL <random effect numbers without solution file>
PARALLEL <number of processors> <number/first of common blocks> [FIRST]
PRECON <preconditioning information>
RANDOM <random effect names>
REGFILE <filename>
REGPARFILE <filename>
REGMATRIX <type> <name> [ID=<column>] FIRST=<column> LAST=<column>
[IMPUTE=<missing>] [CENTER[={<real value>|<filename>}]]
[SCALE={<real value>|<filename>}] [FORMAT=<file format>]
[PRECON=<preconditioner>]
RESIDFILE <filename>
RESIDUAL <INTEGER column name of the residual variance number>
TABLEFILE <filename>
TABLEINDEX <table index INTEGER column name>
TITLE <title of analysis>
TMPDIR <directory>
TRAITGROUP <trait group INTEGER column name>
WITHINBLOCKORDER <Effect names in the order>
DEFINE <macro name> <replacement text>
Special symbols that cannot be used in user defined names like data file column
names:
#      start of comment
&      symbol for line continuation
" "    string between the quotation marks is read unchanged
!      options follow (on the model line)
( )    parentheses used on model line(s)
11 References
Jairath, L., Dekkers, J.C.M., Schaeffer, L.R., Liu, Z., Burnside, E.B., and Kolstad, B.
(1998). "Genetic evaluation for herd life in Canada". In: J. Dairy Sci. 81.2, pp. 550–
562. DOI: 10.3168/jds.S0022-0302(98)75607-3 (cit. on p. 55).
Lidauer, M., Matilainen, K., Mäntysaari, E.A., Pitkänen, T., Taskinen, M., and Strandén,
I. (2019). Technical reference guide for MiX99 pre-processor. Release XI/2019.
Natural Resources Institute Finland (Luke) (cit. on pp. 7, 62–64, 67).
MiX99 Development Team (2019). MiX99: A software package for solving large mixed
model equations. Release XI/2019. Natural Resources Institute Finland (Luke).
Jokioinen, Finland. URL: http://www.luke.fi/mix99 (cit. on p. ii).
Mrode, R.A. and Thompson, R. (2006). Linear models for the prediction of animal
breeding values. CABI. DOI: 10.1079/9780851990002.0000 (cit. on p. 4).
Schaeffer, L.R. and Dekkers, J.C.M. (1994). "Random regressions in animal models
for test-day production in dairy cattle". In: Proc. 5th World Congr. Genet. Appl.
Livest. Prod. Vol. 18, pp. 443–446 (cit. on pp. 22, 25, 26).
Schaeffer, L.R. (1994). "Multiple-country comparison of dairy sires". In: J. Dairy Sci.
77.9, pp. 2671–2678. DOI: 10.3168/jds.S0022-0302(94)77209-X (cit. on pp. 53,
54).
Schaeffer, L.R. (2001). "Multiple trait international bull comparisons". In: Livest. Prod.
Sci. 69.2, pp. 145–153. DOI: 10.1016/S0301-6226(00)00255-4 (cit. on pp. 55, 58,
59).
Strandén, I. and Vuori, K. (Aug. 2006). "RelaX2: pedigree analysis program". In: Proc.
8th World Congr. Genet. Appl. Livest. Prod. Belo Horizonte, MG, Brazil, pp. 27–30
(cit. on pp. 7, 15).
Index
Index entry styles:
• normal index entry
• CLIM input commands
• file names
• MiX99 input commands
• shell commands
Page numbers:
• primary definition: 76
• also referred: 76
• Z see example on page
+p, 15, 15, 63
across blocks effects, 12, 67
additional residual (co)variance file, 11
additive genetic effect, 1, 4, 5, 10, 14,
16, 17, 21, 36, 45, 68
additive genetic variance, 16
am, 63
Z 3, 14, 16–18, 23, 26–30, 33, 35,
46, 50–52
am+p, 63
Z 15, 36, 37, 58, 63
animal model, 4, 4, 14–16, 19, 22
ApaX, 7, 9
apax99, 1, 7, 11
apax99p, 1, 1, 7, 11
AR, 64
Z 64
ar, 63, 64
beta version, 2, 3, 31, 37
BINARY
Z 62, 74
binary format data file, 7
binary format solution file, 12
BLOCK, 64
Z 64, 74
block code, 7, 8, 9, 64, 67
block diagonal preconditioner, 67, 67
block ordering, 7, 64
blocks, 7
Broyden method, 55, 56, 56
CENTER, 41, 68
Z 41, 69, 74
centering of SNP markers, 41
CLIM, 1
coefficient matrix, 6, 6
combining traits, 34
command file, 2
command language interface, 1
command line options, 3, 11, 37, 73
commands, 61
AR, 64
DATAFILE, 61, 62
DATASORT, 61, 64
DEFINE, 62, 72
IA22FILE, 64
IGFILE, 65
IHPRECON, 65
INBREEDING, 65
INBRFILE, 66
INTEGER, 61, 62
MISSING, 61, 66
MODEL, 61, 62
NORANSOL, 62, 66
optional, 64
PARALLEL, 62, 67
PARFILE, 61, 62
PEDFILE, 61, 63
PEDIGREE, 61, 63
PRECON, 62, 67
RANDOM, 61, 68
REAL, 61, 63
REGFILE, 61, 68
REGMATRIX, 61, 68
REGPARFILE, 61, 69
required, 62
RESIDFILE, 61, 69
RESIDUAL, 61, 70
RESTARTSOL, 66
SCALE, 66
TABLEFILE, 61, 70
TABLEINDEX, 61, 70
TAFILE, 70
TEFILE, 71
TITLE, 62, 71
TMPDIR, 62, 71
TRAITGROUP, 61, 71
WITHINBLOCKORDER, 62, 71
comment sign, 3
common blocks, 67
component names, 14, 16, 17, 21
convergence criteria, 6, 12, 56
Ca, 11
Cd, 11
Cm, 11
Cr, 11
covariable table file, 1, 22, 22, 24, 25,
61, 70
covariance matrix, 22, 29, 36, 42
covariance structure, 16, 16, 27
COVFILE
Z 46
data file, 4, 7, 7–9, 11, 13, 16–26, 30,
32, 35, 38–40, 51, 52, 54,
61–64, 67, 69, 70, 74
DATAFILE, 38, 39, 61, 62
Z 3, 8, 14, 18, 19, 21, 23, 26, 27,
29, 30, 33, 35–37, 39, 44, 46,
50, 52, 54, 57, 60, 62, 72, 74
DATASORT, 61, 64
Z 14, 18, 30, 40, 44, 46, 50, 60,
64, 74
daughter yield deviation, 53
DEFINE, 62, 72
Z 37, 72, 74
deregressed proofs, 55, 57
deregression, 55, 56, 57, 59
design matrix, 4, 4
Development features (DEV), ii, 37, 70,
72
directive file, 1, 2, 3, 31, 37, 56
DYD, daughter yield deviation, 53, 53,
54
executing CLIM, 2
FILE, 44, 63, 63
Z 44, 63, 74
files, 7
covariable tables, 22, 72
data, 7
directive, 2
multiple residual (co)variances, 11
multiple-trait data, 8
pedigree, 8
regression coefficient matrix, 38
solution files, 12
Sol_mn, 12
Solani, 13
Solfnn, 12
Solfix, 13
Solrnn, 12
Solreg, 12
Solreg_mat, 40
variance components, 10
FINBR
Z 16, 46, 50, 51, 65, 66
FIRST
Z 40–42, 67, 69, 74
PARALLEL, 67
REGMATRIX, 68
FIXED, 38, 68
FORMAT, 41, 69
Z 41, 42, 69, 74
G-BLUP, 38, 42, 42, 44, 45, 47, 49, 63
genetic covariance matrix, 4, 5, 28, 32,
33, 53
genomic breeding value, 40, 40, 45–47
genomic marker effect, 38, 38
genomic relationship matrix, 38, 38,
44, 47, 48
HETEROGENEOUS, 68
heterogeneous residual variance, 26
I/O buffer size, 67
IA22FILE, 50, 64, 64, 65, 70, 71
Z 50, 65, 74
ID, 38, 68
Z 40–42, 69, 74
IGFILE, 48–50, 64, 65, 65
Z 49, 50, 65, 74
IHPRECON, 65, 65
Z 65
imake99, 1
imputation, 41, 68
IMPUTE, 40, 68
Z 40, 69, 74
INBREEDING, 65
Z 16, 46, 50, 51, 65, 66
inbreeding coefficient, 15, 15, 45, 48,
65
inbreeding coefficient file, 45, 65, 66
INBRFILE, 66
Z 16, 46, 50, 51, 66
incidence matrix, 4, 16, 42, 45, 48
instruction file, 1, 2, 14, 15, 25, 30, 31
INTEGER, 3, 11, 13, 61, 62, 72
Z 3, 8, 14, 16–19, 21, 23, 26, 27,
29, 30, 33, 35–37, 39, 44, 46,
50, 52, 54, 57, 60, 62, 64, 74
integer number column, 7, 13, 16, 21,
24, 26, 34, 35, 61, 62, 64,
69–71
intercept, 21, 21, 22
IOD, iteration on data, 6
iteration on data, 6
iterative method, 6, 36, 56
lactation curve, 21, 21, 24
lactation model, 36
LAST, 68
Z 40–42, 69, 74
line continuation, 3, 13
LOWER
Z 43, 49, 50, 63, 65, 74
IA22FILE, 64
IGFILE, 49, 65
PEDFILE, 43, 63
LS-model, 12
MACE model, 53, 53
macro, 72
macro and range abbreviations, 37, 72
maternal effect model, 27, 37
maximum number of iterations, 6, 6,
11, 56
memory requirements, 6, 10, 12
MISSING, 2, 8, 61, 66
Z 39, 44, 46, 50, 54, 60, 66, 74
missing effect, 29, 31
missing marker value, 40, 40, 41
missing observation, 51
missing parents, 9
missing value
integer number column, 8
real number column, 8, 66
missing variance component, 29
MiX99_DIR.DIR, 2, 2, 3, 31, 32
mix99i, 1, 1–3, 17, 62, 63, 67, 73
mix99p, 1, 6, 7, 9, 11, 11
mix99s, 1, 6, 7, 11, 11, 40, 55, 56, 66
MIXED
Z 43, 49, 63, 65, 74
IA22FILE, 64
IGFILE, 49, 65
PEDFILE, 43, 63
mixed model equations, 5, 5–7, 42, 55
solving, 6
MODEL, 61, 62, 66, 67, 74
Z 3, 14, 16–19, 21, 23, 26–33,
35–38, 40, 44, 46, 50–52, 54,
57, 58, 60, 66, 67, 72, 74
models
animal, 3, 4, 14
Finnish test-day, 36
G-BLUP, 38
genomic data models, 38
genomic evaluation, 38
maternal effect, 27
multiple-trait, 37
multiple-trait, 5, 29
random regression, 21
reduced rank random regression,
34
repeatability animal, 16
single trait, 4
sire, 19
SNP-BLUP, 38
multiple trait model, 2, 5, 7, 8, 10, 29,
30, 34, 35, 51, 53
multiple trait sire model, 53, 58
nesting, 1, 2, 4, 21, 21, 24, 28
New features (NEW), ii, 11, 40, 43, 50,
63–65, 68, 71
NORANSOL, 2, 62, 66
Z 66, 74
numerator relationship matrix, 4, 27
old solutions, 66
optional commands, 61, 64
PARALLEL, 62, 67
Z 67, 74
parallel computing, 1, 7–9, 62, 67
work load division, 67
PARFILE, 29, 32, 61, 62, 68, 69
Z 3, 14, 18, 19, 21, 23, 26, 27, 29,
30, 33, 35–37, 40, 44, 46,
50–52, 54, 57, 58, 60, 62, 63,
74
PCG, 6, 6, 12, 53, 56
PEDFILE, 43, 47, 49, 61, 63, 63
Z 3, 14, 18, 19, 21, 23, 26, 27, 29,
30, 33, 35–37, 43, 44, 46, 50,
52, 54, 57, 58, 60, 63, 72, 74
PEDIGREE, 15, 44, 61, 63, 63
Z 3, 14–19, 21, 23, 26–30, 33,
35–37, 44, 46, 50, 52, 54, 57,
58, 60, 63, 74
IA22FILE, 50, 64, 65
IHPRECON, 65
pedigree file, 7, 8, 8, 9, 15, 19, 20, 23,
28, 54, 61, 63
PEDIGREECODE, 64
Z 14, 16, 18, 30, 40, 44, 46, 50,
51, 60, 64–66, 74
permanent environment effect, 16, 17,
28, 36
permanent environment variance, 16,
28, 35
phantom parent group, 9, 15, 53, 55,
57, 59, 63
polygenic effect, 45, 46, 47
polygenic variance, 46
PRECON, 62, 67
Z 40, 44, 60, 67, 69, 74
REGMATRIX, 69
preconditioned conjugate gradient
method, 6, 6
preconditioning, 6, 62, 67
block diagonal, 67
preprocessor, 1, 14
RANDOM, 10, 46, 61, 68
Z 17, 18, 28, 29, 36, 40–42, 46,
68, 69, 74
REGMATRIX, 38, 68
random regression function, 22, 32
random regression model, 1, 10, 21,
23, 26–28, 32
multiple trait, 10, 32
range expansion, 72
rank reduction, 36
REAL, 13, 61, 63, 72
Z 3, 8, 14, 16–19, 21, 23, 26, 27,
29, 30, 33, 35–37, 39, 44, 46,
50, 52, 54, 57, 58, 60, 64, 72,
74
real number column, 7, 8, 13, 21, 63
reduced rank model, 36
reduced rank random regression models, 34
REGFILE, 38, 39, 41, 61, 68, 69
Z 40, 42, 68, 74
REGMATRIX, 38, 40, 41, 61, 68, 68, 69
Z 40–42, 69, 74
REGPARFILE, 38, 61, 68, 69, 69
Z 40, 69, 74
regression coefficient file, 38, 39, 68
regression coefficient matrix, 38, 38,
40, 61, 68
regression effect, 4, 4, 12, 14, 21
relationship matrix, 4, 15, 28, 45,
47–49
RelaX2, 7, 15
repeatability model, 21, 34, 35, 51
repeated observations, 22, 22
required commands, 61, 62, 74
reserved characters, 13
RESIDFILE, 26, 61, 69, 70
Z 27, 70, 74
RESIDUAL, 26, 61, 69, 70
Z 27, 70, 74
residual covariance matrix, 4, 5, 11, 33,
35, 53
residual variance, 16, 22, 35
restart solution file, 66
RESTARTSOL, 66
Z 66, 74
SCALE, 2, 66
Z 26, 27, 29–33, 35, 37, 41, 52,
58, 67, 69, 72, 74
REGMATRIX, 41, 69
scaling of SNP markers, 41
secant method, 55, 56
single-step method, 38, 48, 48, 49, 64,
65, 70, 71
sire maternal grand sire relationship matrix, 20
sire model, 4, 7, 19, 19, 20
sm, 63
Z 19, 21
sm+p
Z 54, 57, 60
SNP marker, 38, 39, 39, 40, 42
SNP marker variance, 46
SNP-BLUP, 38, 38, 42
Sol_mn, 12
Solani, 12, 13, 13, 14, 16, 18, 20, 21,
24, 27, 29–32, 34, 36, 44–46,
50, 52, 54, 57, 58, 60
Solfnn, 12, 40
Solfix, 12, 13, 14, 16, 18–21, 24, 27,
29–33, 36, 44, 46, 50, 52, 54,
57, 58, 60
Solrnn, 12, 18, 29, 46
Solreg, 12, 24, 26, 27, 34
Solreg_mat, 40
solution files, 2, 12
solver, 7, 11, 12, 14
TABLEFILE, 22, 24, 61, 70, 70
Z 26, 27, 36, 70, 74
TABLEINDEX, 22, 24, 61, 70, 70
Z 26, 27, 36, 70, 74
TAFILE, 70
Z 70
TEFILE, 71
Z 71
temporary files, 62, 71
test-day model, 21, 22, 25, 32, 36
TEXT
Z 62, 74
text file, 7
TITLE, 2, 62, 71
Z 54, 60, 71, 74
TMPDIR, 2, 62, 71
Z 71–74
trait group, 51, 51, 53, 71
TRAITGROUP, 61, 71
Z 52, 54, 71, 74
VanRaden method 1, 44, 49
variance components, 6, 10, 10, 11, 19
WEIGHT, 20
Z 21, 54, 57, 58
weighted sire model, 20
weights, 7, 20
within block effect, 7, 67, 71
within block fixed effect, 12, 12
WITHINBLOCKORDER, 7, 62, 71
Z 37, 71, 74
yHat.data0, 40