Jukuri, open repository of the Natural Resources Institute Finland (Luke) All material supplied via Jukuri is protected by copyright and other intellectual property rights. Duplication or sale, in electronic or print form, of any part of the repository collections is prohibited. Making electronic or print copies of the material is permitted only for your own personal use or for educational purposes. For other purposes, this article may be used in accordance with the publisher’s terms. There may be differences between this version and the publisher’s version. You are advised to cite the publisher’s version. This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail. Author(s): Diego Rondon, Samu Mäntyniemi, Jouni Aspi, Laura Kvist, Mikko J. Sillanpää Title: A Bayesian multi-state model with data augmentation for estimating population size and effect of inbreeding on survival Year: 2024 Version: Published version Copyright: The Author(s) 2024 Rights: CC BY 4.0 Rights url: https://creativecommons.org/licenses/by/4.0/ Please cite the original version: Diego Rondon, Samu Mäntyniemi, Jouni Aspi, Laura Kvist, Mikko J. Sillanpää, A Bayesian multi-state model with data augmentation for estimating population size and effect of inbreeding on survival, Ecological Modelling, Volume 490, 2024, 110662, ISSN 0304-3800, https://doi.org/10.1016/j.ecolmodel.2024.110662. https://creativecommons.org/licenses/by/4.0/ Ecological Modelling 490 (2024) 110662 A 0 Contents lists available at ScienceDirect Ecological Modelling journal homepage: www.elsevier.com/locate/ecolmodel A Bayesian multi-state model with data augmentation for estimating population size and effect of inbreeding on survival Diego Rondon a,∗, Samu Mäntyniemi b, Jouni Aspi c, Laura Kvist c, Mikko J. Sillanpää a a Research Unit of Mathematical Sciences, University of Oulu, Oulu, FI-90014, Finland b Natural Resources Institute Finland, Helsinki, FI-00790, Finland c Research Unit of Ecology and Genetics, University of Oulu, Oulu, FI-90014, Finland A R T I C L E I N F O Dataset link: https://etsin.fairdata.fi/ Keywords: Data augmentation Hidden Markov model Inbreeding Multi-state model Survival A B S T R A C T A joint model framework for estimating population sizes over time and survival probabilities while considering inbreeding and age as covariates in the survival function is elaborated. This methods is tested with data simulated over two small (𝑁 ≈ 150) close to extinction open populations, that aims to imitate wild individuals under decline and bottleneck dynamics. A Hidden Markov Model (HMM) perspective with a multi-state formulation that considers young and adult individuals is applied with data augmentation to account for non-seen individuals. The transition probabilities are estimated, and different treatments of the covariates and levels of data seen are compared. Our results suggest that the model framework correctly estimates different population size trends but the parameter estimation is challenging in some cases. The proposed model presents a new way to use the inbreeding coefficient, and further implementations on a real conservation data of wild species can help the decision-making process for the management of small populations. 1. Introduction High effort in the implementation of conservation management in small populations is required to avoid extinction (Pimm et al., 2014; Ripple et al., 2017). Usually, there is a lack of information on various aspects of population dynamics, such as the number of individuals, sur- vival rate, and fecundity amounts. In addition, environmental (Lande, 1993) and societal factors such as hunting (Leclerc et al., 2015) or urbanization (Benson et al., 2019) bring big challenges (Ovaskainen and Meerson, 2010) and increase the uncertainty on the actions and decisions that should be taken for conservation. When a population size becomes small, the probability of inbreeding and inbreeding depression increases, leading to a rise in mortality, and a decrease in reproduction, further reducing the population size (Blomqvist et al. 2010; Godwin et al. 2020). This phenomenon is known as the extinction vortex (Frankham et al., 2017). How to es- cape or avoid it is an active area of research on the conservation of species. Thus far, several practices have been employed, such as the reintroduction of new individuals, assisted gene flow or migration to reduce inbreeding and increase genetic variation (Armstrong and Seddon 2008; Aitken and Whitlock 2013; Fitzpatrick et al. 2020) or supplementary feeding to reduce mortality (Robb et al., 2008). It is still not clear how inbreeding information and population dynamics are related (Keller and Waller, 2002), since previous research ∗ Corresponding author. E-mail address: diego.rondonbautista@oulu.fi (D. Rondon). has focused on quantifying the effect of inbreeding on various mea- sures of population viability (Leimu et al., 2006). For example, Marr et al. (2006) suggested a relationship between inbreeding and envi- ronmental factors that reduce the hatching success of Melitaea cinxia. Domingue and Teale (2007) established that in a beetle population (Ips pini), the number of offspring decreases as the inbreeding coeffi- cient increases. Bozzuto et al. (2019) showed an inverse relationship between inbreeding and population growth of the Capra ibex, alongside other authors such as Latter and Robertson (1962) and Saccheri et al. (1998). An exception is a study by Armstrong et al. (2021), where a complex multi-step model uses inbreeding information to calculate sur- vival probabilities and population sizes of North Island Robin (Petroica longipes) in New Zealand. However, there is still a lack of understanding of how inbreeding is relevant to the dynamics of the populations. One of the difficulties when considering inbreeding levels as part of the dynamics of the wild population has been that pedigree data of several generations are not often available. Nowadays, this can be overcome by estimating inbreeding levels based on whole genome sequencing, where the in- breeding level can be estimated from the DNA sample of an individual (Nietlisbach et al., 2019). In this study, we aimed to combine inbreeding and population dynamics to jointly estimate the evolution of population size and sur- vival probabilities. The work was conducted using an open simulated vailable online 29 February 2024 304-3800/© 2024 The Author(s). Published by Elsevier B.V. This is an open access a https://doi.org/10.1016/j.ecolmodel.2024.110662 Received 16 October 2023; Received in revised form 16 February 2024; Accepted 1 rticle under the CC BY license (http://creativecommons.org/licenses/by/4.0/). 6 February 2024 https://www.elsevier.com/locate/ecolmodel https://www.elsevier.com/locate/ecolmodel https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ https://etsin.fairdata.fi/ mailto:diego.rondonbautista@oulu.fi https://doi.org/10.1016/j.ecolmodel.2024.110662 https://doi.org/10.1016/j.ecolmodel.2024.110662 http://crossmark.crossref.org/dialog/?doi=10.1016/j.ecolmodel.2024.110662&domain=pdf http://creativecommons.org/licenses/by/4.0/ Ecological Modelling 490 (2024) 110662D. Rondon et al. b i D a o o population where different values for the effect of inbreeding and age were proposed resulting in two distinctive scenarios, namely, Decline and Bottleneck populations. We used a Jolly–Seber model with a multi- state formulation, also known as a Hidden Markov Model (HMM) (Zucchini et al., 2016), This model has the advantage of separating the effect of males and females. As opposed to a previous study by Armstrong et al. (2021), which also considered inbreeding, we could model the population without assuming conditions on the first captures, as is commonly the case for wild open populations. Different meth- ods are used to model the transition probabilities while considering different levels of data seen. Model performance was evaluated via the Mean Square Error (MSE) between the estimated population size against the true simulated value. The proposed model uses a Bayesian framework, which can easily incorporate any prior information in the dynamics of the population and it works with Data Augmentation (DA) to account for the unseen individuals (Kéry and Schaub, 2011; Royle and Dorazio, 2012). This work proposes a new way to consider the use of inbreeding in estimating survival probabilities on an individual level. Additionally, we present insights on how to extend the use of the model while using different covariates and gaining more information on how they affect the survival probability at different life stages of an individual. 2. Material and methods Here, we first present and formulate a multi-state model, where continuous covariates serve to parameterize the survival probability. Then, we test the performance of the method using simulated data. The presented model uses input data extracted from random samples belonging to two distinct scenarios that represent populations close to extinction and which are thus most likely to be of conservation concern and suffer from inbreeding. These two scenarios are hereafter named Decline and Bottleneck. To mitigate the effects of sampling variation, the simulation analyses are replicated multiple times. This model and simulation framework are not specialized for any specific species. Instead, it is designed towards optimizing the reproductive potential of all individuals, overlooking species-specific behaviors that would otherwise reduce inbreeding and influence mate selection. 2.1. Multi-state model for data analysis A multi-state model is used to describe the population dynamics (Lebreton and Cefe, 2002; Kéry and Schaub, 2002). It is interpreted as a Hidden Markov Model (HMM) given that it can be divided into hidden and observed processes. In this framework, the life of an individual is summarized as a set of possible discrete hidden states 𝐙 that represents a succession of life stages, and it follows a Markov process, meaning that the present state of an individual 𝑖 at generation 𝑡 depends only on P(𝑧𝑖,𝑡|𝑧𝑖,𝑡−1). The observed outcome depends only on the present hidden state through P(𝑦𝑖,𝑡|𝑧𝑖,𝑡) and refers to the observation or emission probability (Rabiner, 1989). States of the individuals and possible transitions between hidden states can be seen in Fig. 1. The hidden states can be denoted numeri- cally and interpreted as: • 𝑧 = 1: refers to the unborn state. It is possible to move from this state to any living state 𝑧 = [2,… , 5]. Moving to any young state refers to a newborn individual. Moving to an adult state can occur under two scenarios, either the individual was not detected as young, or, the individual is an immigrant. • 𝑧 = 2 or 𝑧 = 4: are the young states of females and males, respectively. From here, an individual can survive and make a transition to the adult stage. We assume that the time scales of the observation process and of the biological process are the same and thus it is not possible to observe an individual several times 2 as a young. • 𝑧 = 3 or 𝑧 = 5: are the adult stages for females and males, respectively. An individual remains in this stage until it dies. • 𝑧 = 6: marks an individual as dead and it is an absorbing state. The transition probability between hidden states is modeled by a transition matrix 𝛺𝑖,𝑡, that represents all the possible changes of individual 𝑖 between generations 𝑡 and 𝑡 + 1. The matrix is individual and generation specific, the sum of all independent rows must add up to one. As seen below: 𝛺𝑖,𝑡 = 1 2 3 4 5 6 ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ 1 𝑠1,𝑡 𝑠2,𝑡 𝑠3,𝑡 𝑠4,𝑡 𝑠5,𝑡 0 2 0 0 𝜙𝑓𝑦;𝑖 0 0 1 − 𝜙𝑓𝑦;𝑖 3 0 0 𝜙𝑓𝑎;𝑖,𝑡 0 0 1 − 𝜙𝑓𝑎;𝑖,𝑡 4 0 0 0 0 𝜙𝑚𝑦;𝑖 1 − 𝜙𝑚𝑦;𝑖 5 0 0 0 0 𝜙𝑚𝑎;𝑖,𝑡 1 − 𝜙𝑚𝑎;𝑖,𝑡 6 0 0 0 0 0 1 . (1) The transition probability between two unborn states (𝑧𝑡 = 1 → 𝑧𝑡+1 = 1) for adjacent generations is logit(𝑠1,𝑡+1) ∼ N(𝜂𝑡, 1), where 𝜂𝑡 = logit−1(𝑠1,𝑡) is the back-transformed entry probability of the previous generation, a random variation is achieved by the use of a standard nor- mal distribution centered on previous value, the link function is defined as logit(𝑠) = log( 𝑠 1−𝑠 ) and with the inverse as logit−1(𝑠) = log( exp(𝑠) exp(𝑠)+1 ). The use of the inverse function makes the arguments of the smooth process on the same range, avoiding numerical problems. By linking the transition probabilities of subsequent generations, that is, using a first order random walk smoothing (Lang and Brezger, 2004), we create a more stable model and avoid abrupt jumps in the population size. The remaining transition probabilities between an unborn state and any alive state are assumed to follow a Dirichlet distribution, 𝑠2∶5,𝑡 ∼ 𝐷𝑖𝑟(K, 𝛼), which is then scaled to the range [0, 1−𝑠1,𝑡]. The true outcome of an individual can take the values of: 𝑦 = 1 not seen (NS), 𝑦 = 2 seen as young (SY), or 𝑦 = 3 seen as an adult (SA). This is represented y a succession of discrete states that depend on the true state of the ndividual (P(𝑦𝑖,𝑡|𝑧𝑖,𝑡)). The probability of a certain result depends on the observation matrix 𝛩𝑖,𝑡, and it is independent of the population size and number of augmented individuals, this is to take full advantage of the Bayesian approach and gain more modeling freedom. In practice, however, considering underlying occurrence probabilities, which give rise to the observed proportions (N/M), may lead to roughly similar estimates as using observed proportions directly. Here the rows repre- sent the current hidden state at a given generation and the columns the observable outcome, as follows: 𝛩𝑖,𝑡 = 𝑁𝑆 𝑆𝑌 𝑆𝐴 ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ 1 1 0 0 2 1 − 𝑝𝑓𝑦;𝑖,𝑡 𝑝𝑓𝑦;𝑖,𝑡 0 3 1 − 𝑝𝑓𝑎;𝑖,𝑡 0 𝑝𝑓𝑎;𝑖,𝑡 4 1 − 𝑝𝑚𝑦;𝑖,𝑡 𝑝𝑚𝑦;𝑖,𝑡 0 5 1 − 𝑝𝑚𝑎;𝑖,𝑡 0 𝑝𝑚𝑎;𝑖,𝑡 6 1 0 0 . (2) Some assumptions of the model based on the structure of the transition matrix include (1) perfect classification of young and adults, and (2) no observation of dead individuals. A categorical distribution is used to model the transition between the hidden states (𝑧𝑖,𝑡 → 𝑧𝑖,𝑡+1) and their emission to the observable states (𝑧𝑖,𝑡 → 𝑦𝑖,𝑡,), allowing to write the model as: 𝑧𝑖,𝑡+1|𝑧𝑖,𝑡, 𝛺 ∼ 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑐𝑎𝑙 ( 𝛺𝑧𝑖,𝑡 ,1∶6,𝑖,𝑡 ) , (3) 𝑦𝑖,𝑡|𝑧𝑖,𝑡, 𝛩 ∼ 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑐𝑎𝑙 ( 𝛩𝑧𝑖,𝑡 ,1∶3,𝑖,𝑡 ) . (4) ata Augmentation (DA) is used to represent the individuals that re alive but non-observed (Royle and Dorazio, 2012). By adding bservations that do not reflect information, we created a data set f individuals seen and augmented. This is done in such a way that Ecological Modelling 490 (2024) 110662D. Rondon et al. Fig. 1. Proposed multi-state model. White circles represent hidden states and gray boxes are observed states. Solid lines (dashed) are observation (transition) probabilities, respectively. Indices for individual and time are removed to facilitate interpretation. Fig. 2. Representation of the data augmentation. The augmented data is filled with non-observed individuals (coded as 1) in the capture history. The covariates (inbreeding coefficient and age) are sampled as pairs from observed individuals. The horizontal red line represents the division between the true but not seen and the augmented individuals. the augmentation does not introduce any information into the model (Kéry and Schaub, 2011). In this case, a certain number of unborn and unseen individuals are added to the observed (𝑦 = 1) and hidden data 3 𝑖,𝑡 (𝑧𝑖,𝑡 = 1), the covariate information about age and inbreeding for non- observed individuals are jointly sampled at random from observed part of the data and their information is used to create the augmented data set. We consider that not augmenting the covariates or augmenting the covariates with missing values (NA) would easily lead to a biased esti- mation. In contrast, our augmentation scheme makes the assumption that the unobserved population follows the same distribution as the observed one, as shown in Fig. 2. There is no correct answer of how many individuals need to be added for data augmentation to work, here, a fixed value of 600 individuals was used. It is possible to inspect the posterior distribution of the population size to gain information on this issue, given that a too small size of augmented data may lead to underestimating the population size, while too large values will increase unnecessarily computational time (see supplement material). 2.1.1. Survival probabilities It is of interest to examine if the structure of the covariates have an effect on the estimation of the population dynamics, specifically on the population size, survival and entry probabilities. Therefore, three different ways of treating the covariates of survival probability are proposed and presented in Table 1. First, a model where no covariates are used and a common parameter is applied for all individuals 𝑖, this representation is the most simple model with the proposed states and aims to work as a reference model. Second, a model where the covariates age (age𝑖,𝑡) and inbreeding level (F𝑖) are modeled for indi- vidual 𝑖 at generation 𝑡, common parameters are used for males (𝜙𝑚;𝑖,𝑡) and females (𝜙𝑓 ;𝑖,𝑡). Third, a model where the covariates are scaled in order to facilitate numerical sampling (Lesaffre and Lawson, 2012), this transformation of the parameter will improve the mixing properties of the MCMC, and needs a shorter MCMC sample to obtain better estimates; this is carried out in following way age′𝑖,𝑡 = log(agei,t + 1) and F′ = exp(F ), and different parameters are used to model the survival 𝑖 𝑖 Ecological Modelling 490 (2024) 110662D. Rondon et al. s o s w P S b I p P H P T o d m t P N i M i p p t b o c o a i z M w B i a o Table 1 Survival modeling. No covariates logit(𝜙𝑖) = 𝛽0 No scaling logit(𝜙𝑓𝑦;𝑖,𝑡) = logit(𝜙𝑚𝑦;𝑖,𝑡) = 𝛽0 + 𝛽1F𝑖 logit(𝜙𝑓𝑎;𝑖,𝑡) = logit(𝜙𝑚𝑎;𝑖,𝑡) = 𝛽0 + 𝛽1F𝑖 + 𝛽2age𝑖,𝑡 Scaling logit(𝜙𝑓𝑦;𝑖,𝑡) = 𝛽0;𝑓 + 𝛽1;𝑓F′𝑖 logit(𝜙𝑚𝑦;𝑖,𝑡) = 𝛽0;𝑚 + 𝛽1;𝑚F′𝑖 logit(𝜙𝑓𝑎;𝑖,𝑡) = 𝛼0;𝑓 + 𝛼1;𝑓F′𝑖 + 𝛼2;𝑓 age′𝑖,𝑡 logit(𝜙𝑚𝑎;𝑖,𝑡) = 𝛼0;𝑚 + 𝛼1;𝑚F′𝑖 + 𝛼2;𝑚age′𝑖,𝑡 Three different cases are proposed to model the survival probability of males and females: (i) No covariates, (ii) No scaling, and (iii) scaling. probability of males and females. While working with wild individu- als, the effect of sex(female/male) and age category(young/adult) are important to consider, their modeling is often assumed to have an additive or multiplicative effect over the survival probability. This is a strong assumption, therefore relaxing this assumption and assigning separate coefficients allows us to have a complete view of how the model performs estimation over the survival function. 2.2. Model building Consider a capture-recapture data set for M individuals, from which 𝑁 are observed (𝐘) and M−N are augmented (𝐘′), and in the matrix 𝐘̃ = [𝐘,𝐘′]′ = [𝐲1, 𝐲2,… , 𝐲N, 𝐲′N+1,… , 𝐲′M]′ each row provides infor- mation of temporal observations of an individual for a window of a study period T, where 𝐭 = {1, 2,… ,T} and 𝐲𝑖 = [𝑦𝑖,1, 𝑦𝑖,2,… , 𝑦𝑖,T]. Let Z be an arrangement of possible hidden states for the M individuals, in a similar way, 𝐙 = [𝐳1, 𝐳2,… , 𝐳N,… , 𝐳M]′ and 𝐳𝑖 = [𝑧𝑖,1, 𝑧𝑖,2,… , 𝑧𝑖,T]. Given a set of covariates 𝐗̃ = [𝐗,𝐗′]′ that are observed (𝐗) for the first N individuals and augmented (𝐗′) for the reminding M−N, it is assumed a direct influence on the survival of an individual, therefore the observed values allows us to build part of the transition matrix (𝜴). The posterior distribution of a Hidden Markov Model can be represented in the following way: P(𝐙,𝜴,𝜣,𝐗′ |𝐘,𝐗) ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ Posterior ∝ P(𝐘̃|𝐙,𝜣) ⏟⏞⏞⏟⏞⏞⏟ Likelihood P(𝐙|𝜴) P(𝜣) P(𝜴|𝝋, 𝐗̃) P(𝝋|𝐗̃)P(𝐗′ |𝐗). ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ Joint prior (5) Here, on the right hand side, the first term represents the complete like- lihood (probability of observing a specific pattern 𝐘̃ given the hidden states Z and emission matrix 𝜣), the second term is the relationship between the life history of the individuals and the respective transitions over time (probability of hidden states given the transition matrix 𝜴), thirdly the given prior distribution for the emission matrix (𝜣) and the remaining terms are the prior distributions needed it to build the transition matrix, where 𝝋 = {𝐒,𝝓, 𝛿, 𝜖} is the joint distribution of the entry probabilities (S), survival coefficients (𝝓), with their variance as a hyperparameter (𝛿) and smoothing variance as another hyperparameter (𝜖). We can now give more detail. Let 𝐩𝑖,𝑡 be an arrangement of obser- vation probabilities for individual 𝑖 at time 𝑡. It is possible to construct the prior contribution of a single individual to the observation matrix P(𝜣𝑖) = ∏T 𝐭=1 P(𝐩𝑖,𝑡) and expand the observed contribution to the likelihood across all time points for a single individual 𝑖 to: P(𝐲𝑖|𝐳𝑖, 𝛩𝑖) = T ∏ 𝐭=1 P(𝑦𝑖,𝑡|𝑧𝑖,𝑡,𝜣𝑖,𝑡). (6) Considering the set of covariates from the observed and augmented data 𝐗 that can be individual specific, such as inbreeding level and age (in temporal scale), it is possible to model the survival probabilities of 4 the transition matrix (𝜴) for a given individual 𝑖 at time 𝑡 using GLM t with a logit link function, logit(𝝓𝑖,𝑡) = 𝐗𝜷. A hyperprior (𝛿) is used to allow sharing of information between the coefficients 𝜷. That is, in their prior distributions, 𝜷’s are realizations of a common distribution with unknown (estimated) variance 𝛿. The probabilities of any individual to enter the system, or entry probabilities, at a given time 𝑡 are governed by a set of variables 𝐒𝑡 = [𝑠1,𝑡,… , 𝑠5,𝑡], where ∑5 𝑗=1 𝑠𝑗,𝑡 = 1. The smooth process is controlled by a random walk prior logit(𝑠1,𝑡+1) ∼ N(𝜂𝑡, 1), where 𝜂𝑡 = logit−1(𝑠1,𝑡). Given this, the reminder probabilities are estimated, 𝑠2∶5,𝑡 ∼ 𝐷𝑖𝑟(1, 1, 1, 1), and caled each them by a factor (1 − 𝑠1,𝑡) so that they jointly sum up to ne. Now it is possible to factorize the prior distribution of the hidden tates by considering all possible generations of a single individual 𝑖, here on the first occasion the contribution is taken at random: (𝐳𝑖|𝜴𝑖) = P(𝑧𝑖,1) T ∏ 𝐭=2 P(𝑧𝑖,𝑡|𝑧𝑖,𝑡−1, 𝛺𝑖,𝑡). (7) imilarly, the prior contribution of the transition matrix is decomposed etween the parameters that allow estimating the survival probability. n this case, they are the covariates X and the group of other required arameters 𝝋, in the following way: (𝜴|𝝋, 𝐗̃) = P(𝛺𝑖,1|𝜑1, 𝐗̃) T ∏ 𝐭=2 P(𝛺𝑖,𝑡|𝜑𝑡, 𝐗̃). (8) ere the prior distribution of 𝝋 is: P(𝝋|𝐗̃) = P(𝐒,𝝓, 𝜖, 𝛿|𝐗̃) = (𝑠1,1|𝜖)P(𝜖)P(𝑠2∶5,1|𝑠1,1)P(𝝓𝑖,1|𝐗̃, 𝛿) T ∏ 𝑡=2 P(𝑠1,𝑡|𝑠1,𝑡−1)P(𝑠2∶5,𝑡|𝑠1,𝑡)P(𝝓𝑖,𝑡|𝐗̃, 𝛿). (9) he last step on building the joint prior is to write the dependency f the augmented and observed set of covariates P(𝐗′ |𝐗), which is escribed on more detail in Section 2.1. Finally, considering the contribution across all (observed and aug- ented) individuals, allows us to write the complete data likelihood of he proposed model as: (𝐘̃|𝐙,𝜣) ∝ N ∏ 𝑖=1 P(𝐲𝑖|𝐳𝑖,𝜣𝑖) ⏟⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏟ Observed ⋅ M ∏ j=N+1 P(𝐲′𝑗 |𝐳𝑗 ,𝜣𝑗 ) ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ Augmented . (10) ote that in Eq. (10) we assume that individuals are conditionally ndependent given 𝐙 and 𝜣. This is a typical assumption in Hidden arkov Models. A graphical representation of the model is presented n Fig. 3. A benefit of the Bayesian framework is that in addition to obtaining osterior distributions for all the model parameters, we can obtain osterior distributions for all their deterministic functions. In this case, he number of individuals does not belong to the set of parameters, ut we can still obtain its posterior distribution as a direct function f the model parameters. This can be calculated from the MCMC hain simply by monitoring the function that estimates the number f individuals that exist outside the unborn state at every generation, s follows N(𝑡) = ∑𝑀 𝑖=1 𝟏{𝑍𝑖,𝑡>1 ⋂ 𝑍𝑖,𝑡<6}, where 𝟏{𝑍𝑖,𝑡>1 ⋂ 𝑍𝑖,𝑡<6} is an ndicator variable which equals one if the 𝑖th individual is alive, and is ero otherwise. Samples from the target posterior distribution are generated via arkov Chain Monte Carlo (MCMC) simulation methods using a soft- are package called Numerical Inference for statistical Models for ayesian and Likelihood Estimation (NIMBLE) (de Valpine et al., 2017) n R (R Core Team, 2021). Descriptions of used prior distributions re collected in Table 2, where the truncated beta distribution for the bservation probability ensures that there are observations at every ime point, avoiding sampling of low values. Ecological Modelling 490 (2024) 110662D. Rondon et al. Fig. 3. Model representation. Gray block represents observed data, white oval represents hidden parameter of the model. Solid arrows indicate statistical dependency and dotted arrow a functional relationship. Table 2 Prior specification for key parameters. Parameter Prior distribution Informativeness Survival coefficient 𝝓 N(0, 10) Weak Observation probability 𝐩𝑖,𝑡 TBeta(10, 10)a Slight Hyperpriorb 𝛿 Gamma(2, 2) Weak Regression parameters 𝜷 N(0, 𝛿) Weak Transition from unborn 𝑆2∶5,𝑡 Dir(1, 1, 1, 1) Weak a Truncated Beta on interval (0.2, 1]. b Used in no covariates and no scaling model. 2.3. Simulation of test scenarios The simulation framework builds upon Abadi et al. (2010), incorpo- rating adjustments to account for age and inbreeding when calculating survival probabilities and the number of offsprings. It is important to note that the simulated population does not intend to accurately mirror any particular real-world species due to the inherent complexity of actual interactions in nature. Both female and male individuals are simultaneously simulated, a single generation time is defined as a complete iteration through the three procedural steps, and after one generation, young ones are marked as adults. as follows: Step 1. Immigration: The number of new females and males that enter the population is defined as a Poisson random variable with a constant mean across all the simulations. New individuals are unrelated to individuals already in the population. Step 2. Reproduction: The simulation allows each male to reproduce with multiple females. The mating pair selection is made at random and only occurs between adults. Once an assignment is made, the number of offspring to enter the system as young fol- lows a Poisson distribution where the mean is a function of age and inbreeding of the male and female pair, the higher the age or inbreeding coefficient, the fewer offspring will be produced. A new inbreeding coefficient is calculated from pedigree data using the tabular method, taking into account the coancestry between simulated individuals (Caballero, 2020). Step 3. Survival: Survival probability of individual 𝑖 at generation 𝑡 of females (𝜙𝑓 ) and males (𝜙𝑚) is modeled with a generalized linear model (GLM) and using the inbreeding coefficient (F𝑖) and age (age𝑖,𝑡) as covariates. High values of age or inbreeding will reduce survival probability. Fig. 4 presents a graphical representation of the simulation and sampling procedure. 5 Table 3 Initial conditions and parameter specifications for the simulations. (a) Initial conditions for both scenarios Number of females 15 Number of males 15 Immigration rate Poisson(0.2) Initial inbreeding Beta(5, 5) Immigrants’ inbreeding Beta(1, 6) (b) Coefficient of the survival probability for both scenarios Decline Bottleneck Intercept 7 8 Inbreeding −0.2 −3 Age −2.2 −4 (c) Coefficient of the reproduction function for both scenarios Female Male Intercept 2 2 Inbreeding −0.8 −0.9 Age −0.4 −0.6 2.3.1. Simulated scenarios Two different scenarios are proposed by changing the effects of the survival function. The initial values of the parameters of each scenario are given in Table 3. Both scenarios aim to represent small populations close to extinction, at a stage where inbreeding plays a significant role in an extinction vortex, where decreased genetic diversity and increased inbreeding cause an inbreeding depression that together with demographic stochasticity and genetic drift lead to decreased fitness via decreased survival and reproduction, further decreasing population size (Caughley, 1994). First, the Decline population scenario had a survival function with more weight on the effect of age than of inbreeding and 52 generations were simulated. Second, the Bottleneck population scenario had age and inbreeding with similar effects, and 8 individuals were manually added at generations 42 and 43 to avoid extinction and 60 generations were simulated. Initial inbreeding parameters were sampled randomly from a proposed distribution. The immigration rate is the same for both scenarios. Fig. 5 presents the evolution of the population size and inbreeding coefficient of the simulation, we use individuals from generations 33 to 52 as our final data set since it presents the population pattern that are of interest while allowing the simulated inbreeding level of each individual to be the outcome of an interaction, discarding the initial random values that are given at the start of the simulation, working as a ‘‘burn-in’’ period. Once all individuals are simulated, 50 data sets are built for three different levels of data seen at 40%, 60%, and 80%. 2.3.2. Model assessment From each simulated scenario, 50 replicated data sets were built cor- responding to different levels of observation. The size of the observed data was variable since sampling occurs at random. The goal of the data replicates is to inspect the stability of the posterior distributions, avoid finding results by chance, check how often the posterior mean will deviate from the true trend, and to monitor the posterior convergence. Every data set was analyzed under the same model characteristics. Deviation of the posterior mean compared to the true simulated value of population size can be captured by estimating the mean square error (MSE). The calculation takes the average across all MCMC chains (n) of the replicates. Comparing the square difference between the true simulated value ̂𝑁(𝑡) of population size with the posterior mean ( ̄𝑁𝑖(𝑡)) at each generation (t), creating a time series of errors, low values of MSE reflect a better estimates. MSE(t) = 1 𝑛 ∑ ( ̄𝑁𝑖(𝑡) − ̂𝑁(𝑡) )2 . (11) 𝑛 𝑖=1 Ecological Modelling 490 (2024) 110662D. Rondon et al. Fig. 4. Simulation and sampling. The simulation process combines information from genealogy, survival, and reproduction on an open population. Sampling process uses life stages of individuals and sampling probabilities to recreate capture-recapture data, this process is replicated 50 times to produce simulated data replicates. 3. Results The proposed model is used to estimate population size, effect of covariates on survival and entry probabilities. The model is used over two scenarios (Decline and Bottleneck) with different levels of data seen (40%, 60%, 80%) and treatment of covariates (scaling and no scaling). Additionally a model with no covariates and observation of data of 80% is used as a reference in both scenarios. Models without the smoothing term were tested, but the results converged to non-ideal posterior distribution. Three separate chains with different random initial values were run for each replicated data set, each with 70 000 MCMC dependent samples (thinned by a factor of 5), including 50 000 MCMC samples for a burn-in period, leaving 4 000 effective MCMC samples. Default MCMC algorithms assigned by NIMBLE are used for most of the nodes, with the exception of the coefficients for the survival function (𝛽) which use an Automated Factor Slice Sampler (AFSS) (Tibbits et al., 2014) to reduce the auto-correlation among the MCMC samples and improve the mixing of the chains. Convergence of the MCMC simulation was ensured via visual inspection of the MCMC chains for many different parameters, Effective Sample Size is an approximate number of independent samples which could reach the same estimation accuracy as our generated MCMC samples (Robert and Casella, 2010), which values for some parameters are presented in the supplementary materials. The models were run at the Finnish Center for Scientific Computing (CSC) using the high-performance computer Puhti and it took about 20 h of CPU time per chain. 3.1. Estimation of population size The behavior of three estimated time series for different scenarios (no covariates, no-scaling, scaling) and observation levels (40%, 60% and 80%) are presented in Figs. 6–8. The posterior distributions of the population sizes at each generation were summarized by the posterior means calculated over chains and the credible intervals estimated 6 Fig. 5. Simulated population scenarios. Total population size for Decline (Bottle- neck) population scenario on the top (bottom) panel, respectively. The x-axis represents simulated generation, and y-axis population size, color and size of the circles represent the mean and standard deviation of the inbreeding coefficient at each time point across all alive individuals. The shaded area defines the study generations, where the data is collected between generations 33 to 52. across all replicates from a concatenated chain. Using no covariates, Fig. 6, presents over-estimated values at the first two generations, across all the replicates in both scenarios. In the case of no scaling, there Ecological Modelling 490 (2024) 110662D. Rondon et al. Fig. 6. Population size posterior estimates over time, no covariates case: Left (Right) panel presents the scenario: Bottleneck (Decline) population, with a 80% of data seen. Comparison between the true simulated value (blue) and the posterior mean estimate for every MC chain in each replicate (gray), an average posterior across the 50 replicates of the upper and lower 95% credible intervals are presented by the red dashed lines. Fig. 7. Population size posterior estimates over time, no scaling of covariates case: Left (Right) panel presents the scenario: Bottleneck (Decline) population, with a 40%, 60% and 80% of data seen from top to bottom, respectively. Comparison between the true simulated value (blue) and the posterior mean estimate for every MC chain in each replicate (gray), an average posterior across the 50 replicates of the upper and lower 95% credible intervals are presented by the red dashed lines. is also an over-estimation that decreases as the observation level of data increases (Fig. 7). The Decline population presents 5% of the chains converging to a non-ideal posterior distribution. Finally, scaling the covariates results in over-estimation in the first generation at different observation levels of the Bottleneck scenario while the Decline scenario has less variability across all the replicates along the level of data seen (Fig. 8). MSE was calculated across the 50 replicates between the posterior means and the true simulated population sizes at each generation. Logarithmic transformation was used to facilitate interpretation. Fig. 9, top row, presents the results of the two scenarios across different models, with the highest error occurring at the beginning of the study. In the Decline scenario, the model with no covariates provided the most accurate results (the lowest MSE), it is possible that the models that used covariates are not strong enough to capture the complexity 7 of the data, making the simpler model a better option. All models performed equally well for the Bottleneck population scenario. In both cases, the first two time points presented the highest deviation from data. Examining various levels of data seen enables us to see their effect on the uncertainty present in the posterior distributions. Fig. 9, bottom row, presents the average standard deviations of each replicate, in both scenarios, a larger levels of data seen reduced the variability, as expected. 3.2. Posterior estimation of survival probability Posterior means are collected across all the replicates for effects of inbreeding and of age on the survival probability and they are presented in Fig. 10. In Decline scenario, when no scaling is applied, the effect of inbreeding is difficult to estimate accurately with 40% Ecological Modelling 490 (2024) 110662 8 D. Rondon et al. Fig. 8. Population size posterior estimates over time, scaling of covariates case: Left (Right) panel presents the scenario: Bottleneck (Decline) population, with a 40%, 60% and 80% of data seen from top to bottom, respectively. Comparison between the true simulated value (blue) and the posterior mean estimate for every MC chain in each replicate (gray), an average posterior across the 50 replicates of the upper and lower 95% credible intervals are presented by the red dashed lines. Fig. 9. Mean Square Error (MSE): Top (bottom) row present the Log average (standard deviation) difference between the estimated posterior mean value with the true simulated population size, across all the replicates over generations. Left (Right) column presents the Decline (Bottleneck) population scenario, respectively. Three cases for modeling the survival probabilities: scaling of covariates, no scaling of covariates and no covariates are shown with colors: red, blue, and green; where the type of lines refers to different levels of data seen being 40% (dotted), 60% (dashed), and 80% (solid lines), respectively. Ecological Modelling 490 (2024) 110662D. Rondon et al. Fig. 10. Distribution of the 50 posterior means of the survival regression coefficient on the Decline scenario. Top (bottom) row is used for the no scaling (scaling) scheme. Left (Right) panel presents information of inbreeding (age), respectively. The sample probability is represented by colors red, green, and blue corresponding to samples observed at 40, 60 and, 80% respectively. For the model with no scaling of the covariates, the true simulated value is presented with the black dotted vertical line. level of data seen given that the estimated posterior means do not move from the given priors, one reason being the lack of data. On the other hand, when observation level of either 60% or 80% is used, the posterior values seem to move away from their true simulated values. The estimated effect of the covariate age is close to the true simulated value when observation levels are over 40%. On the other hand, when the covariates are scaled, the level of data seen of 40% results in estimates close to zero, and effects become easier to estimate at 80% observation level. For the Bottleneck scenario in Fig. 11, no scaling of the covariates finds an effect when observation levels is 80%, while in the case of scaling of the covariates, a bimodal posterior distributions are obtained at all observation levels. The average 95% credible intervals of the lower and upper bounds are estimated across all replicates from a concatenated chain and presented in Tables 4–9, in Appendix A. In all the cases, the model finds it difficult to identify an effect of the covariates when the level of data seen is 40%. The effect of inbreeding is identifiable in males when the covariates are scaled and the data seen is 80%. In contrast, the same effect is not found in females. In the case of no scaling, the effect of age is easily identifiable for both males and females. For no scaling of covariates, the posterior estimates (Tables 5 and 8 in the Appendix A) are compared with the true values used in the simulation (Table 3(b)). For both scenarios, the posterior mean estimates of the age coefficient are closer to the true values when the level of data seen increases (Fig. 8). For the regression coefficient of inbreeding, the posterior mean is closer to the simulated value when the effect of inbreeding is large (Bottleneck scenario). The posterior credible intervals seem to behave as expected so that they become wider when the observation level decreases. 9 3.3. Posterior estimation of the entry probabilities The estimated posterior probability of staying outside the system across the study generations (𝑠1,𝑡), namely, no entry probability, is averaged over the 50 replicates and presented in Fig. 12. Decline population shows a positive trend when there are no covariates or the covariates are scaled, while the no scaling case presents a small decrease over time. The results are consistent at different levels of data seen. In the Bottleneck scenario, the probability declines across the study generations. Remaining probabilities (𝑠2∶5,𝑡) are presented in the Appendix A, Figs. 13 and 14. In both scenarios, posterior probability estimates for entering as an adult (𝑠3,𝑡 and 𝑠5,𝑡) are close to zero and have small variations across the different study generations. 4. Discussion The well-known connection between population decline, high levels of inbreeding, deleterious mutations, and inbreeding depression has been discussed and examined for decades and still is observed (Hedrick et al. 2019; Kyriazis et al.; 2021 Robinson et al. 2023; among others). To learn more about this connection, we studied a method that includes the inbreeding coefficient as an explanatory variable to model the survival probability of a small simulated population. Our approach aims to measure the effect that the different levels of inbreeding have on individuals and if this could improve the accuracy of the population size predictions. The presented joint model framework allows the MCMC sampling to consider the whole posterior distribution of the parameters, where information is summarized. In the future, the presented method could be implemented on real conservation data and may help in the Ecological Modelling 490 (2024) 110662D. Rondon et al. Fig. 11. Distribution of the 50 posterior means of the survival regression coefficient on the Bottleneck scenario. Top (bottom) row is used for the no scaling (scaling) scheme. Left (Right) panel presents information of inbreeding (age), respectively. The sample probability is represented by colors red, green, and blue corresponding to different levels of data seen at 40, 60 and, 80% respectively. For the model with no scaling of the covariates, the true simulated value is presented with the black dotted vertical line. Fig. 12. No entry probability (𝑠1,𝑡): Averaged posterior probability of no entry across all the replicates. Left (Right) column represent the Decline (Bottleneck) scenario, respectively. Three cases for modeling the survival probabilities: scaling, no scaling and no covariates are shown with colors: red, blue, and green; where the type of lines refers to different levels of data seen being 40% (dotted), 60% (dashed), and 80% (solid lines), respectively. decision-making process for the management of small populations. This can be done by obtaining capture-recapture data at an individual level and the covariates that are of interest. The individual inbreeding levels needed for the model can be acquired from pedigrees or genomic data. Even if the number of parameters in our model was large and their dependence structure was complex, we did not find large problems in estimating posterior distributions for population size time series, in contrast to Newman et al. (2022). Here, it was possible to see that under different approaches to model the survival probability (no covariates, no scaling, and scaling of covariates) and different levels of data seen (40%, 60%, and 80%), the model is consistent with the location of the population size time series and trend. However, a high number of sample individuals brought less variability in the posterior distribu- tions. It is possible that the identified overestimation in the number of 10 alive individuals may be due to the small population size, but further research is needed, several approaches such as having a longer time series, observing more individuals, or introducing informative priors to some parameters may help on obtaining more accurate estimates. On the other hand, estimation of the survival parameters is challenging for more complex models, especially with a small observation level. Estimating suitable posterior distributions for modeling transition probabilities is a big challenge for this framework, and care must be taken in the interpretation of the parameters. Here, the most accurate parameter estimation was easier to find when the covariates were not scale, the model was not over-parameterized and the observation level was large. A more complex model can bring identifiability issues in the estimation, as seen in the bi-modality of the posterior distribution in Figs. 10 and 11. We found consistency in our results and results Ecological Modelling 490 (2024) 110662D. Rondon et al. of Gimenez et al. (2009), where a CJS model was used to study how parameter redundancy produces low precision estimates. A more detailed analysis of over-parameterization can be found in Cole (2020). Parameter estimation over a transitional state, such as young individ- uals, is poorly estimated, possibly because there is a lack of observed individuals to have a meaningful impact on the posterior distribution. Misspecification of the probability of staying outside of the system, no entry probability, may lead to a wrong interpretation of the overall behavior of the population. In both simulated scenarios the population is close to extinction and it will be expected for 𝑠1,𝑡 to have a positive trend, indicating a decrease in the rate of individuals entering the system as time progresses. This behavior is exclusively observed in the Decline scenario when the covariates are scaled. Data augmentation was found to perform well in controlling for unseen individuals. In the presence of covariates, the augmentation has to be performed also over them. Here it is natural to assume that covariates of non-seen and seen individuals come from the same dis- tribution. Multiple approaches were tested (results not shown), where augmentation was performed with NA values, zero values, or ignored during the MCMC, all leading to poor posterior estimates. Then, by sampling non-observed covariates jointly from the observed popula- tion, one can ensure that the augmentation is not inducing any bias and MCMC will converge more easily. Also, by assuming neighboring entry probabilities to be close to each other (i.e., to be smooth over generations), we were able to improve identifiability of the posterior estimates of the population size. This avoided sudden jumps in the process and gave the model stability, since the entry probability at generation 𝑡 + 1 depends on 𝑡, allowing the model to borrow strength from adjacent generations, while extending the modeling to a real life application, it could be useful to reduce the computational time and try different approaches such as Approximate Bayesian Computation (Beaumont, 2010; Baey et al., 2023). Several extensions can be implemented on the model. One that can be relevant for the parameter estimation is to use time-to-event data to model the survival probabilities (Ergon et al., 2018). This framework would bring a more straightforward interpretation of the parameters, correctly handling censoring in data that can exist as migration or an unseen dead, and may be able to reduce the bias on the estimation of the survival coefficients. Another relevant extension would be to allow the model to use continuous time points, which may help to obtain more information in order to have a better estimability of the parameters. Thirdly, it will be of interest to model the entry probabilities as a function of different covariates, which can be done by using multinomial logit models. Finally, it is possible to extend the model, allowing for the observation of a dead state and studying the true survival of the individuals, in this case, the structure of the model needs to be adjusted creating a new hidden state that holds information about the ‘‘recently dead’’, that is transitory to a ‘‘dead’’ state. This new hidden state can emit ‘‘not seen’’ or ‘‘recover’’ with certain probabilities. Overall, this research underscores the need for continued innovation and refinement of statistical models to improve our understanding and management of complex ecological systems. CRediT authorship contribution statement Diego Rondon: Writing – review & editing, Writing – original draft, Visualization, Software, Methodology, Formal analysis, Data curation, Conceptualization. Samu Mäntyniemi: Validation, Conceptualization. Jouni Aspi: Validation, Conceptualization. Laura Kvist: Validation, Conceptualization. Mikko J. Sillanpää: Writing – review & editing, Validation, Conceptualization. Declaration of competing interest 11 Authors declare no conflict of interest. Table 4 Mean estimate and credible interval in the Decline scenario, scaling: mean, lower and upper 95% credible interval calculated across a concatenate chain of the 50 replicates. Boldface values are credible intervals that do not contain the value zero. 40% 60% 80% 𝛼0;𝑓 1.48(−4.46;11.80) 5.33(−4.06;13.01) 10.35(6.13;14.41) 𝛼1;𝑓 0.80(−2.29;5.10) 2.04(−1.13;5.93) 2.82(−0.02;5.85) 𝛼2;𝑓 −1.56(−8.68;1.26) −4.63(−9.64;1.25) −8.13(−10.31;−6.10) 𝛼0;𝑚 5.63(−3.47;12.14) 9.07(−2.60;14.01) 11.00(7.11;14.90) 𝛼1;𝑚 3.09(−1.24;7.87) 3.48(−0.04;7.29) 3.42(0.58;6.62) 𝛼2;𝑚 −5.53(−9.86;1.06) −7.71(−10.61;0.97) −8.74(−10.90;−6.79) 𝛽0;𝑓 1.92(−3.15;7.28) 1.51(−2.76;6.74) 1.47(−2.76;5.99) 𝛽1;𝑓 2.10(−2.73;7.32) 0.80(−3.51;6.80) 0.13(−3.55;5.21) 𝛽0;𝑚 2.15(−2.96;7.44) 2.22(−2.91;7.54) 2.10(−3.04;7.37) 𝛽1;𝑚 2.20(−2.78;7.41) 2.41(−2.51;7.55) 2.19(−2.56;7.37) Table 5 Mean estimate and credible interval in the Decline scenario, no scaling: mean, lower and upper 95% credible interval calculated across a concatenate chain of the 50 replicates. Boldface values are credible intervals that do not contain the value zero. 40% 60% 80% 𝛽0 6.27(1.07;14.79) 10.75(7.47;15.47) 10.40(7.87;13.69) 𝛽1 0.20(−4.16;4.85) 1.22(−3.47;5.72) 1.60(−2.07;5.21) 𝛽2 −1.41(−3.26;−0.27) −2.41(−3.41;−1.71) −2.34(−3.03;−1.81) Table 6 Mean estimate and credible interval in the Decline scenario, no covariates: mean, lower and upper 95% credible interval calculated across a concatenate chain of the 50 replicates. Boldface values are credible intervals that do not contain the value zero. 80% 𝛽0 0.65(0.50;0.80) Data availability Code and data for the simulation and analysis are available in: https://etsin.fairdata.fi/. Acknowledgments We are very grateful to the anonymous referees for their construc- tive comments, which helped us to improve the presentation of the work. The authors wish to acknowledge the Center of Scientific Computing (CSC), Finland, for generous computational resources. This work was financially supported by the Kvantum Institute, Uni- versity of Oulu. Under the project: ‘‘Fall and rise of endangered species: Detection of genomic and population ecological signals of decreasing and increasing populations’’. Appendix A Posterior estimation of the survival probability Average posterior mean, low and high credible intervals for a concatenated chain across all the replicates Posterior estimation of the entry probabilities See Figs. 13 and 14. Appendix B. Supplementary data Supplementary material related to this article can be found online at https://doi.org/10.1016/j.ecolmodel.2024.110662. https://etsin.fairdata.fi/ https://doi.org/10.1016/j.ecolmodel.2024.110662 Ecological Modelling 490 (2024) 110662D. Rondon et al. Fig. 13. Entry probability (𝑠2∶5,𝑡) Decline scenario: Average posterior mean of the possible entry probabilities across all the replicates. Top (Bottom) row represent entry probability as Female (Male) and left (right) column represent entry probability as Young (Adult). Three cases for modeling the survival probabilities: scaling, no scaling and no covariates are shown with colors: red, blue, and green; where the type of lines refers to the different levels of data seen, being 40% (dotted), 60% (dashed), and 80% (solid lines), respectively. Fig. 14. Entry probability (𝑠2∶5,𝑡) Bottleneck scenario: Average posterior mean of the possible entry probabilities across all the replicates. Top (Bottom) row represent entry probability as Female (Male) and left (right) column represent entry probability as Young (Adult). Three cases for modeling the survival probabilities: scaling, no scaling and no covariates are shown with colors: red, blue, and green; where the type of lines refers to the different levels of data seen, being 40% (dotted), 60% (dashed), and 80% (solid lines), respectively. Table 7 Mean estimate and credible interval in the Bottleneck scenario, scaling: mean, lower and upper 95% credible interval calculated across a concatenate chain of the 50 replicates. Boldface values are credible intervals that do not contain the value zero. 40% 60% 80% 𝛼0;𝑓 2.48(−3.61;10.54) 3.78(−2.27;12.18) 2.92(−2.16;12.87) 𝛼1;𝑓 −0.26(−4.34;4.75) −0.52(−4.03;4.19) −1.12(−4.15;3.63) 𝛼2;𝑓 −1.84(−8.41;1.61) −2.30(−8.96;1.56) −1.24(−9.34;1.70) 𝛼0;𝑚 3.34(−4.29;10.54) 3.69(−3.94;11.38) 6.54(−3.29;13.46) 𝛼1;𝑚 1.17(−2.92;5.44) 1.36(−2.26;5.46) 1.43(−1.92;4.89) 𝛼2;𝑚 −3.36(−8.68;1.47) −3.60(−9.23;1.46) −5.38(−10.21;1.53) 𝛽0;𝑓 2.10(−2.86;7.28) 2.61(−2.34;7.78) 2.86(−2.08;8.00) 𝛽1;𝑓 1.84(−3.21;7.20) 2.79(−1.92;7.81) 3.15(−1.46;8.07) 𝛽0;𝑚 1.73(−3.35;7.11) 2.43(−2.60;7.70) 2.72(−2.31;7.93) 𝛽1;𝑚 2.00(−2.59;7.13) 2.82(−1.71;7.76) 3.17(−1.32;8.04) 12 Table 8 Mean estimate and credible interval in the Bottleneck scenario, no scaling: mean, lower and upper 95% credible interval calculated across a concatenate chain of the 50 replicates. Boldface values are credible intervals that do not contain the value zero. 40% 60% 80% 𝛽0 0.40(−0.16;1.03) 3.77(0.79;16.34) 14.81(8.73;23.58) 𝛽1 0.16(−1.44;1.84) −0.79(−6.32;1.57) −3.69(−8.57;0.63) 𝛽2 −0.25(−0.40;−0.11) −1.04(−4.21;−0.28) −3.83(−6.01;−2.30) Ecological Modelling 490 (2024) 110662D. Rondon et al. Table 9 Mean estimate and credible interval in the Bottleneck scenario, No covariates: mean, lower and upper 95% credible interval calculated across a concatenate chain of the 50 replicates. Boldface values are credible intervals that do not contain the value zero. 80% 𝛽0 0.28(0.09;0.46) References Abadi, Fitsum, Gimenez, Olivier, Arlettaz, Raphaël, Schaub, Michael, 2010. An as- sessment of integrated population models: bias, accuracy, and violation of the assumption of independence. Ecology 91 (1), 7–14. http://dx.doi.org/10.1890/08- 2235.1. Aitken, Sally N., Whitlock, Michael C., 2013. Assisted gene flow to facilitate local adaptation to climate change. Annu. Rev. Ecol. Evol. Syst. 44, 367–388. http: //dx.doi.org/10.1146/annurev-ecolsys-110512-135747. Armstrong, Doug P, Parlato, Elizabeth H, Egli, Barbara, Dimond, Wendy J, Kwikkel, Renske, Berggren, Åsa, McCready, Mhairi, Parker, Kevin A, Ewen, John G, 2021. Using long-term data for a reintroduced population to empirically estimate future consequences of inbreeding. Conserv. Biol. 35 (3), 859–869. http://dx.doi. org/10.1111/cobi.13646. Armstrong, Doug P., Seddon, Philip J., 2008. Directions in reintroduction biology. Trends Ecol. Evol. 23 (1), 20–25. http://dx.doi.org/10.1016/j.tree.2007.10.003. Baey, Charlotte, Smith, Henrik G, Rundlöf, Maj, Olsson, Ola, Clough, Yann, Sahlin, Ull- rika, 2023. Calibration of a bumble bee foraging model using approximate Bayesian computation. Ecol. Model. 477, 110251. http://dx.doi.org/10.1016/j.ecolmodel. 2022.110251. Beaumont, Mark A., 2010. Approximate Bayesian computation in evolution and ecology. Annu. Rev. Ecol. Evol. Syst. 41, 379–406. http://dx.doi.org/10.1146/annurev- ecolsys-102209-144621. Benson, John F, Mahoney, Peter J, Vickers, T Winston, Sikich, Jeff A, Beier, Paul, Riley, Seth PD, Ernest, Holly B, Boyce, Walter M, 2019. Extinction vortex dynamics of top predators isolated by urbanization. Ecol. Appl. 29 (3), e01868. http://dx. doi.org/10.1002/eap.1868. Blomqvist, Donald, Pauliny, Angela, Larsson, Mikael, Flodin, Lars-Åke, 2010. Trapped in the extinction vortex? Strong genetic effects in a declining vertebrate population. BMC Evol. Biol. 10 (1), 1–9. http://dx.doi.org/10.1186/1471-2148-10-33. Bozzuto, Claudio, Biebach, Iris, Muff, Stefanie, Ives, Anthony R, Keller, Lukas F, 2019. Inbreeding reduces long-term growth of alpine ibex populations. Nat. Ecol. Evol. 3 (9), 1359–1364. http://dx.doi.org/10.1038/s41559-019-0968-1. Caballero, Armando, 2020. Quantitative Genetics. Cambridge University Press. Caughley, Graeme, 1994. Directions in conservation biology. J. Anim. Ecol. 215–244. Cole, Diana J., 2020. Parameter Redundancy and Identifiability. Chapman and Hall/CRC. de Valpine, P., Turek, D., Paciorek, C.J., Anderson-Bergman, C., Temple Lang, D., Bodik, R., 2017. Programming with models: writing statistical algorithms for general model structures with NIMBLE. J. Comput. Graph. Statist. 26, 403–417. http://dx.doi.org/10.1080/10618600.2016.1172487. Domingue, Michael J., Teale, Stephen A., 2007. Inbreeding depression and its effect on intrinsic population dynamics in engraver beetles. Ecol. Entomol. 32 (2), 201–210. http://dx.doi.org/10.1111/j.1365-2311.2007.00854.x. Ergon, Torbjørn, Borgan, Ørnulf, Nater, Chloé Rebecca, Vindenes, Yngvild, 2018. The utility of mortality hazard rates in population analyses. Methods Ecol. Evol. 9 (10), 2046–2056. http://dx.doi.org/10.1111/2041-210X.13059. Fitzpatrick, Sarah W, Bradburd, Gideon S, Kremer, Colin T, Salerno, Patricia E, Angeloni, Lisa M, Funk, W Chris, 2020. Genomic and fitness consequences of genetic rescue in wild populations. Curr. Biol. 30 (3), 517–522. http://dx.doi.org/ 10.1016/j.cub.2019.11.062. Frankham, Richard, Ballou, Jonathan D, Ralls, Katherine, Eldridge, Mark Derek Bruce, Dudash, Michele R, Fenster, Charles B, Lacy, Robert C, Sunnucks, Paul, 2017. Ge- netic Management of Fragmented Animal and Plant Populations. Oxford University Press. Gimenez, Olivier, Morgan, Byron J.T., Brooks, Stephen P., 2009. Weak identifiability in models for mark-recapture-recovery data. In: Thomson, David L., Cooch, Evan G., Conroy, Michael J. (Eds.), Modeling Demographic Processes in Marked Populations. Springer. Springer US, Boston, MA, pp. 1055–1067. http://dx.doi.org/10.1007/978- 0-387-78151-8_48. Godwin, Joanne L, Lumley, Alyson J, Michalczyk, Łukasz, Martin, Oliver Y, Gage, Matthew JG, 2020. Mating patterns influence vulnerability to the extinction vortex. Global Change Biol. 26 (8), 4226–4239. http://dx.doi.org/10.1111/gcb. 15186. Hedrick, PW, Robinson, JA, Peterson, Rolf O, Vucetich, John A, 2019. Genetics and extinction and the example of isle royale wolves. Animal Conserv. 22 (3), 302–309. http://dx.doi.org/10.1111/acv.12479. 13 Keller, Lukas F., Waller, Donald M., 2002. Inbreeding effects in wild populations. Trends Ecol. Evol. 17 (5), 230–241. http://dx.doi.org/10.1016/S0169-5347(02)02489-8. Kéry, Marc, Schaub, Michael, 2011. Bayesian Population Analysis using WinBUGS: A Hierarchical Perspective. Academic Press. Kyriazis, Christopher, Wayne, Robert, Lohmueller, Kirk, 2021. Strongly deleterious mu- tations are a primary determinant of extinction risk due to inbreeding depression. Evol. Lett. 5 (1), 33–47. http://dx.doi.org/10.1002/evl3.209. Lande, Russell, 1993. Risks of population extinction from demographic and envi- ronmental stochasticity and random catastrophes. Am. Nat. 142 (6), 911–927. http://dx.doi.org/10.1086/285580. Lang, Stefan, Brezger, Andreas, 2004. Bayesian P-splines. J. Comput. Graph. Statist. 13 (1), 183–212. http://dx.doi.org/10.1198/1061860043010. Latter, B.D.H., Robertson, Alan, 1962. The effects of inbreeding and artificial selection on reproductive fitness. Genet. Res. 3 (1), 110–138. http://dx.doi.org/10.1017/ S001667230000313X. Lebreton, Jean-Dominique, Cefe, R. Pradel, 2002. Multistate recapture models: mod- elling incomplete individual histories. J. Appl. Stat. 29 (1–4), 353–369. http: //dx.doi.org/10.1080/02664760120108638. Leclerc, Camille, Bellard, Céline, Luque, Gloria M, Courchamp, Franck, 2015. Over- coming extinction: understanding processes of recovery of the tibetan antelope. Ecosphere 6 (9), 1–14. http://dx.doi.org/10.1890/ES15-00049.1. Leimu, Roosa, Mutikainen, PIA, Koricheva, Julia, Fischer, Markus, 2006. How general are positive relationships between plant population size, fitness and genetic variation? J. Ecol. 94 (5), 942–952. http://dx.doi.org/10.1111/j.1365-2745.2006. 01150.x. Lesaffre, Emmanuel, Lawson, Andrew B., 2012. Bayesian Biostatistics. John Wiley & Sons. Marr, AB, Arcese, P, Hochachka, WM, Reid, Jane Margaret, Keller, LF, 2006. Interactive effects of environmental stress and inbreeding on reproductive traits in a wild bird population. J. Anim. Ecol. 75 (6), 1406–1415. http://dx.doi.org/10.1111/j.1365- 2656.2006.01165.x. Newman, Ken, King, Ruth, Elvira, Víctor, de Valpine, Perry, McCrea, Rachel S, Mor- gan, Byron JT, 2022. State-space models for ecological time-series data: Practical model-fitting. Methods Ecol. Evol. 14 (1), 26–42. http://dx.doi.org/10.1111/2041- 210X.13833. Nietlisbach, Pirmin, Muff, Stefanie, Reid, Jane M, Whitlock, Michael C, Keller, Lukas F, 2019. Nonequivalent lethal equivalents: Models and inbreeding metrics for unbiased estimation of inbreeding load. Evol. Appl. 12 (2), 266–279. http://dx.doi.org/10. 1111/eva.12713. Ovaskainen, Otso, Meerson, Baruch, 2010. Stochastic models of population extinction. Trends Ecol. Evol. 25 (11), 643–652. http://dx.doi.org/10.1016/j.tree.2010.07.009. Pimm, Stuart L, Jenkins, Clinton N, Abell, Robin, Brooks, Thomas M, Gittleman, John L, Joppa, Lucas N, Raven, Peter H, Roberts, Callum M, Sexton, Joseph O, 2014. The biodiversity of species and their rates of extinction, distribution, and protection. Science 344 (6187), 1246752. http://dx.doi.org/10.1126/science.1246752. R Core Team, 2021. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/. Rabiner, Lawrence R., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77 (2), 257–286. Ripple, William J, Wolf, Christopher, Newsome, Thomas M, Hoffmann, Michael, Wirsing, Aaron J, McCauley, Douglas J, 2017. Extinction risk is most acute for the world’s largest and smallest vertebrates. Proc. Natl. Acad. Sci. USA 114 (40), 10678–10683. http://dx.doi.org/10.1073/pnas.1702078114. Robb, Gillian N, McDonald, Robbie A, Chamberlain, Dan E, Reynolds, S James, Harrison, Timothy JE, Bearhop, Stuart, 2008. Winter feeding of birds increases productivity in the subsequent breeding season. Biol. Lett. 4 (2), 220–223. http: //dx.doi.org/10.1098/rsbl.2007.0622. Robert, Christian, Casella, George, 2010. Introducing Monte Carlo Methods with R. Springer. Robinson, Jacqueline, Kyriazis, Christopher C, Yuan, Stella C, Lohmueller, Kirk E, 2023. Deleterious variation in natural populations and implications for conservation genetics. Annu. Rev. Anim. Biosci. 11, 93–114. http://dx.doi.org/10.1146/annurev- animal-080522-093311. Royle, J. Andrew, Dorazio, Robert M., 2012. Parameter-expanded data augmentation for Bayesian analysis of capture–recapture models. J. Ornithol. 152 (2), 521–537. http://dx.doi.org/10.1007/s10336-010-0619-4. Saccheri, Ilik, Kuussaari, Mikko, Kankare, Maaria, Vikman, Pia, Fortelius, Wilhelm, Hanski, Ilkka, 1998. Inbreeding and extinction in a butterfly metapopulation. Nature 392 (6675), 491–494. http://dx.doi.org/10.1038/33136. Tibbits, Matthew M, Groendyke, Chris, Haran, Murali, Liechty, John C, 2014. Auto- mated factor slice sampling. J. Comput. Graph. Statist. 23 (2), 543–563. http: //dx.doi.org/10.1080/10618600.2013.791193. Zucchini, Walter, MacDonald, Iain L., Langrock, Roland, 2016. Hidden Markov Models for Time Series: An Introduction using R. Chapman and Hall/CRC. http://dx.doi.org/10.1890/08-2235.1 http://dx.doi.org/10.1890/08-2235.1 http://dx.doi.org/10.1890/08-2235.1 http://dx.doi.org/10.1146/annurev-ecolsys-110512-135747 http://dx.doi.org/10.1146/annurev-ecolsys-110512-135747 http://dx.doi.org/10.1146/annurev-ecolsys-110512-135747 http://dx.doi.org/10.1111/cobi.13646 http://dx.doi.org/10.1111/cobi.13646 http://dx.doi.org/10.1111/cobi.13646 http://dx.doi.org/10.1016/j.tree.2007.10.003 http://dx.doi.org/10.1016/j.ecolmodel.2022.110251 http://dx.doi.org/10.1016/j.ecolmodel.2022.110251 http://dx.doi.org/10.1016/j.ecolmodel.2022.110251 http://dx.doi.org/10.1146/annurev-ecolsys-102209-144621 http://dx.doi.org/10.1146/annurev-ecolsys-102209-144621 http://dx.doi.org/10.1146/annurev-ecolsys-102209-144621 http://dx.doi.org/10.1002/eap.1868 http://dx.doi.org/10.1002/eap.1868 http://dx.doi.org/10.1002/eap.1868 http://dx.doi.org/10.1186/1471-2148-10-33 http://dx.doi.org/10.1038/s41559-019-0968-1 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb10 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb11 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb12 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb12 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb12 http://dx.doi.org/10.1080/10618600.2016.1172487 http://dx.doi.org/10.1111/j.1365-2311.2007.00854.x http://dx.doi.org/10.1111/2041-210X.13059 http://dx.doi.org/10.1016/j.cub.2019.11.062 http://dx.doi.org/10.1016/j.cub.2019.11.062 http://dx.doi.org/10.1016/j.cub.2019.11.062 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb17 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb17 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb17 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb17 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb17 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb17 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb17 http://dx.doi.org/10.1007/978-0-387-78151-8_48 http://dx.doi.org/10.1007/978-0-387-78151-8_48 http://dx.doi.org/10.1007/978-0-387-78151-8_48 http://dx.doi.org/10.1111/gcb.15186 http://dx.doi.org/10.1111/gcb.15186 http://dx.doi.org/10.1111/gcb.15186 http://dx.doi.org/10.1111/acv.12479 http://dx.doi.org/10.1016/S0169-5347(02)02489-8 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb22 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb22 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb22 http://dx.doi.org/10.1002/evl3.209 http://dx.doi.org/10.1086/285580 http://dx.doi.org/10.1198/1061860043010 http://dx.doi.org/10.1017/S001667230000313X http://dx.doi.org/10.1017/S001667230000313X http://dx.doi.org/10.1017/S001667230000313X http://dx.doi.org/10.1080/02664760120108638 http://dx.doi.org/10.1080/02664760120108638 http://dx.doi.org/10.1080/02664760120108638 http://dx.doi.org/10.1890/ES15-00049.1 http://dx.doi.org/10.1111/j.1365-2745.2006.01150.x http://dx.doi.org/10.1111/j.1365-2745.2006.01150.x http://dx.doi.org/10.1111/j.1365-2745.2006.01150.x http://refhub.elsevier.com/S0304-3800(24)00050-4/sb30 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb30 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb30 http://dx.doi.org/10.1111/j.1365-2656.2006.01165.x http://dx.doi.org/10.1111/j.1365-2656.2006.01165.x http://dx.doi.org/10.1111/j.1365-2656.2006.01165.x http://dx.doi.org/10.1111/2041-210X.13833 http://dx.doi.org/10.1111/2041-210X.13833 http://dx.doi.org/10.1111/2041-210X.13833 http://dx.doi.org/10.1111/eva.12713 http://dx.doi.org/10.1111/eva.12713 http://dx.doi.org/10.1111/eva.12713 http://dx.doi.org/10.1016/j.tree.2010.07.009 http://dx.doi.org/10.1126/science.1246752 https://www.R-project.org/ http://refhub.elsevier.com/S0304-3800(24)00050-4/sb37 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb37 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb37 http://dx.doi.org/10.1073/pnas.1702078114 http://dx.doi.org/10.1098/rsbl.2007.0622 http://dx.doi.org/10.1098/rsbl.2007.0622 http://dx.doi.org/10.1098/rsbl.2007.0622 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb40 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb40 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb40 http://dx.doi.org/10.1146/annurev-animal-080522-093311 http://dx.doi.org/10.1146/annurev-animal-080522-093311 http://dx.doi.org/10.1146/annurev-animal-080522-093311 http://dx.doi.org/10.1007/s10336-010-0619-4 http://dx.doi.org/10.1038/33136 http://dx.doi.org/10.1080/10618600.2013.791193 http://dx.doi.org/10.1080/10618600.2013.791193 http://dx.doi.org/10.1080/10618600.2013.791193 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb45 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb45 http://refhub.elsevier.com/S0304-3800(24)00050-4/sb45 Kansi_Rondon_etal-2024-A_Bayesian_multi-state_model_with_data_augmentation 1-s2.0-S0304380024000504-main A Bayesian multi-state model with data augmentation for estimating population size and effect of inbreeding on survival Introduction Material and Methods Multi-state model for data analysis Survival probabilities Model building Simulation of test scenarios Simulated scenarios Model assessment Results Estimation of population size Posterior estimation of survival probability Posterior estimation of the entry probabilities Discussion CRediT authorship contribution statement Declaration of competing interest Data availability Acknowledgments Appendix A Posterior estimation of the survival probability Posterior estimation of the entry probabilities Appendix B. Supplementary data References