A DATA MODEL FOR A GIS-BASED FOREST INFORMATION SYSTEM
Paikkatietojärjestelmään perustuvan metsätietojärjestelmän tietomalli

Raito Paananen
Finnish Forest Research Institute
Department of Forest Production

Metsäntutkimuslaitoksen tiedonantoja 493
The Finnish Forest Research Institute. Research Papers 493
Vantaa 1994

Paananen R. 1994. A data model for a GIS-based forest information system. Seloste: Paikkatietojärjestelmään perustuvan metsätietojärjestelmän tietomalli. Metsäntutkimuslaitoksen tiedonantoja 493. The Finnish Forest Research Institute. Research Papers 493. 54 + 24 p. ISBN 951-40-1357-3. ISSN 0358-4283

This study is concerned with conceptual data modelling methods for planning geographic information systems for forest management. The study was carried out at the Finnish Forest Research Institute as part of a project called The Research Forest Database and Planning System. The project was started in 1991 to develop the information processing and planning of the research forests owned by the institute. The primary objective of the project was to define and develop a GIS-based forest information and planning system. In the system, both up-to-date and historical information on the forests can be integrated with the experiments and conservation areas for forest planning purposes. This study formed the part of the project that covered the functional and information analysis and modelling for the new system. This report includes the conceptual data model description with an example of its implementation for handling the forest stand history information.

This study was carried out as part of the TUTGIS project, which develops the operational information system for the research forests of the Finnish Forest Research Institute. The aim of the project is to produce a GIS-based data management and planning system for the research forests, with which the current state of the forest resources is managed and integrated planning of the research use of the forests is made possible. The main functions of the system are the management of spatially referenced, compartmentwise forest resource data, forest planning and the management of experiments. In this study, the logical models that form the foundation of the system were produced for the activities of the research forests (activity model) and for their data (conceptual schema). This report describes the structure and content of the conceptual schema, together with an example of the implemented structure for managing stand history.

The work was made possible by the valuable help given at its various stages by the TUTGIS project researchers Jussi Saramäki, Tuula Nuutinen, Kai Blauberg, Markku Juvakka, Aki Nalli, Jorma Nykänen and Janne Soimasuo, and by the staff of the research area unit of the Finnish Forest Research Institute.

Keywords: forest information system, data modelling, GIS

Publisher: The Finnish Forest Research Institute; Project: 303901-0. Accepted for publication by Professor Jari Parviainen, Research Director, on February 2, 1994.

Distribution: The Finnish Forest Research Institute, Department of Forest Production, P.O. Box 18, FIN-01301 Vantaa, Finland.
Contents

1 Information System development
1.1 Types of information
1.2 Types of information systems
1.3 IS development life cycle and methodologies
1.4 Data Models
1.5 Conceptual data modelling
1.5.1 Principles
1.5.2 ER model
1.5.3 Advanced data modelling concepts (the EER model)
1.5.4 ER/EER notations
1.6 Data modelling for GIS
1.7 Temporality in spatial databases
1.8 IS development in forestry
1.8.1 Data models
1.8.2 Temporality in forest information systems
1.8.3 Overview of the FFRI project
1.9 Aim of the study
2 Methods
2.1 Information Engineering
2.1.1 Overview of the methodology
2.1.2 Business Area Analysis
2.2 The modelling approach in this study
2.3 Application of the model
2.3.1 Inventory design
2.3.2 Test databases
3 Results
3.1 Research forest activities
3.2 ER schema of the research forests
3.2.1 General aspects
3.2.2 Spatial entities of basic mapping
3.2.3 Basic inventory data
3.2.4 Special land use data
3.2.5 Forest planning
3.2.6 Experiments
3.2.7 Forest stand history
3.2.7.1 Operation history
3.2.7.2 State history
3.2.8 Spatial entity types with interfaces to other systems
3.3 An application: Stand history data structures
3.3.1 Operation history data structures and operations
3.3.2 State history data structures
4 Discussion
4.1 Conceptual modelling
4.2 Forest stand history
4.3 Forest management
References
Seloste
Appendices

1 Information System development

1.1 Types of information

The two basic concepts in information processing are data and information. Davis & Olson (1985) define data as symbols which represent, describe or record reality. The symbols themselves are not the same as the reality they describe.
Information (datalogical information) is defined by Davis & Olson (1985) as data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in current actions or decisions. Knowledge can be seen as a special type of datalogical information, typically complex and variable information about a specific area of human activities (Virtanen 1989).

Forest resource information is geographical in nature. Geographical data are referenced to locations on the earth's surface by using coordinate systems. Geographic features are things that can be recognized on a map, e.g. a road, river or lake. They have a location and some common descriptive information. Lehan (1986) defines a feature as a physical entity that is recognized in man's definition of reality.

Spatial objects are digital representations of geographic features representing the location, geometry and topology of the feature. Location is usually defined with some specific coordinate system. Geometry refers to the dimensions and shape of the objects. Topology represents the relationships between connecting or adjacent spatial objects. The primitive spatial objects are zero-dimensional points, one-dimensional lines and two-dimensional polygons. All spatial data can be reduced to these three basic primitives. A map is a set of points, lines and polygons that are defined both by their locations in space with reference to a coordinate system and by their non-spatial descriptive attributes (Burrough 1986).

1.2 Types of information systems

An information system (IS) is a collection of activities that regulate the sharing and distribution of information and the storage of data that are relevant to the organization (Batini et al. 1992).

A management information system (MIS) can be defined as an integrated system for providing information to support operations, management and decision-making functions in an organization (Davis & Olson 1985). A MIS uses procedural logic to manipulate data.
A decision support system (DSS) is a computer-based information system used to support decision-making activities in situations where it is not possible or not desirable to have an automated system perform the entire process (Ginzberg & Stohr 1982).

An expert system (ES) is a computerized advisory program that attempts to imitate or substitute the reasoning processes and knowledge of experts in solving specific types of problems (Turban 1990). Usually the term knowledge base system (KB) is used interchangeably with ES.

All types of information systems may be based on databases. A database is any large collection of structured data stored in a computer and supporting shared access by many users. A database management system (DBMS) is a collection of software for managing a database. A database together with its management software comprises a database system (Elmasri & Navathe 1989).

The role of information systems in forestry organizations has been discussed in Kaila & Saarenmaa (1990). An integrated forest information system can be seen as a MIS that brings together the information important to the management of the forest.

A geographic information system (GIS) may be seen as a database system in which the data are spatially indexed, and upon which a set of procedures operates in order to answer queries about geographic features in the database (Smith et al. 1987). In a GIS the spatial information is stored in information layers. An information layer is a digital overlay of uniformly attributed spatial data (Langran 1992b). Coverage is a commonly used synonym for layer.

A GIS typically consists of five major components (Burrough 1986):
1. Data input and verification, which contains all the various operations to transform data from existing maps etc. to digital form.
2. Data storage and database management, which concern the way data are stored, structured and organized and how they are perceived by the users.
3. Data output and presentation, which concern the ways data are displayed and the results of the analyses are reported to the users.
4. Data transformation, which includes both data manipulations and spatial analyses. The former are concerned e.g. with error corrections or changes to the form of the data, while analyses are made in order to get answers to specific questions.
5. Interaction with the user (user interface).

Three main stages can be seen in the development of GIS systems (Sijainninhallinnan... 1991, McLaren 1990):
1. The oldest structure is to store the spatial (locational) information in its own file management system. The attribute data are stored in a separate database system. The two systems are not integrated.
2. In the next stage, the two separate systems are connected by using an interface from the GIS to the database system with logical connections based on common identifiers. The interface can be implemented with subprogram libraries or with SQL queries. Data can be widely accessed for various needs, but the integrity between spatial and attribute information requires specific mechanisms.
3. In the third stage both the spatial information and its attributes are stored in the same integrated database system.

1.3 IS development life cycle and methodologies

Information systems development is a change process taken with respect to object systems in a set of environments by a development group to achieve or maintain some objectives (Lyytinen 1987). The information systems development life cycle can be defined as the life span that begins with the idea that a system is needed and ends with the discarding of the system (Connors 1992).

The classical life cycle of an IS development project is called the waterfall life cycle. It consists of five main phases: requirement specifications, system analysis, system design, implementation (coding, testing) and integration (operation). A typical characteristic is the linear, sequential progression from one phase to the next.
This classical IS development life cycle has been discussed e.g. in Batini et al. (1992), Connors (1992), Loomis (1990) and Yourdon (1989).

Requirement collection and system analysis are concerned with the so-called mission of the system, i.e. the application areas and the problems that the system should solve. These phases are carried out in interaction with the users.

Design is concerned with the specification of the structure of the information system. Design can be divided into database design and application design. Database design is a complex process that involves several decisions at different levels. It is normally decomposed into conceptual, logical and physical design (Batini et al. 1992). These activities are called data modelling processes.

Implementation includes the programming of the operational version of the system. Recently, prototyping tools have been utilized prior to the final implementation to make simplified versions of the system in order to verify the needs of the users. Validation and testing are carried out to assure system quality and to verify that the implementation reflects the design specifications. Operation starts with the initial loading of data and ends when the system is replaced.

An information systems development methodology is an organized collection of concepts, beliefs, values and normative principles supported by material resources. The purpose of the methodology is to help a development group successfully change object systems, to perceive, generate, assess, control, and to carry out changing actions in them (Lyytinen 1987). System development methodologies combine tools and techniques to guide the development process. While the life cycle gives a measure of project control, methodologies provide tools to improve the productivity and quality of the system analysis and design.

There are various approaches to systems development.
Three main categories currently in use are structured analysis and design (SA), information engineering (IE) and object-oriented analysis and design (OOA, OOD). For a brief comparison of the methodologies, see Fichman & Kemerer (1992). In structured analysis methodologies the emphasis lies on the modelling of processes. For a typical SA presentation, see Yourdon (1989). Information engineering is a comprehensive methodology that extends the data-oriented approach to the entire development life cycle. IE was developed by James Martin (see e.g. Martin 1990). The OOA methodologies also rely on information modelling, but they encapsulate data and behaviour: all processes are encapsulated within objects (Fichman & Kemerer 1992).

1.4 Data Models

Data modelling is an activity where a data model is applied to derive a logical organization of data that is documented in a schema (Klein & Hirschheim 1987). A data model, in turn, is a way of perceiving, organising and describing data. Elmasri & Navathe (1989) define a data model as a set of concepts that can be used to describe the structure of a database. Shlaer & Mellor (1988) consider a data model (or, as they name it, an information model) to be a thinking tool used to aid in the formalization of knowledge. Tsichritzis & Lochovsky (1982) state that data models enable us to capture the meaning of data as related to the meaning of the world, to an extent that is adequate for the desired use of the data. Data models define general rules for the specification of the structures of the data and also of the operations that are allowed on the data.

Data models can be categorized according to the level of abstraction used. Usually three levels are distinguished (Elmasri & Navathe 1989, Hull & King 1987):
1. Conceptual models. These are the most high-level models. The term semantic data model is also commonly used. Conceptual models describe the logical structure of the data for a community of users.
The concepts used are common in the language of the problem domain. A specific term, Universe of Discourse, is used in this context (Klein & Hirschheim 1987, Wieringa 1989). The Universe of Discourse, UoD, is a slice of the real world (a mini world) in the problem domain containing a set of entities which are of interest to the relevant people. A conceptual model is an abstract entity which embodies a common understanding among the relevant people of the UoD (Wieringa 1989). A conceptual data model usually contains an organization of concepts and a graphical notation suitable for describing and defining the vocabulary and conceptualization of the problem domain. The most common conceptual data model is the ER (Entity-Relationship) model.
2. Physical models are the most low-level models. These models provide concepts that describe how the data is stored in the computer. Record formats, indexes and access paths are typical structures of these models.
3. Implementation models or logical models are placed between the two former abstraction levels. Implementation models provide concepts that can be understood by the end user but which are not far from the way data is organized in the computer. The three typical implementation models used in databases are the relational, hierarchical and network models.

The result of applying a data model to a specific problem domain is a schema, which is a description of a data collection at a chosen abstraction level. The schemas can be defined at three levels according to the three-schema architecture (the ANSI/SPARC architecture, Tsichritzis & Klug 1978):
1. The internal schema represents the physical storage structure of the database. This schema is produced by applying the physical data model.
2. The conceptual schema describes the Universe of Discourse in question. This schema can be produced by using the conceptual data model or the implementation data model.
3. The external schema (also called user view) describes the database from the viewpoint of a group of users. This schema can also be produced using the conceptual data model or the implementation data model. An external schema normally contains parts of the conceptual schema.

1.5 Conceptual data modelling

1.5.1 Principles

The aim of conceptual modelling is to specify an explicit conceptual model of the Universe of Discourse. A conceptual model provides a formal basis for a common understanding of the UoD. It defines the allowable ways in which information about the UoD can be stored (and manipulated) and provides a basis for the interpretation of external and internal syntactical forms which represent information about the UoD (Wieringa 1989). There is a relation between a conceptual schema, the UoD it represents and the information systems that implement the schema. A conceptual schema is an abstract mathematical structure that expresses the UoD.

Wieringa (1989) presents three roles of conceptual models in the development and use of an information system. Firstly, a conceptual model represents the possible entity types, their possible states, processes and interaction in the UoD. This is the descriptive role of the model. There can be many different databases (occurrences) which correspond to a certain schema. The rules for generating the schema specify properties that must be true for all occurrences of the schema (Tsichritzis & Lochovsky 1982). In addition to the descriptive role, a conceptual model may also have normative or institutional roles. It may include or create rules to specify what is permitted, forbidden or obliged in certain situations in the UoD (Wieringa 1989).

Conceptual models encapsulate structural aspects of objects (Hull & King 1987). The advantages of conceptual models lie in the support of database design and evolution.
They provide a variety of abstraction mechanisms and through them serve as a buffer between the form of the requirements collected from the users and the low-level, computer-oriented form of record-oriented physical models.

The abstraction mechanisms of conceptual models include classification, identification, specialization, generalization, aggregation and association (Batini et al. 1992, Elmasri & Navathe 1989). Classification (categorization) involves classifying similar objects into object classes (entity types). Identification is a process where all abstract concepts and real objects are made uniquely identifiable by means of an identifier. Specialization is a process where a class of objects is further divided into subclasses, i.e. conceptual refinement. Generalization is the inverse of specialization, a conceptual synthesis process where several object classes are combined into a higher-level abstract class. Aggregation is used to build composite objects from their component objects. Association is used to associate objects from several independent classes, i.e. to define relationships among classes.

1.5.2 ER model

The most common conceptual data model is the Entity-Relationship (ER) model. It was first introduced by Chen (1976). Since then numerous extensions to the ER model have been proposed. Elmasri & Navathe (1989) incorporate their most important concepts into the ER model and call the resulting model the enhanced ER or EER model (also extended ER model).

The basic object that the ER model represents is an entity, which is a thing in the real world with an independent existence. The existence can be physical or conceptual. Each entity has particular properties, called attributes, that describe and identify it. Each attribute is associated with a value set (domain), which specifies the set of values that can be assigned to that attribute. A set of entities that have the same attributes defines an entity type.
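The notions of entity type, attribute and value set can be illustrated with a minimal Python sketch. It borrows the FOREST STAND entity type with the attributes ID, AREA and SITE from the notation example of this report; the class names `Attribute` and `EntityType`, the concrete value sets and the site class names are assumptions made only for illustration and are not part of the data model described here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Attribute:
    """An attribute together with its value set (domain)."""
    name: str
    in_domain: Callable[[Any], bool]  # membership test for the value set
    is_key: bool = False              # True for the identifying attribute


@dataclass(frozen=True)
class EntityType:
    """A set of entities that share the same attributes."""
    name: str
    attributes: tuple

    def validate(self, entity: Dict[str, Any]) -> None:
        """Raise ValueError unless the entity fits this entity type."""
        expected = {a.name for a in self.attributes}
        if set(entity) != expected:
            raise ValueError(f"{self.name}: expected attributes {sorted(expected)}")
        for a in self.attributes:
            if not a.in_domain(entity[a.name]):
                raise ValueError(f"{self.name}.{a.name}: value outside domain")


# Hypothetical value set; the report does not list site classes here.
SITE_CLASSES = {"herb-rich", "mesic", "sub-xeric", "xeric"}

FOREST_STAND = EntityType(
    "FOREST STAND",
    (
        Attribute("ID", lambda v: isinstance(v, int) and v > 0, is_key=True),
        Attribute("AREA", lambda v: isinstance(v, float) and v > 0.0),
        Attribute("SITE", lambda v: v in SITE_CLASSES),
    ),
)

stand = {"ID": 12, "AREA": 3.4, "SITE": "mesic"}
FOREST_STAND.validate(stand)  # passes silently for a conforming entity
```

An entity whose SITE value lies outside the assumed value set, for example "lake", would be rejected with a `ValueError`.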
A relationship is an object that connects one or more entities. A relationship type among a number of entity types is a set of associations among entities of these entity types. Associations indicate that the participating entities are related to each other in some way in the real world. The degree of a relationship type is the number of participating entity types. A relationship type of degree two is called binary and a relationship type of degree three ternary.

Relationship types usually have constraints that limit the possible combinations of entities participating in relationship instances. These constraints are derived from the miniworld situation represented by the relationships. They are called cardinality ratios and participation constraints. Together they make up the structural constraints of a relationship type.

The cardinality ratio constraint specifies the number of relationship instances of one relationship type that an entity can participate in. Common cardinality ratios for binary relationship types are 1:1, 1:N, and M:N.

The participation constraint specifies whether the existence of an entity depends on its relation to another entity via the relationship. There are two types of participation constraints, total and partial. Total participation means that every entity of an entity type must be related to another entity via the relationship. Partial participation means that only some of the entities of an entity type are related to another entity via the relationship. A relationship type with total participation is also called mandatory, and a relationship type with partial participation is correspondingly called optional.

1.5.3 Advanced data modelling concepts (the EER model)

The EER model contains all the modelling concepts of the ER model. In addition to these, it includes the concepts of subclass and superclass.
In some cases an entity type may have numerous additional subgroups of its entities that are meaningful and need to be represented explicitly because of their significance to the database application. For example, an area delineated in a forest map (a compartment) may be further grouped into forest stands, agricultural areas, lakes etc. The set of entities in each subgrouping is a subset of the entities that belong to the compartment entity type, e.g. a forest stand is also a compartment. These subgroupings are called subclasses of the entity type compartment, and compartment is called the superclass for each of these subclasses.

The process of defining subclasses for an entity type is called specialization. The reason for defining subclasses may be that a subclass has specific attributes, or that a subclass may participate in specific relationship types. For example, the data collected from a forest stand may differ from the data of an agricultural area, or sample plots are measured only in a forest stand.

An important concept associated with subclasses is attribute inheritance. An entity that is a member of a subclass inherits all the attributes of the entity as a member of the superclass. It also inherits all relationship instances for relationship types in which the superclass participates.

For superclass/subclass relationship types there also exist constraints of disjointness and completeness. If a superclass entity can be a member of at most one of its subclasses, the subclasses are disjoint, e.g. a compartment may be either forest or lake, but not both. If the subclasses are not disjoint, their sets of entities may overlap, and the same entity may be a member of more than one subclass of the specialization. The completeness constraint includes two alternatives. Total specialization specifies that every entity in the superclass must be a member of some subclass in the specialization. Partial specialization allows an entity not to belong to any of the subclasses.
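The disjointness and completeness constraints can be made concrete with a small Python sketch that checks a specialization of the compartment example. The class `Specialization`, its field names and the integer entity identifiers are hypothetical and serve only as an illustration; attribute inheritance is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Specialization:
    """Subclasses of one superclass plus the two EER constraints."""
    superclass: str
    disjoint: bool  # an entity may belong to at most one subclass
    total: bool     # every superclass entity must belong to some subclass
    members: Dict[str, Set[int]] = field(default_factory=dict)

    def violations(self, superclass_ids: Set[int]) -> List[str]:
        """Return descriptions of all constraint violations found."""
        problems = []
        for sid in sorted(superclass_ids):
            owners = sorted(n for n, ids in self.members.items() if sid in ids)
            if self.disjoint and len(owners) > 1:
                problems.append(f"{sid}: in several subclasses {owners}")
            if self.total and not owners:
                problems.append(f"{sid}: in no subclass")
        # A subclass entity must always also be a superclass entity.
        for name, ids in sorted(self.members.items()):
            for sid in sorted(ids - superclass_ids):
                problems.append(f"{sid}: in {name} but not in {self.superclass}")
        return problems


compartments = {1, 2, 3}
spec = Specialization(
    "COMPARTMENT",
    disjoint=True,   # a compartment is either forest or lake, not both
    total=False,     # partial: compartment 3 belongs to no subclass
    members={"FOREST STAND": {1}, "LAKE": {2}},
)
print(spec.violations(compartments))  # no violations in this configuration
```

With `total=True` the same data would report compartment 3 as a violation, because a total specialization requires every superclass entity to belong to some subclass.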
1.5.4 ER/EER notations

Three typical ER notations are shown in Fig. 1. All show the entity types FOREST STAND and TREE and the relationship between them.

In notation a) (the so-called Chen notation, see e.g. Elmasri & Navathe 1989) the entity type FOREST STAND includes the attributes ID, AREA and SITE. The underlined attribute ID is the key attribute (identifier). The entity type TREE has two attributes, NR (key attribute) and SPECIES. There is a relationship type CONTAINS between the two entity types, indicating that in a forest stand there may be trees. The pair of integers on the relationship line expresses the cardinality ratios. The numbers (0,n) mean that a FOREST STAND may contain zero or more trees; zero because it is assumed that clearcuts are also forest stands. From the TREE point of view, the numbers (1,1) mean that each tree must be associated with exactly one forest stand.

Generally, we can associate a pair of integers (min, max) with each participation of an entity type in a relationship type, where min ≥ 0, min ≤ max and max ≥ 1. This notation includes both the cardinality ratio and the participation constraint. Min = 0 implies partial participation and min > 0 implies total participation.

Figure 1. Examples of alternative ER notations: a) Chen, b) IE, c) Bachman

Notation b) is used e.g. in Information Engineering (A Guide... 1990). In this notation attributes are not displayed. Relationship types are depicted as single lines joining the entity type boxes. The name(s) of the relationship type can be displayed along the line. The cardinality of the relationship type is depicted at the ends of the relationship lines as follows:
- a bar that crosses the line perpendicularly indicates only one (min. = 1, max. = 1)
- a crow's foot at the end of the line indicates one or more (min. = 1, max. = n). The maximum number is not displayed in the diagram.
The participation constraint is depicted as a circle on the relationship line next to the cardinality symbol.
The circle indicates partial participation. When the circle is omitted, participation is total.

Notation c) is the so-called Bachman notation (Bachman 1969). The notation is close to the IE notation, with the exception of different cardinality symbols. A single arrow indicates only one, a double arrow one or more. The participation constraint is shown with a circle, as in IE.

The subclass/superclass relationships can be represented in these notations by defining 1:1 optional relationships between the superclass and its subclasses.

1.6 Data modelling for GIS

Entity-relationship modelling is a general tool in business database design. Laurini & Thompson (1992) note that there exist only a few examples of its use in geographic information systems design. Structured analysis methodology has been used in some projects (e.g. Bulger & Hunt 1991). An ER diagram is normally included in the structured analysis tools. Armstrong (1988) presents the use of entity-category relationship diagrams in the design of temporal spatial databases.

Laurini & Thompson (1992) state that semantic data modelling could provide appropriate tools to identify the complex data structures of geographic information. The essential part of the modelling is to choose the presentation of spatial objects (point, line or polygon). Effective modelling for spatial databases requires attention to various other elements, such as the cardinalities of associations. Many-to-many relationships are quite common in geographic information and they must be decomposed and treated carefully. For example, a forest stand delineated in the inventory may belong to two or more forest lots, and one lot always consists of many stands. The same situation exists between forest lots and owners.

The implementation data modelling of spatial databases is discussed in various papers. For the relational model, see e.g. van Roessel (1987). For object-oriented spatial data modelling, see Worboys et al. (1990).
For a Finnish review of relevant articles, see Sijainninhallinnan... (1991).

1.7 Temporality in spatial databases

Maps are usually two-dimensional. Normally attributes are considered as the third cartographic dimension and time as the fourth dimension. Maps describe geographic entities. Each entity within the modelled system has a location, attributes and a lifespan (the time during which the entity exists). Usually one of the components is fixed, one is controlled and only one can be measured on an interval or ratio scale (Langran 1992b). On a traditional map time is fixed. Attributes are included using different symbols and tones. Only location can be measured.

Langran (1992b) presents the following conceptions of cartographic time:
1. The space-time cube represents one time and two space dimensions in a theoretical three-dimensional cube. The model can be implemented in a CAD system without topological relations.
2. Sequent snapshots of time slices represent changes as a series of states. They do not, however, represent the events that change the state. In addition, several changes occurring between adjacent snapshots are not detected.
3. A third image of geographic time is a base state with amendments superimposed. Instead of states, the model records change with its type and timing.
4. The space-time composite includes accumulated geometric changes in one coverage. It is based on a base state of some starting point representing the geometry and topology of the coverage at a chosen time. Each change (with distinct spatial location and extent) causes the geographic objects to break down into discrete objects, each with its own distinct history. The resulting spatial objects (e.g. polygons) represent the greatest common spatiotemporal units, including distinct temporal attribute sets. The space-time composite reduces three dimensions (location, attributes, time) into two, so space can be treated atemporally and time can be treated aspatially. Temporality is an attribute of spatial objects.
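The difference between sequent snapshots and a base state with amendments can be sketched in a few lines of Python. The sketch covers attribute changes only and leaves geometry aside; the names `Amendment` and `state_at`, the attribute names and all example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Amendment:
    """One recorded change: what changed, when, and of which type."""
    year: int
    object_id: int
    attribute: str
    new_value: Any
    change_type: str


def state_at(base: Dict[int, Dict[str, Any]],
             amendments: List[Amendment],
             year: int) -> Dict[int, Dict[str, Any]]:
    """Rebuild the state valid at the end of `year` from the base state."""
    # Copy the base state so that replaying amendments never alters it.
    state = {oid: dict(attrs) for oid, attrs in base.items()}
    for a in sorted(amendments, key=lambda a: a.year):
        if a.year > year:
            break
        state.setdefault(a.object_id, {})[a.attribute] = a.new_value
    return state


# Hypothetical base state of one stand in 1980 and two later changes.
base_1980 = {7: {"species": "pine", "volume_m3_ha": 180}}
log = [
    Amendment(1985, 7, "volume_m3_ha", 0, "clearcut"),
    Amendment(1986, 7, "species", "spruce", "planting"),
]

print(state_at(base_1980, log, 1985))  # state after the clearcut
print(state_at(base_1980, log, 1990))  # state after both changes
```

Two snapshots taken in 1980 and 1990 would show only the net difference between the states, whereas the amendment list keeps both changes together with their type and timing, and any intermediate state can be derived from it.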
The alternatives for handling aspatial attribute information in a relational data model include relation-level versioning, tuple-level versioning and attribute-level versioning (Langran 1989, 1992b). Relation-level versioning creates and stores a new snapshot of a table when any of its attributes change. There exist various versions of tuple-level versioning. For example, each tuple can be supplied with time stamps denoting its lifespan, and new tuples are appended to the relation without deleting existing ones. Attribute-level versioning supplies each attribute with respective time stamps, which requires variable-length fields to store lists of attribute versions.

1.8 IS development in forestry

1.8.1 Data models

There exist numerous IS development projects where data modelling and systems analysis methodologies have been applied. The projects presented below are closely related to this work.

Large conceptual models have been produced in Finland for forestry purposes in the Finnish Forest and Park Service (Paikkatietojärjestelmien... 1989), in the Finnish Forest Research Institute (Saarenmaa et al. 1990) and in the Ministry of Agriculture and Forestry (Metsätalouden... 1991).

The Finnish Forest Research Institute (FFRI) has defined an information strategy for the whole institute. The strategy was formulated by studying the existing IS and future requirements using the IE methodology. During the work two conceptual models, an activity model and a data model, were produced. The activity model described hierarchically the basic functions and work processes of the organization. The data model was an entity-relationship diagram of the data objects that are used by the activities. The IS architecture was derived by forming a matrix of the supplementary activities and subject areas, arranging it by logical dependencies (create, update, delete) and defining information systems from the arranged matrix.
The same methodology as in the FFRI project was applied when an information architecture (containing data and business models) was developed for the whole forestry sector. The architecture is based on distributed processing of geographically referenced information in the forestry organizations.

Nalli (1992) has made a conceptual model to describe geographic information in multiple use forestry. He modelled the spatial objects that should be considered in multiple use planning as point, line and polygon entity types and also discussed the principles of defining relationships between the spatial entity types.

In the Netherlands, information and business models for an average forest enterprise have been produced using the IE methodology to make knowledge of information modelling and management available to forest enterprises (Borsboom & Six Dijkstra 1992).

The Ontario Ministry of Natural Resources and ESRI Canada have developed a forest management decision support system based on ARC/INFO GIS tools (Bulger & Hunt 1991). Structured analysis and design methodology with a CASE tool was utilized in the development process.

Kaila & Saarenmaa (1990) state that one general descriptive model, in which all activities and data objects can be described, can be defined for forestry. This definition can be made using formal conceptual modelling procedures. Data flows and applications can be derived from this description. Each forest organization can use only those parts of the descriptive model needed for its business.

1.8.2 Temporality in forest information systems

Temporality in forest GIS has been studied by Armstrong (1988), Bulger & Hunt (1991), Langran (1992a) and Kennett (1992). There are two sorts of forest GIS temporality: events that may cause change and changes to the forest state itself (Langran 1992a).
The temporal aspects of forest management can be summarized as follows:
- Compartmentwise forest inventories present sequent snapshots of the state of the forest. The state is described with spatial objects (forest stand polygons) and their descriptive attributes.
- Change caused by continuous processes (growth, mortality) can be assumed to be aspatial. It can be predicted and calculated using growth models.
- Discrete events that cause change include silvicultural activities, fires, insect infestation, and unusual weather, such as windstorms. They have spatial and temporal location and extent (Langran 1992a). Temporal location concerns the state of the event: whether it is completed, underway or planned. Temporal extent spans the period during which the event is performed. Spatial location and extent are defined by the area in which the event occurs.

For one geographic entity, these events may have the following effects:
1. only spatial attributes change (e.g. a part of a stand is detached to build a new entity),
2. only feature attributes change (no spatial change),
3. both spatial and feature attributes change,
4. a new geographic entity is established.

Langran (1992a) introduces alternative ways of handling forest temporal information:
1. The simplest method is to include narrative descriptions of activities in the database records.
2. Snapshots of the forest states are taken by rasterizing the vector data.
3. The third is the space-time composite method discussed in Chapter 1.7.
4. Feature histories are described by treating the attributes that describe different versions of a feature as separate database records. Geometric change is treated by referencing the correct spatial objects to each version of the feature as it changes over time.

There are some problems associated with the implementation of silvicultural operation history:
1. There are many types of activities and they all have different sets of descriptive attributes.
For example, cutting is described with method and outturn, while regeneration attributes may include tree species, plant type and planting density. These attributes cannot be stored in a common database table. Furthermore, the different types of activities must be examined together to meet the information needs, so it is not sufficient to create a separate information layer for each activity type.
2. The same type of activity may occur in the same place many times, e.g. planting and repair planting.
3. The third problem is associated with the update of the inventory layer. An area may be regenerated using various methods (seeding, planting) with a result that is homogeneous from the inventory perspective after some ten years. Some kind of generalization procedure must be applied to combine the regeneration polygons into basic inventory polygons.

Kennett (1992) introduces a practical silvicultural operation history model. Spatial data control is based on the idea of the space-time composite as a result of spatial overlay operations. The attribute data is stored in relational database tables. A specific master table is separated from the individual activity data tables and forms the linkage between the spatial database and the activity attribute tables.

Bulger & Hunt (1991) present a temporal forest GIS solution based on two principles, time-stamped layers and database transaction history. Time stamping includes a mechanism that copies all the old (retired) data into a history layer. Transaction processing provides transaction tables for recording all events that change the database (both thematic and spatial data). An accumulated transaction history allows the database to be reconstructed as it existed at any point of time in the past.

1.8.3 Overview of the FFRI project

This study was made in the Finnish Forest Research Institute as part of a project called The Research Forest Database and Planning System.
The project (named TUTGIS) was started in 1991 to develop the information processing and planning of the research forests owned by the institute. The primary objective of the project was to define and develop a GIS-based forest information and planning system. In the system both up-to-date and history information of the forests can be integrated with the experiments and conservation areas for forest planning purposes. For an overview of the project and its background, see Nuutinen (1991) and Paananen & Nuutinen (1993). This study is the part of the project that covers the functional and information analysis and modelling for the new system.

The Finnish Forest Research Institute (FFRI) has some 150 000 hectares of state forest at its disposal. The forest area is used mainly for experimental purposes. The areas that are not in experimental use are managed according to the law concerning state forests. The new strategy for the forests is presented in Metsäntutkimuslaitoksen... (1993). There are 2 300 experiments in the forests. Some 65 000 hectares are set aside for conservation purposes. The conserved areas include national parks, nature reserves etc. The forests are divided into research areas for management purposes.

When the project started, the forests were managed according to management plans. A management plan was made for each research area approximately every ten years. Planning was based on a compartmentwise forest inventory using aerial photographs and field surveys. During the inventory 1:10 000 forest maps were produced using a vector-based mapping system. The stand descriptions were stored in sequential attribute files used for statistical calculations. The attribute files and mapping systems were loosely integrated to produce thematic maps. For the experiment data there was a register containing general information about the purpose and approximate location of the experiments.

The system had some shortcomings.
The inventory data were not updated, so the information about the stands could be over 10 years old. There was no accurate information about the location of the experiments, or the information was spread over numerous organizational units. There was no automated system for collecting silvicultural operation history, with the exception of regeneration data, which had been collected into manual files.

1.9 Aim of the study

The aim of this study was to define and test a logical descriptive data model of the primary data objects for the inventory, planning and monitoring of research forests. The problem can be decomposed into three subproblems:
1. To analyse the requirements that research forest activities place on information systems.
2. To present the structure and contents of the FFRI forest management information system in a conceptual schema.
3. To design and test a GIS-based database according to the conceptual data model with special regard to a) basic inventory data, b) stand history data and c) integration of experiment and stand data.

There are three underlying assumptions of the model and modelling method:
1. The model is independent of the implementation. There may be many different databases (occurrences) which correspond to the descriptive model (Wieringa 1989).
2. The resulting schema contains only the basic data elements of the activities. Derived or aggregated data are excluded.
3. The descriptive model can successfully be utilized in the implementation of operational forest information systems.

2 Methods
2.1 Information Engineering
2.1.1 Overview of the methodology

Information Engineering methodology (IE) is a pragmatic, business-oriented methodology that considers the entire enterprise (Martin 1990). IS is seen as a support for achieving the strategic goals of the organization. IE has seven stages, five of which address various levels of information system development (A Guide... 1990).
The stages are Information Strategy Planning (ISP), Business Area Analysis (BAA), Business System Design (BSD), Technical Design (TD), Construction, Transition and Production. During ISP a broad view of the information requirements of the business is established (for an example of the use of ISP tools, see Saarenmaa et al. 1990). In the BAA stage a more detailed analysis of a particular segment of the business (a business area) is performed, and in BSD an application system supporting a segment of a particular business area is described in detail, disregarding the target computing environment. During technical design the results of business system design are tailored to a specific target computing environment. The characteristics of the hardware environment, operating system and DBMS are considered. In the construction stage all of the executable components of a system are created, e.g. programs, databases and screen formats. Transition refers to the installation of the system in a production environment, possibly replacing existing systems or parts of them.

IEF is an automated implementation of the IE methodology. It is a set of tools for capturing information needs at high abstraction levels and transforming them into an executable application system (A Guide... 1990). IEF currently supports the first five of the stages listed above. In this study, a business area analysis was made to analyse the requirements of the research forest system. The corresponding IEF tools were applied in the OS/2 operating environment.

2.1.2 Business Area Analysis

Business area analysis involves the definition and refinement of the activities a business performs (called business functions and business processes), the things with which it deals (entities) and the interaction between the two. It is a refinement of a subset of the information architecture developed in the ISP stage.
BAA is used to identify and define the business activities that make up business functions, the data required for each business activity, the sequence of business activities and how business activities affect the data (A Guide... 1990). The main tasks performed to achieve the objectives are data analysis, activity analysis and interaction analysis. In data analysis, the data used to represent the things relevant to the business (entities) and their interrelationships are defined, resulting in a conceptual data model. In activity analysis, business functions are examined to determine the processes they comprise. The result of the task is an activity model. In interaction analysis the effects of activities on data are analysed and presented in an interaction model.

IEF provides tools to perform the above-mentioned tasks. Data is modelled by building an Entity Relationship Diagram (ERD). Business activities are modelled hierarchically on an Activity Hierarchy Diagram (AHD). Interaction between activities and data can be modelled using data/activity matrices and action diagrams.

Data modelling in IEF includes some specific concepts not mentioned in Chapter 1.5.3. In this study subject areas and partitionings were applied. A subject area is defined as an area of interest to the enterprise centred on a major resource, product or activity (A Guide... 1990). It consists of a set of entity types closely related to each other. Subject areas illustrate the essential structure of the schema. An alternative way to represent subclass specializations in IEF is the use of partitionings. A partitioning is defined as a basis for subdividing entities of one type into subclasses (A Guide... 1990). The classification of a particular entity along a partitioning is based on the value of a specific attribute of the entity being partitioned. The attribute is called the classifying attribute. For example, the partitioning of forest map compartments into forest stands, lakes etc.
could be based on the classifying attribute land cover class. Partitionings can be presented in IEF's ER diagrams by using boxes inside which the subclass boxes are placed.

2.2 The modelling approach in this study

In this study three ER notations are applied. Full IE notation with subject areas and partitionings is used in Appendix 1, where the whole model is presented. In Figs. 2-9 Bachman notation is used. A modified Chen notation is applied in Fig. 10.

The requirements of the system were collected by analysing current systems and by interviewing researchers and those responsible for managing the research forests. There exist numerous reports and instructions concerning the use of research forests (Metsäntutkimuslaitoksen... 1985, Tutkimusaluetyöryhmän... 1989a, Metsäntutkimuslaitoksen... 1989b, Metsäntutkimuslaitoksen... 1993).

The conceptual schema of the FFRI forest management UoD represents it at a certain level of abstraction. Parts of the UoD are not modelled at all, and of the chosen parts some were modelled in greater detail than others. The objectives of the project affected the selection. The following selections were made:
1. Detailed integrity constraints were excluded from the schema. Constraints between attributes (e.g. the allowed combinations of tree height and diameter values) were not defined in the schema.
2. Attributes were defined only for part of the schema. Those attributes that could be obtained from the FFRI forest management UoD or that were necessary and useful in the test database design were included in the schema.

For the included attributes the following properties were defined:
1. type (numeric or text variable),
2. description of the attribute,
3. length of the attribute values in bytes,
4. optionality of the attribute (whether null values are allowed),
5. possible default value,
6. attribute domain (permitted values).
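The six attribute properties listed above can be sketched as a small data structure with a validation routine. This is only an illustration under stated assumptions, not the IEF attribute definition format: the names AttributeDef and validate, the interpretation of length as the byte length of the value's text form, and the example codes 1-5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class AttributeDef:
    name: str
    kind: str                       # 1. "numeric" or "text"
    description: str                # 2. description of the attribute
    length: int                     # 3. length of the values in bytes
    optional: bool                  # 4. are null values allowed
    default: Any = None             # 5. possible default value
    domain: Optional[Tuple] = None  # 6. permitted values; None means unrestricted


def validate(attr, value):
    """Return the value to store, or raise ValueError if it violates the definition."""
    if value is None:
        value = attr.default
    if value is None:
        if attr.optional:
            return None
        raise ValueError(f"{attr.name}: null value not allowed")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if attr.kind == "numeric" and not is_number:
        raise ValueError(f"{attr.name}: numeric value expected")
    if attr.kind == "text" and not isinstance(value, str):
        raise ValueError(f"{attr.name}: text value expected")
    # Assumption: length is checked against the UTF-8 text form of the value.
    if len(str(value).encode("utf-8")) > attr.length:
        raise ValueError(f"{attr.name}: longer than {attr.length} bytes")
    if attr.domain is not None and value not in attr.domain:
        raise ValueError(f"{attr.name}: {value!r} is not in the domain")
    return value


# Hypothetical definition; the codes 1-5 are placeholders, not the FFRI coding.
land_cover_class = AttributeDef(
    name="land_cover_class",
    kind="numeric",
    description="Land cover class of a forest map compartment",
    length=1,
    optional=False,
    domain=(1, 2, 3, 4, 5),
)
```

With this definition, validate(land_cover_class, 3) returns 3, while a value outside the domain or a missing value raises ValueError. Constraints between attributes, which the schema deliberately excludes, are not covered by the sketch either.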
From a data modelling perspective, geographic features are entities that have, in addition to the descriptive attributes, three specific (spatial) attributes: location, geometry and topology. Location is an identifying attribute. In this study, certain geographic entity types were specialized into two entity types, spatial and feature entity types. The conceptual division is discussed e.g. in Langran (1992b). The spatial entity type (spatial objects) includes the geometric and topological attributes of the geographic entity, and the feature entity type includes the corresponding thematic attributes.

Primitive spatial entity types (geometric primitives) are point, line and polygon. All other spatial entity types can be seen as subclasses of them. For example, the subclasses of the entity type polygon represent the thematically distributed geographic objects that are implemented in distinct information layers. The entities of one spatial entity type subclass have the same feature attributes (relationships to corresponding feature entities) and they belong to the same logical group of geographic objects. The subclasses are assumed to be disjoint.

The approach was based on an assumption about the implementation. The most common approach in geographic information system construction is to segregate the aspatial (feature) data from the spatial data, keeping the latter in special structures while storing the feature attribute information in relational database tables (see Chapter 1.2). In this study, those feature attributes that were intended to be stored in relational database tables were specialized and placed in feature entity types. The relationships between spatial and feature entities were defined using normal ER relationship types. The model is therefore not purely conceptual, as implementation aspects are taken into consideration. The principle is illustrated in Fig. 2. Spatial information is stored in information layers.
Some of these layers may be connected to database tables containing the feature attributes. The aim of this study was not to describe the technical details of exactly how the geographic information is stored in these layers. The approach aimed at finding the logical groupings of geographic features and their relations.

Figure 2. Basic entity types of the geographical information schema.

Figure 3. Relationship types of spatial and feature entity types.

The relationship types between spatial and feature entity types may have different structural constraints. This is illustrated in Fig. 3. In a 1:1 relationship type there is one corresponding feature entity for each spatial entity; e.g. an experiment plot may have one measurement data set (the latest measurement). 1:N represents the case where a spatial entity has several attribute sets; e.g. an operation history polygon includes data on several completed operations. N:1 represents the case where one geographic feature consists of two or more spatial objects; e.g. a forest stand is represented by two spatial polygons because a road divides the stand into two parts. M:N may be like N:1, with the addition that there may be feature attributes from several points of time.

2.3 Application of the model
2.3.1 Inventory design

For the part of the ER schema that concerns basic inventory data, a detailed analysis was made to define the classifications of the stand characteristics. The development of the classification was based on the current inventory contents, information needs and the existing stand characteristic classifications of the National Forest Inventory (Valtakunnan metsien... 1986) and the Finnish Forest and Park Service (PATl maastotyöohje... 1991). After the detailed analysis an inventory method with field guides and data collection forms was designed. The new inventory design was tested in the Kivalo research forest. A total of 2000 hectares was inventoried during summer 1992.
The inventory design is presented in Juvakka (1993).

2.3.2 Test databases

The FFRI has purchased the Ingres relational database management system (RDBMS) and the ARC/INFO GIS. The research forest information system was built upon these two commercial software systems.

ARC/INFO is a file-oriented geographic database system. It contains modules for both raster and vector data storage and handling. Spatial data storage is based on a hierarchical data model and the information layer principle. In ARC/INFO spatial and attribute data are stored separately and are connected by using an interface from the GIS to the database system. The attribute data can be stored either in ARC/INFO's own tabular database (INFO) or in an external relational database.

Connections between spatial features and external database tables are established using the so-called database integrator. The database integrator allows ARC/INFO applications to view and use various external relational databases. External attribute tables can be stored in the DBMS and related to the spatial objects. A relate makes a connection between a record in the so-called spatial feature attribute table and a corresponding row in the related attribute table. Spatial feature attribute tables internally include attributes concerning location, geometry and topology. Users may add their own attributes, e.g. logical identifiers. An item (relate item in ARC/INFO terminology) in one spatial feature attribute table is used as a relate key (common attribute) to a corresponding column in the related table. The relate may connect two ARC/INFO tables, an ARC/INFO table and an external database table, or two external database tables (a so-called stacked relate) (Managing tabular... 1991).

From the Kivalo inventory material a geographical database was designed and implemented. The database consists of tables in the Ingres relational database system and information layers in ARC/INFO.
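The relate mechanism described above, in which an item of a spatial feature attribute table serves as the key to a column of a related table, can be sketched with plain Python dictionaries. The sketch does not use or imitate any ARC/INFO or Ingres interface; the function relate, the table and column names and all values are hypothetical.

```python
def relate(source_rows, source_item, target_rows, target_column):
    """Pair each source row with the target row that has the same key (or None)."""
    index = {}
    for row in target_rows:
        index.setdefault(row[target_column], row)  # first match wins
    return [(row, index.get(row[source_item])) for row in source_rows]


# Spatial feature attribute table: internal geometry attributes plus a
# user-added logical identifier (stand_no). All values are made up.
polygon_table = [
    {"poly_id": 1, "area_m2": 12000.0, "stand_no": 101},
    {"poly_id": 2, "area_m2": 8000.0, "stand_no": 101},  # same stand, split by a road
    {"poly_id": 3, "area_m2": 5000.0, "stand_no": 102},
    {"poly_id": 4, "area_m2": 700.0, "stand_no": 999},   # no matching row
]

# External attribute table, e.g. a table in the relational database.
stand_table = [
    {"stand_no": 101, "land_cover_class": 1, "estate_no": 7},
    {"stand_no": 102, "land_cover_class": 5, "estate_no": 7},
]

# A second external table, reached through the first one (stacked relate).
estate_table = [{"estate_no": 7, "estate_name": "Example lot"}]

pairs = relate(polygon_table, "stand_no", stand_table, "stand_no")
matched_stands = [stand for _, stand in pairs if stand is not None]
stacked = relate(matched_stands, "estate_no", estate_table, "estate_no")
```

Polygons 1 and 2 resolve to the same stand row, which corresponds to a feature that consists of two spatial objects, and the second call follows the estate key from the related table onward in the manner of a stacked relate.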
The relational database tables were based on a part of the IEF data model (subject area basic inventory data). An IEF transformation module was utilized to generate the data description language (DDL) statements of the database tables from the model. The primary statements had to be modified to obtain statements suitable for the Ingres RDBMS.

The ARC/INFO layers were defined during the digitizing process using the data model (subject area spatial data). The background information was digitized from basic maps (1:10 000) and stand boundaries were digitized from field transparencies laid over the basic maps. Each element (e.g. roads, administrative lines) was stored in its own information layer according to the data model. A total of 27 layers were created during the data input.

For the forest history structures a small test database was designed and implemented using materials from the Kivalo area. The test database and applications designed in the project will be documented in future TUTGIS publications.

3 Results
3.1 Research forest activities

The main activities of research forests are:
1. goal definition,
2. basic information processing,
3. forest management planning,
4. forestry operative control,
5. forest operations,
6. conservation area planning,
7. conservation area operative control,
8. conservation area operations,
9. estate management,
10. experiment planning, and
11. experiment control and management.

These activities were further detailed into functions and processes. The most important was the modelling of basic information processing, an activity that serves all other main activities.

3.2 ER schema of the research forests
3.2.1 General aspects

The whole ER schema is included in Appendix 1, and in Figs. 4-9 parts of the schema are extracted and rearranged to illustrate the basic inventory data, special land use data, planning, experiments and stand history schemas.
In the schema of Appendix 1, subject areas are used to aggregate spatial entities and feature entities into logical groups. All spatial entity types are included in subject area spatial data. The entity types not included in subject area spatial data represent the feature entity types that are connected to the spatial entities to build complete geographic objects. Some of them refer to various databases or systems that were defined in the FFRI information system architecture (Saarenmaa et al. 1990).

In Appendix 1, the feature entities of basic forest inventories are grouped in subject area basic inventory data. The land ownership system (estates) is included in subject area kihti, which refers to a separate system. Special land use spatial entity types are grouped into two subject areas, administrative land use layers and functional land use layers. The corresponding feature entity types for both spatial subject areas are included in subject area land use data. The planning system has an interface to the TOTTI system, which the information strategy defines as a system for planning and budgeting.

The interrelationships of points, lines and polygons are not illustrated in Fig. 2 or in Appendix 1, except for the entity type line point, a specific subclass of point from which lines are composed. The subclasses of the polygon, point and line entity types inherit the attributes concerning location, geometry and topology. Because these attributes are internal structures of the GIS database, they were not included in the conceptual schema. Some spatial entity types are subclasses of both the line and point entity types. This means that some geographic features may be represented either by lines or by points and may be included in one information layer.

For each point, line and polygon subclass, the most important attribute is the identification attribute (named ID). With polygons it refers to the user-defined identifier of the polygon (e.g. forest stand number).
With lines and points it refers to the type of entity (e.g. type of road). The ID coding system (three-digit numbers) was derived from the Nalle mapping system (Nalle-metsäkarttaohjelmisto 1984).

Raster data was modelled as a specific subclass of point named cell, which has the special attributes row and column (for the identification of location) and a standard size and shape (i.e. cells cover constant areas).

3.2.2 Spatial entities of basic mapping

The basic mapping schema consists of entities that are used to store the background map information collected by the National Board of Survey. Their modelling was based on the new conceptual model of basic map information (Maastotietojen... 1993). The types of basic map entities were adjusted to the model of the basic map. The various spatial objects of the basic maps were modelled as separate entities. A total of 23 entity types were defined from that specialization. The textual information of basic maps (e.g. names) is assumed to be connected to each entity type it concerns (in ARC/INFO it can be implemented using the so-called annotation data types). The basic map entity schema is presented in Fig. 4.

Entity types ADMINISTRATIVE LINE (forest lot boundaries, commune boundaries etc.), ESTATE POLYGON (forest lots owned by the FFRI) and BOUNDARY MARK (points) refer to the land ownership system.

Some entity types can be either points or lines. BUILDING may be represented either by its outer edge (line) or by digitizing the central point of the building. CONSTRUCTED STRUCTURE refers to man-made structures other than buildings, e.g. fenced electric power stations. Those constructed structures that are represented by lines can be built into polygons. CONTOUR ELEMENT includes contour lines and altitude points. STREAM OBJECT contains streamlines under 5 meters wide and the respective point objects (e.g. springs). TERRAIN OBJECT refers to remarkable point objects (e.g. stones) and edges of rocks. The latter can be built into polygons.
Point entity type TREE refers to protected or otherwise remarkable individual trees, and trees selected for breeding.

Roads are represented by centre lines. They can be built into polygons (ROAD POLYGON) using buffer operations. Those transmission lines that have an area (e.g. electric transmission lines) can likewise be buffered into polygons (TRANSMISSION POLYGON).

Line entity types that represent polygon boundaries include AGRICULTURAL LAND BOUNDARY, SHORELINE (shorelines of lakes and of streams over 5 meters wide) and BUILT-UP AREA BOUNDARY. These boundary lines, together with constructed structure lines and terrain lines, may delineate BASIC MAPPING POLYGONS. The relationship types between those entity types illustrate this function. Basic mapping polygon is an information layer where the land cover is filled exhaustively with water areas, constructed and urban areas, agricultural areas and forests. Forest area is not divided into stands, but there may be a refined classification inside the other areas. The classification is made using values of an identifying attribute (ID). Polygon entity type PEATLAND POLYGON refers to peatland boundaries and polygons.

Figure 4. ER schema of the basic map entities.

Entity types FOREST MAP COMPARTMENT POLYGON and FOREST STAND BOUNDARY are also included in Fig. 4, although they are not basic mapping entities. The area owned by the organization is filled exhaustively with forest map compartment polygons. Some forest map compartment polygons are copied BASIC MAPPING POLYGONS, ROAD POLYGONS or TRANSMISSION POLYGONS. Most compartment polygons are composed of FOREST STAND BOUNDARY lines.

3.2.3 Basic inventory data

The ER schema of compartmentwise forest description is presented in Fig. 5 (see also Fig. 4). The spatial entities include entity types FOREST STAND BOUNDARY, FOREST MAP COMPARTMENT POLYGON and INVENTORY SAMPLE PLOT POINT. The last-mentioned entity contains locational information on the inventory sample plots.
Forest stand boundaries are acquired in the compartmentwise inventory and updated by the organization. For a detailed description of the attributes and domains of basic inventory data, see Appendix 2 and Juvakka (1993).

FOREST MAP COMPARTMENT is the central feature entity type of the inventory data. The respective spatial entity is FOREST MAP COMPARTMENT POLYGON. The relationship is one-to-many, so a forest map compartment may consist of one or more separate spatial objects (polygons). E.g. a homogeneous forest stand may be represented by two polygons because a road divides the stand into two parts.

Forest map compartment is further classified into five subclasses according to the attribute land cover class (for definitions of land cover classes, see Juvakka 1993):
1. FOREST STAND is the most important subclass of forest map compartment. It is a forest area homogeneous with respect to soil, site and growing stock.
2. AGRICULTURAL COMPARTMENT refers to fields, pastures and meadows that are used for agricultural purposes.
3. BUILT-UP COMPARTMENT refers to housing and industrial areas, gravel pits etc.
4. ROAD COMPARTMENT includes all types of communication routes, transmission lines etc. that are considered as areal entities.
5. WATER COMPARTMENT contains water areas that are over 5 meters wide.

Subclasses 3-5 could also form one subclass of FOREST MAP COMPARTMENT, because in this study no specific attributes or relationship types have been defined for them. The full specialization was maintained in the schema to illustrate the land cover classification.

The total forest area is divided into research forests and blocks for management purposes (entity types RESEARCH FOREST and BLOCK). There are no corresponding spatial entities; research forest and block numbers were modelled as part of the identifier of forest map compartments. The forest area is also divided into several lots according to the land ownership system of Finland.
This is represented by a relationship to entity type ESTATE. The spatial entities of estates were discussed earlier. Each compartment always belongs to one lot.

Figure 5. ER schema of compartmentwise forest description.

Basic inventory entities are already normalized (first normal form) for implementation. As a result of the normalization there are entity types that could also be multivalued attributes of either forest map compartment or forest stand. These entity types include DEGRADING FACTOR OF TAX CLASS and SPECIAL PROPERTY. The former is an additional attribute used in forest taxation and the latter a special observation made in a compartment, indicating a need for additional special inventory and planning. ADDITIONAL INFORMATION contains textual descriptive information on the forest map compartment. It can also be connected to a sample plot measured in a forest stand.

FOREST STAND is related to a set of specific entities that describe it in detail. For measurements of the growing stock, INVENTORY SAMPLE PLOTs are placed in stands. Plots may be either variable size plots (relascope plots) or fixed size plots. For each plot, the growing stock can be measured either treewise (MEASURED TREE) or by tree strata (TREE STRATUM). Tree strata include weighted mean values of the growing stock stratified according to tree species and canopy layer.
MODEL TREEs are calculated after the inventory from stratumwise mean estimates or measured trees to add growing stock volumes and growth to the measured growing stock values.

An inventory sample plot may also be a unit for soil measurements. The entity types SOIL PLOT and STONINESS MEASUREMENT refer to soil type, thickness of the horizons and stoniness measurements (soil measurements were made in the test inventory).

For forest stands there may also be data concerning damages (DAMAGE), completed operations (IMPLEMENTED OPERATION) and suggested silvicultural operations (SILVICULTURAL TREATMENT PROPOSAL). Proposals may also be made for agricultural compartments (reforestation). The data on completed operations can be moved into the detailed history data structures.

3.2.4 Special land use data

The ER schema of special land use data is presented in Fig. 6 and the attribute lists of the entities are given in Appendix 3. Special land use spatial entity types were divided into two logical groups, administrative land use layers and functional land use layers.

Administrative land use layers refer to areas that are based on an administrative decision (e.g. an act or a resolution of a government office). The boundaries of the areas are stable and the use of the areas is regulated by the decisions. Functional land use layers contain areas that are delineated in special inventories and planning. The instructions for the use of the areas can vary depending on the type and location of the specific area.

The problem of overlapping land use types is solved by using the information layer concept. By defining information layers the distinct areal divisions can be maintained properly, and by using GIS overlay operations they can be integrated for specific needs.

Figure 6. ER schema of special land use data.
Polygon entity types PROTECTED AREA POLYGON, ZONE AREA, LEASED ESTATE AREA, OTHER SPECIAL LAND USE AREA and point entity type PRESERVED SITE were grouped as administrative land use areas.

Protected areas contain nature reserves, national parks and special areas that are established under the Nature Protection Act. In addition, there are areas that are part of protection programmes. The layer contains not only the outer borders of the areas, but also an inner subdivision based on combinations of land use classification, treatment instruction classification and access regulations classification. E.g. a part of a protected area (one polygon in the information layer) may be closed to public access and preserved untouched. The classifications are commonly used with protected areas (Luonnonsuojelualueiden... 1982).

The basic feature attributes of protected areas can be obtained from an external nature preserve area database (LSA database), which is maintained by the Environment Data Centre. Attribute LSA IDENTITY CODE is an identifier to the LSA database.

Zone areas are results of zoning activity. There are four types of zone plans (attribute ZONE PLAN TYPE) and in each type there can be various land use classes (attribute LAND USE CLASS). The most detailed zone plan is stored in the layer. The zone area attributes are in this schema included in the spatial entity and no feature entity types are needed.
Leased areas have been leased out for a fixed time. There are various types of leased areas (attribute AREA TYPE). Each lease contract is identified with a contract number. The specification of the data contents of lease contracts was not included in the study.

Other special land use areas are established by administrative decisions other than those concerning protected areas, typically resolutions of the FFRI. They include e.g. protected peatlands, nature management forests, wilderness areas and various areas for forest tree breeding purposes.

Preserved sites are specific nature objects (located to a point) that are protected under an act or by resolution of the FFRI. These places may include e.g. relics and historic places.

Most of the spatial entity types are related to feature entity types DESCRIPTION and TREATMENT INSTRUCTION. These entity types were not modelled to attribute level, and it may be difficult to formalize the heterogeneous descriptions and treatment instructions into exact data types to be utilized in forest management.

Functional land use areas include polygon entity types LANDSCAPE MANAGEMENT AREA, GAME MANAGEMENT AREA and RECREATIONAL AREA. Entity types SPECIAL BIOTOPE and OCCURRENCE OF ENDANGERED SPECIES may be polygons or points.

Landscape management areas are treated with special attention to landscape maintenance and development. Game management areas are, correspondingly, areas where the protection of game is considered to be an important practice (e.g. areas that are protected at nesting time or areas where hunting is forbidden). Recreational areas are used for recreation, outdoor activities, picking of berries etc., and they are treated according to the specific requirements.

Special biotopes are small-scale sites that are not included in administratively protected areas. They may have protection value or they may need special attention in forest management planning. Attributes were not defined in detail.
Occurrences of endangered species include both plant and animal species. The areas may also be buffer zones created around an endangered species occurrence. The location, species and occurrence descriptions and treatment instructions (entity types ENDANGERED SPECIES, ENDANGERED SPECIES OCCURRENCE and DESCRIPTION OF OCCURRENCE) are suggested to be obtained from the UHEX system. UHEX is a special database for endangered species maintained by the Environment Data Centre. The attribute definitions in this schema are consistent with the UHEX definitions. E.g. for each endangered species UHEX contains data about the current state of the species (STAGE CODE), the main reason for being endangered and the biotopes where the species may occur.

3.2.5 Forest planning

The forest planning system utilizes the data of the basic inventory and land use. For the planning system, there are special spatial entity types defined for the storage of the plans. The planning system and its data contents are described in detail by Nuutinen (1994a, 1994b). In this schema (Fig. 7) only the main entity types and their relationships are presented.

Figure 7. ER schema of the planning entity types.

PLANNED OPERATIONS POLYGON denotes a short-term planning unit. The delineations of operative planning normally follow the stand boundaries, but they may sometimes be delineated regardless of the stand boundaries. OPERATION BLOCK is a set of planned operations polygons, e.g. a logging unit that contains several stand polygons. Polygon entity types TIMBER LOT and FELLING SECTION, line entity type EXTRACTION ROUTE and point entity type TIMBER STORAGE are associated with short-term planning and management of harvesting.

One planned operations polygon (e.g. a cutting unit) is always treated uniformly with one or more STANDWISE OPERATIONS. Standwise operations include the descriptive data of any planned silvicultural operations, but the detailed data content is not included in this study.
An OPERATION PLAN always consists of many standwise operations. An operation plan may be a cutting plan or another silvicultural operation plan.

TREATMENT POLYGON is a variable content spatial entity type for various planning situations. The treatment polygon layer may include unions (overlays) of information layers needed in the planning process (Nuutinen 1994b). The corresponding feature entity type, TREATMENT UNIT, includes the description of the forest (originating from the basic inventory data) and keys to the goals, restrictions and treatment instructions of the polygons included in the analysis process. Treatment polygons may be copied to PLANNED OPERATIONS POLYGONs if the planning process ends in an operational plan that is later implemented.

ALTERNATIVE FOREST PLANs are based on a simulation process in which the stand data are updated and the future development under treatment alternatives is calculated for each treatment unit. Entity type GOAL includes the management objectives that guide the planning process. DECISION MAKER is the party that sets the goals and selects the acceptable FOREST MANAGEMENT PLAN among the alternative plans. The process with detailed data definitions is described by Nuutinen (1994b).

The system for planning and budgeting is not defined in this context, but two entity types, ANNUAL BUDGET and ANNUAL FOREST WORKING PLAN, have been included in the schema. They concern a research forest. The annual forest working plan is based on the forest operation plans and guided by planning goals. The annual budget is, correspondingly, based on the annual working plan.

3.2.6 Experiments

The ER schema for the experiments is presented in Fig. 8. Two spatial entity types were defined, EXPERIMENT STAND POLYGON and EXPERIMENT UNIT. Experiment unit is the actual area of the experiment (usually a permanent sample plot). It may sometimes be a single point.
Experiment stand polygon denotes the effective area around the experiment unit that is treated according to the researchers' instructions. It may be a strip of fixed width around the plot or a larger area, e.g. a whole stand.

Figure 8. ER schema of the experiments.

The feature entity types of experiments are stored in an existing relational database system called the experiment database (KOEREKISTERI). The content of the register is not presented completely in this context. Only those entity types and attributes that are meaningful for forest management are presented. For a full description, see Lehto and Isomäki (1993).

Attributes of EXPERIMENT STAND and EXPERIMENT STAND DESCRIPTION refer to the experiment stand polygon. EXPERIMENT UNIT DESCRIPTION and EXPERIMENT UNIT GROWING STOCK correspond to the experiment unit. For each experiment stand, there may be descriptions from several points of time. Correspondingly, there may be growing stock estimates from several points of time for one experiment unit.

EXPERIMENT OPERATIONS include planned, partly implemented and completed operations of experiment stands. They are not recorded accurately at the experiment unit level, but the presented principle is sufficient for forest management purposes to give an indication of the location and schedules of planned experiment operations.

3.2.7 Forest stand history

The ER schema for the history is presented in Fig. 9 and the attribute lists of the entities are given in Appendix 4. The forest stand history solution is based on the space-time composite principle presented by Langran (1992b). Two spatial entity types, OPERATION HISTORY POLYGON and STATE HISTORY POLYGON, were defined.

Figure 9. ER schema of the forest stand history.

3.2.7.1 Operation history

Information about implemented operations is essential in experimental forests when searching for suitable locations for different types of experiments.
For example, when placing growth and yield experiments it is necessary to know whether the stand has been fertilized in the past or not.

OPERATION HISTORY POLYGON refers to the greatest common spatiotemporal units in respect of events. Events refer here to silvicultural operations and different kinds of damages. Together they are called operations. An operation history polygon has a unique history of completed operations distinct from the neighbouring polygons.

For each operation history polygon, there are several completed operations (OPERATION OF HISTORY POLYGON). Each operation may be one of several types (CUTTING, DRAINING, FERTILIZATION, SEEDLING STAND IMPROVEMENT, PRUNING, ARTIFICIAL REGENERATION or OCCURRED DAMAGE). Each operation is identified (in addition to time and location) using specific attributes, e.g. cuttings are identified using the cutting method and fertilizations using the type and amount of fertilizer.

Because of the space-time composite principle, each operation whose spatial location and extent differ from the existing geometry and topology of the operation history layer causes the changed portion of the layer to break from its parent spatial object and become a distinct object with its own operation history. This principle is denoted with the 1:N mandatory relationship between all types of operations and entity type OPERATION OF HISTORY POLYGON.

E.g. when a thinning is made, it may be stored as one operation history polygon and its attributes are stored in CUTTING. When the subsequent operation, e.g. pruning, is made, it may be applied only to a part of the cutting area. In the operation history coverage, the original polygon is divided into two polygons, one including both cutting and pruning, the other only cutting. The original entity of type CUTTING can be maintained, but it is now related to two polygons (two entities of type OPERATION OF HISTORY POLYGON).
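The thinning and pruning example above can be sketched in Python. This is a minimal illustration under stated assumptions, not the ARC/INFO implementation used in the study: polygon geometry is approximated by sets of grid cells so that no GIS library is needed, the names OperationHistory, apply and operations_of are hypothetical, and the operation type codes and numbers are illustrative only.

```python
from itertools import count


class OperationHistory:
    """Toy space-time composite for operations.

    Assumption for illustration: a polygon is a set of (x, y) grid cells.
    """

    def __init__(self):
        self.polygons = {}   # polygon id -> frozenset of cells
        self.links = []      # rows of OPERATION OF HISTORY POLYGON:
                             # (polygon id, operation type, operation code)
        self._next_id = count(1)

    def operations_of(self, polygon_id):
        """Return the operation history of one polygon."""
        return [(t, c) for p, t, c in self.links if p == polygon_id]

    def apply(self, cells, op_type, op_code):
        """Record an operation covering the given cells."""
        remaining = set(cells)
        for pid, geom in list(self.polygons.items()):
            changed = geom & remaining
            if not changed:
                continue
            remaining -= changed
            target = pid
            if changed != geom:
                # The changed portion breaks from its parent polygon and
                # becomes a distinct polygon that inherits the parent's history.
                target = next(self._next_id)
                self.polygons[pid] = geom - changed
                self.polygons[target] = changed
                self.links += [(target, t, c) for t, c in self.operations_of(pid)]
            self.links.append((target, op_type, op_code))
        if remaining:
            # Area with no earlier operations: a new polygon with one operation.
            pid = next(self._next_id)
            self.polygons[pid] = frozenset(remaining)
            self.links.append((pid, op_type, op_code))


history = OperationHistory()
stand = {(x, y) for x in range(4) for y in range(2)}            # 8 cells
history.apply(stand, "CUT", "0000001")                           # thinning
pruned_part = {(x, y) for x in range(2) for y in range(2)}       # 4 cells
history.apply(pruned_part, "PRU", "0000001")                     # pruning

for pid in sorted(history.polygons):
    print(pid, len(history.polygons[pid]), history.operations_of(pid))
```

After the second call the original area consists of two polygons, and the single cutting is linked to both of them, which corresponds to one CUTTING entity related to two OPERATION OF HISTORY POLYGON entities.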
An example of an ARC/INFO and relational database implementation of the schema is presented in Chapter 3.3.

Entity type CUTTING has one subclass, NATURAL REGENERATION CUTTING. Natural regeneration is promoted using soil preparation and clearing. Sometimes natural regeneration also needs one or more COMPLEMENTARY PLANTINGS. The normalization of entity type CUTTING resulted in entity type CUTTING OUTTURN, which includes the outturn of the cutting divided into timber assortments.

The definition of the entity types related to entity type ARTIFICIAL REGENERATION was based on the analysis of manually maintained archives of artificial regenerations. In the schema, clearing, soil preparation and prescribed burning activities are included as attributes of ARTIFICIAL REGENERATION. Each regeneration includes one or more tree species (REGENERATED TREE SPECIES). Tree species information always contains information about the seeds (SEED TYPE) and, when planting is used, also information about the plants used (PLANT TYPE). Sometimes the regeneration fails and REPAIR PLANTING is needed. The regenerated area is monitored using INSPECTIONS.

3.2.7.2 State history

The state history spatial object (STATE HISTORY POLYGON) includes the greatest common spatiotemporal units in respect of the state of the forest. It follows the space-time composite concept. For each state history polygon, there may be many states of the forest (STATE OF STATE HISTORY POLYGON). The state is identified using start and finish dates and an identifier (the number of the stand during the time span). Each state is related to the description of the forest that was effective during the time span (feature entity types FOREST STAND HISTORY and HISTORY TREE STRATUM). The attribute entity type structure is based on the tuple-level versioning principle discussed in Chapter 1.7.

The principles of updating the state history can be stated as follows:
1.
As the stand is inventoried for the first time, its spatial and feature data are copied to the state history and provided with starting dates (starting date = inventory date). This inventory coverage provides the so-called base state of the space-time composite (Langran 1992b).
2. As the stand is inventoried again, the state history entity of the first period is closed by updating the closing dates.
3. The spatial and feature data of the new inventory are copied to the state history and provided with starting dates (starting date = inventory date). If the stand boundaries have changed, the coverage is fragmented using the space-time composite approach.

The update principle implies that the state history also contains the effective state of the forest (stands with missing closing dates).

An inventory is in this context understood not only as an extensive region-wide inventory, but also as a complete revision of the delineation and description of one stand. It normally includes new measurements of the growing stock. Such a small-scale inventory is needed e.g. after thinnings to measure the growing stock. On the other hand, corrections to existing descriptions (e.g. correction of site type) do not cause the updating of the state history. Such minor changes can be handled either by using so-called transaction logs or by updating them directly to the state history entities (without changing the date attributes) in addition to the update of the basic inventory data.

The data content of the state history (feature entity types FOREST STAND HISTORY and HISTORY TREE STRATUM) may be nearly the same as in the basic inventory data. Only time stamps (closing dates) should be added. The plotwise growing stock measurements of the basic inventory may also be summed to the history to store the stratumwise mean values of the whole stand. The model trees are not copied to the history.

Figure 10.
A general conceptual schema of a multitemporal compartmentwise forest inventory information.

A general illustration of a multitemporal compartmentwise forest inventory information model is presented in Fig. 10. The spatial data is organised according to the space-time composite principle, so each polygon has a unique series of inventory attributes. Both the stand variables used and their values may vary as a function of space and time. Entity type COMPARTMENT denotes the basic feature entity. The definitions of stand variables used in each inventory (and compartment) are stored in VARIABLE CATALOG. The actual stand attributes for each compartment and inventory are stored in MEASURED VARIABLE and the descriptions of the attribute values in VARIABLE VALUE DESCRIPTION.

3.2.8 Spatial entity types with interfaces to other systems

There are also some spatial entity types included in the model that are not specified in detail. Their relationship types to feature entities present interfaces to special systems and databases (see Appendix 1). These entity types and interfaces include:

1. Spatial entity type TREE is related to the JALTA system's entity type SELECTED TREE FOR BREEDING. JALTA is a system for forest tree breeding data.
2. Spatial subject area VEGETATION LAYERS and feature subject area VEGETATION INVENTORY DATA refer to vegetation inventories. The inventory data may include VEGETATION POLYGONs and VEGETATION SAMPLE PLOT POINTs. For an example of a vegetation inventory database made in Pallas-Ounastunturi national park, see Eeronheimo (1993).
3. Spatial subject area SOIL LAYERS is defined for soil surface information contents. It is a presentation similar to the digital maps of quaternary deposits made by the Geological Survey of Finland. Soil type polygons are presented with SOIL POLYGONs, and the SOIL MAP POINT denotes indefinite parts of a soil polygon where the surface soil type (0-30 cm) differs from the soil type below it.
SOIL SAMPLE POINTs are points where specific measurements with drillings are made. For details of geological surveys and maps, see Haavisto (1983).
4. For the archiving of photographs, the data of PHOTOGRAPH REGISTER are located at PHOTOGRAPH POINTs. The photograph register was not included in the IS architecture.
5. SITE MAPPING POLYGON is defined for site type inventories and forest taxation purposes.

3.3 An application: Stand history data structures

3.3.1 Operation history data structures and operations

The operation history data structures include one ARC/INFO coverage named PL_OPHIST. It contains all the polygons of the implemented operations accumulated using the space-time composite principle. No base state is needed for the coverage, so it is not exhaustive: the areas where no operations have been completed are null areas (polygon identifiers are set to zero).

The polygon attribute table (PAT) definition of the coverage is presented in Table 1. In a PAT-table, the first three attributes (area, perimeter, pl_ophist#) are ARC/INFO supported. The fourth (pl_ophist-id) is a user-defined identifier of the polygons. In this case, the only requirement for the identifier is that it should be unique over the coverage. For practical purposes, the research forest number is included in the table to enable reasonable numbering of the polygons. Arckey is a technical attribute: a character composite converted from pl_ophist-id and research_forest_nr. It is used in the relates to build a single unique key to the relational database tables, because in ARC/INFO the linking attribute between polygon attribute tables and external database tables must be a single attribute. A sample listing of the PAT-table is presented in Table 2.

Table 1. Polygon attribute table definition of the operation history coverage.
Item name = name of the attribute, width = storage size (in bytes), output width = output width (in characters), type = attribute type (F = real number, B = binary integer, I = integer (1 byte/digit), C = character string), alternate name = optional alternative name of the attribute.

item name        width  output width  type  alternate name
area                 4            12  F
perimeter            4            12  F
pl_ophist#           4             5  B
pl_ophist-id         4             5  B     polygon-id
research_forest      2             3  B
arckey               8             8  C

Table 2. A sample listing of the PL_OPHIST.PAT-table.

attribute        polygon 1   polygon 2
area             9,861.688   959.93     (square meters)
perimeter        781.691     160.963    (meters)
pl_ophist#       2           3
pl_ophist-id     1           2
research_forest  112         112
arckey           11200001    11200002

Table 3. Relational database table definitions of the operation history. Column name = name of the attribute, type = attribute type, length = storage size (in bytes), nulls = can the attribute have missing values, defaults = does the attribute have a default value (0 in integer types, blanks in character types).

An example of relational database table definitions of the operation history is presented in Table 3. The tables correspond to entity types OPERATION OF HISTORY POLYGON, CUTTING and ARTIFICIAL REGENERATION. The latter two entity types were not mapped completely; only a few attributes were defined in the tables to illustrate the principle. The tables were named operation, cutting and regeneration. Their relations are also illustrated in Fig. 11.
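The arckey values in Table 2 suggest how the composite key is formed. The following Python sketch is an assumption drawn from that sample listing (11200001 for research forest 112, polygon 1), not a documented ARC/INFO conversion: it takes the key to be a three-digit research forest number followed by the polygon identifier zero-padded to five digits. The function names make_arckey and split_arckey are hypothetical.

```python
def make_arckey(research_forest_nr: int, polygon_id: int) -> str:
    """Build an 8-character key: forest number (3 digits) + polygon id (5 digits).

    The digit layout is an assumption inferred from the sample listing.
    """
    if not 0 <= research_forest_nr <= 999:
        raise ValueError("research forest number must fit in 3 digits")
    if not 0 <= polygon_id <= 99999:
        raise ValueError("polygon identifier must fit in 5 digits")
    return f"{research_forest_nr:03d}{polygon_id:05d}"


def split_arckey(arckey: str) -> tuple[int, int]:
    """Recover (research forest number, polygon id) from an 8-character key."""
    if len(arckey) != 8 or not arckey.isdigit():
        raise ValueError("arckey must be 8 digits")
    return int(arckey[:3]), int(arckey[3:])


print(make_arckey(112, 1))
print(split_arckey("11200002"))
```

A fixed-width character key of this kind keeps the link between the PAT-table and the database tables to a single attribute, as the relate mechanism requires.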
column name        type     length  nulls  defaults

Table Operation:
research_forest    integer       2  no     no
compartment_nr     integer       2  no     no
arckey             char          8  no     no
area               float         4  yes    no
operation_type     char          3  no     no
operation_code     char          7  no     no

Table Cutting:
operation_type     char          3  no     no
research_forest    integer       2  no     no
cutting_nr         char          7  no     no
method             integer       2  no     no
additional_text    char         50  yes    no

Table Regeneration:
operation_type     char          3  no     no
research_forest    integer       2  no     no
regeneration_nr    char          7  no     no
regeneration_date  date             no     no
height_above_sea   integer       2  yes    no
site_type          char         12  yes    no
site_class         integer       1  yes    no
soil_type          integer       1  yes    no

Figure 11. An illustration of operation history relational database tables. (Diagram not reproduced; the sample rows shown in it are listed below.)

operation (linked to the PAT-table by relate r_ophist):
112, 2, 11200002, 0.096, CUT, 0000001
112, 2, 11200002, 0.096, ARE, 0000464
112, 3, 11200003, 1.927, CUT, 0000001
112, 6, 11200006, 2.298, CUT, 0000002

cutting (linked to operation by relate r_cutti):
CUT, 112, 0000001, 15
CUT, 112, 0000002, 16

regeneration (linked to operation by relate r_regen):
ARE, 112, 0000464, 09-jun-1988, 190, KgK, 3, 13

The attribute operation_type of table OPERATION refers to the specific operation type and table. It is here defined as a character string, e.g. CUT for cutting and ARE for artificial regeneration. Operation code denotes a unique identifier of an operation. E.g. cuttings may be numbered in a research forest area beginning from 1 and ending at 9999999. The field is a character string allowing alphanumeric identifiers, because they are used in the manual regeneration archives of FFRI.

Several relates were defined to connect the ARC/INFO PAT-table and the corresponding Ingres-table. The definition listings of the relates are presented in Table 4. Relate type denotes the type of connection made and relate access denotes the RDBMS access mode (RW = read and write, RO = read only).

Table 4. Relate definitions of the operation history.

Relate R_OPHIST of Table 4 presents the basic relate between the operation history coverage PAT-table and the corresponding Ingres-table OPERATION.
The attribute arckey presents the composite identifier of the polygons in both tables. LUOTIHIST is the name of the test history database. The relation is 1:N, because for each operation history polygon there may be several completed operations. The relate type FIRST denotes that for each PAT record, the first relational table row matching the relate key is returned. To browse through all the related rows, a special mechanism called a cursor has to be used. With cursors, an application can access the related data one row at a time. The values of columns in the current row are available for update and listing (Managing tabular... 1991).

Relates R_REGEN and R_CUTTI are stacked relates between the table OPERATION and the tables that contain the specific attributes for different operations (in the illustration artificial regeneration or cutting). The relate attribute is operation_nr, which for regenerations denotes the regeneration number and for cuttings the identifier of a cutting.

Relate definitions (Table 4):

relate name  table         database   item          column      relate type  relate access
R_OPHIST     operation     luotihist  arckey        arckey      first        rw
R_REGEN      regeneration  luotihist  operation_nr  regen_nr    first        rw
R_CUTTI      cutting       luotihist  operation_nr  cutting_nr  first        rw
RV_REGE      op_regen      luotihist  arckey        arckey      first        rw

To find and relate only the relevant operation history polygons (those having one or several desired operations, e.g. implemented artificial regenerations) a three stage procedure has to be completed:

1. A View of the table OPERATION is cr