Enterprise AI Analysis: GEPOC Parameters Open Source Parametrisation and Validation for Austria Version 2.0

Demographic Modelling

GEPOC Parameters Open Source Parametrisation and Validation for Austria Version 2.0

GEPOC, short for Generic Population Concept, is a collection of models and methods for analysing population-level research questions. Applying the models validly to a specific country or region requires stable, reproducible data processes that yield valid, ready-to-use model parameters. This work contains a complete description of the data-processing methods used to compute model parameters for Austria, based exclusively on freely and publicly accessible data. In addition to describing the source data, it covers all algorithms used for aggregation, disaggregation, fusion, cleansing or scaling of the data, as well as the resulting parameter files. The document places particular emphasis on computing parameters for the most important GEPOC model, GEPOC ABM, a continuous-time agent-based population model. An extensive validation study using this model was conducted and is presented at the end of the work.

Executive Impact & Strategic Value

The GEPOC framework and its parametrization for Austria provide fine-grained demographic parameters, resolved by age, sex, region, and year, to support decisions in public health, urban planning, and resource allocation. Because it relies on openly accessible data and transparent, documented methods, the parametrization is reproducible and can be adapted as policy questions change.

Deep Analysis & Enterprise Applications

The core of GEPOC's data harmonization relies on robust disaggregation algorithms to transform coarse-resolution data into the finer granularity required by the agent-based model. This process addresses inconsistencies in temporal, spatial, age, and sex resolutions across diverse data sources.

One-Sided Disaggregation: This method applies when one dataset is strictly finer than another. The goal is to elevate the resolution of the coarser dataset to match the finer one, making assumptions about the underlying distribution. Two primary techniques are used:

  • Proportional Disaggregation: The most straightforward approach. Each coarse value is split in proportion to a reference distribution taken from the fine-resolution dataset. The coarse totals and the reference shares are both preserved exactly, but the results are generally not whole numbers.
  • Integer-Valued Disaggregation (Huntington-Hill): Used when whole numbers must be conserved (e.g., initial population). This is an apportionment problem, solved with the Huntington-Hill strategy, an iterative divisor method that hands out units one at a time to the cell with the highest current priority.
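
The two one-sided techniques above can be sketched in a few lines. This is a minimal illustration, not the GEPOC implementation: the function names are hypothetical, and the Huntington-Hill variant shown is the textbook one, in which every cell with a positive weight receives its first unit before any cell receives a second. The paper's adaptation may treat small or empty cells differently.

```python
import heapq
import math


def proportional_disaggregate(total, reference):
    """Split `total` across cells in proportion to the `reference` weights."""
    weight_sum = sum(reference)
    if weight_sum <= 0:
        raise ValueError("reference weights must have a positive sum")
    return [total * w / weight_sum for w in reference]


def huntington_hill(total, reference):
    """Split the integer `total` into integers roughly proportional to `reference`.

    Textbook Huntington-Hill: a cell holding n units has priority
    w / sqrt(n * (n + 1)); the next unit goes to the highest priority.
    """
    counts = [0] * len(reference)
    # heapq is a min-heap, so keys are negated. With n = 0 the priority is
    # infinite; ties are broken in favour of the larger weight.
    heap = [(-math.inf, -w, i) for i, w in enumerate(reference) if w > 0]
    heapq.heapify(heap)
    if total > 0 and not heap:
        raise ValueError("no cell with a positive weight")
    for _ in range(total):
        _, neg_w, i = heapq.heappop(heap)
        counts[i] += 1
        n = counts[i]
        heapq.heappush(heap, (neg_w / math.sqrt(n * (n + 1)), neg_w, i))
    return counts


# Made-up example: 10 persons spread over three cells with weights 50/30/20.
print(proportional_disaggregate(10, [50, 30, 20]))  # -> [5.0, 3.0, 2.0]
print(huntington_hill(10, [50, 30, 20]))            # -> [5, 3, 2]
```

The integer variant always returns counts that sum exactly to `total`, which is the property an agent-based model needs when agents are created from the result.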

Two-Sided Disaggregation: This addresses scenarios where two datasets each have resolution deficiencies in different dimensions. The result harmonizes data on the joint finest resolution, essentially estimating a distribution given its marginals. Key techniques include:

  • Iterative Proportional Fitting (IPF): A well-known method for finding a positive matrix that matches given row and column sums. Starting from an initial estimate, it alternately rescales the rows and the columns so that their sums match the targets, and repeats until the matrix converges. IPF is extended to 3D (IPF-3D) for more entangled data, such as internal migration by origin, destination, age, and sex.
  • Disaggregation of Integer-Valued Marginals: Although not currently applied in GEPOC parameterisation, specialized algorithms exist for integer-valued two-sided disaggregation, motivated by connecting pins on a board via strings to adjust marginals.
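
As an illustration of the two-sided case, the following is a minimal pure-Python sketch of 2D IPF. The function name, the stopping rule and the tolerance are assumptions made for this example; they are not taken from the GEPOC algorithms.

```python
def ipf_2d(seed, row_targets, col_targets, max_iter=1000, tol=1e-10):
    """Rescale `seed` until its row and column sums match the targets.

    Assumes a non-negative seed and targets with equal grand totals.
    """
    table = [[float(v) for v in row] for row in seed]
    n_rows, n_cols = len(table), len(table[0])
    for _ in range(max_iter):
        # Step 1: scale each row to its target sum.
        for i in range(n_rows):
            s = sum(table[i])
            if s > 0:
                factor = row_targets[i] / s
                table[i] = [v * factor for v in table[i]]
        # Step 2: scale each column to its target sum.
        for j in range(n_cols):
            s = sum(table[i][j] for i in range(n_rows))
            if s > 0:
                factor = col_targets[j] / s
                for i in range(n_rows):
                    table[i][j] *= factor
        # Columns are exact after step 2, so only the rows need checking.
        err = max(abs(sum(table[i]) - row_targets[i]) for i in range(n_rows))
        if err < tol:
            break
    return table


# Made-up example: a 2x2 seed fitted to row sums 40/60 and column sums 35/65.
fitted = ipf_2d([[1, 2], [3, 4]], [40, 60], [35, 65])
for row in fitted:
    print([round(v, 3) for v in row])
```

The seed determines how mass is distributed inside the table; the marginals only fix the sums. Cells that are zero in the seed stay zero in the result.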

Transforming raw census counts into individual-level probabilities and rates is crucial for agent-based modeling. GEPOC employs established demographic formulas, adapted for its specific needs, to derive these parameters accurately.

Event Probabilities (Xp): Defined as the probability that an event occurs to/for a person of a specific sex and age, in a given region and year, before their next birthday. This definition aligns with classic death probability concepts and the GEPOC model's dynamic update mechanism. It is used for deaths (Dp), external emigrations (Ep), and births (Bp).

Event Rates (Xr): Represent the average rate of an event in the course of a year. While intuitively a simple ratio of events to population size, the denominator must account for the average population over the year (Pavg) due to aging. For example, the age-dependent rate of fertility (Brm) is a key indicator.

Probabilities from Rates and Census (Farr's Death Rate Formula): Due to the mismatch between population at the start of the year and the population 'at risk' for an event during the year, direct calculation of probabilities is complex. GEPOC uses a modern version of William Farr's formula (Theorem 3.1) to accurately estimate death probabilities. This formula accounts for the expected year-of-life spent by a person in their age cohort given they will die, using a parameter α(a). A value of α(a)=0.5 is typically used for a>0, while α(0)>0.9 for infants due to higher early-life mortality.
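
To make the step from rates to probabilities concrete, here is a small sketch of the textbook actuarial conversion. It is not a transcription of Theorem 3.1: the parameter `frac_lived` (the fraction of the year lived by those who die during it) is a name chosen for this example, and how it maps onto the paper's α(a) should be checked against the paper. Approximating the average population by the mean of the start-of-year and end-of-year populations is likewise an assumption.

```python
def event_rate(events, pop_start, pop_end):
    """Events per person-year, using a simple mid-year population estimate."""
    p_avg = 0.5 * (pop_start + pop_end)  # assumption: linear change over the year
    return events / p_avg


def rate_to_probability(rate, frac_lived=0.5):
    """Textbook conversion of a central rate m into a probability q.

    q = m / (1 + (1 - frac_lived) * m)
    With frac_lived = 0.5 this reduces to q = 2m / (2 + m).
    """
    return rate / (1.0 + (1.0 - frac_lived) * rate)


# Made-up example: 10 deaths in a cohort shrinking from 1005 to 995 persons.
m = event_rate(10, 1005, 995)
q = rate_to_probability(m)
print(m, q)
```

The probability is always slightly smaller than the rate, because the persons who die contribute less than a full person-year to the denominator of the rate.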

Life Tables and Life-Expectancy (LE): Calculated using the computed death probabilities (Dp) and the Sullivan method. This provides the life expectancy of a person at their a-th birthday in a given region and year. The process involves recursively computing population and death series, population at-risk, and cumulative population years at risk, ultimately yielding the life expectancy vector.
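
The recursion can be illustrated with a plain period life table. The sketch below is a generic textbook construction under stated assumptions, not the paper's procedure: it does not reproduce the Sullivan method or the paper's at-risk series, it closes the table by forcing the last death probability to 1, and it uses a single `frac_lived` value for all ages.

```python
def life_expectancy(death_probs, frac_lived=0.5):
    """Remaining life expectancy at each exact age a = 0, 1, ..., n-1.

    death_probs[a] is the probability of dying between the a-th and the
    (a+1)-th birthday. The last age is closed with probability 1.
    """
    q = list(death_probs)
    q[-1] = 1.0  # assumption: nobody survives beyond the last age class
    n = len(q)

    # Survivors to each birthday, starting from a cohort of size 1.
    survivors = [1.0]
    for a in range(n - 1):
        survivors.append(survivors[a] * (1.0 - q[a]))

    # Person-years lived between birthdays a and a+1.
    person_years = []
    for a in range(n):
        deaths = survivors[a] * q[a]
        person_years.append(survivors[a] - deaths + frac_lived * deaths)

    # Accumulate person-years from the oldest age downwards.
    expectancy = [0.0] * n
    remaining = 0.0
    for a in reversed(range(n)):
        remaining += person_years[a]
        expectancy[a] = remaining / survivors[a] if survivors[a] > 0 else 0.0
    return expectancy


# Toy example: nobody dies before age 2, everybody dies during age 2.
print(life_expectancy([0.0, 0.0, 1.0]))  # -> [2.5, 1.5, 0.5]
```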

GEPOC ABM Population Parametrization (P̂)

9.5 Average Validity Score (0-10) for Municipal-level Data (2002-y0)

The population data (P̂) serve as the micro-census foundation. For the most recent historical period (2002-y0) at the finest spatial resolution (Viennese registration districts), the data processing achieves an average validity score of 9.5 out of 10, reflecting statistical linkage with population data from Vienna. Forecasts (y0+1 to 2101) keep a score of 8; they are extrapolated using distributions from recent years. The Huntington-Hill disaggregation method ensures integer-valued results, which an agent-based model requires because each agent represents a whole person.

GEPOC Parameter Calculation Workflow

Source Data Collection (Statistics Austria)
Population Harmonization (Huntington-Hill)
Birth & Death Census Generation (Proportional Disaggregation)
External Migration (Emigrants/Immigrants) Processing
Internal Migration Flows (IPF 2D/3D)
Probability Derivation (Farr Formula)

The GEPOC parameter calculation follows a structured workflow to ensure high-quality, harmonized data for the agent-based model. It integrates diverse data sources, applies advanced disaggregation techniques, and derives probabilities necessary for simulating demographic events. Each step ensures consistency across spatial, temporal, age, and sex resolutions.

Validation of Core Demographic Events (2000-2024)

Event                    | SC1 (Geography Only), max diff | SC2 (Full Regional IM), max diff
Population (Total)       | 0.37%                          | 0.01%
Births (Total Female)    | 4.68%                          | 2.42%
Deaths (Total Female)    | 3.10%                          | 2.05%
Emigrants (Total Female) | 8.08%                          | 7.66%

A comparative validation against Statistics Austria's ground-truth data from 2000-2024 shows the performance of SC1 (Geography only) versus SC2 (Full Regional Internal Migration). SC2 consistently demonstrates lower maximum relative differences across total population, births, and deaths, highlighting the critical role of internal migration modeling in achieving higher accuracy. For emigrants, both models show higher fluctuations, particularly around the COVID-19 pandemic period, but SC2 still offers a marginal improvement.
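
For readers who want to reproduce this kind of comparison, a maximum relative difference between a simulated and a reference time series can be computed as follows. This is one plausible reading of "max diff", assumed here for illustration; the paper may define the metric differently (for example, on averaged simulation runs). The numbers in the example are invented.

```python
def max_relative_difference(simulated, reference):
    """Largest |simulated - reference| / |reference| over all time points."""
    if len(simulated) != len(reference):
        raise ValueError("series must have the same length")
    return max(abs(s - r) / abs(r) for s, r in zip(simulated, reference))


# Invented yearly totals, not taken from the study.
reference = [100.0, 200.0, 300.0]
simulated = [101.0, 198.0, 300.0]
print(f"{max_relative_difference(simulated, reference):.2%}")  # -> 1.00%
```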

Internal Migration (M̂) Full Regional Model (1996-2100)

8 Validity Score (0-10) for Raw Data Matching

The full regional internal migration model (M̂) provides the most comprehensive view of migration flows, detailing movements between regions by age and sex. Using 3D Iterative Proportional Fitting (Algorithm 4.4) with existing OD, IE, and II data as marginals, the process achieves a validity score of 8 for raw data matching (2002-y0). Forecasts for internal migrants (y0 to 2100) are scored at 2.5 due to reliance on federal-state level forecasts and IPF matching, indicating the challenges of projecting fine-grained internal movements without detailed age-sex resolution in source forecasts.
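
The idea of fitting a three-dimensional table to three marginals can be sketched as below. The layout is an assumption made for this example and not a transcription of Algorithm 4.4: the table is indexed by origin, destination and a combined age-sex group, OD fixes the sums over groups, IE the sums over destinations, and II the sums over origins.

```python
def ipf_3d(seed, od, ie, ii, max_iter=500, tol=1e-9):
    """Fit m[o][d][g] so that:
    sum over g equals od[o][d], sum over d equals ie[o][g],
    sum over o equals ii[d][g]. Marginals must be mutually consistent.
    """
    n_o, n_d, n_g = len(seed), len(seed[0]), len(seed[0][0])
    m = [[[float(v) for v in row] for row in plane] for plane in seed]
    for _ in range(max_iter):
        # Match origin-destination totals.
        for o in range(n_o):
            for d in range(n_d):
                s = sum(m[o][d])
                if s > 0:
                    f = od[o][d] / s
                    m[o][d] = [v * f for v in m[o][d]]
        # Match emigrants per origin and group.
        for o in range(n_o):
            for g in range(n_g):
                s = sum(m[o][d][g] for d in range(n_d))
                if s > 0:
                    f = ie[o][g] / s
                    for d in range(n_d):
                        m[o][d][g] *= f
        # Match immigrants per destination and group.
        for d in range(n_d):
            for g in range(n_g):
                s = sum(m[o][d][g] for o in range(n_o))
                if s > 0:
                    f = ii[d][g] / s
                    for o in range(n_o):
                        m[o][d][g] *= f
        # The II marginal is exact now; check the other two.
        err_od = max(abs(sum(m[o][d]) - od[o][d])
                     for o in range(n_o) for d in range(n_d))
        err_ie = max(abs(sum(m[o][d][g] for d in range(n_d)) - ie[o][g])
                     for o in range(n_o) for g in range(n_g))
        if max(err_od, err_ie) < tol:
            break
    return m
```

A uniform seed spreads flows as evenly as the marginals allow. A seed taken from a year with observed detail would instead carry its age-sex pattern into the fitted table.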

Internal Migration Data Harmonization Process

Merge External Migration Sources (S_em1, S_em2)
Refine Age/Sex Resolution (S_emc with Proportional Disaggregation)
Extrapolate Forecast Period (S_emf with distribution from historic data)
Historic Data Extension (S_emb)
Aggregate to Districts_Districts Level for Stability
Compute Interregional Flows (ÔD with IPF 2D)
Compute Internal Emigrants (ÎE with Proportional Disaggregation)
Compute Internal Immigrants (ÎI with IPF 2D for alignment)
Compute Full Regional Migrants (M̂ with IPF 3D)

Harmonizing internal migration data is one of the most complex tasks, requiring the integration and disaggregation of multiple sources with varying resolutions. This workflow details the intricate steps, from merging initial datasets to applying advanced Iterative Proportional Fitting (IPF 2D and 3D) algorithms to build comprehensive origin-destination-age-sex migration flows. The process ensures consistency across different migration types and prepares data for probabilistic modeling.

Validation of Internal Migrants OD (2002-2024)

Model                  | Total Migrants (max diff) | AT-1 Emigrants (max diff) | AT-9 Emigrants (max diff)
SC2 (Full Regional IM) | 5.24%                     | 14.36%                    | 8.16%
SC3 (Biregional IM)    | 5.26%                     | 14.36%                    | 8.26%
SC4 (Interregional IM) | 5.65%                     | 14.36%                    | 7.02%

Comparison of internal origin-destination (OD) migration flows between the full regional (SC2), biregional (SC3), and interregional (SC4) models against Source 5.15 data for 2002-2024. All models show similar maximum deviations for total migrants and for specific federal states like AT-1 and AT-9, suggesting that for high aggregation levels, the choice of internal migration model has limited impact on overall numerical accuracy. However, deeper analysis (e.g., age profiles per region) would reveal significant differences in representation fidelity, particularly where interregional models simplify age dependencies.

The Challenge of Forecast Data Resolution

Problem: Future demographic forecasts (e.g., population beyond 2024, births/deaths beyond current historical data) are often provided by statistical offices at coarser regional and/or age resolutions than historical data. This creates a significant challenge for agent-based models that require fine-grained, individual-level parameters for long-term simulations.

Solution: GEPOC addresses this by using the most recent high-resolution historical data as a 'key' distribution for disaggregating coarser forecast data. For instance, population forecasts at the federal-state level are disaggregated to municipality-district level using the observed distributions from recent years. For births and deaths, optimization problems are solved to fit age-dependent rates to total forecast numbers and mean age at childbearing, often using Gaussian bell curve fits for age distributions. This approach ensures consistency with available forecasts while maintaining the required model granularity.
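
A much simplified version of the birth-forecast step can be sketched as follows. The source describes an optimisation problem; this example instead fixes a Gaussian shape centred on the forecast mean age at childbearing and only solves for the scale in closed form, so that the expected births match the forecast total. The function name, the standard deviation `sigma`, the fertile age range and all numbers are assumptions for illustration.

```python
import math


def gaussian_fertility_rates(female_pop, total_births, mean_age,
                             sigma=5.5, min_age=15, max_age=49):
    """Age-specific fertility rates with a Gaussian shape.

    female_pop maps age -> number of women. The curve is scaled so that
    sum(rate[a] * female_pop[a]) equals total_births.
    """
    shape = {}
    for age in range(min_age, max_age + 1):
        # Evaluate the bell curve at the middle of the age year.
        z = (age + 0.5 - mean_age) / sigma
        shape[age] = math.exp(-0.5 * z * z)
    expected = sum(shape[a] * female_pop.get(a, 0) for a in shape)
    if expected <= 0:
        raise ValueError("no women in the fertile age range")
    scale = total_births / expected
    return {a: scale * s for a, s in shape.items()}


# Invented inputs: 1000 women per age year, 2000 forecast births.
women = {age: 1000 for age in range(15, 50)}
rates = gaussian_fertility_rates(women, 2000, mean_age=30.5)
print(round(sum(rates[a] * women[a] for a in rates), 6))  # -> 2000.0
```

Because the female population is not uniform across ages in practice, the mean age implied by the resulting births will generally deviate from `mean_age`; matching both targets at once is what requires the optimisation described in the source.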

Impact: This strategy allows GEPOC ABM to extend simulations into the far future (up to 2100) with plausible demographic dynamics, despite limitations in external forecast data resolution. While the validity score for forecast periods might be lower than for historical periods (e.g., population forecast at 8, compared to historical at 9.5), it enables long-run scenario analysis critical for policy planning. The approach provides a robust framework for handling data heterogeneity and projecting fine-grained trends.

Your GEPOC Implementation Roadmap

A typical GEPOC integration involves a phased approach, ensuring seamless adoption and maximum value realization for your enterprise.

Phase 1: Discovery & Data Integration

Initial consultation to understand your specific demographic modeling needs. Assessment of existing data infrastructure and identification of relevant public and proprietary data sources. Development of a tailored data integration strategy for GEPOC parametrization.

Phase 2: Parametrization & Model Calibration

Application of GEPOC's advanced algorithms to harmonize, disaggregate, and process your data into ready-to-use model parameters. Calibration and validation of the GEPOC ABM for your specific regional context, ensuring high accuracy and reliability against historical data.

Phase 3: Scenario Design & Simulation Execution

Collaborative design of custom simulation scenarios based on your strategic questions (e.g., impact of policy changes, population shifts). Execution of agent-based simulations, generating high-resolution projections of demographic trends and outcomes.

Phase 4: Insights & Strategic Recommendations

Comprehensive analysis of simulation results, identifying key insights and potential impacts. Translation of model findings into actionable strategic recommendations and robust forecasts to inform your enterprise decision-making processes.

Ready to Transform Your Demographic Insights?

Leverage GEPOC's advanced open-source framework for unparalleled precision in population-level analysis. Schedule a free 30-minute consultation to explore how GEPOC can empower your strategic initiatives.
