Status
|
Ready
|
Ready
|
Ready
|
Ready
|
In Progress/Ready Soon
|
In Progress/Ready Soon
|
In Progress/Ready Soon
|
Purpose
|
A quality-controlled, publicly available data source containing
attribute-rich data at the individual level, with the aim of creating a
digital twin for every adult in the population with a large amount of
associated information about each person.
|
A variety of population health indicators for small geographical units
(local authorities and LSOAs/MSOAs) for use in statistical analyses and
monitoring of area-level health inequalities.
|
This dataset was designed to provide a meaningful operationalisation of
the underlying concept of the inclusive economy at local authority
level, and to enable statistical models to further explore the concept.
|
This dataset was designed to provide a meaningful operationalisation of
the underlying concept of the inclusive economy at electoral ward level.
|
To represent a multi-dimensional measure of wellbeing, consisting of
seven indicators, in terms of a single index metric, equivalent income.
|
To elicit public preferences regarding trade-offs between improving
wellbeing and reducing inequality.
|
A dataset with a battery of self-reported health and wellbeing
indicators from a large UK sample, with oversampling from Scotland.
|
Context
|
Individual-level data enable us to understand individuals’
situations, and what happens to them over time or when affected by changes
due to external events or policies. The lack of a comprehensive
register-based system in Great Britain has made it challenging to access
data on individuals across multiple domains. The SIPHER Synthetic
Population helps bridge this gap by providing a representative,
attribute-rich dataset reflecting the whole of the adult population in
Great Britain. By randomly selecting individuals from a survey and
assigning them to small geographical areas based on census statistics,
the SIPHER Synthetic Population ensures that the distribution of
demographic characteristics for all sampled individuals corresponds
exactly to the true demographic structure within each small census
output area. This enables researchers to derive area-level profiles
which would otherwise not be available. In more complex applications,
the dataset can be used to simulate policy interventions and explore
their potential impact on individuals and households at a granular
resolution, distinguishing small geographical areas and even
population subgroups within these areas.
|
Modelling the impact of public policy on health requires a shared
understanding of how we conceptualise and measure health as an outcome.
We need a set of health indicators that are meaningful in the context of
understanding the effects of policies and interventions of interest to
SIPHER, such as those aiming to create an inclusive economy or improve
mental health. These indicators can be derived either from synthetic
data (e.g., SIPHER Synthetic Population) or from non-synthetic data
sources (e.g., ONS/NRS data).
|
SIPHER has adopted a particular understanding which focuses on economic
inclusion, rather than inclusive growth. There are multiple approaches
and definitions of what constitutes an inclusive economy. To date, there
is no single definition of the concept. In response, SIPHER has
developed a collection of indicators for researchers and policymakers
which describes the extent and nature of economic inclusion across local
authorities in Great Britain. The creation of the dataset has been
informed by an initial review of the underlying theoretical concepts.
The selection and estimation of all indicators benefited from
co-production between SIPHER researchers and policy partners.
|
SIPHER has adopted a particular understanding which focuses on economic
inclusion, rather than inclusive growth. There are multiple approaches
and definitions of what constitutes an inclusive economy. To date, there
is no single definition of the concept. In response, SIPHER has
developed a collection of indicators for researchers and policymakers
which describes the extent and nature of economic inclusion across
electoral wards in Great Britain. The creation of the dataset has been
informed by an initial review of the underlying theoretical concepts.
The selection and estimation of all indicators benefited from
co-production between SIPHER researchers and policy partners.
|
SIPHER’s WS6 team has developed a wellbeing indicator set comprising
seven indicators - SIPHER-7. While SIPHER-7 describes people’s wellbeing
across these seven indicators, when some indicators improve and others
worsen, it is difficult to judge whether overall wellbeing is improving
or worsening. The purpose of this part of the project is to collapse the
multi-dimensional wellbeing indicators into a single index metric for
wellbeing, equivalent income. To do this, four surveys using Discrete
Choice Experiments (DCE) were conducted with a sample of the UK public.
Participants were asked to review a set of ten choice tasks, each
involving two imaginary scenarios described in terms of SIPHER-7, and
select which scenario they believed was better. In three of the surveys,
participants were asked to complete the tasks from a personal
perspective (i.e., which scenario they would want for themselves), and
in the remaining survey, participants were asked to complete the task
from a social perspective (i.e., which scenario they think would be
better for policy makers to bring about for others). The econometrically
estimated parameters represent the relative values given to the seven
wellbeing indicators of SIPHER-7 by samples of the UK general public.
|
Public policies aim to improve wellbeing and reduce wellbeing
inequality, but it is not always possible to do both. How do the public
balance the trade-off between improving wellbeing and reducing
inequality? The relative importance people place on increasing averages
and reducing inequalities (or “inequality aversion”) was elicited from a
sample of the UK general public (n=53). Respondents participated in one
of eleven online discussion groups, where a series of quantitative
trade-off exercises were explained and discussed. Each respondent then
completed the same exercise individually. The exercises covered aversion
to inequality in: (a) an overall measure of wellbeing (equivalent
income); (b) lifetime health across otherwise equal individuals; and (c)
lifetime health across the rich and poor.
|
Different surveys use different health outcome indicators. Therefore,
data might be available for one indicator set when another is required.
For example, answers to SF-12 survey items are available but a WEMWBS
value is required. This is a large cross-sectional online survey of the
general public (n=12,401) in which respondents were asked to self-report
their health and wellbeing across a battery of questions. This dataset
allows the estimation of a statistical mapping algorithm between the
different indicator sets.
|
Strengths
|
The SIPHER Synthetic Population is representative of the demographic
characteristics of the respective area - down to a fine geographical
resolution. The strength of the SIPHER Synthetic Population is that it
provides a wide range of information at the level of individuals. This
information can be aggregated into groupings of interest (e.g. sex,
income groups) and particular geographical units of interest (LSOA/DZ;
MSOA; Local Authorities etc.). The method used to develop the dataset is
referred to as spatial microsimulation. We often use the SIPHER
Synthetic Population in conjunction with other models we have developed.
This enables us to determine whether an intervention has benefitted a
population group of interest.
|
Small-area health indicators can be used to monitor area-level health
inequalities or as inputs in statistical models. In addition, all health
outcome measures can be attached to the Synthetic Population
as area-level health indicators. SIPHER reviewed the available
measures and conducted a consensus process with SIPHER colleagues to
agree on a final set of indicators. The criteria used were:
1. Interpretability – accessible and meaningful to decision makers.
2. Sensitivity to policy – the indicator can plausibly show the effects of
policy.
3. The indicator can show impacts of the pandemic on health.
4. Timeliness – refers to the current health state.
5. Availability of time-series data.
6. Changes in mental and physical health can be studied separately.
7. Regular updates into the future are expected.
8. Comparability – between areas, ideally comparable between England and
Scotland.
9. High resolution – available for small areas, with LA as a minimum.
10. Disaggregation – available by subgroups (e.g. broken down by age, sex
etc.).
|
The dataset has been subject to a thorough geographical harmonisation
and review process. In addition, the dataset contains a number of
supplementary health and demographic indicators for all local
authorities. The major strength of this dataset is the wide range of
potential applications, from descriptive analyses to studies examining
the complex relationships between economic inclusion and health and
wellbeing. The dataset is available as an open access resource via the
Open Science Framework: https://osf.io/vnsur/
|
The dataset has been subject to a thorough geographical harmonisation
and review process. In addition, the dataset contains a number of
supplementary wellbeing and demographic indicators for all local
authorities. The major strength of this dataset is the wide range of
potential applications, from descriptive analyses to studies examining
the complex relationships between economic inclusion and health and
wellbeing. The dataset is available as an open access resource via the
Open Science Framework: https://osf.io/s24ye/
|
The DCE data on relative preferences allow the calculation of equivalent
income - a quantitative preference-based single metric of wellbeing -
for any combination of SIPHER-7 indicators. The samples are large
(ranging from 1000 to 3000, totalling just under 11,000) and
representative of the UK general public in terms of age and sex.
|
Public policies aim to improve wellbeing and to reduce wellbeing
inequality. When there is a conflict between these, policy makers need
to make difficult decisions. The quantitative data on inequality
aversion is derived from discussion groups, where participants had the
opportunity to examine the trade-off exercise in detail. The results
help inform policy makers on the trade-offs between the two policy aims
that members of the public would support.
|
Different surveys have different health and wellbeing indicators, and
this dataset allows the estimation of a statistical mapping algorithm
between them. This allows SIPHER-7 information to be predicted where the
relevant variables are not available.
|
Limitations
|
The accuracy of the SIPHER Synthetic Population depends on the quality
and availability of the underlying data. Some variables may have poor
completion rates in the underlying survey, resulting in missing data
after linkage. Despite the high number of participants in the
Understanding Society survey, explicit spatial constraints cannot be
applied when creating the dataset. This means that an individual who
was interviewed as part of the survey and who is actually residing in
place X can be assigned to a variety of places A, B, and C, as long as
they match the demographic constraints such as age, sex, marital status
etc. Although recent updates of the code have led to more constraints on
how to perform this selection process, it is important to remember that
the creation of the SIPHER Synthetic Population is based on associations
and descriptive statistics. It can only ever serve as an approximation
of the true population in Scotland, England and Wales - which is likely
to be much more heterogeneous and diverse than the population captured in
the synthetic data source. Therefore, all results obtained from the
SIPHER Synthetic Population should always be interpreted carefully as
model output, and not as equivalent to a population-based register.
|
The dataset cannot resolve situations where no data is available at all
or where sampling in surveys is not representative of small geographical
units.
|
For a few of the indicators, exact definitions differ between countries.
For example, there are different definitions of fuel poverty in use in
Scotland and England. In these cases, national deciles were created and
comparable alternative indicators were identified. For example, food
insecurity was used as an alternative cost-of-living indicator.
|
It should be noted that the metrics for two indicators differ from those
in the SIPHER Inclusive Economy (Local Authority) Level Dataset: (1)
for Indicator 5A (poverty), low income before housing costs (BHC) was used,
rather than after housing costs (AHC); (2) for Indicator 5B (cost of
living), fuel poverty was used, rather than food poverty.
|
Currently not available.
|
Currently not available.
|
Currently not available.
|
Geography
|
Individuals in the SIPHER Synthetic Population have a geography assigned
to them (a synthetic DZ/LSOA). This allows all levels of geography
upwards from DZ/LSOA Level for Scotland, England and Wales - excluding
Northern Ireland - to be analysed and modelled.
|
The exact geographical resolution is indicator-dependent. Typically, the
following resolutions are available for Mortality: DZ/LSOA Level for
Scotland, England and Wales, and LA Level.
|
Longitudinal (2017-2021) and geographically harmonised data is available
at the level of local authorities in England, Scotland, and Wales. The
dataset covers all 363 local authorities in Great Britain, reflecting
their 2021 boundaries according to ONS definition.
|
Longitudinal (2019-2021) and geographically harmonised data is available
at the level of electoral wards in England, Scotland, and Wales. The
dataset covers 7,973 of 8,020 wards in Great Britain, reflecting their
2022 boundaries according to ONS definition.
|
The surveys collected data from participants resident in the UK with
sampling quotas for age and for sex.
|
Participants were resident in the UK, with sampling quotas for age and
for sex.
|
The survey collected data from participants resident in the UK with
sampling quotas for age and for sex. Oversamples Scotland.
|
Variables / Indicators
|
A large variety of variables can be included. This includes all
variables included in the Understanding Society survey - the underlying
survey data source. It is also possible to estimate other derived variables
from this data source, for example ‘Equivalent Income’, using the
‘Equivalent Income Calculator’ method.
|
The dataset includes measures of mortality, physical, and mental health,
and composite measures combining mortality and health. It is open to
data updates, and additional health indicators can be estimated and
incorporated if required.
|
Details on all indicators are outlined in the Technical Report for the
SIPHER Inclusive Economy Indicator Set – See Additional Resources.
|
Details on all indicators are outlined in the Technical Report for the
SIPHER Inclusive Economy Indicator Set – See Additional Resources.
|
In addition to the DCE choice data, the surveys include participant
self-reported data on: SIPHER-7; household size; age; gender; etc.
Surveys (1) and (2) use the original SIPHER-7. Surveys (3) and (4) use
the revised version of SIPHER-7.
|
In addition to the inequality aversion task, the survey includes
participant self-reported data on: SIPHER-7; household size; age;
gender; etc.
|
The indicator sets and questions included in the survey: SIPHER-7;
ICECAP-A; EQ-5D-5L; SF-12 v2; HUI; WEMWBS; EQ-HWB; ONS-4; Understanding
Society items on crime and housing; items from the Labour Force Survey,
the Living Wage Foundation questionnaire; education, income, ethnicity,
children, informal caregiving; gender, age; etc. Includes sampling
weights to correct for age and sex with respect to the mid-year UK
population estimate.
|
Time Period
|
The latest release reflects the years 2019-2021. Results from the UK
census 2011 are used as constraints for the spatial microsimulation -
the process generating the Synthetic Population. Preliminary updated
versions for England and Wales, based on the UK census 2021, are
available. However, Scotland has not yet published all required input
data from its most recent census.
|
DZ/LSOA/MSOA Level: typically, cross-sectional representing the period
covered by the synthetic population. Local Authority level: typically,
longitudinal for 2004-2020 when based on non-synthetic data. Data will
be updated as new data becomes available.
|
Longitudinal data are available for every year between 2017 and 2021.
|
Longitudinal data are available for every year between 2019 and 2021.
|
There are four datasets: (1) people’s personal preferences in autumn
2020; (2) people’s personal preferences in autumn 2021; (3) people’s
personal preferences in spring 2022; (4) people’s social preferences in
spring 2022. Dataset (2) includes returning respondents from (1).
Otherwise, the observations are independent.
|
Data collected: summer - autumn 2022.
|
Data collected: late 2022.
|
Missing Data
|
The level of missing information for a particular variable is determined
by the levels of missingness in the underlying Understanding Society
survey.
|
The level of missing data is determined by data availability. For some
indicators, older data are not always comparable across time or in form.
|
Missing data were imputed using a sophisticated multiple imputation
algorithm. In some cases, only cross-sectional measurements were
available, which were carried forward or backward. For example, local
elections (Indicator 6B) did not take place every single year.
|
Missing data were imputed using a sophisticated multiple imputation
algorithm. In some cases, only cross-sectional measurements were
available, which were carried forward or backward. For example, local
elections (Indicator 6B) did not take place every single year.
|
Currently not available.
|
Currently not available.
|
Currently not available.
|
Examples / Link with Other Models and Data
|
The Synthetic Population is used as the underlying data source in
several SIPHER models. These include: (1) dynamic systems model, (2)
static and dynamic microsimulation and (3) decision support tool.
Information covered in the Synthetic Population can be extended by
adding additional variables from other data sources. These could be
datasets that are not publicly available. In addition, the SIPHER
Synthetic Population can be used to derive more complex concepts such as
the ‘Equivalent Income’ - a variable which is calculated using the
‘Equivalent Income Calculator’ method.
|
A portfolio of area-level summary indicators on mortality, health, and
composite indicators that combine information on mortality and health.
These indicators can be attached as area-level indicators to the SIPHER
Synthetic Population. In addition, health measures are used in the Local
Authority clustering work, as well as in the Dynamic Systems model.
|
The dataset is currently used in a k-means clustering machine learning
study. The primary aim of this study is to identify clusters of similar
local authorities and to examine the association of each cluster with a
number of health outcomes. In another application, we explore the
association between Quality-Adjusted Life Expectancy (QALE) and
indicators of economic inclusion.
|
The dataset relies on the SIPHER Synthetic Population for 8 of the 13
inclusive economy indicators. It also includes several demographic and
wellbeing indicators in the form of the Short Form-12 (SF-12) measures,
the physical and mental component scores (PCS and MCS).
|
The estimated parameters can be used to calculate an equivalent income
variable in the Synthetic Population.
|
The estimated inequality aversion parameter is used to identify the
optimal trade-off between maximising wellbeing and reducing inequality
in the decision support tools.
|
Currently not available.
|
Software Requirements
|
Requires software that can handle the size of the data file, such as R
or Python. An interactive R Shiny dashboard allows code-free
exploration of an aggregated version: https://sipherdashboard.sphsu.gla.ac.uk/
|
Requires software that can handle the size of the data file, such as R
or Python.
|
Requires software that can load data, such as Excel, R, or Python. The
SIPHER Inclusive Economy Dataset Interactive Map is available at: https://www.gla.ac.uk/research/az/sipher/products/inclusiveeconomydataset/ieinteractivemap/#d.en.1054750
|
Requires software that can load data, such as Excel, R, or Python. The
SIPHER Inclusive Economy Dataset Interactive Map is available at: https://www.gla.ac.uk/research/az/sipher/products/inclusiveeconomydataset/ieinteractivemap/#d.en.1054750
|
The main choice data and respondent background variables are saved in
Stata format and require software that can read Stata files.
|
The main trade-off data and respondent background variables are saved in
Stata format and require software that can read Stata files.
|
Currently saved in Stata format and requires software that can read Stata
files.
|
Data Requirements / Restrictions
|
The SIPHER Synthetic Population is available for full independent use
via the UK Data Service’s Curated Data Collection. To set up the SIPHER
Synthetic Population, it is required to link the synthetic population
file (UK Data Service ID: SN9277) with Understanding Society survey data
(UK Data Service ID: SN6614) - as is typically done for area-level
linkages of surveys. Both datasets are subject to the General End-User
License Agreement terms and conditions, and can be downloaded without
any costs directly from the website of UK Data Service.
|
For key indicators such as QALE, Life Expectancy, and Lifespan Variation
it is planned that a final version of the dataset and the underlying
code will be made publicly available. In order to fully reproduce health
measures requiring the Synthetic Population, access to the Synthetic
Population is required.
|
The final dataset is available as an open access resource.
|
The final dataset is available as an open access resource.
|
Currently not available.
|
Currently not available.
|
Currently not available.
|
Data / Code Available
|
Due to the underlying license agreement, the dataset cannot be shared as
an open access version. However, the dataset can be downloaded through
the UK Data Service website, after acceptance of the General End-User
license terms and conditions: https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=9277#!/details
In addition, we have made a wealth of supplementary material available,
documenting creation, validation, linkage, and exploration of the
dataset: https://reshare.ukdataservice.ac.uk/856754/
|
Work in progress; the final dataset will be made publicly available.
A code pipeline for the estimation of Quality-Adjusted Life Expectancy
(QALE) is available.
|
The final dataset and additional documentation are publicly available
via the Open Science Framework: https://osf.io/vnsur/.
|
The final dataset and additional documentation are publicly available
via the Open Science Framework: https://osf.io/s24ye/.
|
Currently not available.
|
Currently not available.
|
Currently not available. The dataset will be archived. There is no
associated code.
|
Training
|
We have provided a comprehensive, open access User Guide for our SIPHER
Synthetic Population. The User Guide provides background information
and explains how to set up the data and analyse it swiftly: https://doc.ukdataservice.ac.uk/doc/9277/mrdoc/pdf/9277_user_guide_r4_clean.pdf
|
Online pipeline example via GitHub.
|
The data is accompanied by a comprehensive data dictionary which
provides context relating to all variables included.
|
The data is accompanied by a comprehensive data dictionary which
provides context relating to all variables included.
|
Currently not available.
|
Currently not available.
|
Currently not available.
|
Additional Resources
|
SIPHER Synthetic Population for Individuals in Great Britain, 2019-2021
(UK Data Service Curated Collection, SN9277): https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=9277#!/details
Comprehensive User Guide: https://doc.ukdataservice.ac.uk/doc/9277/mrdoc/pdf/9277_user_guide_r4_clean.pdf
Supplementary Resources: https://reshare.ukdataservice.ac.uk/856754/
Paper describing the statistical creation process: https://www.nature.com/articles/s41597-022-01124-9
Understanding Society Survey Blog: https://www.understandingsociety.ac.uk/news/2024/07/10/building-synthetic-population-data/
Introduction Video: https://www.youtube.com/watch?v=CkiORY7GSLc
|
Choosing the SIPHER health Indicators Report: https://www.gla.ac.uk/media/Media_970682_smxx.pdf
QALE exemplar: https://github.com/AndreasxHoehn/QALE_Exemplar
Some indicators are available through the SIPHER Synthetic Population
Dashboard: https://sipherdashboard.sphsu.gla.ac.uk/
|
Explore - https://www.gla.ac.uk/research/az/sipher/products/inclusiveeconomydataset/
SIPHER Inclusive Economy Indicator Set: Technical paper [PDF] - https://www.gla.ac.uk/media/Media_970680_smxx.pdf
SIPHER Inclusive Economy Indicator Set: Summary [PDF] - https://www.gla.ac.uk/media/Media_1029792_smxx.pdf
Estimating quality-adjusted life expectancy (QALE) for local authorities
in Great Britain and its association with indicators of the inclusive
economy: a cross-sectional study, BMJ Open, March 2024 - https://bmjopen.bmj.com/content/14/3/e076704
Measuring the Inclusive Economy Blog - https://www.gla.ac.uk/research/az/sipher/sharingourevidence/blog/headline_1049629_en.html
|
Explore - https://www.gla.ac.uk/research/az/sipher/products/inclusiveeconomydataset/
SIPHER Inclusive Economy Indicator Set: Technical paper [PDF] - https://www.gla.ac.uk/media/Media_970680_smxx.pdf
SIPHER Inclusive Economy Indicator Set: Summary [PDF] - https://www.gla.ac.uk/media/Media_1029792_smxx.pdf
Inclusive Economy Indicators for Electoral Wards Blog - https://www.gla.ac.uk/research/az/sipher/sharingourevidence/blog/headline_1132578_en.html
|
Explore - https://www.gla.ac.uk/research/az/sipher/products/sipher-7wellbeingindicators/
Blog: Collapsing multi-dimensional wellbeing into equivalent income,
March 2022 - https://www.gla.ac.uk/research/az/sipher/sharingourevidence/blog/headline_1019908_en.html
|
Currently not available.
|
Currently not available.
|
Contact
|
sipher@glasgow.ac.uk
|
sipher@glasgow.ac.uk
|
sipher@glasgow.ac.uk
|
sipher@glasgow.ac.uk
|
sipher@glasgow.ac.uk
|
sipher@glasgow.ac.uk
|
sipher@glasgow.ac.uk
|
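The Data Requirements / Restrictions row notes that the synthetic population file has to be linked with Understanding Society survey data, and the Strengths row notes that individuals can then be aggregated to geographical units of interest. The sketch below illustrates that workflow in Python with pandas on toy data. The column names (`pidp`, `zone_id`, `sf12_mcs`) and all values are assumptions made for illustration; the real files are described in the User Guide.

```python
import pandas as pd

# Hypothetical stand-in for the synthetic population file (SN9277):
# one row per synthetic individual, pointing at a survey respondent.
synthetic = pd.DataFrame({
    "pidp": [101, 102, 101, 103],      # assumed respondent identifier
    "zone_id": ["A", "A", "B", "B"],   # assumed small-area code
})

# Hypothetical stand-in for Understanding Society survey data (SN6614).
survey = pd.DataFrame({
    "pidp": [101, 102, 103],
    "sex": ["F", "M", "F"],
    "sf12_mcs": [48.0, 52.0, 40.0],    # illustrative attribute values
})

# Attach survey attributes to each synthetic individual.
# validate="many_to_one" raises if a respondent appears twice in the survey.
linked = synthetic.merge(survey, on="pidp", how="left", validate="many_to_one")

# Aggregate the individual-level attribute to an area-level profile.
area_profile = linked.groupby("zone_id")["sf12_mcs"].mean()
print(area_profile)
```

Note that respondent 101 appears in both areas: the same survey respondent can represent several synthetic individuals, which is why the merge is many-to-one.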
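The Context row describes how survey individuals are randomly selected and assigned to small areas so that each area's demographic distribution matches census statistics. The Python sketch below shows that idea in its simplest form: a stratified random draw with replacement. It is an illustration only and not the spatial microsimulation algorithm used to build the SIPHER Synthetic Population; the function name, demographic groups, and target counts are all hypothetical.

```python
import random

def assign_individuals(survey, area_targets, seed=0):
    """Draw survey respondents into areas so that each area's count per
    demographic group matches the census-style target exactly."""
    rng = random.Random(seed)

    # Index respondent ids by demographic group.
    by_group = {}
    for person_id, group in survey.items():
        by_group.setdefault(group, []).append(person_id)

    synthetic = []  # list of (area, person_id) pairs
    for area, targets in area_targets.items():
        for group, count in targets.items():
            # Sampling with replacement: one respondent can represent
            # several synthetic individuals, possibly in several areas.
            for person_id in rng.choices(by_group[group], k=count):
                synthetic.append((area, person_id))
    return synthetic

# Toy survey: respondent id -> (sex, age band). Values are made up.
survey = {1: ("F", "16-34"), 2: ("F", "16-34"), 3: ("M", "35+"), 4: ("M", "35+")}

# Toy census constraints: required head count per group in each area.
area_targets = {
    "area_A": {("F", "16-34"): 3, ("M", "35+"): 1},
    "area_B": {("F", "16-34"): 1, ("M", "35+"): 2},
}

population = assign_individuals(survey, area_targets)
print(len(population), "synthetic individuals assigned")
```

Because the draw ignores where a respondent actually lives, the sketch also shows the limitation noted in the Limitations row: a respondent can be placed in any area whose demographic constraints they match.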
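The Missing Data row mentions that measurements available only for some years were carried forward or backward. A minimal pandas sketch of that step follows, assuming a long-format table with one row per local authority and year. The column names and values are hypothetical, and the multiple imputation algorithm mentioned in the same row is not reproduced here.

```python
import pandas as pd

# Hypothetical indicator observed only in some years per local authority.
data = pd.DataFrame({
    "la_code": ["LA1"] * 5 + ["LA2"] * 5,
    "year": [2017, 2018, 2019, 2020, 2021] * 2,
    "value": [None, 35.0, None, None, 38.0,
              None, None, 41.0, None, None],
})

data = data.sort_values(["la_code", "year"])

# Within each local authority: carry the last observation forward,
# then fill any leading gaps backward from the first observation.
data["value_filled"] = (
    data.groupby("la_code")["value"]
        .transform(lambda s: s.ffill().bfill())
)
print(data)
```

Filling within each `la_code` group keeps values from leaking between local authorities, which a plain column-wide fill would not guarantee.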