Insights from the 2nd South African Comparative Risk Assessment Study
Estimating health risk factors distributions from sparse and heterogeneous data sources
Annibale Cois
Division of Health Systems and Public Health
Stellenbosch University
ILD Public Lecture
Thu 07/21/22 5:30 PM
QM369 Queen Mary Court
Greenwich Campus
The comparative risk assessment study
Sparse and heterogeneous data sources
A “principled” meta-regression approach
Examples
Notes, questions, comments
Comparative Risk Assessment
The CRA method is a standardised and systematic approach to estimate the contribution of individual risk factors to the observed burden of disease.
It compares the observed burden of disease due to an exposure with a hypothetical distribution in a population, making use of the level of exposure in the population and the epidemiological relationship between a risk factor and health outcomes.
Murray CJ, Ezzati M et al. Popul. Health Metr. 2003;1(1):1
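The comparison described above is commonly summarised as an attributable fraction: the proportional drop in risk if the observed exposure distribution were replaced by the hypothetical one. The sketch below assumes a categorical exposure and uses the standard potential impact fraction; the function name and all numbers are invented for illustration and are not SACRA2 inputs.

```python
def attributable_fraction(observed, counterfactual, relative_risks):
    """Potential impact fraction for a categorical exposure.

    observed / counterfactual: proportion of the population in each
    exposure category under the two distributions (each sums to 1).
    relative_risks: relative risk of the outcome in each category.
    """
    if not (len(observed) == len(counterfactual) == len(relative_risks)):
        raise ValueError("all inputs need one entry per exposure category")
    observed_risk = sum(p * rr for p, rr in zip(observed, relative_risks))
    counterfactual_risk = sum(p * rr for p, rr in zip(counterfactual, relative_risks))
    return (observed_risk - counterfactual_risk) / observed_risk


# Invented example with two categories: unexposed and exposed.
observed = [0.5, 0.5]          # half of the population is exposed
counterfactual = [1.0, 0.0]    # hypothetical distribution: nobody exposed
relative_risks = [1.0, 2.0]    # exposed have twice the risk
total_deaths = 3000.0          # invented burden for the outcome

paf = attributable_fraction(observed, counterfactual, relative_risks)
attributable_deaths = paf * total_deaths
print(f"PAF = {paf:.3f}, attributable deaths = {attributable_deaths:.0f}")
```

The same fraction can be applied to DALYs instead of deaths.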
[Diagram: RR, DEATHS and DALYs combined to give ATTRIBUTABLE DEATHS and ATTRIBUTABLE DALYs]
The 2nd Comparative Risk Assessment for South Africa aimed to estimate the temporal trend of burden attributable to a series of 18 risk factors between 2000 and 2012.
NCD cluster
• High systolic blood pressure
• High body mass index
• High fasting plasma glucose
• High LDL cholesterol
• Low fruit intake
• Low vegetable intake
• High sodium intake
• Low physical activity
Addictive substance use
• Tobacco smoking
• Alcohol consumption
Undernutrition cluster
• Childhood undernutrition
• Iron deficiency
Social behaviour cluster
• Unsafe sex
• Interpersonal violence
Environmental cluster
• Ambient air pollution - PM2.5
• Ambient air pollution - ozone
• Household air pollution
• Unsafe water, sanitation and hygiene
http://www.samj.org.za/index.php/samj
[Figure: SACRA2 timeline marked 1998, 2000, 2012 and 2016]
Dutton, D.J., McLaren, L. BMC Public Health 14, 430 (2014)
Small sample
Large sample
Biased sample
Data sources in general differ in terms of …
Target population
Sample realisation
Sampling weights, calibration
Diagnostic criteria
Devices/measurement protocol
Sampling error
Representativeness
Measurement
Olsen SJ, Azziz-Baumgartner E et al. MMWR Morb Mortal Wkly Rep. 2020 Sep 18;69(37):1305-1309. doi: 10.15585/mmwr.mm6937a6. PMID: 32941415; PMCID: PMC7498167.
[Diagram: estimation workflow, with elements labelled 1, 2, 3 and E1, E2, E3]
RECODING, UNIFORM CLEANING, …
SOURCE-SPECIFIC ESTIMATION (taking into account sampling strategy)
WEIGHTED ESTIMATION
Expert opinions
“shape” of temporal trends
“shape” of age trends
Relationships with other variables/risk factors
…
ESTIMATED MODEL
PREDICTION
FINAL ESTIMATES
Principled
Explicit
Quantify uncertainty beyond random error
As much as possible!
Administrative data
Relationships with other variables/risk factors
Experiences/estimates from other populations
Relationships between epidemiological measures
Smooth (slow) transition
Expert opinions
Example 1: Prevalence of iron deficiency anaemia (IDA) among women of reproductive age
2000 data VS
Cross-walk/1
Resources:
• cross-walk equation
• estimates of % of pregnant women from DHS
Main (untestable) assumptions:
• what is valid in the “global” GBD population is also valid in SA
Stevens GA, Finucane MM et al. Lancet Glob Health. 2013 Jul;1(1):e16-25.
Example 2: Daily consumption of fruit & vegetables in the adult population
Cross-walk/2
2003 data:
• “quantity questions” alone (as compared to “frequency-quantity” questions)
• does not differentiate between fruit & fruit juice
Resources:
• 2012 survey includes both frequency and quantity questions
[Figure: correction factor (vegetables) by age group (18-24 to 75+), males and females; axis from 0 to 1.2]
[Figure: correction factor (fruit) by age group (18-24 to 75+), males and females; axis from 0 to 0.4]
Main (untestable) assumptions:
• no change in the fruit/fruit juice ratio between 2003 and 2012
• no difference in the effect of question wording between 2003 and 2012
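One way to read the fruit and vegetable cross-walk: the 2012 survey asked both kinds of question, so for each age/sex group it can give the ratio between the two resulting estimates, and that ratio is then applied to the 2003 estimates. The sketch below assumes this simple ratio form, which the slides do not spell out; function names and all numbers are invented.

```python
def correction_factors(reference, quantity_only):
    """Per-group ratio of the reference estimate to the quantity-only estimate."""
    return {group: reference[group] / quantity_only[group] for group in reference}


def apply_cross_walk(estimates, factors):
    """Rescale each group's estimate by its correction factor."""
    return {group: value * factors[group] for group, value in estimates.items()}


# Invented mean intakes (g/day) keyed by (sex, age group).
freq_quantity_2012 = {("F", "18-24"): 120.0, ("M", "18-24"): 90.0}
quantity_only_2012 = {("F", "18-24"): 150.0, ("M", "18-24"): 120.0}
quantity_only_2003 = {("F", "18-24"): 140.0, ("M", "18-24"): 100.0}

# Factors are estimated where both question types exist (2012) ...
factors = correction_factors(freq_quantity_2012, quantity_only_2012)
# ... and carried over to the survey that has quantity questions only (2003).
adjusted_2003 = apply_cross_walk(quantity_only_2003, factors)

for group, value in adjusted_2003.items():
    print(group, round(factors[group], 3), round(value, 1))
```

Carrying the factors from 2012 back to 2003 is exactly where the untestable assumptions listed for this example come in.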
Example 3: Distribution of fasting plasma glucose (FPG) in the adult population
Data:
• multiple surveys (various populations)
• different diagnostic criteria
• data generally not available for FPG
Resources:
• Known relationships between different diagnostic criteria
• Shape of the FPG distribution
• Cross-walk equation between mean FPG and diabetes prevalence
Main (untestable) assumptions:
• what is valid in the “global” GBD population is also valid in SA
• diagnostic criteria are actually equivalent
• the distribution of FPG in the population is log-normal
1. Model diabetes prevalence for small scale studies (age*population group fixed effects)
2. Cross-walk equation from prevalence of diabetes and mean FPG
2. Equivalence of diagnostic criteria + 3. evidence of the distribution shape
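The log-normal assumption for FPG is what lets a mean and a diabetes prevalence pin down a whole distribution. The sketch below illustrates that link only; it is not the cross-walk equation used in the study. It assumes diabetes is defined as FPG of 7.0 mmol/L or more, and the function names and input values are invented.

```python
import math
from statistics import NormalDist

DIABETES_THRESHOLD = 7.0  # mmol/L; assumed single diagnostic cut-off


def prevalence_above(mean, sigma, threshold=DIABETES_THRESHOLD):
    """P(FPG >= threshold) for a log-normal with arithmetic mean `mean`
    and standard deviation `sigma` on the log scale."""
    mu = math.log(mean) - 0.5 * sigma ** 2
    z = (math.log(threshold) - mu) / sigma
    return 1.0 - NormalDist().cdf(z)


def sigma_from_prevalence(mean, prevalence, threshold=DIABETES_THRESHOLD, tol=1e-10):
    """Log-scale SD that reproduces `prevalence` for a given mean (bisection)."""
    if mean >= threshold:
        raise ValueError("sketch only covers a mean below the threshold")
    # Prevalence increases with sigma only up to sqrt(2 * ln(threshold / mean)).
    low, high = 1e-6, math.sqrt(2.0 * math.log(threshold / mean))
    if prevalence > prevalence_above(mean, high, threshold):
        raise ValueError("prevalence too high for a log-normal with this mean")
    while high - low > tol:
        mid = 0.5 * (low + high)
        if prevalence_above(mean, mid, threshold) < prevalence:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Round trip with invented values: mean FPG 5.5 mmol/L, log-scale SD 0.25.
prevalence = prevalence_above(5.5, 0.25)
recovered_sigma = sigma_from_prevalence(5.5, prevalence)
print(f"prevalence = {prevalence:.3f}, recovered sigma = {recovered_sigma:.4f}")
```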
Example 4: Distribution of systolic blood pressure in the adult population
Data:
• different number of measurements across surveys
Resources:
• relationship between readings from “complete” surveys
Another form of “internal cross-walk”, but the cross-walk is estimated concurrently with the rest of the model
[Diagram: Structural Equation Model linking SBP 1 and SBP 2 to their readings (Reading 1, Reading 2, Reading 3); missing data ML estimator]
Main (untestable) assumptions:
• the relationship between the “true” BP and the sequence of readings is constant across surveys
• assumptions regarding the joint distribution of variables are correct
Cois A. Understanding Blood Pressure Dynamics in the South African Population: A Latent Variables Approach to the Analysis and Comparison of Data from Multiple Surveys. 2017. http://hdl.handle.net/11427/25196
Example 5: Average consumption of alcohol among drinkers
Data:
• multiple surveys with self-report data
• consumption recorded as ‘intervals’ of number of drinks
Resources:
• Administrative data on alcohol sales and import/export
• Estimates of self-produced alcohol
• Evidence of the shape of the distribution of alcohol consumption across populations
Main (untestable) assumptions:
• Level of underreporting of alcohol consumption is approximately constant across ages and sexes
• Smoothness of the consumption trend across time and age
• Long-term consumption above 150 g/day is very unlikely
Probst C, Shuper PA, Rehm J. (2017). Addiction, 112: 705–710
[Diagram: Bayesian model combining distribution shape, interval data, total consumption matched to admin data + self-production, and the constraint that consumption above 150 g/day is unlikely]
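For the alcohol example (Example 5), the assumption of roughly constant underreporting across ages and sexes means a single scaling factor can reconcile survey totals with administrative data plus self-production. The talk describes a Bayesian model; the sketch below is only a deterministic simplification of that one step, with invented group names and numbers, and it does not enforce the 150 g/day constraint.

```python
def upshift_means(groups, total_supply_g_per_day):
    """Scale self-reported mean intakes by one common factor so that the
    implied total consumption equals the total supply."""
    reported_total = sum(g["drinkers"] * g["reported_mean"] for g in groups.values())
    factor = total_supply_g_per_day / reported_total
    adjusted = {name: g["reported_mean"] * factor for name, g in groups.items()}
    return factor, adjusted


# Invented inputs: number of drinkers and self-reported mean (g/day of pure alcohol).
groups = {
    "men 15-34": {"drinkers": 1000, "reported_mean": 20.0},
    "women 15-34": {"drinkers": 500, "reported_mean": 10.0},
}
recorded = 40000.0       # g/day from sales and import/export data (invented)
self_produced = 10000.0  # g/day estimate of self-produced alcohol (invented)

factor, adjusted = upshift_means(groups, recorded + self_produced)
print(f"common upshift factor = {factor:.2f}")
for name, mean in adjusted.items():
    print(name, round(mean, 1))
```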
Quality effects weights
Quality Score
Variance
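The slide names a quality score and a variance as the ingredients of the weights but gives no formula. The sketch below is a simplified stand-in, assuming weights proportional to quality score divided by variance; it is not the published quality effects estimator, and all names and numbers are invented.

```python
def pooled_estimate(estimates, variances, quality_scores):
    """Weighted mean with weights proportional to quality / variance."""
    raw = [q / v for q, v in zip(quality_scores, variances)]
    total = sum(raw)
    weights = [w / total for w in raw]  # normalised to sum to 1
    pooled = sum(w * e for w, e in zip(weights, estimates))
    return pooled, weights


# Invented source-specific estimates for one age/sex/year cell.
estimates = [10.0, 20.0]
variances = [1.0, 1.0]        # equal sampling variance
quality_scores = [1.0, 0.25]  # second source judged to be of lower quality

pooled, weights = pooled_estimate(estimates, variances, quality_scores)
print(f"weights = {weights}, pooled estimate = {pooled:.2f}")
```

With equal variances the lower-quality source is simply down-weighted; with equal quality scores the scheme reduces to inverse-variance weighting.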
Relative bias across age/sex groups
Unrecorded consumption
Average daily consumption (individual)
Quantifying uncertainty
• Analytical solution (rare)
• Resampling
• Monte Carlo
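When no analytical solution exists, uncertainty in the inputs can be pushed through the calculation by simulation. The sketch below is a generic Monte Carlo illustration, assuming a survey mean and a correction factor that are independent and normally distributed; the distributions, values and function name are invented and do not come from the study.

```python
import random
import statistics


def monte_carlo_interval(n_draws=10000, seed=1):
    """Propagate input uncertainty through a product of two quantities."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = []
    for _ in range(n_draws):
        survey_mean = rng.gauss(100.0, 5.0)  # sampling uncertainty (invented)
        correction = rng.gauss(0.8, 0.05)    # cross-walk uncertainty (invented)
        draws.append(survey_mean * correction)
    # quantiles(n=40) returns cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40)
    return statistics.mean(draws), cuts[0], cuts[-1]


mean, lower, upper = monte_carlo_interval()
print(f"estimate = {mean:.1f}, 95% interval = ({lower:.1f}, {upper:.1f})")
```

The interval is wider than the one implied by sampling error alone because the uncertainty of the correction factor is carried through as well.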
Conclusions?
Look beyond your dataset
Make your evidence (and assumptions) explicit
Formalise uncertainty
Sampling error is not everything
Cois, A., Matzopoulos, R., Pillay-van Wyk, V. et al. Popul Health Metrics 19, 43 (2021)
Recalibration of sampling weights
Credits
The SACRA2 study was funded by the South African Medical Research Council’s Flagships Awards Project (SAMRC-RFA-IFSP-01-2013/SA CRA 2).
D Bradshaw, JD Joubert, V Pillay-van Wyk, R Pacella, R Matzopoulos and all members of the SACRA2 collaborative group
Thank you!