Sample Size Considerations for Establishing Norms in Low- and Middle-Income Countries for a Brief, Tablet-Based Neuropsychological Test Battery.

Reuben Robbins, Columbia University/NYSPI, New York, United States
Davidson Leslie, Columbia University, New York, United States
Jun Liu, NYSPI, New York, United States
Curtis Dolezal, NYSPI, New York, United States
Mei Tan, Columbia University, New York, United States
Gavin George, University of KwaZulu-Natal, Durban, South Africa
Rachel Gruver, Columbia University, New York, United States
Adele Munsami, University of KwaZulu-Natal, Durban, South Africa
Christopher Desmond, Wits University, Johannesburg, South Africa


Objective:

Clear guidance on optimal sample sizes for neuropsychological (NP) test norms is limited, though samples of >10,000 for traditional norming methods, and ~1,000 for regression-based norms (RBNs), have been proposed. Such large sample size requirements have made it challenging to establish norms for NP tests in low- and middle-income countries (LMICs). In many LMICs, norms are generated from small convenience samples, leading to imprecise scores that hamper clinical interpretation (e.g., a T-score of 45 with a 95% confidence interval [CI] of +/- 10 could indicate anything from average to borderline performance, T = 35). This study examined the precision of norms generated from South African adolescents across a range of sample sizes for a tablet-based battery of NP tests (NeuroScreen).

Participants and Methods:

NeuroScreen was administered to 1,098 adolescents 13-18 years of age enrolled in a long-standing community-based cohort study (Asenze) in a peri-rural area of KwaZulu-Natal, South Africa. Standard RBN procedures were used to generate age-, gender-, and education-adjusted T-scores using the full sample for all 12 NeuroScreen tests along with a global T-score (GTS; mean of all tests). Random subsamples were drawn at sizes from n = 100 to n = 800 in increments of 100, with 1,000 random draws at each size. For each random subsample, a new set of RBNs and accompanying GTSs were generated and compared to the GTS from the full sample. Data-driven 95% CIs were then computed for select GTSs and test scores.
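
The resampling procedure described above can be illustrated with a minimal Python sketch. It is not the authors' code: the simulated data, the ordinary least squares model with a single residual standard deviation, the function names (fit_rbn, t_scores, global_t, subsample_precision, simulate), and the use of percentile intervals on the difference from the full-sample GTS are all assumptions made for illustration.

```python
import numpy as np

def fit_rbn(X, y):
    """Fit an OLS regression of a raw test score on demographics.
    Returns the coefficients and the residual SD (assumed norming model)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, resid.std(ddof=A.shape[1])

def t_scores(X, y, beta, sd):
    """Convert raw scores to demographically adjusted T-scores (mean 50, SD 10)."""
    A = np.column_stack([np.ones(len(X)), X])
    return 50 + 10 * (y - A @ beta) / sd

def global_t(X, Y, norms):
    """Global T-score: mean of the per-test T-scores for each participant."""
    per_test = [t_scores(X, Y[:, j], *norms[j]) for j in range(Y.shape[1])]
    return np.mean(per_test, axis=0)

def subsample_precision(X, Y, n_sub, n_reps=1000, seed=0):
    """Refit norms on random subsamples, rescore everyone, and return the
    full-sample GTS with 2.5th/97.5th percentiles of the GTS differences."""
    rng = np.random.default_rng(seed)
    n_tests = Y.shape[1]
    full_norms = [fit_rbn(X, Y[:, j]) for j in range(n_tests)]
    gts_full = global_t(X, Y, full_norms)
    diffs = np.empty((n_reps, len(X)))
    for r in range(n_reps):
        idx = rng.choice(len(X), size=n_sub, replace=False)
        norms = [fit_rbn(X[idx], Y[idx, j]) for j in range(n_tests)]
        diffs[r] = global_t(X, Y, norms) - gts_full
    lo, hi = np.percentile(diffs, [2.5, 97.5], axis=0)
    return gts_full, lo, hi

def simulate(n=1098, n_tests=12, seed=1):
    """Simulated stand-in data (hypothetical; not the Asenze cohort)."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(13, 18, n)
    female = rng.integers(0, 2, n).astype(float)
    educ = age - 6 + rng.normal(0, 1, n)
    X = np.column_stack([age, female, educ])
    signal = 0.5 * age + 0.3 * educ
    Y = signal[:, None] + rng.normal(0, 3, (n, n_tests))
    return X, Y

if __name__ == "__main__":
    X, Y = simulate()
    for n_sub in (100, 400, 800):
        _, lo, hi = subsample_precision(X, Y, n_sub, n_reps=200)
        print(n_sub, "mean 95% half-width:", np.mean((hi - lo) / 2))
```

In this sketch every participant in the full sample is rescored with each subsample's norms, so the interval reflects how much a given GTS would shift if norms had been built from a smaller sample. Any real analysis would substitute the actual test scores and the norming model used in the study.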


Results:

Mean age was 15.80; 50.20% were female; mean years of education was 10.0. Mean differences in GTSs between recreated subsamples and the full sample were the largest for sample sizes < 400. Examples of norm precision by sample size are as follows: a GTS of 35.67 had a 95% CI of +/- 1.79 at n = 100, +/- 0.88 at n = 400, and +/- 0.50 at n = 800. For a GTS of 44.17, the CI was +/- 1.21 at n = 100, +/- 0.50 at n = 400, and +/- 0.25 at n = 800. An individual T-score of 32 from a timed test could range from 25-37 at n = 100; 29-35 at n = 400; and 31-34 at n = 800. An individual T-score of 32 from a test with a discrete score could range from 26-37 at n = 100; 29-34 at n = 400; and 31-33 at n = 800.


Conclusions:

As expected, RBNs with larger sample sizes produced the most precise global and individual T-scores. However, RBNs generated from sample sizes well below 1,000 also produced scores with precision similar to that of larger samples. GTSs were more precise across samples than individual T-scores. These results are based on a specific battery of tests, a constrained age range (13-18), and participants not specifically sampled for generating test norms, all of which limit generalizability. They nonetheless provide guidance on optimal sample sizes for more precise norms in LMIC settings. Clinically meaningful RBNs may be generated with samples <1,000. Additional research is needed to guide optimal sample sizes for generating norms in South Africa and other LMICs.

Category: Assessment/Psychometrics/Methods (Child)

Keyword 1: normative data
Keyword 2: test development
Keyword 3: psychometrics