INS NYC 2024 Program

Symposium

Symposia 3 Program Schedule

02/15/2024
09:00 am - 10:30 am
Room: West Side Ballroom - Salon 4

Symposia 3: Current Trends and Future Frontiers in Neuropsychology and Digital Technologies


Symposium #4

A Comparison of Acoustic Features Extracted from a Voice-Recorded Cognitive Test in the United States and Malaysia

Preeti Sunderaraman, Boston University & Framingham Heart Study, Boston, United States
Huitong Ding, Boston University, Boston, United States
Cody Karjadi, Boston University & Framingham Heart Study, Boston, United States
Jinying Chen, Boston University, Boston, United States
James Glass, Massachusetts Institute of Technology, Cambridge, United States
Spencer Low, Boston University, Boston, United States
Sherral Devine, Boston University & Framingham Heart Study, Boston, United States
Roshaslina Rosli, University of Malaya, Kuala Lumpur, Malaysia
Vijaya Kolachalama, Boston University, Boston, United States
Honghuang Lin, Boston University, Boston, United States

Category: Teleneuropsychology/Technology

Keyword 1: acoustics
Keyword 2: psychometrics
Keyword 3: teleneuropsychology

Objective:

Current cognitive assessments are biased by language, culture, and education. Speaking is a cognitively complex task, and analysis of speech provides an alternative method for measuring cognition. Since most people speak, and speech can be recorded easily via smartphones used by billions of people worldwide, speech offers a unique approach to measuring cognition universally. Emerging evidence suggests that acoustic features extracted from speech, such as intonation, phonemes, and modulation, may provide signals predictive of neurodegenerative diseases and thus can serve as an easy, low-cost method for identifying individuals showing early signs of cognition-related behavioral change at a global level. However, it remains to be determined whether there are acoustic features that are similar across cultures. In the current study, we investigated the factor structure of acoustic features extracted from digital voice (dVoice) data obtained from speech-based cognitive tests in two culturally different cohorts.

Participants and Methods:

The same protocol of smartphone-recorded dVoice assessments was administered to cohorts in the US and Malaysia. The US-based cohort included 621 participants from the Framingham Heart Study (FHS), with a mean age of 63 years (SD = 10); 56% were female, 60% were college graduates, and 90% were Caucasian, with dVoice data obtained in U.S. English. The Malaysian cohort included 270 participants from the Malaysian Transforming Cognitive Frailty into Later-Life Self-Sufficiency (AGELESS) study, with a mean age of 67 years (SD = 8); 61% were female, 57% were college graduates, and 88% were ethnic Chinese, with dVoice data obtained in various languages including English, Mandarin, and Cantonese. dVoice data collected from a Picture Description task were used for analysis, given that this task yielded a relatively larger volume of speech output. For each participant, a standard set of 65 acoustic features was extracted using openSMILE, an open-source toolkit for processing dVoice recordings. After adjusting for age and gender, the associations between each pair of acoustic features were first examined, and then exploratory factor analysis (EFA) was performed to identify the underlying structure (i.e., latent variables) of the dVoice data. The top four factors in each cohort were selected, and acoustic features with a factor loading above .80 were examined for each factor. EFA was conducted for the FHS and AGELESS cohorts separately, and the results were qualitatively compared.
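The analysis pipeline described above (covariate adjustment, pairwise feature correlations, factor extraction with an eigenvalue-above-1 retention rule, and a .80 loading threshold) can be sketched in a few steps. This is a minimal illustration on simulated data, not the authors' actual code: the toy dataset, its dimensions, and the principal-axis-style extraction via eigendecomposition of the correlation matrix are all assumptions made for demonstration; the study itself used 65 openSMILE features per participant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the dVoice data: n participants x p acoustic features,
# driven by two hypothetical latent factors plus age/sex effects.
n, p = 300, 8
latent = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, p))
age = rng.normal(63, 10, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
X = latent @ mixing + 0.3 * rng.normal(size=(n, p))
X += 0.02 * age[:, None] + 0.5 * sex[:, None]

# 1) Adjust each feature for age and sex by taking regression residuals.
Z = np.column_stack([np.ones(n), age, sex])
beta, *_ = np.linalg.lstsq(Z, X, rcond=None)
resid = X - Z @ beta

# 2) Pairwise correlations between the adjusted features
#    (the quantity visualized as a heatmap in the study).
R = np.corrcoef(resid, rowvar=False)

# 3) Factor extraction: eigendecomposition of the correlation matrix,
#    retaining factors with eigenvalue > 1 (Kaiser criterion).
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
n_factors = int(np.sum(eigvals > 1.0))

# 4) Loadings = eigenvectors scaled by sqrt(eigenvalue); flag features
#    loading above the .80 threshold on each retained factor.
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
high_loading = {f: np.where(np.abs(loadings[:, f]) > 0.80)[0].tolist()
                for f in range(n_factors)}
print(n_factors, high_loading)
```

In the study this procedure was run separately on the FHS and AGELESS cohorts, and the sets of high-loading features for the top factors were then compared qualitatively rather than numerically.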

Results:

Heatmaps of the correlations between pairs of acoustic features looked similar overall across the two cohorts, with correlations ranging from -0.944 to 0.992 (FHS) and -0.981 to 0.997 (AGELESS). EFA revealed ten and eleven factors with eigenvalues above 1 for the US-based and Malaysian cohorts, respectively. High-loading features (>.80) were a combination of prosodic and spectral features for Factor 1, spectral features for Factor 2, another combination of prosodic and spectral features for Factor 3, and sound-quality features for Factor 4. Qualitative comparison revealed that the high-loading features for the top four factors overlapped across the two cohorts.

Conclusions:

Findings revealed that, when dVoice data were obtained from a similar test across different spoken languages, there was a high degree of similarity in the correlation patterns and factor structures of the acoustic features. This suggests that acoustic features may be harmonized across languages and used as a common measure of cognition.