INS NYC 2024 Program

Poster

Poster Session 04 Program Schedule

02/15/2024
12:00 pm - 01:15 pm
Room: Majestic Complex (Posters 61-120)

Poster Session 04: Neuroimaging | Neurostimulation/Neuromodulation | Teleneuropsychology/Technology


Final Abstract #67

AI Chat Generators: Is There a Threat to Test Security?

Shannon Lavigne, UT Health San Antonio, San Antonio, United States
Jeremy Davis, UT Health San Antonio, San Antonio, United States

Category: Forensic Neuropsychology/Malingering/Noncredible Presentations

Keyword 1: performance validity
Keyword 2: technology

Objective:

With technological advancements, our field has faced numerous challenges and threats to test security. The recent shift to a telehealth model in light of the COVID-19 pandemic contributed to AACN issuing a statement on the importance of neuropsychological and psychological test security, along with specific recommendations (Boone et al., 2022). Technology continues to advance, and artificial intelligence (AI) chat generators that are readily available to the public have surged in popularity. This study sought to determine what information AI chat generators could provide about neuropsychological assessment, and specifically about performance validity tests (PVTs), that could potentially threaten the utility of PVTs.

Participants and Methods:

Two AI chat generators (ChatGPT and Google’s Bard) were asked similar questions related to neuropsychological assessment and PVTs. Two approaches were used in framing questions. The first attempted to gather information about neuropsychological evaluations broadly (e.g., what to expect from a disability neuropsychology evaluation). The second used emotionally charged words (e.g., fake, fool, trick, lie, cheat) and asked how one should perform on the neuropsychological assessment. The wording of the initial questions was identical. Follow-up questions were similar in content but used the wording produced by the AI generator in its initial response. Themes of these follow-up questions included effort and requests for more information about the tests described in the answers. Examples of follow-up questions include “how do they know I’m trying my best,” “what are good scores on these tests,” and “tell me about WMT [Word Memory Test],” among others. The same line of questioning was used for both AI chat generators.

Results:

ChatGPT produced more accurate and thorough responses than Bard. When asked broad questions, with follow-ups based on the generated responses, answers included specific test names (both standalone and embedded PVTs), test format, descriptions of administration, and scoring (although largely nonspecific and inaccurate). When questions used emotionally valenced words, generated answers focused on the unethical nature of that approach to neuropsychological testing and discouraged any attempt to “malinger.”

Conclusions:

Currently, AI chat generators offer a mixture of accurate and inaccurate information regarding neuropsychological testing and performance validity measures. The language used to ask questions shapes what information the generator provides. As technology and AI chat generators continue to advance and undergo updates, it will be important for our field to determine how to approach this potential resource and to identify ways to mitigate any test security risk.