Poster Session 04 Program Schedule
02/15/2024
12:00 pm - 01:15 pm
Room: Majestic Complex (Posters 61-120)
Poster Session 04: Neuroimaging | Neurostimulation/Neuromodulation | Teleneuropsychology/Technology
Final Abstract #68
Does Generative Artificial Intelligence Pose a Threat to the Test of Memory Malingering?
Shannon Lavigne, UT Health San Antonio, San Antonio, United States; Jeremy Davis, UT Health San Antonio, San Antonio, United States
Category: Teleneuropsychology/Technology
Keyword 1: technology
Keyword 2: forensic neuropsychology
Objective:
Numerous articles indicate the importance of maintaining test security, especially within forensic neuropsychology. With the growth of generative artificial intelligence (AI) chatbots, written content on a wide range of topics can be produced in natural language by typing questions and prompts into an online interface. Although AI-generated content can be error-prone, it remains unknown whether such content might divulge information that would compromise the security of psychological tests. Given the popularity and ease of use of these interfaces, it is important to understand the level of information that might be shared with patients and examinees. This study examined AI-generated responses regarding the Test of Memory Malingering (TOMM), the most commonly used performance validity test (PVT).
Participants and Methods:
Two AI chatbots (ChatGPT and Bard) were asked a series of questions about the TOMM. Questions were initially broad (e.g., “Tell me about the TOMM”) and became more focused based on the answers each chatbot provided (e.g., “What is a good score?”). Follow-up questions included asking about performance validity and requesting more information about the test based on the generated responses.
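For context, the questions above were typed into the public web interfaces of the two chatbots. Researchers wishing to replicate or extend this kind of structured, broad-to-focused query series at scale could script it against a chatbot API; the minimal Python sketch below illustrates the idea using the OpenAI chat completions client. The model name and prompts here are illustrative assumptions, not the study's materials.

# Minimal sketch of a scripted broad-to-focused question series.
# Assumes the openai Python package (v1.x) and an OPENAI_API_KEY in the
# environment; the model and prompts are hypothetical stand-ins.
from openai import OpenAI

client = OpenAI()

# Broad opener followed by focused follow-ups, mirroring the study design.
questions = [
    "Tell me about the TOMM.",
    "What is a good score?",
    "What does the TOMM indicate about performance validity?",
]

messages = []  # running transcript so each follow-up sees prior answers
for question in questions:
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; not reported in the abstract
        messages=messages,
    )
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(f"Q: {question}\nA: {answer}\n")

Keeping the full message history across turns is what lets later questions be conditioned on earlier answers, as the study's follow-up questions were.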
Results:
Both chatbots produced mixed content containing facts and errors about the TOMM. ChatGPT was more accurate than Bard overall. ChatGPT described the TOMM as a measure involving two learning trials of 50 drawings of common objects that the examinee is asked to remember, but it described the Retention trial as if it were the only trial with a recognition format: “50 target items (the original items from the learning trials) mixed with 50 new or distractor items.” ChatGPT accurately described scoring as the number of targets identified but cited an erroneous cutoff (≥40) as evidence that the examinee “performed well on the recognition trial.” Bard described the TOMM as a “forced-choice PVT that is designed to assess whether the individual is malingering memory impairment” but then stated that it “involves identifying faces that have been previously presented.” Bard described TOMM scoring as “based on the number of correct identifications” and indicated that “if the individual scores below a certain cutoff score, it is considered to be a sign of malingering.” When asked about cutoff scores, Bard identified a score of 24 or lower as a sign of “malingering.” Bard also offered incongruent information in response to follow-up questions, at one point indicating a series of 25 rather than 50 trials.
Conclusions:
AI chat generators produced discrepant information about the TOMM. ChatGPT provided a detailed description of the test and its scoring but cited an inaccurate cutoff. Bard generated substantial information about the TOMM, but most of it was inaccurate. That both chatbots produced some accurate information is concerning for test security. Whether the provision of inaccurate information offsets or counteracts the accurate information remains unknown. Future research might explore the impact of coaching on PVTs when the provided information involves a mix of fact and error. It is important for the field of neuropsychology to continue examining the impact of online sources of information readily available to patients and examinees.