ISSN 2757-8135 | E-ISSN 2757-9816
Eur Eye Res. 2026; 6(1): 60-69 | DOI: 10.14744/eer.2025.93723

Use of large language models in Turkish information materials for glaucoma patient education: evaluation of readability, accuracy and comprehensiveness

Ali Dal1, Murat Erdag2, Betül Dikme1, Bünyamin Kutluksaman1
1Department of Ophthalmology, Tayfur Ata Sokmen Faculty of Medicine, Mustafa Kemal University, Hatay, Turkiye
2Department of Ophthalmology, Fırat Faculty of Medicine, Fırat University, Elazığ, Turkiye

PURPOSE: This study aims to evaluate the readability of the Turkish Ophthalmology Association’s (TOA) glaucoma patient education brochure and to assess the capabilities of GPT-4.0, Gemini, and DeepSeek in generating Turkish patient education materials with respect to readability, accuracy, and comprehensiveness.
METHODS: The TOA's patient education brochure on glaucoma was evaluated for readability using the Ateşman and Bezirci-Yilmaz formulas. The questions from the TOA brochure were presented independently to the GPT-4.0, Gemini, and DeepSeek models, and the responses generated by these models were tested for readability using the same formulas. In addition, qualified ophthalmologists evaluated the accuracy and comprehensiveness of the artificial intelligence (AI)-generated responses. AI-generated responses were then converted to Q1 and Q2 formats to test text simplification. These versions were reevaluated for readability, accuracy, and comprehensiveness to determine whether simplification increased intelligibility without affecting medical accuracy.
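The Ateşman formula applied above adapts the Flesch Reading Ease index to Turkish. As a minimal sketch (not the authors' analysis code), the published formula is 198.825 − 40.175 × (syllables/words) − 2.610 × (words/sentences); the script below assumes simple punctuation-based sentence splitting and uses the fact that every Turkish syllable contains exactly one vowel:

```python
import re

# In Turkish, each syllable contains exactly one vowel,
# so counting vowels counts syllables.
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")

def count_syllables(word: str) -> int:
    return sum(1 for ch in word if ch in TURKISH_VOWELS)

def atesman_score(text: str) -> float:
    """Ateşman (1997) readability score for Turkish text.

    Higher scores mean easier text (scale roughly 0-100).
    Sentence splitting here is a naive regex; real analyses
    would use a proper Turkish tokenizer.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (198.825
            - 40.175 * (syllables / len(words))
            - 2.610 * (len(words) / len(sentences)))
```

Running `atesman_score` on a short sentence such as "Bu bir test." (3 words, 3 syllables, 1 sentence) yields 198.825 − 40.175 − 7.830 ≈ 150.82, i.e. far above the 0–100 band because the toy input is unrealistically short.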
RESULTS: The TOA brochure had a higher readability level than the recommended patient education standard. Bezirci-Yilmaz scores showed that Gemini and DeepSeek had significantly lower readability than the TOA brochure (p=0.007 and p=0.033, respectively), whereas GPT-4.0 showed no significant difference (p=0.077). Ateşman scores indicated no significant difference between TOA and AI-generated texts. Gemini showed significantly higher comprehensiveness than GPT-4.0 (p=0.042), whereas accuracy scores did not differ significantly among the models. Readability improved for Gemini following simplification on both formulas (p=0.013 and p=0.005), whereas GPT-4.0 and DeepSeek remained unchanged. After simplification, the comprehensiveness score decreased for Gemini, whereas GPT-4.0 and DeepSeek maintained their comprehensiveness.
CONCLUSION: While large language models hold promise for use as glaucoma patient information materials, it is essential to rigorously evaluate the accuracy and comprehensiveness of the content they produce.

Keywords: Glaucoma, large language models, readability.


Corresponding Author: Ali Dal, Türkiye
Manuscript Language: English