PURPOSE: This study aims to evaluate the readability of the Turkish Ophthalmology Association’s (TOA) glaucoma patient education brochure and to assess the capabilities of GPT-4.0, Gemini, and DeepSeek in generating Turkish patient education materials with respect to readability, accuracy, and comprehensiveness.
METHODS: The TOA’s patient education brochure on glaucoma was evaluated for readability using the Ateşman and Bezirci-Yilmaz formulae. The questions from the TOA brochure were presented independently to the GPT-4.0, Gemini, and DeepSeek models, and the responses generated by these models were tested for readability using the same formulae. In addition, qualified ophthalmologists evaluated the accuracy and comprehensiveness of the artificial intelligence (AI)-generated responses. To test text simplification, the AI-generated responses were converted to Q1 and Q2 formats. These versions were reevaluated for readability, accuracy, and comprehensiveness to determine whether simplification improved intelligibility without compromising medical accuracy.
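As an illustration of the readability scoring named above, the following is a minimal sketch of the Ateşman formula as it is commonly stated: 198.825 − 40.175 × (syllables per word) − 2.610 × (words per sentence), with higher scores indicating easier text. The function names, the sentence splitting on terminal punctuation, and the vowel-counting syllable rule are assumptions made for this example; the abstract does not describe the tooling actually used in the study.

```python
import re

# Assumption: each Turkish syllable contains exactly one vowel, so counting
# vowels approximates the syllable count. Circumflexed vowels are included.
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def count_syllables(word: str) -> int:
    """Approximate the syllable count of a Turkish word by counting vowels."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def atesman_score(text: str) -> float:
    """Compute the Ateşman readability score (higher means easier to read)."""
    # Assumption: sentences end at '.', '!' or '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters; digits and punctuation are ignored.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    avg_syllables_per_word = syllables / len(words)
    avg_words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * avg_syllables_per_word - 2.610 * avg_words_per_sentence


if __name__ == "__main__":
    sample = "Glokom göz hastalığıdır. Erken tanı önemlidir."
    print(round(atesman_score(sample), 2))
```

The same function could be applied unchanged to the brochure text and to each model's response, so that all texts are scored under identical tokenization rules.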
RESULTS: The TOA brochure had a higher readability level than the recommended patient education standard. Bezirci-Yilmaz scores showed that Gemini and DeepSeek had significantly lower readability than the TOA brochure (p=0.007 and p=0.033, respectively), whereas GPT-4.0 showed no significant difference (p=0.077). Ateşman scores indicated no significant difference between the TOA brochure and the AI-generated texts. Gemini showed significantly higher comprehensiveness than GPT-4.0 (p=0.042), whereas accuracy scores did not differ significantly among the models. Readability improved for Gemini following simplification (p=0.013 and p=0.005, respectively), whereas it remained unchanged for GPT-4.0 and DeepSeek. After simplification, the comprehensiveness score decreased for Gemini, whereas GPT-4.0 and DeepSeek maintained their comprehensiveness.
CONCLUSION: Although large language models hold promise for generating glaucoma patient information materials, the accuracy and comprehensiveness of the content they produce must be rigorously evaluated.
Keywords: Glaucoma, large language models, readability.