Enterprise AI Analysis: Ophthalmology
Benchmark analysis of myopia-related issues using large language models: a comparison of ChatGPT-4o and DeepSeek
This study evaluated the accuracy and comprehensiveness of ChatGPT-4o and DeepSeek in answering 30 common myopia-related questions across six clinical domains. DeepSeek significantly outperformed ChatGPT-4o in overall accuracy (76.7% vs 43.3% 'Good' ratings). Both models gave comprehensive answers when accurate, but performance declined for treatment-related queries, especially those about commercial products and region-specific information. The study also reported poor inter-rater agreement and non-normal score distributions. The findings suggest that localized LLMs like DeepSeek may offer competitive advantages, and they underline the need for ongoing refinement, data updates, and domain-specific fine-tuning before AI can be relied on in clinical communication.
Key Insights at a Glance
Our analysis reveals critical performance differences and key areas for AI application in healthcare.
Deep Analysis & Enterprise Applications
Overall Accuracy Distribution (Table 1)
DeepSeek significantly outperformed ChatGPT-4o in overall accuracy, achieving a 'Good' rating for 76.7% of responses (23 of 30) compared to ChatGPT-4o's 43.3% (13 of 30).
| AI Model | Good, n (%) | Fair, n (%) | Poor, n (%) |
|---|---|---|---|
| ChatGPT-4o | 13 (43.3) | 12 (40.0) | 5 (16.7) |
| DeepSeek | 23 (76.7) | 5 (16.7) | 2 (6.6) |
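The 'Good' versus not-'Good' split in Table 1 can be checked with a quick significance test. This summary does not say which test the study used, so the sketch below is only one plausible check: a two-sided Fisher exact test on the collapsed 2×2 table, written in plain Python. The names `fisher_exact_two_sided` and `good_vs_rest` are illustrative, and the test treats the two sets of 30 ratings as independent samples, which ignores the fact that both models answered the same questions.

```python
from math import comb

# Counts from Table 1: (Good, Fair, Poor) out of 30 questions per model.
TABLE_1 = {
    "ChatGPT-4o": (13, 12, 5),
    "DeepSeek": (23, 5, 2),
}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


def good_vs_rest(model):
    """Collapse the three rating levels into (Good, Fair + Poor)."""
    good, fair, poor = TABLE_1[model]
    return good, fair + poor


a, b = good_vs_rest("ChatGPT-4o")
c, d = good_vs_rest("DeepSeek")
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"Good vs not-Good, two-sided Fisher exact: p = {p_value:.4f}")
```

Under these assumptions the p-value falls below 0.05, which is consistent with the "significantly outperformed" wording. A paired test on per-question ratings would be a closer match to the study design, but the per-question data is not given here.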
Accuracy Across Myopia-Related Domains (Table 2)
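Both models handled foundational domains such as pathogenesis, diagnosis, and prognosis well, but accuracy dropped on treatment-related queries, particularly those about specific products. The drop was far steeper for ChatGPT-4o, which earned a 'Good' rating on only 1 of 12 treatment questions, compared with 9 of 12 for DeepSeek. Both of DeepSeek's 'Poor' ratings and three of ChatGPT-4o's five fell in the treatment domain.
| Parameter | ChatGPT-4o (Poor/Fair/Good) | DeepSeek (Poor/Fair/Good) |
|---|---|---|
| Pathogenesis | 0/1/3 | 0/1/3 |
| Clinical feature | 0/1/0 | 0/0/1 |
| Diagnosis | 1/0/4 | 0/0/5 |
| Prevention | 1/2/3 | 0/3/3 |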
| Treatment | 3/8/1 | 2/1/9 |
| Prognosis | 0/0/2 | 0/0/2 |
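The Poor/Fair/Good counts in Table 2 are easier to compare as per-domain shares. The following is a minimal sketch that parses each cell and reports the fraction of 'Good' ratings per domain. The helper names (`parse_counts`, `good_share`, `model_total`) are made up for illustration; the counts are copied from the table.

```python
# Counts from Table 2, written as Poor/Fair/Good per domain.
TABLE_2 = {
    "Pathogenesis": {"ChatGPT-4o": "0/1/3", "DeepSeek": "0/1/3"},
    "Clinical feature": {"ChatGPT-4o": "0/1/0", "DeepSeek": "0/0/1"},
    "Diagnosis": {"ChatGPT-4o": "1/0/4", "DeepSeek": "0/0/5"},
    "Prevention": {"ChatGPT-4o": "1/2/3", "DeepSeek": "0/3/3"},
    "Treatment": {"ChatGPT-4o": "3/8/1", "DeepSeek": "2/1/9"},
    "Prognosis": {"ChatGPT-4o": "0/0/2", "DeepSeek": "0/0/2"},
}


def parse_counts(cell):
    """Turn a 'Poor/Fair/Good' cell such as '3/8/1' into a dict of ints."""
    poor, fair, good = (int(part) for part in cell.split("/"))
    return {"poor": poor, "fair": fair, "good": good}


def good_share(cell):
    """Fraction of questions in this cell that were rated 'Good'."""
    counts = parse_counts(cell)
    total = sum(counts.values())
    return counts["good"] / total if total else 0.0


def model_total(model):
    """Total number of rated questions for one model across all domains."""
    return sum(sum(parse_counts(row[model]).values()) for row in TABLE_2.values())


for domain, row in TABLE_2.items():
    shares = ", ".join(f"{model}: {good_share(cell):.0%}" for model, cell in row.items())
    print(f"{domain:<17}{shares}")
```

Checking that `model_total` returns 30 for each model is a quick way to confirm the table was transcribed correctly.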
Comprehensiveness of 'Good' Responses (Table 4)
When responses were rated as 'Good' in accuracy, both models generally provided sufficiently detailed and comprehensive information to inform lay audiences. DeepSeek achieved slightly higher mean comprehensiveness scores overall.
Questions with Low Accuracy in Treatment Domain (Table 3)
Both AI models struggled to provide up-to-date, region-specific information about commercial myopia control products and specific atropine concentrations, which led to 'Fair' or 'Poor' ratings.
| Question | ChatGPT-4o Rating | DeepSeek Rating |
|---|---|---|
| Are there special types of glasses that can control myopia? | Fair | Fair |
| What is the role of defocus spectacle lenses in controlling myopia? What are the major existing brands? | Poor | Poor |
| What is the function of phototherapy devices in controlling myopia? Are they harmful to the eyes? | Fair | Poor |
Enterprise Process Flow (Study Design)
Our rigorous methodology ensured an independent and expert-driven evaluation of chatbot performance against real-world clinical concerns.
DeepSeek's Competitive Edge: The Power of Localized AI
DeepSeek, a Chinese-developed LLM, demonstrated superior overall accuracy compared to ChatGPT-4o. This highlights the growing capability of domestically trained models to deliver reliable health information, which is particularly relevant in regions like East Asia where localized data and linguistic nuances can be crucial for performance. Its strong showing suggests that regional AI development can offer competitive advantages in specialized domains like ophthalmology.
Quote: "DeepSeek's stronger performance suggests that localized LLMs may offer competitive advantages."
Source: Study Conclusion
Your AI Implementation Roadmap
A strategic phased approach to integrating large language models effectively and responsibly into your healthcare operations.
Phase 1: Initial LLM Deployment & Patient Education
Integrate LLMs for basic patient query handling, health information dissemination, and awareness campaigns for conditions like myopia.
Phase 2: Continuous Data Updates & Knowledge Base Refinement
Establish mechanisms for regular updates to LLM training data, ensuring the inclusion of the latest clinical guidelines and emerging treatments.
Phase 3: Domain-Specific Fine-tuning & Localization
Tailor LLMs to specific medical specialties (e.g., ophthalmology) and regional contexts, addressing local treatment protocols and cultural nuances.
Phase 4: Rigorous Quality Control & Performance Benchmarking
Implement continuous evaluation frameworks, including expert review and patient feedback, to monitor accuracy, comprehensiveness, and safety.
Phase 5: Integration with Clinical Workflows & Feedback Loop
Seamlessly embed AI chatbots into existing healthcare platforms, enabling clinicians to provide feedback for ongoing model improvement and adaptation.
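For the expert review in Phase 4, one common way to monitor rater consistency is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The study reported poor inter-rater agreement, but this summary does not state which statistic or how many raters it used, so the sketch below is an assumption for illustration only. The ratings are hypothetical, not study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    n = len(ratings_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of each rater's label frequencies, summed.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for four chatbot answers (not taken from the study).
rater_1 = ["Good", "Good", "Fair", "Poor"]
rater_2 = ["Good", "Fair", "Fair", "Poor"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

A value near 1 indicates strong agreement, and a value near 0 indicates agreement no better than chance. With more than two raters, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.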