Enterprise AI Analysis: Benchmark analysis of myopia-related issues using large language models: a comparison of ChatGPT-4o and DeepSeek

Enterprise AI Analysis: Ophthalmology

This study evaluated the accuracy and comprehensiveness of ChatGPT-4o and DeepSeek in answering 30 common myopia-related questions across six clinical domains. DeepSeek significantly outperformed ChatGPT-4o in overall accuracy (76.7% vs 43.3% 'Good' ratings). Both models gave comprehensive answers when they were accurate, but performance declined for treatment-related queries, especially those concerning commercial products and region-specific information. The study also reported poor inter-rater agreement and non-normal score distributions. The findings suggest that localized LLMs like DeepSeek may offer competitive advantages, and they underline the need for ongoing refinement, data updates, and domain-specific fine-tuning before AI can be relied on in clinical communication.

Key Insights at a Glance

Our analysis reveals critical performance differences and key areas for AI application in healthcare.

DeepSeek Accuracy 'Good': 76.7%
ChatGPT-4o Accuracy 'Good': 43.3%

Deep Analysis & Enterprise Applications

Overall Accuracy Distribution (Table 1)

DeepSeek significantly outperformed ChatGPT-4o in overall accuracy, achieving a 'Good' rating for 76.7% of responses (23 of 30) compared to ChatGPT-4o's 43.3% (13 of 30).

AI Model | Good, n (%) | Fair, n (%) | Poor, n (%)
ChatGPT-4o | 13 (43.3) | 12 (40.0) | 5 (16.7)
DeepSeek | 23 (76.7) | 5 (16.7) | 2 (6.6)
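
The comparison above can be sketched in Python as a Pearson chi-square test on the Table 1 counts. This is a minimal illustration, assuming a plain 2x3 test with no continuity correction, which may differ from the exact procedure the study used. The function name `pearson_chi_square` is illustrative. With two degrees of freedom the p-value has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Counts from Table 1, ordered Good, Fair, Poor.
table1 = {
    "ChatGPT-4o": [13, 12, 5],
    "DeepSeek": [23, 5, 2],
}


def pearson_chi_square(rows):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count under independence of model and rating.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


chi2, dof = pearson_chi_square(list(table1.values()))
# For dof == 2 the chi-square survival function is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2) if dof == 2 else float("nan")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```

On these counts the sketch gives a statistic of about 6.95 and a p-value of about 0.031. The expected count in the 'Poor' column is 3.5 per model, below the usual rule of thumb of 5, so an exact test may be a safer choice for a table this small.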

Accuracy Across Myopia-Related Domains (Table 2)

While both models showed strengths in foundational knowledge domains, performance consistently declined in treatment-related queries, particularly regarding specific products.

Domain | ChatGPT-4o (Poor/Fair/Good) | DeepSeek (Poor/Fair/Good)
Pathogenesis | 0/1/3 | 0/1/3
Clinical feature | 0/1/0 | 0/0/1
Diagnosis | 1/0/4 | 0/0/5
Prevention | 1/2/3 | 0/3/3
Treatment | 3/8/1 | 2/1/9
Prognosis | 0/0/2 | 0/0/2
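
As a consistency check, the per-domain counts in Table 2 can be summed and compared with the overall totals in Table 1. The Python sketch below does that and prints the share of 'Good' ratings per domain. It assumes the Treatment row reads 3/8/1 for ChatGPT-4o and 2/1/9 for DeepSeek; the dictionary layout and names are illustrative, not taken from the study.

```python
# Table 2 counts as (Poor, Fair, Good) per domain: (ChatGPT-4o, DeepSeek).
table2 = {
    "Pathogenesis": ((0, 1, 3), (0, 1, 3)),
    "Clinical feature": ((0, 1, 0), (0, 0, 1)),
    "Diagnosis": ((1, 0, 4), (0, 0, 5)),
    "Prevention": ((1, 2, 3), (0, 3, 3)),
    "Treatment": ((3, 8, 1), (2, 1, 9)),  # assumed reading of the Treatment row
    "Prognosis": ((0, 0, 2), (0, 0, 2)),
}


def column_totals(model_index):
    """Sum Poor, Fair, and Good counts over all domains for one model."""
    rows = [counts[model_index] for counts in table2.values()]
    return tuple(sum(col) for col in zip(*rows))


chatgpt_totals = column_totals(0)
deepseek_totals = column_totals(1)
print("ChatGPT-4o (Poor, Fair, Good):", chatgpt_totals)
print("DeepSeek   (Poor, Fair, Good):", deepseek_totals)

for domain, (chatgpt, deepseek) in table2.items():
    n = sum(chatgpt)  # number of questions in this domain
    print(
        f"{domain:<16} n={n:>2}  Good: "
        f"ChatGPT-4o {chatgpt[2] / n:.0%}, DeepSeek {deepseek[2] / n:.0%}"
    )
```

Under that reading the domain counts add up to 30 questions and reproduce the Table 1 totals for both models, and the Treatment domain (12 questions) accounts for most of the gap between them.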

Comprehensiveness of 'Good' Responses (Table 4)

When responses were rated as 'Good' in accuracy, both models generally provided sufficiently detailed and comprehensive information to inform lay audiences. DeepSeek achieved slightly higher mean comprehensiveness scores overall.

Questions with Low Accuracy in Treatment Domain (Table 3)

Both AI models struggled with up-to-date, region-specific information regarding commercial myopia control products and specific atropine concentrations, leading to 'Fair' or 'Poor' ratings.

Question | ChatGPT-4o Rating | DeepSeek Rating
Are there special types of glasses that can control myopia? | Fair | Fair
What is the role of defocus spectacle lenses in controlling myopia? What are the major existing brands? | Poor | Poor
What is the function of phototherapy devices in controlling myopia? Are they harmful to the eyes? | Fair | Poor

Enterprise Process Flow (Study Design)

Our rigorous methodology ensured an independent and expert-driven evaluation of chatbot performance against real-world clinical concerns.

Compile 30 myopia questions (6 domains)
Submit questions to ChatGPT-4o & DeepSeek
3 senior pediatric ophthalmologists independently rate responses
Assess inter-rater reliability (Fleiss' Kappa)
Perform statistical comparisons (Chi-square test)
Analyze accuracy & comprehensiveness
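
The inter-rater reliability step in the flow above can be sketched in Python with the standard Fleiss' kappa formula. The ratings in `example` are hypothetical and exist only to show the input shape (one list of three labels per question); they are not data from the study, and the function name and signature are illustrative.

```python
import math

CATEGORIES = ("Poor", "Fair", "Good")


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for items that were each rated by the same number of raters.

    ratings: list of items, each a list of category labels (one per rater).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of raters")

    # counts[i][j] = number of raters who put item i in category j
    counts = [[item.count(c) for c in categories] for item in ratings]

    # Observed agreement: mean share of agreeing rater pairs per item.
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(s * s for s in shares)

    if math.isclose(p_expected, 1.0):
        return math.nan  # every rating is the same category; kappa is undefined
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings from three raters for four questions.
example = [
    ["Good", "Good", "Good"],
    ["Good", "Fair", "Good"],
    ["Poor", "Fair", "Fair"],
    ["Fair", "Good", "Poor"],
]
kappa = fleiss_kappa(example)
print(f"Fleiss' kappa = {kappa:.3f}")
```

Values near 1 indicate strong agreement, while values near 0 mean the raters agree about as often as chance would predict, which is the range a finding of poor agreement refers to.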

DeepSeek's Competitive Edge: The Power of Localized AI

DeepSeek, a Chinese-developed LLM, demonstrated superior overall accuracy compared to ChatGPT-4o. This highlights the growing capability of domestically trained models to deliver reliable health information, which is particularly relevant in regions like East Asia where localized data and linguistic nuances can be crucial for performance. Its strong showing suggests that regional AI development can offer competitive advantages in specialized domains like ophthalmology.

Quote: "DeepSeek's stronger performance suggests that localized LLMs may offer competitive advantages."

Source: Study Conclusion

Your AI Implementation Roadmap

A strategic phased approach to integrating large language models effectively and responsibly into your healthcare operations.

Phase 1: Initial LLM Deployment & Patient Education

Integrate LLMs for basic patient query handling, health information dissemination, and awareness campaigns for conditions like myopia.

Phase 2: Continuous Data Updates & Knowledge Base Refinement

Establish mechanisms for regular updates to LLM training data, ensuring the inclusion of the latest clinical guidelines and emerging treatments.

Phase 3: Domain-Specific Fine-tuning & Localization

Tailor LLMs to specific medical specialties (e.g., ophthalmology) and regional contexts, addressing local treatment protocols and cultural nuances.

Phase 4: Rigorous Quality Control & Performance Benchmarking

Implement continuous evaluation frameworks, including expert review and patient feedback, to monitor accuracy, comprehensiveness, and safety.

Phase 5: Integration with Clinical Workflows & Feedback Loop

Seamlessly embed AI chatbots into existing healthcare platforms, enabling clinicians to provide feedback for ongoing model improvement and adaptation.

Ready to Transform Your Operations with AI?

Leverage cutting-edge AI to enhance patient engagement, improve information accuracy, and drive better health outcomes.
