Enterprise AI Analysis
Diagnostic accuracy of generative large language artificial intelligence models for the assessment of dental crowding
Background: Generative artificial intelligence (AI) models have shown potential for addressing text-based dental enquiries and answering exam questions. However, their role in diagnosis and treatment planning has not been thoroughly investigated. This study aimed to investigate the reliability of different generative AI models in classifying the severity of dental crowding.
Methods: Two experienced orthodontists categorized the severity of dental crowding in 120 intraoral occlusal images as mild, moderate, or severe (40 images per category). These images were then uploaded to three generative AI models (ChatGPT-4o mini, Microsoft Copilot, and Claude 3.5 Sonnet), which were prompted to identify the dental arch and to assess the severity of dental crowding. Response times were recorded, and outputs were compared to the orthodontists' assessments. A random image subset was re-analyzed after one week to evaluate model consistency.
Results: Claude 3.5 Sonnet correctly classified the severity of dental crowding in 50% of the images, followed by ChatGPT-4o mini (44%) and Copilot (34%). Visual recognition of the dental arches was higher with Claude and ChatGPT-4o mini (99%) than with Copilot (72%). Response generation was significantly longer for ChatGPT-4o mini than for Claude and Copilot (p < .0001), while response times were comparable between Claude and Copilot (p = .98). Repeated analyses showed improved image classification for both ChatGPT-4o mini and Copilot, while Claude 3.5 Sonnet misclassified a significant portion of the images.
Conclusions: The performance of ChatGPT-4o mini, Microsoft Copilot, and Claude 3.5 Sonnet in analyzing the severity of dental crowding often did not match the evaluations made by orthodontists. Further developments in the image processing algorithms of commercially available generative AI models are required before they can be relied on for dental crowding classification.
Keywords: ChatGPT, Microsoft, Claude, AI, Large language models
Executive Impact: Key Performance Indicators
Understand the critical metrics of AI's current capabilities in dental diagnostics and what it means for your practice.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Context of Dental Crowding & AI in Orthodontics
Quantifying dental crowding is crucial for orthodontic treatment planning, yet traditional methods are often subjective and time-consuming. Artificial intelligence (AI), particularly deep learning (DL) and large language models (LLMs), offers new opportunities. While AI has been extensively studied in dental radiology and facial esthetics, its application to visual diagnostic tasks, such as assessing dental crowding from intraoral images, remains largely unexplored. This study aimed to fill that gap by evaluating the diagnostic accuracy of generative AI models for dental crowding classification.
Study Design and AI Model Evaluation
This comparative study evaluated ChatGPT-4o mini, Microsoft Copilot, and Claude 3.5 Sonnet. Two experienced orthodontists classified 120 intraoral occlusal images (40 mild, 40 moderate, 40 severe crowding). Images were randomized and uploaded to the AI models with a standardized prompt to classify crowding and identify the dental arch. Response times were recorded, and a subset of 30 images was re-analyzed after one week to assess consistency. Statistical analysis included Cohen's kappa for inter-rater agreement and confusion matrices for AI performance.
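Cohen's kappa, used here both for the orthodontists' inter-rater agreement and for the one-week repeatability analysis, corrects raw percent agreement for the agreement expected by chance from each rater's label frequencies. A minimal stdlib-only Python sketch of the statistic (the function name and example labels are illustrative, not taken from the study's data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from the raters' marginal frequencies.
    Assumes the raters do not agree purely by chance on every item
    (p_e < 1), otherwise the denominator is zero.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal label frequencies, summed.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, two raters who agree on 3 of 4 items with chance agreement of 0.5 yield kappa = 0.5; values near 0.87, as reported for the orthodontists, indicate near-perfect agreement on common interpretation scales.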
Enterprise Process Flow
AI Model Performance & Repeatability
Orthodontist inter-rater agreement was substantial (kappa = 0.87). For crowding classification, Claude 3.5 Sonnet had the highest accuracy at 50%, followed by ChatGPT-4o mini (44%) and Copilot (34%). Arch recognition was significantly better for Claude and ChatGPT-4o mini (both 99%) than for Copilot (72%). ChatGPT-4o mini had the longest response time (~11.9s) but showed the best consistency in the repeatability analysis (87% agreement, kappa = 0.74), compared to Copilot (73%, kappa = 0.41) and Claude (43%, kappa = 0.1).
| Feature | ChatGPT-4o mini | Microsoft Copilot | Claude 3.5 Sonnet |
|---|---|---|---|
| Overall Crowding Accuracy | 44.2% | 34.2% | 50.0% |
| Mild Crowding Sensitivity | 67.5% | 22.5% | 55.0% |
| Mild Crowding Specificity | 58.8% | 86.3% | 77.5% |
| Arch Recognition Accuracy | 99% | 72% | 99% |
| Average Response Time | ~11.9s | ~4.0s | ~3.9s |
| Crowding Classification Repeatability | 87% | 73% | 43% |
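The per-class sensitivity and specificity figures above come from one-vs-rest readings of a three-class confusion matrix: for a given class, sensitivity is the share of its true cases the model caught, and specificity is the share of other-class cases it correctly left out. A minimal Python sketch of that computation (the helper name and sample labels are illustrative, not the study's data):

```python
def class_metrics(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # hits
    fn = sum(t == positive and p != positive for t, p in pairs)  # misses
    tn = sum(t != positive and p != positive for t, p in pairs)  # correct rejections
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative classes
    return sensitivity, specificity
```

Applied per class (mild, moderate, severe) against the orthodontists' labels, this yields one sensitivity/specificity pair per model per class, as tabulated for mild crowding above.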
Implications and Limitations of AI in Dental Crowding
The study's findings suggest that current commercially available LLMs are suboptimal for visual crowding assessment, performing well below expert orthodontists. While promising for arch identification, their varied performance in crowding classification and poor repeatability highlight the need for further development. These models are not yet ready for autonomous diagnostic decisions, but could assist in low-risk scenarios or early triage. Integrating natural language processing with computer vision remains complex, and future improvements will rely on domain-specific training data and refined prompting strategies.
Future Directions for AI in Orthodontics
ChatGPT-4o mini, Microsoft Copilot, and Claude 3.5 Sonnet showed limited and inconsistent accuracy in classifying dental crowding from intraoral occlusal images. While arch identification was somewhat better, their grading of crowding severity often misaligned with experienced orthodontists' judgment. Further developments in image processing algorithms and domain-specific training are required before reliable clinical use in dental crowding classification.
Key Takeaway: AI Still Needs Refinement for Accurate Crowding Diagnosis
The study demonstrated that commercially available generative AI models (ChatGPT-4o mini, Microsoft Copilot, and Claude 3.5 Sonnet) currently lack the diagnostic accuracy and consistency required for reliable classification of dental crowding from intraoral images. While they show potential in basic visual recognition tasks such as arch identification, their performance in grading the severity of crowding often falls short of expert orthodontists' judgment. This highlights a critical need for further advancements in image processing algorithms and domain-specific training before these AI models can be dependably integrated into orthodontic diagnostic workflows.
Key Learnings for Enterprise AI Adoption:
- Commercial LLMs are not yet clinically reliable for complex visual dental diagnoses.
- Significant development in AI image processing and specialized training data is essential.
- AI models currently serve better as supportive tools rather than autonomous diagnostic systems in orthodontics.
Quantify Your Potential AI Impact
Use our interactive calculator to estimate the efficiency gains and cost savings AI can bring to your specific enterprise operations.
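As a rough illustration of what such a calculator computes, a back-of-envelope estimate can multiply case volume, time saved per case, and staff cost. Every parameter and the formula below are hypothetical assumptions for illustration, not figures from the study:

```python
def estimated_annual_savings(cases_per_year, minutes_saved_per_case,
                             automation_fraction, hourly_cost):
    """Hypothetical savings estimate: hours saved times hourly cost.

    All inputs are illustrative assumptions supplied by the user,
    not measurements from the dental crowding study.
    """
    hours_saved = cases_per_year * minutes_saved_per_case * automation_fraction / 60
    return hours_saved * hourly_cost

# Example: 1,200 cases/year, 10 minutes each, 30% automatable, $60/hour staff cost.
savings = estimated_annual_savings(1200, 10, 0.3, 60)
```

Estimates of this kind are only as good as their inputs; the study's accuracy results suggest the automation fraction for visual diagnostic tasks should currently be set conservatively.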
Your AI Implementation Roadmap
A structured approach to integrating AI into your enterprise, ensuring maximum impact and minimal disruption.
Phase 1: Discovery & Strategy
Comprehensive assessment of current workflows, identification of AI opportunities, and development of a tailored AI strategy aligned with your business objectives.
Phase 2: Pilot & Validation
Deployment of AI solutions in a controlled environment, rigorous testing, and validation of performance against defined KPIs. Iterative refinement based on feedback.
Phase 3: Integration & Scaling
Seamless integration of validated AI solutions into your existing enterprise systems and scaling across relevant departments. Training and support for your teams.
Phase 4: Optimization & Future-Proofing
Continuous monitoring, performance optimization, and exploration of advanced AI capabilities to ensure sustained competitive advantage and long-term value.
Ready to Transform Your Enterprise with AI?
Connect with our AI specialists to discuss a bespoke strategy that drives efficiency and innovation in your organization.