Precision Oncology & AI
Real-world application of large language models for automated TNM staging using unstructured gynecologic oncology reports
Manual cancer registry data entry is time-consuming and error-prone. This study demonstrates that Large Language Models (LLMs) can accurately extract TNM classifications from unstructured gynecologic oncology reports using prompt engineering alone, outperforming manual entries. Both the cloud-based model (Gemini 1.5) and the top-performing local model (Qwen2.5 72B) achieved high accuracy for T, N, and M classifications, highlighting their potential to improve data integrity and streamline clinical workflows without complex fine-tuning or data anonymization.
Executive Impact at a Glance
Key performance indicators demonstrating the potential of LLM integration in clinical data management.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Manual data entry in cancer registries is time-consuming and prone to error, with reported inaccuracy rates ranging from 5% to 17%. The complexity of TNM classification criteria and their frequent updates exacerbate these errors, underscoring the need for reliable and efficient data registration methods.
Cloud-based LLM (Gemini 1.5) achieved 0.994 accuracy for pT and 0.993 for pN, outperforming manual entries. The top-performing local model (Qwen2.5 72B) also showed strong results with 0.971 accuracy for pT and 0.923 for pN. These models effectively extracted pathological T and N classifications from unstructured reports using prompt engineering alone.
Gemini 1.5 achieved 0.909 accuracy for clinical M (cM) classification, and Qwen2.5 72B achieved 0.895. While still robust, M-stage performance was lower than for the T and N stages, often because peritoneal dissemination and extra-regional lymph node metastases were misinterpreted as M1.
The study utilized out-of-the-box LLMs with prompt engineering on raw, unstructured medical records, bypassing complex fine-tuning or data anonymization. Implementation in secure cloud-based and local offline environments ensures data confidentiality and practical applicability in clinical workflows. Pydantic-constrained decoding significantly improved output consistency and accuracy.
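As a rough illustration of the prompt-only approach, the sketch below shows one way such an extraction instruction could be framed. The wording, fields, and guideline references of the study's actual prompts are not reproduced here; everything in this template is an assumption.

```python
# A hypothetical extraction prompt; the study's actual prompt wording,
# field definitions, and guideline references are not reproduced here.
EXTRACTION_PROMPT = """You are assisting a cancer registry.
Read the pathology and clinical report below and extract the TNM
classification according to the applicable staging guideline.

Return ONLY a JSON object with the keys "pT", "pN", and "cM".
Use null for any classification that cannot be determined from the report.
Do not add explanations or any text outside the JSON object.

Report:
{report_text}
"""

def build_prompt(report_text: str) -> str:
    """Fill the template with one raw, unstructured report."""
    return EXTRACTION_PROMPT.format(report_text=report_text)
```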
Chart: percentage of pT classification errors in cervical cancer.
Enterprise Process Flow: Automated TNM Staging Workflow
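A minimal sketch of how such a workflow could be wired together, assuming a hypothetical `call_llm` client; it is meant to show the flow (report in, validated structured staging out), not the study's implementation.

```python
import json
from typing import Optional

def call_llm(prompt: str) -> str:
    """Placeholder for the LLM client (hypothetical); in practice this would
    call Gemini 1.5 over a secure cloud API or a local Qwen2.5 72B endpoint."""
    return '{"pT": "pT1b", "pN": "pN0", "cM": "cM0"}'  # canned output for the sketch

def extract_staging(report_text: str) -> Optional[dict]:
    """Run one unstructured report through the automated staging workflow."""
    # 1. Build the extraction prompt from the raw, unstructured report.
    prompt = f"Extract pT, pN and cM as a JSON object from the following report:\n{report_text}"
    # 2. Query the LLM (secure cloud or offline local deployment).
    raw_output = call_llm(prompt)
    # 3. Parse and sanity-check the structured output before registry entry.
    try:
        staging = json.loads(raw_output)
    except json.JSONDecodeError:
        return None  # flag for manual review instead of silently failing
    return {key: staging.get(key) for key in ("pT", "pN", "cM")}

print(extract_staging("Example report text ..."))
```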
LLM Performance: Cloud vs. Local
Both cloud-based and local LLMs achieved high accuracy; in this study the cloud model held a slight edge, likely reflecting its larger scale and continuous optimization.
| Feature | Cloud-based LLMs (e.g., Gemini 1.5) | Local LLMs (e.g., Qwen2.5 72B) |
|---|---|---|
| pT Classification Accuracy | 0.994 | 0.971 |
| pN Classification Accuracy | 0.993 | 0.923 |
| cM Classification Accuracy | 0.909 | 0.895 |
| Data Security & Compliance | Requires a secure cloud environment to preserve confidentiality | Runs offline, keeping data within the local environment |
| Customization & Fine-tuning | Prompt engineering only; no fine-tuning required in this study | Prompt engineering only; no fine-tuning required in this study |
Improving Data Integrity with Pydantic-constrained Decoding
Challenge: Conventional prompt-based structured output often leads to format variations, extraneous explanations, and structural inconsistencies, requiring manual post-processing and reducing reliability.
Solution: Implemented Pydantic-constrained decoding for forced JSON output generation, ensuring consistent and valid JSON formats without irrelevant text.
Impact: Significantly improved accuracy (mean difference 0.0268, p=0.004) and F1 score (mean difference 0.0271, p=0.006) for pT classification, eliminating output verbosity and structural inconsistencies and enhancing automation.
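A minimal sketch of what Pydantic-constrained decoding can look like, assuming Pydantic v2. The field names and allowed category values are illustrative, not the study's exact schema, and the way the JSON Schema is handed to the inference backend will depend on the serving stack used.

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field, ValidationError

class TNMStaging(BaseModel):
    """Illustrative schema; field names and allowed values are assumptions."""
    pT: Optional[str] = Field(None, description="Pathological T, e.g. 'pT1b'")
    pN: Optional[str] = Field(None, description="Pathological N, e.g. 'pN0'")
    cM: Optional[Literal["cM0", "cM1"]] = Field(None, description="Clinical M")

# The JSON Schema derived from the model can be passed to an inference
# backend that supports schema- or grammar-constrained generation, so the
# LLM can only emit valid instances of TNMStaging.
schema = TNMStaging.model_json_schema()

def parse_output(raw: str) -> Optional[TNMStaging]:
    """Validate raw model output; invalid JSON is routed to manual review
    rather than entering the registry."""
    try:
        return TNMStaging.model_validate_json(raw)
    except ValidationError:
        return None

print(parse_output('{"pT": "pT1b", "pN": "pN0", "cM": "cM0"}'))
```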
Calculate Your AI Automation ROI
Estimate the potential cost savings and efficiency gains for your organization by automating manual data entry and classification tasks with LLMs.
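For a back-of-the-envelope version of this calculation, the sketch below estimates annual hours and cost saved when full manual entry is replaced by a shorter human review of LLM output. All input figures are placeholders to be replaced with your own registry's numbers, not values from the study.

```python
def estimate_annual_savings(
    reports_per_year: int,
    manual_minutes_per_report: float,
    review_minutes_per_report: float,
    hourly_cost: float,
) -> dict:
    """Rough ROI estimate: automation replaces full manual entry with a
    shorter human review step. All inputs are assumptions, not study data."""
    minutes_saved = reports_per_year * (manual_minutes_per_report - review_minutes_per_report)
    hours_saved = minutes_saved / 60
    return {
        "hours_saved_per_year": round(hours_saved, 1),
        "cost_saved_per_year": round(hours_saved * hourly_cost, 2),
    }

# Example with placeholder values: 5,000 reports, 10 min manual entry vs. 2 min review, $40/hour.
print(estimate_annual_savings(5000, 10, 2, 40.0))
```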
Your AI Implementation Roadmap
A phased approach to integrating LLM-based automation into your clinical data workflows.
Phase 1: Discovery & Strategy
Assess current manual processes, identify key data points for automation (e.g., TNM staging), and define success metrics. Develop a secure data handling strategy (cloud vs. local LLMs) and compliance framework.
Phase 2: Pilot & Validation
Implement LLM solution with prompt engineering on a subset of real-world, unstructured reports. Validate accuracy against ground truth, measure efficiency gains, and refine prompts based on initial results. Focus on secure environment setup.
Phase 3: Integration & Scale
Integrate the validated LLM solution into existing clinical workflows and registry systems. Train staff on new automated processes and monitoring protocols. Expand to broader datasets and additional classification tasks.
Phase 4: Optimization & Future-proofing
Continuously monitor LLM performance, update prompts for evolving classification guidelines, and explore advanced techniques like Pydantic-constrained decoding for further accuracy and consistency. Stay agile with LLM advancements.
Ready to Transform Your Data Management?
Schedule a personalized consultation with our AI specialists to explore how LLM automation can enhance accuracy, reduce workload, and future-proof your cancer registry operations.