
AI Research Analysis

NERIF: GPT-4V for Automatic Scoring of Drawn Models

Authors: Gyeonggeon Lee, Xiaoming Zhai

Publication Date: November 19, 2025

This study proposes Notation-Enhanced Rubric Instruction for Few-Shot Learning (NERIF), a prompt engineering method to leverage GPT-4V for automatic scoring of student-drawn models. It evaluates GPT-4V's accuracy and interpretability across six science modeling tasks, finding promising potential despite challenges with complex models.

Executive Impact

This analysis of 'NERIF: GPT-4V for Automatic Scoring of Drawn Models' reveals key performance indicators for implementing AI-powered automatic scoring in educational settings.

0.51 Average Scoring Accuracy
0.64 Beginning Category Accuracy
0.62 Developing Category Accuracy
0.26 Proficient Category Accuracy

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

The study introduces NERIF, a prompt engineering approach for GPT-4V, combining instructional notes, scoring rubrics, and few-shot learning for automatic model scoring.

NERIF Process Flow

1. Write Prompt
2. Validate
3. Test

Only 9 few-shot examples are included in the prompt for GPT-4V, significantly reducing the data-collection burden compared with training a dedicated model.
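As a rough sketch of how a NERIF-style prompt could be assembled for a vision-language model API call, the function below combines the three ingredients the study describes (instructional notes, a scoring rubric, and few-shot examples) ahead of the unscored student drawing. The function name, strings, URLs, and message layout are illustrative assumptions, not the paper's exact prompt.

```python
# Sketch: assemble a NERIF-style message list for a vision-language model.
# The structure (instructional note + rubric + few-shot examples + test image)
# follows the paper's description; all names and strings are illustrative.

def build_nerif_messages(instructional_note, rubric, examples, test_image_url):
    """examples: list of (image_url, score_label, rationale) tuples."""
    content = [
        {"type": "text", "text": instructional_note},
        {"type": "text", "text": "Scoring rubric:\n" + rubric},
    ]
    # Few-shot examples: each drawn model plus its expert score and rationale.
    for image_url, label, rationale in examples:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
        content.append({"type": "text",
                        "text": f"Score: {label}\nRationale: {rationale}"})
    # Finally, the unscored student model to be evaluated.
    content.append({"type": "image_url", "image_url": {"url": test_image_url}})
    content.append({"type": "text", "text": "Score this drawn model."})
    return [{"role": "user", "content": content}]

messages = build_nerif_messages(
    instructional_note="Arrows denote particle motion; longer arrows mean faster motion.",
    rubric="Beginning / Developing / Proficient, per rubric criteria.",
    examples=[("https://example.com/sample1.png", "Developing",
               "Shows particles but motion change is unclear.")],
    test_image_url="https://example.com/student42.png",
)
```

The returned list can then be passed as the `messages` argument of a chat-style vision API; swapping rubrics or examples requires editing strings, not retraining a model.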

GPT-4V achieves an average scoring accuracy of 0.51, with higher accuracy for 'Beginning' and 'Developing' categories (0.64 and 0.62) but lower for 'Proficient' (0.26).

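Per-category accuracy figures like these can be reproduced from human/machine label pairs with a simple tally. The sketch below uses made-up stand-in data, not the study's dataset.

```python
from collections import defaultdict

def per_category_accuracy(pairs):
    """pairs: (human_label, machine_label) tuples; returns accuracy per human label."""
    correct, total = defaultdict(int), defaultdict(int)
    for human, machine in pairs:
        total[human] += 1
        if human == machine:
            correct[human] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Made-up example: 4 'Beginning' items (3 scored correctly),
# 4 'Proficient' items (1 scored correctly).
pairs = [("Beginning", "Beginning")] * 3 + [("Beginning", "Developing")] \
      + [("Proficient", "Developing")] * 3 + [("Proficient", "Proficient")]
acc = per_category_accuracy(pairs)
```

Note that the overall average (0.51 in the study) is a weighted mix of these per-category rates, so it depends on how many student models fall into each category.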

Interpretable Rationales

GPT-4V provides detailed, rubric-aligned justifications, explaining how it identifies model components and assigns scores, which lets human users follow its decision-making process. For example, it cited 'longer arrows after heating' as evidence of faster particle motion even when the student had actually used double lines to indicate motion, demonstrating that its inferences are plausible but sometimes incorrect.

VLMs like GPT-4V offer a paradigm shift for computer vision in education, reducing technical barriers and data requirements for automatic scoring. However, accuracy for complex models needs improvement.

100% Prompt-based: NERIF replaces the model-development programming required by traditional ML scoring pipelines with natural-language prompt engineering.
Feature             | VLM (GPT-4V)                       | Traditional ML (CNN)
Technical Barrier   | Low (prompt engineering)           | High (sophisticated CNNs)
Training Data Needs | Few-shot (9 examples)              | Large (hundreds to thousands)
Interpretability    | High (natural-language rationales) | Low (black box)
Flexibility         | High (natural-language prompts)    | Low (model-specific tuning)

Advanced ROI Calculator

Estimate the potential time and cost savings by automating student model scoring in your institution.


Implementation Roadmap

Our phased implementation plan ensures a smooth transition and maximum impact for your educational AI initiatives.

Phase 1: Pilot & Validation

Conduct small-scale pilots with a subset of modeling tasks and educators to validate NERIF's effectiveness and gather feedback.

Phase 2: Integration & Customization

Integrate GPT-4V with existing learning platforms and customize rubrics and instructional notes for broader application across disciplines.

Phase 3: Scalable Deployment & Training

Roll out the automatic scoring system across the institution, providing comprehensive training for educators on prompt engineering and critical assessment of AI outputs.

Ready to Transform Your Assessment? Schedule a Consultation.

Our experts are ready to help you integrate cutting-edge AI into your educational workflows, ensuring efficient and effective assessment practices.
