
Enterprise AI Analysis

Can Generative AI Produce Test Cases? An Experience from the Automotive Domain

This paper explores the use of generative AI to automatically convert informal test case specifications into executable test scripts in the automotive domain. By integrating large language models with few-shot learning and retrieval-augmented generation, the proposed approach highlights the potential of generative AI to support industrial software testing processes. Our solution assumes test case specifications defined in Rational Quality Manager and test scripts specified in ecu.test. We evaluated our solution on an industrial benchmark of 200 unique pairs of informal test step descriptions and executable test instructions. Our results show that generative AI can produce correct or near-correct test cases in many scenarios, but the quality of the results depends significantly on prompt design, large language model selection, and the accuracy of context retrieval. Our study underscores the need for human oversight to address subtle errors in logic sequencing and value assignments, ensuring functional correctness. Future research should prioritize improving retrieval mechanisms, expanding dataset diversity, and exploring hybrid human-AI workflows to enhance generative AI's scalability, reliability, and broader applicability in industrial settings.

Key Findings & Impact

Our analysis reveals significant opportunities for efficiency gains and quality improvements in enterprise testing workflows.

Headline metrics: minimum EMR for correctness, maximum LMR for correctness (64.5% in Exp 1), and maximum ChrF similarity.

Deep Analysis & Enterprise Applications

Each topic below pairs a research question from the study with its key finding, rebuilt as enterprise-focused modules.

Correctness (RQ1): Few-shot learning, when properly configured, produces test scripts that correctly implement their informal specification in 49.5% to 64.5% of cases. Even when not fully correct, the generated scripts are highly similar to the human-written ones and can serve as a valid starting point for implementing the test cases.

Prompt Design (RQ2): In most cases the prompt order and context order do not affect the effectiveness of the solution; still, the results suggest ordering the context entries alphabetically and placing the context before the informal specification (see the prompt sketch below).

LLM Variants (RQ3): LLMs tuned for code generation outperform their base counterparts. Instruction tuning decreased effectiveness when the ideal context was used and provided no improvement with the unfiltered context.

Context Accuracy (RQ4): Unnecessary and inaccurate context significantly reduces the effectiveness of test generation.

64.5% Max LMR achieved in Exp 1
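The RQ2 finding translates directly into how the prompt is assembled: retrieved context entries go first, sorted alphabetically, followed by few-shot examples and the informal specification. Below is a minimal sketch of such a prompt builder; the names build_prompt and ContextEntry are illustrative assumptions, not the implementation used in the study.

```python
from dataclasses import dataclass

@dataclass
class ContextEntry:
    """One retrieved pair: informal step description and its ecu.test instruction."""
    step_description: str
    test_instruction: str

def build_prompt(spec: str,
                 context: list[ContextEntry],
                 examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot prompt with the context placed before the specification."""
    # RQ2 finding: alphabetical context ordering and context-before-spec worked best.
    ordered = sorted(context, key=lambda entry: entry.step_description.lower())

    parts = ["### Known step-to-instruction mappings"]
    parts += [f"- {e.step_description} => {e.test_instruction}" for e in ordered]

    parts.append("### Solved examples")
    for description, script in examples:  # few-shot pairs of spec and full test script
        parts.append(f"Specification:\n{description}\nTest script:\n{script}")

    parts.append("### Task")
    parts.append(f"Specification:\n{spec}\nTest script:")
    return "\n".join(parts)
```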

Enterprise Process Flow

RQM2Text → Context Retrieval → Instructions Generation → Test Generation
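Read as an architecture, the four stages form a thin orchestration layer around a locally hosted LLM (local execution also satisfies confidentiality requirement R3 discussed below). The sketch only illustrates the control flow; the stage implementations are injected as callables because the concrete parsers, retriever, and model endpoint are project-specific assumptions, not tools named in the study.

```python
from typing import Callable

def generate_test_case(
    rqm_export: str,
    rqm_to_text: Callable[[str], list[str]],             # RQM2Text: flatten RQM export to step texts
    retrieve_context: Callable[[list[str]], list[str]],  # Context Retrieval: similar known pairs
    llm_complete: Callable[[str], str],                  # Instructions Generation: local LLM call
    assemble_script: Callable[[str], str],               # Test Generation: package as ecu.test script
) -> str:
    """Run the four pipeline stages in order and return an executable test script."""
    steps = rqm_to_text(rqm_export)
    context = retrieve_context(steps)
    prompt = "\n\n".join(
        ["### Context"] + list(context) + ["### Specification", "\n".join(steps)]
    )
    instructions = llm_complete(prompt)
    return assemble_script(instructions)
```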
Impact of LLM Variants on Effectiveness (Exp 1)

  • Code generation tuning vs. base: outperformed the base models in most cases on CHRF, EMR, and LMR.
  • Instruction following vs. base: significantly worse on CHRF (10/20), EMR (12/20), and LMR (15/20 comparisons).

Industrial Context & Confidentiality

The study was conducted in collaboration with a large automotive company, using Rational Quality Manager for informal specifications and ecu.test for test scripts. Industrial confidentiality (Requirement R3) was maintained by executing models locally. This ensures real-world applicability while respecting data sensitivity.

  • Collaboration with automotive OEM.
  • Use of RQM and ecu.test (Assumptions A1, A2).
  • Local execution for data confidentiality (R3).
  • Results directly relevant to automotive domain.
43.2 Avg. CHRF drop due to unnecessary context
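Given a 43.2-point average CHRF drop attributed to unnecessary context, it pays to filter what the retriever returns before it reaches the prompt. One simple mitigation is a similarity threshold plus a cap on the number of entries, sketched below; this is an illustrative option, not the mechanism evaluated in the paper, and the cut-off values are assumptions to be tuned per project.

```python
def filter_context(scored_entries: list[tuple[float, str]],
                   min_score: float = 0.75,
                   max_entries: int = 5) -> list[str]:
    """Keep only top-ranked, highly similar context entries to avoid diluting the prompt.

    scored_entries: (similarity, entry) pairs from the retriever, scores in [0, 1].
    min_score / max_entries: illustrative cut-offs.
    """
    ranked = sorted(scored_entries, key=lambda pair: pair[0], reverse=True)
    kept = [entry for score, entry in ranked if score >= min_score]
    return kept[:max_entries]
```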

Estimate Your Test Automation ROI

Quantify the potential time and cost savings from automating test case generation in your enterprise.

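A rough back-of-the-envelope model is enough for a first estimate: multiply the test cases written per year by the manual effort per case, the share of cases suitable for AI generation, and the fraction of effort actually saved after human review. The formula and the inputs in the example are assumptions for illustration, not figures from the study.

```python
def estimate_roi(tests_per_year: int,
                 hours_per_test: float,
                 automation_share: float,
                 effort_reduction: float,
                 hourly_rate: float) -> tuple[float, float]:
    """Return (annual hours reclaimed, estimated annual savings).

    automation_share: fraction of test cases suitable for AI generation (0..1).
    effort_reduction: fraction of manual effort saved on those cases (0..1),
                      net of the human review the study still recommends.
    """
    hours_reclaimed = tests_per_year * hours_per_test * automation_share * effort_reduction
    return hours_reclaimed, hours_reclaimed * hourly_rate

# Example with assumed inputs: 1,000 tests/year, 2 h each, 60% automatable,
# 50% effort saved on those cases, 80 EUR/h engineering cost.
hours, savings = estimate_roi(1000, 2.0, 0.60, 0.50, 80.0)
```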

AI Test Automation Implementation Roadmap

A strategic phased approach to integrate generative AI into your testing workflow.

Phase 1: Pilot & Integration

Integrate AI-powered test case generation into a pilot project, focusing on a critical subsystem. Establish initial RAG context and few-shot examples.

Phase 2: Feedback & Refinement

Collect feedback from engineers on AI-generated scripts. Refine prompt design, context retrieval, and LLM configuration based on real-world usage.

Phase 3: Scaling & Training

Expand AI test generation to additional projects. Develop internal training programs for engineers to leverage and oversee AI effectively.

Phase 4: Advanced Features & Optimization

Explore advanced features like automated context retrieval, integration with CI/CD pipelines, and continuous LLM model optimization.

Ready to Transform Your Testing Workflow?

Connect with our experts to explore how Generative AI can revolutionize your test case generation and accelerate software delivery.
