Enterprise AI Analysis: Code World Model Preparedness Report

AI Preparedness Report

Code World Model Preparedness Report: Moderate Risk for Open-Weight Release

This report documents the preparedness assessment of Code World Model (CWM), a model from Meta for code generation and reasoning about code. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned propensities. Our assessment found that CWM does not pose additional frontier risks beyond those present in the current AI ecosystem. We therefore release it as an open-weight model.

Our assessments indicate that CWM's performance on cybersecurity, chemical & biological risk, and propensity evaluations places it within the "moderate" risk threshold for catastrophic domains, supporting its suitability for open-weight release.

Key Findings at a Glance

A concise overview of CWM's performance across critical safety and capability domains, supporting its moderate risk classification.

25% Cybench CTF Pass Rate
78.1% WMDP-Bio Accuracy

Deep Analysis & Enterprise Applications


Cybersecurity Evaluation Summary

Models with strong coding capabilities may also be capable of automating various cybersecurity tasks, which could be used for offensive or defensive purposes. To assess the cybersecurity capabilities of CWM and peer models, we ran a combination of cybersecurity knowledge tests and “capture the flag” (CTF) style agentic challenges that require the model to identify and exploit vulnerabilities.

25% Cybench CTF Pass Rate for CWM, on par with other open-source models.

Hack The Box Challenge Workflow

Reconnaissance & Information Gathering
Analyze Open Ports & Services
Identify Vulnerabilities & Misconfigurations
Document Findings Clearly
State Final Answer

Cybench CTF Challenge Solve Rate (pass@10)

Model                  CTFs passed (count)   Share of 40 CTFs passed (%)
Llama 4 Maverick       7                     17.5
Qwen3-Coder            10                    25.0
gpt-oss-120b (high)    11                    27.5
CWM                    10                    25.0
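
The share column above follows directly from the pass counts. The sketch below shows one way such numbers could be derived. It assumes the usual reading of pass@10 (a challenge counts as passed if at least one of up to 10 attempts succeeds), which the page does not spell out; the function names and the per-attempt data layout are hypothetical and not taken from the report.

```python
from typing import Dict, List

TOTAL_CTFS = 40  # from the table header: "Share of 40 CTFs passed"


def pass_at_k(attempts: Dict[str, List[bool]], k: int = 10) -> int:
    """Count challenges solved by at least one of the first k attempts.

    `attempts` maps a challenge name to its per-attempt outcomes.
    This layout is an assumption made for illustration only.
    """
    return sum(1 for results in attempts.values() if any(results[:k]))


def share_passed(passed: int, total: int = TOTAL_CTFS) -> float:
    """Convert a pass count into a percentage of all challenges."""
    return round(100.0 * passed / total, 1)


# Pass counts as reported in the table above.
reported_counts = {
    "Llama 4 Maverick": 7,
    "Qwen3-Coder": 10,
    "gpt-oss-120b (high)": 11,
    "CWM": 10,
}

shares = {model: share_passed(count) for model, count in reported_counts.items()}

for model, share in shares.items():
    print(f"{model}: {share}%")
```

Recomputing the shares this way reproduces the reported column, which is a quick consistency check on the table.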

Hack The Box Performance (pass@10)

Model                  Number of compromised machines (out of 10)   Average successful intermediate steps (%)   Max successful intermediate steps (%)
Llama 4 Maverick       0                                            54.2                                        66.7
Qwen3-Coder            0                                            53.7                                        83.3
gpt-oss-120b (high)    0                                            41.9                                        66.7
CWM                    0                                            41.0                                        66.7

Chemical & Biological Evaluation Summary

Our evaluation of Chemical and Biological risks focuses on capabilities that could potentially lower barriers for developing harmful agents, ranging from foundational scientific knowledge to specialized dual-use applications. We employ a multi-tiered assessment framework across two key capability domains: Knowledge (Formal and Tacit) and Experimental Design.

78.1% CWM's Accuracy on WMDP-Bio, lowest across peer models.

Biological Agent Workflow Phases

Agent Acquisition (Isolation/Synthesis)
Production (Culturing, Modification, Scale-up)
Processing (Formulation, Verification, Storage)

WMDP-Bio and WMDP-Chem Accuracy

Model                  WMDP-Bio (%)   WMDP-Chem (%)
Llama 4 Maverick       86.4±1.8       76.5±4.2
Qwen3-Coder            83.2±2.0       65.9±4.6
gpt-oss-120b (high)    86.3±1.9       73.3±4.3
CWM                    78.1±2.3       64.6±4.5

HPCT and VCT Accuracy (Human Expert Baseline)

Model                  HPCT (%)    VCT (%)
Human Expert           31.0±0.0    22.0±0.0
Llama 4 Maverick       39.4±8.6    27.3±7.4
Qwen3-Coder            33.2±8.7    25.7±8.0
gpt-oss-120b (high)    48.1±8.8    40.7±8.3
CWM                    31.2±7.8    23.8±6.2
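
One way to read the table above is to ask whether the human expert baseline falls inside each model's reported range. The sketch below does that using only the table's figures. It assumes the ± values are symmetric interval half-widths, which the page does not state for this table; the `Score` class and its `contains` method are hypothetical helpers. Treat the result as a rough reading aid, not a statistical test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    mean: float    # reported accuracy (%)
    margin: float  # reported ± value, assumed to be an interval half-width

    def contains(self, value: float) -> bool:
        """True if `value` lies within mean ± margin."""
        return self.mean - self.margin <= value <= self.mean + self.margin


# Human expert baselines from the table above.
HUMAN_BASELINE = {"HPCT": 31.0, "VCT": 22.0}

# Model scores from the table above.
scores = {
    "Llama 4 Maverick": {"HPCT": Score(39.4, 8.6), "VCT": Score(27.3, 7.4)},
    "Qwen3-Coder": {"HPCT": Score(33.2, 8.7), "VCT": Score(25.7, 8.0)},
    "gpt-oss-120b (high)": {"HPCT": Score(48.1, 8.8), "VCT": Score(40.7, 8.3)},
    "CWM": {"HPCT": Score(31.2, 7.8), "VCT": Score(23.8, 6.2)},
}

# For each model and benchmark: does the reported range include the human baseline?
baseline_in_range = {
    model: {bench: s.contains(HUMAN_BASELINE[bench]) for bench, s in per_bench.items()}
    for model, per_bench in scores.items()
}

for model, result in baseline_in_range.items():
    print(model, result)
```

Under that assumption, the human baseline lies inside the reported range for every model except gpt-oss-120b (high) on both benchmarks.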

Propensities Evaluation Summary

Frontier models can develop unsafe propensities – tendencies towards certain behaviors that emerge without being explicitly taught and which conflict with their intended use or safety standards. These can arise from models encoding higher-level concepts from training data in unexpected ways, optimizing for poorly defined objectives, or overgeneralizing learned patterns.

+13.4% Improvement in Normalized Honesty with Structured Reasoning Prompts (CWM with reasoning)

Honesty-Relevant Reasoning Stages Framework

Task Understanding
Conflict Acknowledgement
Uncertainty Externalization
Conflict Resolution
Reasoning-Statement Consistency

Honesty Scores with 95% Confidence Intervals on MASK

Model                      Honesty (%)   Normalized Honesty (%)
Llama 4 Maverick           53.5±3.1      49.8±3.0
Qwen3-Coder                52.0±2.8      48.4±3.1
gpt-oss-120b (high)        88.7±1.7      87.3±1.8
CWM (without reasoning)    52.6±2.8      44.8±3.0
CWM (with reasoning)       62.7±2.6      55.5±2.8
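
The two CWM rows above can be compared directly to see how much enabling reasoning changes the MASK scores. The sketch below is simple arithmetic on the table's point estimates and ignores the ± values; the variable names are illustrative. This is a separate comparison from the structured-reasoning-prompt changes reported in the next table.

```python
# Point estimates for CWM from the MASK table above (percent).
cwm_without_reasoning = {"honesty": 52.6, "normalized_honesty": 44.8}
cwm_with_reasoning = {"honesty": 62.7, "normalized_honesty": 55.5}

# Difference in percentage points when reasoning is enabled.
reasoning_gain = {
    metric: round(cwm_with_reasoning[metric] - cwm_without_reasoning[metric], 1)
    for metric in cwm_without_reasoning
}

for metric, gain in reasoning_gain.items():
    print(f"{metric}: +{gain} percentage points")
```

On these figures, enabling reasoning raises honesty by 10.1 points and normalized honesty by 10.7 points.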

Change in Honesty Metrics with Structured Reasoning Prompts

Model                    Honesty (%)   Normalized Honesty (%)
Δ CWM (w/ reasoning)     +11.7         +13.4
Δ CWM (w/o reasoning)    +12.0         +12.1


Your AI Implementation Roadmap

A phased approach to integrate advanced AI models securely and effectively within your enterprise, ensuring maximum impact and minimal risk.

Phase 01: Strategic Assessment & Planning

Define clear objectives, identify critical use cases, and conduct a thorough assessment of existing infrastructure and data readiness. Establish governance frameworks and evaluate potential risks and mitigation strategies.

Phase 02: Pilot Deployment & Iteration

Implement CWM or similar models in controlled environments. Monitor performance, gather user feedback, and iterate on model configurations and integration points to optimize for specific enterprise needs.

Phase 03: Scaled Integration & Continuous Monitoring

Roll out solutions across relevant departments, ensuring robust security, scalability, and compliance. Establish continuous monitoring systems to track model performance, identify emerging risks, and ensure ongoing alignment with safety standards.
