Enterprise AI Analysis
BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection
One of the main challenges in mechanistic interpretability is circuit discovery—determining which parts of a model perform a given task. We build on the Mechanistic Interpretability Benchmark (MIB) and propose three key improvements to circuit discovery.
Executive Impact at a Glance
Our methods yield more faithful circuits than prior approaches across multiple MIB tasks and models. Tailoring the combination of techniques to each faithfulness objective consistently outperforms leading baselines, improving both the robustness and the interpretability of the discovered circuits.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Positive-Negative Ratio (PNR) Selection
Traditional circuit discovery methods often select edges purely by score magnitude, which can inadvertently include components with negative contributions and thereby misrepresent the model's true behavior. Our Positive-Negative Ratio (PNR) strategy addresses this by prioritizing positively-scoring edges: we first fill a predefined proportion of the edge budget with the top positively-scoring edges, then fill the remainder with edges ranked by absolute score. This fine-grained control over the mix of edge types leads to more faithful and interpretable circuits.
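The selection rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name `pnr_select` and the example scores are hypothetical, and edge identifiers stand in for real model edges.

```python
def pnr_select(scores, budget, pnr=0.7):
    """Select `budget` edges: a proportion `pnr` of the budget from the
    top positively-scoring edges, the remainder ranked by |score|.

    `scores` maps edge identifiers to attribution scores.
    """
    n_pos = int(pnr * budget)
    # Step 1: top positive edges, highest score first.
    positives = sorted((e for e, s in scores.items() if s > 0),
                       key=lambda e: scores[e], reverse=True)
    chosen = positives[:n_pos]
    # Step 2: fill the remaining budget by absolute score,
    # which may now admit strong negative edges.
    remaining = sorted((e for e in scores if e not in chosen),
                       key=lambda e: abs(scores[e]), reverse=True)
    chosen += remaining[:budget - len(chosen)]
    return chosen

# With pnr=0.7 and budget=3, two slots go to top positives (e1, e3)
# and the last slot goes to the largest-|score| leftover (e2).
edges = pnr_select({"e1": 0.9, "e2": -0.8, "e3": 0.5, "e4": 0.1},
                   budget=3, pnr=0.7)
```

Setting `pnr=1.0` recovers a purely positive selection, while `pnr=0.0` reduces to the conventional absolute-magnitude ranking, so the ratio interpolates between the two regimes.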
Bootstrapped Confidence Filtering
Attribution scores can be noisy and inconsistent across different data samples, leading to unstable circuit discoveries. We observed that some edges exhibit varying signs (positive or negative) depending on the sample, indicating their unreliable contribution. To counter this, our bootstrapping method involves resampling the training data multiple times to calculate consistent attribution scores. By analyzing the statistical significance of these scores, we can filter out unstable edges, ensuring that only components with consistently signed contributions are included in the circuit, thus improving robustness.
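The filtering step above can be sketched as follows. This is a simplified, hypothetical rendering of the idea (resample, recompute the mean score per edge, keep edges whose sign is consistent across resamples); the function name, the consistency criterion, and the 0.95 threshold are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def stable_edges(sample_scores, n_boot=1000, threshold=0.95, seed=0):
    """Keep edges whose bootstrap-mean attribution score has a consistent sign.

    `sample_scores` is an (n_samples, n_edges) array of per-sample scores.
    Returns indices of edges whose bootstrap means agree in sign in at
    least `threshold` of the resamples.
    """
    rng = np.random.default_rng(seed)
    n_samples, _ = sample_scores.shape
    # Resample rows with replacement: (n_boot, n_samples) index matrix,
    # then average each resample to get (n_boot, n_edges) mean scores.
    idx = rng.integers(0, n_samples, size=(n_boot, n_samples))
    means = sample_scores[idx].mean(axis=1)
    # Fraction of resamples in which each edge keeps its majority sign.
    pos_frac = (means > 0).mean(axis=0)
    consistency = np.maximum(pos_frac, 1.0 - pos_frac)
    return np.flatnonzero(consistency >= threshold)

# Edge 0 scores positively on every sample; edge 1 flips sign per sample.
scores = np.array([[1.0,  1.0],
                   [1.2, -1.0],
                   [0.8,  1.0],
                   [1.1, -1.0]])
kept = stable_edges(scores)  # only the consistently-signed edge survives
```

Edges whose bootstrap means straddle zero are exactly the sample-dependent, unreliably-signed edges the text describes, and they fall below the consistency threshold.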
Integer Linear Programming (ILP) for Optimal Circuits
Current circuit discovery methods often rely on greedy selection algorithms, which make local decisions and may result in suboptimal circuits. We reformulate circuit construction as an Integer Linear Programming (ILP) optimization problem, allowing a globally optimal subset of edges to be selected subject to structural and budget constraints. These constraints ensure the resulting circuit is connected, includes source and target nodes, and maintains node-edge consistency. By optimizing over the graph structure as a whole, this approach yields more faithful circuits.
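A toy version of this formulation can be sketched with an off-the-shelf solver. The sketch below uses `scipy.optimize.milp` on a hypothetical three-edge graph and keeps only two of the stated constraint families (an edge budget and node-edge consistency); the paper's full connectivity constraints are omitted for brevity, and the edge scores are invented for illustration.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Binary variables: [x_e0, x_e1, x_e2, y_src, y_a, y_tgt].
# Hypothetical edges and attribution scores:
#   e0 = src -> a   (score  2.0)
#   e1 = a   -> tgt (score  1.5)
#   e2 = src -> tgt (score -0.5)
scores = np.array([2.0, 1.5, -0.5])
c = np.concatenate([-scores, np.zeros(3)])  # milp minimizes, so negate

# Node-edge consistency (an edge needs both endpoints) plus an edge budget.
A = np.array([
    [1, 0, 0, -1,  0,  0],   # x_e0 <= y_src
    [1, 0, 0,  0, -1,  0],   # x_e0 <= y_a
    [0, 1, 0,  0, -1,  0],   # x_e1 <= y_a
    [0, 1, 0,  0,  0, -1],   # x_e1 <= y_tgt
    [0, 0, 1, -1,  0,  0],   # x_e2 <= y_src
    [0, 0, 1,  0,  0, -1],   # x_e2 <= y_tgt
    [1, 1, 1,  0,  0,  0],   # at most 2 edges in the circuit
])
ub = np.array([0, 0, 0, 0, 0, 0, 2])

res = milp(c,
           constraints=LinearConstraint(A, -np.inf, ub),
           integrality=np.ones(6),   # all variables integer (binary via bounds)
           bounds=Bounds(0, 1))
selected = np.flatnonzero(np.round(res.x[:3]))  # the two positive edges
```

Because the solver optimizes all edge and node variables jointly, it avoids the local commitments of greedy selection; the cost is solve time that grows quickly with the number of binary variables, which is the scalability limitation discussed below.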
Benchmark Results: Circuit Faithfulness Across Tasks and Models
| Method | GPT-2 IOI CMD (Lower is Better) | GPT-2 IOI CPR (Higher is Better) | Qwen-2.5 MCQA CMD (Lower is Better) | Qwen-2.5 MCQA CPR (Higher is Better) |
|---|---|---|---|---|
| Baseline (Greedy) | 0.0308 | 2.4901 | 0.1846 | 1.8769 |
| Our Enhanced Approach | 0.0294 (4.55% Reduction) | 2.5061 (0.64% Increase) | 0.1820 (1.41% Reduction) | 1.9145 (1.99% Increase) |
Scaling Mechanistic Interpretability: Acknowledging Challenges
Challenge: While our ILP optimization significantly enhances circuit faithfulness, its computational complexity currently limits applicability to larger, more complex models. The search for globally optimal solutions becomes resource-intensive as the number of model components grows. Furthermore, accurately determining the optimal Positive-Negative Ratio (PNR) requires task-specific tuning, adding to the setup overhead.
Solution & Future Outlook: Despite these limitations, our principled edge selection methods (ILP, PNR, and bootstrapping) consistently yield more faithful and robust circuit discoveries than prior greedy approaches. This indicates the strong potential of advanced optimization for mechanistic interpretability. Future research will focus on developing more scalable ILP formulations and improved attribution methods that better reflect "ground truth" edge importance, further unlocking the benefits of optimal graph building for even the largest AI models.
Calculate Your Potential ROI
See how enhancing AI interpretability can translate into tangible operational savings and efficiency gains for your enterprise.
Your Path to Interpretable AI
A structured approach to integrating advanced mechanistic interpretability techniques into your enterprise AI pipeline.
Phase 1: Discovery & Assessment
Comprehensive evaluation of your existing AI models and interpretability needs. Identify key areas where improved circuit faithfulness can drive business value.
Phase 2: Pilot & Customization
Implement our methods on a chosen pilot project. Customize PNR values, bootstrapping parameters, and ILP constraints to optimize for your specific models and tasks.
Phase 3: Integration & Scaling
Seamlessly integrate the enhanced circuit discovery pipeline into your development and MLOps workflows. Begin scaling interpretability across your AI portfolio.
Phase 4: Monitoring & Refinement
Continuous monitoring of circuit faithfulness and model behavior. Iterative refinement of parameters and exploration of advanced attribution techniques for sustained performance.
Ready to Enhance Your AI's Interpretability?
Schedule a free consultation with our AI experts to explore how principled circuit discovery can transform your enterprise AI strategy.