Enterprise AI Analysis
Preserving Bilinear Weight Spectra with a Signed and Shrunk Quadratic Activation Function
Authored by Jason Abohwo and Thomas Mosen from Yale University, USA & École polytechnique fédérale de Lausanne (EPFL), Switzerland.
This paper introduces Signed Quadratic Shrink (SQS), a novel activation function for Gated Linear Units (GLUs) that achieves high interpretability directly from model weights without compromising performance. SQS addresses the limitations of Bilinear MLPs, which offer interpretability but often underperform state-of-the-art GLUs. By carefully engineering the activation to handle gradient issues and preserve weight structure, SQS-GLUs deliver competitive performance across tasks like MNIST, Fashion MNIST, and Tiny Stories, while allowing for the extraction of meaningful, interpretable eigenfeatures from their weights. This breakthrough enables more transparent and reliable enterprise AI systems.
Executive Impact & Key Findings for Your Business
Addressing the critical need for transparent and performant AI, SQS offers a powerful new approach. This research demonstrates how to unlock high-fidelity interpretability without the typical performance trade-offs, driving confidence and compliance in enterprise AI deployments.
Deep Analysis & Enterprise Applications
Understanding the inner workings of machine learning models is crucial for ensuring their reliability and robustness. Weight-based interpretability, particularly through performant, innately interpretable architectures, is a promising avenue. Prior work by Sharkey [1] and Pearce et al. [2] explored Bilinear MLPs (GLUs without an activation function) for learning interpretable features via weight spectra. However, these often suffer from reduced performance and data efficiency compared to state-of-the-art GLUs like SwiGLU and GEGLU [11]. Our work aims to address this performance gap while retaining interpretability, which is a critical bottleneck for enterprise adoption of advanced AI.
The core motivation is to achieve high interpretability directly from model weights, a property that is critical for enterprise AI adoption; SQS-GLU preserves this weight-level interpretability while closing the performance gap with state-of-the-art GLUs.
We introduce Signed Quadratic Shrink (SQS) as a novel activation function for Gated Linear Units (GLUs). Motivated by the interpretability of Bilinear MLPs and the performance issues of raw quadratic activations, SQS modifies f(x) = x² into a more robust form. The function incorporates a shift parameter c and a shrinking factor governed by λ and p to address vanishing and exploding gradients, which is crucial for stable training. For GLUs, the activation is specifically derived as σ(x) = (x / |x|) · (|x| − c) / (1 + (λ|x|)ᵖ), which simplifies to (x − c·sgn(x)) / (1 + λx·sgn(x)) when p = 1. This design is engineered to yield competitive performance and data efficiency while critically preserving the weight-structure properties necessary for interpretability.
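To make the definition concrete, below is a minimal PyTorch sketch of SQS as an elementwise activation module, assuming the formula above; the class name and default hyperparameters (λ = 0.5, c = 0.01, p = 1, taken from the experiments reported later) are illustrative rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn

class SignedQuadraticShrink(nn.Module):
    """Elementwise SQS activation: sgn(x) * (|x| - c) / (1 + (lam * |x|) ** p).

    A minimal sketch based on the formula in the text; the defaults mirror the
    hyperparameters used in the paper's experiments (lam=0.5, c=0.01, p=1).
    """

    def __init__(self, lam: float = 0.5, c: float = 0.01, p: float = 1.0):
        super().__init__()
        self.lam, self.c, self.p = lam, c, p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        abs_x = x.abs()
        # Shift |x| by c and shrink by 1 + (lam*|x|)^p, reattaching the sign of x.
        return torch.sign(x) * (abs_x - self.c) / (1.0 + (self.lam * abs_x) ** self.p)
```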
Our experiments rigorously evaluate SQS-GLUs against standard activations (GeLU, ReLU) and Bilinear GLUs across MNIST [6], Fashion MNIST [7] (image classification), and Tiny Stories [8] (language modeling). Using parameters λ = 0.5, c = 0.01, and p = 1, SQS-GLU demonstrates superior or competitive performance (a minimal sketch of the SQS-GLU block follows the results below):
- MNIST/FMNIST: SQS-GLU converges faster and achieves lower loss than ReLU-GLUs and Bilinear MLPs. Its final accuracy and loss are on par with state-of-the-art SwiGLU and GELU.
- Tiny Stories: SQS shows the best performance in terms of both loss and perplexity across various training checkpoints.
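For concreteness, the sketch below shows one way SQS can slot into a GLU-style block, with the activation applied to the gate branch and the value branch left linear; the wiring follows the standard GLU pattern and reuses the hypothetical `SignedQuadraticShrink` module from the earlier sketch, so layer names and shapes are illustrative assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SQSGLU(nn.Module):
    """GLU block with SQS on the gate path: out = W_out(SQS(W_gate x) * (W_value x)).

    A sketch of standard gated-MLP wiring, not the paper's exact architecture.
    """

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.value = nn.Linear(d_model, d_hidden, bias=False)
        self.out = nn.Linear(d_hidden, d_model, bias=False)
        self.act = SignedQuadraticShrink(lam=0.5, c=0.01, p=1.0)  # defaults from the paper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gate branch passes through SQS; the value branch stays linear,
        # keeping the weight structure close to the bilinear (interpretable) case.
        return self.out(self.act(self.gate(x)) * self.value(x))
```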
Crucially, SQS-GLU retains the interpretability of Bilinear MLPs. Eigenvector analysis (Figure 2) reveals meaningful, class-specific features. The cosine similarity between SQS-GLU and Bilinear MLP eigenvectors remains high (never below 0.5, often above 0.95 for key eigenvectors), validating its ability to preserve weight-based interpretability while boosting performance.
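One way to reproduce this kind of eigenvector analysis from the weights alone follows the bilinear-MLP decomposition of Pearce et al. [2]: for a chosen readout direction (for example, one class logit), form the symmetric interaction matrix from the gate and value weight matrices and eigendecompose it. The sketch below is an assumed recipe, not the paper's code; the function names are hypothetical, and treating the SQS gate as approximately linear for this analysis is an assumption.

```python
import torch

def interaction_eigvectors(W_gate: torch.Tensor, W_value: torch.Tensor,
                           readout: torch.Tensor):
    """Eigendecompose the readout-specific bilinear interaction matrix.

    W_gate, W_value: (d_hidden, d_model) weights of the two GLU branches.
    readout:         (d_hidden,) weights mapping hidden units to one logit/class.
    Returns eigenvalues (sorted by |value|, descending) and eigenvectors in input space.
    """
    W, V, u = W_gate.detach(), W_value.detach(), readout.detach()
    # Q_ij = sum_k u_k * W_ki * V_kj, then symmetrize; the eigenvectors of Q
    # are the candidate interpretable input-space features.
    Q = torch.einsum("k,ki,kj->ij", u, W, V)
    Q = 0.5 * (Q + Q.T)
    eigvals, eigvecs = torch.linalg.eigh(Q)
    order = eigvals.abs().argsort(descending=True)
    return eigvals[order], eigvecs[:, order]

def eig_cosine_sim(v_sqs: torch.Tensor, v_bilinear: torch.Tensor) -> torch.Tensor:
    """Sign-invariant cosine similarity between two matched eigenvectors."""
    return torch.nn.functional.cosine_similarity(v_sqs, v_bilinear, dim=0).abs()
```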
| Feature | Bilinear MLP (Baseline) | Standard GLUs (e.g., SwiGLU, GELU) | SQS-GLU (Our Approach) |
| --- | --- | --- | --- |
| Weight-based Interpretability | High (Intrinsic) | Low / Implicit (Activation-driven) | High (Preserved, Enhanced via SQS) |
| Performance (Loss/Accuracy/Perplexity) | Often Lags SOTA | State-of-the-Art | Competitive / On Par with SOTA |
| Data Efficiency | Limited | High | High (Comparable to SOTA) |
| Gradient Stability | N/A (no activation in gate) | Good | Improved (with tunable c, λ, p) |
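As an informal illustration of the gradient-stability row above: the raw quadratic gate f(x) = x² has derivative 2x, which grows without bound, while for p = 1 the SQS derivative works out to (1 + λc) / (1 + λ|x|)² (for x ≠ 0), which stays near 1 for small inputs and decays for large ones. A quick autograd probe (illustrative only, not from the paper) is shown below.

```python
import torch

x = torch.tensor([0.1, 1.0, 10.0, 100.0], requires_grad=True)

# Raw quadratic gate: d/dx x^2 = 2x, so the gradient grows linearly with |x|.
(g_quad,) = torch.autograd.grad((x ** 2).sum(), x)

# SQS with lam=0.5, c=0.01, p=1: the 1 + lam*|x| denominator tempers the gradient.
x2 = x.detach().clone().requires_grad_(True)
abs_x = x2.abs()
sqs = torch.sign(x2) * (abs_x - 0.01) / (1.0 + 0.5 * abs_x)
(g_sqs,) = torch.autograd.grad(sqs.sum(), x2)

print(g_quad)  # 0.2, 2.0, 20.0, 200.0 -- unbounded growth
print(g_sqs)   # close to 1 near zero, shrinking toward 0 for large |x|
```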
Interpretable Eigenfeatures on MNIST & FMNIST
SQS-GLU robustly learns interpretable features on image classification tasks. Our eigenvector decomposition analysis on MNIST and Fashion MNIST reveals top eigenvectors that visually correspond to distinct, recognizable classes (e.g., specific digits, or items like 'trousers', 'pullover', 'dress'). This direct visual correlation underscores the model's transparency. Furthermore, the cosine similarity between SQS-GLU-derived eigenvectors and those from Bilinear MLPs is consistently high (never below 0.5 and often above 0.95 for the most important eigenvectors), confirming that SQS effectively maintains the inherent interpretability of bilinear structures while delivering stronger overall performance than the Bilinear baseline.
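As a rough illustration of how these eigenfeatures can be inspected, the leading eigenvectors for a class can be reshaped back to the input grid and plotted as images; the snippet below assumes 28×28 MNIST/Fashion MNIST inputs and the hypothetical `interaction_eigvectors` helper sketched earlier.

```python
import matplotlib.pyplot as plt

def plot_top_eigenfeatures(eigvals, eigvecs, k: int = 4, side: int = 28):
    """Render the k leading eigenvectors as side x side images (MNIST/FMNIST layout)."""
    fig, axes = plt.subplots(1, k, figsize=(3 * k, 3))
    for i, ax in enumerate(axes):
        # Each eigenvector lives in input space, so it can be viewed as an image.
        ax.imshow(eigvecs[:, i].reshape(side, side).numpy(), cmap="RdBu")
        ax.set_title(f"eig {i}: {eigvals[i].item():+.2f}")
        ax.axis("off")
    plt.tight_layout()
    plt.show()
```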
Key Findings:
- Eigenvectors visually map to distinct semantic features (e.g., specific digits, clothing items).
- High cosine similarity with Bilinear MLP eigenvectors validates preserved interpretability.
- SQS-GLU shows faster convergence and lower loss compared to non-SQS baselines.
- Achieves state-of-the-art performance benchmarks on image datasets.
The Signed Quadratic Shrink (SQS) activation function represents a significant advancement towards intrinsically interpretable neural networks. By enabling Gated Linear Units to learn features that are directly analyzable through weight spectra, SQS offers a compelling solution that not only matches the performance and data efficiency of state-of-the-art activation functions but also addresses the critical enterprise need for AI model transparency. This dual benefit of high performance and inherent interpretability makes SQS-GLUs particularly valuable for enterprise AI applications, where understanding model decisions is paramount for regulatory compliance, building user trust, efficient debugging, and responsible AI deployment.
SQS-GLU's inherent interpretability through weight spectra directly supports regulatory compliance and builds trust in critical enterprise AI deployments by making model decisions understandable.
Achieving state-of-the-art performance while retaining interpretability means enterprises don't have to choose between efficiency and understanding, enabling scalable and reliable AI solutions.
Calculate Your Potential AI ROI
Estimate the financial and operational benefits of implementing interpretable, high-performance AI in your enterprise with SQS-GLU.
Your Path to Interpretable AI Implementation
A phased approach ensures seamless integration and maximum impact of SQS-GLU in your existing enterprise architecture.
Phase 1: Discovery & Strategy
Goal: Define specific AI use cases where interpretability and performance are critical. Assess current infrastructure and data readiness. Develop a tailored SQS-GLU adoption roadmap.
Phase 2: Pilot Development & Testing
Goal: Implement SQS-GLU in a controlled pilot project. Train models on your specific datasets, validate interpretability of eigenfeatures, and benchmark performance against existing solutions.
Phase 3: Integration & Optimization
Goal: Integrate SQS-GLU models into your production environment. Optimize hyperparameters and model architecture for peak performance and maintainability. Establish monitoring for interpretability metrics.
Phase 4: Scaling & Continuous Improvement
Goal: Expand SQS-GLU deployment across relevant enterprise applications. Implement feedback loops for continuous model improvement, retraining, and feature evolution, ensuring long-term value and transparency.
Ready to Build Trustworthy & Performant AI?
The future of enterprise AI lies in models that are not only powerful but also transparent. Partner with us to integrate SQS-GLU and unlock new levels of interpretability, reliability, and performance in your AI systems.