Enterprise AI Analysis: In-N-Out: A Parameter-Level API Graph Dataset for Tool Agents

Research Analysis

In-N-Out: A Parameter-Level API Graph Dataset for Tool Agents

Authors: Seungkyu Lee, Nalim Kim, Yohan Jo

Abstract: Tool agents—LLM-based systems that interact with external APIs—offer a way to execute real-world tasks. However, as tasks become increasingly complex, these agents struggle to identify and call the correct APIs in the proper order. To tackle this problem, we investigate converting API documentation into a structured API graph that captures API dependencies and leveraging it for multi-tool queries that require compositional API calls. To support this, we introduce In-N-Out, the first expert-annotated dataset of API graphs built from two real-world API benchmarks and their documentation. Using In-N-Out significantly improves performance on both tool retrieval and multi-tool query generation, nearly doubling that of LLMs using documentation alone. Moreover, graphs generated by models fine-tuned on In-N-Out close 90% of this gap, showing that our dataset helps models learn to comprehend API documentation and parameter relationships. Our findings highlight the promise of using explicit API graphs for tool agents and the utility of In-N-Out as a valuable resource. We will release the dataset and code publicly.

Executive Impact & Key Findings

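In-N-Out gives tool agents an explicit, parameter-level map of API dependencies. In the paper's experiments, using it nearly doubles performance on tool retrieval and multi-tool query generation compared with LLMs that rely on documentation alone.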

Deep Analysis & Enterprise Applications

The In-N-Out Dataset: Foundation for Smarter AI Agents

The In-N-Out dataset is the first expert-annotated collection of API graphs, built from two real-world API benchmarks: AppWorld and NESTful. Both APIs and their parameters are nodes, and directed edges mark valid input-output dependencies, that is, cases where an output of one API can serve as an input to another. The dataset spans 550 APIs and over 30,000 parameter-level edges, covering diverse and realistic tool-use scenarios.
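The graph structure described above can be sketched as a small data structure. This is a minimal illustration, not the dataset's actual schema: the class names, the `direction` field, and the `density` helper are assumptions, and the example edges borrow API names from the process flow later in this page with assumed output names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Param:
    """A parameter node, identified by its owning API, name, and direction."""
    api: str
    name: str
    direction: str  # "in" for inputs, "out" for outputs


@dataclass
class ApiGraph:
    """Directed graph whose edges run from an output parameter to an input parameter."""
    apis: set[str] = field(default_factory=set)
    edges: set[tuple[Param, Param]] = field(default_factory=set)

    def add_edge(self, src_api, src_out, dst_api, dst_in):
        src = Param(src_api, src_out, "out")
        dst = Param(dst_api, dst_in, "in")
        self.apis.update({src_api, dst_api})
        self.edges.add((src, dst))

    def api_pairs(self):
        """Collapse parameter-level edges to (producer API, consumer API) pairs."""
        return {(s.api, d.api) for s, d in self.edges}

    def density(self):
        """Fraction of ordered API pairs (excluding self-pairs) with at least one edge."""
        n = len(self.apis)
        possible = n * (n - 1)
        connected = {p for p in self.api_pairs() if p[0] != p[1]}
        return len(connected) / possible if possible else 0.0


# Assumed edges: the output names are illustrative, not taken from the dataset.
graph = ApiGraph()
graph.add_edge("Login", "access_token", "FollowArtist", "access_token")
graph.add_edge("SearchArtists", "artist_id", "FollowArtist", "artist_id")
print(sorted(graph.api_pairs()))
print(round(graph.density(), 3))
```

Keeping edges at the parameter level and collapsing them to API pairs on demand means one structure can answer both "which APIs connect?" and "through which parameters?". The `density` helper is one plausible way to measure how few of the possible API pairs are connected.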

The dataset was built with a multi-stage pipeline: documentation refinement; rule-based, semantic, and context-aware filtering; and finally human annotation. The result is a set of high-quality, parameter-level connections that capture the subtle interdependencies an agent needs for planning. The pipeline also addresses the ambiguity and noise in raw API documentation that often hinders LLM comprehension.

0.7% to 1.7%: Share of All Possible Pairs That Are Actual API Connections

Empowering LLMs to Understand API Dependencies

LLMs often struggle to infer parameter-level API connections zero-shot from raw documentation because of ambiguities and inconsistent terminology. The paper's experiments show that even a high-performing closed-source model like GPT-4.1 achieves only 70.0% accuracy on NESTful and 47.7% on AppWorld on this task.

Fine-tuning LLMs on the In-N-Out dataset leads to significant improvements: Qwen2.5-32B reaches 76.3% on NESTful and 94.7% on AppWorld. The fine-tuned models also generalize to unseen APIs and to a different dataset. In the cross-dataset setting, the same model scores 74.3% on NESTful and 66.3% on AppWorld, above its zero-shot results of 57.3% and 50.3%. This suggests the models learn general principles of parameter dependencies rather than dataset-specific patterns, and it underscores In-N-Out's value for training robust graph-construction modules for tool agents.

Model setting                             NESTful (%)   AppWorld (%)
Zero-shot (GPT-4.1)                       70.0          47.7
Zero-shot (Qwen2.5-32B)                   57.3          50.3
Fine-tuned (Qwen2.5-32B)                  76.3          94.7
Fine-tuned (Qwen2.5-32B, cross-dataset)   74.3          66.3

Dramatic Boost in AI Tool Agent Capabilities

The explicit, parameter-level API graphs in In-N-Out improve tool agent performance in complex multi-tool scenarios. For tool retrieval, integrating graph information into the ranking process raises accuracy: on NESTful, the average rank of the correct API improves from 3.1 to 1.6, and Top-1 accuracy rises from 43.3% to 84.3% when using gold graphs. Graphs generated automatically by fine-tuned LLMs also provide substantial gains, closing about 90% of the gap between documentation-only and gold-graph performance.
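One way to picture "integrating graph information into the ranking process" is a re-ranking step that boosts candidates the graph says can consume an output of the previously called API. The sketch below is an assumption for illustration, not the paper's method: the additive `bonus`, the function name `rerank`, and the example scores are all hypothetical.

```python
def rerank(scores, graph_edges, previous_api, bonus=0.5):
    """Re-rank candidate APIs, boosting those that can consume an output of previous_api.

    scores: dict of candidate API name -> base retrieval score (higher is better).
    graph_edges: set of (producer_api, consumer_api) pairs from the API graph.
    bonus: hypothetical additive boost; a real system would tune or learn it.
    """
    adjusted = {
        api: score + (bonus if (previous_api, api) in graph_edges else 0.0)
        for api, score in scores.items()
    }
    # Sort by descending adjusted score, breaking ties alphabetically.
    return sorted(adjusted, key=lambda api: (-adjusted[api], api))


# Illustrative inputs only; these scores are made up.
edges = {("Login", "FollowArtist"), ("SearchArtists", "FollowArtist")}
base_scores = {"FollowArtist": 0.40, "SearchArtists": 0.55, "Login": 0.50}
print(rerank(base_scores, edges, previous_api="SearchArtists"))
```

With these made-up scores, `FollowArtist` ranks last on text similarity alone. It moves to the top once the graph shows that it accepts an output of `SearchArtists`.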

Similarly, for multi-tool query generation, In-N-Out provides strong structural guidance for selecting valid API subsets that match specific dependency patterns. For a 3-API chain, precision on NESTful rises from 58.2% to 90.2% with gold graphs, and automated graphs recover a large portion of this gain. These results demonstrate the critical role of explicit API graphs in overcoming challenges like cross-domain reasoning and complex API chaining for tool agents.
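Selecting API subsets that match a dependency pattern such as a 3-API chain amounts to enumerating paths in the graph. The following is a minimal sketch under that assumption; `find_chains` and the placeholder API names "A" through "D" are hypothetical and not drawn from the dataset.

```python
from collections import defaultdict


def find_chains(api_edges, length=3):
    """Return all simple paths of `length` APIs that follow directed graph edges."""
    successors = defaultdict(set)
    for producer, consumer in api_edges:
        successors[producer].add(consumer)

    chains = []

    def extend(path):
        if len(path) == length:
            chains.append(tuple(path))
            return
        for nxt in sorted(successors.get(path[-1], ())):
            if nxt not in path:  # keep each API at most once per chain
                extend(path + [nxt])

    # Only APIs that produce something can start a chain.
    for start in sorted({producer for producer, _ in api_edges}):
        extend([start])
    return chains


# Placeholder graph: A feeds B and C, B feeds C, C feeds D.
api_edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
print(find_chains(api_edges))
```

Each returned chain is a candidate API subset that a query generator could be asked to write a multi-tool query for, with the graph guaranteeing that every consecutive pair is connected.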

Up to 84.3% Tool Retrieval Top-1 Accuracy with Gold Graphs

Streamlining Complex Enterprise Workflows with API Graphs

Enterprise applications often require AI agents to orchestrate a sequence of API calls to fulfill complex user requests. In-N-Out provides the structured knowledge agents need to identify dependencies accurately and plan these calls. The flow below shows a typical multi-tool sequence in which a parameter-level API graph tells the agent which earlier outputs each call depends on.

Enterprise Process Flow

Login(username, password)
SearchArtists(genre='EDM', min_follower_count='1000')
FollowArtist(access_token, artist_id)
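Given parameter-level edges, an agent can derive a valid call order for the flow above with a topological sort. This is a sketch under stated assumptions: the flow only shows each call's inputs, so the output names on `Login` and `SearchArtists` are assumed, and `plan_calls` is a hypothetical helper. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Parameter-level edges: (producer API, output name) -> (consumer API, input name).
# The output names are assumptions; the flow above only shows each call's inputs.
param_edges = [
    (("Login", "access_token"), ("FollowArtist", "access_token")),
    (("SearchArtists", "artist_id"), ("FollowArtist", "artist_id")),
]


def plan_calls(apis, edges):
    """Return an execution order in which every producer runs before its consumers."""
    predecessors = {api: set() for api in apis}
    for (producer, _), (consumer, _) in edges:
        predecessors[consumer].add(producer)
    # TopologicalSorter expects a mapping of node -> nodes that must come first.
    return list(TopologicalSorter(predecessors).static_order())


order = plan_calls(["Login", "SearchArtists", "FollowArtist"], param_edges)
print(order)
```

`Login` and `SearchArtists` have no dependency on each other under these assumed edges, so either may run first. `FollowArtist` always comes last because it needs both the token and the artist ID.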

Your AI Implementation Roadmap

A structured approach to integrating In-N-Out's methodology into your enterprise for maximum impact.

Phase 01: Discovery & API Audit

In-depth analysis of your existing enterprise APIs and workflows to identify key integration points and opportunities for automation with graph-based AI agents.

Phase 02: In-N-Out Data Ingestion & Graph Construction

Leverage In-N-Out's principles to refine API documentation and construct robust parameter-level API graphs tailored to your specific enterprise environment.

Phase 03: LLM Fine-tuning & Agent Development

Train state-of-the-art LLMs on your custom API graphs to build highly accurate tool agents capable of complex multi-tool query generation and execution.

Phase 04: Deployment & Integration

Seamlessly integrate the AI tool agents into your existing systems and applications, ensuring smooth operation and minimal disruption to ongoing workflows.

Phase 05: Monitoring, Optimization & Scaling

Continuous monitoring of agent performance, iterative optimization based on real-world feedback, and strategic scaling to unlock further efficiencies across your organization.

Ready to Revolutionize Your Enterprise with AI Tool Agents?

Unlock the full potential of your API ecosystem and empower your teams with intelligent automation. Schedule a personalized consultation to see how In-N-Out's approach can be tailored to your business needs.
