AI incident investigation frameworks are emerging as the next critical frontier in AI governance, driven by a simple observation: aviation became the safest form of mass transportation not by building perfect aircraft, but by building perfect systems for learning from imperfect ones. A new policy framework from the Aspen Policy Academy, published in March 2026, is now asking whether artificial intelligence needs that same discipline, and whether states should be the ones to build it first.
Judging by where AI incidents are already occurring, the answer is that such a system is overdue.
What Is Actually Going Wrong With AI Right Now
AI system failures are not hypothetical edge cases being stress-tested in research labs. They are documented, real-world events already producing financial, physical, and societal consequences across 4 active deployment categories: healthcare, legal practice, financial services, and public infrastructure.
In February 2024, a Canadian tribunal ruled that Air Canada was legally bound to honor incorrect bereavement fare information its AI-powered customer service chatbot had given a passenger, establishing that AI output carries organizational liability regardless of internal disclaimers. In 2023, insurer Cigna’s automated claims-review system was found to be denying claims en masse, with physicians spending an average of 1.2 seconds reviewing each AI-generated denial. A New York lawyer submitted AI-generated legal citations in federal court in 2023, 6 of which referenced entirely fabricated case law. The judge sanctioned the attorney $5,000.
These 3 incidents share a single structural feature: no formal investigation process existed to extract systemic lessons from any of them.
The Aspen Framework: Aviation Safety Logic Applied to AI
The Aspen Policy Academy’s incident investigation framework targets Utah’s Office of Artificial Intelligence Policy, which runs one of the only statewide AI regulatory sandboxes currently operating in the United States. Utah’s Regulatory Relief program grants compliance exemptions to AI companies whose tools demonstrate potential public benefit, creating a live testing environment where AI failures carry real consequences for real users.
The framework proposes 3 foundational investigation requirements:
1. Defined Incident Triggers:
AI incidents requiring formal investigation must be classified before they occur, not after. The framework establishes “GenAI incidents” as cases where AI systems cause direct harm through their development, deployment, or outputs, covering biased decision-making, unsafe recommendations, and failures with financial, physical, or societal repercussions. (A sketch of how such a trigger definition might be encoded follows this list.)
2. Root-Cause Analysis Over Enforcement:
The investigation model follows aviation and healthcare safety practices, emphasizing why failures occur rather than who to punish. Aspen Policy Academy fellow Michelle Sipics, who authored the report, framed the distinction directly: aviation safety improved over the decades because investigation findings fed back into pilot training, air traffic control procedures, aircraft maintenance operations, and cockpit design, not because regulators assigned blame faster.
3. Public Disclosure Commitments:
Companies participating in Utah’s sandbox would sign pledges committing to publicly share investigation findings, modeled on National Transportation Safety Board incident reports. Transparency functions as both an accountability mechanism and an industry-wide learning infrastructure, allowing every organization to benefit from failures it did not directly experience.
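To make the first requirement concrete, here is a minimal sketch, in Python, of what a machine-readable incident trigger definition could look like. Everything in it, the field names, the `IncidentReport` class, and the `triggers_investigation` rule, is a hypothetical illustration, not part of the Aspen framework itself.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class HarmCategory(Enum):
    """Harm types named in the 'GenAI incident' definition above."""
    FINANCIAL = "financial"
    PHYSICAL = "physical"
    SOCIETAL = "societal"


class IncidentSource(Enum):
    """Lifecycle stage the harm traces back to, per the definition."""
    DEVELOPMENT = "development"
    DEPLOYMENT = "deployment"
    OUTPUT = "output"


@dataclass
class IncidentReport:
    """A minimal structured record for a reportable GenAI incident."""
    system_name: str
    occurred_on: date
    source: IncidentSource
    harms: list[HarmCategory]
    description: str
    users_affected: int | None = None  # None when the blast radius is unknown

    def triggers_investigation(self) -> bool:
        # Hypothetical rule: any documented harm category is enough
        # to trigger a formal investigation.
        return len(self.harms) > 0


# Example: the Air Canada chatbot ruling, encoded as a report.
report = IncidentReport(
    system_name="airline-support-chatbot",
    occurred_on=date(2024, 2, 14),  # tribunal decision date
    source=IncidentSource.OUTPUT,
    harms=[HarmCategory.FINANCIAL],
    description=(
        "Chatbot stated a bereavement fare policy the airline did not "
        "offer; a tribunal held the airline liable for the output."
    ),
)
assert report.triggers_investigation()
```

The exercise illustrates the first requirement’s core claim: if a record like this cannot be filled out before an incident occurs, the trigger was never really defined.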
Why Aviation’s Safety Record Is Harder to Copy Than It Sounds
Aviation’s current accident rate did not emerge from good intentions and safety principles documents. It emerged from 4 specific institutional structures built over 5 decades of painful, fatal lessons.
| Aviation Safety Component | What It Actually Required | AI Equivalent Status |
| --- | --- | --- |
| Non-Punitive Reporting (ASRS) | NASA administration, legal immunity, and FAA enforcement restraint | Does not exist at scale |
| Independent Investigation (NTSB) | Statutory authority, subpoena power, separation from regulator | Does not exist |
| Cross-Organizational Data Sharing (CAST) | De-identification protocols, legal protections, competitor trust | Does not exist |
| Safety Management Systems (SMS) | Executive accountability, board visibility, organizational authority | Rare; mostly advisory teams |
Philip Mann, a 17-year FAA veteran writing on AI governance, identified the core problem in March 2026: organizations invoking aviation safety almost universally describe the outputs (low accident rates, transparent investigations, shared data) without building the structures that produced those outputs. Non-punitive reporting requires a channel administered with genuine independence, written protections reporters trust, and an organizational commitment to never use incident data punitively. The first time a report triggers disciplinary action, the channel dies permanently. Trust, Mann noted, is the load-bearing element, and trust is perishable.
The 1977 Tenerife disaster, where 2 Boeing 747s collided on a runway, killing 583 people, produced Crew Resource Management training, cockpit authority gradient reforms, and standardized communication protocols. Aviation extracted 583 deaths’ worth of systemic learning from that single incident. AI governance currently extracts nothing systematic from incidents involving millions of affected users.
The 3 Places Where the Aviation Analogy Breaks Down
Aviation’s regulatory architecture transfers to AI with 3 significant structural limitations that the Aspen framework and its proponents acknowledge directly.
1. Industry Concentration vs. AI Fragmentation:
Commercial aviation concentrates around a small number of large manufacturers, airlines, and regulators operating under a single international framework: ICAO standards, Annex 13 investigation protocols, and bilateral safety agreements. AI deployment is radically fragmented: open-source models, startups, cloud infrastructure providers, domain-specific fine-tuned systems, and models running on individual laptops. Applying aviation’s regulatory model to thousands of independent AI actors with minimal capital requirements is not a scaling challenge. It is a category mismatch.
2. Visible Accidents vs. Invisible AI Harm:
Aviation accidents are visible and dramatic. A crash generates an immediate investigation. AI harms are diffuse, slow-moving, and frequently contested: biased hiring decisions compounding over years, privacy intrusions accumulating across millions of individual interactions, and subtle degradation of information quality across entire ecosystems. Defining what constitutes a serious AI incident is genuinely difficult in ways that defining an aviation accident is not.
3. Unified International Governance vs. Fragmented AI Regulation:
International AI governance remains fragmented across the EU AI Act, national strategies, voluntary frameworks, and sectoral rules that frequently conflict. Aviation built its international safety architecture through ICAO over decades of aligned political will. That institutional alignment does not currently exist for AI.
Final Thoughts
The aviation safety analogy is correct in its diagnosis and genuinely difficult in its prescription. AI systems are causing documented harm across healthcare, legal practice, financial services, and public infrastructure, right now, without formal investigation processes, without independent investigative authority, and without cross-organizational learning mechanisms that would prevent the same failures from recurring across different deployments.
The Aspen framework’s Utah sandbox focus is the honest starting point: build the process where the regulatory authority to test it already exists, demonstrate that transparent investigation increases rather than destroys public trust, then expand from evidence rather than mandate.
The contradictory position is equally honest. Aviation’s safety culture required Tenerife. It required 583 deaths and decades of institutional construction to reach the point where the phrase “aviation safety” carries genuine meaning. Sipics acknowledged in her interview that federal-level AI incident investigation is “a ways off.” Mann was more direct: building aviation-equivalent AI safety infrastructure requires legislation, sustained funding, and a willingness from AI companies to accept external technical scrutiny they now actively avoid.
The framework is right. The urgency is right. The timeline is the uncomfortable variable nobody in this conversation controls.
AI governance is moving from principles to policy, and the decisions being made right now will shape how AI failures are handled for decades. Subscribe to The IT Horizon newsletter. We track every regulatory development, governance framework, and real-world AI incident so you stay informed before the next one makes headlines.