The content on this page was provided by an independent third party and syndicated by XPR Media. Members of the editorial and news staff of the USA TODAY Network were not involved in the creation of this content.

Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real-World SRE Tasks

New benchmark shows top LLMs achieve only 29% pass rate on OpenTelemetry instrumentation, exposing the gap between coding ability and real-world SRE work.

“OTelBench shows that while LLMs are impressive at generating code snippets, they’re not yet capable of the cross-cutting reasoning required for production engineering.”

— Jacek Migdał, founder of Quesma

WARSAW, POLAND, January 20, 2026 /EINPresswire.com/ — Quesma, Inc. announced the release of OTelBench, the first comprehensive benchmark for evaluating LLMs on OpenTelemetry instrumentation tasks. The open-source dataset tests 14 state-of-the-art models across 23 real-world tasks in 11 programming languages, revealing significant gaps in AI’s ability to handle production-grade Site Reliability Engineering (SRE) work.

While frontier LLMs have demonstrated impressive coding capabilities, the benchmark reveals a stark reality: the best-performing model, Claude Opus 4.5, achieved only a 29% pass rate on OpenTelemetry instrumentation tasks, compared with an 80.9% pass rate on SWE-Bench. This gap highlights a critical distinction between writing code and performing the complex, cross-cutting engineering work required for production systems.

The $1.4 Million Per Hour Problem
Enterprise outages cost an average of $1.4 million per hour, making production visibility mission-critical. Distributed tracing, the gold standard for debugging complex microservices, allows teams to link user actions to every underlying service call. However, implementing this visibility remains difficult, with 39% of organizations citing complexity as their top observability obstacle. OpenTelemetry has emerged as the industry standard with backing from 1,100+ organizations, yet configuring it correctly remains a major source of toil for SRE teams.

Fundamental Limitations Exposed
The benchmark tested models on agentic coding tasks where they were given source code from realistic applications, an interactive Linux terminal, and clear instrumentation objectives. The results revealed several critical failure modes:

Context propagation, passing trace context between services to maintain parent-child span relationships, proved to be an insurmountable barrier for most models. This is particularly concerning because context propagation is fundamental to distributed tracing.
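To make the idea concrete, here is a minimal sketch of context propagation written with the Python standard library only. It is an illustration, not the OpenTelemetry SDK and not code from OTelBench: the names `SpanContext`, `start_span`, `inject`, `extract`, and `handle_request` are hypothetical. The only borrowed detail is the W3C Trace Context `traceparent` header layout (`version-traceid-parentid-flags`), which OpenTelemetry uses by default.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# W3C Trace Context header: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                    # 32 hex chars, shared by every span in a trace
    span_id: str                     # 16 hex chars, unique per span
    parent_id: Optional[str] = None  # span_id of the caller; None for a root span


def start_span(parent: Optional[SpanContext] = None) -> SpanContext:
    """Start a root span, or a child span that inherits the parent's trace_id."""
    span_id = secrets.token_hex(8)
    if parent is None:
        return SpanContext(trace_id=secrets.token_hex(16), span_id=span_id)
    return SpanContext(trace_id=parent.trace_id, span_id=span_id,
                       parent_id=parent.span_id)


def inject(ctx: SpanContext, headers: dict) -> None:
    """Caller side: write the current span's identity into outgoing headers."""
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-01"


def extract(headers: dict) -> Optional[SpanContext]:
    """Callee side: recover the caller's context, or None if absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, _flags = match.groups()
    return SpanContext(trace_id=trace_id, span_id=span_id)


def handle_request(headers: dict) -> SpanContext:
    """Hypothetical downstream handler: continue the caller's trace if one was sent."""
    return start_span(parent=extract(headers))


# Service A starts a trace and calls service B with the header attached.
root = start_span()
outgoing = {}
inject(root, outgoing)
child = handle_request(outgoing)

# Service B called without the header silently starts an unrelated trace.
orphan = handle_request({})
```

In this sketch, `child` shares `root`'s trace ID and records `root`'s span ID as its parent, while `orphan` belongs to a separate trace. Forgetting the `inject` step on a single outgoing call is enough to break the parent-child chain, which is why the task requires coordinated changes across every service boundary.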

“The backbone of the software industry consists of complex, high-scale production systems with mission-critical reliability, and seasoned engineers are architecting, evolving, and troubleshooting them,” said Jacek Migdał, founder of Quesma. “OTelBench shows that while LLMs are impressive at generating code snippets, they’re not yet capable of the cross-cutting reasoning and sustained problem-solving required for production engineering. This gap matters because many vendors are marketing AI SRE solutions with bold claims but no independent verification. We need benchmarks like this to separate reality from hype.”

Language Ecosystems Matter
Success rates varied dramatically across programming languages, revealing that AI generalization is far weaker than that of human engineers. Models had moderate success with Go and, quite surprisingly, C++. A few tasks were completed in JavaScript, PHP, .NET, and Python. Just a single model solved a single task in Rust. None of the models solved a single task in Swift, Ruby, or, to our biggest surprise, Java (due to a build issue).

Why This Matters for AI Development
OTelBench reveals several reasons why OpenTelemetry instrumentation challenges current LLMs:
– Reliability-critical applications reside in private repositories at companies like Apple, Airbnb, and Netflix, limiting training data.
– Instrumentation requires cross-cutting changes across codebases, rather than sequential additions.
– Some tasks required 50+ commands over 10+ minutes. Models consistently performed worse as tasks lengthened.

Migdał added, “AI SRE in 2026 is what DevOps Anomaly Detection was in 2016—lots of marketing, huge budgets, but lacking independent benchmarks. Just as SWE-Bench became the standard for coding evaluation, we need SRE-style benchmarks to determine what actually works. That’s why we’re releasing OTelBench as open-source: to create a North Star for navigating the AI hype and to enable the community to track real progress.”

A Path Forward
Despite the challenges, the benchmark reveals promising signals. Claude Opus 4.5, GPT-5.2, and Gemini 3 models show capability on specific tasks, with the go-otel-microservices-traces task reaching a 52% pass rate. With more environments for Reinforcement Learning with Verifiable Rewards, OpenTelemetry instrumentation appears to be a solvable problem for future AI systems.

Until then, organizations requiring distributed tracing across services should expect to write that code themselves—or work alongside AI assistants that understand their limitations.

OTelBench is available today as an open-source project at https://quesma.com/benchmarks/otel/, enabling researchers and practitioners to reproduce results and contribute additional test cases.

Lucie Šimečková
Quesma
press@quesma.com
