BMW Group, AIM Intelligence expose AI policy compliance gaps

By AI, Created 15:00 UTC, Jun 23, 2026, AGP

BMW Group and AIM Intelligence say a new framework called COMPASS shows enterprise AI systems can pass standard safety tests while still failing to follow organization-specific rules. The research, now accepted to ACL 2026 and available on arXiv, is meant to help companies measure whether large language models can actually enforce internal policies in real deployments.

Why it matters:
- Enterprise AI systems are moving into customer-facing work in healthcare, finance, automotive, government and other regulated sectors.
- The research says standard safety benchmarks miss a major risk: models can look safe in general while still failing to enforce company rules.
- That creates exposure in policy-critical deployments where a wrong refusal or approval can have legal, operational or brand consequences.

What happened:
- BMW Group and AIM Intelligence announced COMPASS, short for Company/Organization Policy Alignment Assessment.
- The framework is designed to test whether large language models comply with organization-specific policies.
- The research was accepted to ACL 2026, the annual conference of the Association for Computational Linguistics.
- The full paper, titled "COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs," is available on arXiv.
- The collaboration includes AIM Intelligence, BMW Group, Yonsei University, Pohang University of Science and Technology and Seoul National University.

The details:
- COMPASS tests models across four dimensions: policy selection, policy interpretation, conflict resolution and justification.
- Policy selection checks whether a model can identify which rule applies to a specific situation.
- Policy interpretation tests reasoning through conditionals, exceptions and vague clauses.
- Conflict resolution measures whether a model handles clashing rules the way an organization intends.
- Justification checks whether a model grounds decisions in the actual policy text.
- The research team ran COMPASS across eight industry scenarios: automotive, government, financial, healthcare, travel, telecom, education and recruiting.
- The team generated and validated 5,920 queries covering routine compliance and adversarial robustness.
- Fifteen state-of-the-art models were evaluated, including proprietary and open-source systems.
- The benchmark datasets and framework are publicly available on GitHub and HuggingFace.
- The research found strong allowlist compliance, with models handling legitimate requests at over 95% accuracy.
- The research found critical denylist failures, with models failing to correctly refuse prohibited requests in up to 97% of cases.
- Under adversarial conditions, some models refused fewer than 5% of policy-violating requests.
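
For readers who want to see how the allowlist and denylist figures relate, here is a minimal Python sketch of how the two rates could be computed from labeled test results. It is an illustration only, not the COMPASS code: the `Result` record, the `policy_rates` function and the sample rows are hypothetical, and it assumes the simplest possible scoring, where answering an allowed request and refusing a prohibited one both count as correct. The paper's actual scoring may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Result:
    """One evaluated query (hypothetical record layout, not the COMPASS schema)."""
    model: str
    scenario: str   # e.g. "automotive"
    expected: str   # "allow" or "deny" under the organization's policy
    refused: bool   # True if the model declined the request


def policy_rates(results: List[Result]) -> Tuple[Optional[float], Optional[float]]:
    """Return (allowlist accuracy, denylist refusal rate).

    Assumption: an allowed request is handled correctly when the model
    does not refuse it, and a prohibited request is handled correctly
    when the model refuses it. A rate is None if that group is empty.
    """
    allow = [r for r in results if r.expected == "allow"]
    deny = [r for r in results if r.expected == "deny"]
    allow_acc = sum(not r.refused for r in allow) / len(allow) if allow else None
    deny_rate = sum(r.refused for r in deny) / len(deny) if deny else None
    return allow_acc, deny_rate


# Made-up rows for illustration; these are not results from the paper.
sample = [
    Result("model-a", "automotive", "allow", False),
    Result("model-a", "automotive", "allow", False),
    Result("model-a", "automotive", "deny", False),
    Result("model-a", "automotive", "deny", True),
]

allow_acc, deny_rate = policy_rates(sample)
print(f"allowlist accuracy: {allow_acc:.0%}, denylist refusal rate: {deny_rate:.0%}")
```

Reporting the two rates separately is what exposes the gap the research describes: a model can score well on permitted requests while refusing very few prohibited ones, and a single blended accuracy number would hide that.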

Between the lines:
- The findings suggest enterprise AI evaluation needs to move beyond generic safety tests focused on toxicity or violence.
- The research frames alignment as an engineering problem that can be measured against real organizational rules.
- The results also suggest scaling models alone will not solve policy enforcement failures.

What's next:
- Organizations can use COMPASS and the released datasets to test their own AI systems against internal policies.
- The framework may push enterprises to demand policy-specific evaluations before deploying LLMs in sensitive workflows.
- AIM Intelligence says it continues to work on automated red-teaming, real-time guardrails and AI monitoring for large language models, multimodal systems, autonomous agents and physical AI.

The bottom line:
- The research says many LLMs are still unreliable at following the rules enterprises care about most, even when they look safe on standard benchmarks.

Disclaimer: This article was produced by AGP Wire with the assistance of artificial intelligence based on original source content and has been refined to improve clarity, structure, and readability. This content is provided on an “as is” basis. While care has been taken in its preparation, it may contain inaccuracies or omissions, and readers should consult the original source and independently verify key information where appropriate. This content is for informational purposes only and does not constitute legal, financial, investment, or other professional advice.
