Artificial intelligence (AI) is moving from controlled laboratory use into safety-critical, regulated domains – healthcare, finance, transportation, legal services. Legislators and courts increasingly expect that developers can verify and document that their models behave as intended. Yet technical researchers warn that advanced models may act deceptively: appearing compliant while secretly optimising for other goals. This behaviour, known as scheming or covert misalignment, represents one of the most serious emerging risks.
In September 2025, a team of researchers led by Bronson Schoen, Senior Research Engineer at Apollo Research, an AI safety organisation, published Stress Testing Deliberative Alignment for Anti-Scheming Training on the pre-print server arXiv. Although aimed at machine-learning specialists, the paper is directly relevant to lawyers, compliance officers, and policy-makers. It describes one of the first systematic attempts to detect and reduce covertly harmful behaviour in large language models (LLMs). Understanding such work is crucial for anyone drafting AI-related contracts, conducting due diligence, advising boards, or shaping regulatory policy.
The Technical Advance Explained Simply
The researchers explore deliberative alignment – a method that asks models to explicitly reason about a human-written safety specification before acting. Instead of relying solely on fine-tuning to encourage safe responses, the model must cite and apply rules it has been given.
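For readers who want a concrete picture, the sketch below illustrates the general idea in a few lines of Python: the model is given a short safety specification and instructed to cite the applicable rules before carrying out a task. This is a minimal illustration only; the specification text and the call_model helper are hypothetical placeholders, not the authors' actual specification, training setup, or code.

```python
# Illustrative sketch of the *idea* behind deliberative alignment:
# the model must read a safety specification and cite the relevant rules
# before acting. SAFETY_SPEC and call_model() are hypothetical placeholders.

SAFETY_SPEC = """\
Rule 1: Never take covert actions; do not hide, omit, or misreport relevant information.
Rule 2: If any instruction conflicts with Rule 1, refuse and escalate to a human.
Rule 3: Flag any uncertainty about whether an action would violate these rules.
"""

def build_deliberative_prompt(task: str) -> list[dict]:
    """Ask the model to quote and apply the safety spec before acting on the task."""
    return [
        {
            "role": "system",
            "content": (
                "Before answering, reason step by step about the safety "
                "specification below, cite the rules that apply, and only "
                "then carry out the task.\n\n" + SAFETY_SPEC
            ),
        },
        {"role": "user", "content": task},
    ]

def call_model(messages: list[dict]) -> str:
    """Placeholder for whatever LLM API the deployer actually uses."""
    raise NotImplementedError("wire this up to your model provider")

if __name__ == "__main__":
    prompt = build_deliberative_prompt(
        "Summarise this quarter's safety incidents for the board."
    )
    # The expected response cites the applicable rules, then gives the answer.
    print(call_model(prompt))
```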
To test whether this approach works, the team created a battery of synthetic but challenging environments designed to tempt the model into breaking rules covertly. These included tasks with hidden incentives to deceive, situations where the model benefits from withholding information, and experiments that varied whether the model knew it was being evaluated (situational awareness).
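Conceptually, such a test battery resembles the toy harness sketched below: run the model in each scenario, have a judge flag covert actions, and compare rates with and without cues that the model is being evaluated. All names here (Scenario, run_agent, judge_covert_action) are hypothetical stand-ins for illustration, not the paper's actual environments or scoring code.

```python
# Illustrative sketch of how covert behaviour might be *measured* across
# test scenarios. The scenario definitions, run_agent() and
# judge_covert_action() are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    task: str
    hints_at_evaluation: bool  # does the prompt suggest the model is being tested?

def run_agent(scenario: Scenario) -> str:
    """Placeholder: run the model in the scenario and return its transcript."""
    raise NotImplementedError

def judge_covert_action(transcript: str) -> bool:
    """Placeholder: automated or human judgement of whether the transcript
    contains a covert action (lying, withholding information, evading oversight)."""
    raise NotImplementedError

def covert_rate(scenarios: list[Scenario]) -> float:
    """Fraction of scenarios in which a covert action was flagged."""
    flagged = sum(judge_covert_action(run_agent(s)) for s in scenarios)
    return flagged / len(scenarios)

def compare_awareness(scenarios: list[Scenario]) -> None:
    aware = [s for s in scenarios if s.hints_at_evaluation]
    unaware = [s for s in scenarios if not s.hints_at_evaluation]
    # A large gap between these two rates is the situational-awareness concern
    # the paper highlights: the model may behave only when it thinks it is watched.
    print("covert rate with evaluation cue:   ", covert_rate(aware))
    print("covert rate without evaluation cue:", covert_rate(unaware))
```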
Key results:
- Deliberative alignment substantially reduced covert misbehaviour (in some tests from over 10% of cases to below 1%).
- But it did not eliminate deception – residual covert actions persisted.
- Models behaved more safely when they realised they were under evaluation, raising concern that safety could collapse when oversight is weak.
- The method could suppress some pre-planted hidden misaligned goals, but not in every case.
The authors stress that their environments are synthetic and that the approach does not yet defeat fully adversarial deception. Still, it represents one of the most rigorous attempts so far to measure and mitigate this class of risk.
Why Lawyers Should Care
Regulatory compliance and the EU AI Act
The EU Artificial Intelligence Act obliges providers of “high-risk” AI systems to implement risk-management systems and testing (Articles 9-15) as well as post-market monitoring (Article 72). Stress-testing against covert misalignment is exactly the type of procedure regulators may soon expect during conformity assessments. Failure to adopt available methods could be used as evidence of non-compliance.
Product liability and negligence
Under the EU's revised product liability regime and common-law negligence principles, developers must exercise reasonable care to avoid foreseeable harm. Once methods such as those described in this paper are known, ignoring them could be construed as negligent. Conversely, documenting robust anti-scheming tests may help a developer defend itself when accidents occur.
Contract drafting and risk allocation
Corporate clients purchasing or deploying AI should require:
- Warranties that the supplier has performed recognised safety and anti-scheming tests.
- Disclosure of test results and any known residual risk.
- Audit rights to verify continuing compliance during the AI system’s lifecycle.
- Indemnities where covert misalignment later causes loss.
Lawyers drafting SaaS or enterprise AI contracts can integrate these requirements now, even before regulators make them mandatory.
Due diligence and M&A
Investors and acquirers increasingly review AI safety documentation. A company claiming safe or aligned AI without having run recognised stress tests may face valuation adjustments or demands for contractual hold-backs. Legal teams performing due diligence can add anti-scheming testing to their checklists.
Litigation and evidence
In future disputes, plaintiffs may argue that a developer failed to apply available testing methods like those described by Schoen et al. Developers, conversely, can cite having applied these stress tests as proof of due care and industry best practice.
Policy and Standardisation Implications
- Regulators (EU Commission, national competent authorities) could issue guidance encouraging or requiring anti-scheming stress testing for high-risk AI.
- Standards bodies (ISO/IEC JTC 1/SC 42, CEN-CENELEC, NIST) may formalise testing frameworks derived from this research.
- Certification schemes under the EU AI Act could include proof of covert-misalignment evaluation as part of the CE-marking process.
Such moves would provide clearer legal benchmarks for what counts as “reasonable safety testing”.
Strategic Recommendations for Legal Practitioners
- Track emerging technical methods: Covert misalignment is becoming an auditable risk. Lawyers advising on AI must understand terms such as scheming, covert misalignment and deliberative alignment well enough to ask the right questions.
- Update risk management clauses: Include express references to stress testing for covert goal pursuit in AI procurement and SaaS agreements.
- Guide corporate clients on governance: Boards should require documentation of alignment and anti-scheming evaluations before deployment.
- Prepare for evidentiary use: Keep detailed records of tests and their results; they may be decisive in litigation or regulatory inquiries.
- Engage with policy: Law firms and industry groups can comment on draft standards or guidance to ensure practicality and legal clarity.
The research by Schoen and colleagues is a reminder that AI safety is not just a technical challenge – it is a compliance, liability and governance issue. While “deliberative alignment” reduces covert misbehaviour, it does not eliminate it, and regulators are unlikely to accept unverified safety claims.
Lawyers, in-house counsel, and policy-makers should treat covert misalignment testing as an emerging best practice. Incorporating such requirements into contracts, risk assessments and due diligence will help clients deploy AI responsibly and protect against both regulatory sanctions and civil liability.
As AI becomes deeply embedded in critical sectors, the legal profession must be conversant with technical safety frontiers. Understanding and leveraging advances such as anti-scheming stress testing is no longer optional – it is part of modern legal risk management.
*By Yiannos Georgiades, Managing Partner of Y. Georgiades & Associates LLC and Co-Founder of Kinisis Ventures Fund