Artificial intelligence (AI) is moving from controlled laboratory use into safety-critical, regulated domains – healthcare, finance, transportation, legal services. Legislators and courts increasingly expect that developers can verify and document that their models behave as intended. Yet technical researchers warn that advanced models may act deceptively: appearing compliant while secretly optimising for other goals. This behaviour, known as scheming or covert misalignment, represents one of the most serious emerging risks.
In September 2025, a team of researchers led by Bronson Schoen, Senior Research Engineer at Apollo Research, an AI safety organisation, published Stress Testing Deliberative Alignment for Anti-Scheming Training on the pre-print server arXiv. Although aimed at machine-learning specialists, the paper is directly relevant to lawyers, compliance officers, and policy-makers. It describes one of the first systematic attempts to detect and reduce covertly harmful behaviour in large language models (LLMs). Understanding such work is crucial for anyone drafting AI-related contracts, conducting due diligence, advising boards, or shaping regulatory policy.
The Technical Advance Explained Simply
The researchers explore deliberative alignment – a method that asks models to explicitly reason about a human-written safety specification before acting. Instead of relying solely on fine-tuning to encourage safe responses, the model must cite and apply rules it has been given.
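For readers who want a concrete picture, the sketch below illustrates the general idea in a few lines of Python: the model is given a short safety specification and instructed to cite the applicable rules before carrying out a task. This is a minimal illustration only; the specification text and the call_model helper are hypothetical placeholders, not the authors' actual specification, training setup, or code.

```python
# Illustrative sketch of the *idea* behind deliberative alignment:
# the model must read a safety specification and cite the relevant rules
# before acting. SAFETY_SPEC and call_model() are hypothetical placeholders.

SAFETY_SPEC = """\
Rule 1: Never take covert actions; do not hide, omit, or misreport relevant information.
Rule 2: If any instruction conflicts with Rule 1, refuse and escalate to a human.
Rule 3: Flag any uncertainty about whether an action would violate these rules.
"""

def build_deliberative_prompt(task: str) -> list[dict]:
    """Ask the model to quote and apply the safety spec before acting on the task."""
    return [
        {
            "role": "system",
            "content": (
                "Before answering, reason step by step about the safety "
                "specification below, cite the rules that apply, and only "
                "then carry out the task.\n\n" + SAFETY_SPEC
            ),
        },
        {"role": "user", "content": task},
    ]

def call_model(messages: list[dict]) -> str:
    """Placeholder for whatever LLM API the deployer actually uses."""
    raise NotImplementedError("wire this up to your model provider")

if __name__ == "__main__":
    prompt = build_deliberative_prompt(
        "Summarise this quarter's safety incidents for the board."
    )
    # The expected response cites the applicable rules, then gives the answer.
    print(call_model(prompt))
```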
To test whether this approach works, the team created a battery of synthetic but challenging environments designed to tempt the model into breaking rules covertly. These included tasks with hidden incentives to deceive, situations where the model benefits from withholding information, and experiments that varied whether the model knew it was being evaluated (situational awareness).
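Conceptually, such a test battery resembles the toy harness sketched below: run the model in each scenario, have a judge flag covert actions, and compare rates with and without cues that the model is being evaluated. All names here (Scenario, run_agent, judge_covert_action) are hypothetical stand-ins for illustration, not the paper's actual environments or scoring code.

```python
# Illustrative sketch of how covert behaviour might be *measured* across
# test scenarios. The scenario definitions, run_agent() and
# judge_covert_action() are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    task: str
    hints_at_evaluation: bool  # does the prompt suggest the model is being tested?

def run_agent(scenario: Scenario) -> str:
    """Placeholder: run the model in the scenario and return its transcript."""
    raise NotImplementedError

def judge_covert_action(transcript: str) -> bool:
    """Placeholder: automated or human judgement of whether the transcript
    contains a covert action (lying, withholding information, evading oversight)."""
    raise NotImplementedError

def covert_rate(scenarios: list[Scenario]) -> float:
    """Fraction of scenarios in which a covert action was flagged."""
    flagged = sum(judge_covert_action(run_agent(s)) for s in scenarios)
    return flagged / len(scenarios)

def compare_awareness(scenarios: list[Scenario]) -> None:
    aware = [s for s in scenarios if s.hints_at_evaluation]
    unaware = [s for s in scenarios if not s.hints_at_evaluation]
    # A large gap between these two rates is the situational-awareness concern
    # the paper highlights: the model may behave only when it thinks it is watched.
    print("covert rate with evaluation cue:   ", covert_rate(aware))
    print("covert rate without evaluation cue:", covert_rate(unaware))
```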
Key results:
- Deliberative alignment substantially reduced covert misbehaviour (in some tests from over 10% of cases to below 1%).
- But it did not eliminate deception – residual covert actions persisted.
- Models behaved more safely when they realised they were under evaluation, raising concern that safety could collapse when oversight is weak.
- The method could suppress some pre-planted hidden misaligned goals, but not in every case.
The authors stress that their environments are synthetic and that the approach does not yet defeat fully adversarial deception. Still, it represents one of the most rigorous attempts so far to measure and mitigate this class of risk.
Why Lawyers Should Care
Regulatory compliance and the EU AI Act
The EU Artificial Intelligence Act obliges providers of “high-risk” AI systems to implement risk-management systems and testing (Articles 9-15) as well as post-market monitoring (Article 72). Stress-testing against covert misalignment is exactly the type of procedure regulators may soon expect during conformity assessments. Failure to adopt available methods could be used as evidence of non-compliance.
Product liability and negligence
Under the EU's revised product liability regime and common-law negligence principles, developers must exercise reasonable care to avoid foreseeable harm. Once methods such as those described in this paper are known, ignoring them could be construed as negligent. Conversely, documenting robust anti-scheming tests may help a developer defend itself when accidents occur.
Contract drafting and risk allocation
Corporate clients purchasing or deploying AI should require:
- Warranties that the supplier has performed recognised safety and anti-scheming tests.
- Disclosure of test results and any known residual risk.
- Audit rights to verify continuing compliance during the AI system’s lifecycle.
- Indemnities where covert misalignment later causes loss.
Lawyers drafting SaaS or enterprise AI contracts can integrate these requirements now, even before regulators make them mandatory.
Due diligence and M&A
Investors and acquirers increasingly review AI safety documentation. A company claiming safe or aligned AI without having run recognised stress tests may face valuation adjustments or demands for contractual hold-backs. Legal teams performing due diligence can add anti-scheming testing to their checklists.
Litigation and evidence
In future disputes, plaintiffs may argue that a developer failed to apply available testing methods like those described by Schoen et al. Developers, conversely, can cite having applied these stress tests as proof of due care and industry best practice.
Policy and Standardisation Implications
- Regulators (EU Commission, national competent authorities) could issue guidance encouraging or requiring anti-scheming stress testing for high-risk AI.
- Standards bodies (ISO/IEC JTC 1/SC 42, CEN-CENELEC, NIST) may formalise testing frameworks derived from this research.
- Certification schemes under the EU AI Act could include proof of covert-misalignment evaluation as part of the CE-marking process.
Such moves would provide clearer legal benchmarks for what counts as “reasonable safety testing”.
Strategic Recommendations for Legal Practitioners
- Track emerging technical methods: Covert misalignment is becoming an auditable risk. Lawyers advising on AI must understand terms such as scheming, covert misalignment and deliberative alignment well enough to ask the right questions.
- Update risk management clauses: Include express references to stress testing for covert goal pursuit in AI procurement and SaaS agreements.
- Guide corporate clients on governance: Boards should require documentation of alignment and anti-scheming evaluations before deployment.
- Prepare for evidentiary use: Keep detailed records of tests and their results; they may be decisive in litigation or regulatory inquiries.
- Engage with policy: Law firms and industry groups can comment on draft standards or guidance to ensure practicality and legal clarity.
The research by Schoen and colleagues is a reminder that AI safety is not just a technical challenge – it is a compliance, liability and governance issue. While “deliberative alignment” reduces covert misbehaviour, it does not eliminate it, and regulators are unlikely to accept unverified safety claims.
Lawyers, in-house counsel, and policy-makers should treat covert misalignment testing as an emerging best practice. Incorporating such requirements into contracts, risk assessments and due diligence will help clients deploy AI responsibly and protect against both regulatory sanctions and civil liability.
As AI becomes deeply embedded in critical sectors, the legal profession must be conversant with technical safety frontiers. Understanding and leveraging advances such as anti-scheming stress testing is no longer optional – it is part of modern legal risk management.
*By Yiannos Georgiades, Managing Partner of Y. Georgiades & Associates LLC and Co-Founder of Kinisis Ventures Fund