Coding agents get faster, policy softens, safety work scales

🧩 The Gist

OpenAI introduced GPT-5.1-Codex-Max, a faster agentic coding model aimed at long-running, project-scale work. Alongside the release, OpenAI detailed safety mitigations, external testing with independent experts, and how evals help businesses measure real performance. In policy, the European Commission proposed changes that weaken parts of GDPR and delay parts of the AI Act to boost competitiveness. On the product front, Meta highlighted Segment Anything Model 3 and Mosaic launched an agentic video editor, while Scania shared an enterprise case study with ChatGPT Enterprise.

🚀 Key Highlights

OpenAI unveiled GPT-5.1-Codex-Max, designed for project-scale coding with enhanced reasoning and better token efficiency.
A system card for GPT-5.1-Codex-Max details mitigations, including specialized safety training, defenses against prompt injection, agent sandboxing, and configurable network access.
OpenAI outlined work with independent experts for external testing of frontier systems, intended to strengthen safety and validate safeguards.
OpenAI explained how evals help businesses define and measure AI performance, reducing risk and improving productivity.
The European Commission proposed changes that weaken EU privacy rules and delay parts of the AI Act to improve global competitiveness.
Meta presented Segment Anything Model 3, a sign of continued investment in segmentation models; the release drew strong interest on Hacker News.
Mosaic launched an agentic video editing platform, a node-based canvas with a multimodal copilot and a timeline editor for hands-on control.
Scania shared how it is scaling AI with ChatGPT Enterprise, using team-based onboarding and guardrails to lift productivity and quality.

🎯 Strategic Takeaways

Products and developer tools
- Agentic coding moves toward sustained, multi-file work, which can make AI more useful in real software projects.
- Vision and video tools continue to mature, from segmentation models to editable agentic workflows.
Safety and evaluation
- Publishing system cards, running agents in sandboxes, and engaging external testers together signal a more formal safety stack.
- Evals give companies a way to quantify outcomes, align models to business goals, and manage risk.
Policy and compliance
- Proposed EU changes could lower near-term compliance friction for AI builders, while delaying some obligations under the AI Act.
Enterprise adoption
- Case studies like Scania’s suggest structured onboarding plus guardrails can speed safe rollout across large workforces.
Infrastructure and ecosystem
- Debate continues around hardware portability and performance expectations, including skepticism about CUDA translation solving AMD’s challenges.
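
To make the evals takeaway above concrete, here is a minimal sketch of what a business-facing eval can look like: a fixed set of cases, a pass rate, and a threshold that gates rollout. It is not OpenAI's method or any vendor's API. `EvalCase`, `run_eval`, `stub_model`, the ticket-routing task, and the 0.9 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One test case: an input prompt and the label we expect back."""
    prompt: str
    expected: str


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output matches the expected label.

    Matching is exact after trimming whitespace and lowercasing.
    """
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(
        1
        for case in cases
        if model(case.prompt).strip().lower() == case.expected.strip().lower()
    )
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: routes support tickets."""
    return "billing" if "invoice" in prompt.lower() else "technical"


# Hypothetical eval set for a ticket-routing use case.
cases = [
    EvalCase("My invoice is wrong", "billing"),
    EvalCase("App crashes on login", "technical"),
    EvalCase("Please cancel my plan", "account"),
]

PASS_THRESHOLD = 0.9  # assumed business requirement, not a standard value

score = run_eval(stub_model, cases)
ready_to_ship = score >= PASS_THRESHOLD
print(f"pass rate: {score:.2f}, ready to ship: {ready_to_ship}")
```

In practice the stub would be replaced by a call to the model under test, and the case set would come from real examples reviewed by the team that owns the business outcome.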

🧠 Worth Reading

Can AI Models be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation
- The piece evaluates whether AI models can be jailbroken for phishing and reports collaboration with Reuters to measure impacts on elderly users. The practical takeaway is clear: robust guardrails and targeted testing are needed for misuse scenarios that harm vulnerable populations.