Feb 23, 2026

Daily Briefing

Security Benchmarks Rise as Platforms Tighten Access Rules

OpenAI and Paradigm introduced EVMbench to evaluate agents on detecting, patching, and exploiting high-severity smart contract vulnerabilities, aiming to professionalize blockchain security testing. In parallel, paying Google AI Ultra users report account restrictions tied to the OpenClaw integration, highlighting the operational risk when platforms enforce terms with limited transparency. openai.com · discuss.ai.goog...

Today's Pulse

  • EVMbench debuts to test agent performance on real smart contract flaws. openai.com
  • The benchmark emphasizes responsible deployment in blockchain security contexts. openai.com
  • Google AI Ultra subscribers report abrupt account restrictions after using OpenClaw. discuss.ai.goog...
  • Affected customers say there were no prior warnings or clear guidance. discuss.ai.goog...
  • Google cited a Terms of Service violation and a zero-tolerance stance. discuss.ai.goog...
  • Some subscribers paying $249 per month are considering moving their data off the platform. discuss.ai.goog...
  • Calls grow for clearer support channels and communications from providers. discuss.ai.goog...

What It Means

  • Security evaluation is consolidating around targeted, scenario-based benchmarks rather than generic tests. openai.com
  • Third‑party connectors can trigger account lockouts, creating unplanned downtime and data migration work. discuss.ai.goog...
  • Premium pricing does not guarantee predictable remediation or responsive support. discuss.ai.goog...
  • Teams should validate both technical rigor and vendor dependency risk in parallel. openai.com · discuss.ai.goog...

Sector Panels

Tools & Platforms

  • Google restricted some paid AI Ultra accounts associated with OpenClaw OAuth use. discuss.ai.goog...
  • Users report limited support responses and unclear restoration paths after restrictions. discuss.ai.goog...
  • The incident is prompting some paying customers to explore alternative platforms. discuss.ai.goog...

Models & Research

  • EVMbench measures agents on detection, patching, and exploitation of high-severity bugs. openai.com
  • Built with Paradigm, it targets responsible evaluation of security capabilities in on-chain systems. openai.com
  • Standardized benchmarking can raise comparability and rigor across security agent research. openai.com

Infra & Policy

  • Google characterized OpenClaw usage as a Terms of Service breach and enforced a zero-tolerance policy. discuss.ai.goog...
  • The lack of advance notice damaged trust among paying subscribers. discuss.ai.goog...
  • EVMbench underscores the need for robust security infrastructure and measurable safeguards. openai.com
  • Clearer governance of integrations is now a competitive differentiator. discuss.ai.goog...

Deep Dive

🔍 EVMbench lands as a focused yardstick for smart contract security work. The collaboration between OpenAI and Paradigm frames evaluation around three concrete capabilities: finding, fixing, and even exploiting high-severity vulnerabilities, a scope that mirrors real attacker and defender workflows. By concentrating on the EVM ecosystem, the benchmark aligns with where much of today’s on-chain risk lives. This specificity gives teams a common baseline to compare approaches and prioritize mitigations. openai.com

🧪 Why this matters for builders: detection without patching leaves gaps, and patching without adversarial validation can be brittle. EVMbench’s triad pushes evaluations to reflect end‑to‑end security outcomes rather than isolated skills. That can inform roadmap decisions for security tooling, audits, and continuous verification in dev pipelines. It also supports more responsible deployment practices in environments where exploits carry material financial impact. openai.com
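To make the triad concrete, here is a minimal sketch of how an end-to-end scoring rubric in this spirit might look. All names, fields, and the equal-weight scoring are illustrative assumptions, not EVMbench's actual API or methodology.

```python
# Hypothetical rubric in the spirit of a detect/patch/exploit triad.
# All names and weights are illustrative assumptions, not EVMbench's
# actual interface or scoring rules.
from dataclasses import dataclass

@dataclass
class TaskResult:
    detected: bool   # agent flagged the vulnerability
    patched: bool    # agent's fix passes regression tests
    exploited: bool  # agent produced a working proof of concept

def score(results: list[TaskResult]) -> float:
    """Average per-task credit, weighting all three capabilities equally."""
    if not results:
        return 0.0
    per_task = [
        (r.detected + r.patched + r.exploited) / 3
        for r in results
    ]
    return sum(per_task) / len(per_task)

# Example: one fully solved task, one detection-only task.
runs = [TaskResult(True, True, True), TaskResult(True, False, False)]
print(round(score(runs), 2))  # 0.67
```

The point of the equal weighting is the one the paragraph makes: an agent that only detects, or only patches, earns partial credit, so the aggregate number rewards end-to-end capability rather than any isolated skill.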

🛡️ The launch also contextualizes today’s platform risk story. As some developers face account restrictions tied to third‑party integrations, consistent rules and transparent enforcement become as critical as technical safeguards. Benchmarks like EVMbench set expectations for capability; clear policies set expectations for access. Teams need both to reduce operational surprises and strengthen trust in production workflows. openai.com · discuss.ai.goog...

OpenAI announces Frontier Alliance Partners (openai.com) OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments. openai
Show HN: AI Timeline – 171 LLMs from Transformer (2017) to GPT-5.3 (2026) (llm-timeline.com) The AI Timeline provides a comprehensive overview of over 169 Large Language Models (LLMs) developed from 2017 to 2026. It traces the evolution of these models, starting with the original Transformer… hn
Why we no longer evaluate SWE-bench Verified (openai.com) SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro. openai
Google restricting Google AI Pro/Ultra subscribers for using OpenClaw (discuss.ai.google.dev) Google AI Ultra subscribers are facing account restrictions after using the OpenClaw integration, leading to widespread frustration among users. Many report being locked out of their accounts without… hn
Aqua: A CLI message tool for AI agents (github.com) Aqua is a command-line interface (CLI) messaging tool designed for AI agents, developed by quailyquaily. It facilitates peer-to-peer communication with features such as identity verification, end-to-e… hn
Pinterest is drowning in a sea of AI slop and auto-moderation (404media.co) Pinterest users are expressing frustration over the platform's increasing reliance on artificial intelligence, which they believe is degrading the quality of content and user experience. Many artists… hn