Text-to-App

Dec 17, 2025

From faster images to formal proofs: a week of practical AI

🧩 The Gist

OpenAI upgraded image generation with a faster, more precise model that is rolling out broadly and is available via API. The company also introduced new scientific evaluation efforts, including a benchmark for reasoning across physics, chemistry, and biology, plus a real‑world wet lab framework that used GPT‑5 to optimize a cloning protocol. Nvidia published a research page for its Nemotron 3 family, signaling more model options tied to its ecosystem. On the developer side, a Rust‑based Python type checker arrived in beta, and a case study showed GPT‑5.2 porting a parsing library in hours. A widely read essay argues AI could push formal verification into mainstream software practice, highlighting how costly proofs have been historically.

🚀 Key Highlights

  • OpenAI launched the new ChatGPT Images experience powered by an upgraded model, with more precise edits, more consistent details, and image generation up to 4× faster. It is also available in the API as GPT‑Image‑1.5.
  • OpenAI introduced FrontierScience, a benchmark that evaluates AI reasoning for scientific research tasks across physics, chemistry, and biology.
  • OpenAI published a real‑world wet lab evaluation framework and used GPT‑5 to optimize a molecular cloning protocol, emphasizing both potential and risk in AI‑assisted experimentation.
  • Nvidia published a research page for its Nemotron 3 model family, indicating new entries in its lineup.
  • Simon Willison reported porting JustHTML from Python to JavaScript using Codex CLI and GPT‑5.2 in about 4.5 hours, producing roughly 9,000 lines over 43 commits and passing 9,200 html5lib tests.
  • Astral announced the beta of ty, an extremely fast Python type checker and language server written in Rust, positioned as an alternative to mypy, Pyright, and Pylance.
  • Martin Kleppmann argued AI will bring formal verification into the mainstream, citing prior costs such as the seL4 microkernel proof that needed about 20 person‑years and 200,000 lines of Isabelle code for 8,700 lines of C.
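For readers new to static checking, here is a minimal sketch of the kind of mismatch a checker such as ty, mypy, or Pyright reports before the code ever runs (the function and the commented-out call are invented for illustration, not taken from ty's documentation):

```python
# A static type checker verifies annotations without executing the code.
def total(prices: list[float]) -> float:
    """Sum a list of prices."""
    return sum(prices)

print(total([1.5, 2.5]))   # type-checks cleanly and prints 4.0
# total(["1.5", "2.5"])    # a checker rejects this: list[str] is not list[float]
```

At runtime Python would happily attempt the bad call and fail inside `sum`; the point of a fast checker like ty is to surface that mismatch in the editor, on every keystroke.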

🎯 Strategic Takeaways

  • Models and multimodality
    • Faster and more precise image generation reduces latency for creative and product workflows, and API access widens integration paths.
  • Science and safety
    • Standardized benchmarks for scientific reasoning and real‑world wet lab evaluations create clearer yardsticks for progress, usefulness, and risk management.
  • Developer productivity and reliability
    • LLM‑assisted coding can accelerate substantial codebase work when paired with robust test suites, while new tools like ty target day‑to‑day speed.
    • Interest in formal verification is rising, and AI assistance could lower the expertise and effort barrier for proofs and specifications.
  • Platform dynamics
    • Nvidia’s Nemotron 3 presence on its research site suggests continued expansion of model options aligned with its hardware and developer ecosystem.
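As a sense of scale for what mainstream mechanized proof might look like, here is a toy theorem in Lean 4 (chosen purely as an illustration; seL4 itself was verified in Isabelle/HOL, and real proofs run to hundreds of thousands of lines):

```lean
-- A machine-checked fact about a small function:
-- doubling a number equals multiplying it by two.
def double (n : Nat) : Nat := n + n

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

The gap between a one-liner like this and a verified microkernel is exactly the effort barrier the essay argues AI assistance could lower.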

🧠 Worth Reading

  • FrontierScience, OpenAI’s scientific reasoning benchmark: It focuses on measuring whether AI systems can tackle physics, chemistry, and biology tasks that resemble real scientific research. The practical takeaway is that consistent, domain‑grounded evaluations make it easier to compare systems, track real progress, and decide where AI can responsibly assist scientists.
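At their core, benchmarks of this kind reduce to a scoring loop over graded tasks. A minimal sketch of per-domain accuracy (the task records, field names, and grading rule below are invented for illustration and are not FrontierScience's actual format):

```python
# Hypothetical benchmark scoring: exact-match accuracy per scientific domain.
from collections import defaultdict

# Invented examples standing in for benchmark tasks.
tasks = [
    {"domain": "physics",   "answer": "9.8 m/s^2", "prediction": "9.8 m/s^2"},
    {"domain": "chemistry", "answer": "H2O",       "prediction": "H2O"},
    {"domain": "biology",   "answer": "mitosis",   "prediction": "meiosis"},
]

def accuracy_by_domain(tasks):
    """Fraction of exact-match predictions in each domain."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t in tasks:
        totals[t["domain"]] += 1
        hits[t["domain"]] += t["prediction"] == t["answer"]
    return {d: hits[d] / totals[d] for d in totals}

print(accuracy_by_domain(tasks))  # physics and chemistry score 1.0, biology 0.0
```

Real scientific benchmarks need far richer grading than exact match (rubrics, expert review, partial credit), which is precisely why a standardized benchmark is valuable.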