Dec 17, 2025
From faster images to formal proofs: a week of practical AI
The Gist
OpenAI upgraded image generation with a faster, more precise model that is rolling out broadly and is available via API. The company also introduced new scientific evaluation efforts, including a benchmark for reasoning across physics, chemistry, and biology, plus a real-world wet lab framework that used GPT-5 to optimize a cloning protocol. Nvidia published a research page for its Nemotron 3 family, signaling more model options tied to its ecosystem. On the developer side, a Rust-based Python type checker arrived in beta, and a case study showed GPT-5.2 porting a parsing library in hours. A widely read essay argues AI could push formal verification into mainstream software practice, highlighting how costly proofs have been historically.
Key Highlights
- OpenAI launched the new ChatGPT Images experience powered by an upgraded model, with more precise edits, more consistent details, and image generation up to 4× faster. It is also available in the API as GPT-Image-1.5.
- OpenAI introduced FrontierScience, a benchmark that evaluates AI reasoning for scientific research tasks across physics, chemistry, and biology.
- OpenAI published a real-world wet lab evaluation framework and used GPT-5 to optimize a molecular cloning protocol, emphasizing both potential and risk in AI-assisted experimentation.
- Nvidia published a page for its Nemotron 3 model family on its research site, indicating new entries in its model lineup.
- Simon Willison reported porting JustHTML from Python to JavaScript using Codex CLI and GPT-5.2 in about 4.5 hours, producing roughly 9,000 lines over 43 commits and passing 9,200 html5lib tests.
- Astral announced the beta of ty, an extremely fast Python type checker and language server written in Rust, positioned as an alternative to mypy, Pyright, and Pylance.
- Martin Kleppmann argued AI will bring formal verification into the mainstream, citing prior costs such as the seL4 microkernel proof that needed about 20 person-years and 200,000 lines of Isabelle code for 8,700 lines of C.
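For a sense of what a machine-checked proof looks like, here is a deliberately tiny illustration in Lean 4 (Lean is used here only for illustration; the seL4 project itself used Isabelle/HOL). Obligations of this shape, multiplied across a real kernel, are what consumed those person-years by hand:

```lean
-- A toy theorem, discharged by one existing lemma from the Lean core
-- library. The seL4 effort proved functional correctness of an entire
-- microkernel at this level of rigor.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The hope in the essay is that AI assistance makes proofs at this standard of rigor affordable for ordinary software, not just verified kernels.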
Strategic Takeaways
- Models and multimodality
- Faster and more precise image generation reduces latency for creative and product workflows, and API access widens integration paths.
- Science and safety
- Standardized benchmarks for scientific reasoning and realāworld wet lab evaluations create clearer yardsticks for progress, usefulness, and risk management.
- Developer productivity and reliability
- LLM-assisted coding can accelerate substantial codebase work when paired with robust test suites, while new tools like ty target day-to-day speed.
- Interest in formal verification is rising, and AI assistance could lower the expertise and effort barrier for proofs and specifications.
- Platform dynamics
- Nvidia's Nemotron 3 presence on its research site suggests continued expansion of model options aligned with its hardware and developer ecosystem.
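To make the type-checker takeaway concrete, the snippet below contains a mismatch that CPython never complains about at runtime, but that a static checker such as ty, mypy, or Pyright reports before the code runs. (The invocation `ty check example.py` is an assumption about the beta CLI; consult Astral's documentation for the exact command.)

```python
# example.py: runs cleanly under CPython, yet a static type checker
# flags the annotated-vs-assigned mismatch below without executing it.

def greet(name: str) -> str:
    """Return a greeting; the annotation promises a str argument."""
    return "Hello, " + name

# Annotated str, assigned int: a static checker reports this line;
# the interpreter executes the assignment silently.
user_id: str = 42

print(greet("Ada"))
```

The point of fast checkers like ty is to make that feedback loop quick enough to run on every keystroke via the language server, rather than only in CI.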
Worth Reading
- FrontierScience, OpenAI's scientific reasoning benchmark: It focuses on measuring whether AI systems can tackle physics, chemistry, and biology tasks that resemble real scientific research. The practical takeaway is that consistent, domain-grounded evaluations make it easier to compare systems, track real progress, and decide where AI can responsibly assist scientists.
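Returning to the image-model highlight: calling the upgraded model through the API would look roughly like the sketch below, using the OpenAI Python SDK. The model identifier "gpt-image-1.5" is inferred from the announcement and should be verified against the current API reference; the helper names here are illustrative, not part of the SDK.

```python
# Hedged sketch: requesting an image and decoding the base64 payload
# the Images API returns. The network call is wrapped in a function
# and not executed here; it requires the `openai` package and an API key.
import base64


def decode_image(b64_data: str) -> bytes:
    """Decode the base64-encoded image bytes returned per image."""
    return base64.b64decode(b64_data)


def generate_image(client, prompt: str) -> bytes:
    """Request one image via an openai.OpenAI() client; return PNG bytes."""
    result = client.images.generate(
        model="gpt-image-1.5",  # assumed identifier; check the API docs
        prompt=prompt,
    )
    return decode_image(result.data[0].b64_json)
```

Wiring it up would be `png = generate_image(OpenAI(), "a watercolor coastline")`, with the returned bytes written straight to a `.png` file.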