Dec 5, 2025
AI reality check, faster kernels, and a 100T-token snapshot
🧩 The Gist
This roundup pairs market signals with infrastructure and research updates. Ars Technica reports Microsoft cut its AI sales targets after missed quotas, a reminder that enterprise adoption can lag big promises. On the research side, a GitHub project claims reinforcement learning found matrix multiplication kernels that beat cuBLAS, while discussion questions how novel the techniques are. OpenRouter published an empirical “100T token” State of AI study, and Hacker News comments flag how API-based data can miss self-hosted small models. Rounding it out, one group ran five LLMs as stock traders for eight months, and an HN summary suggests portfolio bias may explain who ranked first and last.
🚀 Key Highlights
- OpenRouter released “State of AI: An Empirical 100T Token Study,” presenting large-scale usage analysis.
- An HN commenter questioned a reported decline in small-model share, noting OpenRouter is an API service and self-hosted small models may be undercounted.
- Ars Technica reports Microsoft halved its AI sales growth targets after salespeople missed quotas, citing customer resistance to unproven AI agents.
- CUDA‑l2 on GitHub claims reinforcement learning produced matrix multiplication kernels that surpass cuBLAS performance.
- An HN commenter said the CUDA‑l2 methods do not appear especially novel and called for clearer citations.
- An eight-month experiment gave five LLMs 100K to trade, with one HN summary saying Grok led, DeepSeek was close behind, and Gemini trailed, potentially due to heavier non-tech exposure.
- PyTogether, a lightweight real-time collaborative Python IDE, targets classrooms and workshops.
🎯 Strategic Takeaways
- Adoption and ROI: Enterprise buyers remain cautious about agentic AI, so vendors may need clearer value stories and proof of outcomes before expecting rapid revenue growth.
- Metrics and methodology: Usage studies drawn from an API can miss self-hosted deployments, especially of smaller models, so trendlines should be read with sampling limits in mind.
- Systems performance: RL-tuned low-level kernels, if validated, point to a promising path for squeezing more throughput from existing GPUs, but novelty and attribution matter.
- Applied experiments: LLM trading results can blur model quality with sector tilts, so evaluations should separate portfolio construction effects from reasoning or planning ability.
- Teaching and enablement: Simple collaborative coding tools like PyTogether focus on real-time instruction needs, a different priority than heavyweight AI-assisted editors.
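The sampling caveat above can be made concrete with a toy calculation. All figures here are hypothetical, chosen only to illustrate the mechanism: if small models dominate self-hosted traffic that an API gateway never observes, the API-only number understates the true small-model share.

```python
# Toy illustration of API-only sampling bias (all token figures hypothetical).
# Monthly token volumes, split by where inference actually runs:
api_tokens = {"small_models": 5e12, "large_models": 45e12}       # visible to the API
selfhost_tokens = {"small_models": 20e12, "large_models": 5e12}  # invisible to it

def share(counts, key):
    """Fraction of total tokens attributed to `key`."""
    return counts[key] / sum(counts.values())

# Small-model share as measured from API traffic alone: 5 / 50 = 10%
api_only = share(api_tokens, "small_models")

# True share once self-hosted usage is included: 25 / 75 = 33%
combined = {k: api_tokens[k] + selfhost_tokens[k] for k in api_tokens}
true_share = share(combined, "small_models")

print(f"API-only small-model share: {api_only:.0%}")
print(f"Combined small-model share: {true_share:.0%}")
```

With these made-up volumes the API-only view reports 10% where the combined picture is about 33%, which is why a "decline" in small-model share could also be a migration off the measured channel.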
🧠 Worth Reading
- CUDA‑l2, “Surpassing cuBLAS performance for matrix multiplication through RL.” Core idea: use reinforcement learning to search kernel implementations for GEMM that run faster than NVIDIA’s cuBLAS. Practical takeaway: automated kernel search can unlock performance wins on commodity GPUs, but claims should be vetted for generality, hardware coverage, and proper citations.
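The search idea behind such projects can be sketched at a small scale. The code below is not the CUDA‑l2 method: it replaces RL over CUDA kernels with a simple benchmark-and-pick search over one tuning knob (the tile size of a cache-blocked matmul in plain Python), just to show the "treat kernel variants as a search space, measure, keep the fastest" loop.

```python
# Minimal sketch of automated kernel search (illustrative only, not CUDA-l2):
# enumerate candidate kernel parameters, benchmark each, keep the fastest.
import random
import time

def blocked_matmul(A, B, tile):
    """Cache-blocked multiply of two n x n matrices stored as lists of lists."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C

def search_best_tile(n=64, candidates=(4, 8, 16, 32, 64), trials=2):
    """Time each candidate tile size on random inputs; return (tile, seconds)."""
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    timings = {}
    for tile in candidates:
        t0 = time.perf_counter()
        for _ in range(trials):
            blocked_matmul(A, B, tile)
        timings[tile] = (time.perf_counter() - t0) / trials
    best = min(timings, key=timings.get)
    return best, timings[best]

if __name__ == "__main__":
    tile, secs = search_best_tile()
    print(f"fastest tile size: {tile} ({secs * 1e3:.1f} ms per multiply)")
```

A real system replaces the exhaustive loop with a learned policy proposing kernel edits and uses measured speedup as the reward, which is what makes RL attractive when the search space is far too large to enumerate.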