Open‑source research agents rise as vertical AI hiring accelerates

🧩 The Gist

Tongyi Lab introduced Tongyi DeepResearch, an open‑source web agent that the team says performs on par with OpenAI’s DeepResearch across several benchmarks. The post lists state‑of‑the‑art scores on academic reasoning and complex browsing tasks, and claims systematic outperformance of existing deep research agents. In parallel, FurtherAI shared details of a $25M Series A led by Andreessen Horowitz, reporting strong traction for AI agents in insurance and active hiring. The pattern is clear: open‑source research agents are maturing while vertical enterprise agents are scaling.

🚀 Key Highlights

Tongyi DeepResearch is presented as a fully open‑source web agent with resources on GitHub, Hugging Face, ModelScope, and a public showcase.
The team reports parity with OpenAI’s DeepResearch and state‑of‑the‑art results on multiple evaluations.
Listed benchmark scores: 32.9 on Humanity’s Last Exam, 43.4 on BrowseComp, 46.7 on BrowseComp‑ZH, and 75 on xbench‑DeepSearch.
A Hacker News post describes Tongyi DeepResearch as a 30B Mixture of Experts model and frames it as a rival to OpenAI’s DeepResearch.
The announcement drew discussion on Hacker News, indicating community interest in open‑source research agents.
FurtherAI, which builds AI agents for the insurance industry, disclosed a $25M Series A led by a16z and says it is past product‑market fit (post‑PMF) with strong enterprise adoption.
FurtherAI highlights include more than 10x revenue growth this year, progression from seed to Series A in under a year, and active hiring in San Francisco with a referral program.

🎯 Strategic Takeaways

Open‑source momentum
- Research agents are maturing from chatbots into autonomous web agents, with public benchmarks and repos that lower adoption barriers.
Benchmarking as a market signal
- Publishing concrete scores on academic reasoning and complex browsing tasks helps buyers compare agents and may pressure proprietary offerings to show comparable results.
Verticalization of agents
- Insurance‑focused agents with reported enterprise adoption suggest near‑term value in domain‑specific workflows, supported by fresh capital and hiring.

🧠 Worth Reading

Humanity’s Last Exam (HLE)
Core idea: an academic reasoning task used to gauge higher‑order problem solving. Practical takeaway: Tongyi DeepResearch’s reported score of 32.9 provides a concrete reference point for evaluating research‑oriented agents against a standardized reasoning benchmark.