Nov 3, 2025
Open-source research agents rise as vertical AI hiring accelerates
The Gist
Tongyi Lab introduced Tongyi DeepResearch, an open-source web agent that the team says performs on par with OpenAI's DeepResearch across several benchmarks. The post lists state-of-the-art scores on academic reasoning and complex browsing tasks, and claims systematic outperformance of existing deep research agents. In parallel, FurtherAI shared details of a $25M Series A led by Andreessen Horowitz, reporting strong traction for AI agents in insurance and active hiring. The pattern is clear: open-source research agents are maturing while vertical enterprise agents are scaling.
Key Highlights
- Tongyi DeepResearch is presented as a fully open-source web agent with resources on GitHub, Hugging Face, ModelScope, and a public showcase.
- The team reports parity with OpenAI's DeepResearch and state-of-the-art results on multiple evaluations.
- Listed benchmark scores: 32.9 on Humanity's Last Exam, 43.4 on BrowseComp, 46.7 on BrowseComp-ZH, and 75 on xbench-DeepSearch.
- A Hacker News post describes Tongyi DeepResearch as a 30B Mixture of Experts model and frames it as a rival to OpenAI's DeepResearch.
- The announcement drew discussion on Hacker News, indicating community interest in openāsource research agents.
- FurtherAI disclosed a $25M Series A led by a16z, building AI agents for the insurance industry and stating they are post-PMF with strong enterprise adoption.
- FurtherAI highlights include more than 10x revenue growth this year, progression from seed to Series A in under a year, and active hiring in San Francisco with a referral program.
Strategic Takeaways
- Open-source momentum
  - Research agents are maturing from chatbots into autonomous web agents, with public benchmarks and repos that lower adoption barriers.
- Benchmarking as a market signal
  - Publishing concrete scores on academic reasoning and complex browsing tasks helps buyers compare agents and may pressure proprietary offerings to show comparable results.
- Verticalization of agents
  - Insurance-focused agents with reported enterprise adoption suggest near-term value in domain-specific workflows, supported by fresh capital and hiring.
Worth Reading
- Humanity's Last Exam (HLE)
Core idea: an academic reasoning benchmark used to gauge higher-order problem solving. Practical takeaway: Tongyi DeepResearch's reported score of 32.9 provides a concrete reference point for evaluating research-oriented agents against a standardized reasoning benchmark.
Introducing IndQA (openai.com) OpenAI introduces IndQA, a new benchmark for evaluating AI systems in Indian languages. Built with domain experts, IndQA tests cultural understanding and reasoning across 12 languages and 10 knowledge areas.
AWS and OpenAI announce multi-year strategic partnership (openai.com) OpenAI and AWS have entered a multi-year, $38 billion partnership to scale advanced AI workloads. AWS will provide world-class infrastructure and compute capacity to power OpenAI's next generation of models.
New prompt injection papers: Agents rule of two and the attacker moves second (simonwillison.net)