Tokyo-based Sakana AI shipped its first commercial product, ‘Sakana Marlin’, this week. The Sakana team positions it as a Digital CSO (Chief Strategy Officer). It’s a B2B autonomous research agent built for enterprises.
Marlin doesn’t reply in seconds like a chatbot. You give it one research topic. It then runs autonomously for up to about eight hours. Each run returns a long report plus a presentation slide deck. Sakana says a single session issues hundreds to thousands of LLM queries.
What Is Sakana Marlin
Marlin is an enterprise research agent, not a chat assistant. You give it one topic or question. It then plans hypotheses, browses sources, and verifies findings on its own. It compresses weeks of strategy work into hours.
The deliverable is structured for decision-makers. The Japanese announcement describes reports of dozens of pages. The English announcement cites reports of up to roughly 100 pages. At a press hands-on, reports ran 60–100 pages and cited 60–80 sources. Each report includes a main body, references, and appendices. Presentation slides are generated using image-generation AI.
The Sakana team refined Marlin through a closed beta in April 2026. Around 300 professionals tested it on real tasks during that beta. These tasks spanned strategy formulation, market research, risk assessment, and competitive analysis. Sakana has also partnered with MUFG and received a strategic investment from Citigroup.
Inside AB-MCTS: Wider or Deeper
The backbone of Marlin is AB-MCTS, or Adaptive Branching Monte Carlo Tree Search. It comes from Sakana’s prior research paper, “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search.”
AB-MCTS treats reasoning as a tree-search problem. At each step the algorithm makes one decision. It can go wider by generating a new candidate answer. Or it can go deeper by refining a promising existing answer. Standard repeated sampling only goes wider in parallel, then hopes one answer is right.
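The wider-or-deeper choice can be illustrated with a toy loop. This is a minimal sketch, not Sakana’s algorithm: it flattens the tree into a two-armed bandit, uses Thompson sampling over Beta posteriors as an assumed decision rule, and replaces LLM calls with random numbers. The function name `toy_wider_or_deeper` and the refinement noise range are invented for illustration.

```python
import random


def toy_wider_or_deeper(budget=24, seed=0):
    """Toy search loop: each step either adds a new answer or refines the best one."""
    rng = random.Random(seed)
    # Beta posterior parameters [successes + 1, failures + 1] per action,
    # where "success" means the step improved the best score so far.
    posterior = {"wider": [1, 1], "deeper": [1, 1]}
    counts = {"wider": 0, "deeper": 0}
    best = None  # best score seen so far; None until the first answer exists

    for _ in range(budget):
        if best is None:
            action = "wider"  # nothing to refine yet
        else:
            # Thompson sampling: draw once from each posterior, take the larger draw.
            draws = {a: rng.betavariate(p[0], p[1]) for a, p in posterior.items()}
            action = max(draws, key=draws.get)

        if action == "wider":
            score = rng.random()  # stand-in for a brand-new LLM answer
        else:
            # Stand-in for refining the current best answer: a small nudge, clamped to [0, 1].
            score = min(1.0, max(0.0, best + rng.uniform(-0.10, 0.15)))

        improved = best is None or score > best
        posterior[action][0 if improved else 1] += 1
        counts[action] += 1
        best = score if best is None else max(best, score)

    return best, counts


best, counts = toy_wider_or_deeper()
print(f"best score {best:.2f}, wider/deeper steps: {counts['wider']}/{counts['deeper']}")
```

Plain repeated sampling corresponds to forcing `action = "wider"` on every step; the adaptive version shifts budget toward whichever move has been paying off.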
A multi-LLM variant adds a second choice. It can route a step to a different model entirely. In Sakana’s reported ARC-AGI-2 experiments, this collaboration helped. Combining o4-mini, Gemini 2.5 Pro, and DeepSeek-R1 solved about 27.5% of tasks. The o4-mini model alone solved about 23%. Marlin applies the same adaptive search to long-horizon research.
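One way to picture per-step model routing is a bandit over models. The sketch below is an assumption-laden illustration rather than Sakana’s method: the model names are placeholders, the success rates are invented purely to drive the simulation, and Thompson sampling is used as one plausible selection rule.

```python
import random


def route_steps(success_rate, steps=200, seed=0):
    """Pick a model per step by Thompson sampling over per-model Beta posteriors."""
    rng = random.Random(seed)
    posterior = {name: [1, 1] for name in success_rate}  # Beta(1, 1) priors
    picks = {name: 0 for name in success_rate}
    for _ in range(steps):
        draws = {n: rng.betavariate(a, b) for n, (a, b) in posterior.items()}
        model = max(draws, key=draws.get)
        # Simulated outcome; a real system would score the model's actual output.
        solved = rng.random() < success_rate[model]
        posterior[model][0 if solved else 1] += 1
        picks[model] += 1
    return picks


# Hypothetical models with invented success rates, used only for this simulation.
picks = route_steps({"model_a": 0.2, "model_b": 0.5, "model_c": 0.8})
print(picks)
```

The router never sees the hidden success rates; it only observes outcomes, which is why early steps are spread across models before the counts start to concentrate.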
The second key component of Marlin is workflow automation from Sakana’s AI Scientist project. That project demonstrated autonomous scientific discovery and was published in Nature.
Interactive demo: The embeddable widget (marlin-abmcts-demo.html) shows the “wider or deeper” decision live. Press Run and watch the tree grow. Greener nodes carry higher scores, and the best path is highlighted. Toggle “Multi-LLM” to see steps routed across different models.
[Interactive widget: AB-MCTS “Wider or Deeper?” search. A simplified visual of Sakana AI’s Adaptive Branching Monte Carlo Tree Search: at each step the policy chooses to widen (new candidate) or deepen (refine a promising line). The panel tracks budget used (out of 24), node count, best score, and the wider/deeper split; node color runs from low to high score, and the best path is highlighted.]
How Marlin Compares
Marlin competes on depth, not speed. Conventional deep-research tools respond in minutes to tens of minutes. Marlin deliberately spends hours to raise output quality. The competitor run times below are approximate and reported, not official figures.
| Tool | Typical run time | Output | Primary user |
|---|---|---|---|
| Sakana Marlin | Up to ~8 hours | Report (dozens to ~100 pages) + slides | Enterprise strategy teams |
| OpenAI Deep Research | ~Minutes to tens of minutes | Cited text report | General and pro users |
| Perplexity Deep Research | ~A few minutes | Cited text answer | General users |
| Google Gemini Deep Research | ~Minutes | Cited text report | General and workspace users |
The trade-off is explicit. You wait longer and pay per run. In return you get deeper hypothesis testing and a finished deliverable. You can cancel a run anytime, but credits are still consumed.
Pricing
Sakana offers pay-as-you-go along with Pro, Team, and Enterprise tiers. Pay-as-you-go starts at 100 credits per run, at ¥98 per credit. Pro is ¥150,000 per month and includes 2,000 credits. Team is ¥400,000 per month and includes 6,000 credits. Enterprise pricing is custom, with dedicated support.
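The tiers are easier to compare per run. The arithmetic below uses only the figures quoted above and assumes every run costs the 100-credit minimum; since pay-as-you-go only "starts at" 100 credits, real runs may cost more, so treat the results as a lower bound.

```python
CREDITS_PER_RUN = 100      # minimum run size quoted for pay-as-you-go (assumed for all tiers)
PAYG_YEN_PER_CREDIT = 98

# Monthly price in yen and included credits, as quoted in the article.
plans = {"Pro": (150_000, 2_000), "Team": (400_000, 6_000)}

payg_run_cost = CREDITS_PER_RUN * PAYG_YEN_PER_CREDIT


def plan_summary(monthly_yen, credits):
    """Return (yen per credit, runs per month, yen per run) at the minimum run size."""
    yen_per_credit = monthly_yen / credits
    runs_per_month = credits // CREDITS_PER_RUN
    return yen_per_credit, runs_per_month, yen_per_credit * CREDITS_PER_RUN


summaries = {name: plan_summary(*terms) for name, terms in plans.items()}

print(f"Pay-as-you-go: ¥{payg_run_cost:,} per run")
for name, (per_credit, runs, per_run) in summaries.items():
    print(f"{name}: ¥{per_credit:.2f}/credit, {runs} runs/month, ¥{per_run:,.0f} per run")
```

Under that assumption a pay-as-you-go run costs ¥9,800, while the included credits work out to ¥75 each on Pro and roughly ¥67 each on Team.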
Use Cases, With Examples
Marlin suits high-stakes questions where research is the bottleneck. Here are concrete examples drawn from its target tasks.
- Market entry: ‘Assess Japan’s stablecoin and tokenized-payments market after regulatory change.’ Marlin maps drivers, risks, and structured options into a report.
- Risk assessment: ‘Model resolution scenarios for a Strait of Hormuz blockade.’ It compares hypotheses, not just summaries, before drawing conclusions.
- Competitive analysis: ‘Profile three rivals and rank our positioning gaps.’ It returns slides ready for a strategy review.
Each example fits one prompt and one unattended run. A human still reviews the cited output before any decision.
Try the Engine Yourself: TreeQuest
You cannot self-host Marlin. But you can run its core algorithm today. Sakana open-sourced AB-MCTS as TreeQuest under the Apache 2.0 license. Install it, define a generate function, then run a fixed search budget.
```python
import random

import treequest as tq


# Each node holds a user-defined state; the score must be normalized to [0, 1].
def generate(parent_state):
    if parent_state is None:  # None means expand from the root
        new_state = "Initial draft"
    else:
        new_state = f"Refined: {parent_state}"
    score = random.random()  # swap this for an LLM-based score
    return new_state, score


algo = tq.ABMCTSA()  # Adaptive Branching MCTS (variant A)
search_tree = algo.init_tree()
for _ in range(10):  # generation budget of 10
    search_tree = algo.step(search_tree, {"generate": generate})

best_state, best_score = tq.top_k(search_tree, algo, k=1)[0]
print("BEST:", best_state, round(best_score, 3))
```
Swap the random score for an LLM judge to reproduce the real pattern. TreeQuest also ships multi-LLM search and checkpointing for long runs. Checkpointing matters because long sessions can hit API errors midway.
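A judge-backed generate function might look like the following. This is a sketch under assumptions: `make_generate`, `stub_draft`, and `stub_judge` are hypothetical names, and the stubs stand in for real LLM calls so the example runs offline. Only the return shape, a `(state, score)` pair with the score in [0, 1], comes from the example above.

```python
def make_generate(draft_fn, judge_fn):
    """Build a generate(parent_state) function from two caller-supplied callables."""
    def generate(parent_state):
        # parent_state is None when expanding from the root.
        new_state = draft_fn(parent_state)
        raw = float(judge_fn(new_state))
        score = min(1.0, max(0.0, raw))  # keep the score inside [0, 1]
        return new_state, score
    return generate


# Offline stubs so the sketch runs without any API key.
def stub_draft(parent_state):
    return "Initial draft" if parent_state is None else f"Refined: {parent_state}"


def stub_judge(state):
    # Pretend each refinement adds quality; the raw value can exceed 1 before clamping.
    return 0.3 * (1 + state.count("Refined"))


generate = make_generate(stub_draft, stub_judge)
root_state, root_score = generate(None)
child_state, child_score = generate(root_state)
print(root_state, root_score)
print(child_state, child_score)
```

The returned function has the same signature as the `generate` in the TreeQuest example, so in principle it could be passed to `algo.step` in its place, with `draft_fn` and `judge_fn` wrapping whichever LLM client you use.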
Strengths and Weaknesses
Strengths
- Peer-reviewed foundations: AB-MCTS at NeurIPS and AI Scientist in Nature.
- Finished deliverables, including references, appendices, and slides.
- Adaptive compute spends effort on the most promising branches.
- The open-source core (TreeQuest) lets AI researchers study the approach.
Weaknesses
- Long runtimes make iteration slow versus minute-scale research tools.
- Automated reports can contain hard-to-spot errors that need human review.
- Pricing and design target enterprises, not individual developers.
- Marlin itself is closed; solely the underlying algorithm is open.
Key Takeaways
- Sakana Marlin runs autonomous research for up to about eight hours per task.
- One run produces a report of dozens of pages, plus slides.
- It builds on AB-MCTS (NeurIPS 2025 Spotlight) and AI Scientist workflows (Nature).
- Entry pricing is pay-as-you-go: 100 credits per run at ¥98 per credit.
- It targets finance, company technique, consulting, and think-tank groups.
Sources
- Sakana AI, Sakana Marlin launch: https://sakana.ai/marlin-release/
- Sakana AI, Sakana Marlin product page: https://sakana.ai/marlin/
- Sakana AI, AB-MCTS research and TreeQuest: https://sakana.ai/ab-mcts/
- SakanaAI/treequest (GitHub, Apache 2.0): https://github.com/SakanaAI/treequest
