StreamMA

🎯 Key Contributions

📡

Streaming protocol
Step-level forwarding replaces waiting for full responses — lower latency and higher accuracy.

📐

Three closed-form theorems
Effectiveness ordering, speedup upper bound, and cost ratio for Stream / Serial / Single.

🚀

Step-level scaling law
A new orthogonal dimension: more steps per agent → better accuracy + higher speedup.

Stream vs Serial — see the pipeline in action

Four agents on real HMMT 2026 runs (GPT-5.4-none).

Random sample speedup: Graph 1.92× · Chain 1.84× · Tree 1.82×.

Abstract

Multi-agent reasoning systems adopt a generate-then-transfer paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency.

Surprisingly, this pipelining also improves effectiveness: because multi-step reasoning quality is non-uniform and early steps are more reliable than later ones, working with these reliable early steps instead of the full chain prevents error-prone late steps from misleading downstream agents. We formalise both advantages with the first closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio.

Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026 with Claude Opus 4.6-high).

We further uncover a step-level scaling law: increasing per-agent steps consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.

The counter-intuitive finding

When context arrives matters more than how much context arrives.

Serial — sees the whole upstream output

Agent² waits until Agent¹ finishes all steps, then reads the full chain — including the error-prone tail — and inherits its mistakes.

Agent² answer: ✗ wrong

Stream — sees the reliable head first

Agent² starts reasoning after step 1; by the time the bad tail arrives, it has formed its own trajectory and the tail's impact is diluted.

Agent² answer: ✓ correct
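The timing side of this picture can be sketched with a toy schedule. The model below is an idealisation and an assumption of this sketch, not the paper's implementation: a chain of agents, every step takes one time unit, and in stream mode a downstream agent may start its step k as soon as the upstream agent has finished step k. The names `finish_times` and `makespan` are illustrative.

```python
def finish_times(num_agents, num_steps, streaming, step_time=1.0):
    """Return finish[a][k]: the time at which agent a completes step k (0-indexed)."""
    finish = [[0.0] * num_steps for _ in range(num_agents)]
    for a in range(num_agents):
        for k in range(num_steps):
            # An agent cannot start step k before finishing its own step k-1.
            own_ready = finish[a][k - 1] if k > 0 else 0.0
            if a == 0:
                upstream_ready = 0.0                            # first agent has no upstream
            elif streaming:
                upstream_ready = finish[a - 1][k]               # wait for upstream step k only
            else:
                upstream_ready = finish[a - 1][num_steps - 1]   # wait for the full upstream chain
            finish[a][k] = max(own_ready, upstream_ready) + step_time
    return finish


def makespan(num_agents, num_steps, streaming):
    """End-to-end latency: when the last agent finishes its last step."""
    return finish_times(num_agents, num_steps, streaming)[-1][-1]


serial_latency = makespan(4, 4, streaming=False)
stream_latency = makespan(4, 4, streaming=True)
print(serial_latency, stream_latency, serial_latency / stream_latency)
```

Under these assumptions a 4-agent, 4-step chain finishes in 16 time units serially and in 7 when streamed, which is S + A - 1. Real runs have uneven step times and prefill overhead, so measured speedups will sit below this idealised ratio.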

Three closed-form theorems

The first joint analysis of Stream, Serial and Single protocols.
One ordering for effectiveness, one upper bound for speed, one ratio for cost.

Six regimes; predicts when Stream / Serial / Single wins.

Depending on how $\bar{p}$, $p_{\mathrm{head}}$, and $p_{\mathrm{tail}}$ compare to $p^*$, the sCorr ordering falls into six cases:

I.a · I.b · II.a · II.b · III.a · III.b

I.a — Stream advantage (error accumulation). $\mathrm{sCorr}^{\mathrm{stream}} > \mathrm{sCorr}^{\mathrm{serial}} > \mathrm{sCorr}^{\mathrm{single}}$

Closed-form speedup upper bound; 26.9× measured at A=S=64 (83% of the theoretical limit).

$\displaystyle \mathrm{Speedup} \le \frac{A\bigl[(S+r_{po})\,r_{v_{dp}} + S\bigr]}{(S+A-1)(1 + \alpha\,r_{v_{dp}} + \beta\,r_{v_{dc}})}$

When decode ≫ prefill ($r_{v_{dp}}\!\to\!0$), the bound reduces to $\mathrm{Speedup} \le \tfrac{AS}{S+A-1}$.
S=64, A=64 → 32.3× theoretical, 26.9× measured (83%).
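A quick numerical check of the bound, as a sketch. The function below transcribes the inequality above; the parameter names (`r_vdp`, `r_po`, `alpha`, `beta`, `r_vdc`) are this sketch's spellings of the symbols, and defaulting all of them to zero is an assumption that yields the simplified decode-dominated form.

```python
def speedup_bound(A, S, r_vdp=0.0, r_po=0.0, alpha=0.0, beta=0.0, r_vdc=0.0):
    """Upper bound on the Stream-over-Serial speedup, transcribed from the formula above."""
    numerator = A * ((S + r_po) * r_vdp + S)
    denominator = (S + A - 1) * (1 + alpha * r_vdp + beta * r_vdc)
    return numerator / denominator


bound = speedup_bound(A=64, S=64)  # all ratios zero: A*S / (S + A - 1)
measured = 26.9                    # measured speedup quoted in the text
print(f"bound = {bound:.1f}x, measured/bound = {measured / bound:.0%}")
```

With A=S=64 this gives 4096/127 ≈ 32.3×, and 26.9/32.3 ≈ 83%, matching the figures quoted above. Any non-zero $\alpha\,r_{v_{dp}}$ or $\beta\,r_{v_{dc}}$ only enlarges the denominator, so the simplified form remains an upper bound.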

Exact cost formula; Stream saves ≈7.5% at ρ=1.

$\displaystyle \frac{C^{\mathrm{stream}}}{C^{\mathrm{serial}}} = \rho \cdot \frac{r_{c_{pd}}\,(\alpha + r_{c_{cp}}\,\beta) + 1}{r_{c_{pd}}\,(1 + r_{po}/S) + 1}$

Claude Opus 4.6 pricing: $5 / $25 / $0.5 per MTok (input / output / cache), A=S=4.
Bound = 0.925ρ with full KV-cache — saves ≈7.5% even at ρ=1.
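The ratio can be transcribed into a small helper for experimenting with different settings. This is a sketch: the argument names are this example's spellings of the symbols, and reading $r_{c_{pd}}$ as the input/output price ratio and $r_{c_{cp}}$ as the cache/input price ratio is an assumption, as is every numeric value passed in below. It does not attempt to reproduce the 0.925 figure, since the values of $\alpha$, $\beta$, and $r_{po}$ behind it are not given here.

```python
def cost_ratio(rho, r_cpd, r_ccp, alpha, beta, r_po, S):
    """C_stream / C_serial, transcribed term by term from the formula above."""
    numerator = r_cpd * (alpha + r_ccp * beta) + 1
    denominator = r_cpd * (1 + r_po / S) + 1
    return rho * numerator / denominator


# Assumed reading of the quoted prices (5 / 25 / 0.5 USD per MTok).
r_cpd = 5 / 25    # input price / output price (assumption)
r_ccp = 0.5 / 5   # cache price / input price (assumption)

# Sanity check: with alpha=1, beta=0, r_po=0 the fraction cancels and the ratio equals rho.
baseline = cost_ratio(rho=1.0, r_cpd=r_cpd, r_ccp=r_ccp, alpha=1.0, beta=0.0, r_po=0.0, S=4)

# Illustrative only: a larger r_po grows the denominator and lowers the ratio.
with_overhead = cost_ratio(rho=1.0, r_cpd=r_cpd, r_ccp=r_ccp, alpha=1.0, beta=0.0, r_po=4.0, S=4)
print(baseline, with_overhead)
```

The two calls only exercise the algebra of the formula; they are not estimates of real costs.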

Step-level perturbation — head/tail asymmetry

Same upstream output, two failure modes: corrupting the tail leaves Stream untouched; corrupting the head trips it.
The asymmetry is exactly what Theorem 1 predicts.

Tail-perturbed → Stream up to +24.0 pp
Head-perturbed → Stream down to −36.0 pp

Main results — eight benchmarks, two LLMs, three topologies

Model	Topo	Method	AIME25	AIME26	HMMT26	GPQA-D	HLE	LCB-G	LCB-E	LCB-T	Avg.
Claude Opus 4.6 (high)	—	Single	67.50	60.00	48.11	83.67	18.60	90.25	77.94	84.31	66.30
	Chain	Serial	80.42	72.08	63.26	85.86	23.90	91.33	78.64	92.38	73.48
	Chain	StreamMA	92.50	89.58	85.61	87.37	26.97	91.50	84.41	95.63	81.70
	Tree	Serial	86.25	86.25	75.00	85.18	24.82	91.92	88.45	97.59	79.43
	Tree	StreamMA	93.34	87.92	82.20	85.86	25.07	94.00	94.57	99.55	82.81
	Graph	Serial	77.92	71.67	61.75	85.69	22.17	90.08	75.78	98.27	72.92
	Graph	StreamMA	95.83	87.92	82.58	86.53	27.68	92.17	95.27	98.72	83.34
GPT-5.4 (none)	—	Single	55.83	71.25	40.53	77.95	12.08	91.08	92.48	96.68	67.24
	Chain	Serial	60.00	70.42	54.55	75.08	14.66	90.08	97.43	99.02	70.16
	Chain	StreamMA	61.25	72.50	59.10	80.30	14.94	91.17	99.30	99.47	72.25
	Tree	Serial	59.17	75.83	56.07	76.77	14.83	88.33	93.81	99.25	70.51
	Tree	StreamMA	62.08	75.83	58.34	78.12	15.74	89.50	94.78	99.17	71.70
	Graph	Serial	60.00	74.17	52.65	78.45	14.04	92.25	99.51	98.80	71.13
	Graph	StreamMA	62.50	75.42	56.44	79.63	16.13	93.08	99.79	99.32	72.32

StreamMA rows shaded; numbers reproduce Tab. 1 of the paper.

Step-level scaling law — a new orthogonal dimension

At fixed agent count $A$, simply asking each agent to think in more, finer-grained steps $S$ improves both speed and accuracy.
Fully composable with agent-count scaling.
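
A back-of-the-envelope check on the speed side, using only the simplified bound $\tfrac{AS}{S+A-1}$ from the speedup theorem (this concerns the bound, not measured speedup, and assumes $A > 1$):

$\displaystyle \frac{\partial}{\partial S}\,\frac{AS}{S+A-1} = \frac{A(S+A-1) - AS}{(S+A-1)^{2}} = \frac{A(A-1)}{(S+A-1)^{2}} > 0, \qquad \lim_{S\to\infty}\frac{AS}{S+A-1} = A$

So at fixed $A$ the bound rises monotonically with $S$ and saturates at $A$. The accuracy gain is an empirical observation and is not covered by this derivation.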

Speedup vs steps S (log-log)

Accuracy heatmap on HMMT 2026

GPT-5.4-none · A=64: the S=auto baseline scores 68.2%; S=64 lifts this to 73.5% with a 26.9× speedup.

Cost–accuracy Pareto — Stream strictly dominates

Three-agent chain on HMMT 2026 with majority voting over $N\!\in\!\{1,4,16\}$ replicas. Claude Opus 4.6 pricing.

Stream×4 ($2.75, 90.9%) beats Serial×16 ($5.46, 89.4%) — half the cost, higher accuracy. With KV-cache hits, the same 90.9% drops to $1.61.
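
For readers unfamiliar with the replica setup, majority voting over N runs can be sketched as below. The answer strings are made up for illustration, and the tie-breaking rule (the first-seen answer wins) is an assumption of this sketch rather than something specified here. The two cost figures are the ones quoted above.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; on ties, the answer seen first wins."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers from N = 4 replicas of the same chain.
replica_answers = ["42", "17", "42", "42"]
print(majority_vote(replica_answers))

# Cost comparison using the two figures quoted in the text.
stream_x4_cost, serial_x16_cost = 2.75, 5.46
print(f"cost ratio = {stream_x4_cost / serial_x16_cost:.2f}")
```

The cost ratio comes out at about 0.50, which is the "half the cost" claim above.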

BibTeX

@article{yang2026streaming,
  title={Streaming Communication in Multi-Agent Reasoning},
  author={Yang, Zhen and Xu, Xiaogang and Wang, Wen and Chen, Cong and Xu, Xander and Chen, Ying-Cong},
  journal={arXiv preprint arXiv:2606.05158},
  year={2026}
}