RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark

Benchmark Examples

Sequence

Place cookies & tomato sauce in a container sequentially

Place cream cheese & chocolate pudding in a container sequentially

Pour tomato sauce twice, place cookies into microwave

Counting

Pour tomato sauce twice into frypan, place into drainer

Pick chocolate, pour tomato, place into drainer

Pour wine bottle twice into mug

Occlusion

Open all drawers sequentially, then place butter into the drawer with an item

Open all drawers sequentially, then place butter into the empty drawer

Place two items into the microwave for heating

Transferring

Pick chocolate & butter, cabinet 1 → cabinet 2

Pick tomato, milk & OJ, cabinet 1 → cabinet 2

Pick chocolate & cream, plate 1 → plate 2

Overview

Figure 1. RoboMemArena converts natural-language instructions into keyframe-annotated trajectories through VLM-based task decomposition, autonomous execution, closed-loop verification, and targeted human refinement of unsuitable annotations.

Abstract

Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task coverage and structural complexity, and remain restricted to simulation without real-world evaluation. We address this gap with RoboMemArena, a large-scale benchmark of 26 tasks, with average trajectory lengths exceeding 1,000 steps per task and 68.9% of subtasks being memory-dependent. Its generation pipeline uses a vision-language model (VLM) to design and compose subtasks, generates full trajectories through atomic functions, and provides memory-related annotations, including subtask instructions and native keyframe annotations, while paired real-world memory tasks support physical evaluation. We further design PrediMem, a dual-system VLA in which a high-level VLM planner manages a memory bank with recent and keyframe buffers and uses a predictive coding head to improve sensitivity to task dynamics. Extensive experiments on RoboMemArena show that PrediMem outperforms all baselines and provides insights into memory management, model architecture, and scaling laws for complex memory systems.

PrediMem

PrediMem is a dual-system memory framework built around a high-level VLM planner and a low-level VLA executor. The planner receives the current observation together with a memory bank composed of a recent-frame buffer and a keyframe buffer. From these inputs it predicts the current subtask and decides whether the current frame should be stored as a keyframe, while the low-level executor executes action chunks conditioned on the latest predicted subtask.
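The planner-memory loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class `MemoryBank`, the function `control_step`, the `recent_size` window length, and the drop-oldest eviction rule are all hypothetical, and the planner and executor are passed in as plain callables standing in for the VLM and VLA models.

```python
from collections import deque
from typing import Any, Callable, List, Optional, Tuple


class MemoryBank:
    """Illustrative memory bank: a sliding recent-frame buffer plus a keyframe buffer."""

    def __init__(self, recent_size: int = 4, max_keyframes: Optional[int] = None) -> None:
        # recent_size is an assumed window length; max_keyframes=None leaves the bank uncapped.
        self.recent = deque(maxlen=recent_size)
        self.keyframes: List[Any] = []
        self.max_keyframes = max_keyframes

    def context(self) -> List[Any]:
        """Frames handed to the planner: keyframes first, then the recent window."""
        return list(self.keyframes) + list(self.recent)

    def update(self, frame: Any, is_keyframe: bool) -> None:
        """Always slide the recent window; store a keyframe only when the planner asks."""
        self.recent.append(frame)
        if is_keyframe:
            self.keyframes.append(frame)
            if self.max_keyframes is not None and len(self.keyframes) > self.max_keyframes:
                self.keyframes.pop(0)  # assumed eviction rule: drop the oldest keyframe


# Hypothetical interfaces: the planner returns (subtask, store_as_keyframe).
Planner = Callable[[Any, List[Any]], Tuple[str, bool]]
Executor = Callable[[Any, str], Any]


def control_step(planner: Planner, executor: Executor, bank: MemoryBank, observation: Any) -> Any:
    """One step: plan from observation + memory, update memory, then act on the subtask."""
    subtask, is_keyframe = planner(observation, bank.context())
    bank.update(observation, is_keyframe)
    return executor(observation, subtask)


if __name__ == "__main__":
    # Stub planner/executor standing in for the VLM planner and VLA executor.
    bank = MemoryBank(recent_size=2)
    for obs in range(6):
        control_step(lambda o, ctx: ("subtask", o % 3 == 0), lambda o, s: (s, o), bank, obs)
    print(bank.keyframes, list(bank.recent))
```

Keeping the keyframe list separate from the bounded recent window means early, decision-critical frames survive even after they have scrolled out of the recent buffer.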

During training, PrediMem adds a predictive coding head that predicts the next-frame visual latent representation from the current hidden state. This auxiliary objective makes the VLM representation more sensitive to physical state transitions such as grasp closure, object placement, repeated pouring, and drawer closure. The predictive coding head is removed at inference time, so the deployed system keeps the same dual-system architecture and inference cost.
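As a rough illustration of how such an auxiliary objective could be combined with the planner loss, the sketch below uses a linear head and a mean-squared error between the predicted and actual next-frame latents. Both choices are assumptions made for illustration; the text above does not specify the head architecture or the distance function. The default weight of 0.1 follows the best predictive-loss weight reported in the ablations, and all function names are hypothetical.

```python
from typing import List, Sequence


def linear_head(hidden: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Assumed predictive head: one linear map from the hidden state to latent space."""
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, bias)]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean-squared error between predicted and actual next-frame latents."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def training_loss(planner_loss: float,
                  hidden: Sequence[float],
                  next_latent: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float],
                  pred_weight: float = 0.1) -> float:
    """Training-time total: planner loss plus the weighted predictive-coding term."""
    predicted = linear_head(hidden, weights, bias)
    return planner_loss + pred_weight * mse(predicted, next_latent)


if __name__ == "__main__":
    hidden = [1.0, 2.0]                # current hidden state (toy values)
    next_latent = [1.0, 4.0]           # latent of the next frame (toy values)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(training_loss(0.5, hidden, next_latent, identity, [0.0, 0.0]))
```

At inference time only the planner path is evaluated; the head and its loss term are simply not called, which is why the deployed cost is unchanged.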

Benchmark Results

RoboMemArena exposes a clear memory gap in existing policies. The reactive π_0.5 baseline reaches 21.5% TSR and 38.7% CSR, while the matched MemER implementation improves to 27.3% TSR and 49.1% CSR. PrediMem achieves the best overall simulation performance with 38.5% TSR and 55.2% CSR.

Ablations show that both memory components matter. Removing the predictive coding head drops TSR to 32.3%, while removing the keyframe bank drops TSR to 17.7%. The best predictive-loss weight is 0.1, and an uncapped keyframe bank performs best for long-horizon tasks where early observations remain decision-critical.

On five real-world memory tasks, PrediMem reaches 52% average success over 10 rollouts per task, compared with 40% for MemER and 20% for π_0.5. These results indicate that keyframe-grounded memory remains useful under physical execution noise.
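To make the real-world percentages concrete, the short sketch below converts each average success rate into an implied number of successful rollouts. It assumes every task contributes exactly 10 rollouts, so the average over tasks equals the pooled rate over all 50 rollouts; the helper name `implied_successes` is hypothetical, and per-task counts are not given here.

```python
def implied_successes(avg_success_pct: int, num_tasks: int, rollouts_per_task: int) -> float:
    """Successful rollouts implied by an average success rate over equally sized tasks."""
    total_rollouts = num_tasks * rollouts_per_task
    return avg_success_pct * total_rollouts / 100


if __name__ == "__main__":
    # Percentages as reported for the five real-world tasks, 10 rollouts each.
    for name, pct in [("PrediMem", 52), ("MemER", 40), ("pi_0.5", 20)]:
        print(f"{name}: {implied_successes(pct, 5, 10):.0f} of 50 rollouts")
```

Under that assumption, the gap between PrediMem and MemER corresponds to 6 additional successful rollouts out of 50.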

View the leaderboard page for future submissions and category-wise metrics.

BibTeX

@article{robomemarena2025,
  title   = {RoboMemArena: A Comprehensive and Challenging
             Robotic Memory Benchmark},
  author  = {Huashuo Lei and Wenxuan Song and Huarui Zhang and Jieyuan Pei and Jiayi Chen and Haodong Yan and Han Zhao and Pengxiang Ding and Zhipeng Zhang and Lida Huang and Donglin Wang and Yan Wang and Haoang Li},
  journal = {arXiv preprint arXiv:2605.10921},
  year    = {2026}
}

Benchmark Task Details

Overview of all 26 RoboMemArena tasks with their memory types, average total timesteps, and key challenges. Memory types: T = transferring, O = occlusion, C = counting, S = sequence.

Task Name	Memory Type	Avg. #Steps	Task Challenge	Brief Description
Task Suite: Multi-Object Transferring
Transfer Chocolate Cream(Task26)	T	866	transferring	Pick and place chocolate and cream from plate1 to plate2, respectively.
Transfer Butter Cream(Task25)	T	779	transferring	Pick and place butter and cream from plate1 to plate2, respectively.
Transfer Chocolate Cream(Task18)	T	779	transferring	Pick and place chocolate and cream from cabinet1 to cabinet2, respectively.
Transfer Sauce Milk Juice(Task19)	T	1265	transferring	Pick and place tomato sauce, milk, and orange juice from cabinet1 to cabinet2.
Task Suite: Multi-Object Occlusion
Put Butter in Not-Empty Drawer(Task4)	O	1020	occlusion	Open and close all drawers in order to check. Put butter into the drawer that already contains an object.
Put Butter in Empty Drawer(Task5)	O	1806	occlusion	Open and close all drawers in order to check. Put butter into the empty drawer.
Put Cookies Butter into Drawer Respectively(Task11)	O + C	1835	occlusion, multi-placement	Put cookies into the top drawer and put butter into another drawer.
Put Cookies Chocolate into Middle Drawer(Task12)	O	1370	occlusion	Put cookies into the middle drawer and then put chocolate into the same drawer.
Put Butter Cookies into Middle Drawer(Task13)	O	1377	occlusion	Put cookies into the middle drawer and then put butter into the same drawer.
Put Cookies Chocolate into Drawer Respectively(Task14)	O	1832	occlusion, multi-placement	Put cookies into the top drawer and put chocolate into another drawer.
Put Butter Chocolate into Middle Drawer(Task17)	O	1502	occlusion	Put butter into the middle drawer and then put chocolate into the same drawer.
Put Cookies Chocolate into Microwave(Task20)	O	1195	occlusion	Put cookies into the microwave and then put chocolate into the location where the cookies were placed.
Put Butter Chocolate into Microwave(Task21)	O	1175	occlusion	Put butter into the microwave and then put chocolate into the location where the butter was placed.
Put Cream Popcorn into Microwave(Task23)	O	1175	occlusion	Put cream into the microwave and then put popcorn into the location where the cream was placed.
Put Cookies Popcorn into Microwave(Task24)	O	1195	occlusion	Put cookies into the microwave and then put popcorn into the location where the cookies were placed.
Task Suite: Multi-Object Counting
Pour Sauce on Cookies x2 Place Sauce into Drainer(Task6)	C + O	624	counting, occlusion	Pour tomato sauce over cookies twice and place the sauce bottle into the bowl drainer.
Pour Sauce on Frypan x2 Place Sauce into Drainer(Task7)	C + O	537	counting, occlusion	Pour tomato sauce over the frypan twice and place the sauce bottle into the bowl drainer.
Pour Sauce Twice over Chocolate in Frypan Place Sauce into Drainer(Task8)	C	910	counting	Pick and place chocolate into the frypan, pour tomato sauce over it twice, then place the sauce bottle into the bowl drainer.
Pour Sauce x2 over Butter in Frypan(Task9)	C	958	counting	Put butter into the frypan and pour sauce over it twice, then place the sauce bottle into the bowl drainer.
Pour Wine into Mug Twice(Task10)	C	472	counting	Pour wine into the mug twice.
Pour Milk Twice over Butter in Frypan(Task15)	C	1055	counting	Pick and place butter into the frypan, then pour milk over it twice.
Pour Milk x2 over Mug Place Milk into Drainer(Task16)	C + O	594	counting, occlusion	Pick milk from the table, pour it into the mug twice, then place the milk container into the bowl drainer.
Task Suite: Multi-Object Sequence
Put Cookies Sauce into Basket in Order(Task1)	S	742	sequence	Pick and place cookies into the basket, then pick and place tomato sauce into the same basket.
Put Butter Popcorn into Basket in Order(Task2)	S	708	sequence	Pick and place butter into the basket, then pick and place popcorn into the same basket.
Put Cream Chocolate into Basket in Order(Task3)	S	708	sequence	Pick and place cream into the basket, then pick and place chocolate into the same basket.
Pour Sauce x2 Put Cookies into Microwave(Task22)	S	1565	sequence	Pour tomato sauce over cookies twice, then put the cookies into the microwave.
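As a quick consistency check on the table, the sketch below re-enters the Avg. #Steps column grouped by task suite and computes the task count and the mean step count. It assumes the headline "exceeding 1,000 steps per task" figure is the unweighted mean of these per-task averages, which is not stated explicitly; the dictionary name `AVG_STEPS_BY_SUITE` is hypothetical.

```python
from statistics import mean

# Avg. #Steps values copied from the table above, keyed by task suite.
AVG_STEPS_BY_SUITE = {
    "transferring": [866, 779, 779, 1265],
    "occlusion": [1020, 1806, 1835, 1370, 1377, 1832, 1502, 1195, 1175, 1175, 1195],
    "counting": [624, 537, 910, 958, 472, 1055, 594],
    "sequence": [742, 708, 708, 1565],
}


def summarize(steps_by_suite):
    """Return (number of tasks, unweighted mean of per-task average steps)."""
    all_steps = [s for steps in steps_by_suite.values() for s in steps]
    return len(all_steps), mean(all_steps)


if __name__ == "__main__":
    count, overall = summarize(AVG_STEPS_BY_SUITE)
    print(f"{count} tasks, mean of per-task average steps: {overall:.1f}")
    for suite, steps in AVG_STEPS_BY_SUITE.items():
        print(f"  {suite}: {len(steps)} tasks, mean {mean(steps):.1f}")
```

The occlusion suite is both the largest and the longest on average, which is where most of the above-1,000-step trajectories come from.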