| Rank | Method | Affiliation | VLM | VLA | TSR (%) | CSR (%) | T (TSR/CSR) | O (TSR/CSR) | C (TSR/CSR) | S (TSR/CSR) | Source | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
Physical Robot
Real-World Evaluation
Full-task success rate (%) over 10 rollouts per task.
| Method | Affiliation | Pour Bottle ×2 | Brush Plates w/ Swap | Transfer Objects | Shell Game | IHMB | Average |
|---|---|---|---|---|---|---|---|
| PrediMem | HKUST (Guangzhou) | 60 | 60 | 80 | 50 | 10 | 52 |
| MemER* | Stanford University | 30 | 50 | 80 | 40 | 0 | 40 |
| π0.5 | Physical Intelligence | 20 | 10 | 60 | 10 | 0 | 20 |
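
The Average column in the table above matches the unweighted mean of the five per-task success rates. The short check below recomputes it from the listed values; the dictionary layout and the function name `average_success` are illustrative choices, not part of any benchmark tooling.

```python
# Per-task full-task success rates (%) copied from the real-world table, in column order:
# Pour Bottle x2, Brush Plates w/ Swap, Transfer Objects, Shell Game, IHMB.
real_world = {
    "PrediMem": [60, 60, 80, 50, 10],
    "MemER*": [30, 50, 80, 40, 0],
    "π0.5": [20, 10, 60, 10, 0],
}


def average_success(rates):
    """Unweighted mean of per-task success rates, in percent."""
    return sum(rates) / len(rates)


averages = {method: average_success(rates) for method, rates in real_world.items()}
print(averages)  # {'PrediMem': 52.0, 'MemER*': 40.0, 'π0.5': 20.0}
```

With 10 rollouts per task, each rollout moves a per-task rate by 10 percentage points, so small gaps between methods on a single task correspond to only one or two rollouts.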
Real-world evaluation requires physical robot access. If you are interested in evaluating on the real-world benchmark, please contact leihuashuohit@gmail.com.
Metric Definition
- TSR: full-task success rate.
- CSR: cumulative success rate over stage-level verification predicates.
- T / O / C / S: per-category TSR/CSR for the Transferring, Occlusion, Counting, and Sequence task categories.
- Evaluation protocol: all reported simulation results are obtained over all 26 RoboMemArena tasks, with 50 evaluation rollouts for each task.
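
To make the difference between the two metrics concrete, here is a minimal sketch of how TSR and CSR could be computed from per-rollout stage outcomes. The definitions above do not spell out the exact CSR formula, so this sketch assumes that a rollout's stage-level score is the fraction of stages passed in order before the first failure, and that CSR is the mean of that score over rollouts. The function names and the toy data are hypothetical.

```python
from typing import List


def task_success(stages: List[bool]) -> bool:
    """A rollout counts toward TSR only if every stage predicate passes."""
    return all(stages)


def stage_progress(stages: List[bool]) -> float:
    """ASSUMPTION: fraction of stages passed in order before the first failure."""
    passed = 0
    for ok in stages:
        if not ok:
            break
        passed += 1
    return passed / len(stages)


def tsr(rollouts: List[List[bool]]) -> float:
    """Full-task success rate in percent."""
    return 100 * sum(task_success(r) for r in rollouts) / len(rollouts)


def csr(rollouts: List[List[bool]]) -> float:
    """Assumed cumulative success rate in percent (mean stage progress)."""
    return 100 * sum(stage_progress(r) for r in rollouts) / len(rollouts)


# Toy data: four rollouts of a hypothetical four-stage task.
rollouts = [
    [True, True, True, True],      # full success
    [True, True, False, False],    # fails at stage 3
    [True, False, False, False],   # fails at stage 2
    [False, False, False, False],  # fails at stage 1
]

print(tsr(rollouts))  # 25.0
print(csr(rollouts))  # 43.75
```

Under this reading, CSR gives partial credit for progress through a long-horizon task, so it is never lower than TSR for the same set of rollouts.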