Scenario design
Define domain, principals, roles, relationships, and scoped access rules.
Benchmarking memory governance in multi-principal shared-memory agents.
GateMem evaluates whether persistent memory agents can remain useful while enforcing requester-specific access boundaries and honoring deletion requests. It shifts memory evaluation from single-user recall toward governed shared memory in realistic institutional environments.
Conventional memory benchmarks often reward an agent for retrieving the right fact. GateMem asks a harder deployment question: whether the agent should reveal that fact to the current requester, and whether deleted information remains recoverable later.
GateMem treats persistent memory as a governed shared state rather than a private cache. The benchmark evaluates long-horizon usefulness, contextual authorization, and interface-level deletion compliance in one protocol.
Each episode instantiates principals, relationships, access rules, evolving facts, and deletion requests. Hidden checkpoints query the agent at selected turn boundaries and are judged using structured annotations and leak targets.
Define domain, principals, roles, relationships, and scoped access rules.
Generate long-form multi-party traces with updates, benign noise, and deletion events.
Insert hidden utility, access-control, and active-forgetting queries with judge specifications.
Clinical coordination, patient data, family delegation, cross-patient confusion, and protected lab or medication details.
Project confidentiality, HR records, contractor boundaries, role mismatches, and enterprise workflows.
Campus workflows, student support, counselor interactions, academic records, and scoped institutional access.
Family coordination, residents, guests, caregivers, access codes, care routines, and deleted household instructions.
Across backbone LLMs and memory architectures, no method simultaneously achieves strong utility, robust access control, and reliable active forgetting. High recall often comes with leakage risk.
GateMem supports local evaluation through the released codebase and online leaderboard submission through the Hugging Face submission interface.
Implement a memory agent or score a generated predictions.jsonl file with the official evaluator.
git clone https://github.com/rzhub/GateMem.git
cd GateMem
pip install -r requirements.txt
python bench/scripts/run_eval.py \
--config configs/runs/paper_main.yaml \
--data_dir bench/data/medical \
--agent long_context
Upload predictions.jsonl, fill method metadata, and submit results for maintainer review.
1. Generate predictions.jsonl
2. Open GateMem-Submit
3. Upload predictions
4. Review pending result
5. Update public leaderboard
If you use GateMem, please cite the accompanying paper and dataset.
@misc{ren2026gatemembenchmarkingmemorygovernance,
title={GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents},
author={Zhe Ren and Yibo Yang and Yimeng Chen and Zijun Zhao and Benshuo Fu and Zhihao Shu and Bingjie Zhang and Yangyang Xu and Dandan Guo and Shuicheng Yan},
year={2026},
eprint={2606.18829},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2606.18829},
}