Metric contract
The official score is intentionally multiplicative: a system cannot compensate for leakage with high utility, nor compensate for low utility with broad refusal.
A measured ranking of shared-memory agents under utility, access-control, and active-forgetting constraints.
GATEMEM evaluates memory agents in multi-principal environments where the same memory pool must support authorized recall, enforce contextual access boundaries, and honor deletion requests. This page presents the official benchmark ledger. The default view shows the Medical domain; use the domain tabs below to switch domains.
The leaderboard can be sorted by the headline score or by any individual governance dimension, and covers four domains: Medical, Office, Education, and Household. U denotes effective utility (higher is better); A denotes the access-control violation rate and F the active-forgetting failure rate (lower is better for both). The headline score is MGS = U · (1 − A) · (1 − F).
| Rank | System | Backbone | Domain | U ↑ | A ↓ | F ↓ | MGS ↑ |
|---|---|---|---|---|---|---|---|
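
The MGS definition above can be sketched in a few lines of Python. This is an illustrative sketch, not the official scorer: the `Scores` class and the `mgs` and `rank` helpers are hypothetical names, and the code assumes U, A, and F are all reported as fractions in [0, 1], which the page does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Per-system metrics (hypothetical container, assumed to lie in [0, 1])."""
    utility: float             # U: effective utility, higher is better
    access_violation: float    # A: access-control violation rate, lower is better
    forgetting_failure: float  # F: active-forgetting failure rate, lower is better


def mgs(s: Scores) -> float:
    """Compute MGS = U * (1 - A) * (1 - F) after a basic range check."""
    for name, value in (("U", s.utility),
                        ("A", s.access_violation),
                        ("F", s.forgetting_failure)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return s.utility * (1.0 - s.access_violation) * (1.0 - s.forgetting_failure)


def rank(systems: dict[str, Scores]) -> list[str]:
    """Return system names ordered by MGS, best first."""
    return sorted(systems, key=lambda name: mgs(systems[name]), reverse=True)


if __name__ == "__main__":
    # Made-up numbers, chosen only to illustrate the multiplicative behavior.
    systems = {
        "leaky": Scores(utility=0.9, access_violation=0.5, forgetting_failure=0.0),
        "refuser": Scores(utility=0.1, access_violation=0.0, forgetting_failure=0.0),
        "balanced": Scores(utility=0.7, access_violation=0.1, forgetting_failure=0.1),
    }
    for name in rank(systems):
        print(f"{name}: MGS = {mgs(systems[name]):.3f}")
```

With these made-up inputs, the balanced system ranks above both the high-utility system that leaks and the system that stays safe by refusing broadly, because every factor scales the product and a weak factor cannot be offset by a strong one.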
Public submissions should provide model version, code commit, raw checkpoint outputs, normalized actions, judge labels, and runtime/token logs.
For a maintained leaderboard, distinguish paper baselines, self-reported runs, reproduced runs, and manually audited submissions.
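
One way to make the submission requirements and provenance categories above checkable is a small manifest validator. The sketch below is purely illustrative: the key names in `REQUIRED_ARTIFACTS`, the `Provenance` enum, and the `missing_artifacts` helper are assumptions made for this example and do not describe an official GATEMEM submission format.

```python
from enum import Enum


class Provenance(Enum):
    """Hypothetical labels for how a leaderboard entry was obtained."""
    PAPER_BASELINE = "paper_baseline"
    SELF_REPORTED = "self_reported"
    REPRODUCED = "reproduced"
    MANUALLY_AUDITED = "manually_audited"


# Hypothetical manifest keys mirroring the artifacts listed above.
REQUIRED_ARTIFACTS = (
    "model_version",
    "code_commit",
    "raw_checkpoint_outputs",
    "normalized_actions",
    "judge_labels",
    "runtime_token_logs",
)


def missing_artifacts(manifest: dict) -> list[str]:
    """Return the required keys that are absent or empty in the manifest."""
    return [key for key in REQUIRED_ARTIFACTS if not manifest.get(key)]


if __name__ == "__main__":
    # Placeholder values only; paths and identifiers are invented for illustration.
    manifest = {
        "model_version": "example-model-v1",
        "code_commit": "0000000",
        "raw_checkpoint_outputs": "outputs/raw.jsonl",
        "normalized_actions": "outputs/actions.jsonl",
        "judge_labels": "outputs/judge.jsonl",
        "provenance": Provenance.SELF_REPORTED,
    }
    print(missing_artifacts(manifest))  # the runtime/token logs entry is absent
```

Keeping provenance as an explicit field lets a leaderboard filter or badge entries by verification level instead of mixing audited and self-reported numbers in one ranking.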
Evaluate a memory agent under the released checkpoint protocol and submit standardized outputs for ranking across utility, access safety, active forgetting, and MGS.
If you use GATEMEM, please cite the accompanying paper and dataset.
@article{gatemem2026,
title = {GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents},
author = {Ren, Zhe and Yang, Yibo and Chen, Yimeng and Zhao, Zijun and Fu, Benshuo and Shu, Zhihao and Zhang, Bingjie and Guo, Dandan},
year = {2026}
}