Official benchmark results

Memory Governance Leaderboard

A measured ranking of shared-memory agents under utility, access-control, and active-forgetting constraints.

GATEMEM evaluates memory agents in multi-principal environments where the same memory pool must support authorized recall, enforce contextual access boundaries, and honor deletion requests. This page presents the official benchmark ledger. The default view shows the Medical domain; use the domain tabs below to switch domains.

View Leaderboard Read Paper Project Page Download Data Submit Results

91long-form episodes

2,218hidden checkpoints

4domains

168baseline result rows

MGSprimary governance score

Benchmark Ledger

The leaderboard can be sorted by headline score or by individual governance dimensions. Use the domain tabs to switch between Medical, Office, Education, and Household. U denotes effective utility; A denotes access-control violation rate; F denotes active-forgetting failure rate; MGS = U · (1 − A) · (1 − F).

Submit New Result

Higher is better for U, MGS, and safety views. Lower is better for A and F. Paper baseline values; judge-based labels.

Rank	System	Backbone	Domain	U ↑	A ↓	F ↓	MGS ↑

Showing 0 rows Click column headers to sort; podium follows the selected ranking view and domain.

Metric contract

The official score is intentionally multiplicative: a system cannot compensate for leakage with high utility, nor compensate for low utility with broad refusal.

U ↑effective utility

A ↓access violation

F ↓forgetting failure

MGS ↑joint reliability

Submission protocol

Public submissions should provide model version, code commit, raw checkpoint outputs, normalized actions, judge labels, and runtime/token logs.

outputsJSONL bundle

judgelabels/logs

costtokens/sec

reprocommit hash

Verification tiers

For a maintained leaderboard, distinguish paper baselines, self-reported runs, reproduced runs, and manually audited submissions.

paperbaseline

selfreported

reprorerun

auditinspected

Submit to the GATEMEM ledger.

Evaluate a memory agent under the released checkpoint protocol and submit standardized outputs for ranking across utility, access safety, active forgetting, and MGS.

Submit Result Download Benchmark Evaluator Code

Citation

If you use GATEMEM, please cite the accompanying paper and dataset.

@misc{ren2026gatemembenchmarkingmemorygovernance,
      title={GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents}, 
      author={Zhe Ren and Yibo Yang and Yimeng Chen and Zijun Zhao and Benshuo Fu and Zhihao Shu and Bingjie Zhang and Yangyang Xu and Dandan Guo and Shuicheng Yan},
      year={2026},
      eprint={2606.18829},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.18829}, 
}