GATEMEM Benchmark Ledger Utility · Access Control · Active Forgetting
Official benchmark results

Memory Governance Leaderboard

A measured ranking of shared-memory agents under utility, access-control, and active-forgetting constraints.

GATEMEM evaluates memory agents in multi-principal environments where the same memory pool must support authorized recall, enforce contextual access boundaries, and honor deletion requests. This page presents the official benchmark ledger. The default view shows the Medical domain; use the domain tabs below to switch domains.

91long-form episodes
2,218hidden checkpoints
4domains
84baseline result rows
MGSprimary governance score

Benchmark Ledger

The leaderboard can be sorted by headline score or by individual governance dimensions. Use the domain tabs to switch between Medical, Office, Education, and Household. U denotes effective utility; A denotes access-control violation rate; F denotes active-forgetting failure rate; MGS = U · (1 − A) · (1 − F).

Submit New Result
Higher is better for U, MGS, and safety views. Lower is better for A and F. Paper baseline values; judge-based labels.
Rank System Backbone Domain U ↑ A ↓ F ↓ MGS ↑
Showing 0 rows Click column headers to sort; podium follows the selected ranking view and domain.

Metric contract

The official score is intentionally multiplicative: a system cannot compensate for leakage with high utility, nor compensate for low utility with broad refusal.

U ↑effective utility
A ↓access violation
F ↓forgetting failure
MGS ↑joint reliability

Submission protocol

Public submissions should provide model version, code commit, raw checkpoint outputs, normalized actions, judge labels, and runtime/token logs.

outputsJSONL bundle
judgelabels/logs
costtokens/sec
reprocommit hash

Verification tiers

For a maintained leaderboard, distinguish paper baselines, self-reported runs, reproduced runs, and manually audited submissions.

paperbaseline
selfreported
reprorerun
auditinspected

Submit to the GATEMEM ledger.

Evaluate a memory agent under the released checkpoint protocol and submit standardized outputs for ranking across utility, access safety, active forgetting, and MGS.

Citation

If you use GATEMEM, please cite the accompanying paper and dataset.

@article{gatemem2026,
  title  = {GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents},
  author = {Ren, Zhe and Yang, Yibo and Chen, Yimeng and Zhao, Zijun and Fu, Benshuo and Shu, Zhihao and Zhang, Bingjie and Guo, Dandan},
  year   = {2026}
}