We are very proud to share that "Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method" by Weichao Zhang (CAS), Ruqing Zhang (CAS), Jiafeng Guo (CAS), Maarten de Rijke (University of Amsterdam), Yixing Fan (currently visiting the University of Amsterdam), and Xueqi Cheng (CAS) received a best paper award at EMNLP 2024, the 2024 Conference on Empirical Methods in Natural Language Processing, which was held in Miami, Florida, from November 12 to 16.
A link to the paper: https://staff.fnwi.uva.nl/m.derijke/wp-content/papercite-data/pdf/zhang-2024-pretraining.pdf