Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy
Authors: Shuhai Zhang, Yiliao Song, Jiahao Yang, Yuanqing Li, Bo Han, Mingkui Tan
Summary: Large language models (LLMs) such as ChatGPT have exhibited remarkable performance in generating human-like texts. However, machine-generated texts (MGTs) may carry critical risks, such as plagiarism, misleading information, or hallucination issues. Therefore, it is very urgent and important to detect MGTs in many situations. Unfortunately, it is challenging to distinguish MGTs from human-written texts because the distributional discrepancy between them is often very subtle, owing to the remarkable performance of LLMs. In this paper, we seek to exploit maximum mean discrepancy (MMD) to address this issue, since MMD can well identify distributional discrepancies. However, directly training a detector with MMD on diverse MGTs will incur a significantly increased variance of MMD, because MGTs may contain multiple text populations produced by different LLMs. This will severely impair MMD's ability to measure the difference between two samples. To tackle this, we propose a novel multi-population aware optimization method for MMD, called MMD-MP, which can avoid variance increases and thus improve the stability of measuring the distributional discrepancy. Relying on MMD-MP, we develop two methods for paragraph-based and sentence-based detection, respectively. Extensive experiments on various LLMs, e.g., GPT2 and ChatGPT, show the superior detection performance of our MMD-MP. The source code is available at https://github.com/ZSHsh98/MMD-MP
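The detection signal underlying the paper is the maximum mean discrepancy between features of human-written and machine-generated texts. As a rough illustration of that quantity only (this is not the paper's MMD-MP objective; the Gaussian kernel, fixed bandwidth, random stand-in features, and function names below are assumptions for the sketch), a minimal unbiased estimate of squared MMD might look like this:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    diff = x[:, None, :] - y[None, :, :]          # pairwise differences, shape (n, m, d)
    sq_dists = np.sum(diff ** 2, axis=-1)          # squared Euclidean distances, shape (n, m)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples X (n, d) and Y (m, d)."""
    n, m = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Exclude diagonal terms so the within-sample averages are unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    term_xy = Kxy.mean()
    return term_xx + term_yy - 2.0 * term_xy

# Toy usage: random stand-ins for human-written vs. machine-generated text features.
rng = np.random.default_rng(0)
human_feats = rng.normal(0.0, 1.0, size=(100, 16))
mgt_feats = rng.normal(0.3, 1.0, size=(100, 16))
print("Squared MMD estimate:", mmd_unbiased(human_feats, mgt_feats))
```

As the abstract notes, the variance of such an estimate grows when the MGT sample mixes text populations from multiple LLMs, which is the instability that the proposed MMD-MP optimization is designed to avoid.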