Does This Paper Even Exist? How We Catch Hallucinated References
Fabricated citations went from an embarrassment to a bannable offense in 2026. Here is how ReviewerZero verifies every reference against 240M+ papers and the live web, and tells real fabrications apart from citation mistakes.
Last month, arXiv began banning authors who submit papers with AI-hallucinated references. Around the same time, a Columbia University team audited 2.5 million biomedical papers and found fabricated citations had risen manyfold since 2023, with roughly one paper in 277 now citing a source that does not exist (The Lancet, Retraction Watch). At the top venues it is already public: GPTZero found more than 100 hallucinated citations across 53 accepted NeurIPS 2025 papers (Fortune), and both ICLR and NeurIPS may now desk-reject papers for undisclosed LLM use.
Why it is hard to catch
A hallucinated citation rarely looks fake. It has a believable title, plausible authors, and often a DOI that resolves, just to a different paper than the one named. Catching it means answering one question for every reference, reliably: does the cited work actually exist? Three things make that hard.
Real scholarship is not always easy to index. Whole fields publish mostly at conferences, and legitimate sources also include books, theses, standards, datasets, and reports. A checker that only understands journals will flag real work as missing.
AI will happily confirm a fake. Ask a chatbot whether a paper exists and it will often say yes and repeat the invented details back to you. Verification cannot lean on a model's memory. It has to find real evidence.
Not every flaw is fraud. A real paper cited with the wrong year calls for a correction and is not necessarily misconduct. Treating the two the same way either misses genuine fabrications or accuses authors of fraud over honest mistakes.
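The DOI case is a useful illustration of why "the link resolves" is not enough: the record behind the DOI has to be the work that was cited. The sketch below is a minimal illustration, not our production logic. It assumes the title of the resolved record has already been fetched somehow, and the function names and the 0.85 threshold are hypothetical choices made for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and replace punctuation runs with spaces so cosmetic differences do not count."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def title_similarity(cited: str, resolved: str) -> float:
    """Similarity in [0, 1] between the cited title and the resolved record's title."""
    return SequenceMatcher(None, normalize(cited), normalize(resolved)).ratio()


def doi_matches_citation(cited_title: str, resolved_title: str,
                         threshold: float = 0.85) -> bool:
    """True when the DOI target looks like the cited work (threshold is an assumption)."""
    return title_similarity(cited_title, resolved_title) >= threshold


if __name__ == "__main__":
    # Same work, different capitalisation and punctuation.
    print(doi_matches_citation("Attention Is All You Need",
                               "Attention is all you need."))
    # The DOI resolves, but to an unrelated paper.
    print(doi_matches_citation("Attention Is All You Need",
                               "Deep Residual Learning for Image Recognition"))
```

A plain string-similarity ratio is only one possible signal. Translated titles, subtitles, and retitled preprints would need extra handling.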
How ReviewerZero checks every reference
We screen every reference in a submission automatically, and we settle the existence question with evidence, using a multi-stage system that is validated and auditable.
We cross-check each reference against several independent scholarly databases rather than trusting any single index. Together they cover more than 240 million works across journals and conference proceedings, including the conference literature most tools miss. When a reference is not in any of them, we search the live web for the actual source, because some legitimate references exist only as a book listing, a repository record, or a single web page. If the reference includes a link, we open it and check that the page genuinely matches what was cited.
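The order of checks described above can be sketched as a simple cascade. This is an illustrative outline only: the `Reference` and `Evidence` types, the `verify_existence` function, and the stub lookups are all hypothetical, and the exact ordering of the web search and link check is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence


@dataclass(frozen=True)
class Reference:
    title: str
    url: Optional[str] = None


@dataclass(frozen=True)
class Evidence:
    source: str  # which index or web step produced the match
    detail: str  # e.g. the matched record or page


# A lookup takes a reference and returns evidence, or None when it finds nothing.
Lookup = Callable[[Reference], Optional[Evidence]]


def verify_existence(ref: Reference, indexes: Sequence[Lookup],
                     web_search: Lookup, check_link: Lookup) -> Optional[Evidence]:
    """Try each scholarly index, then the live web, then the cited link itself."""
    for lookup in indexes:
        evidence = lookup(ref)
        if evidence is not None:
            return evidence
    evidence = web_search(ref)
    if evidence is not None:
        return evidence
    if ref.url:
        return check_link(ref)
    return None  # no evidence found by any step


def make_stub_lookup(name: str, titles: Iterable[str]) -> Lookup:
    """In-memory stand-in for a real database or search client (illustration only)."""
    known = {t.casefold() for t in titles}

    def lookup(ref: Reference) -> Optional[Evidence]:
        if ref.title.casefold() in known:
            return Evidence(source=name, detail=ref.title)
        return None

    return lookup


index_a = make_stub_lookup("index_a", ["A Journal Article"])
index_b = make_stub_lookup("index_b", ["A Conference Paper"])
web = make_stub_lookup("web", ["A Book With No DOI"])


def no_link_check(ref: Reference) -> Optional[Evidence]:
    return None


for title in ["A Conference Paper", "A Book With No DOI", "An Invented Title"]:
    print(title, "->", verify_existence(Reference(title), [index_a, index_b], web, no_link_check))
```

Returning the evidence rather than a bare yes or no keeps the outcome auditable, since every positive answer records which source confirmed it.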
The web verification step alone applies dozens of rules: whether the cited link resolves, whether the page content matches, whether the work is the same one in a different edition, whether a preprint was later published at a different venue, whether a link is paywalled rather than truly dead. Each rule has been validated and audited for accuracy and reliability.
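One way to picture a rule set like this is as an ordered list of small functions, each of which either returns a verdict or passes. The three rules below are simplified stand-ins invented for this example, not the actual rules. In particular, treating HTTP 401/402/403 as "paywalled" is an assumed heuristic, since many real paywalls return 200 with a login page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence


class LinkVerdict(Enum):
    MATCHES = "matches"
    MISMATCH = "mismatch"
    PAYWALLED = "paywalled"
    DEAD = "dead"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class PageObservation:
    status_code: int   # HTTP status returned for the cited link
    page_title: str    # title extracted from the fetched page, if any
    cited_title: str   # title as written in the reference


# A rule returns a verdict when it applies, or None to defer to the next rule.
Rule = Callable[[PageObservation], Optional[LinkVerdict]]


def rule_dead(obs: PageObservation) -> Optional[LinkVerdict]:
    return LinkVerdict.DEAD if obs.status_code in (404, 410) else None


def rule_paywalled(obs: PageObservation) -> Optional[LinkVerdict]:
    # Assumed heuristic: access-denied statuses suggest the page exists but is gated.
    return LinkVerdict.PAYWALLED if obs.status_code in (401, 402, 403) else None


def rule_title_match(obs: PageObservation) -> Optional[LinkVerdict]:
    if obs.status_code != 200:
        return None
    cited = obs.cited_title.casefold().strip()
    if cited and cited in obs.page_title.casefold():
        return LinkVerdict.MATCHES
    return LinkVerdict.MISMATCH


RULES: Sequence[Rule] = (rule_dead, rule_paywalled, rule_title_match)


def evaluate_link(obs: PageObservation, rules: Sequence[Rule] = RULES) -> LinkVerdict:
    """Return the first verdict any rule produces, or UNKNOWN if none applies."""
    for rule in rules:
        verdict = rule(obs)
        if verdict is not None:
            return verdict
    return LinkVerdict.UNKNOWN
```

Keeping each rule as a separate function makes it possible to test and audit rules one at a time, and an explicit UNKNOWN verdict avoids forcing a guess when no rule applies.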
We also keep two questions apart. "This work does not exist" is an integrity problem that deserves a closer look. "This citation has the wrong venue or year" can be a fixable error.
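Keeping the two questions apart can be as simple as returning different outcomes for them. The sketch below is a hypothetical illustration: the `Citation` type, the `classify` function, and the three outcome labels are names chosen for this example, and the comparison is deliberately naive.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Citation:
    title: str
    year: Optional[int] = None
    venue: Optional[str] = None


def classify(cited: Citation, matched: Optional[Citation]) -> Tuple[str, List[str]]:
    """Separate 'no such work was found' from 'the work exists but is cited wrongly'.

    `matched` is the real record found for the citation, or None if none was found.
    Returns an outcome label and the list of metadata fields that disagree.
    """
    if matched is None:
        # Existence question: nothing backs this reference up.
        return "no_evidence_found", []

    wrong_fields: List[str] = []
    if cited.year is not None and matched.year is not None and cited.year != matched.year:
        wrong_fields.append("year")
    if (cited.venue and matched.venue
            and cited.venue.casefold() != matched.venue.casefold()):
        wrong_fields.append("venue")

    if wrong_fields:
        # Accuracy question: the work is real, the citation needs fixing.
        return "metadata_error", wrong_fields
    return "verified", []


real = Citation("A Real Paper", year=2021, venue="Some Journal")
print(classify(Citation("A Real Paper", year=2020, venue="Some Journal"), real))
print(classify(Citation("An Invented Paper", year=2024), None))
```

The first call reports a fixable year error on a real work. The second reports that no evidence was found, which is the case that deserves a closer look.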
How well this works
We test the pipeline against public benchmarks of real and fabricated references, and the pattern is consistent. It catches the large majority of in-the-wild fabrications, reliably flags references that point to a work which does not exist, and almost never raises a false alarm on a real reference. That last property is what makes automated screening safe to put in front of editors, because a flag means something.
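The two properties described here correspond to standard screening metrics: recall on fabricated references (how many fabrications are caught) and the false-alarm rate on real references. The sketch below shows how they could be computed from labeled benchmark results. The function name is hypothetical and the sample data is a made-up toy set for illustration, not a reported result.

```python
from typing import Dict, Iterable, Tuple


def screening_metrics(results: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Compute metrics from (is_fabricated, was_flagged) pairs."""
    tp = fn = fp = tn = 0
    for is_fabricated, was_flagged in results:
        if is_fabricated and was_flagged:
            tp += 1          # fabrication caught
        elif is_fabricated:
            fn += 1          # fabrication missed
        elif was_flagged:
            fp += 1          # false alarm on a real reference
        else:
            tn += 1          # real reference correctly left alone
    return {
        # Share of fabricated references that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of real references that were wrongly flagged.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of flags that point at an actual fabrication.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy data only: 4 fabricated references (3 flagged), 10 real ones (1 flagged).
toy_results = ([(True, True)] * 3 + [(True, False)]
               + [(False, False)] * 9 + [(False, True)])
print(screening_metrics(toy_results))
```

Precision is the figure that matters most to an editor reading a flag, because it answers how often a flagged reference turns out to be a real problem.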
Performance varies by field and by source type, and some references are genuinely hard to confirm. A real paper in a small regional journal that was never put online can be close to impossible to verify from the outside, and there we would rather report "could not verify" than guess. No automated system, ours included, replaces editorial judgment. It focuses that judgment where it is needed.
We also test against independent benchmarks built specifically to stress citation checking, including HALLMARK, CiteAudit, and CiteTracer, which collect thousands of real and fabricated references along with genuine hallucinations pulled from real submissions.
Interested? Book a demo to see how ReviewerZero can help you.
