Methods
Study question
In a reproducible sample of AI-generated outputs, how often are cited references verifiable using public bibliographic sources?
Sampling
- Target design: N = 100 prompts from a fixed prompt bank.
- Current published sample: N = 100 (source: ChatGPT).
- One response per prompt from the chosen model (v1 uses a single model as a baseline).
- Each prompt instructs the model to include exactly 5 references in a strict one-line schema.
- Collected outputs are stored as JSONL rows (prompt + full answer text).
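To make the storage format concrete, here is a minimal sketch of one JSONL row. The field names ("prompt", "answer") are illustrative assumptions, not necessarily the study's exact schema; see the template file in the repo for the authoritative shape.

```javascript
// One collected output becomes one JSON object serialized onto a single line.
// NOTE: field names here are assumptions for illustration.
const row = {
  prompt: "Give an overview of citation practices in NLP, with exactly 5 references.",
  answer: "Citation practices vary widely... [full answer text ending in 5 one-line references]"
};

// Serialize to a single line; append a newline per row when writing ai-outputs.jsonl.
const line = JSON.stringify(row);
console.log(line);
```

Because `JSON.stringify` never emits raw newlines inside a value, each row stays on exactly one line, which is what makes the file valid JSONL.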
What is “verifiable”?
We run each full AI output through Verifing’s Citation Verification tool, which attempts to resolve citations via public bibliographic sources (e.g., Crossref/DataCite/PubMed/OpenAlex/Open Library) using conservative matching.
- VERIFIED: citation metadata matches a known record with sufficient confidence.
- RETRACTED: the resolved record is known to be retracted (when detectable).
- HALLUCINATED: the identifier/citation could not be found in queried sources.
- AMBIGUOUS: plausible candidates exist but there isn’t enough information to confirm safely.
- ERROR: transient/system failure (timeouts, upstream issues).
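The decision logic behind these five statuses can be sketched as follows. This is a hypothetical illustration of conservative matching, not Verifing's actual implementation; the `score` threshold, field names, and candidate structure are all assumptions.

```javascript
// Hypothetical classifier mapping a bibliographic lookup result to the
// five statuses above. Thresholds and fields are illustrative assumptions.
function classify(result) {
  if (result.error) return "ERROR";                 // transient/system failure
  if (result.candidates.length === 0) return "HALLUCINATED"; // nothing found in queried sources
  const best = result.candidates[0];                // highest-confidence candidate
  if (best.score < 0.9) return "AMBIGUOUS";         // plausible but not safe to confirm
  return best.retracted ? "RETRACTED" : "VERIFIED"; // confident match, retraction checked
}

console.log(classify({ error: false, candidates: [] }));
console.log(classify({ error: false, candidates: [{ score: 0.95, retracted: false }] }));
```

Note how the order of checks encodes the conservatism: a citation is only VERIFIED after the error, not-found, and low-confidence cases have all been ruled out.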
Important limitations
- “HALLUCINATED” in this study means “not found in the queried sources.” It is not a claim about intent.
- Public sources can be incomplete, rate-limited, or delayed; some real citations may be marked AMBIGUOUS or HALLUCINATED.
- This v1 study uses a single model and a single run per prompt; results may differ across models and runs.
Reproduction steps
- Use the prompt bank at apps/web/src/data/studies/citation-verifiability-jan-2026/prompt-bank.md.
- Save outputs to apps/web/src/data/studies/citation-verifiability-jan-2026/ai-outputs.jsonl, following the template file.
- Run:
  node scripts/study-citation-verifiability/run-study.mjs --api https://api.verifing.com \
    --input apps/web/src/data/studies/citation-verifiability-jan-2026/ai-outputs.jsonl
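Before running the script, it can help to sanity-check that every non-empty line of the outputs file parses as a JSON object. This validator is a sketch under that assumption only; it does not check the study's reference schema.

```javascript
// Sketch: verify that every non-empty line of a JSONL string is a JSON object.
// This checks JSONL well-formedness only, not the study's reference schema.
function validateJsonl(text) {
  const rows = text.split("\n").filter((l) => l.trim().length > 0);
  return rows.every((l) => {
    try {
      return typeof JSON.parse(l) === "object"; // each line must be valid JSON
    } catch {
      return false;                             // any unparsable line fails the file
    }
  });
}

console.log(validateJsonl('{"prompt":"p","answer":"a"}\n{"prompt":"q","answer":"b"}'));
```

In practice you would read ai-outputs.jsonl with fs.readFileSync and pass its contents to this function before invoking run-study.mjs.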
Dataset download (current published sample): /study/citation-verifiability-jan-2026/dataset