Plagiarism tools are best used for triage: prioritize what needs review. A common mistake is expecting a single score to mean “plagiarized” or “not plagiarized.” The right interpretation depends on the mode you use.
Think of the modes as different questions. Intrinsic asks, “Is this text internally repetitive or unusually self-similar?” Web asks, “Does this text look like public web text?” Hybrid asks both questions and is the safest default for editorial screening.
Choose a mode by goal (quick decision)
- If privacy is a concern or content is confidential: start with Intrinsic.
- If you need “possible sources” and the content is public: use Web or Hybrid.
- If you’re screening submissions and want conservative triage: use Hybrid.
- If the text is short (under about 100 words): expect less stable scores; use Hybrid and rely on the excerpts and possible sources rather than the score.
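The quick-decision list above can be sketched as a small function. This is only an illustration: the function name, the `goal` values, and the rule that confidentiality takes priority over text length are assumptions, not part of any particular tool.

```python
def choose_mode(text: str, confidential: bool = False, goal: str = "screening") -> str:
    """Pick a mode from the quick-decision list (hypothetical helper)."""
    # Assumption: privacy constraints win over every other consideration.
    if confidential:
        return "intrinsic"
    # Short texts (under about 100 words) give less stable scores,
    # so use Hybrid and rely on excerpts/sources.
    if len(text.split()) < 100:
        return "hybrid"
    # Public content where you mainly want "possible sources":
    # Web or Hybrid both fit; this sketch picks Web.
    if goal == "sources":
        return "web"
    # Conservative default for screening submissions.
    return "hybrid"
```

For example, `choose_mode(draft, confidential=True)` returns `"intrinsic"` regardless of length or goal.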
Intrinsic mode: internal signals only
- Good for detecting repetition, templated writing, and near-duplicates inside a document.
- Useful when you can’t or shouldn’t query the web (privacy constraints).
- Not designed to name external sources.
Intrinsic is ideal for detecting self-plagiarism and template reuse (standard intros, policy boilerplate, repeated paragraphs), and for “is this submission too similar to itself?” checks when you cannot use external search.
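To make “too similar to itself” concrete, here is a minimal sketch that compares paragraphs within one document using word trigrams and Jaccard similarity. It is a stand-in technique chosen for illustration; an actual intrinsic mode may use different signals. The 0.5 threshold is an arbitrary assumption.

```python
from itertools import combinations


def shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word sequences in the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles common to both sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_paragraphs(paragraphs, threshold: float = 0.5, n: int = 3):
    """List (i, j, score) for paragraph pairs at or above the threshold."""
    sets = [shingles(p, n) for p in paragraphs]
    hits = []
    for i, j in combinations(range(len(paragraphs)), 2):
        score = jaccard(sets[i], sets[j])
        if score >= threshold:
            hits.append((i, j, score))
    return hits
```

Nothing here touches the network, which is why this kind of check suits confidential content. Pairs it flags are leads for review, such as a policy paragraph pasted twice.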
Web mode: “possible sources” from the public web
- Good for spotting copied passages that are publicly accessible.
- Can produce “possible sources” with overlap excerpts.
- Can miss paywalled sources or content behind logins.
What “possible sources” means (and what it doesn’t)
- A “possible source” is a lead: a page that looks text-similar enough to review.
- It is not a legal claim of copying; it may be a common template, shared definitions, or quoted material.
- Overlap is usually localized: a few paragraphs can drive a score even when the rest is original.
- Some copying will not be found (paywalls, PDFs, private docs, forums, walled communities).
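If you already have the text of a possible source, a rough way to see where the overlap is localized is to extract the shared word runs. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function name and the six-word minimum are assumptions for illustration, and this is not how any specific tool computes its excerpts.

```python
import re
from difflib import SequenceMatcher


def overlap_excerpts(submission: str, source: str, min_words: int = 6):
    """Return shared word runs of at least min_words between two texts.

    Excerpts are built from normalized tokens (lowercased, punctuation
    removed), so they will not match the original formatting exactly.
    """
    a = re.findall(r"\w+", submission.lower())
    b = re.findall(r"\w+", source.lower())
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    excerpts = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel; the size check skips it.
        if block.size >= min_words:
            excerpts.append(" ".join(a[block.a:block.a + block.size]))
    return excerpts
```

Two texts on the same topic with different wording return an empty list, which matches the point that a lead needs real text overlap, not just a shared subject.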
Hybrid mode: combine intrinsic + web signals
- Good default when you want both: internal duplication signals plus possible sources.
- Most useful for editorial triage and screening workflows.
How to interpret results (conservatively)
Treat results as leads. A high score means “review this first,” not “this is plagiarism.” A low score means “we didn’t see strong signals,” not “this is original.”
- High score + high-confidence excerpts + a clear matching source = prioritize review (most actionable case).
- High score but sources look like boilerplate/templates = likely benign reuse; document it and move on.
- Low score but you personally suspect copying = try Web/Hybrid and search by distinctive phrases (names, numbers, unique claims).
Before acting on any result, verify the match:
- Open the “possible source” and confirm the text actually matches (not just topic overlap).
- Check whether the matching text is common boilerplate or truly distinctive.
- If it’s quoted, confirm it is clearly marked and properly cited.
- If it’s paraphrased, check whether the structure and specifics are too close.
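The interpretation cases in this section can be written down as a small triage helper. Everything specific here is an assumption: the 0 to 1 score scale, the 0.7 cutoff for “high,” the parameter names, and the fallback for a high score with no clear source. Adjust these to whatever your tool actually reports.

```python
def triage(score: float,
           has_clear_source: bool = False,
           source_is_boilerplate: bool = False,
           reviewer_suspects: bool = False,
           high: float = 0.7) -> str:
    """Map a result to a next step; every label means 'lead', not 'verdict'."""
    if score >= high:
        if has_clear_source and not source_is_boilerplate:
            # Most actionable case: high score plus a clear matching source.
            return "prioritize review"
        if source_is_boilerplate:
            return "likely benign reuse: document it and move on"
        # Assumed fallback: a high score still means "review this first".
        return "review: high score without a clear source"
    if reviewer_suspects:
        return "rerun in Web/Hybrid and search distinctive phrases"
    # A low score means no strong signals were seen, not that the text is original.
    return "no strong signals"
```

The returned label only orders the queue; the manual checks in the list above still decide what the match means.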
A practical escalation policy (teams)
Escalate when:
- Distinctive overlap is present (not a template), and it is not quoted/cited.
- Multiple independent sources show similar overlap (pattern suggests copying).
- Key claims/figures appear to match a source with minimal transformation.
Do not escalate when:
- The overlap is mostly generic boilerplate (definitions, disclaimers, standard sections).
- The text is clearly quoted and attributed.
- The match is only topical (same subject, different wording and specifics).
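For teams that want the policy in a checkable form, here is one possible encoding. It reads the first three conditions above as reasons to escalate and the last three as reasons not to. The field names, the “two or more sources” reading of “multiple,” and the choice to check the non-escalation conditions first are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Reviewer's notes on one flagged match (hypothetical structure)."""
    distinctive_unquoted_overlap: bool = False  # distinctive, not a template, not quoted/cited
    independent_sources: int = 0                # sources showing similar overlap
    key_claims_match: bool = False              # claims/figures match with minimal transformation
    mostly_boilerplate: bool = False            # definitions, disclaimers, standard sections
    quoted_and_attributed: bool = False         # clearly quoted and attributed
    topical_only: bool = False                  # same subject, different wording and specifics


def should_escalate(f: Finding) -> bool:
    # Assumption: any benign explanation is checked first and blocks escalation.
    if f.mostly_boilerplate or f.quoted_and_attributed or f.topical_only:
        return False
    # Otherwise, any single escalation condition is enough.
    return (f.distinctive_unquoted_overlap
            or f.independent_sources >= 2
            or f.key_claims_match)
```

A team that wants a stricter policy could require two escalation conditions instead of one; the point is that the rule is written down and applied the same way by every reviewer.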