Comparison of plagiarism checker tools
To find the best plagiarism checker tool on the market in 2021, we conducted an experiment comparing the performance of 9 checkers.
We focused on a series of factors in our analysis. For each tool, we analyzed the amount of plagiarism it was able to detect, the quality of matches, and its usability and trustworthiness.
This article describes our research process, explaining how we arrived at our findings.
- Which plagiarism checkers we selected
- How we prepared test documents
- How we analyzed quantitative results
- How we selected criteria for qualitative analysis
Plagiarism checkers analyzed
We kicked off our analysis by searching for the main plagiarism checkers on the market that can be purchased by individual users, excluding enterprise software. We decided to only compare plagiarism checkers that mentioned students and/or academics as one of their target audiences.
Ultimately, we came up with the following list:
- Scribbr (in partnership with Turnitin)
- Plagly (failed to generate a report)
- DupliChecker (unable to process documents of this size)
- SearchEngineReports (unable to process documents of this size)
- Writer (did not detect any plagiarism, so we did not proceed)
The initial unedited document consisted of 160 plagiarized sections from 160 different sources. These were equally distributed amongst 8 different source types (20 sources each), in order to assess the performance for each source type individually.
The first document was compiled from:
- 20 Wikipedia articles
- 20 older news articles
- 20 recent news articles
- 20 open access journal articles
- 20 small website articles
- 20 big website articles
- 20 theses and dissertations
- 20 other PDFs
Generally speaking, open- and restricted-access journal articles, theses and dissertations, large websites, and PDFs are more relevant to students and academics, while writers and marketers might be more interested in news articles and websites. Wikipedia articles can be useful for both groups.
Taking this into account, it is important to distinguish between the various source types. Different users (e.g. students, academics, marketers, writers) use different sources, and have different needs with respect to plagiarism checkers.
Edited test documents
For the next step in our analysis, the unedited test document underwent three levels of editing: light, moderate, and heavy. We wanted to investigate whether each tool was able to find a source text when it had been edited to varying degrees.
- Light: the original copy-pasted source text was edited by replacing 1 word in every sentence.
- Moderate: the original copy-pasted source text was edited by replacing 2 to 3 words in every sentence, provided the sentence was long enough.
- Heavy: the original copy-pasted source text was edited by replacing 4 to 5 words in every sentence, provided the sentence was long enough.
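The edits were made by hand, but the procedure can be sketched in code. In the sketch below, the minimum sentence length and the `[SYN:…]` placeholder (standing in for a human-chosen synonym) are our assumptions; the article does not specify a cutoff or how replacement words were picked.

```python
import random
import re

def edit_sentence(sentence, n_replacements, min_length=8):
    """Replace n_replacements words in a sentence, mimicking the
    light/moderate/heavy edits described above. Sentences shorter than
    min_length words (our assumption) are left untouched."""
    words = sentence.split()
    if len(words) < min_length:
        return sentence
    for i in random.sample(range(len(words)), n_replacements):
        # Stand-in for a human choosing a synonym; here we just tag the word.
        words[i] = f"[SYN:{words[i]}]"
    return " ".join(words)

def edit_document(text, level):
    """level: 'light' -> 1 word per sentence, 'moderate' -> 2-3, 'heavy' -> 4-5."""
    replacements = {"light": (1, 1), "moderate": (2, 3), "heavy": (4, 5)}
    lo, hi = replacements[level]
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(edit_sentence(s, random.randint(lo, hi)) for s in sentences)
```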
We limited our analysis to the source texts that all of the checkers had previously detected in the unedited document. A checker that cannot find a copied text in its original form certainly cannot find it once the text has been altered, so we excluded any sources that had not been found by every checker.
The edited test documents consisted of 17 source texts from Wikipedia and 2 from older news articles.
We uploaded each version of the document to each plagiarism checker tool for testing. Subsequently, the documents were manually evaluated by looking at each plagiarized paragraph to determine whether the checker:
(0) had not been able to attribute any of the sentences to one or multiple sources
(1) had been able to attribute one or multiple sentences to one or multiple sources
(2) had been able to attribute all sentences to one source
This way, we were able to test whether checkers with a high plagiarism percentage actually had been able to find and fully match the source, or if they simply attributed a few sentences of the source text to multiple sources. This method also helped screen for false positives (e.g. highlighting non-plagiarized parts or common phrases as plagiarism).
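The evaluation itself was manual, but the three-level scale translates directly into a decision rule. The function and parameter names below are ours, not from the study:

```python
def score_paragraph(matched_sentences, total_sentences, sources):
    """Score a plagiarized paragraph on the 0/1/2 scale described above.

    matched_sentences: number of sentences the checker attributed to any source
    total_sentences:   number of sentences in the plagiarized paragraph
    sources:           set of sources the checker attributed sentences to
    """
    if matched_sentences == 0:
        return 0  # (0) no sentences attributed to any source
    if matched_sentences == total_sentences and len(sources) == 1:
        return 2  # (2) all sentences attributed to a single source
    return 1      # (1) partial match, or sentences split across sources
```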
For the original, unedited document, we calculated the total score for each tool. We weighted the source types based on their relevance for students and academics. The more relevant source types received weight 2. The less relevant ones, namely recent news articles and small websites, received weight 1.
The total possible score for the unedited document was 560. We then calculated the percentage of detected plagiarism for each tool, taking into account the weights.
For the edited texts, we also calculated the percentage of detected plagiarism for each plagiarism checker. However, this time the total possible score was 38, since we only included the 19 source texts that had been found by all plagiarism checkers.
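The weighting arithmetic can be reproduced in a few lines. This sketch only restates the calculation described above; the type names mirror the source list, and the weight assignments follow the relevance distinction made earlier.

```python
# Weight 2 = more relevant to students/academics; weight 1 = less relevant
# (recent news articles and small websites). Each of the 20 sources per type
# can earn at most 2 points (all sentences matched to one source).
WEIGHTS = {
    "Wikipedia articles": 2,
    "older news articles": 2,
    "recent news articles": 1,
    "open access journal articles": 2,
    "small website articles": 1,
    "big website articles": 2,
    "theses and dissertations": 2,
    "other PDFs": 2,
}
SOURCES_PER_TYPE = 20
MAX_POINTS_PER_SOURCE = 2

max_unedited = sum(w * SOURCES_PER_TYPE * MAX_POINTS_PER_SOURCE
                   for w in WEIGHTS.values())

# Only the 19 source texts found by every checker carried over to the
# edited documents; weights no longer apply there.
max_edited = 19 * MAX_POINTS_PER_SOURCE

def detection_percentage(score, maximum):
    """Detected-plagiarism percentage, rounded to the nearest whole percent."""
    return round(100 * score / maximum)
```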
The following results indicate how much plagiarism the tools were able to detect in the unedited, lightly edited, moderately edited, and heavily edited documents.
For the edited categories, we only used sources from the unedited document that were found by all plagiarism checkers during the first round. Therefore, some checkers were able to find 100% of these sources, even though they did not score 100% for the unedited document with more sources.
| Plagiarism checker | Unedited | Lightly edited | Moderately edited | Heavily edited |
|---|---|---|---|---|
| 1. Scribbr (in partnership with Turnitin) | | | | |
| 8. Plagiarism Detector | 24% | 37% | 39% | 29% |
Evaluating the plagiarism checker tools
Our next step was a qualitative analysis, in which we assessed the quality of matches, usability, and trustworthiness of each tool against pre-set criteria. These criteria contributed to a more standardized, objective evaluation, and all plagiarism checkers were evaluated in the same way.
The selected criteria cover the most important needs of users:
- Quality of matches: it is crucial for plagiarism checkers to be able to match the entire plagiarized section to the right source. Partial matches, where the checker matches individual sentences to multiple sources, result in a messy, hard-to-interpret report. False positives, where common phrases are incorrectly marked as plagiarism, are also important to consider, because they skew the plagiarism percentages.
- Usability: it is essential that plagiarism checkers show a clear overview of potential plagiarism issues. The report should be clear and cohesive, with a clean design. It is also important that users can instantly resolve the issues, for example by adding automatically generated citations.
- Trustworthiness: it is important for students and academics that their documents are not stored or sold to third parties. This way, they know for sure that the plagiarism check will not result in plagiarism issues when they submit their text to their educational institution or for publication. It is also important that the tool offers customer support if problems occur.