TY - GEN
T1 - Comparison of overlap detection techniques
AU - Monostori, Krisztián
AU - Finkel, Raphael
AU - Zaslavsky, Arkady
AU - Hodász, Gábor
AU - Pataki, Máté
PY - 2002
Y1 - 2002
N2 - Easy access to the World Wide Web has raised concerns about copyright issues and plagiarism. It is easy to copy someone else's work and submit it as one's own. This problem has been addressed by many systems that use very similar approaches. These approaches are compared in this paper, and suggestions are made as to when some strategies are more applicable than others. Some alternative approaches are proposed that perform better than previously presented methods. These previous methods share two common stages: chunking of documents and selection of representative chunks. We study both stages and also propose alternatives that are better in terms of accuracy and space requirements. The applications of these methods are not limited to plagiarism detection but may also address other copy-detection problems. We also propose a third stage, applied during comparison, that uses suffix trees and suffix vectors to identify the overlapping chunks.
AB - Easy access to the World Wide Web has raised concerns about copyright issues and plagiarism. It is easy to copy someone else's work and submit it as one's own. This problem has been addressed by many systems that use very similar approaches. These approaches are compared in this paper, and suggestions are made as to when some strategies are more applicable than others. Some alternative approaches are proposed that perform better than previously presented methods. These previous methods share two common stages: chunking of documents and selection of representative chunks. We study both stages and also propose alternatives that are better in terms of accuracy and space requirements. The applications of these methods are not limited to plagiarism detection but may also address other copy-detection problems. We also propose a third stage, applied during comparison, that uses suffix trees and suffix vectors to identify the overlapping chunks.
UR - http://www.scopus.com/inward/record.url?scp=33748374589&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33748374589&partnerID=8YFLogxK
U2 - 10.1007/3-540-46043-8_4
DO - 10.1007/3-540-46043-8_4
M3 - Conference contribution
AN - SCOPUS:33748374589
SN - 3540435913
SN - 9783540435914
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 51
EP - 60
BT - Computational Science, ICCS 2002 - International Conference, Proceedings
T2 - International Conference on Computational Science, ICCS 2002
Y2 - 21 April 2002 through 24 April 2002
ER -