How to detect plagiarism in code

Plagiarism is thriving, evolving, spreading, and taking new forms. It has become a serious problem in many areas, including research writing, journalism, the arts, and software development, but it is most prevalent in education.

Plagiarism is a constant temptation for students because it is easy to do, and a serious challenge for teachers because it is difficult to detect. The global statistics are discouraging: 36% of undergraduates have admitted to plagiarizing written material, and 54% of students have admitted to plagiarizing from the Internet. Moreover, 90% of the students who took part in a recent U.S. News & World Report poll believe that cheaters are never caught.

Luckily, there are modern and effective ways to counter plagiarism.

Plagiarism types: can they be beaten with the same weapon?

To date, there are two basic types of plagiarism:

  1. Textual plagiarism, which means copying documents, reports, essays, scientific papers, etc.
  2. Source code plagiarism, which means copying or reproducing all or part of someone else's source code.

Both textual and source code plagiarism can be detected manually. However, this approach is time-consuming, labor-intensive, and not foolproof. First, in most cases various techniques are applied to disguise the plagiarism. Second, in the age of the Internet, submissions can be copied from online resources, so a manual check would require pasting multiple lines from each work into search engines.

Thus, the best solution to the problem is to use automatic plagiarism detection tools.

The approaches to automatic text plagiarism detection and automatic code plagiarism detection differ. While essays, reports, reviews, and other written tasks normally get copied from online resources, most code assignments are quite specific, which makes it difficult to find suitable ready-made code on the Internet. As a result, code plagiarism usually takes place within a class, with students copying each other's submissions.

The requirements for the originality of submissions are strict. As a result, students use a wide range of techniques of varying complexity to disguise copied code.

Generally, the following methods are distinguished:

  1. Code formatting (e.g. changing the indentation depth, inserting or deleting empty lines, adding line breaks or extra spaces);
  2. Insertion, modification, or deletion of comments;
  3. Changing names of variables, methods, or classes;
  4. Modification of constant values;
  5. Order change, which can relate to the order of variable declarations, statements, functions or methods within a class;
  6. Code redundancy, e.g. dead code, redundant imports, variables, statements, and methods;
  7. Using comprehensions (e.g. rewriting an explicit loop as a list comprehension);
  8. Code refactoring (redesigning).
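Several of these disguises (formatting, comments, renaming, and changed constants) can be neutralized before any comparison by reducing each submission to a normalized token stream. The following is a minimal sketch, assuming Python 3 submissions and using the standard tokenize module. The function name normalize and the placeholder tokens ID, NUM, and STR are illustrative choices, not part of any particular tool.

```python
import io
import keyword
import tokenize

# Token types that carry only layout or comments (assumption: we ignore them all).
SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}

def normalize(source):
    """Reduce Python source to tokens that ignore layout, comments,
    identifier names, and literal values."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.NAME:
            # Keep keywords (def, for, return, ...); hide every other name.
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # operators and punctuation
    return out

original = (
    "def total(items):\n"
    "    s = 0\n"
    "    for x in items:\n"
    "        s += x * 2\n"
    "    return s\n"
)
# Same code with a comment added, names and constants changed, layout altered.
disguised = (
    "# my own work\n"
    "def compute(values):\n"
    "\n"
    "      acc = 10   # start\n"
    "      for v in values:\n"
    "            acc += v * 3\n"
    "      return acc\n"
)

print(normalize(original) == normalize(disguised))  # True
```

This kind of normalization does nothing against reordering, dead code, or refactoring, which change the token sequence itself.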

Which Computer Code Plagiarism Detection Tool is better?

The vast majority of automatic plagiarism detection tools, depending on the algorithm they use, are robust only to certain plagiarism methods and vulnerable to the rest.
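To illustrate how one algorithm can resist some disguises and fall to others, here is a small sketch that compares syntax trees with Python's standard ast module. It is an illustrative stand-in, not the method of any specific tool, and the helper name same_structure is hypothetical.

```python
import ast

def same_structure(a_src, b_src):
    """True if both sources parse to the same syntax tree.
    ast.dump omits line and column numbers by default, and the parser
    discards comments, so layout and comments do not matter."""
    return ast.dump(ast.parse(a_src)) == ast.dump(ast.parse(b_src))

original = "def area(r):\n    return 3.14 * r * r\n"
reformatted = "def area( r ):\n\n        # circle\n        return 3.14*r*r\n"
renamed = "def surface(radius):\n    return 3.14 * radius * radius\n"

print(same_structure(original, reformatted))  # True: formatting and comments are ignored
print(same_structure(original, renamed))      # False: renaming defeats this check
```

An exact tree comparison like this one catches reformatting and comment changes but is defeated by a simple rename.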

At Unicheck, we have already contributed to the fight against plagiarism by designing software that accurately finds similarities in text. Now we present Computer Code Originality Checker, an innovation in code plagiarism detection.

Its key features are:

  • an innovative fuzzy search algorithm that detects code sequences differing in one or several elements (so rephrasing, inserting additional structures, renaming variables and functions, etc. do not affect the similarity);
  • a detailed report in which the different types of non-unique sequences (completely similar and partly similar) are highlighted in different colors;
  • the ability to compare two or more files (as a zip archive).
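The general idea behind these features can be sketched with Python's standard library. This is not Unicheck's algorithm, which is not described here: difflib.SequenceMatcher stands in for the fuzzy search, the 0.8 threshold is an arbitrary assumption, and the function names classify_lines, similarity, and compare_archive are hypothetical.

```python
import difflib
import io
import itertools
import zipfile

def classify_lines(a_src, b_src, threshold=0.8):
    """Label each non-blank line of a_src as 'identical', 'partial', or
    'unique' according to its closest line in b_src."""
    b_lines = [ln.strip() for ln in b_src.splitlines() if ln.strip()]
    report = []
    for line in (ln.strip() for ln in a_src.splitlines() if ln.strip()):
        best = max(
            (difflib.SequenceMatcher(None, line, other, autojunk=False).ratio()
             for other in b_lines),
            default=0.0,
        )
        if best == 1.0:
            label = "identical"   # exact copy
        elif best >= threshold:
            label = "partial"     # differs in a few characters
        else:
            label = "unique"
        report.append((label, line))
    return report

def similarity(a_src, b_src):
    """Share of lines in a_src with an identical or partial match in b_src."""
    report = classify_lines(a_src, b_src)
    if not report:
        return 0.0
    return sum(1 for label, _ in report if label != "unique") / len(report)

def compare_archive(zip_bytes):
    """Compare every pair of .py files found in a zip archive (given as bytes)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        sources = {
            name: zf.read(name).decode("utf-8", errors="replace")
            for name in zf.namelist() if name.endswith(".py")
        }
    return {
        (a, b): similarity(sources[a], sources[b])
        for a, b in itertools.combinations(sorted(sources), 2)
    }

alice = "import math\narea = math.pi * radius ** 2\nprint(area)\n"
bob = "import math\nresult = math.pi * radius ** 2\nshow(result, 42)\n"

for label, line in classify_lines(alice, bob):
    print(label, "|", line)
```

For these two toy submissions, the first line is reported as identical, the second as partial (only the variable name differs), and the third as unique. Character-level matching on single lines is a crude stand-in: very short lines rarely reach the threshold after a rename, so a real tool would need something more robust.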

In addition, checking code against the Internet will be added soon.

Computer Code Originality Checker makes it easy to check whether academic integrity is maintained and whether all student assignments are original. So now, with Computer Code Originality Checker by Unicheck, you can be 100% sure about plagiarism instead of just suspecting it!

Computer Code Originality Checker is currently available for beta testing with submissions written in the Python programming language (both versions 2.x and 3.x). If you would like to take part in the free beta testing and see whether the service can be useful for you, you can apply for access by filling out the registration form: https://docs.google.com/forms/d/e/1FAIpQLSfa0dgcWBp70NHFwKk44H5QwpA8Appjwj4eZ7GO5YW62xy7Lw/viewform?usp=sf_link
