A Multi-Indicator AST-Based Approach for Code Similarity Detection in Programming Education
DOI: https://doi.org/10.24036/jtip.v19i2.1140
Keywords: Programming, Code, Similarity, Abstract Syntax Tree, Detection
Abstract
Evaluation of programming assignments in higher education often faces challenges in ensuring code originality, particularly when students apply cosmetic modifications such as renaming identifiers or altering formatting and comments. This study proposes a multi-indicator code similarity detection system based on Abstract Syntax Tree (AST) analysis to support more objective assessment of programming tasks. The proposed approach analyzes structural, logical, and stylistic aspects of Python programs, combining AST-based analysis with string similarity techniques for comment evaluation. A configurable weighting mechanism is introduced to allow flexible adjustment of similarity assessment according to different evaluation objectives. Experimental results on student programming assignments demonstrate that the system effectively distinguishes varying levels of code similarity and provides consistent similarity measurements. In addition, block-level analysis enables more fine-grained identification of similar code segments. The findings indicate that the proposed method is robust against cosmetic code modifications and can support more systematic and transparent programming assessment in educational contexts.
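The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four indicators, the helper names (`node_types`, `control_flow`, `identifiers`, `comments`, `similarity`), and the default weights are all assumptions chosen for illustration, since the abstract only states that structural, logical, and stylistic aspects are combined with string similarity on comments under configurable weights. The sketch uses Python's standard `ast`, `tokenize`, and `difflib` modules.

```python
import ast
import io
import tokenize
from difflib import SequenceMatcher

# Hypothetical weights; the abstract only says the weighting is configurable.
DEFAULT_WEIGHTS = {"structure": 0.5, "logic": 0.3, "style": 0.1, "comments": 0.1}

# Assumed set of node types treated as "logical" (control-flow) structure.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With,
                 ast.Return, ast.Break, ast.Continue)


def _ratio(a, b):
    """Sequence similarity in [0, 1]; two empty inputs count as identical."""
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def node_types(source):
    """AST node type names in ast.walk order; identifier names are ignored."""
    return [type(n).__name__ for n in ast.walk(ast.parse(source))]


def control_flow(source):
    """Only the control-flow node types, as a rough proxy for program logic."""
    return [type(n).__name__ for n in ast.walk(ast.parse(source))
            if isinstance(n, CONTROL_NODES)]


def identifiers(source):
    """Set of variable, argument, function, and class names (stylistic choice)."""
    names = set()
    for n in ast.walk(ast.parse(source)):
        if isinstance(n, ast.Name):
            names.add(n.id)
        elif isinstance(n, ast.arg):
            names.add(n.arg)
        elif isinstance(n, (ast.FunctionDef, ast.ClassDef)):
            names.add(n.name)
    return names


def comments(source):
    """Comment text joined into one string; comments are not part of the AST."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return " ".join(t.string.lstrip("#").strip()
                    for t in tokens if t.type == tokenize.COMMENT)


def similarity(src_a, src_b, weights=None):
    """Per-indicator scores plus their weighted average under key 'overall'."""
    weights = weights or DEFAULT_WEIGHTS
    ids_a, ids_b = identifiers(src_a), identifiers(src_b)
    union = ids_a | ids_b
    scores = {
        "structure": _ratio(node_types(src_a), node_types(src_b)),
        "logic": _ratio(control_flow(src_a), control_flow(src_b)),
        "style": len(ids_a & ids_b) / len(union) if union else 1.0,  # Jaccard
        "comments": _ratio(comments(src_a), comments(src_b)),
    }
    total = sum(weights.values())
    scores["overall"] = sum(weights[k] * scores[k] for k in weights) / total
    return scores


# Two submissions that differ only in identifier names and comment wording.
SAMPLE_A = ("def total(xs):\n    # sum the list\n    s = 0\n"
            "    for x in xs:\n        s += x\n    return s\n")
SAMPLE_B = ("def add_all(values):\n    # accumulate\n    acc = 0\n"
            "    for v in values:\n        acc += v\n    return acc\n")

print(similarity(SAMPLE_A, SAMPLE_B))
```

In this sketch, renaming identifiers and rewording comments leaves the structural and logical indicators unchanged, so a weighting that favors those two indicators keeps the overall score high despite cosmetic edits. Raising the `style` or `comments` weights would instead make the score more sensitive to surface-level differences, which is the kind of adjustment a configurable weighting mechanism allows.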
Copyright (c) 2026 Jurnal Teknologi Informasi dan Pendidikan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.