Development of an Interactive Chatbot Using Sahabat-AI Model with Retrieval-Augmented Generation Method

Fauzi Isyrin Apridal; Humaira; Fitri Nova

doi:10.24036/jtip.v19i1.1105

Authors

Fauzi Isyrin Apridal, Politeknik Negeri Padang
Humaira, Politeknik Negeri Padang
Fitri Nova, Politeknik Negeri Padang

Keywords:

Chatbot, Sahabat-AI, RAG, Llama, FAISS

Abstract

Rapid advances in Large Language Models (LLMs) have enabled adaptive, context-aware virtual assistants. This research develops an interactive chatbot for Politeknik Negeri Padang using the Sahabat-AI model (Gemma2 9B CPT), a large-scale model specialized in Bahasa Indonesia and two regional languages (Javanese and Sundanese), combined with the Retrieval-Augmented Generation (RAG) method to improve the accuracy of document-based answers. The system architecture integrates a Streamlit-based user interface supporting text and voice input with multilingual output, an automated web-scraping module built with Scrapy that keeps institutional data up to date, a structured knowledge base in Supabase, and semantic vector search with FAISS. Development followed a systematic design and implementation approach, and the RAG pipeline uses all-indo-e5-small-v4 embeddings to ensure semantic relevance. In a performance evaluation with LangSmith on Indonesian-language tests, Sahabat-AI outperformed Llama 3, achieving an average score of 0.84 across four metrics (correctness: 0.89, relevance: 0.90, groundedness: 0.80, retrieval quality: 0.77). The chatbot also showed strong regional-language understanding, scoring 0.74 for Javanese and 0.71 for Sundanese, and RAG integration reduced hallucinations. Black-box testing confirmed the reliability of multimodal features such as speech-to-text and text-to-speech. The result is the first Sahabat-AI-based multilingual chatbot for Politeknik Negeri Padang, integrating automated document retrieval and embedding pipelines for efficient information services.
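
The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a toy bag-of-words cosine similarity stands in for the all-indo-e5-small-v4 embeddings and the FAISS index, the sample passages are invented for illustration rather than taken from the institution's knowledge base, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical. The resulting prompt is what would be passed to a language model such as Sahabat-AI.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (brute-force search)."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


# Hypothetical sample passages, invented for this illustration.
DOCS = [
    "Registration for new students opens in June.",
    "The library is open from 8 am to 4 pm on weekdays.",
    "The campus is located in Padang, West Sumatra.",
]

if __name__ == "__main__":
    question = "When is the library open?"
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

In a production pipeline the brute-force ranking would be replaced by a vector index and the term counts by dense embeddings; the grounding step, in which the model is restricted to retrieved context, is the part intended to reduce hallucinations.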
