IMPROVING TRANSLATION QUALITY BETWEEN LANGUAGES: ACHIEVEMENTS AND OPPORTUNITIES IN ENGLISH-KAZAKH TRANSLATION

Authors: Rakhimova D.R., Zhiger A.Zh., Malykh V., Karyukin V.I., Bekarystanqyzy A.
IRSTI 20.19.00

Abstract. Machine translation is a rapidly developing and widely used technology. Globalization and the need for multilingual communication have significantly increased its importance, and machine translation tools are now widely used to support information exchange and mutual understanding between countries and cultures. Systems such as Google Translate and Yandex Translator are among the most popular and effective platforms internationally, and they introduce new algorithms and language models every year to improve translation quality. However, recent research has shown that the quality of translation from English into Kazakh and other Turkic languages remains low. This is primarily due to the complex morphological and syntactic structure of Kazakh, as well as to its word order and the role of contextual meaning. The aim of this research is to propose effective methods for improving the quality of English-to-Kazakh neural machine translation by adapting transformer models and applying post-editing techniques. To this end, a transformer model adapted for Kazakh and other Turkic languages was developed on the OpenNMT platform and trained on a parallel corpus of 180,000 sentences. Translation results were evaluated using the BLEU metric. In addition, a post-editing stage based on the Kaz-RoBERTa model was implemented to further improve translation quality. The results show that increasing the quality and volume of parallel data, and adapting the transformer model to the linguistic characteristics of a specific language, significantly improves the accuracy and clarity of translation.

Keywords: neural machine translation, BLEU translation metric, parallel corpus, open neural machine translation, transformer model, post-editing, Kaz-RoBERTa model.