Learning to Translate Kannada and English Queries for Mixed Script Information Retrieval

B. S. Sowmya Lakshmi

Department of Machine Learning, B. M. S. College of Engineering, Bangalore, Karnataka

B. R. Shambhavi

Department of Information Science and Engineering, B. M. S. College of Engineering, Bangalore, Karnataka

keywords: Code mixing, mixed script queries, cross language information retrieval, machine translation

With the growing number of languages available on the Web, cross-language information retrieval has become one of the active problems in natural language processing and information retrieval. People now habitually combine words from two or more languages in spoken and written discourse. Speakers also intermix languages and scripts in digital media when querying, blogging, and posting on social media platforms. Representing the words of an utterance that come from two different languages in their respective native scripts is known as mixed scripting. In the present work, we attempt to translate mixed-script queries in Kannada and English into monolingual queries. We propose three translation approaches based on a constructed bilingual dictionary, word embeddings, and Google Translate. The proposed method outperforms the conventional dictionary-based approach when word embeddings are combined with the translations learnt from Google Translate and the dictionary.
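To make the notion of a mixed-script query concrete, the following minimal Python sketch labels each token of a query by its script and replaces Kannada-script tokens through a dictionary lookup. It is only an illustration under stated assumptions, not the method evaluated in this paper: the function names `script_of` and `to_monolingual`, the one-entry dictionary `TOY_DICT`, and the example query are all hypothetical, and script detection relies solely on the Unicode Kannada block (U+0C80 to U+0CFF).

```python
def script_of(token: str) -> str:
    """Label a token by the script of its letters (illustrative helper)."""
    # The Unicode Kannada block spans U+0C80..U+0CFF.
    has_kannada = any("\u0C80" <= ch <= "\u0CFF" for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_kannada and not has_latin:
        return "kannada"
    if has_latin and not has_kannada:
        return "latin"
    return "other"  # digits, punctuation, or a token mixing both scripts


# Hypothetical one-entry dictionary, for illustration only.
TOY_DICT = {"ಹವಾಮಾನ": "weather"}


def to_monolingual(query: str, dictionary: dict) -> str:
    """Replace Kannada-script tokens with dictionary translations.

    Tokens missing from the dictionary are left unchanged.
    """
    out = []
    for token in query.split():
        if script_of(token) == "kannada":
            out.append(dictionary.get(token, token))
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(to_monolingual("Bangalore ಹವಾಮಾನ forecast", TOY_DICT))
```

A plain lookup of this kind leaves any out-of-vocabulary token untranslated, which is the sort of gap that additional translation sources such as word embeddings or a machine translation service can help fill.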

reference: Vol. 40, 2021, No. 3, pp. 628–647

doi: 10.31577/cai_2021_3_628

Computing and Informatics

formerly Computers and Artificial Intelligence
