MaTra: Multilingual Computing at C-DAC Mumbai

The Centre for Development of Advanced Computing (C-DAC), Mumbai, is actively involved in research and development in language computing and has produced several products aimed at bridging the language barrier in the era of digitization.

Knowledge Based Computer Systems (KBCS) at C-DAC Mumbai works in Natural Language Processing (NLP), Machine Learning (ML), Data Mining (DM), Expert Systems (ES), and related fields. The following are some of its prominent works in multilingual computing:

MaTra
Xlit
StatMT
SuTra
Rupantar
ChitranTran

MaTra:

MaTra is a fully automatic English-to-Hindi indicative machine translation system. It takes a transfer-based approach, which has been well received in the research community. MaTra is targeted at open-domain text such as World Wide Web documents and news stories.

System URI: http://cdacmumbai.in/matra/

Salient Features:

Hybrid (Rule Based and Statistical) approach to Machine Translation
Uses a target-language-independent intermediate structured representation, so it can be easily adapted for machine translation from English to other Indian languages

Figure 1: MaTra: Machine Translation System

Xlit:

Xlit is a transliteration tool that converts words from English to Indian languages and back without losing their phonetic characteristics. For example, it converts 'bharat' to 'भारत' and 'school' to 'स्कूल'. It also suggests more than one option for a given word, such as भरत, भारात, and बहारत for 'bharat'. Prototypes are available for Hindi, Marathi, Urdu, and Kannada. XlitHindi, an extension for OpenOffice Writer, is available for download at:

http://extensions.services.openoffice.org/project/xlithindi

System URI: http://cdacmumbai.in/xlit/editor/

Salient Features:

Can be easily integrated into any desktop or web application
Uses a generalized framework for developing a transliteration system for any language pair
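
The multiple suggestions Xlit offers for 'bharat' can be illustrated with a minimal Python sketch. This is not Xlit's actual method, which this page does not describe. The rule table RULES and the functions segment and candidates are hypothetical names introduced only for illustration, and the table covers just the letters needed for this one word. The idea is that an ambiguous Roman chunk maps to several Devanagari renderings, and every combination becomes a candidate.

```python
from itertools import product

# Hypothetical rule table (an assumption, not Xlit's data): each Roman chunk
# maps to one or more possible Devanagari renderings.
RULES = {
    "bh": ["\u092d", "\u092c\u0939"],  # भ, or ब followed by ह
    "a": ["", "\u093e"],               # inherent vowel (nothing written), or the sign ा
    "r": ["\u0930"],                   # र
    "t": ["\u0924"],                   # त
}


def segment(word):
    """Greedy longest-match split of a Roman word into chunks found in RULES."""
    chunks, i = [], 0
    while i < len(word):
        for size in (2, 1):  # try the longer chunk first
            piece = word[i:i + size]
            if piece in RULES:
                chunks.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError(f"no rule for {word[i]!r}")
    return chunks


def candidates(word):
    """Return every combination of renderings for the word's chunks."""
    options = [RULES[chunk] for chunk in segment(word)]
    return ["".join(parts) for parts in product(*options)]


if __name__ == "__main__":
    for candidate in candidates("bharat"):
        print(candidate)
```

A real system would also need to rank the candidates, for instance by how often each spelling occurs in a word list; the sketch leaves them unranked.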

Figure 2: Xlit: A transliteration system

StatMT:

StatMT is a Statistical Machine Translation (SMT) system that translates source-language sentences (e.g., English) into target-language sentences (e.g., Hindi, Marathi, or Bengali) using statistical models. The StatMT system is part of the English to Indian Languages Machine Translation System (E-ILMT) consortium and aims to design and deploy a machine translation system from English to Indian languages in the tourism and healthcare domains. More information about the system can be found at http://www.cdacmumbai.in/e-ilmt.
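
For readers unfamiliar with SMT, the classic noisy-channel formulation gives a sense of what "using statistical models" can mean. This page does not state which models StatMT uses, so the formula below is general background rather than a description of the system. Here s is the source sentence, t is a candidate target sentence, P(s | t) is a translation model, and P(t) is a target-language model:

```latex
\hat{t} = \arg\max_{t} P(t \mid s) = \arg\max_{t} P(s \mid t)\, P(t)
```

The second equality follows from Bayes' rule, since P(s) is constant for a given source sentence and does not affect the maximization.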

SuTra:

SuTra is a multi-user translation assistance tool that makes intelligent suggestions to translators on possible reuse of translations from older versions of a system or from systems in similar domains. The aim is to reduce the translator's effort and make translated versions of applications available in the least possible time. The system is released as open source and is available at http://sourceforge.net/projects/sutra/.
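
The idea of reusing earlier translations can be sketched in Python as a small fuzzy lookup over a translation memory. This is only an illustration under stated assumptions, not SuTra's implementation: the MEMORY table, its Hindi strings, the suggest function, and the similarity cutoff are all hypothetical. It uses the standard library's difflib.get_close_matches to find previously translated strings that resemble a new one.

```python
import difflib

# Hypothetical translation memory (illustrative data, not taken from SuTra):
# source strings from an older version mapped to their existing translations.
MEMORY = {
    "Save file": "फ़ाइल सहेजें",
    "Open file": "फ़ाइल खोलें",
}


def suggest(source, memory, cutoff=0.6):
    """Return (old_source, old_translation) pairs similar to `source`, best first."""
    matches = difflib.get_close_matches(source, list(memory), n=3, cutoff=cutoff)
    return [(match, memory[match]) for match in matches]


if __name__ == "__main__":
    # A string from a newer version that differs slightly from an old one.
    for old_source, old_translation in suggest("Save the file", MEMORY):
        print(old_source, "->", old_translation)
```

The cutoff trades recall for precision: a lower value offers more suggestions to the translator, at the cost of less relevant ones.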

Figure 3: SuTra: An Intelligent Suggestive Translator for Localisation

Rupantar:

Rupantar is a utility for writing in Indian languages using Roman script. It also lets you convert text from one script to another, e.g., 'रमेश' in Hindi to 'ரமேஷ்' in Tamil. It uses a key-map-based technique for writing and conversion.
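
A key-map-based conversion like the one in the example above can be sketched in a few lines of Python. This is a minimal illustration, not Rupantar's code: the table KEY_MAP, the CONSONANTS set, and the convert function are hypothetical, and the table covers only the four characters in 'रमेश'. The rule that adds a Tamil pulli after a word-final consonant is likewise an assumption, made here so that the sketch reproduces the example output.

```python
# Hypothetical key map (an assumption, not Rupantar's table):
# Devanagari characters mapped to their Tamil counterparts.
KEY_MAP = {
    "\u0930": "\u0bb0",  # र -> ர
    "\u092e": "\u0bae",  # म -> ம
    "\u0947": "\u0bc7",  # vowel sign े -> ே
    "\u0936": "\u0bb7",  # श -> ஷ
}

# Devanagari consonants covered by the toy key map above.
CONSONANTS = {"\u0930", "\u092e", "\u0936"}

TAMIL_PULLI = "\u0bcd"  # ் marks a consonant with no following vowel


def convert(word):
    """Convert a Devanagari word to Tamil script, character by character."""
    # Characters missing from the key map are passed through unchanged.
    out = [KEY_MAP.get(ch, ch) for ch in word]
    # Assumption: a word-final Hindi consonant is pronounced without its
    # inherent vowel, so the Tamil form gets an explicit pulli.
    if word and word[-1] in CONSONANTS:
        out.append(TAMIL_PULLI)
    return "".join(out)


if __name__ == "__main__":
    print(convert("\u0930\u092e\u0947\u0936"))  # input is रमेश
```

Writing in Roman script could follow the same pattern, with a key map from Roman key sequences to characters of the target script.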

System URI: http://www.cdacmumbai.in/rupantar

Salient Features:

Easy integration with other desktop and web applications
Fast and lightweight application

Figure 4: Rupantar

ChitranTran:

ChitranTran is a utility for extracting and transliterating text from images. The available prototype can extract text in English and Hindi and transliterate it into other Indian languages.

System URI: http://202.141.152.1/xlit/chitrantran/

Salient Features: