The Center for Development of Advanced Computing (CDAC) Mumbai is
actively involved in research and development in language computing
and has produced several products aimed at bridging the language
barrier in the era of digitization.
Knowledge Based Computer Systems (KBCS) at CDAC Mumbai works in the
fields of Natural Language Processing (NLP), Machine Learning (ML),
Data Mining (DM), Expert Systems (ES), etc. The following are some of
its prominent works in the area of multilingual computing:
- MaTra
- Xlit
- StatMT
- SuTra
- Rupantar
- ChitranTran
MaTra:
MaTra is a fully automatic English to Hindi indicative machine
translation system. It takes a transfer-based approach, which has been
well received in the research community. MaTra is targeted at text in
open domains such as World Wide Web documents and news stories.
System uri: http://cdacmumbai.in/matra/
Salient Features:
- Hybrid (Rule Based and Statistical) approach to Machine Translation
- Uses a target-language-independent intermediate structured representation, so it can be easily adapted for machine translation from English to other Indian languages
Figure 1: MaTra: Machine Translation System
Xlit:
Xlit is a transliteration tool that converts words from English to
Indian languages and back without losing their phonetic
characteristics. For example, it converts 'bharat' to 'भारत' and
'school' to 'स्कूल'. It also suggests more than one option for a given
word, such as भरत, भारात and बहारत for 'bharat'. Prototypes are
available for Hindi, Marathi, Urdu and Kannada. XlitHindi, an
extension for OpenOffice Writer, is also available for download.
System uri: http://cdacmumbai.in/xlit/editor/
Salient Features:
- Can be easily integrated into any desktop or web application
- Uses a generalized framework for developing a transliteration system for any language pair
Figure 2: Xlit: A transliteration system
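The multiple suggestions that Xlit offers for 'bharat' can be illustrated with a toy candidate generator. This is only a minimal sketch, not Xlit's actual method: the `CHUNKS` table and the `candidates` function are hypothetical, cover only the letters in 'bharat', and perform no ranking of the results.

```python
# Hypothetical table (not Xlit's data): each Roman chunk maps to the
# Devanagari strings it may produce. "a" can be the long-vowel sign or
# the inherent vowel, which is not written at all.
CHUNKS = {
    "bh": ["भ"],
    "b": ["ब"],
    "h": ["ह"],
    "a": ["ा", ""],
    "r": ["र"],
    "t": ["त"],
}


def candidates(word):
    """Return every Devanagari spelling reachable through CHUNKS, without duplicates."""
    if not word:
        return [""]
    results = []
    for chunk, outputs in CHUNKS.items():
        if word.startswith(chunk):
            # Transliterate the remainder first, then prepend each option.
            for rest in candidates(word[len(chunk):]):
                for out in outputs:
                    candidate = out + rest
                    if candidate not in results:
                        results.append(candidate)
    return results
```

Calling `candidates("bharat")` yields a list that contains भारत, भरत, भारात and बहारत among its entries. A practical system would additionally score the candidates so that the most likely spelling is offered first.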
StatMT:
StatMT is a Statistical Machine Translation (SMT) system that
translates source-language sentences (e.g., English) into
target-language sentences (e.g., Hindi, Marathi, Bengali) using
statistical models. StatMT is part of the English to Indian Languages
Machine Translation System (E-ILMT) consortium, which aims to design
and deploy a machine translation system from English to Indian
languages in the tourism and healthcare domains. More information
about the system can be found at http://www.cdacmumbai.in/e-ilmt.
SuTra:
SuTra is a multi-user translation assistance tool that makes
intelligent suggestions to translators on the possible reuse of
translations from older versions of a system or from systems in
similar domains. The aim is to reduce the translator's effort and make
translated versions of applications available in the least possible
time. The system is released as open source and is available at
http://sourceforge.net/projects/sutra/.
Figure 3: SuTra: An Intelligent Suggestive Translator for Localisation
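The idea of reusing translations from an older version can be sketched as a fuzzy lookup in a translation memory. The example below is an assumption-laden illustration, not SuTra's implementation: the `MEMORY` data, the `suggest` function and the 0.6 similarity threshold are all hypothetical, and similarity is measured with `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source strings from an older release
# mapped to their existing Hindi translations (illustrative data only).
MEMORY = {
    "Open file": "फ़ाइल खोलें",
    "Save file": "फ़ाइल सहेजें",
    "Close window": "विंडो बंद करें",
}


def suggest(source, memory, threshold=0.6, limit=3):
    """Return up to `limit` (score, old_source, old_translation) tuples, best first."""
    scored = []
    for old_source, old_translation in memory.items():
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, source.lower(), old_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, old_source, old_translation))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]
```

For a new string such as "Open files", `suggest("Open files", MEMORY)` ranks the stored entry "Open file" first, so the translator can start from its existing translation instead of translating from scratch.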
Rupantar:
Rupantar is a utility for writing in Indian languages using the Roman
script. It also allows you to convert text from one script to another,
e.g. 'रमेश' in Hindi to 'ரமேஷ்' in Tamil. It uses a key-map-based
technique for writing and conversion.
System uri: http://www.cdacmumbai.in/rupantar
Salient Features:
- Easy integration with other desktop and web applications
- Fast and lightweight application
Figure 4: Rupantar
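A key-map-based script conversion of the kind described for Rupantar can be sketched as a character-by-character table lookup. The following is a minimal sketch under stated assumptions, not Rupantar's actual key map: the `DEVANAGARI_TO_TAMIL` table, the `convert` function and the word-final pulli rule are hypothetical and cover only the characters in the 'रमेश' example.

```python
# Hypothetical key map (not Rupantar's data) from Devanagari to Tamil.
# Only the characters needed for the example word are listed.
DEVANAGARI_TO_TAMIL = {
    "र": "ர",  # letter RA
    "म": "ம",  # letter MA
    "े": "ே",  # vowel sign E -> Tamil vowel sign EE
    "श": "ஷ",  # letter SHA -> Tamil letter SSA (assumed closest match)
}
DEVANAGARI_CONSONANTS = {"र", "म", "श"}
TAMIL_PULLI = "\u0bcd"  # Tamil sign marking a consonant with no vowel


def convert(word):
    """Convert a Devanagari word to Tamil script using the key map."""
    # Characters missing from the key map are passed through unchanged.
    converted = "".join(DEVANAGARI_TO_TAMIL.get(ch, ch) for ch in word)
    # Assumption: a word-final bare consonant is pronounced without its
    # inherent vowel in Hindi, so Tamil writes it with a pulli.
    if word and word[-1] in DEVANAGARI_CONSONANTS:
        converted += TAMIL_PULLI
    return converted
```

With this table, `convert("रमेश")` produces the Tamil spelling given in the example above. A complete key map would need an entry for every character of both scripts and rules for sounds that have no direct counterpart in the target script.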
ChitranTran:
ChitranTran is a utility to extract and transliterate text from
images. The available prototype can extract text in English and Hindi
and transliterate it to other Indian languages.
System uri: http://202.141.152.1/xlit/chitrantran/
Salient Features:
- Easy integration with other desktop and web applications
- Support for Indian language text extraction and transliteration
Figure 5: ChitranTran