Monday, May 7, 2012

Multilingual Computing at C-DAC Mumbai


The Centre for Development of Advanced Computing (C-DAC) Mumbai is actively involved in research and development in language computing and has produced several products aimed at bridging the language barrier in the era of digitization.

Knowledge Based Computer Systems (KBCS) at C-DAC Mumbai works in the fields of Natural Language Processing (NLP), Machine Learning (ML), Data Mining (DM), Expert Systems (ES), etc. The following are some of its prominent works in the area of multilingual computing:

  1. MaTra
  2. Xlit
  3. StatMT
  4. SuTra
  5. Rupantar
  6. ChitranTran

MaTra:
MaTra is a fully automatic, indicative English-to-Hindi machine translation system. It takes a transfer-based approach, which has been well received in the research community. MaTra is targeted at text in open domains such as World Wide Web documents and news stories.
Salient Features:
  • Hybrid (Rule Based and Statistical) approach to Machine Translation
  • Uses a target-language-independent intermediate structured representation, so it can be easily adapted for machine translation from English to other Indian languages
Figure 1: MaTra: Machine Translation System


Xlit:
Xlit is a transliteration tool that converts words from English to Indian languages and back without losing their phonetic characteristics. For example, it converts 'bharat' to 'भारत' and 'school' to 'स्कूल'. It also suggests more than one option for a given word, such as भरत, भारात, बहारत, etc. for 'bharat'. Prototypes are available for Hindi, Marathi, Urdu and Kannada. XlitHindi, an extension for OpenOffice Writer, is available for download at the URL:
Salient Features:
  • Can be easily integrated into any desktop or web application
  • Uses a generalized framework for developing a transliteration system for any language pair
Figure 2: Xlit: A transliteration system
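The idea of offering several candidates for one Roman spelling can be sketched in Python as follows. This is only an illustration and not Xlit's actual method: the rule table RULES and the function candidates are hypothetical names, and the table covers just the letters needed for 'bharat'. Each Roman chunk maps to one or more Devanagari renderings, and every combination is returned as a candidate.

```python
from itertools import product

# Hypothetical rule table (an assumption, not Xlit's real data).
# Each Roman chunk maps to one or more Devanagari renderings.
RULES = {
    "bh": ["भ"],
    "r": ["र"],
    "t": ["त"],
    # 'a' may be the inherent short vowel (no sign) or the long 'aa' sign.
    "a": ["", "ा"],
}


def candidates(word):
    """Return every Devanagari candidate for a Roman word."""
    options = []
    i = 0
    while i < len(word):
        # Prefer the longest matching chunk (two letters, then one).
        for size in (2, 1):
            piece = word[i:i + size]
            if piece in RULES:
                options.append(RULES[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"no rule for {word[i]!r}")
    # Combine one rendering per chunk in every possible way.
    return ["".join(parts) for parts in product(*options)]


if __name__ == "__main__":
    for candidate in candidates("bharat"):
        print(candidate)
```

With this small table, 'bharat' yields four candidates, including भारत, भरत and भारात. A real system would also need to rank the candidates, for example with a statistical model, which this sketch does not attempt.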




StatMT:
StatMT is a Statistical Machine Translation (SMT) system that translates source-language sentences (e.g., English) into target-language sentences (e.g., Hindi, Marathi, Bengali, etc.) using statistical models. The StatMT system is part of the English to Indian Languages Machine Translation System (E-ILMT) consortium and aims to design and deploy a machine translation system from English to Indian languages in the tourism and healthcare domains. More information about the system can be found at http://www.cdacmumbai.in/e-ilmt.

SuTra:
SuTra is a multi-user translation assistance tool that makes intelligent suggestions to translators on possible reuse of translations from older versions of a system or from systems in similar domains. The aim is to reduce the translator's effort and make translated versions of applications available in the least possible time. The system is released as open source and is available at http://sourceforge.net/projects/sutra/.

Figure 3: SuTra: An Intelligent Suggestive Translator for Localisation


Rupantar:
Rupantar is a utility for writing in Indian languages using the Roman script. It also allows you to convert text from one script to another, e.g., 'रमेश' in Hindi to 'ரமேஷ்' in Tamil. It uses a key-map-based technique for writing and conversion.
Salient Features:
  • Easy integration with other desktop and web applications
  • Fast and lightweight application
Figure 4: Rupantar
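A key-map-based script conversion can be sketched in Python as a per-character lookup. This is a minimal illustration under stated assumptions and not Rupantar's actual table or code: the map DEVANAGARI_TO_TAMIL and the function convert are hypothetical, and the map holds only the four characters of the example word. A plain character map does not add the final pulli (்) seen in the Tamil example above, so a real converter would need additional rules.

```python
# Hypothetical key map (an assumption, not Rupantar's real table).
# Keys are Devanagari characters, values are Tamil characters.
DEVANAGARI_TO_TAMIL = {
    "\u0930": "\u0bb0",  # DEVANAGARI LETTER RA    -> TAMIL LETTER RA
    "\u092e": "\u0bae",  # DEVANAGARI LETTER MA    -> TAMIL LETTER MA
    "\u0947": "\u0bc7",  # DEVANAGARI VOWEL SIGN E -> TAMIL VOWEL SIGN EE
    "\u0936": "\u0bb7",  # DEVANAGARI LETTER SHA   -> TAMIL LETTER SSA
}


def convert(text, key_map=DEVANAGARI_TO_TAMIL):
    """Replace each character found in the key map; keep the rest unchanged."""
    return "".join(key_map.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # The Devanagari word from the example above, written with escapes.
    print(convert("\u0930\u092e\u0947\u0936"))
```

Characters missing from the map pass through unchanged, so partial tables degrade gracefully instead of failing.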


ChitranTran:
ChitranTran is a utility to extract and transliterate text from images. The available prototype can extract text in English and Hindi and transliterate it to other Indian languages.
Salient Features:
  • Easy integration with other desktop and web applications
  • Support for Indian language text extraction and transliteration

Figure 5: ChitranTran

Friday, April 1, 2011

Abdul Kalam Pitches A Multilingual, Mobile Web at WWW conference 2011

APJ Abdul Kalam, former President of India, speaking at the World Wide Web conference in Hyderabad, made a pitch for a multilingual web, saying that the World Wide Web in its current form has shortcomings: “The language barrier is the biggest hindrance in making the Web truly democratic. Originally the lingua franca of the web was mainly English, and while the situation has started to change, much more needs to be done. The development of a country is directly determined by the amount of content in the country's native language available on the web.” More interestingly, Kalam suggested cross-lingual access to the web, saying that knowledge grows by sharing and that language should not be an impediment. He said that rural folk need to be convinced that the web is useful for them, and that at present the community on the web tends to generate content for its own consumption.

Some of the important points from his speech:
Kalam had the following suggestions for the development of the World Wide Web:
- To look for ways in which a mobile device can provide integrated 3G and 4G applications in the user's mother tongue. For example, the price of agricultural products for a farmer, or the market price of fish for a fisherman.
- For Web 2.0 and 3.0 (the semantic web) to enhance services in native languages, and for the web to offer access without barriers of language, cost, creed or geography.
- For the mobile to become a personal authentication device, and for money transactions through the mobile to be highly secure.
- For sensors incorporated into a mobile device holder to be able to transmit data related to a patient and get the doctor's advice or consultation.
- More societal applications, given the large bandwidth that 4G offers, which involve farmers and villagers who are less empowered. The future of the web is going to shift from connecting the corporates to connecting the individual in rural society.
Kalam also spoke of a societal grid, combining the National Knowledge Network, a Healthcare Grid, an e-governance grid and a Rural grid.

Courtesy: Medianama, which has more detailed news.


Thursday, February 10, 2011

SIGAI Workshop on Emerging Research Trends in Artificial Intelligence (ERTAI - 2011)



19th - 21st June, 2011, C-DAC, E-City Bengaluru, India

Supported by Computer Society of India (CSI)

About The Workshop

Based on the success of the previous ERTAI workshop, conducted in 2010, CSI-SIGAI has decided to hold the next ERTAI workshop on June 19-21, 2011 at C-DAC Electronics City, Bengaluru. The backdrop of this workshop remains the same as last year: through ERTAI, we plan to continue to provide a forum where those pursuing research in AI can exchange ideas and seek guidance, and where those seeking to enter AI research can get a valuable feel for current research in various streams of AI, in industry as well as in academia. ERTAI 2011 will enable new and aspiring research scholars to identify relevant and useful research topics and get guidance on their approach and direction.
We invite papers that describe work in progress by research scholars in many areas, including language processing, multi-agent systems, web mining, information retrieval, semantic web, e-learning, optimization problems, pattern recognition, etc. We also invite suggestions on relevant topics for invited talks, both in technical areas and in research methodology. The detailed program for the workshop is being finalized and will be announced shortly.

Proposed Structure of Workshop

It will be a three-day programme consisting of:
  • Invited talks covering current trends, specific challenges, etc. in Artificial Intelligence
  • Invited talks on mentoring research scholars on publication, research methodology, etc.
  • Presentations by those currently pursuing research in AI.
We will have a panel of experienced researchers to evaluate and mentor the research presentations.

Call For Papers

For the research presentations, we are now inviting brief research papers of 5-6 pages, outlining the problem being addressed, approach followed vis-à-vis existing approaches, current status / results, and future plans. A subset will be short-listed for presentation, based on a formal review process. Papers must have significant AI content to be considered for presentation. Relevant topics include (but are not limited to):

  • Knowledge Representation
  • Reasoning
  • Model-Based Learning
  • Expert Systems
  • Data Mining
  • State Space Search
  • Cognitive Systems
  • Vision & Perception
  • Intelligent User Interfaces
  • Reactive AI
  • Ambient Intelligence
  • Artificial Life
  • Evolutionary Computing
  • Fuzzy Systems
  • Uncertainty in AI
  • Machine Learning
  • Constraint Satisfaction
  • Ontologies
  • Natural Language Processing
  • Pattern Recognition
  • Intelligent Agents
  • Soft Computing
  • Planning & Scheduling
  • Neural Networks
  • Case-Based Reasoning

A one-page call for papers for the ERTAI - 2011 workshop may be obtained from here

Target Audience

Target audience will be primarily:
  • Faculty members pursuing research involving AI as the base or as a tool for an application.
  • Faculty members interested in pursuing research and exploring areas / options.
  • Research scholars working for a post graduate degree.
  • Students seriously interested in research, specifically on AI.

Important Dates

  • Full paper submission deadline - April 30, 2011
  • Acceptance intimation - May 25, 2011
  • Camera ready copy due - June 05, 2011

ERTAI Secretariat

ERTAI Secretariat
Centre for Development of Advanced Computing
68, Electronics City, Bengaluru 560100.
India.
Telephone: +91 80 28523300
Fax: +91 80 28522590
Email: csi.sigai@gmail.com
Web: http://sigai.cdacmumbai.in/

Friday, May 28, 2010

Advanced Statistics and Data Mining - Summer School

A Summer School on Advanced Statistics and Data Mining is being organized by the Artificial Intelligence Department of the Computer Science Faculty of the Universidad Politécnica de Madrid from June 28, 2010 to July 9, 2010 in Madrid, Spain. More information can be found at:

http://www.dia.fi.upm.es/index.php?page=presentation&hl=es_ES

Tuesday, May 4, 2010

ACL 2010

The 48th Annual Meeting of the Association for Computational Linguistics will be held in Uppsala, Sweden, July 11–16, 2010. The conference will be organized by the Department of Linguistics and Philology at Uppsala University.
More details can be found at: http://acl2010.org/index.htm