Cross-Lingual Information Retrieval (CLIR)

Cross-Lingual Information Retrieval (CLIR) is the task of ranking documents written in one language against a user query written in another. As multilingual documents become more accessible, CLIR grows increasingly important whenever the relevant information is in a language other than the user's. In this project, we collaborate with researchers from Cambridge University, the University of Maryland, Edinburgh University, and Columbia University on the Machine Translation for English Retrieval of Information in Any Language (MATERIAL) program funded by IARPA. Given a query in English, we aim to develop a system that retrieves text and speech documents in low-resource languages such as Swahili, Tagalog, and Somali. The system will also translate the retrieved documents into English and produce a summary of the relevant information. At Yale, we focus on (1) developing learning-to-rank methods that require minimal training data, so the system can be rapidly adapted to new languages and domains, and (2) investigating simple and effective algorithms for combining the outputs of different retrieval systems.
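
To make the idea of combining retrieval outputs concrete, here is a minimal sketch of one well-known simple fusion method, reciprocal rank fusion. It is an illustration only and is not necessarily the algorithm used in this project. The function name, the document IDs, and the constant k = 60 are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    documents are then sorted by their summed score. The constant k is
    an assumed smoothing value, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of two retrieval systems for one English query.
text_system = ["doc_sw_12", "doc_sw_07", "doc_sw_31"]
speech_system = ["doc_sw_07", "doc_sw_44", "doc_sw_12"]

fused = reciprocal_rank_fusion([text_system, speech_system])
print(fused)
```

Rank-based fusion of this kind needs no training data and does not require the systems' scores to be on a comparable scale, which is one reason such methods are attractive when adapting quickly to a new language.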