Multilingual multimodal research focuses on collecting resources, developing models, and evaluating systems that must jointly reason over multilingual text and multimodal inputs, including images, videos, text, and knowledge bases. Multilingual multimodal NLP presents new and unique challenges. First, it is one of the areas that suffer the most from language imbalance: the text in most multimodal datasets is available only in high-resource languages. Second, multilingual multimodal research provides opportunities to investigate culture-related phenomena. On top of the language imbalance in text-based corpora and models, data in additional modalities (e.g. images or videos) are mostly collected from North American and Western European sources and reflect their worldviews. As a result, multimodal models do not capture our world’s multicultural diversity and do not generalise to out-of-distribution data from minority cultures. The interplay of these two issues leads to extremely poor performance of multilingual multimodal systems in real-life scenarios. This workshop encourages and promotes research efforts towards more inclusive multimodal technologies and tools to assess them. We invite papers on topics of interest that include (but are not limited to):

Invited Talks (in alphabetical order)

David Ifeoluwa Adelani

Saarland University
Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages

Lisa-Anne Hendricks

Digging Deeper into Multimodal Transformers

Lei Ji

Microsoft Research Asia
Multimodal Video Understanding with Language Guidance

Preethi Jyothi

IIT Bombay
New Challenges in Learning with Multilingual and Multimodal Data

Important Dates

Organizers and Contact

Organizers are listed in alphabetical order. For any questions, please contact mml DOT wksp AT gmail DOT com.

