MML 2022 Shared Task

The multilingual multimodal learning (MML) workshop, co-located with ACL 2022, is hosting a shared task on multilingual visually grounded reasoning. The task is centred on the MaRVL dataset, introduced by Liu & Bugliarello et al. (EMNLP 2021). This dataset extends the NLVR2 task (Suhr et al., ACL 2019) to multicultural and multilingual (Indonesian, Mandarin, Swahili, Tamil, Turkish) inputs: given a pair of images and a textual description, a system must predict whether the description is true of the image pair (True/False).

The standard setup consists of fine-tuning a multilingual vision-and-language model on the English NLVR2 dataset and then evaluating it on MaRVL. We consider two subtasks, detailed below: zero-shot transfer and few-shot transfer. Both setups have been shown to be challenging (Bugliarello et al., 2022), and we look forward to seeing your approaches to the tasks!

Participants will be invited to describe their system in a paper for the MML workshop proceedings. The task organisers will write an overview paper that describes the task, summarises the different approaches taken, and analyses their results.

Here are the links to quickly download the data for the shared task:

Preprocessed image features are available for two visual encoders:

The IGLUE repository contains sample code and pretrained models to help you get started. If you have any questions, open an issue on GitHub or reach out to us.

Important Dates


The shared task will consist of two subtasks:

NB: we will only consider submissions that use pre-existing pre-trained models that are publicly available or new models that have been (pre)trained on publicly available data.

“Translate test” methods are accepted but will be ranked separately.

The MaRVL Benchmark

We list existing baseline results for the subtasks below. The numbers are copied from the IGLUE benchmark (Bugliarello et al., 2022) and are averaged over the five languages. A leaderboard is available at

Zero-shot (ZS) results:
Rank | Model | Accuracy (%)
1 | UC2 (Zhou et al., 2021) | 57.28
2 | M3P (Ni et al., 2021) | 56.00
3 | xUNITER (Liu & Bugliarello et al., 2021) | 54.59
4 | mUNITER (Liu & Bugliarello et al., 2021) | 53.72

Few-shot (FS) results:
Rank | Model | Accuracy (%)
1 | UC2 (Zhou et al., 2021) | 58.32
2 | xUNITER (Liu & Bugliarello et al., 2021) | 57.46
3 | mUNITER (Liu & Bugliarello et al., 2021) | 53.41
4 | M3P (Ni et al., 2021) | 49.79
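The baseline scores above are averages over the five MaRVL languages. The sketch below shows one way to compute a comparable number from your own predictions on a development split. It assumes an unweighted mean of per-language accuracies and gold labels keyed by example id; the helper names and toy data are hypothetical, and this is not the IGLUE evaluation script.

```python
from statistics import mean


def accuracy(preds: dict[str, bool], gold: dict[str, bool]) -> float:
    """Accuracy (%) over the gold examples; a missing prediction counts as wrong."""
    if not gold:
        raise ValueError("gold labels are empty")
    # preds.get() returns None for a missing id, which never equals True/False
    correct = sum(preds.get(ex_id) == label for ex_id, label in gold.items())
    return 100.0 * correct / len(gold)


def average_accuracy(per_lang: dict[str, tuple[dict[str, bool], dict[str, bool]]]) -> float:
    """Unweighted mean of per-language accuracies (assumption: each language counts equally)."""
    return mean(accuracy(preds, gold) for preds, gold in per_lang.values())


# Toy, made-up data with two languages only, to keep the arithmetic visible.
toy = {
    "id": ({"id-0": True, "id-1": False}, {"id-0": True, "id-1": True}),  # 50.0
    "sw": ({"sw-0": False}, {"sw-0": False}),                             # 100.0
}
print(average_accuracy(toy))  # 75.0
```

Averaging per-language accuracies, rather than pooling all examples, keeps a language with more test items from dominating the overall score.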


Submissions should be emailed to the organisers by the end of 30 April 2022, Anywhere on Earth (AoE). Submissions must follow the JSON Lines format (one JSON object per line), with languages given as two-letter ISO 639-1 codes (e.g. "id" for Indonesian):

{"concept": "39-Panci", "language": "id", "chapter": "Basic actions and technology", "id": "id-0", "prediction": true}
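Each line of a submission file is one JSON object shaped like the example above. A minimal sketch that serialises predictions this way with only the Python standard library, assuming you already have the concept, language, chapter and id fields for each example from the MaRVL data; the helper name write_jsonl is hypothetical.

```python
import io
import json
from typing import Iterable, TextIO

# Field order follows the example submission line.
SUBMISSION_KEYS = ("concept", "language", "chapter", "id", "prediction")


def write_jsonl(records: Iterable[dict], fh: TextIO) -> int:
    """Write one JSON object per line to fh; return the number of lines written."""
    count = 0
    for rec in records:
        missing = [k for k in SUBMISSION_KEYS if k not in rec]
        if missing:
            raise KeyError(f"record {rec.get('id')!r} is missing {missing}")
        if not isinstance(rec["prediction"], bool):
            # json.dumps maps Python True/False to JSON true/false
            raise TypeError("prediction must be a bool")
        # Keep only the submission fields; ensure_ascii=False leaves any
        # non-ASCII text unescaped (open real files with encoding="utf-8").
        line = json.dumps({k: rec[k] for k in SUBMISSION_KEYS}, ensure_ascii=False)
        fh.write(line + "\n")
        count += 1
    return count


buf = io.StringIO()
write_jsonl(
    [{"concept": "39-Panci", "language": "id", "chapter": "Basic actions and technology",
      "id": "id-0", "prediction": True}],
    buf,
)
print(buf.getvalue(), end="")
# {"concept": "39-Panci", "language": "id", "chapter": "Basic actions and technology", "id": "id-0", "prediction": true}
```

To write a real file, pass a handle from open(path, "w", encoding="utf-8") instead of the in-memory buffer.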

Files should be named {team-name}_{zs/fs}_{xl/tt}_{lang}.jsonl to indicate the subtask (zs for zero-shot, fs for few-shot), the transfer method (xl for cross-lingual, tt for translate-test) and the target language.
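The naming pattern can be generated and checked programmatically. A minimal sketch, assuming the two-letter language codes follow the style of the JSON example ("id" for Indonesian; the other four codes here are an assumption), with a hypothetical team name "myteam" and a hypothetical helper submission_filename.

```python
# Allowed values for each slot of {team-name}_{zs/fs}_{xl/tt}_{lang}.jsonl.
SUBTASKS = {"zs": "zero-shot", "fs": "few-shot"}
TRANSFER = {"xl": "cross-lingual", "tt": "translate-test"}
# Assumed two-letter codes for the five MaRVL languages.
MARVL_LANGS = {"id": "Indonesian", "sw": "Swahili", "ta": "Tamil", "tr": "Turkish", "zh": "Mandarin"}


def submission_filename(team: str, subtask: str, transfer: str, lang: str) -> str:
    """Build the submission file name, rejecting unknown codes."""
    # Hypothetical extra check: an underscore in the team name would make the
    # file name ambiguous to split back into its parts.
    if not team or "_" in team:
        raise ValueError(f"invalid team name: {team!r}")
    if subtask not in SUBTASKS:
        raise ValueError(f"subtask must be one of {sorted(SUBTASKS)}")
    if transfer not in TRANSFER:
        raise ValueError(f"transfer must be one of {sorted(TRANSFER)}")
    if lang not in MARVL_LANGS:
        raise ValueError(f"lang must be one of {sorted(MARVL_LANGS)}")
    return f"{team}_{subtask}_{transfer}_{lang}.jsonl"


# One zero-shot, cross-lingual file per target language:
for code in sorted(MARVL_LANGS):
    print(submission_filename("myteam", "zs", "xl", code))
# myteam_zs_xl_id.jsonl
# myteam_zs_xl_sw.jsonl
# myteam_zs_xl_ta.jsonl
# myteam_zs_xl_tr.jsonl
# myteam_zs_xl_zh.jsonl
```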

Description Papers

Papers describing shared task submissions should consist of 4 to 8 pages of content plus additional pages of references, formatted according to the ARR format guidelines for ACL 2022. Shared task papers do not need to be anonymised: team names and authors may be included. Accepted papers will be published online in the ACL 2022 proceedings and will be presented at the MML workshop at ACL 2022. Write-ups should be submitted through OpenReview and are due by 30 April 2022, 11:59 pm (UTC-12).


Please contact mml DOT wksp AT gmail DOT com if you have any questions.