============================================================================
LREC 2018 Reviews for Submission #560
============================================================================

Title: Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning

Authors: Jinyoung Yeo, Gyeongbok Lee, Gengyu Wang, Seungtaek Choi, Hyunsouk Cho, Reinald Kim Amplayo and Seung-won Hwang

============================================================================
                                REVIEWER #1
============================================================================

---------------------------------------------------------------------------
                                  Comments
---------------------------------------------------------------------------

The paper introduces a novel task, visual commonsense causal reasoning (VCOPA), as a variant of the COPA task (Choice of Plausible Alternatives). The task is well motivated, and the differences (introduced by visual questions) with respect to the text-based task are clearly explained through examples.

Besides introducing the task, the paper also presents a dataset created by the authors (380 questions with >1,000 images). The creation procedure is sound and clearly explained, with nice examples of the main difficulties it involves. Moreover, besides the simplest "random" baseline, an experiment with a more competitive baseline (at least in principle) is presented. Results are poor, but they make it clear that the task is not only interesting but also challenging. Perhaps more advanced methods could have been tried, but this is clearly not the focus of the paper.

A section discussing the main challenges involved in the task concludes the paper. I particularly liked it because it sketches possible research paths that are worth exploring.

============================================================================
                                REVIEWER #2
============================================================================

---------------------------------------------------------------------------
                                  Comments
---------------------------------------------------------------------------

This paper introduces a new reasoning task at the intersection of two domains: knowledge reasoning (KR) and computer vision (CV). The contribution of this work is threefold: (i) the corpus construction methodology, (ii) the description and analysis of the new task, and (iii) the description and analysis of a baseline system. All of these contributions may lead to very interesting research at the intersection of the KR and CV domains.

The new task (VCOPA) described in this paper is based on the COPA task (Choice of Plausible Alternatives) but adapted to CV and image analysis: given a premise (textual for COPA, visual for VCOPA), systems have to choose between two alternatives (textual for COPA, visual for VCOPA). The paper is well structured, and the contributions are easy to follow and understand. The corpus design is well described, along with the methodological questions it raises. The resulting corpus is comparable to the COPA one. While the VCOPA task aims at supporting research on causal reasoning on images only, the way it is built allows it to support other research scenarios, such as multimodal reasoning, by associating textual questions with each image.

An analysis of the different categories of causal relations is provided. I would like the corpus description to provide statistical information for all these categories (visual disambiguation, temporal disambiguation, etc.). Moreover, concerning the corpus description, I would like more information on the distribution of the data between dev and test (how many COPA-like questions are in dev and test? how are the different causal relation categories distributed between dev and test? etc.). Concerning the results provided, I would like significance measures between the results with automatic and manual captioning, and between the baseline and a random system.

============================================================================
                                REVIEWER #3
============================================================================

---------------------------------------------------------------------------
                                  Comments
---------------------------------------------------------------------------

This is a nice, well-written paper. The core of the paper is the creation and provision of a new multimedia corpus. A baseline experiment is presented that demonstrates the use of the data. The authors are self-reflective enough to understand that the baseline is not perfect.
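A minimal sketch of one way the significance measures requested by Reviewer #2 could be computed: a paired approximate-randomization test over per-question correctness, applicable to comparisons such as automatic vs. manual captioning or the baseline vs. a random system. The code assumes NumPy, and the correctness vectors shown are hypothetical placeholders, not results from the paper.

    # Sketch: paired approximate-randomization (permutation) test on accuracy.
    # Hypothetical data only; not taken from the VCOPA paper.
    import numpy as np

    def paired_permutation_test(correct_a, correct_b, n_permutations=10000, seed=0):
        """Two-sided test of the accuracy difference between two systems
        evaluated on the same set of questions (1 = correct, 0 = incorrect)."""
        rng = np.random.default_rng(seed)
        a = np.asarray(correct_a, dtype=float)
        b = np.asarray(correct_b, dtype=float)
        observed = abs(a.mean() - b.mean())
        count = 0
        for _ in range(n_permutations):
            # Randomly swap the two systems' outcomes on each question.
            swap = rng.integers(0, 2, size=a.shape[0]).astype(bool)
            a_perm = np.where(swap, b, a)
            b_perm = np.where(swap, a, b)
            if abs(a_perm.mean() - b_perm.mean()) >= observed:
                count += 1
        # Add-one smoothing so the p-value is never exactly zero.
        return (count + 1) / (n_permutations + 1)

    # Hypothetical usage with placeholder per-question correctness vectors.
    auto_captioning   = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
    manual_captioning = np.array([1, 1, 1, 1, 0, 1, 1, 0, 1, 1])
    print(paired_permutation_test(auto_captioning, manual_captioning))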