Bibliography

David Ahn, Valentin Jijkoun, Gilad Mishne, Karin Müller, Maarten de Rijke, and Stefan Schlobach. 2004. Using Wikipedia at the TREC QA Track. In Text REtrieval Conference (TREC).
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Learning to compose neural networks for question answering. In North American Association for Computational Linguistics (NAACL), pages 1545–1554.
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual Question Answering. In International Conference on Computer Vision (ICCV), pages 2425–2433.
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735. Springer.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR).
Ondrej Bajgar, Rudolf Kadlec, and Jan Kleindienst. 2016. Embracing data abundance: BookTest dataset for reading comprehension. arXiv preprint arXiv:1610.00956.
Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65–72.
Petr Baudiš. 2015. YodaQA: a modular question answering system pipeline. In POSTER 2015 - 19th International Student Conference on Electrical Engineering, pages 1156–1165.
Petr Baudiš and Jan Šedivý. 2015. Modeling of the question answering task in the YodaQA system. In International Conference of the Cross-Language Evaluation Forum for European Languages, pages 222–228. Springer.
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Empirical Methods in Natural Language Processing (EMNLP), pages 1533–1544.
Jonathan Berant, Vivek Srikumar, Pei-Chun Chen, Abby Vander Linden, Brittany Harding, Brad Huang, Peter Clark, and Christopher D. Manning. 2014. Modeling biological processes for reading comprehension. In Empirical Methods in Natural Language Processing (EMNLP), pages 1499–1510.
Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association of Computational Linguistics (TACL), 5:135–146.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250.
Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075.
Eric Brill, Susan Dumais, and Michele Banko. 2002. An analysis of the AskMSR question-answering system. In Empirical Methods in Natural Language Processing (EMNLP), pages 257–264.
Davide Buscaldi and Paolo Rosso. 2006. Mining knowledge from Wikipedia for the question answering task. In International Conference on Language Resources and Evaluation (LREC), pages 727–730.
Eugene Charniak, Yasemin Altun, Rodrigo de Salvo Braz, Benjamin Garrett, Margaret Kosmala, Tomer Moscovich, Lixin Pang, Changhee Pyo, Ye Sun, Wei Wy, et al. 2000. Reading comprehension programs in a statistical-language-processing class. In ANLP/NAACL Workshop on Reading comprehension tests as evaluation for computer-based language understanding systems, pages 1–5.
Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2014. One billion word benchmark for measuring progress in statistical language modeling. In Conference of the International Speech Communication Association (Interspeech).
Danqi Chen, Jason Bolton, and Christopher D Manning. 2016. A thorough examination of the CNN/Daily Mail reading comprehension task. In Association for Computational Linguistics (ACL), volume 1, pages 2358–2367.
Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to answer open-domain questions. In Association for Computational Linguistics (ACL), volume 1, pages 1870–1879.
Kyunghyun Cho. 2015. Natural language understanding with distributed representation. arXiv preprint arXiv:1511.07916.
Kyunghyun Cho, Bart Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734.
Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question answering in context. In Empirical Methods in Natural Language Processing (EMNLP), pages 2174–2184.
Christopher Clark and Matt Gardner. 2018. Simple and effective multi-paragraph reading comprehension. In Association for Computational Linguistics (ACL), volume 1, pages 845–855.
Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia. 2017. DAWNBench: An end-to-end deep learning benchmark and competition. In NIPS ML Systems Workshop.
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, Jose MF Moura, Devi Parikh, and Dhruv Batra. 2017. Visual dialog. In Conference on computer vision and pattern recognition (CVPR), pages 1080–1089.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, and William W Cohen. 2017a. A comparative study of word embeddings for reading comprehension. arXiv preprint arXiv:1703.00993.
Bhuwan Dhingra, Kathryn Mazaitis, and William W Cohen. 2017b. Quasar: Datasets for question answering by search and reading. arXiv preprint arXiv:1707.03904.
Matthew Dunn, Levent Sagun, Mike Higgins, V Ugur Guney, Volkan Cirik, and Kyunghyun Cho. 2017. SearchQA: A new Q&A dataset augmented with context from a search engine. arXiv preprint arXiv:1704.05179.
Anthony Fader, Luke Zettlemoyer, and Oren Etzioni. 2014. Open question answering over curated and extracted knowledge bases. In SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation. In Association for Computational Linguistics (ACL), volume 1, pages 889–898.
David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A Kalyanpur, Adam Lally, J William Murdock, Eric Nyberg, John Prager, et al. 2010. Building Watson: An overview of the DeepQA project. AI magazine, 31(3):59–79.
Yarin Gal and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1019–1027.
Jianfeng Gao, Michel Galley, and Lihong Li. 2018. Neural approaches to conversational AI. arXiv preprint arXiv:1809.08267.
Yoav Goldberg. 2017. Neural network methods for natural language processing, volume 10. Morgan & Claypool Publishers.
Clinton Gormley and Zachary Tong. 2015. Elasticsearch: The Definitive Guide. O’Reilly Media, Inc.
Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Association for Computational Linguistics (ACL), pages 1631–1640.
Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, and Jian Yin. 2018. Dialog-to-action: Conversational question answering over a large-scale knowledge base. In Advances in Neural Information Processing Systems (NIPS), pages 2943–2952.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Conference on computer vision and pattern recognition (CVPR), pages 770–778.
Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems (NIPS), pages 1693–1701.
Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. 2016. The Goldilocks Principle: Reading children’s books with explicit memory representations. In International Conference on Learning Representations (ICLR).
Lynette Hirschman, Marc Light, Eric Breck, and John D Burger. 1999. Deep read: A reading comprehension system. In Association for Computational Linguistics (ACL), pages 325–332.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9:1735–1780.
Hsin-Yuan Huang, Eunsol Choi, and Wen-tau Yih. 2018a. FlowQA: Grasping flow in history for conversational machine comprehension. arXiv preprint arXiv:1810.06683.
Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, and Weizhu Chen. 2018b. FusionNet: Fusing via fully-aware attention with application to machine comprehension. In International Conference on Learning Representations (ICLR).
Mohit Iyyer, Wen-tau Yih, and Ming-Wei Chang. 2017. Search-based neural structured learning for sequential question answering. In Association for Computational Linguistics (ACL), volume 1, pages 1821–1831.
Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Empirical Methods in Natural Language Processing (EMNLP), pages 2021–2031.
Mandar Joshi, Eunsol Choi, Daniel S Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Association for Computational Linguistics (ACL), volume 1, pages 1601–1611.
Divyansh Kaushik and Zachary C. Lipton. 2018. How much reading does reading comprehension require? A critical investigation of popular benchmarks. In Empirical Methods in Natural Language Processing (EMNLP), pages 5010–5015.
Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Are you smarter than a sixth grader? Textbook question answering for multimodal machine comprehension. In Conference on computer vision and pattern recognition (CVPR), pages 5376–5384.
Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, and Dan Roth. 2018. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In North American Association for Computational Linguistics (NAACL), volume 1, pages 252–262.
Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751.
Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Walter Kintsch. 1998. Comprehension: A paradigm for cognition. Cambridge University Press.
Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. pages 67–72.
Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. 2018. The NarrativeQA reading comprehension challenge. Transactions of the Association of Computational Linguistics (TACL), 6:317–328.
Julian Kupiec. 1993. MURAX: A robust linguistic approach for question answering using an on-line encyclopedia. In ACM SIGIR conference on Research and development in information retrieval, pages 181–190.
Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. 2017. RACE: Large-scale reading comprehension dataset from examinations. In Empirical Methods in Natural Language Processing (EMNLP), pages 785–794.
Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, and Jonathan Berant. 2016. Learning recurrent span representations for extractive question answering. arXiv preprint arXiv:1611.01436.
Wendy Grace Lehnert. 1977. The process of question answering. Ph.D. thesis, Yale University.
Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing neural predictions. In Empirical Methods in Natural Language Processing (EMNLP), pages 107–117.
Tao Lei, Yu Zhang, Sida I Wang, Hui Dai, and Yoav Artzi. 2018. Simple recurrent units for highly parallelizable recurrence. In Empirical Methods in Natural Language Processing (EMNLP), pages 4470–4481.
Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In North American Association for Computational Linguistics (NAACL), pages 110–119.
Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out.
Yankai Lin, Haozhe Ji, Zhiyuan Liu, and Maosong Sun. 2018. Denoising distantly supervised open-domain question answering. In Association for Computational Linguistics (ACL), volume 1, pages 1736–1745.
Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Empirical Methods in Natural Language Processing (EMNLP), pages 2122–2132.
Xiaodong Liu, Yelong Shen, Kevin Duh, and Jianfeng Gao. 2018. Stochastic answer networks for machine reading comprehension. In Association for Computational Linguistics (ACL), volume 1, pages 1694–1704.
Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1412–1421.
Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL): System Demonstrations, pages 55–60.
Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. In Advances in Neural Information Processing Systems (NIPS), pages 6297–6308.
Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2017. Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS), pages 3111–3119.
Alexander Miller, Will Feng, Dhruv Batra, Antoine Bordes, Adam Fisch, Jiasen Lu, Devi Parikh, and Jason Weston. 2017. ParlAI: A dialog research software platform. In Empirical Methods in Natural Language Processing (EMNLP), pages 79–84.
Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-value memory networks for directly reading documents. In Empirical Methods in Natural Language Processing (EMNLP), pages 1400–1409.
Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Association for Computational Linguistics (ACL), pages 1003–1011.
Tom M Mitchell, Justin Betteridge, Andrew Carlson, Estevam Hruschka, and Richard Wang. 2009. Populating the semantic web by macro-reading internet text. In International Semantic Web Conference (ISWC), pages 998–1002.
Dan Moldovan, Sanda Harabagiu, Marius Pasca, Rada Mihalcea, Roxana Girju, Richard Goodrum, and Vasile Rus. 2000. The structure and performance of an open-domain question answering system. In Association for Computational Linguistics (ACL), pages 563–570.
Karthik Narasimhan and Regina Barzilay. 2015. Machine comprehension with discourse relations. In Association for Computational Linguistics (ACL), volume 1, pages 1253–1262.
Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268.
Takeshi Onishi, Hai Wang, Mohit Bansal, Kevin Gimpel, and David McAllester. 2016. Who did what: A large-scale person-centered cloze dataset. In Empirical Methods in Natural Language Processing (EMNLP), pages 2230–2235.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Association for Computational Linguistics (ACL), pages 311–318.
Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In Empirical Methods in Natural Language Processing (EMNLP), pages 2249–2255.
Panupong Pasupat and Percy Liang. 2015. Compositional semantic parsing on semi-structured tables. In Association for Computational Linguistics (ACL), pages 1470–1480.
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In North American Association for Computational Linguistics (NAACL), volume 1, pages 2227–2237.
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. Technical report, OpenAI.
Martin Raison, Pierre-Emmanuel Mazaré, Rajarshi Das, and Antoine Bordes. 2018. Weaver: Deep co-encoding of questions and documents for machine reading. arXiv preprint arXiv:1804.10490.
Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know what you don’t know: Unanswerable questions for SQuAD. In Association for Computational Linguistics (ACL), volume 2, pages 784–789.
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Empirical Methods in Natural Language Processing (EMNLP), pages 2383–2392.
Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In International Conference on Learning Representations (ICLR).
Siva Reddy, Danqi Chen, and Christopher D Manning. 2019. CoQA: A conversational question answering challenge. Transactions of the Association of Computational Linguistics (TACL). Accepted pending revisions.
Matthew Richardson, Christopher J.C. Burges, and Erin Renshaw. 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In Empirical Methods in Natural Language Processing (EMNLP), pages 193–203.
Ellen Riloff and Michael Thelen. 2000. A rule-based question answering system for reading comprehension tests. In ANLP/NAACL Workshop on Reading comprehension tests as evaluation for computer-based language understanding systems, pages 13–19.
Pum-Mo Ryu, Myung-Gil Jang, and Hyun-Ki Kim. 2014. Open domain question answering using Wikipedia-based knowledge model. Information Processing & Management, 50:683–692.
Mrinmaya Sachan, Kumar Dubey, Eric Xing, and Matthew Richardson. 2015. Learning answer-entailing structures for machine comprehension. In Association for Computational Linguistics (ACL), volume 1, pages 239–249.
Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, and Sebastian Riedel. 2018. Interpretation of natural language rules in conversational machine reading. In Empirical Methods in Natural Language Processing (EMNLP), pages 2087–2097.
Amrita Saha, Vardaan Pahuja, Mitesh M. Khapra, Karthik Sankaranarayanan, and Sarath Chandar. 2018. Complex sequential question answering: Towards learning to converse over linked question answer pairs with a knowledge graph. In Conference on Artificial Intelligence (AAAI).
Roger C Schank and Robert P Abelson. 1977. Scripts, plans, goals and understanding: An inquiry into human knowledge structures. Lawrence Erlbaum.
Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Association for Computational Linguistics (ACL), volume 1, pages 1073–1083.
Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional attention flow for machine comprehension. In International Conference on Learning Representations (ICLR).
Minjoon Seo, Sewon Min, Ali Farhadi, and Hannaneh Hajishirzi. 2018. Neural speed reading via Skim-RNN. In International Conference on Learning Representations (ICLR).
Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Minimum risk training for neural machine translation. In Association for Computational Linguistics (ACL), volume 1, pages 1683–1692.
Robert F Simmons, Sheldon Klein, and Keren McConlogue. 1964. Indexing and dependency logic for answering English questions. American Documentation, 15(3):196–204.
Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training very deep networks. In Advances in Neural Information Processing Systems (NIPS), pages 2377–2385.
Saku Sugawara, Kentaro Inui, Satoshi Sekine, and Akiko Aizawa. 2018. What makes reading comprehension questions easier? In Empirical Methods in Natural Language Processing (EMNLP), pages 4208–4219.
Saku Sugawara, Yusuke Kido, Hikaru Yokono, and Akiko Aizawa. 2017. Evaluation metrics for machine reading comprehension: Prerequisite skills and readability. In Association for Computational Linguistics (ACL), volume 1, pages 806–817.
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 3104–3112.
Alon Talmor and Jonathan Berant. 2018. The web as a knowledge-base for answering complex questions. In North American Association for Computational Linguistics (NAACL), volume 1, pages 641–651.
Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2016. MovieQA: Understanding stories in movies through question-answering. In Conference on computer vision and pattern recognition (CVPR), pages 4631–4640.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS), pages 5998–6008.
Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.
Ellen M Voorhees. 1999. The TREC-8 question answering track report. In Text REtrieval Conference (TREC), pages 77–82.
Hai Wang, Mohit Bansal, Kevin Gimpel, and David McAllester. 2015. Machine comprehension with syntax, frames, and semantics. In Association for Computational Linguistics (ACL), volume 2, pages 700–706.
Shuohang Wang and Jing Jiang. 2017. Machine comprehension using Match-LSTM and answer pointer. In International Conference on Learning Representations (ICLR).
Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, and Jing Jiang. 2018a. R^3: Reinforced reader-ranker for open-domain question answering. In Conference on Artificial Intelligence (AAAI).
Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, and Murray Campbell. 2018b. Evidence aggregation for answer re-ranking in open-domain question answering. In International Conference on Learning Representations (ICLR).
Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. 2017. Gated self-matching networks for reading comprehension and question answering. In Association for Computational Linguistics (ACL), volume 1, pages 189–198.
Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. Feature hashing for large scale multitask learning. In International Conference on Machine Learning (ICML), pages 1113–1120.
Johannes Welbl, Nelson F Liu, and Matt Gardner. 2017. Crowdsourcing multiple choice science questions. In 3rd Workshop on Noisy User-generated Text, pages 94–106.
Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6:287–302.
Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory networks. In International Conference on Learning Representations (ICLR).
Qiang Wu, Christopher JC Burges, Krysta M Svore, and Jianfeng Gao. 2010. Adapting boosting for information retrieval measures. Information Retrieval, 13(3):254–270.
Pengtao Xie and Eric Xing. 2017. A constituent-centric neural architecture for reading comprehension. In Association for Computational Linguistics (ACL), volume 1, pages 1405–1414.
Caiming Xiong, Victor Zhong, and Richard Socher. 2017. Dynamic coattention networks for question answering. In International Conference on Learning Representations (ICLR).
Caiming Xiong, Victor Zhong, and Richard Socher. 2018. DCN+: Mixed objective and deep residual coattention for question answering. In International Conference on Learning Representations (ICLR).
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Empirical Methods in Natural Language Processing (EMNLP), pages 2369–2380.
Xuchen Yao, Jonathan Berant, and Benjamin Van Durme. 2014. Freebase QA: Information extraction or semantic parsing? In ACL 2014 Workshop on Semantic Parsing, pages 82–86.
Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V Le. 2018. QANet: Combining local convolution with global self-attention for reading comprehension. In International Conference on Learning Representations (ICLR).
Adams Wei Yu, Hongrae Lee, and Quoc Le. 2017. Learning to skim text. In Association for Computational Linguistics (ACL), volume 1, pages 1880–1890.
Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing dialogue agents: I have a dog, do you have pets too? In Association for Computational Linguistics (ACL), volume 1, pages 2204–2213.