
Macaw vs antetype






Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available. We present Macaw, a versatile, generative question-answering (QA) system that we are making available to the community. Macaw is built on UnifiedQA, itself built on T5, and exhibits strong performance, zero-shot, on a wide variety of topics, including outperforming GPT-3 by over 10% (absolute) on Challenge300, a suite of 300 challenge questions, despite being an order of magnitude smaller (11 billion vs. 175 billion parameters). In addition, Macaw allows different permutations (“angles”) of its inputs and outputs to be used: for example, Macaw can take a question and produce an answer, or take an answer and produce a question, or take an answer and question and produce multiple-choice options. We describe the system and illustrate a variety of question types where it produces surprisingly good answers, well outside the training setup. We also identify question classes where it still appears to struggle, offering insights into the limitations of pretrained language models. Macaw is freely available, and we hope that it proves useful to the community.

Although modern pretrained language models have proved surprisingly effective at solving datasets, e.g., (Radford et al., 2018; Raffel et al., 2020; Khashabi et al., 2020a), there are still few high-quality, general-purpose, off-the-shelf question-answering (QA) systems freely available. UnifiedQA (Khashabi et al., 2020a) is a powerful QA system, but mainly trained for span prediction and multiple-choice selection. GPT-3 appears powerful, but is not freely available to the public (Brown et al., 2020). The system nearest to our goal is Google’s T5-based CBQA (closed-book QA) system (Roberts et al., 2020), but in our tests of the T5-CBQA model trained on Natural Questions (Kwiatkowski et al., 2019), it did not perform as well as Macaw. (There are other T5-CBQA versions alternatively trained on WebQuestions and TriviaQA that we did not evaluate, although NaturalQuestions is arguably the most general and varied of the alternate training sets.) Similarly, the new Jurassic-1 model from AI21 Labs appears powerful, but again did not perform as well in our QA tests (Section 4.2).

Macaw, in contrast, is a general-purpose, T5-based QA system that we have found to be surprisingly strong, zero-shot. In this paper, we describe how Macaw was trained, illustrate the variety of capabilities it has, and also identify various question classes it still struggles with. We hope Macaw proves useful for the community.
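To make the availability claim concrete, here is a minimal usage sketch. It assumes the publicly released Macaw checkpoints on the Hugging Face hub (for example allenai/macaw-large; macaw-3b and macaw-11b are larger variants) and the $slot$-style input format documented in the Macaw repository; check that repository for the exact model names and markers.

```python
# Minimal sketch (assumed checkpoint name and slot format): ask Macaw a question
# and request two output slots at once, an answer plus multiple-choice options.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "allenai/macaw-large"  # assumption; larger variants exist
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Requested output slots come first as bare markers; provided input slots
# follow as "$slot$ = value" pairs.
input_string = "$answer$ ; $mcoptions$ ; $question$ = Which surface is best for roller-skating?"
input_ids = tokenizer.encode(input_string, return_tensors="pt")
output_ids = model.generate(input_ids, max_length=200)

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
# The decoded text uses the same slot form, e.g. "$answer$ = ... ; $mcoptions$ = (A) ... (B) ..."
```

The same generate call serves every angle; only the slot string changes.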


Figure 1 shows a few examples of the more impressive answers given by Macaw:

Q: How would you make a house conduct electricity? → A: paint it with a metal paint
Q: What is an implication of a cat being an animal? → A: it has feelings
Q: Could a newborn look over the top of a fully grown horseradish plant? → A: no, because the plant is so tall
Q: Why couldn’t Aristotle have used a cellphone? → A: he was born before cellphones
Q: A young boy was rushed to the hospital emergency room, but the ER doctor saw the boy and refused to operate. How could this be? → A: the doctor mistook the boy for his own son

More examples (both good and bad) are given in Section 5.

The semantics of the slots are defined by what Macaw saw during training (Section 3). During training, the context (C) contains either a passage or retrieved text relevant to the question, and the explanation (E) consists of a few (typically two or three) general sentences relevant to the answer (but not a formal chain of reasoning). Examples are given in Figure 2 (upper box) and Section 2.6.
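Because each slot is just a tagged span of text, an “angle” amounts to choosing which slots are supplied as inputs and which are requested as outputs. The helper below only illustrates that idea: the function itself is hypothetical, and the marker spellings for the five slots are assumed from the format used by the released models rather than taken from the paper.

```python
# Illustrative only: compose Macaw-style "angle" strings from the five slots.
SLOT_MARKERS = {
    "question": "$question$",
    "context": "$context$",
    "mcoptions": "$mcoptions$",
    "answer": "$answer$",
    "explanation": "$explanation$",
}

def build_angle_string(output_slots, input_slots):
    """Requested output slots appear first as bare markers; provided input
    slots follow as '$slot$ = value' pairs, all joined by ' ; '."""
    parts = [SLOT_MARKERS[s] for s in output_slots]
    parts += [f"{SLOT_MARKERS[s]} = {v}" for s, v in input_slots.items()]
    return " ; ".join(parts)

# Primary-style angle: question and choices in, answer and explanation out.
print(build_angle_string(
    ["answer", "explanation"],
    {"question": "Which surface is best for roller-skating?",
     "mcoptions": "(A) gravel (B) blacktop (C) sand"},
))

# Reversed angle: answer and context in, question and MC options out.
print(build_angle_string(
    ["question", "mcoptions"],
    {"answer": "blacktop", "context": "Roller skating is a popular hobby these days."},
))
```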


Figure 2 summarizes the different slots (input/output elements) and sample angles supported by Macaw. The example slot values are a context (“Roller skating is a popular hobby these days.”), a question with multiple-choice options (“Which surface is best for roller-skating? (A) gravel (B) blacktop (C) sand”), the answer (“blacktop”), and an explanation (“A wheeled vehicle requires smooth surfaces.”). The sample angles are described as: generate the answer and explanation given the question, choices, and context (the primary angle); the same, but in the absence of retrieved context; generate the answer without access to the MC options; only generate the answer; also include the explanation in the input; generate plausible MC options given the question, answer, and context; and generate a plausible question and MC options given the answer and context.

With five available input/output slots there is a plethora of possible angles to train on. We select a subset that seems the most interesting, as listed in Table 3, and use these for fine-tuning for 6k further steps; an illustrative sketch of how such angle-specific training pairs can be assembled follows the Figure 3 summary below.

Figure 3 reports the average score of the four different models on different categories of questions (ignoring categories with fewer than five questions), with the number of questions in each category given in parentheses. Categories are ordered by average-of-averages (highest to lowest), i.e., the models together perform best on general knowledge and worst on false presuppositions.
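Picking up the fine-tuning note above, multi-angle training can be pictured as expanding one annotated example into several (input, output) text pairs, one per angle. The sketch below is purely illustrative: it reuses the assumed $slot$ markers, the helper and the particular angle list are hypothetical, and it is not the authors' preprocessing code (the actual subset of angles is the one listed in Table 3).

```python
# Illustrative sketch: derive seq2seq training pairs for several angles from one
# annotated record. Slot markers and the angle list are assumptions, not the
# authors' actual pipeline.
RECORD = {
    "question": "Which surface is best for roller-skating?",
    "mcoptions": "(A) gravel (B) blacktop (C) sand",
    "context": "Roller skating is a popular hobby these days.",
    "answer": "blacktop",
    "explanation": "A wheeled vehicle requires smooth surfaces.",
}

# Each angle pairs a set of input slots with a set of output slots.
ANGLES = [
    (("question", "mcoptions", "context"), ("answer", "explanation")),  # primary-style
    (("question", "context"), ("answer", "explanation")),
    (("question",), ("answer",)),
    (("answer", "context"), ("question", "mcoptions")),
]

def to_training_pair(record, input_slots, output_slots):
    source = " ; ".join([f"${s}$" for s in output_slots] +
                        [f"${s}$ = {record[s]}" for s in input_slots])
    target = " ; ".join(f"${s}$ = {record[s]}" for s in output_slots)
    return source, target

for input_slots, output_slots in ANGLES:
    src, tgt = to_training_pair(RECORD, input_slots, output_slots)
    print(src, "=>", tgt)
```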







