Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments

01/11/2019
by   Alex Warstadt, et al.
New York University

Recent pretrained sentence encoders achieve state of the art results on language understanding tasks, but does this mean they have implicit knowledge of syntactic structures? We introduce a grammatically annotated development set for the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018), which we use to investigate the grammatical knowledge of three pretrained encoders, including the popular OpenAI Transformer (Radford et al., 2018) and BERT (Devlin et al., 2018). We fine-tune these encoders to do acceptability classification over CoLA and compare the models' performance on the annotated analysis set. Some phenomena, e.g. modification by adjuncts, are easy to learn for all models, while others, e.g. long-distance movement, are learned effectively only by models with strong overall performance, and others still, e.g. morphological agreement, are hardly learned by any model.


1 Introduction

The effectiveness and ubiquity of pretrained sentence embeddings for natural language understanding have grown dramatically in recent years. Recent sentence encoders like OpenAI's Generative Pretrained Transformer (GPT; Radford et al., 2018) and BERT (Devlin et al., 2018) achieve the state of the art on the GLUE benchmark (Wang et al., 2018). Among the GLUE tasks, these state-of-the-art systems make their greatest gains on the acceptability task with the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018). CoLA contains example sentences from linguistics publications labeled by experts for grammatical acceptability, and written to show subtle grammatical features. Because minimal syntactic differences can separate acceptable sentences from unacceptable ones (What did Bo write a book about? / *What was a book about written by Bo?), and because acceptability classifiers are more reliable when trained on GPT and BERT than on recurrent models, it stands to reason that GPT and BERT have better implicit knowledge of syntactic features relevant to acceptability.

Our goal in this paper is to develop an evaluation dataset that can locate which syntactic features a model successfully learns by identifying the syntactic domains of CoLA in which it performs best. Using this evaluation set, we compare the syntactic knowledge of GPT and BERT in detail, and investigate the strengths of these models over the baseline BiLSTM model published by Warstadt et al. (2018). The analysis set includes expert annotations labeling the entire CoLA development set for the presence of 63 fine-grained syntactic features. We identify many specific syntactic features that make sentences harder to classify, and many that have little effect. For instance, sentences involving unusual or marked argument structures are no harder than the average sentence, while sentences with long-distance dependencies are hard to learn. We also find features of sentences that accentuate or minimize the differences between models. Specifically, the transformer models seem to learn long-distance dependencies much better than the recurrent model, yet have no advantage on sentences with morphological violations.

2 Related Work

Sentence Embeddings

The magazines were sent by Mary to herself.
John can kick the ball.
* I know that Meg’s attracted to Harry, but they don’t know who.
They kicked them.
Which topic did you choose without getting his approval?
* It was believed to be illegal by them to do that.
* Us love they.
* The more does Bill smoke, the more Susan hates him.
I ate a salad that was filled with lima beans.
That surprised me.
Table 1: A random sample of sentences from the CoLA development set, shown with their original acceptability labels (✓ = acceptable, * = unacceptable) and with a subset of our new phenomenon-level annotations. The annotated feature columns in the original table are: Simple, Locative, PP Arg-VP, High Arity, Passive, Binding:Other, Emb Q, Complex QP, Modal, Raising, Trans-Adj, Coord, Ellipsis/Anaphor, Comparative, Infl/Agr Violation, Extra/Missing Expr.

Robust pretrained word embeddings like word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014) have been extremely successful and widely adopted in machine learning applications for language understanding. Recent research tries to reproduce this success at the sentence level, in the form of reusable sentence embeddings with pretrained weights. These representations are useful for language understanding tasks that require a model to classify a single sentence, as in sentiment analysis and acceptability classification; or a pair of sentences, as in paraphrase detection and natural language inference (NLI); or that require a model to generate text based on an input text, as in question answering. Early work in this area primarily uses recurrent models like Long Short-Term Memory networks (LSTM; Hochreiter and Schmidhuber, 1997) to reduce variable-length sequences into fixed-length sentence embeddings. Current state-of-the-art sentence encoders are pretrained on language modeling or related tasks with unlabeled data. Among these, ELMo (Peters et al., 2018) uses a BiLSTM architecture, while GPT (Radford et al., 2018) and BERT (Devlin et al., 2018) use the Transformer architecture (Vaswani et al., 2017). Unlike most earlier approaches, in which the weights of the encoder are frozen after pretraining, the latter two fine-tune the encoder on the downstream task. With additional fine-tuning on secondary tasks like NLI, these are the top performing models on the GLUE benchmark (Phang et al., 2018).

Sentence Embedding Analysis

The evaluation and analysis of sentence embeddings is an active area of research. One branch of this work uses probing tasks which can reveal how much syntactic information a sentence embedding encodes about, for instance, tense and voice (Shi et al., 2016), sentence length and word content (Adi et al., 2017), or syntactic depth and morphological number (Conneau et al., 2018). Related work indirectly probes features of sentence embeddings using language understanding tasks with custom datasets manipulating specific grammatical features. Linzen et al. (2016) use several tasks, including acceptability classification of sentences with manipulated verbal inflection, to investigate whether LSTMs can identify violations in subject-verb agreement, and therefore a (potentially long-distance) syntactic dependency. Ettinger et al. (2018) test whether sentence embeddings encode the scope of negation and semantic roles using semi-automatically generated sentences exhibiting carefully controlled syntactic variation. Kann et al. (2019) also semi-automatically generate data and use acceptability classification to test whether word and sentence embeddings encode information about verbs and their argument structures.

CoLA & Acceptability Classification

The Corpus of Linguistic Acceptability (Warstadt et al., 2018) is a dataset of 10k example sentences with expert annotations for grammatical acceptability. The sentences are taken from 23 theoretical linguistics publications, mostly about syntax, including undergraduate textbooks, research articles, and dissertations. Such example sentences are usually labeled for acceptability by their authors or a small group of native English speakers. A small random sample of the CoLA development set (with our added annotations) can be seen in Table 1. Within computational linguistics, the acceptability classification task has been explored in various settings. Lawrence et al. (2000) train RNNs to do acceptability classification over sequences of POS tags corresponding to example sentences from a syntax textbook. Wagner et al. (2009) also train RNNs, but using naturally occurring sentences that have been automatically manipulated to be unacceptable. Lau et al. (2016) predict acceptability from language model probabilities, applying this technique to sentences from a syntax textbook and to sentences which were translated round-trip through various languages. Lau et al. attempt to model gradient crowd-sourced acceptability judgments, rather than binary expert judgments. This reflects an ongoing debate about whether binary expert judgments like those in CoLA are reliable (Gibson and Fedorenko, 2010; Sprouse and Almeida, 2012). We remain agnostic as to the role of binary judgments in linguistic theory, taking the expert judgments in CoLA at face value. However, Warstadt et al. (2018) measure human performance on a subset of CoLA (see Table 4), finding that new human annotators, while not in perfect agreement with the judgments in CoLA, still outperform the best neural network models by a wide margin.

3 Analysis Set

We introduce a grammatically annotated version of the entire CoLA development set to facilitate detailed error analysis of acceptability classifiers. These 1043 sentences are expert-labeled for the presence of 63 minor grammatical features organized into 15 major features. Each minor feature belongs to a single major feature. A sentence belongs to a major feature if it belongs to one or more of the relevant minor features. The Appendix includes descriptions of each feature along with examples and the criteria used for annotation. The 63 minor features and 15 major features are illustrated in Table 2. Considering minor features, an average of 4.31 features is present per sentence (SD=2.59). The average feature is present in 71.3 sentences (SD=54.7). Turning to major features, the average sentence belongs to 3.22 major features (SD=1.66), and the average major feature is present in 224 sentences (SD=112). Every sentence is labeled with at least one feature.
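These summary statistics follow directly from the binary sentence-by-feature annotation matrix. As a rough illustration, the sketch below computes the per-sentence and per-feature counts reported above, assuming the annotations are loaded as a pandas DataFrame with one 0/1 column per minor feature, plus a mapping from minor to major features; the file name, column names, and mapping shown are hypothetical.

```python
import pandas as pd

# Hypothetical layout: one row per CoLA dev sentence, one 0/1 column per minor feature.
minor = pd.read_csv("cola_dev_annotations.csv")  # path is illustrative
# Hypothetical mapping from each minor-feature column to its major feature.
major_of = {"Simple": "Simple", "Copula": "Pred", "Pred/SC": "Pred"}  # ... and so on

# Minor-feature statistics.
feats_per_sent = minor.sum(axis=1)   # number of minor features per sentence
sents_per_feat = minor.sum(axis=0)   # number of sentences per minor feature
print(feats_per_sent.mean(), feats_per_sent.std())   # reported above as 4.31 (SD 2.59)
print(sents_per_feat.mean(), sents_per_feat.std())   # reported above as 71.3 (SD 54.7)

# Major-feature statistics: a sentence has a major feature if it has any of its minors.
major = pd.DataFrame({
    m: minor[[c for c, mm in major_of.items() if mm == m and c in minor]].max(axis=1)
    for m in set(major_of.values())
})
print(major.sum(axis=1).mean(), major.sum(axis=1).std())  # reported above as 3.22 (SD 1.66)
print(major.sum(axis=0).mean(), major.sum(axis=0).std())  # reported above as 224 (SD 112)
```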

Major Feature (count) | Minor Features (count)
Simple (87) Simple (87)
Pred (256) Copula (187), Pred/SC (45), Result/Depictive (26)
Adjunct (226) Particle (33), VP Adjunct (162), NP Adjunct (52), Temporal (49), Locative (69), Misc Adjunct (75)
Arg Types (428) Oblique (141), PP Arg VP (242), PP Arg NP/AP (81), by-Phrase (58), Expletive (78)
Arg Altern (421) High Arity (253), Drop Arg (112), Add Arg (91), Passive (114)
Imperative (12) Imperatives (12)
Bind (121) Binding:Refl (60), Binding:Other (62)
Question (222) Matrix Q (56), Emb Q (99), Pied Piping (80), Rel Clause (76), Island (22)
Comp Clause (190) CP Subj (15), CP Arg VP (110), CP Arg NP/AP (26), Non-finite CP (24), No C-izer (41), Deep Embed (30)
Auxiliary (340) Neg (111), Modal (134), Aux (201), Pseudo-Aux (26)
to-VP (170) Control (80), Raising (19), VP+Extract (26), VP Arg NP/AP (33), Non-finite VP Misc (38)
N, Adj (278) Deverbal (53), Rel NP (65), Trans NP (21), Compx NP (106), NNCompd (35), Rel Adj (26), Trans Adj (39)
S-Syntax (286) Dislocation (56), Info Struc (31), Frag/Paren (9), Coord (158), Subordinate/Cond (41), Ellipsis/Anaphor (118), S-Adjunct (30)
Determiner (178) Quantifier (139), Partitive (18), NPI/FCI (29), Comparative (25)
Violations (145) Sem Violation (31), Infl/Agr Violation (62), Extra/Missing Expr (65)
Table 2: Major features and their associated minor features, with the number of occurrences in parentheses.

3.1 Annotation

The sentences were annotated manually by one of the authors, who is a PhD student with extensive training in formal linguistics. The features were developed in a trial stage, in which the annotator performed a similar annotation, using a different annotation schema, on several hundred CoLA sentences not belonging to the development set.

3.2 Feature Descriptions

Here we briefly summarize the feature set in order of the major features. Many of these constructions are well-studied in syntax, and further background can be found in textbooks such as Adger (2003) and Sportiche et al. (2013).

Simple

This major feature contains only one minor feature, simple, including sentences with a syntactically simplex subject and predicate.

Pred(icate)

These three features correspond to predicative phrases, including copular constructions, small clauses (I saw Bo jump), and resultatives/depictives (Bo wiped the table clean).

Adjunct

These six features mark various kinds of optional modifiers. This includes modifiers of NPs (The boy with blue eyes gasped) or VPs (The cat meowed all morning), and temporal (Bo swam yesterday) or locative (Bo jumped on the bed) adjuncts.

Argument types

These five features identify syntactically selected arguments, differentiating, for example, obliques (I gave a book to Bo), PP arguments of NPs and VPs (Bo voted for Jones), and expletives (It seems that Bo left).

Argument Alternations

These four features mark VPs with unusual argument structures, including added arguments (I baked Bo a cake) or dropped arguments (Bo knows), and the passive (I was applauded).

Imperative

This contains only one feature for imperative clauses (Stop it!).

Bind

These are two minor features, one for bound reflexives (Bo loves himself), and one for other bound pronouns (Bo thinks he won).

Question

These five features apply to sentences with question-like properties. They mark whether the interrogative is an embedded clause (I know who you are), a matrix clause (Who are you?), or a relative clause (Bo saw the guy who left); whether it contains an island out of which extraction is unacceptable (*What was a picture of hanging on the wall?); or whether there is pied-piping or a multi-word wh-expression (With whom did you eat?).

Comp(lement) Clause

These six features apply to various complement clauses (CPs), including subject CPs (That Bo won is odd); CP arguments of VPs or NPs/APs (The fact that Bo won); CPs missing a complementizer (I think Bo’s crazy); or non-finite CPs (This is ready for you to eat).

Aux(iliary)

These four minor features mark the presence of auxiliary or modal verbs (I can win), negation, or “pseudo-auxiliaries” (I have to win).

to-VP

These five features mark various infinitival embedded VPs, including control VPs (Bo wants to win); raising VPs (Bo seemed to fly); VP arguments of NPs or APs (Bo is eager to eat); and VPs with extraction (e.g. This is easy to read __).

N(oun), Adj(ective)

These seven features mark complex NPs and APs, including ones with PP arguments (Bo is fond of Mo), or CP/VP arguments; noun-noun compounds (Bo ate mud pie); modified NPs, and NPs derived from verbs (Baking is fun).

S-Syntax

These seven features mark various unrelated syntactic constructions, including dislocated phrases (The boy left who was here earlier); movement related to focus or information structure (This I’ve gotta see); coordination, subordinate clauses, and ellipsis (I can’t); and sentence-level adjuncts (Apparently, it’s raining).

Determiner

These four features mark various determiners, including quantifiers, partitives (two of the boys), negative polarity items (I *do/don’t have any pie), and comparative constructions.

Violations

These three features apply only to unacceptable sentences, and only ones which are ungrammatical due to a semantic or morphological violation, or the presence or absence of a single salient word.

3.3 Correlations

We wish to emphasize that these features overlap and in many cases are correlated, so not all results from this analysis set will be independent. We analyzed the pairwise Matthews Correlation Coefficient (MCC; Matthews, 1975) of the 63 minor features (giving 1953 pairs), and of the 15 major features (giving 105 pairs). MCC is a special case of Pearson's r for Boolean variables: it measures the correlation of two binary distributions, giving a value between -1 and 1. On average, any two unrelated distributions will have a score of 0, regardless of class imbalance; this is in contrast to metrics like accuracy or F1, which favor classifiers with a majority-class bias. These results are summarized in Table 3. Regarding the minor features, 60 pairs had a correlation of 0.2 or greater, 17 had a correlation of 0.4 or greater, and 6 had a correlation of 0.6 or greater. None had an anti-correlation of greater magnitude than -0.17. Turning to the major features, 6 pairs had a correlation of 0.2 or greater, and 2 had an anti-correlation of greater magnitude than -0.2. We see at least three reasons for these observed correlations. First, some correlations can be attributed to overlapping feature definitions. For instance, expletive arguments (e.g. There are birds singing) are, by definition, non-canonical arguments, and thus are a subset of add arg. However, some added arguments, such as benefactives (Bo baked Mo a cake), are not expletives. Second, some correlations can be attributed to grammatical properties of the relevant constructions. For instance, question and aux are correlated because main-clause questions in English require subject-aux inversion and in many cases the insertion of auxiliary do (Do lions meow?). Third, some correlations may be a consequence of the sources sampled in CoLA and the phenomena they focus on. For instance, the unusually high correlation of Emb-Q and Ellipsis/Anaphor can be attributed to Chung et al. (1995), an article about sluicing, a construction involving ellipsis of an embedded interrogative (e.g. I saw someone, but I don’t know who). Finally, the two strongest anti-correlations between major features are between simple and the two features related to argument structure, argument types and arg altern. This follows from the definition of simple, which excludes any sentence containing a large number or unusual configuration of arguments.
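As a concrete sketch of this correlation analysis, the pairwise MCC values behind Table 3 can be computed with scikit-learn, reusing the hypothetical 0/1 feature DataFrame from the earlier sketch; MCC is symmetric in its two arguments, so either feature can play the role of the "prediction". All variable names here are illustrative.

```python
from itertools import combinations

import pandas as pd
from sklearn.metrics import matthews_corrcoef

def pairwise_mcc(features: pd.DataFrame) -> pd.Series:
    """MCC between every pair of binary feature columns, sorted by magnitude."""
    scores = {
        (a, b): matthews_corrcoef(features[a], features[b])
        for a, b in combinations(features.columns, 2)
    }
    return pd.Series(scores).sort_values(key=abs, ascending=False)

# minor: DataFrame with one 0/1 column per minor feature (63 columns -> 1953 pairs).
# The top of this ranking corresponds to the "Minor Features" block of Table 3:
# print(pairwise_mcc(minor).head(17))
```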

Label 1 Label 2 MCC
Minor Features
PP Arg NP/AP Rel NP 0.755
by-Phrase Passive 0.679
Coord Ellipsis/Anaphor 0.634
VP Arg NP/AP Trans Adj 0.628
NP Adjunct Compx NP 0.623
Oblique High Arity 0.620
RC Compx NP 0.565
Expletive Add Arg 0.558
CP Arg NP/AP Trans NP 0.546
PP Arg NP/AP Rel Adj 0.528
VP Adjunct Temporal 0.518
Oblique PP Arg VP 0.507
VP Adjunct Misc Adjunct 0.485
Emb Q Ellipsis/Anaphor 0.463
VP Adjunct Locative 0.418
Drop Arg Passive 0.414
Matrix Q Pied Piping 0.411
Major Features
Argument Types Arg Altern 0.406
Question Auxiliary 0.273
Question S-Syntax 0.232
Predicate N, Adj 0.231
Auxiliary S-Syntax 0.224
Question N, Adj 0.211
Simple Arg Altern -0.227
Simple Argument Types -0.238
Table 3: Correlation (MCC) of features in the annotated analysis set. We display only the correlations with the greatest magnitude.
Figure 1: Performance (MCC) on CoLA analysis set by major feature. Dashed lines show mean performance on all of CoLA.

4 Models Evaluated

We train MLP acceptability classifiers for CoLA on top of three sentence encoders: (1) the CoLA baseline encoder with ELMo-style embeddings, (2) OpenAI GPT, and (3) BERT. We use publicly available sentence encoders with pretrained weights (CoLA baseline: https://github.com/nyu-mll/CoLA-baselines; OpenAI GPT: https://github.com/openai/finetune-transformer-lm; BERT: https://github.com/google-research/bert).

LSTM encoder: CoLA baseline

The CoLA baseline model is the sentence encoder with the highest performance on CoLA from Warstadt et al. (2018). The encoder uses a BiLSTM, which reads the sentence word by word in both directions, with max-pooling over the hidden states. Similar to ELMo (Peters et al., 2018), the inputs to the BiLSTM are the hidden states of a language model (though only a forward language model is used, in contrast with ELMo). The encoder is trained on a real/fake discrimination task which requires it to identify whether a sentence is naturally occurring or automatically generated. We train acceptability classifiers on CoLA using the CoLA baselines codebase with 20 random restarts, following the original authors' transfer-learning approach: the sentence encoder's weights are frozen, and the sentence embedding serves as input to an MLP with a single hidden layer. All hyperparameters are held constant across restarts.
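To make this transfer setup concrete, here is a minimal PyTorch sketch of a frozen-encoder acceptability classifier in the spirit of the CoLA baseline: a single-hidden-layer MLP over precomputed sentence embeddings. The embedding dimension, hidden size, and optimizer settings are illustrative placeholders, not the CoLA baselines' actual hyperparameters.

```python
import torch
import torch.nn as nn

class AcceptabilityMLP(nn.Module):
    """Single-hidden-layer MLP over a frozen, precomputed sentence embedding."""
    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Dropout(0.5),
            nn.Linear(hidden_dim, 2),   # acceptable vs. unacceptable
        )

    def forward(self, sentence_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(sentence_embedding)

# Training sketch: embeddings come from the frozen encoder,
# so only the MLP parameters receive gradients.
model = AcceptabilityMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(embeddings: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(embeddings), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```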

Transformer encoders: GPT and BERT

In contrast with recurrent models, GPT and BERT use a self-attention mechanism which combines representations for each (possibly non-adjacent) pair of words to give a sentence embedding. GPT is trained using a standard language modeling task, while BERT is trained with masked language modeling and next-sentence prediction tasks. For each encoder, we use the jiant toolkit (https://github.com/jsalt18-sentence-repl/jiant) to train 20 random restarts on CoLA, feeding the pretrained models published by these authors into a single output layer. Following the methods of the original authors, we fine-tune the encoders during training on CoLA. All hyperparameters are held constant across restarts.
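For readers without a jiant setup, a rough modern approximation of this fine-tuning recipe (not the authors' actual configuration) can be written with the HuggingFace Transformers library; the model name, learning rate, and toy batch below are illustrative choices.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative: BERT with a single classification layer on top, fine-tuned end to end.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(sentences: list[str], labels: list[int]) -> float:
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=torch.tensor(labels))  # out.loss is the cross-entropy loss
    optimizer.zero_grad()
    out.loss.backward()
    optimizer.step()
    return out.loss.item()

# Example usage on a toy CoLA-style batch (1 = acceptable, 0 = unacceptable):
# train_step(["John can kick the ball.", "Us love they."], [1, 0])
```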

5 Results

5.1 Overall CoLA Results

The overall performance of the three sentence encoders is shown in Table 4. Performance on CoLA is measured using MCC (Warstadt et al., 2018). We present the best single restart for each encoder, the mean over restarts for an encoder, and the result of ensembling the restarts for a given encoder, i.e. taking the majority classification for a given sentence, or the label acceptable if tied (because we use the development set for analysis, we do not use it to weight models for weighted ensembling). For BERT results, we exclude 5 out of the 20 restarts because they were degenerate (MCC=0). Across the board, BERT outperforms GPT, which outperforms the CoLA baseline. However, BERT and GPT are much closer in performance than they are to the CoLA baseline. While ensemble performance exceeded the average for BERT and GPT, it did not outperform the best single model.
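A minimal sketch of the majority-vote ensembling described above, assuming a 2-D array of per-restart binary predictions (restarts by sentences) with 1 = acceptable; the function name and toy example are illustrative.

```python
import numpy as np

def ensemble_predictions(restart_preds: np.ndarray) -> np.ndarray:
    """Majority vote over restarts; ties go to the acceptable label (1).

    restart_preds: array of shape (n_restarts, n_sentences) with 0/1 predictions.
    """
    votes_for_acceptable = restart_preds.sum(axis=0)
    # A sentence is labeled acceptable if at least half of the restarts say so.
    return (votes_for_acceptable * 2 >= restart_preds.shape[0]).astype(int)

# Example: 3 restarts, 4 sentences.
preds = np.array([[1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]])
print(ensemble_predictions(preds))  # -> [1 1 0 0]
```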

Mean (STD) Max Ensemble
CoLA baseline 0.320 (0.007) 0.330 0.320
GPT 0.528 (0.023) 0.575 0.567
BERT 0.582 (0.032) 0.622 0.601
Human 0.697 (0.042) 0.726 0.761
Table 4: Performance (MCC) on the CoLA test set, including the mean over restarts of a given model (with standard deviation), the max over restarts, and the majority prediction over restarts. Human performance is measured by Warstadt et al. (2018).

5.2 Analysis Set Results

Figure 2: Performance (MCC) on CoLA analysis set by minor feature. Dashed lines show mean performance on all of CoLA.

The results for the major features and minor features are shown in Figures 1 and 2, respectively. For each feature, we measure the MCC of the sentences including that feature. We plot the mean of these results across the restarts for each model, and error bars mark one standard deviation around the mean. For the Violations features, MCC is technically undefined because these features contain only unacceptable sentences. We report MCC in these cases by including for each feature a single acceptable example that is correctly classified by all models. Comparison across features reveals that the presence of certain features has a large effect on performance, and we comment on some overall patterns below. Within a given feature, the effect of model type is overwhelmingly stable, and resembles the overall difference in performance. However, we observe several interactions, i.e. specific features where the relative performance of models does not track their overall relative performance.
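A sketch of this per-feature evaluation, reusing the hypothetical annotation DataFrame from Section 3 and assuming arrays of gold labels and model predictions aligned with it (all names are illustrative), including the workaround for features whose sentences are all unacceptable:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import matthews_corrcoef

def per_feature_mcc(features: pd.DataFrame,
                    gold: np.ndarray,
                    pred: np.ndarray) -> pd.Series:
    """MCC restricted to the sentences carrying each feature.

    If a feature contains only one gold class (e.g. the Violations features,
    which are all unacceptable), MCC is undefined, so we append one acceptable
    example treated as correctly classified, mirroring the workaround in the text.
    """
    scores = {}
    for name in features.columns:
        mask = features[name].astype(bool).to_numpy()
        g, p = gold[mask], pred[mask]
        if g.max() == g.min():          # single gold class -> undefined MCC
            g = np.append(g, 1)         # one acceptable sentence...
            p = np.append(p, 1)         # ...classified correctly
        scores[name] = matthews_corrcoef(g, p)
    return pd.Series(scores)
```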

Comparing Features

Among the major features (Figure 1), performance is universally highest on the simple sentences, and is higher than each model's overall performance. Though these sentences are simple, we notice that the proportion of ungrammatical ones is on par with the entire dataset. Otherwise, a model's performance on sentences of a given feature is on par with or lower than its overall performance, reflecting the fact that features mark the presence of unusual or complex syntactic structure. Performance is also high (and close to overall performance) on sentences with marked argument structures (Argument Types and Arg(ument) Alt(ernation)). While these models are still worse than human (overall) performance on these sentences, this result indicates that argument structure is relatively easy to learn. Comparing different kinds of embedded content, we observe higher performance on sentences with embedded clauses (major feature Comp Clause) and embedded VPs (major feature to-VP) than on sentences with embedded interrogatives (minor features Emb-Q and Rel Clause). An exception to this trend is the minor feature No C-izer, which labels complement clauses without a complementizer (e.g. I think you’re crazy). Low performance on these sentences compared to most other features in Comp Clause might indicate that complementizers are an important syntactic cue for these models. As the major feature Question shows, the difficulty of sentences with question-like syntax applies beyond just embedded questions. Excluding polar questions, sentences with question-like syntax almost always involve extraction of a wh-word, creating a long-distance dependency between the wh-word and its extraction site, which may be difficult for models to recognize. The most challenging features are all related to Violations. Low performance on Infl/Agr Violations, which marks morphological violations (*He washed yourself, *This is happy), is especially striking because a relatively high proportion (29%) of these sentences are Simple. A likely reason these models are deficient in encoding morphological features is that they are word-level models and do not have direct access to sub-word information like inflectional endings, which indicates that these features are difficult to learn effectively purely from lexical distributions. Finally, unusually high or low performance on some features is based on small samples and comes with high standard deviations, suggesting those results are unreliable. This applies to CP Subj, Frag/Paren, Imperative, NPI/FCI, and Comparative.

Comparing Models

Comparing within-feature performance of the three encoders to their overall performance, we find they have differing strengths and weaknesses. BERT stands out over the other models on Deep Embed, which includes challenging sentences with doubly-embedded clauses, as well as on several features involving extraction (i.e. long-distance dependencies), such as VP+Extract and Info-Struc. The transformer models show evidence of learning long-distance dependencies better than the CoLA baseline. They outperform the CoLA baseline by an especially wide margin on Binding:Refl, which involves establishing a dependency between a reflexive and its antecedent (Bo tries to love himself). They also have a large advantage on dislocation, in which expressions are separated from their dependents (Bo practiced on the train an important presentation). The advantage of BERT and GPT may be due in part to their use of the transformer architecture: unlike the BiLSTM used by the CoLA baseline, the transformer uses a self-attention mechanism that associates all pairs of words regardless of distance. In some cases models showed surprisingly good or bad performance, revealing possible idiosyncrasies of the sentence embeddings they output. For instance, the CoLA baseline performs on par with the others on the major feature Adjunct, especially considering the minor feature Particle (Bo looked the word up). Furthermore, all models struggle equally with sentences in Violations, indicating that the advantage of the transformer models over the CoLA baseline does not extend to the detection of morphological violations (Infl/Agr Violation) or single-word anomalies (Extra/Missing Expr).

5.3 Length Analysis

Figure 3: Performance (MCC) on the CoLA analysis set by sentence length.

For comparison, we analyze the effect of sentence length on acceptability classifier performance. The results are shown in Figure 3. The results for the CoLA baseline are inconsistent, but do drop off as sentence length increases. For BERT and GPT, performance decreases very steadily with length. Exceptions are extremely short sentences (length 1-3), which may be challenging due to insufficient information; and extremely long sentences, where we see a small (but somewhat unreliable) boost in BERT’s performance. BERT and GPT are generally quite close in performance, except on the longest sentences, where BERT’s performance is considerably better.

6 Conclusion

Using a new grammatically annotated analysis set, we identify several syntactic phenomena that are predictive of good or bad performance of current state-of-the-art sentence encoders on CoLA. We also use these results to develop hypotheses about why BERT is successful, and why transformer models outperform sequence models. Our findings can guide future work on sentence embeddings. A current weakness of all sentence encoders we investigate, including BERT, is the identification of morphological violations. Future engineering work should investigate whether switching to a character-level model can mitigate this problem. Additionally, transformer models appear to have an advantage over sequence models on long-distance dependencies, but still struggle with these constructions relative to more local phenomena. It stands to reason that this performance gap might be widened by training larger or deeper transformer models, or by training on longer or more complex sentences. This analysis set can be used by engineers interested in evaluating the syntactic knowledge of their encoders. Finally, these findings suggest possible controlled experiments that could confirm whether there is a causal relation between the presence of the syntactic features we single out as interesting and model performance. Our results are purely correlational, and do not mark whether a particular construction is crucial for the acceptability of a sentence. Future experiments following Ettinger et al. (2018) and Kann et al. (2019) can semi-automatically generate datasets manipulating, for example, the length of long-distance dependencies, inflectional violations, or the presence of interrogatives, while controlling for factors like sentence length and word choice, in order to determine the extent to which these features impact the quality of sentence embeddings.

Acknowledgments

We would like to thank Jason Phang and Thibault Févry for sharing GPT and BERT model predictions on CoLA, and Alex Wang for feedback.


Appendix A Feature Descriptions

A.1 Simple

A.1.1 Simple

These are sentences with transitive or intransitive verbs appearing with their default syntax and argument structure. All arguments are noun phrases (DPs), and there are no modifiers or adjuncts on DPs or the VP.
Included:
- John owns the book. (37)
- Park Square has a festive air. (131)
- *Herself likes Mary’s mother. (456)
Excluded:
- Bill has eaten cake.
- I gave Joe a book.

A.2 Pred (Predicates)

A.2.1 Copulas

These are sentences including the verb be used predicatively, as well as sentences where the object of the verb is itself a predicate which applies to the subject. Not included are auxiliary uses of be or other predicate phrases that are not linked to a subject by a verb.
Included:
- John is eager. (27)
- He turned into a frog. (150)
- To please John is easy. (315)
Excluded:
- There is a bench to sit on. (309)
- John broke the geode open.
- The cake was eaten.

A.2.2 Pred/SC (Predicates and Small Clauses)

These sentences involve predication of a non-subject argument by another non-subject argument, without the presence of a copula. Some of these cases may be analyzed as small clauses (see Sportiche et al., 2013, pp. 189-193).
Included:
- John called the president a fool. (234)
- John considers himself proud of Mary. (464)
- They want them arrested. (856)
- the election of John president surprised me. (1001)

A.2.3 Result/Depictive (Resultatives and Depictives)

Modifiers that act as predicates of an argument. Resultatives express a resulting state of that argument, and depictives describe that argument during the matrix event. See Goldberg and Jackendoff (2004).
Included:
- Resultative: *The table was wiped by John clean. (625); The horse kicked me black and blue. (898)
- Depictive: John left singing. (971); In which car was the man seen? (398)
Excluded:
- He turned into a frog. (150)

A.3 Adjunct

A.3.1 Particle

Particles are lone prepositions associated with verbs. When they appear with transitive verbs they may immediately follow the verb or the object. Verb-particle pairs may have a non-compositional (idiomatic) meaning. See Carnie (2013, pp. 69-70) and Kim and Sells (2008, pp. 16-17).
Included:
- *The argument was summed by the coach up. (615)
- Some sentences go on and on and on. (785)
- *He let the cats which were whining out. (71)

A.3.2 VP-Adjunct

Adjuncts modifying verb phrases. Adjuncts are (usually) optional, and they do not change the category of the expression they modify. See Sportiche et al. (2013, pp. 102-106).
Included:
- PP-adjuncts (e.g. locative, temporal, instrumental, beneficiary): Nobody who hates to eat anything should work in a delicatessen. (121); Felicia kicked the ball off the bench. (127)
- Adverbs: Mary beautifully plays the violin. (40); John often meets Mary. (65)
- Purpose VPs: We need another run to win. (769)
Excluded:
- PP arguments: *Sue gave to Bill a book. (42); Everything you like is on the table. (736)
- S-adjuncts: John lost the race, unfortunately.

A.3.3 NP-Adjunct

These are adjuncts modifying noun phrases. Adjuncts are (usually) optional, and they do not change the category of the expression they modify. Single-word prenominal adjectives are excluded, as are relative clauses (which have their own category).
Included:
- PP-adjuncts: *Tom’s dog with one eye attacked Frank’s with three legs. (676); They were going to meet sometime on Sunday, but the faculty didn’t know when. (565)
- Phrasal adjectives: As a statesman, scarcely could he do anything worth mentioning. (292)
- Verbal modifiers: The horse raced past the barn fell. (900)
Excluded:
- Prenominal adjectives: It was the policeman met that several young students in the park last night. (227)
- Relative clauses
- NP arguments

A.3.4 Temporal

These are adjuncts of VPs and NPs that specify a time or modify tense or aspect or frequency of an event. Adjuncts are (usually) optional, and they do not change the category of the expression they modify.
Included:
- Short adverbials (never, today, now, always): *Which hat did Mike quip that she never wore? (95)
- PPs: Fiona might be here by 5 o’clock. (426)
- When: I inquired when could we leave. (520)

A.3.5 Locative (Locative Adjuncts)

These are adjuncts of VPs and NPs that specify a location of an event or a part of an event, or of an individual. Adjuncts are (usually) optional, and they do not change the category of the expression they modify.
Included:
- Short adverbials
- PPs: The bed was slept in. (298); *Anson demonized up the Khyber. (479); Some people consider dogs in my neighborhood dangerous. (802); Mary saw the boy walking toward the railroad station. (73)
- Where: I found the place where we can relax. (307)
Excluded:
- Locative arguments: *Sam gave the ball out of the basket. (129); Jessica loaded boxes on the wagon. (164); I went to Rome.

A.3.6 Misc Adjunct (Miscellaneous Adjuncts)

These are adjuncts of VPs and NPs not described by some other category (with the exception of (6-7)), i.e. not temporal, locative, or relative clauses. Adjuncts are (usually) optional, and they do not change the category of the expression they modify.
Included:
- Beneficiary: *I know which book José didn’t read for class, and which book Lilly did it for him. (58)
- Instrument: Lee saw the student with a telescope. (770)
- Comitative: Joan ate dinner with someone but I don’t know who. (544)
- VP adjuncts: Which article did Terry file papers without reading? (431)
- Purpose: We need another run to win. (769)

A.4 Argument Types

A.4.1 Oblique

Oblique arguments of verbs are individual-denoting arguments (DPs or PPs) which act as the third argument of a verb, i.e. not a subject or (direct) object. They may or may not be marked by a preposition. Obliques are only found in VPs that have three or more individual arguments. Arguments are selected for by the verb, and they are (generally) not optional, though in some cases they may be omitted where they are understood or implicitly existentially quantified over. See Kim and Sells (2008, p. 40).
Included:
- Prepositional: *Sue gave to Bill a book. (42); Mary has always preferred lemons to limes. (70); *Janet broke Bill on the finger. (141)
- Benefactives: Martha carved the baby a toy out of wood. (139)
- Double object: Susan told her a story. (875)
- Locative arguments: Ann may spend her vacation in Italy. (289)
- High-arity passives: *Mary was given by John the book. (626)
Excluded:
- Non-DP arguments: We want John to win. (28)
- Third arguments where not all three arguments are DPs: We want John to win. (28)

A.4.2 PP Arg VP (PP Arguments of VPs)

Prepositional phrase arguments of VPs are individual-denoting arguments of a verb which are marked by a preposition. They may or may not be obliques. Arguments are selected for by the verb, and they are (generally) not optional, though in some cases they may be omitted where they are understood or implicitly existentially quantified over.
Included:
- Dative: *Sue gave to Bill a book. (42)
- Conative (at): *Carla slid at the book. (179)
- Idiosyncratic prepositional verbs: I wonder who to place my trust in. (711); She voted for herself. (743)
- Locative: John was found in the office. (283)
- PP predicates: Everything you like is on the table. (736)
Excluded:
- PP adjuncts
- Particles
- Arguments of deverbal expressions: *the putter of books left. (892)
- By-phrase: Ted was bitten by the spider. (613)

A.4.3 PP Arg NP/AP (PP Arguments of NPs and APs)

Prepositional phrase arguments of NPs or APs are individual-denoting arguments of a noun or adjective which are marked by a preposition. Arguments are selected for by the head, and they are (generally) not optional, though in some cases they may be omitted where they are understood or implicitly existentially quantified over.
Included:
- Relational adjectives: Many people were fond of Pat. (936); *I was already aware of fact. (824)
- Relational nouns: We admired the pictures of us in the album. (759); They found the book on the atom. (780)
- Arguments of deverbal nouns: *the putter of books left. (892)

A.4.4 By-phrase

Prepositional arguments introduced with by. Usually, this is the (semantic) subject of a passive verb, but in rare cases it may be the subject of a nominalized verb. Arguments are usually selected for by the head, and they are generally not optional. In this case, the argument introduced with by is semantically selected for by the verb, but it is syntactically optional. See Adger (2003, p. 190) and Collins (2005).
Included:
- Passives: Ted was bitten by the spider. (613)
- Subjects of deverbal nouns: the attempt by John to leave surprised me. (1003)

A.4.5 Expletive

Expletives, or “dummy” arguments, are semantically inert arguments. The most common expletives in English are it and there, although not all occurrences of these items are expletives. Arguments are usually selected for by the head, and they are generally not optional. In this case, the expletive occupies a syntactic argument slot, but it is not semantically selected by the verb, and there is often a syntactic variation without the expletive. See Adger (2003, pp. 170-172) and Kim and Sells (2008, pp. 82-83).
Included:
- There (inserted, existential): *There loved Sandy. (939); There is a nurse available. (466)
- It (cleft, inserted): It was a brand new car that he bought. (347); It bothers me that John coughs. (314); It is nice to go abroad. (47)
- Environmental it: Kerry remarked it was late. (821); Poor Bill, it had started to rain and he had no umbrella. (116); You’ve really lived it up. (160)
Excluded:
- John counted on Bill to get there on time. (996)
- I bought it to read. (1026)

A.5 Arg Altern (Argument Alternations)

A.5.1 High Arity

These are verbs with 3 or more arguments of any kind. Arity refers to the number of arguments that a head (or function) selects for. Arguments are usually selected for by the head, and they are generally not optional. They may be DPs, PPs, CPs, VPs, APs, or other categories.
Included:
- Ditransitive: *[Sue] gave [to Bill] [a book]. (42); [Martha] carved [the baby] [a toy] out of wood. (139)
- VP arguments: *[We] believed [John] [to be a fountain in the park]. (274); [We] made [them] [be rude]. (260)
- Particles: [He] let [the cats which were whining] [out]. (71)
- Passives with by-phrase: *[A good friend] is remained [to me] [by him]. (237)
- Expletives: *[We] expect [there] [to will rain]. (282); [There] is [a seat] [available]. (934); [It] bothers [me] [that he is here]. (1009)
- Small clause: [John] considers [Bill] [silly]. (1039)
Excluded:
- Results, depictives: [John] broke [the geode] [open].

A.5.2 Drop Arg (Dropped Arguments)

These are VPs where a canonical argument of the verb is missing. This can be difficult to determine, but in many cases the missing argument is understood with existential quantification or generically, or is contextually salient. See Sportiche et al. (2013, pp. 106-109).
Included:
- Middle voice / causative inchoative: *The problem perceives easily. (66)
- Passive: The car was driven. (296)
- Null complement anaphora: Jean persuaded Robert. (380); Nobody told Susan. (883)
- Dropped argument: *Kim put in the box. (253); The guests dined. (835); I wrote to Bill. (1030)
- Transitive adjective: John is eager. (27); We pulled free. (144)
- Transitive noun: I sensed his eagerness. (155)
- Expletive insertion: *It loved Sandy. (949)
Excluded:
- Ted was bitten by the spider. (613)

A.5.3 Add Arg (Added Arguments)

These are VPs in which a non-canonical argument of the verb has been added. These cases are clearer to identify where the additional argument is a DP. In general, PPs which mark locations, times, beneficiaries, or purposes should be analyzed as adjuncts, while PPs marking causes can be considered arguments. See Pylkkänen (2008).
Included:
- Extra argument: *Linda winked her lip. (202); Sharon fainted from hunger. (204); I shaved myself. (526)
- Causative: *I squeaked the door. (207)
- Expletive insertion: There is a monster in Loch Ness. (928); It annoys people that dogs bark. (943)
- Benefactive: Martha carved the baby a toy out of wood. (139)

A.5.4 Passive

The passive voice is marked by the demotion of the subject (either complete omission or demotion to a by-phrase) and by the verb appearing as a past participle. In the stereotypical construction there is an auxiliary be verb, though this may be absent. See Kim and Sells (2008, pp. 175-190), Collins (2005), and Sag et al. (2003, pp. 311-333).
Included:
- Verbs: The earth was believed to be round. (157)
- Pseudopassive: The bed was slept in. (298)
- Past participle adjuncts: The horse raced past the barn fell. (900)

A.6 Imperative

A.6.1 Imperative

The imperative mood is marked by the absence of a subject and the bare form of the verb, and expresses a command, request, or other directive speech act.
Included:
- *Wash you! (224)
- Somebody just left - guess who. (528)

A.7 Binding

A.7.1 Binding:Refl (Binding of Reflexives)

These are cases in which a reflexive (non-possessive) pronoun appears, usually bound by an antecedent. See Sportiche et al. (2013, pp. 163-186) and Sag et al. (2003, pp. 203-226).
Included:
- *Ourselves like ourselves. (742)
- Which pictures of himself does John like? (386)

A.7.2 Binding:Other (Binding of Other Pronouns)

These are cases in which a non-reflexive pronoun appears along with its antecedent. This includes donkey anaphora, quantificational binding, and bound possessives, among other bound pronouns. See Sportiche et al. (2013, pp. 163-186) and Sag et al. (2003, pp. 203-226).
Included:
- Bound possessor: The children admire their mother. (382)
- Quantificational binding: Everybody gets on well with a certain relative, but often only his therapist knows which one. (562)
- Bound pronoun: *We gave us to the cause. (747)

A.8 Question

A.8.1 Matrix Q (Matrix Questions)

These are sentences in which the matrix clause is interrogative (either a wh- or polar question). See Adger (2003, pp. 282-213), Kim and Sells (2008, pp. 193-222), and Carnie (2013, pp. 315-350).
Included:
- Wh-question: Who always drinks milk? (684)
- Polar question: Did Athena help us? (486)

A.8.2 Emb Q (Embedded Questions)

These are embedded interrogative clauses appearing as arguments of verbs, nouns, and adjectives, not including relative clauses and free relatives. See Adger (2003, p. 297).
Included:
- Under VP: I forgot how good beer tastes. (235); *What did you ask who saw? (508)
- Under NP: That is the reason why he resigned. (313)
- Under AP: They claimed they had settled on something, but it wasn’t clear what they had settled on. (529)
- Free relative: What the water did to the bottle was fill it. (33)
Excluded:
- Relative clauses, free relatives

A.8.3 Pied Piping

These are phrasal wh-phrases, in which the wh-word moves along with other expressions, including prepositions (pied-piping) or nouns in the case of determiner wh-words such as how many and which.
Included:
- Pied-piping: *The ship sank, but I don’t know with what. (541)
- Other phrasal wh-phrases: I know which book Mag read, and which book Bob read my report that you hadn’t. (61); How sane is Peter? (88)

A.8.4 Rel Clause (Relative Clause)

Relative clauses are noun modifiers appearing with a relativizer (either that or a wh-word) and an associated gap. See Kim and Sells (2008, pp. 223-244).
Included:
- Though he may hate those that criticize Carter, it doesn’t matter. (332)
- *The book what inspired them was very long. (686)
- Everything you like is on the table. (736)
Excluded:
- *The more you would want, the less you would eat. (6)

A.8.5 Island

This is wh-movement out of an extraction island or near-island. Islands include, for example, complex NPs, adjuncts, embedded questions, and coordination. A near-island is an extraction that closely resembles an island violation, such as extraction out of an embedded clause, or across-the-board extraction. See Adger (2003, pp. 323-333) and Carnie (2013, pp. 332-334).
Included:
- Embedded question: *What did you ask who Medea gave? (493)
- Adjunct: *What did you leave before they did? (598)
- Parasitic gaps: Which topic did you choose without getting his approval? (311)
- Complex NP: Who did you get an accurate description of? (483)

A.9 Comp Clause (Complement Clauses)

A.9.1 CP Subj (CP Subjects)

These are complement clauses acting as the (syntactic) subject of verbs. See Kim and Sells (2008, pp. 90-91).
Included:
- That dogs bark annoys people. (942)
- The socks are ready for for you to put on to be planned. (112)
Excluded:
- Expletive insertion: It bothers me that John coughs. (314)

A.9.2 CP Arg - VP (CP Arguments of VPs)

These are complement clauses acting as (non-subject) arguments of verbs. See Kim and Sells (2008, pp. 84-90).
Included:
- I can’t believe Fred won’t, either. (50)
- I saw that gas can explode. (222)
- It bothers me that John coughs. (314)
- Clefts: It was a brand new car that he bought. (347)

A.9.3 CP Arg - NP/AP (CP Arguments of NPs and APs)

These are complement clauses acting as an argument of a noun or adjective. See Kim and Sells (2008, pp. 91-94).
Included:
- Under NP: Do you believe the claim that somebody was looking for something? (99)
- Under AP: *The children are fond that they have ice cream. (842)

A.9.4 Non-Finite CP

These are complement clauses with a non-finite matrix verb. Often, the complementizer is for, or there is no complementizer. See Adger (2003, pp. 252-253, 256-260).
Included:
- For complementizer: I would prefer for John to leave. (990)
- No complementizer: Mary intended John to go abroad. (48)
- Ungrammatical: Heidi thinks that Andy to eat salmon flavored candy bars. (363)
- V-ing: Only Churchill remembered Churchill giving the Blood, Sweat and Tears speech. (469)

A.9.5 No C-izer (No Complementizer)

These are complement clauses with no overt complementizer.
Included:
- Complement clause: I’m sure we even got these tickets! (325); He announced he would marry the woman he loved most, but none of his relatives could figure out who. (572)
- Relative clause: The Peter we all like was at the party. (484)

A.9.6 Deep Embed (Deep Embedding)

These are sentences with three or more nested verbs, where the VP is not an aux or modal, i.e. with the following syntax: [S … [VP … [VP … [VP … ] … ] … ] … ]
Included:
- Embedded VPs: Max seemed to be trying to force Ted to leave the room, and Walt, Ira. (657)
- Embedded clauses: I threw away a book that Sandy thought we had read. (713)

A.10 Aux (Auxiliaries)

A.10.1 Neg (Negation)

Any occurrence of negation in a sentence, including sentential negation, negative quantifiers, and negative adverbs.
Included:
- Sentential: I can’t remember the name of somebody who had misgivings. (123)
- Quantifier: No writer, and no playwright, meets in Vienna. (124)
- Adverb: They realised that never had Sir Thomas been so offended. (409)

A.10.2 Modal

Modal verbs (may, might, can, could, will, would, shall, should, must). See Kim and Sells (2008, pp. 152-155).
Included:
- John can kick the ball. (280)
- As a statesman, scarcely could he do anything worth mentioning. (292)
Excluded:
- Pseudo-modals: Sandy was trying to work out which students would be able to solve a certain problem. (600)

A.10.3 Aux (Auxiliaries)

Auxiliary verbs (e.g. be, have, do). See Kim and Sells (2008, pp. 149-174).
Included:
- They love to play golf, but I do not. (290)
- The car was driven. (296)
- he had spent five thousand dollars. (301)
Excluded:
- Pseudo-auxiliaries: *Sally asked if somebody was going to fail math class, but I can’t remember who. (589); The cat got bitten. (926)

A.10.4 Pseudo-Aux (Pseudo-Auxiliaries)

These are predicates acting as near-auxiliaries (e.g. the get-passive) or near-modals (e.g. willing).
Included:
- Near-auxiliaries: *Mary came to be introduced by the bartender and I also came to be. (55); *Sally asked if somebody was going to fail math class, but I can’t remember who. (589); The cat got bitten. (926)
- Near-modals: Clinton is anxious to find out which budget dilemmas Panetta would be willing to tackle in a certain way, but he won’t say in which. (593); Sandy was trying to work out which students would be able to solve a certain problem. (600)

A.11 to-VP (Infinitival VPs)

A.11.1 Control

These are VPs with control verbs, where one argument is a non-finite to-VP with a covert subject co-indexed with an argument of the matrix verb. See Adger (2003, pp. 252, 266-291), Sportiche et al. (2013, pp. 203-222), and Kim and Sells (2008, pp. 125-148).
Included:
- Intransitive subject control: *It tries to leave the country. (275)
- Transitive subject control: John promised Bill to leave. (977)
- Transitive object control: I want her to dance. (379); John considers Bill to be silly. (1040)
Excluded:
- VP args of NP/AP: This violin is difficult to play sonatas on. (114)
- Purpose: There is a bench to sit on. (309)
- Subject VPs: To please John is easy. (315)
- Argument present participles: Medea denied poisoning the phoenix. (490)
- Raising: Anson believed himself to be handsome. (499)

A.11.2 Raising

These are VPs with raising predicates, where one argument is a non-finite to-VP with a covert subject co-indexed with an argument of the matrix verb. Unlike with control verbs, the coindexed argument is not a semantic argument of the raising predicate. See Adger (2003, pp. 260-266), Sportiche et al. (2013, pp. 203-222), and Kim and Sells (2008, pp. 125-148).
Included:
- Subject raising: Under the bed seems to be a fun place to hide. (277)
- Object raising: Anson believed himself to be handsome. (499)
- Raising adjective: John is likely to leave. (370)

A.11.3 VP+Extraction (VPs with Extraction)

These are embedded infinitival VPs containing a (non-subject) gap that is filled by an argument in the upper clause. Examples are purpose-VPs and tough-movement. See Kim and Sells (2008, pp. 246-252).
Included:
- Tough-movement: *Drowning cats, which is against the law, are hard to rescue. (79)
- Infinitival relatives: *Fed knows which politician her to vote for. (302)
- Purpose: the one with a red cover takes a very long time to read. (352)
- Other non-finite VPs with extraction: As a statesman, scarcely could he do anything worth mentioning. (292)

A.11.4 VP Arg - NP/AP (VP Arguments of NPs and APs)

These are non-finite VP arguments of nouns and adjectives.
Included:
- Raising adjectives: John is likely to leave. (370)
- Control adjectives: The administration has issued a statement that it is willing to meet a student group, but I’m not sure which one. (604)
- Control nouns: As a teacher, you have to deal simultaneously with the administration’s pressure on you to succeed, and the children’s to be a nice guy. (673)
- Purpose VPs: there is nothing to do. (983)

A.11.5 Non-Finite VP Misc (Miscellaneous Infinitival VPs)

These are miscellaneous non-finite VPs.
Included:
- I saw that gas can explode. (222)
- Gerunds/present participles: *Students studying English reads Conrad’s Heart of Darkness while at university. (262); Knowing the country well, he took a short cut. (411); John became deadly afraid of flying. (440)
- Subject VPs: To please John is easy. (315)
- Nominalized VPs: *What Mary did Bill was give a book. (473)
Excluded:
- to-VPs acting as complements or modifiers of verbs, nouns, or adjectives

A.12 N, Adj (Nouns and Adjectives)

A.12.1 Deverbal (Deverbal Nouns and Adjectives)

These are nouns and adjectives derived from verbs.
Included:
- Deverbal nouns: *the election of John president surprised me. (1001)
- “Light” verbs: The birds give the worm a tug. (815)
- Gerunds: If only Superman would stop flying planes! (773)
- Event-wh: What the water did to the bottle was fill it. (33)
- Deverbal adjectives: His or her least known work. (95)

A.12.2 Rel NP (Relational Nouns)

Relational nouns are NPs with an obligatory (or existentially closed) argument. A particular relation holds between the members of the extension of the NP and the argument. The argument must be a DP possessor or a PP. See Kim and Sells (2008, pp. 82-83).
Included:
- Nouns with of-arguments: John has a fear of dogs. (353)
- Nouns with other PP-arguments: Henri wants to buy which books about cooking? (442)
- Measure nouns: I bought three quarts of wine and two of Clorox. (667)
- Possessed relational nouns: *John’s mother likes himself. (484)
Excluded:
- Nouns with PP modifiers: Some people consider dogs in my neighborhood dangerous. (802)

A.12.3 Trans-NP (Transitive NPs)

Transitive (non-relational) nouns take a VP or CP argument. See Kim and Sells (2008, pp. 82-83).
Included:
- VP argument: the attempt by John to leave surprised me. (1003)
- CP argument: *Which report that John was incompetent did he submit? (69)
- QP argument: That is the reason why he resigned. (313)

A.12.4 Complex NP

These are complex NPs, including coordinated nouns and nouns with modifiers (excluding prenominal adjectives).
Included:
- Modified NPs: *The madrigals which Henry plays the lute and sings sound lousy. (84); John bought a book on the table. (233)
- NPs with coordination: *The soundly and furry cat slept. (871); The love of my life and mother of my children would never do such a thing. (806)

a.12.5 NN Compound (Noun-Noun Compounds)

Noun-noun compounds are NPs consisting of two constituent nouns.
Included:
- It was the peasant girl who got it. (320)
- A felon was elected to the city council. (938)

a.12.6 Rel Adj (Relational Adjectives)

These are adjectives that take an obligatory (or existentially closed) argument. A particular relation holds between the members of the extension of the modified NP and the argument. The argument must be a DP or PP. See Kim and Sells (2008, pp. 80-82).
Included:
- Of-arguments: The chickens seem fond of the farmer. (254)
- Other PP arguments: This week will be a difficult one for us. (241); John made Bill mad at himself. (1035)

a.12.7 Trans-AP (Transitive Adjectives)

These are transitive (non-relational) adjectives, i.e. adjectives that take a VP or CP argument. See Kim and Sells (2008, pp. 80-82).
Included:
- VP argument: John is likely to leave. (370)
- CP argument: John is aware of it that Bill is here. (1013)
- QP argument: The administration has issued a statement that it is willing to meet a student group, but I’m not sure which one. (604)

a.13 S-Syntax (Sentence-Level Syntax)

a.13.1 Dislocation

These are expressions with non-canonical word order. See, for example, Sportiche et al. (2013, p. 76).
Included:
- Particle shift: *Mickey looked up it. (24)
- Preposed modifiers: Out of the box jumped a little white rabbit. (215); *Because she’s so pleasant, as for Mary I really like her. (331)
- Quantifier float: The men will all leave. (43)
- Preposed argument: With no job would John be happy. (333)
- Relative clause extraposition: Which book’s, author did you meet who you liked? (731)
- Misplaced phrases: Mary was given by John the book. (626)

a.13.2 Info Struc (Information Structural Movement)

This includes topicalization and focus constructions. See Kim and Sells (2008, pp. 258-269) and Sportiche et al. (2013, pp. 68-75).
Included:
- Topicalization: Most elections are quickly forgotten, but the election of 2000, everyone will remember for a long time. (807)
- Clefts: It was a brand new car that he bought. (347)
- Pseudo-clefts: What John promised is to be gentle. (441)
Excluded:
- There-insertion
- Passive

a.13.3 Frag/Paren (Fragments and Parentheticals)

These are parentheticals or fragmentary expressions.
Included:
- Parenthetical: Mary asked me if, in St. Louis, John could rent a house cheap. (704)
- Fragments: The soup cooks, thickens. (448)
- Tag question: George has spent a lot of money, hasn’t he? (291)

a.13.4 Coord (Coordination)

Coordinations and disjunctions are expressions joined with and, but, or, etc. See Sportiche et al. (2013, pp. 61-68).
Included:
- DP coordination: Dave, Dan, Erin, Jaime, and Alina left. (341)
- Right Node Raising: Kim gave a dollar to Bobbie and a dime to Jean. (435)
- Clausal coordination: She talked to Harry, but I don’t know who else. (575)
- Or, nor: *No writer, nor any playwright, meets in Vienna. (125)
- Pseudo-coordination: I want to try and buy some whiskey. (432)
- Juxtaposed clauses: Lights go out at ten. There will be no talking afterwards. (779)

a.13.5 Subord/Cond (Subordinate Clauses and Conditionals)

This includes subordinate clauses, especially with subordinating conjunctions, and conditionals.
Included:
- Conditional: If I can, I will work on it. (56)
- Subordinate clause: *What did you leave before they did? (598); *Because Steve’s of a spider’s eye had been stolen, I borrowed Fred’s diagram of a snake’s fang. (677)
- Correlative: *As you eat the most, you want the least. (5)

a.13.6 Ellipsis/Anaphora

This includes VP or NP ellipsis, or anaphora standing for VPs or NPs (not DPs). See Sportiche et al. (2013, pp. 55-61).
Included:
- VP Ellipsis: If I can, I will work on it. (56); Mary likes to tour art galleries, but Bill hates to. (287)
- VP Anaphor: I saw Bill while you did so Mary. (472)
- NP Ellipsis: Tom’s dog with one eye attacked Fred’s. (679)
- NP anaphor: the one with a red cover takes a very long time to read. (352)
- Sluicing: Most columnists claim that a senior White House official has been briefing them, and the newspaper today reveals which one. (557)
- Gapping: Bill ate the peaches, but Harry the grapes. (646)
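Note that these feature categories are not mutually exclusive: a single sentence typically bears several labels at once (for instance, example (56), If I can, I will work on it, appears under both Subord/Cond and Ellipsis/Anaphora). The following is a minimal sketch of how such a multi-feature record might be represented; the field and feature identifiers are hypothetical, not the released annotation schema.

```python
# Hypothetical representation of one annotated CoLA dev sentence.
# Feature identifiers are illustrative; the released annotations may
# use different names for the 63 features.
annotation = {
    "id": 56,
    "sentence": "If I can, I will work on it.",
    "label": 1,  # 1 = acceptable, 0 = unacceptable
    "features": {
        "subord_cond": 1,        # conditional clause ("If I can, ...")
        "ellipsis_anaphora": 1,  # VP ellipsis in the if-clause
        # ... all remaining features are 0 for this sentence
    },
}
```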

a.13.7 S-adjunct (Sentence-Level Adjuncts)

These are sentence-modifying adjuncts, including sentence-level adverbs and subordinate clauses.
Included:
- Sentence-level adverbs: Suddenly, there arrived two inspectors from the INS. (447)
- Subordinate clauses: The storm arrived while we ate lunch. (852)

a.14 Determiner

a.14.1 Quantifier

These are quantificational DPs, i.e. DPs whose determiner is a quantifier.
Included:
- Quantifiers: *Every student, and he wears socks, is a swinger. (118); We need another run to win. (769)
- Partitive: *Neither of students failed. (265)

a.14.2 Partitive

These are quantifiers that take PP arguments, and measure nouns. See Kim and Sells (2008, pp. 109-118).
Included:
- Quantifiers with PP arguments: *Neither of students failed. (265)
- Numerals: One of Korea’s most famous poets wrote these lines. (294)
- Measure nouns: I bought three quarts of wine and two of Clorox. (667)

a.14.3 NPI/FCI (Negative Polarity and Free Choice Items)

These are negative polarity items (any, ever, etc.) and free choice items (any). See Kadmon and Landman (1993).
Included:
- NPI: Everybody around here who ever buys anything on credit talks in his sleep. (122); I didn’t have a red cent. (350)
- FCI: Any owl hunts mice. (387)

a.14.4 Comparative

These are comparative constructions. See Culicover and Jackendoff (1999).
Included:
- Correlative: The angrier Mary got, the more she looked at pictures. (9)
- They may grow as high as bamboo. (337)
- I know you like the back of my hand. (775)

a.15 Violations

a.15.1 Sem Violation (Semantic Violations)

These are sentences that include a semantic violation, including type mismatches, violations of selectional restrictions, polarity violations, and definiteness violations.
Included:
- Violation of selectional restrictions: *many information was provided. (218); *It tries to leave the country. (275)
- Aspectual violations: *John is tall on several occasions. (540)
- Definiteness violations: *It is the problem that he is here. (1018)
- Polarity violations: Any man didn’t eat dinner. (388)

a.15.2 Infl/Agr violation (Inflection and Agreement Violations)

These are sentences that include a violation in inflectional morphology, including case, agreement, gender, or tense-aspect marking.
Included:
- Case: *Us love they. (46)
- Agreement: *Students studying English reads Conrad’s Heart of Darkness while at university. (262)
- Gender: *Sally kissed himself. (339)
- Tense/Aspect: *Kim alienated cats and beating his dog. (429)

a.15.3 Extra/Missing Word

These are sentences with a violation that can be identified with the presence or absence of a single word.
Included:
- Missing word: *John put under the bathtub. (247); *I noticed the. (788)
- Extra word: *Everyone hopes everyone to sleep. (467); *He can will go. (510)
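Given binary annotations of this kind, per-feature performance can be compared across models by restricting an evaluation metric such as MCC to the sentences bearing each feature. The snippet below is a minimal sketch under the assumption that the annotated development set is stored as a TSV with one 0/1 column per feature alongside gold labels and a model's predictions; the file name and column names are hypothetical.

```python
# Sketch: per-feature MCC over an annotated acceptability dev set.
# Assumes a TSV with columns "sentence", "label" (gold), "prediction"
# (model output), and one 0/1 column per annotated feature.
# File and column names are hypothetical.
import pandas as pd
from sklearn.metrics import matthews_corrcoef

df = pd.read_csv("cola_dev_annotated.tsv", sep="\t")

meta_cols = {"sentence", "label", "prediction"}
feature_cols = [c for c in df.columns if c not in meta_cols]

for feature in feature_cols:
    # Restrict evaluation to sentences marked for this feature.
    subset = df[df[feature] == 1]
    if subset.empty:
        continue
    mcc = matthews_corrcoef(subset["label"], subset["prediction"])
    print(f"{feature}\tn={len(subset)}\tMCC={mcc:.2f}")
```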