
Actionable Email Intent Modeling with Reparametrized RNNs

Emails in the workplace are often intentional calls to action for their recipients. We propose to annotate these emails for the actions their recipients will take. We argue that our action-based annotation approach is more scalable and theory-agnostic than traditional speech-act-based email intent annotation, while still carrying important semantic and pragmatic information. We show that our action-based annotation scheme achieves good inter-annotator agreement. We also show that we can leverage threaded messages from other domains, which exhibit comparable intents in their conversations, with domain-adaptive RAINBOW (Recurrently AttentIve Neural Bag-Of-Words) models. On a collection of datasets consisting of IRC, Reddit, and email threads, our reparametrized RNNs outperform common multitask/multidomain approaches on several speech-act-related tasks. We also experiment with a minimally supervised scenario of email recipient action classification, and find that the reparametrized RNNs learn a useful representation.





1 Introduction

Despite the emergence of many new communication tools in the workplace, email remains a major, if not the dominant, messaging platform in many corporate settings [Agema2015]. Helping people manage and act on their emails can make them more productive. Recently, Google’s system that suggests email replies has gained wide adoption [Kannan et al.2016]. We can imagine many other classes of assistance scenarios that could improve worker productivity. For example, consider a system capable of predicting your next action when you receive an email. The system could then offer assistance to accomplish that action, for example in the form of a quick reply, adding a task to your to-do list, or helping you take action in another system. To build and train such systems, email datasets are essential, but unfortunately public email datasets [Klimt and Yang2004, Oard et al.2015] are much smaller than the proprietary data used by Google; more importantly, they lack any direct information or annotation regarding the recipients’ actions.

In this paper, we design an annotation scheme for such actions and apply it to a corpus of publicly available emails. To overcome the data bottleneck for end-to-end training, we leverage other data and annotations that we hypothesize contain structures similar to email threads and recipient actions. We apply multitask and multidomain learning, which use domain- or task-invariant knowledge to improve performance on a specific task/domain [Caruana1997, Yang and Hospedales2014]. We show that these secondary domains and tasks, in combination with multitask and multidomain learning, can help our model discover invariant structures in conversations that improve a classifier on our primary data and task: email recipient action classification.

Previous work in the deep learning literature tackled multidomain/multitask learning by designing an encoder that encodes all data and the domain/task description into a shared representation space [Collobert and Weston2008, Glorot, Bordes, and Bengio2011, Ammar et al.2016, Yang, Salakhutdinov, and Cohen2017]. The overall model architecture generally remains unchanged from the single-domain single-task setting, but the learned representations are reparametrized to take into account knowledge from the additional data and task/domain information. In this work, we propose an alternative approach of model reparametrization. We train multiple parameter-sharing models across different domains and tasks jointly, without maintaining a shared encoded representation in the network. We show that reparametrized LSTMs consistently achieve better likelihood and overall accuracy on test data than common domain adaptation variants. We also show that the representation extracted from a network instantiated with the shared parameter weights performs well on a previously unseen task.

The contributions of this paper are:

First, we designed an annotation scheme for labeling actionable workplace emails, which, as we argue in section 2.2, is more amenable to an end-to-end training paradigm, and collected an annotated dataset. Second, we propose a family of reparametrized RNNs for both multitask and multidomain learning. Finally, we show that such models encode domain-invariant features and, in the absence of sufficient data for end-to-end learning, still provide useful features for scoping tasks in an unsupervised learning setting.

2 Data

2.1 The Avocado Dataset

In this study, all email messages we annotate and evaluate on are part of the Avocado dataset [Oard et al.2015], which consists of emails and attachments taken from 279 accounts of a defunct information technology company referred to as “Avocado”. (We considered other email corpora, such as the Enron corpus [Klimt and Yang2004], but chose the Avocado dataset because it is the largest and newest one publicly available.) Email threads are reconstructed from the recipients’ mailboxes. For the purpose of this paper, we only use complete (the thread contains all replies) and linear (every follow-up is a reply to the previous email) threads; summary statistics are in table 3.

2.2 Recipient Actions

Workplace email is known to be highly task-oriented [Khoussainov and Kushmerick2005, Corston-Oliver et al.2004]. As opposed to Internet chitchat, speaker intents and expected actions on email are in general very precise. We aim to annotate the actions, which differentiates our approach in a subtle but important way from previous work such as [Cohen, Carvalho, and Mitchell2004], which mostly focused on annotating emails for sender intents, modeled after illocutionary acts in speech act theory [Searle1976]. We believe that annotating recipient actions has the following advantages over annotating sender intents. First, action-based annotation is not tied to a particular speech act taxonomy. The design of such a taxonomy is highly dependent on the system’s use cases [Traum1999], and definitions of sender intent can be circular [Riezler2014]. Even within a single domain such as email, there have been several different sender intent taxonomies [Goldstein and Sabin2006]. A speech-act-agnostic scheme that focuses on the recipient’s action generalizes better across scenarios. Second, our annotation scheme has a lower risk of injected bias, because the annotation relies on expected (or even observed) actions performed in response to an email, rather than on the annotator’s intuition about the sender’s intent. Lastly, while in this paper we rely on annotators for these action annotations, many of our annotated actions translate into very specific actions on the computer. We therefore anticipate that intelligent user interfaces could be used to capture and remind users of such email actions, as in dredze2008.

Based on our findings in two pilot runs of email annotations among the authors, we propose the set of recipient actions listed in table 1, which fall in three broad categories:

Message sending

We identify that in many cases, the recipient is most likely to send out another email, either as a reply to the sender or to someone else. As listed in table 1, Reply-Yesno, Reply-Ack, Reply-Other, Investigate, Send-New-Email are actions that send out a new email, either on the same thread or a new one.

Software interaction

In our pilot study we find that some of the most likely recipient actions are interactions with office software, such as Setup-Appointment and Approve-Request.

Share content

On many occasions, the most likely actions are to share a document, either as an attachment or via other means. We have an umbrella action Share-Content to capture these actions.

2.3 Data Annotation

Action Description
Reply-Yesno Short yes/no reply to a question raised in the previous email
Reply-Ack Simple acknowledgements such as ‘got it’, ‘thank you.’
Reply-Other Reply to the thread based on information that is available without doing any additional investigation.
Investigate Look into some questions/problems to gather the necessary information and reply with that information.
Send-New-Email Write a new email that is not a reply to the current thread.
Setup-Appointment Set up appointments/cancel appointments.
Approve-Request Approve requests (typically from subordinates) through an external system such as an expense report system etc.
Share-Content Share content, as an attachment, a link in the email body, or a location on the network that is known to both the sender and recipients
Table 1: Set of possible recipient actions in our annotation scheme.

A subset of the preprocessed email threads described in section 2.1 are subsequently annotated. We ask each annotator to imagine that they are a recipient of threaded emails in a workplace environment. For each message, we ask the annotator to read through the previous messages in the thread and annotate it with the most likely action (from table 1) they would perform had they been the addressee of that message. If the most probable action is not defined in our list, we ask the annotators to annotate with an Other action.

A total of emails from distinct threads have been annotated by two paid and trained independent annotators. Cohen’s Kappa is for the two annotators. The authors arbitrated the disagreements. We include the distribution across the actions in table 1.

Dataset Message
IRC could somebody explain how i get the oss compatibility drivers to load automatically in ubuntu ?
IRC you should try these ones , apt src deb __URL__ unstable/
IRC Ah , cool . Thanks , I ’ll try that .
Reddit Does this really appeal to Sanders supporters ? Can one ( or more of you ) explain to me why ? Full disclosure : I do n’t pay ATM fees .
Table 2: Some example non-email messages that are likely to elicit actions related to those observed in email data. IRC chats are very task specific. They are mostly about obtaining technical help. Therefore, we observe many conversational turns that start with information requests, followed by delivery of that information. The Reddit dataset, on the other hand, is more diverse: the discussions in r/politics more or less pertain to comments on American public policies and politics. We rarely observe messages that require the recipient to take action; but there are requests and deliveries of information which can potentially help learn the underlying representation.
Dataset name (type) # of threads # of messages Average thread length Average message length
Avocado (Email)
r/politics (Reddit)
Ubuntu Dialog (IRC)
Table 3: Statistics of conversational data used in this paper. During preprocessing we truncate each message to words, including bos and eos symbols; and each thread to messages. The original Ubuntu dataset is much larger (with threads). We truncated it to match the Avocado dataset size for faster training and evaluation of our model.

2.4 Additional Domains

The annotations we collect are comparable in size to other speech-act-based annotation datasets. However, like other expert-annotated datasets, ours is not large enough for end-to-end training. Therefore, we aim to enrich our training with additional semantic and pragmatic information derived from other tasks and domains that lack annotations for expected actions. We consider data from the following additional domains for multidomain learning:


The Ubuntu Dialog Corpus is a curated collection of chat logs from Ubuntu’s Internet Relay Chat technical support channels [Lowe et al.2015].


Reddit is an internet discussion community consisting of several subreddits, each of which is more or less a discussion forum pertaining to a certain topic. We curate a dataset from the subreddit r/politics over two consecutive months. Each entry in our dataset consists of the post title, an optional post body, and an accompanying tree of comments. We collect linear threads by recursively sampling from the trees.

Messages from IRC and Reddit are less precise in terms of speaker intents, and our recipient action scheme is not directly applicable to them. However, previous studies on speech acts in Internet forums and chatrooms have shown that there are speech acts common to all these heterogeneous domains, e.g. information requests and deliveries [Arguello and Shaffer2015, Moldovan, Rus, and Graesser2011]. Some examples are listed in table 2. We hypothesize that more data from these domains will help the recognition of these speech acts, which in turn helps recognize the resulting recipient actions.

In all experiments in section 4, we use half of the dataset as training data, a quarter as the validation data and the remaining quarter as test data.

2.5 Metadata-Derived Prediction Tasks

The datasets introduced in sections 2.4 and 2.1 are largely unlabeled as far as recipient actions are concerned, except for the small subset of Avocado data that was manually annotated. However we can still extract useful information from their metadata, such as inferred end-of-thread markers or system-logged events that can help us formulate additional prediction tasks for a multitask learning setting (listed in table 4). We also use these multitask labels to evaluate our multitask/domain model in section 4.3.

Identifier Dataset Description
e-t Email end of an email thread
e-a Email this message has attachment(s)
i-t IRC turntaking
r-t Reddit end of a Reddit thread
Table 4: Description of additional prediction labels for multitask learning that we extracted from datasets introduced in section 2.

3 Modeling Threaded Messages

3.1 Notations

We model threaded messages as a two-layer hierarchy: at the lower layer, a message m is a list of words m = (w_1, …, w_|m|); in turn, a thread t is a list of messages t = (m_1, …, m_|t|). We assume each message thread comes from a specific domain, and therefore define a many-to-one mapping t ↦ d(t) ∈ D, where D is the set of all domains. We likewise define a many-to-one mapping from tasks to domains. For prediction we define the predictor of task τ as p_τ, which predicts sequential tags from a thread t on (a valid) task τ. We also define the real-valued task loss of task τ on thread t to be ℓ_τ(p_τ(t), y), where y is the ground truth.
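The two-layer hierarchy above can be mirrored directly in code. This is a minimal sketch with hypothetical container names (Message, Thread are not from the paper); it only illustrates the word/message/thread nesting and the many-to-one thread-to-domain mapping d(t).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical containers mirroring the notation: a message is a list of
# words, a thread is a list of messages, and each thread maps to exactly
# one domain (the many-to-one mapping d).
@dataclass
class Message:
    words: List[str]

@dataclass
class Thread:
    messages: List[Message]
    domain: str  # d(t): the domain this thread belongs to

thread = Thread(
    messages=[Message(["<bos>", "any", "update", "?", "<eos>"]),
              Message(["<bos>", "sending", "it", "now", "<eos>"])],
    domain="email",
)
```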

3.2 Definition of Multitask/domain Loss

In this paper, we define the multitask loss for a single (output, ground truth) pair as the sum of the task losses of all tasks under the same domain:

ℓ_multi(t, y) = Σ_{τ : d(τ) = d(t)} ℓ_τ(p_τ(t), y_τ),

and the aggregate loss of a domain d as the sum over its examples: L_d = Σ_{t : d(t) = d} ℓ_multi(t, y_t).

We also define the multidomain loss to be the sum of aggregate losses over all domains:

L = Σ_{d ∈ D} L_d.
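The nested summation above can be sketched in a few lines. This is an illustrative implementation under assumed data shapes (dictionaries of per-task predictions and gold labels; the function names are ours, not the paper's), using cross entropy as the per-task loss as in section 3.3.

```python
import math

def cross_entropy(pred, gold):
    # pred: predicted distribution over labels; gold: index of the true label
    return -math.log(pred[gold])

def multitask_loss(preds, golds, tasks):
    # Inner sum: task losses for one (output, ground truth) pair,
    # over all tasks defined on this example's domain.
    return sum(cross_entropy(preds[t], golds[t]) for t in tasks)

def multidomain_loss(examples_by_domain, tasks_of):
    # Outer sums: aggregate loss per domain, then over all domains.
    total = 0.0
    for domain, examples in examples_by_domain.items():
        total += sum(multitask_loss(p, g, tasks_of[domain]) for p, g in examples)
    return total
```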


3.3 The Recurrent AttentIve Neural Bag-Of-Words model (Rainbow)

We start with the Recurrent AttentIve Neural Bag-Of-Words model (Rainbow) as the baseline model of threaded messages. From a high-level view, Rainbow is a hierarchical neural network with two encoder layers: the lower-level encoder is a neural bag-of-words encoder that encodes each message m_i into its message embedding e_i; in turn, the upper-level encoder transforms the independently encoded message embeddings (e_1, …, e_|t|) into thread embeddings (h_1, …, h_|t|) via a learned recurrent neural network. (There is a slight abuse of notation here, since the network unrolls differently for threads of different lengths.) Rainbow has three main components: a message encoder, a thread encoder, and a predictor.

Message encoder.

We implement the message encoder as a bag-of-words model over the words in m_i. Motivated by the unigram features in previous work on email intent modeling, we also add an attentive pooling layer [Rush, Chopra, and Weston2015] to pick up important keywords. The attention-weighted average of the word embeddings then undergoes a nonlinear transformation:

e_i = g(Σ_j α_j v_j),

where g is a learned feedforward network, v_j is the word embedding of the j-th word, and α_j is the weight produced by a (learned) attentive network that judges how much each word contributes to the final representation e_i. (There may be concerns about the unordered nature of the neural bag-of-words (NBOW) model. However, it has been shown that with a deep enough network, an NBOW model is competitive with syntax-aware RNN models such as Tree-LSTMs [Tai, Socher, and Manning2015]. In preliminary experiments we did not find a substantial difference between an NBOW and an RNN encoder, and the NBOW architecture trains much faster.)
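The attentive bag-of-words encoder can be sketched numerically as follows. This is a minimal NumPy illustration under assumed shapes; the parameter names (attn_w for the attention scorer, ff_w for the feedforward transform) are ours, and in the actual model these would be learned.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    z = np.exp(x - x.max())
    return z / z.sum()

def encode_message(word_vecs, attn_w, ff_w):
    # word_vecs: (n_words, d); attn_w: (d,); ff_w: (d, d_out)
    scores = softmax(word_vecs @ attn_w)   # one attention weight per word
    pooled = scores @ word_vecs            # attention-weighted average
    return np.tanh(pooled @ ff_w)          # nonlinear transformation g

rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 8))             # 5 words, 8-dim embeddings
emb = encode_message(vecs, rng.normal(size=8), rng.normal(size=(8, 4)))
```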

Thread encoder and predictor.

The message embeddings are passed to the thread-level LSTM to produce thread embedding vectors: h_i = LSTM(e_1, …, e_i). Thread embeddings are then passed to the predictor layer. In this paper, the predictions are distributions over possible labels. We therefore define the predictor of task τ to be a feedforward network that maps thread embeddings to distributions over L_τ, the label set of task τ. The accompanying loss is naturally defined as the cross entropy between the predicted distribution and the empirical distribution:

ℓ_τ = −Σ_{l ∈ L_τ} p(l) log p̂(l).
3.4 Multi-Task RNN Reparametrization

(a) LSTM cell
(b) Parameter-sharing LSTM cell
Figure 1: A comparison between partial computation graphs of a single (vanilla) LSTM cell and our proposed parameter-sharing variants described in section 3.4. White circles are learned parameters. Dotted connections indicate parametrization. Parametrized and non-parametrized functions are indicated with blue and gray circles respectively. To model sequences from multiple domains, the conventional LSTM (depicted in fig. 1(a)) either shares a single set of parameters θ across all domains (the Tied setup) or shares no parameters at all (the Disjoint setup). In contrast, our parameter-sharing variant in fig. 1(b) models domain-invariant parameters with θ_s and domain-specific parameters with φ_d.

Rainbow is an extension of Deep Averaging Networks [Iyyer et al.2015] to threaded message modeling. It works well for tagging threaded messages for the messages’ properties, such as conversation-turn marking in online chats and end-of-thread detection in emails. However, in its current form, the model is trained to work on exactly one task. It also does not capture the shared dynamics of these different domains jointly when given out-of-domain data. In this section we describe a family of reparametrized recurrent neural networks that easily accommodates multi-domain multi-task learning settings.

In general, recurrent neural networks take a sequence of input data (x_1, …, x_n) and recurrently apply a nonlinear function to obtain a sequence of transformed representations (h_1, …, h_n): h_t = f_θ(h_{t−1}, x_t). Here we denote such a transformation by the function f, parametrized by the RNN parameters θ. For an LSTM model, θ can be formulated as the concatenated vector of the input, output, forget, and cell gate parameters: θ = [θ_i; θ_o; θ_f; θ_c]. In general, the goal of training an RNN is to find the optimal real-valued parameter vector θ* = argmin_θ ℓ(θ), for a given loss function ℓ.


In the context of multidomain learning, we parametrize the recurrence in a similar fashion: h_t = f_{θ_d}(h_{t−1}, x_t), where θ_d is the parameter vector used for threads of domain d. Here we are faced with two modeling choices (depicted in fig. 1(a)): we can either model every task Disjointly or with Tied parameters. The Disjoint approach learns a separate set of parameters θ_d per task/domain. Performance on a task is therefore little affected by data from other domains/tasks, except for the regularizing effect through the shared word embeddings.

On the other hand, the Tied approach ties the parameters of all domains to a single θ, which has been a popular choice for multitask/domain modeling: it has been found that the RNN often learns to encode a good shared representation when trained jointly on different tasks [Collobert et al.2011, Yang, Salakhutdinov, and Cohen2016], and the network also seems to generalize over different domains [Ragni et al.2016, Peng and Dredze2016]. However, this hinges on the assumption that either all domains are similar, or the network is capable enough to capture the dynamics of data from all domains at the same time.

In this paper we propose an alternative approach. Instead of having a single set of parameters for all domains, we reparametrize θ_d as a function of a shared component θ_s and a domain-specific component φ_d:

θ_d = h(θ_s, φ_d),

and our goal becomes minimizing the loss with respect to both θ_s and the φ_d. A comparison between the vanilla RNN and our proposed modification can be found in fig. 1. This reparametrization allows us to share parameters among networks trained on data from different domains through the shared component θ_s, while allowing the network to behave differently on data from each domain through the domain-specific parameters φ_d.

The design of the function h requires striking a balance between model flexibility and generalizability. In this paper we consider the following variants of h:

Additive (Add)

First we consider h to be a linear interpolation of a shared base θ_s and a domain-specific component φ_d:

θ_d = σ_d θ_s + φ_d,

where σ_d ∈ ℝ. In this formulation (Add) we learn a shared θ_s and additive domain-specific parameters φ_d for each domain d. We also learn σ_d for each domain d, which controls how much effect θ_s has on the final parameters.

Both Disjoint and Tied can be seen as degenerate cases of Add: we recover Disjoint when the shared component is a zero vector (θ_s = 0), and with σ_d = 1 and φ_d = 0 we have θ_d = θ_s, namely Tied.

Additive + Multiplicative (AddMul)

Add has no nonlinear interaction between θ_s and φ_d: they have independent effects on the composite θ_d. In AddMul we use two domain-specific components: an additive component φ_d and a multiplicative component ψ_d, which introduces nonlinearity without significantly increasing the parameter count:

θ_d = σ_d θ_s + φ_d + θ_s ⊙ ψ_d,

where ⊙ is the Hadamard product, and σ_d, φ_d, ψ_d are learned parameters as in the Add formulation.

Affine (Affine)

In this formulation the domain-specific components φ_d are treated as task embeddings. We apply a learned affine transformation to the task embeddings and add the shared component:

θ_d = W φ_d + θ_s,

where W is a learned parameter matrix.
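The three composition functions can be contrasted in a few lines of NumPy over flat parameter vectors. This is a sketch (vector sizes and variable names are ours); the in-block assertion checks the Tied degenerate case of Add described above.

```python
import numpy as np

def add(theta_s, phi_d, sigma_d):
    # Add: scaled shared base plus a domain-specific offset.
    return sigma_d * theta_s + phi_d

def add_mul(theta_s, phi_d, psi_d, sigma_d):
    # AddMul: extra Hadamard term couples shared and domain parameters.
    return sigma_d * theta_s + phi_d + theta_s * psi_d

def affine(theta_s, phi_d, W):
    # Affine: phi_d acts as a task embedding mapped up by a learned W.
    return theta_s + W @ phi_d

theta_s = np.ones(4)
# Tied is recovered when sigma_d = 1 and phi_d = 0:
assert np.allclose(add(theta_s, np.zeros(4), 1.0), theta_s)
```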

3.5 Optimization

We optimize the multidomain loss defined in section 3.2 with gradient descent methods. To update the parameters, we sample one thread from each domain and optimize the network parameters with the Adam optimizer [Kingma and Ba2014].
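The update scheme can be sketched as follows. This is an illustrative stand-in (the loss and update functions are placeholders, not the actual model): each step draws one thread per domain and applies one joint update on the summed loss.

```python
import random

def training_step(threads_by_domain, compute_loss, apply_update):
    # Sample one thread from each domain for this update.
    batch = [random.choice(threads) for threads in threads_by_domain.values()]
    # Sum the per-thread losses into one multidomain objective.
    loss = sum(compute_loss(t) for t in batch)
    # Hand the loss to the optimizer (e.g. one Adam step in practice).
    apply_update(loss)
    return loss
```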

4 Experiments

4.1 Evaluation Metrics

In this section we evaluate Rainbow and its multitask/multidomain variants on the datasets we introduced in section 2. We also apply our extracted thread embeddings on a real-world task setting of email action classification with impoverished resources.

Probabilistic models are usually evaluated on the log-likelihood of the test data. However, in our multidomain setting we have multiple datasets that differ in size and average sequence length. Therefore we evaluate our models on mean average cross entropy (MACE):

MACE(D) = (1/|D|) Σ_{t ∈ D} (1/|t|) Σ_i H(p_i, p̂_i),

where the p̂_i are the predictions made from the thread embeddings of t, following the definitions in section 3.3. MACE normalizes by both sequence length |t| and dataset size |D|: a model that ignores resource-poor tasks or short sequences tends to perform poorly under this metric. MACE can therefore be seen as a per-task (log) perplexity: a larger MACE value means the model performs worse on the dataset, and an oracle would obtain a MACE value of 0. The average of MACE scores also has the natural interpretation of the log of the geometric mean of likelihoods over different tasks/domains. In addition to MACE, we also evaluate on accuracy in table 6.
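The double normalization in MACE can be made concrete with a short sketch. This is an illustrative implementation under assumed inputs (per-message predicted distributions and gold label indices); it averages cross entropy within each thread first, then across threads, so long threads do not dominate.

```python
import numpy as np

def mace(threads):
    # threads: list of (pred_dists, gold_indices), one entry per thread.
    per_thread = []
    for preds, golds in threads:
        # Cross entropy of each message's prediction against its gold label.
        ce = [-np.log(p[g]) for p, g in zip(preds, golds)]
        per_thread.append(np.mean(ce))   # normalize by thread length
    return float(np.mean(per_thread))    # normalize by dataset size

# An oracle that puts all mass on the correct label scores 0.
perfect = [([np.array([1.0, 0.0])], [0])]
assert mace(perfect) == 0.0
```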

All experiments in section 4 are trained on the train splits. For the experiments in sections 4.2 and 4.3 we evaluate on the metadata-derived labels in table 4. After each epoch of training, the model is evaluated on the validation split to check whether performance has stopped improving; training terminates when no new gains are observed for two consecutive epochs.

4.2 Effectiveness of Rainbow: Ablation Studies

We evaluate Rainbow by comparing it, in the single-task setup, against two simpler variant architectures: one removes the recurrent thread encoder (-R), the other replaces the attentive pooling layer with an unweighted mean (-A). We evaluate the four configurations on the four labels listed in table 4 and report the averaged MACE numbers in table 5. We find that both attentive pooling and the recurrent network help, but the latter has a much more pronounced effect. Rainbow without the two additions (-R, -A) reduces to the vanilla Deep Averaging Network model, a neural baseline that has been shown to be competitive with other neural and non-neural models.

Configuration +R -R
Table 5: MACE values of the Rainbow ablation tests (lower is better). +/-R and +/-A indicate the presence/absence of the thread encoder and the attentive pooling layer, respectively.

4.3 Multidomain/task Experiments

Task E-T E-A I-T R-T Average
Aggregated Results
Table 6: Aggregated Multidomain/multitask results of tasks in table 4: bold indicates best average results over all models.

We compare our reparametrized models against the following feature-reparametrizing approaches:


MaLOPa: For each task τ, we concatenate the word embeddings with task embeddings; the task embeddings are trained along with the network and hopefully capture task-relevant information. This idea originated from the MaLOPa (MAny Language One PArser) parser [Ammar et al.2016].


Fenda: In this setting, each task has its own predictor and two message encoders, one shared and one specific to itself. The two encoder outputs are concatenated, linearly transformed, and fed into the predictor. This is an adaptation of the Fenda (Frustratingly Easy Neural Domain Adaptation) model of [Kim, Stratos, and Sarikaya2016], which in turn is a neural extension of the classic approach of daume2007.

We also compare them against the two baselines:


Disjoint: Each task has its own predictor, thread encoder, and message encoder.


Tied: Each task has its own predictor, while all tasks share the same thread encoder and message encoder. As we noted in section 3.4, it has been empirically found that such a model can learn a shared representation across tasks and domains [Glorot, Bordes, and Bengio2011].

We evaluate our proposed models, the feature-reparametrizing models, and the non-domain-adaptive baselines on the tasks listed in table 4 under the following multidomain/multitask transfer settings: (E), (E+I), (E+R), (I+R), (E+I+R), where E=Email, I=IRC, R=Reddit. Note that since only the emails have two meta features, E-A and E-T, (E) is our only multitask transfer setting. The results are in table 6. The differences between the models are small. We inspected the model outputs and found that they all suffer severely from the label bias problem: all four tasks have very unbalanced label distributions, and the networks learn to strongly favor the more frequent label. The label bias problem can potentially be addressed with a globally normalized model, which we leave as future work. Despite the small margins, we can see that both model- and feature-reparametrizing models outperform the baselines in terms of likelihood. Moreover, our reparametrized models consistently achieve higher likelihood than the baselines on test data in all transfer settings. In addition, Add and AddMul perform comparably to strong domain-adaptive models in terms of accuracy.

4.4 Recipient Action with Minimal Supervision

Table 7: Results of section 4.4. Add significantly outperforms Fenda, the best-performing domain-adaptive baseline model, under a paired t-test, and AddMul is borderline significant against Fenda. The differences between Add and AddMul and the other baseline models are also significant under a paired t-test. Hyperparameters are the regularization strength and the transfer setting.
 Setting E E+I+R E+R E+I
Table 8: Breakdown on different transfer settings.

We now turn to a task-based evaluation where we use our extracted thread embeddings on the task of predicting an email recipient’s next action. In particular, we focus on scenarios where we do not have a sizable amount of annotated data to train a neural network in an end-to-end fashion, and when we simply did not anticipate the task when we trained the model. This setting evaluates the network’s ability to generalize over multiple tasks and learn a good representation.

To be more specific, the setup is as follows: we use the trained models from section 4.3 to encode thread embeddings for the action-annotated emails of section 2. Subsequently we use these thread embeddings to train regularized logistic regression classifiers for the action labels. We compare them against classifiers trained with features extracted from the baselines Tied, Disjoint, MaLOPa, and Fenda. We also compare against doc2vec embeddings trained on the whole Avocado corpus (listed in table 7 as Doc2Vec).

Given the small size of the annotated data, we evaluate the models with nested cross validation (CV). In the outer layer, we randomly split the annotated emails into (train+dev)-test splits (120 splits, done thread-wise). In the inner layer, we use 7-fold CV on the (train+dev) split to find the best hyperparameters. The best hyperparameters are then used to train a classifier, which is subsequently evaluated on the test split of the outer-layer CV. We report the averages in table 7. Disjoint performs poorly on this task, since it has no built-in constraint to learn a shared representation. All shared-representation baselines (Tied, Fenda, MaLOPa) performed better than both Disjoint and Doc2Vec. Still, our reparametrized models compare favorably against the feature-reparametrizing baselines.
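The nested CV protocol can be sketched as follows. This is an illustrative outline (function names and the example split sizes are ours, not the paper's): an outer random (train+dev)/test split, then inner k-fold CV on (train+dev) for hyperparameter selection.

```python
import random

def outer_split(threads, test_frac, rng):
    # Outer layer: one random (train+dev)/test split over whole threads.
    shuffled = threads[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]   # (train+dev), test

def inner_folds(items, k=7):
    # Inner layer: k-fold CV on (train+dev) for hyperparameter search.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        dev = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, dev
```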

We perform another cross-validation evaluation over different transfer settings in table 8. While both the Reddit (E+R) and IRC (E+I) datasets help over email alone (E), the IRC dataset is much more helpful than Reddit. This resonates with our initial findings in section 2.4 that the IRC dataset is more similar to email. We note that all the scores are low. Nonetheless, we find it encouraging that out-of-domain data helps learn a better representation in this extremely resource-scarce setting.

5 Related Work

There has been a lot of work on multidomain/multitask learning with shared representations, as described in section 1. Our work is also closely related to work on email speech act modeling and recognition [Cohen, Carvalho, and Mitchell2004, Lampert et al.2008, Jeong, Lin, and Lee2009, De Felice and Deane2012]. The idea of model reparametrization for domain adaptation is abundant in the hierarchical Bayesian modeling literature, e.g. finkel2009 and eisenstein2011.

Within the deep learning literature, our work is also related to work on DNN reparametrization for multitask learning, such as spieckermann2014 and yang2016b. Our work shows that the reparametrization approach also works for domain adaptation. Finally, we note that ha2016 introduces an alternative and much more sophisticated reparametrization of RNNs. An interesting future direction is to follow this work by reparametrizing networks as hypernetworks that take a task embedding as input; in the terminology introduced in this paper, we would be feature-reparametrizing the hypernetwork, which in turn model-reparametrizes an RNN.

6 Conclusion

In this paper, we have introduced an email recipient action annotation scheme and a dataset annotated according to this scheme. By annotating the recipient action rather than the sender’s intent, our taxonomy is agnostic to specific speech act theories and arguably more suitable for training systems that suggest such actions. We have curated an annotated dataset, which achieved good inter-annotator agreement. We have also introduced a hierarchical threaded message model, Rainbow, to model such emails. To cope with the problem of data scarcity, we have introduced RNN reparametrization as an approach to domain adaptation and applied it to the problem of email recipient action modeling. It is competitive with common feature-reparametrized neural models when trained in an end-to-end fashion. We also show that although it is not explicitly designed to encode a shared representation across tasks and domains, it learns to generalize in a minimally supervised scenario. There are many possible future directions for this work. For example, with appropriate software, we could obtain more annotations automatically, and possibly learn the taxonomy along the way. Our reparametrization framework is also quite extensible: for instance, user-specific parameters could be learned for personalized models, as in li2016.