Sebastian Gehrmann

  • Visual Interaction with Deep Learning Models through Collaborative Semantic Inference

    Automation of tasks can have critical consequences when humans lose agency over decision processes. Deep learning models are particularly susceptible since current black-box approaches lack explainable reasoning. We argue that both the visual interface and model structure of deep learning systems need to take into account interaction design. We propose a framework of collaborative semantic inference (CSI) for the co-design of interactions and models to enable visual collaboration between humans and algorithms. The approach exposes the intermediate reasoning process of models, which allows semantic interactions with the visual metaphors of a problem, so that a user can both understand and control parts of the model reasoning process. We demonstrate the feasibility of CSI with a co-designed case study of a document summarization system.

    07/24/2019 ∙ by Sebastian Gehrmann, et al.

  • End-to-End Content and Plan Selection for Data-to-Text Generation

    Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG. This problem can be challenging when the form of the structured data varies between examples. This paper presents a survey of several extensions to sequence-to-sequence models to account for the latent content selection process, particularly variants of copy attention and coverage decoding. We further propose a training method based on diverse ensembling to encourage models to learn distinct sentence templates during training. An empirical evaluation of these techniques shows an increase in the quality of generated text across five automated metrics, as well as human evaluation.

    10/10/2018 ∙ by Sebastian Gehrmann, et al.

  • Comparing Rule-Based and Deep Learning Models for Patient Phenotyping

    Objective: We investigate whether deep learning techniques for natural language processing (NLP) can be used efficiently for patient phenotyping. Patient phenotyping is a classification task for determining whether a patient has a medical condition, and is a crucial part of secondary analysis of healthcare data. We assess the performance of deep learning algorithms and compare them with classical NLP approaches. Materials and Methods: We compare convolutional neural networks (CNNs), n-gram models, and approaches based on cTAKES that extract pre-defined medical concepts from clinical notes and use them to predict patient phenotypes. The performance is tested on 10 different phenotyping tasks using 1,610 discharge summaries extracted from the MIMIC-III database. Results: CNNs outperform other phenotyping algorithms in all 10 tasks. The average F1-score of our model is 76 (PPV of 83, and sensitivity of 71) with our model having an F1-score up to 37 points higher than alternative approaches. We additionally assess the interpretability of our model by presenting a method that extracts the most salient phrases for a particular prediction. Conclusion: We show that NLP methods based on deep learning improve the performance of patient phenotyping. Our CNN-based algorithm automatically learns the phrases associated with each patient phenotype. As such, it reduces the annotation complexity for clinical domain experts, who are normally required to develop task-specific annotation rules and identify relevant phrases. Our method performs well in terms of both performance and interpretability, which indicates that deep learning is an effective approach to patient phenotyping based on clinicians' notes.

    03/25/2017 ∙ by Sebastian Gehrmann, et al.

  • LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

    Recurrent neural networks, and in particular long short-term memory (LSTM) networks, are a remarkably effective tool for sequence modeling that learn a dense black-box hidden representation of their sequential input. Researchers interested in better understanding these models have studied the changes in hidden state representations over time and noticed some interpretable patterns but also significant noise. In this work, we present LSTMVIS, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics. The tool allows users to select a hypothesis input range to focus on local state changes, to match these state changes to similar patterns in a large data set, and to align these results with structural annotations from their domain. We show several use cases of the tool for analyzing specific hidden state properties on datasets containing nesting, phrase structure, and chord progressions, and demonstrate how the tool can be used to isolate patterns for further statistical analysis. We characterize the domain, the different stakeholders, and their goals and tasks.

    06/23/2016 ∙ by Hendrik Strobelt, et al.

  • Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models

    Neural sequence-to-sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work in a five-stage black-box process that involves encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction with a trained sequence-to-sequence model through each stage of the translation process. The aim is to identify which patterns have been learned and to detect model errors. We demonstrate the utility of our tool through several real-world large-scale sequence-to-sequence use cases.

    04/25/2018 ∙ by Hendrik Strobelt, et al.

  • Bottom-Up Abstractive Summarization

    Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but which can be poor at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpus. Furthermore, the content selector can be trained with as little as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain.

    08/31/2018 ∙ by Sebastian Gehrmann, et al.
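
The bottom-up attention step described in this abstract can be illustrated with a small sketch. It assumes that the content selector yields one selection probability per source token and that copy attention is simply masked and renormalized. The function name, the `threshold` parameter, and its default value are hypothetical and are not taken from the listing.

```python
def constrain_copy_attention(attention, selection_probs, threshold=0.5):
    """Zero out attention on tokens the selector considers unlikely, then renormalize.

    attention:       attention weights over source tokens (assumed to sum to 1)
    selection_probs: per-token selection probabilities from a content selector
    threshold:       hypothetical cut-off below which a token is masked
    """
    if len(attention) != len(selection_probs):
        raise ValueError("attention and selection_probs must have the same length")

    masked = [a if p >= threshold else 0.0
              for a, p in zip(attention, selection_probs)]
    total = sum(masked)
    if total == 0.0:
        # Nothing was selected: fall back to the unconstrained attention.
        return list(attention)
    return [m / total for m in masked]


constrained = constrain_copy_attention([0.5, 0.3, 0.2], [0.9, 0.1, 0.8])
```

In this example the second token falls below the threshold, so its weight is redistributed over the two selected tokens.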

  • Very Highly Skilled Individuals Do Not Choke Under Pressure: Evidence from Professional Darts

    Understanding and predicting how individuals perform in high-pressure situations is of importance in designing and managing workplaces, but also in other areas of society such as disaster management or professional sports. For simple effort tasks, an increase in the pressure experienced by an individual, e.g. due to incentive schemes in a workplace, will increase the effort put into the task and hence in most cases also the performance. For the more complex and usually harder to capture case of skill tasks, there exists a substantial body of literature that fairly consistently reports a choking phenomenon under pressure. However, we argue that many of the corresponding studies have crucial limitations, such as neglected interaction effects or insufficient numbers of observations to allow within-individual analysis. Here, we investigate performance under pressure in professional darts as a near-ideal setting with no direct interaction between players and a high number of observations per subject. We analyze almost one year of tournament data covering 23,192 dart throws, hence a data set that is very much larger than those used in most previous studies. Contrary to what would be expected given the evidence in favor of a choking phenomenon, we find strong evidence for an overall improved performance under pressure, for nearly all 83 players in the sample. These results could have important consequences for our understanding of how highly skilled individuals deal with high-pressure situations.

    09/20/2018 ∙ by Christian Deutscher, et al.

  • Improving Human Text Comprehension through Semi-Markov CRF-based Neural Section Title Generation

    Titles of short sections within long documents support readers by guiding their focus towards relevant passages and by providing anchor-points that help to understand the progression of the document. The positive effects of section titles are even more pronounced when measured on readers with less developed reading abilities, for example in communities with limited labeled text resources. We, therefore, aim to develop techniques to generate section titles in low-resource environments. In particular, we present an extractive pipeline for section title generation by first selecting the most salient sentence and then applying deletion-based compression. Our compression approach is based on a Semi-Markov Conditional Random Field that leverages unsupervised word-representations such as ELMo or BERT, eliminating the need for a complex encoder-decoder architecture. The results show that this approach leads to competitive performance with sequence-to-sequence models with high resources, while strongly outperforming them with low resources. In a human-subject study across subjects with varying reading abilities, we find that our section titles improve the speed of completing comprehension tasks while retaining similar accuracy.

    04/15/2019 ∙ by Sebastian Gehrmann, et al.

  • LSTM Networks Can Perform Dynamic Counting

    In this paper, we systematically assess the ability of standard recurrent networks to perform dynamic counting and to encode hierarchical representations. All the neural models in our experiments are designed to be small-sized networks both to prevent them from memorizing the training sets and to visualize and interpret their behaviour at test time. Our results demonstrate that the Long Short-Term Memory (LSTM) networks can learn to recognize the well-balanced parenthesis language (Dyck-1) and the shuffles of multiple Dyck-1 languages, each defined over different parenthesis-pairs, by emulating simple real-time k-counter machines. To the best of our knowledge, this work is the first study to introduce the shuffle languages to analyze the computational power of neural networks. We also show that a single-layer LSTM with only one hidden unit is practically sufficient for recognizing the Dyck-1 language. However, none of our recurrent networks was able to yield a good performance on the Dyck-2 language learning task, which requires a model to have a stack-like mechanism for recognition.

    06/09/2019 ∙ by Mirac Suzgun, et al.
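
For readers unfamiliar with the languages named in this abstract, the sketch below shows the kind of real-time counter machine that recognizes Dyck-1 and shuffles of several Dyck-1 languages. It is a hand-written reference recognizer, not the LSTM from the paper; the function names and the default bracket pairs are illustrative assumptions.

```python
def is_dyck1(s, open_sym="(", close_sym=")"):
    """Recognize Dyck-1 with a single counter: +1 on open, -1 on close."""
    depth = 0
    for ch in s:
        if ch == open_sym:
            depth += 1
        elif ch == close_sym:
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
        else:
            return False  # symbol outside the alphabet
    return depth == 0


def is_shuffle_of_dyck1(s, pairs=(("(", ")"), ("[", "]"))):
    """Recognize a shuffle of Dyck-1 languages with one independent counter per pair."""
    opens = {o: i for i, (o, _) in enumerate(pairs)}
    closes = {c: i for i, (_, c) in enumerate(pairs)}
    counters = [0] * len(pairs)
    for ch in s:
        if ch in opens:
            counters[opens[ch]] += 1
        elif ch in closes:
            i = closes[ch]
            counters[i] -= 1
            if counters[i] < 0:
                return False
        else:
            return False
    return all(c == 0 for c in counters)
```

The string `([)]` is accepted by the shuffle recognizer because each bracket type is balanced on its own, yet it is not a Dyck-2 string, since its brackets are not properly nested. Recognizing Dyck-2 requires remembering the order of open brackets, which counters alone cannot do.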

  • GLTR: Statistical Detection and Visualization of Generated Text

    The rapid improvement of language models has raised the specter of abuse of text generation systems. This progress motivates the development of simple methods for detecting generated text that can be used by and explained to non-experts. We develop GLTR, a tool to support humans in detecting whether a text was generated by a model. GLTR applies a suite of baseline statistical methods that can detect generation artifacts across common sampling schemes. In a human-subjects study, we show that the annotation scheme provided by GLTR improves the human detection-rate of fake text from 54% to 72% without any prior training. GLTR is open-source and publicly deployed, and has already been widely used to detect generated outputs.

    06/10/2019 ∙ by Sebastian Gehrmann, et al.
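
One simple statistic of the kind this abstract alludes to is the rank of each observed token under a language model's predicted distribution. The sketch below assumes that statistic and substitutes a toy bigram count model for a real neural language model. All function names are hypothetical, and none of this is GLTR's actual code.

```python
from collections import Counter, defaultdict


def train_bigram_counts(tokens):
    """Count which token follows which; a stand-in for a real language model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def token_ranks(tokens, counts):
    """Return the rank (1 = most likely) of each token given its predecessor.

    A token the toy model has never seen in that context gets rank None.
    """
    ranks = []
    for prev, actual in zip(tokens, tokens[1:]):
        ranked = [tok for tok, _ in counts.get(prev, Counter()).most_common()]
        ranks.append(ranked.index(actual) + 1 if actual in ranked else None)
    return ranks


model = train_bigram_counts("the cat sat on the mat the cat ran".split())
ranks = token_ranks("the cat sat".split(), model)
```

Text in which nearly every token has a very low rank is consistent with sampling from the head of a model's distribution, which is the kind of artifact a visualization can surface for a human reader.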

  • Evaluating an Automated Mediator for Joint Narratives in a Conflict Situation

    Joint narratives are often used in the context of reconciliation interventions for people in social conflict situations, which arise, for example, due to ethnic or religious differences. The interventions aim to encourage a change in attitudes of the participants towards each other. Typically, a human mediator is fundamental for achieving a successful intervention. In this work, we present an automated approach to support remote interactions between pairs of participants as they contribute to a shared story in their own language. A key component is an automated cognitive tutor that guides the participants through a controlled escalation/de-escalation process during the development of a joint narrative. We performed a controlled study comparing a trained human mediator to the automated mediator. The results demonstrate that an automated mediator, although simple at this stage, effectively supports interactions and helps to achieve positive outcomes comparable to those attained by the trained human mediator.

    06/27/2019 ∙ by Massimo Zancanaro, et al.
