A Workflow Manager for Complex NLP and Content Curation Pipelines

04/16/2020
by Julian Moreno-Schneider, et al.
DFKI GmbH

We present a workflow manager for the flexible creation and customisation of NLP processing pipelines. The workflow manager addresses challenges in interoperability across different NLP tasks and in hardware-based resource usage. Based on the four key principles of generality, flexibility, scalability and efficiency, we present the first version of the workflow manager, providing details on its custom definition language, its communication components and the general system architecture and setup. The system is currently being implemented; it is grounded in and motivated by real-world industry use cases in several innovation and transfer projects.


1 Introduction

The last decades have seen a significant increase in digital data. To allow humans to understand and interact with this data, Natural Language Processing (NLP) tools targeted at specific tasks, such as Named Entity Recognition, Text Summarisation or Question Answering, are under constant development and improvement, to be used together with other components in complex application scenarios. While several NLP tasks are still far from solved and others are increasingly mature, one of the next challenges is the combination of different task-specific services based on modern micro-service architectures and service deployment paradigms.

Chaining tools together by combining their output requires little more than basic interoperability regarding the annotation format used by the semantic enrichment services and individual NLP services. However, the notion of flexible workflows stretches, beyond annotation formats, to the flexible and efficient orchestration of NLP services. While a multitude of components and services is available, the next step, i. e., their management and integration into an infrastructural system, is not straightforward and proves challenging. This is problematic both for technology developers and users, as the whole is greater than the sum of its parts. Developers can add value to their tools by allowing the combination with other components. For users, the benefits of combining annotations obtained from NER with those obtained from coreference resolution, for example, are obvious. There have been several attempts, both commercial and open source, to address interoperability and service orchestration for scenarios that include the processing of document collections, achieving comparatively good results for specific use cases, tasks and domains (see Section 2 for an overview).

Recently, new opportunities have been generated by the popularity of containerisation technologies such as Docker (https://www.docker.com), which enable the deployment of services and tools independently of the environment in which they were developed. While integration benefits from this approach, as it enables the easy ingestion of services, the methodology comes with several challenges that need to be addressed, including, crucially, container management. This is not just about keeping services alive on different nodes, which can be done using tools such as Kubernetes (https://kubernetes.io) or Openshift (https://www.openshift.com). The key challenge remains the organisation and inter-connectivity of services in terms of their functionality, ensuring that they work together in an efficient and coordinated way.

The work presented in this paper is carried out under the umbrella of the QURATOR project (https://qurator.ai), in which a consortium of ten partners (ranging from research to industry) works on challenges encountered by the industry partners in their own specific sectors. The central use case addressed in the project is that of content curation in the areas of, among others, journalism, museum exhibitions and public archives [11, 1]. In QURATOR, we develop a platform that allows users to curate large amounts of heterogeneous multimedia content (including text, image, audio, video). The content is collected, converted, aggregated, summarised and eventually presented in a way that allows the user to produce, for example, an investigative journalism article on a contemporary subject, content for the catalogue of a museum exhibition, or a comprehensive description of the life of a public figure based on publicly available archive data on this person. To achieve this, we work with various combinations of different state-of-the-art NLP tools for NER, Sentiment Analysis, Text Summarisation, and several others, which we develop further and integrate into our platform. The interoperability and customisation of workflows, i. e., distributed processing pipelines, are a central technical challenge in the development of our platform.

The key contribution of this paper is the presentation of a novel workflow management system aimed at the sector-specific content curation processes mentioned above. Technically, the approach focuses on the management of containerised services and tools. The system design is optimised and aligned with regard to four different dimensions or requirements: (i) generality, to work with a diverse range of containerised services and tools, independent of the (programming) language or framework they are written in; (ii) flexibility, to allow services or tools – which may be running on different machines – to connect with one another in any order (to the extent that this makes sense, semantically); (iii) scalability, to allow the inclusion of additional services and tools; and (iv) efficiency, by avoiding unnecessary overhead in data storage as well as processing time.

The rest of the paper is structured as follows. Section 2 describes approaches similar to ours that support the specification of workflows for processing document collections. Section 3 provides an overview of the proposed system and lists requirements regarding the services to be included in workflows. Section 4 presents the workflow specification language. Section 5 outlines the general architecture and the following subsections provide more detail on individual components. Finally, Section 6 concludes the article and sketches directions for future work.

2 Related Work

The orchestration and operationalisation of the processing of large amounts of content through a series of tools has been studied and tested in the field of NLP (and others) from many different angles for decades. There is a sizable number of tools, systems, frameworks and initiatives that address the issue, but their off-the-shelf applicability to concrete use cases and heterogeneous sets of services remains an enormous challenge.

One of the most well-known industry-driven workflow definition languages is Business Process Model and Notation (BPMN, and its re-definition BPMN V2.0) [9]. Many tools support BPMN, some of them open source (Comidor, Processmaker, Activiti, Bonita BPM or Camunda), others commercial (Signavio Workflow Accelerator, Comindware, Flokzu or Bizagi). There are also other business process management systems, not all of which are based on BPMN, such as WorkflowGen (https://www.workflowgen.com), ezFlow (http://www.ezflow.it), Pipefy (https://www.pipefy.com), Avaza (https://www.avaza.com) or Proces.io (http://proces.io). Their main disadvantage with regard to our use case is that they primarily aim at modelling actual business processes at companies, including support for representing human-centric tasks (i. e., tasks that foresee human interaction). This focus deviates from our use case, in which a human user interacts with the content, but not necessarily with other humans.

Another class of relevant software comprises frameworks for container management, focusing on parallelisation management, scalability and clustering. Examples are Kubernetes, Openshift, Rancher (https://rancher.com) and Openstack (https://www.openstack.org). We use Kubernetes for cluster management. However, because cluster management does not cover (NLP) task orchestration or address interoperability, our workflow manager goes beyond the typical Kubernetes use case.

On the other hand, there are numerous frameworks and tool kits that focus more on workflow management and the flexible definition of processing pipelines (and less on the technical, hardware-related implementation, like Kubernetes, Openshift and Rancher). Examples are Apache Kafka (https://kafka.apache.org), a distributed streaming platform; Apache Commons Pipeline (http://commons.apache.org/sandbox/commons-pipeline/); Apache NIFI (https://nifi.apache.org); Apache Airflow (https://airflow.apache.org); Kylo (https://kylo.io); and Apache Taverna (https://taverna.incubator.apache.org). With our workflow manager, we attempt to cover these workflow-focused features, but, crucially, combine them with the more technical details of cluster management and scalability.

Specifically targeted at NLP, some popular systems are GATE [2] and UIMA [3] and, more recently (but covering a narrower range of tasks), SpaCy (https://spacy.io). While for some of these the data representation format is based on a standard format (GATE, for example, supports exporting data in XML), we attempt to extend beyond this and use the NLP Interchange Format (NIF) [4]. Using NIF ensures interoperability across different NLP tasks while at the same time addressing storage and scalability needs. Since NIF is based on RDF triples, the resulting annotations can be included in a triple store to allow for efficient storage and querying. In addition, the above-mentioned systems are designed to run on single machines. Our workflow manager is designed to combine output from different micro-services that address different NLP tasks, potentially running on different machines. In addition to the above, CLARIN [5] provides an infrastructure for natural language research data and tools. Its focus, however, is on sharing resources and not on building NLP pipelines or workflows. A more exhaustive and complete overview of related work can be found in [10].

3 System Overview

The objective of the QURATOR project is to facilitate the execution of complex tasks in the area of content curation. The human experts performing these tasks typically have limited technical skills and are expected to analyse, aggregate, summarise and re-arrange the information contained in the content collections they work with. The Curation Workflow Manager aims to support these users by allowing them to flexibly and intuitively define just the workflow they need. Ultimately, the aim is to make this as intuitive as a single call to a single system. The single system is the Workflow Manager, and the single call is the request to process the document collection using a specific workflow. The workflow includes all the needed services (i. e., which services, such as NER, summarisation, topic modelling or clustering, to include, and which parameters, such as language or domain, to set). The order of the services, and which of them can be parallelised, can be specified, as well as which data needs to be stored internally (for immediate processing) or externally. Afterwards, the processed content collection is presented in a GUI featuring the relevant data visualisation components, given the original document collection and the results of the individual semantic enrichment processes that have run.

While from a user’s perspective this high-level description may sound similar to comparable systems like GATE (Section 2), the following provides an idea of the intended deployment scale and ambition of the workflow manager. Though developed in the context of the QURATOR project, we also plan to implement the workflow manager in the technical platform architecture developed in the European Language Grid (ELG) project (https://www.european-language-grid.eu). The main objective of ELG is to create the primary platform for Language Technology in Europe [10]. Release 1 alpha of the European Language Grid platform was made available in March 2020 and provides access to more than 150 services, including NER, concept identification, dependency parsing, ASR and TTS.

3.1 Service Requirements

Since we want to allow for the inclusion of as many different services as possible in a workflow, yet at the same time have to ensure that they work together seamlessly, we specified a few core dimensions along which to classify services, to establish whether or not they can be included. First (i), we check whether a service is dockerised or not. Then (ii), we check the execution procedure, i. e., is it a fully automated service, or is human intervention or interaction included, or even at its very core (as, for example, in annotation editors)? Furthermore, we check (iii) where the service is located, i. e., is it included in the Docker cluster or is it hosted externally? Finally (iv), we check how the service communicates, i. e., is it accessible through a REST API or a command-line interface? If a given service is (i) dockerised (or otherwise containerised), (ii) does not need human intervention, (iii) is hosted inside our Docker cluster and (iv) has a REST API through which it can be accessed, we conclude that the service can be included in our workflows.

As part of future work, we will investigate if and how these core dimensions can be included in the metadata scheme that governs all metadata entries for all services, in order to automate this process as much as possible [6].
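As a minimal illustration of how such an eligibility check could eventually be automated, consider the following Python sketch; the descriptor fields are our own naming for the four dimensions above, not part of any existing metadata scheme:

from dataclasses import dataclass

# Hypothetical service descriptor; the field names are illustrative only.
@dataclass
class ServiceDescriptor:
    containerised: bool    # (i) dockerised or otherwise containerised
    fully_automated: bool  # (ii) runs without human intervention
    inside_cluster: bool   # (iii) hosted inside our Docker cluster
    has_rest_api: bool     # (iv) accessible through a REST API

def eligible_for_workflow(service: ServiceDescriptor) -> bool:
    # A service is included only if all four checks pass.
    return (service.containerised and service.fully_automated
            and service.inside_cluster and service.has_rest_api)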

4 Curation Workflow Definition Language

To facilitate the definition of workflows for users with limited technical knowledge (i. e., little to no programming experience), we opted for the widely used JSON format to specify workflows, considering that the specification of actual workflows will be carried out using a corresponding graphical user interface.

We specified a JSON-based Curation Workflow Definition Language (CWDL). It currently supports the inclusion of services with REST API access [13] (i. e., services must be accessible through HTTP calls) and allows users to specify whether these services should be executed in a synchronous or asynchronous way. Whether services execute sequentially or in parallel can also be specified.

A workflow relies on three main components: controllers, tasks and templates. The controllers element relates to a service to be included. This element specifies basic identity information (controllerName, serviceId, controllerId), queue information (nameInput{Normal|Priority}) and connection information (connection) for the micro-service it calls. The connection element contains the information needed to communicate with the service (via REST API), including method, endpoint_url, parameters, headers and body. Listing 1 shows an example.

The next element, task, connects a processing step to a controller; messages between the two are sent through the messaging control system. The taskId and controllerId fields contain the identifying information of the two. Listing 2 illustrates this with an example.

{
 "controllerName": "NER Controller",
 "serviceId": "NER",
 "controllerId": "NERController",
 "queues": {
  "nameInputNormal": "NER_input_normal",
  "nameInputPriority": "NER_input_prio"
 },
 "connection": {
  "connection_type": "restapi",
  "method": "POST",
  "endpoint_url": "http://<host>/path/",
  "parameters": [
   {"name": "language", "type": "parameter",
    "default_value": "en", "required": true},
   {"name": "models", "type": "parameter",
    "default_value": "model_1;model_2", "required": true},
   ...],
  "body": {
   "content": "documentContentNIF"
  },
  "headers": [
   {"name": "Accept", "type": "header",
    "default_value": "text/turtle", "required": true},
   {"name": "Content-Type", "type": "header",
    "default_value": "text/turtle", "required": true}
  ]
 }
}
Listing 1: Example of a Controller definition that connects to an external REST API service.
{
 "taskName": "NER Task",
 "taskId": "NERTask",
 "controllerId": "NER",
 "component_type": "rabbitmqrestapi"
}
Listing 2: Example of a Task definition.

The third element, template, specifies which micro-services are included in the workflow. Basic identification information is specified in workflowTemplateId. The different micro-services included in the template are contained in tasks. Inside this element, the following information is specified:

  1. ParallelTask executes multiple tasks in parallel.

  2. SequentialTask executes tasks sequentially.

  3. split passes the input information to every output.

  4. waitcombiner waits until all connected inputs have finished, then combines their results and proceeds (a sketch of this pattern follows Listing 3).

Listing 3 shows an example of the template element.

{
 "workflowTemplateName": "GLK",
 "workflowTemplateId": "ML_GLK",
 "workflowTemplateDescription": "...",
 "tasks": [
  {
   "order": 1,
   "taskId": "ParallelTask",
   "features": {
    "input": {"component_type": "split"},
    "output": {"component_type": "waitcombiner"},
    "tasks": [
     {"order": 1, "taskId": "NERTask"},
     {"order": 2, "taskId": "GEOTask"},
     ...]
   }
  },
  ...]
}
Listing 3: Example of a workflow definition.
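To make the ParallelTask semantics concrete, the following Python sketch shows one way the split/waitcombiner pattern could be realised; in the actual system, tasks are dispatched via RabbitMQ rather than threads, and the combine step here is a placeholder for merging the subtasks' annotations:

from concurrent.futures import ThreadPoolExecutor

def combine(results):
    # Placeholder waitcombiner step: merge the partial results.
    merged = {}
    for result in results:
        merged.update(result)
    return merged

def run_parallel_task(subtasks, document):
    # split: pass the same input document to every subtask.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, document) for task in subtasks]
        # waitcombiner: block until all subtasks have finished.
        results = [future.result() for future in futures]
    return combine(results)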

We plan to improve this basic scheme and will make it compliant with BPMN V2.0 in its next iteration.

5 Curation Workflow Manager Architecture

The JSON-based workflow definition language described in Section 4 outlines how to instruct the workflow manager to perform complex tasks. In this section, we outline how these task definitions are translated into processes and procedures by explaining the workflow manager architecture. Our previous work includes a generic workflow manager for curation technologies [1, 11] and two indicative descriptions of an initial prototype of a workflow manager that we conceptualised based on use cases in the legal domain [7, 8]. Figure 1 illustrates the architecture; its individual components are described in the following subsections.

Figure 1: Architecture of the Curation Workflow Manager (CWM)

5.1 Workflow Execution Engine

The core component of the workflow manager is the Workflow Execution Engine. This component manages workflows, from their definition, through the management of their execution, to the final results that are produced. In the CWM, a workflow is composed of the three components described in Section 4 and a workflow execution. More specifically:

  • A controller is a component whose main purpose is to communicate with a service (see Section 5.2).

  • A task can be anything that has to do with taking input in a certain format and producing output. This can be enriching text through NLP components, converting data into the format required by specific other tasks, combining information from different upstream tasks, or deciding which task to perform next, depending on parameters that are either set in the configuration or that result from upstream processing.

  • A template is an abstract definition of a workflow composed of a combination of tasks. It is, in the literal sense of the word, a preset for a collection of tasks that together form a logical processing pipeline. In the object-oriented programming paradigm, it would be the equivalent of a class, i. e., the definition of an object (and the objects would be tasks).

  • A workflow execution is an instance of a workflow template, i. e., a complete workflow created with specific task objects. The workflow execution would be equivalent to an instantiated object in the object-oriented analogy (see the sketch below).
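A minimal Python sketch of this class/object analogy, reusing the identifiers from Listing 3 (the class and field names are our own illustration):

from dataclasses import dataclass, field
from typing import List

@dataclass
class WorkflowTemplate:
    # The "class" in the analogy: an abstract arrangement of tasks.
    template_id: str
    task_ids: List[str]

@dataclass
class WorkflowExecution:
    # The "object" in the analogy: one concrete, running instance.
    template: WorkflowTemplate
    document_ids: List[str] = field(default_factory=list)
    status: str = "pending"

glk_template = WorkflowTemplate("ML_GLK", ["NERTask", "GEOTask"])
execution = WorkflowExecution(glk_template, document_ids=["doc-001"])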

5.2 Controllers

Every service is required to be accessible through a REST API and must allow both the sending and receiving of task-specific messages. Because the services are developed independently, and their behaviour may change with new versions, the way to communicate with them may change as well. We therefore introduce the concept of a proxy element between the messaging control system (for which we use RabbitMQ, see Section 5.3) and the service. This proxy element is the controller. We attempt to maximise flexibility by updating the controller whenever the service changes, so that the rest of the communication chain can remain untouched.

In the current implementation, the controller connects to RabbitMQ and waits to receive messages. Whenever a message is received, the controller processes its contents and generates an HTTP request for the corresponding service. Depending on whether the service in question executes synchronously or asynchronously, the controller either waits for the response or checks back to collect it later, and subsequently communicates the result.
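As an illustration, a synchronous controller along these lines could be sketched in Python with the pika and requests libraries; the queue name, endpoint and parameters are taken from Listing 1, while the reply routing via the reply_to property is our simplification, not necessarily how the engine addresses replies:

import pika
import requests

SERVICE_URL = "http://<host>/path/"  # endpoint_url from the controller definition

def on_message(channel, method, properties, body):
    # Forward the NIF payload from the queue to the service's REST API.
    response = requests.post(
        SERVICE_URL,
        params={"language": "en", "models": "model_1;model_2"},
        headers={"Accept": "text/turtle", "Content-Type": "text/turtle"},
        data=body)
    # Send the enriched result back towards the workflow execution engine.
    channel.basic_publish(exchange="", routing_key=properties.reply_to,
                          body=response.content)
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="NER_input_normal", durable=True)
channel.basic_consume(queue="NER_input_normal", on_message_callback=on_message)
channel.start_consuming()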

5.3 Communication Module

The communication module, based on the message control system RabbitMQ, allows the exchange of information between the different workflow components, or with components external to the workflow. As mentioned above, our system requires individual services to be accessible via REST API, and supports both synchronous and asynchronous execution of services.

This communication comprises both information relating to the tasks to be performed and the results or output of the tasks themselves. We use RabbitMQ because it allows larger message contents than some of its competitors (Apache Kafka, for example). RabbitMQ handles the communication between the workflow execution engine and the services (through controllers). Both the workflow execution engine and the controllers send messages to and receive messages from RabbitMQ during the execution of a workflow. The workflow execution engine sends a message to every service (through its proxy element, the controller) to execute a processing step. After finishing the processing, the service sends a new message with the result (again, through the controller) back to the workflow execution engine.

The CWM is designed to cover complex curation tasks, which can potentially include large files. Since we want to prevent such large files from using, and thereby blocking, resources needed by other processes, we implemented a priority feature in the RabbitMQ queues. We reserve high priority for smaller documents and/or processes that take place in (semi-)real-time, while larger documents or more complex tasks use normal/low priority for offline processing.
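Given the two queues declared per controller in Listing 1 (NER_input_normal and NER_input_prio), the routing decision could be sketched as follows; the size threshold is an assumption for illustration, not a value from the system:

import pika

SIZE_THRESHOLD = 1_000_000  # bytes; illustrative cut-off only

def enqueue_document(channel, payload: bytes):
    # Smaller, (semi-)real-time jobs go to the priority queue;
    # large documents and offline jobs go to the normal queue.
    queue = "NER_input_prio" if len(payload) < SIZE_THRESHOLD else "NER_input_normal"
    channel.basic_publish(exchange="", routing_key=queue, body=payload)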

5.4 Information Exchange Format

Since interoperability is a key feature of the CWM, we must settle on a shared annotation format which all (or at least most) micro-services can work with and further augment in the case of pipeline processing; this is, first and foremost, relevant if tasks rely on the output of upstream tasks, or if their output serves as input to downstream tasks. Instead of defining our own format, we use the NLP Interchange Format (NIF) [4]. NIF includes an ontology that defines the way in which documents are annotated, with strong roots in the Linked Data paradigm. This allows for easy referencing of external knowledge bases (such as Wikidata) in the annotations on a document. NIF can be serialised in XML-like (RDF/XML), JSON-like (JSON-LD) or N3/Turtle (RDF triple) formats. This serialised format is what is communicated as input or output of specific services. An example NIF (Turtle) document with annotated named entities is shown in Listing 4 in the Appendix.
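Because NIF is plain RDF, downstream components can consume it with any RDF library; the following sketch uses rdflib to read annotations like those in Listing 4 (the input file name is a placeholder):

from rdflib import Graph, Namespace

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")

g = Graph()
g.parse("annotated_document.ttl", format="turtle")  # placeholder input file

# List every annotated entity mention with its character offsets.
for mention in g.subjects(NIF.anchorOf, None):
    print(g.value(mention, NIF.anchorOf),
          g.value(mention, NIF.beginIndex),
          g.value(mention, NIF.endIndex))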

5.5 Access Control

Access control for the various API endpoints is handled by a dedicated module, which specifies which operations are allowed on the endpoints of the different components, i. e., how a workflow may be modified.

This module defines 12 methods that allow a user to (i) initialise and stop the CWM, (ii) view, create, modify and delete the elements necessary to define workflows (i. e., tasks, controllers, templates and workflow executions), (iii) execute a specific workflow, and (iv) obtain the result of a workflow. An overview is provided in Figure 2.

Figure 2: REST APIs

In addition to the above-mentioned functionalities, this module also handles security by allowing only users included in a pre-defined list to access the functionalities listed in Figure 2. We are currently working on more detailed user management by implementing user profiles, allowing certain users to access only certain procedures. This improvement will be included in a future version of the workflow manager.

6 Conclusions and Future Work

We present an approach to connecting services and tools developed on different platforms and in different environments, making them work together by means of a Curation Workflow Manager. The tool is built around the key principles of generality, flexibility, scalability and efficiency. It allows the combination of different tools, i. e., containerised micro-services, in the wider area of NLP, Information Retrieval, Question Answering and Knowledge Management (triple stores), and uses a shared annotation format (NIF) throughout, addressing, respectively, the generality and flexibility principles. Our main motivation for developing the workflow manager, which comes with its own JSON-based definition language, was to address – under the umbrella of a larger Curation Technology platform – interoperability challenges and hardware-based resource-sharing and -handling issues in one go, addressing, respectively, the scalability and efficiency principles.

The CWM is meant to process large documents but is, as of now, restricted to text documents. As part of future work, we will also include the processing of multimedia files (images, audio, video); the Curation Workflow Manager’s design will be revised and extended accordingly. Furthermore, we plan to evaluate the workflow manager in a real-world use case provided by one of the partners in the QURATOR project. Additionally, we plan to integrate the CWM into the ELG platform in the medium to long term [10, 6, 12]. We are currently working on extensions to the workflow definition language; its next iteration will be compliant with the standardised Business Process Model and Notation, increasing the sustainability and adaptability of our approach. Finally, we are considering the development of a visual editor (i. e., a GUI) to define and modify workflows, inspired by the GUI offered by Camunda (https://camunda.com/products/modeler/).

The source code of the Curation Workflow Manager is available on GitLab: https://gitlab.com/qurator-platform/dfki/curationworkflowmanager

Acknowledgements

The work presented in this paper has received funding from the German Federal Ministry of Education and Research (BMBF) through the project QURATOR (Wachstumskern no. 03WKDA1A) as well as from the European Union’s Horizon 2020 research and innovation programme under grant agreements no. 825627 (European Language Grid) and no. 780602 (Lynx).

7 Bibliographical References


  • [1] P. Bourgonje, J. Moreno-Schneider, J. Nehring, G. Rehm, F. Sasaki, and A. Srivastava (2016-06) Towards a Platform for Curation Technologies: Enriching Text Collections with a Semantic-Web Layer. In The Semantic Web, H. Sack, G. Rizzo, N. Steinmetz, D. Mladenić, S. Auer, and C. Lange (Eds.), Lecture Notes in Computer Science, pp. 65–68. Note: ESWC 2016 Satellite Events. Heraklion, Crete, Greece, May 29 – June 2, 2016 Revised Selected Papers Cited by: §1, §5.
  • [2] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, N. Aswani, I. Roberts, G. Gorrell, A. Funk, A. Roberts, D. Damljanovic, T. Heitz, M. A. Greenwood, H. Saggion, J. Petrak, Y. Li, and W. Peters (2011) Text Processing with GATE (Version 6). External Links: ISBN 978-0956599315, Link Cited by: §2.
  • [3] D. Ferrucci and A. Lally (2004-09) UIMA: An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment. Natural Language Engineering 10 (3-4), pp. 327–348. External Links: Document Cited by: §2.
  • [4] S. Hellmann, J. Lehmann, S. Auer, and M. Brümmer (2013) Integrating NLP using Linked Data. In 12th International Semantic Web Conference, Note: 21-25 October Cited by: §2, §5.4.
  • [5] E. Hinrichs and S. Krauwer (2014-05) The CLARIN Research Infrastructure: Resources and Tools for eHumanities Scholars. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, pp. 1525–1531. External Links: Link Cited by: §2.
  • [6] P. Labropoulou, K. Gkirtzou, M. Gavriilidou, M. Deligiannis, D. Galanis, S. Piperidis, G. Rehm, M. Berger, V. Mapelli, M. Rigault, V. Arranz, K. Choukri, G. Backfried, J. M. G. Pérez, and A. Garcia-Silva (2020-05) Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), N. Calzolari, F. Béchet, P. Blache, C. Cieri, K. Choukri, T. Declerck, H. Isahara, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, and S. Piperidis (Eds.), Marseille, France. Note: Accepted for publication. Cited by: §3.1, §6.
  • [7] J. Moreno-Schneider and G. Rehm (2018-05) Curation Technologies for the Construction and Utilisation of Legal Knowledge Graphs. In Proceedings of the LREC 2018 Workshop on Language Resources and Technologies for the Legal Knowledge Graph, G. Rehm, V. Rodríguez-Doncel, and J. Moreno-Schneider (Eds.), Miyazaki, Japan, pp. 23–29. Note: 12 May 2018 Cited by: §5.
  • [8] J. Moreno-Schneider and G. Rehm (2018-05) Towards a Workflow Manager for Curation Technologies in the Legal Domain. In Proceedings of the LREC 2018 Workshop on Language Resources and Technologies for the Legal Knowledge Graph, G. Rehm, V. Rodríguez-Doncel, and J. Moreno-Schneider (Eds.), Miyazaki, Japan, pp. 30–35. Note: 12 May 2018 Cited by: §5.
  • [9] Object Management Group (OMG) (2011) Business Process Model and Notation (BPMN), Version 2.0. Cited by: §2.
  • [10] G. Rehm, M. Berger, E. Elsholz, S. Hegele, F. Kintzel, K. Marheinecke, S. Piperidis, M. Deligiannis, D. Galanis, K. Gkirtzou, P. Labropoulou, K. Bontcheva, D. Jones, I. Roberts, J. Hajic, J. Hamrlová, L. Kacena, K. Choukri, V. Arranz, A. Vasiljevs, O. Anvari, A. Lagzdins, J. Melnika, G. Backfried, E. Dikici, M. Janosik, K. Prinz, C. Prinz, S. Stampler, D. Thomas-Aniola, J. M. G. Pérez, A. G. Silva, C. Berrío, U. Germann, S. Renals, and O. Klejch (2020-05) European Language Grid: An Overview. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), N. Calzolari, F. Béchet, P. Blache, C. Cieri, K. Choukri, T. Declerck, H. Isahara, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, and S. Piperidis (Eds.). Note: Accepted for publication. Cited by: §2, §3, §6.
  • [11] G. Rehm, P. Bourgonje, S. Hegele, F. Kintzel, J. M. Schneider, M. Ostendorff, K. Zaczynska, A. Berger, S. Grill, S. Räuchle, J. Rauenbusch, L. Rutenburg, A. Schmidt, M. Wild, H. Hoffmann, J. Fink, S. Schulz, J. Seva, J. Quantz, J. Böttger, J. Matthey, R. Fricke, J. Thomsen, A. Paschke, J. A. Qundus, T. Hoppe, N. Karam, F. Weichhardt, C. Fillies, C. Neudecker, M. Gerber, K. Labusch, V. Rezanezhad, R. Schaefer, D. Zellhöfer, D. Siewert, P. Bunk, L. Pintscher, E. Aleynikova, and F. Heine (2020-02) QURATOR: Innovative Technologies for Content and Data Curation. In Proceedings of QURATOR 2020 – The conference for intelligent content solutions, A. Paschke, C. Neudecker, G. Rehm, J. A. Qundus, and L. Pintscher (Eds.), Berlin, Germany. Note: CEUR Workshop Proceedings, Volume 2535. 20/21 January 2020 Cited by: §1, §5.
  • [12] G. Rehm, D. Galanis, P. Labropoulou, S. Piperidis, M. Welß, R. Usbeck, J. Köhler, M. Deligiannis, K. Gkirtzou, J. Fischer, C. Chiarcos, N. Feldhus, J. Moreno-Schneider, F. Kintzel, E. Montiel, V. R. Doncel, J. P. McCrae, D. Laqua, I. P. Theile, C. Dittmar, K. Bontcheva, I. Roberts, A. Vasiljevs, and A. Lagzdins (2020-05) Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability. In Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020, co-located with LREC 2020), G. Rehm, K. Bontcheva, K. Choukri, J. Hajic, S. Piperidis, and A. Vasiljevs (Eds.), Marseille, France. Note: 16 May 2020. Accepted for publication. Cited by: §6.
  • [13] L. Richardson, M. Amundsen, and S. Ruby (2013) RESTful Web APIs. O’Reilly Media, Inc. External Links: ISBN 1449358063, 9781449358068 Cited by: §4.

Appendix

@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

<http://dkt.dfki.de/documents/#char=0,25>
 a                      nif:RFC5147String, nif:String, nif:Context ;
 nif:beginIndex         "0"^^xsd:nonNegativeInteger ;
 nif:endIndex           "25"^^xsd:nonNegativeInteger ;
 nif:isString           "Monteux was born in Paris"^^xsd:string .

<http://dkt.dfki.de/documents/#char=20,25>
 a                      nif:RFC5147String, nif:String ;
 nif:anchorOf           "Paris"^^xsd:string ;
 nif:beginIndex         "20"^^xsd:nonNegativeInteger ;
 nif:endIndex           "25"^^xsd:nonNegativeInteger ;
 nif:entity             <http://dkt.dfki.de/ontologies/nif#LOC> ;
 nif:referenceContext   <http://dkt.dfki.de/documents/#char=0,25> ;
 itsrdf:taIdentRef      <http://www.geonames.org/2988507> .

<http://dkt.dfki.de/documents/#char=0,7>
 a                      nif:RFC5147String, nif:String ;
 nif:anchorOf           "Monteux"^^xsd:string ;
 nif:beginIndex         "0"^^xsd:nonNegativeInteger ;
 nif:endIndex           "7"^^xsd:nonNegativeInteger ;
 nif:entity             <http://dkt.dfki.de/ontologies/nif#PER> ;
 nif:referenceContext   <http://dkt.dfki.de/documents/#char=0,25> ;
 itsrdf:taIdentRef      <http://d-nb.info/gnd/122700198> .
Listing 4: An example NIF document.