Veröffentlichungen

Während meiner Promotionstätigkeit an der FH Aachen habe ich einige Veröffentlichungen auf Konferenzen getätigt. Hier finden Sie die Liste meiner Veröffentlichungen mit den dazugehörigen Links. Sie können die Liste auch als Bibtex-Dokument herunterladen.

SE2013 Aachen, Deutschland

Software in the City: Visual Guidance Through Large Scale Software Projects

Die SE2013 fand in Aachen statt. Dort habe ich die Ergebnisse aus meiner Masterarbeit bei Microsoft vorgestellt.

Autoren: Marc Schreiber Stefan Hirtbach Bodo Kraft Andreas Steinmetzler

Publisher: Gesellschaft für Informatik (GI)

Erschienen in: Software Engineering 2013: Fachtagung des GI-Fachbereichs Softwaretechnik

ISBN: 978-3-88579-607-7

Erscheinungsjahr: 2013

Seite: 213–224

Zusammenfassung

The size of software projects at Microsoft are constantly increasing. This leads to the problem that developers and managers at Microsoft have trouble to comprehend and overview their own projects in detail. Regarding that there are some research projects at Microsoft with the goal to facilitate analyses on software projects. Those projects provide databases with metadata of the development process which developers, managers, and researchers can use. As an example, the data can be used for recommendation systems and bug analyses.

In the research field of visualization software there are a lot of approaches that try to visualize large software projects. One approach which seems to reach that goal is the visualization of software with real life metaphors. This paper combines existing research projects at Microsoft with a visualization approach that uses the city metaphor. The goal is to guide managers and developers in their day to day development decisions and to improve the comprehension of their software project.

Link

ICICS2014 Irbid, Jordanien

Using Continuous Integration to Organize and Monitor the Annotation Process of Domain Specific Corpora

Die ICICS2014 fand in Irbid in Jordanien an der Jordanien University of Science and Technology statt. Dort habe ich erste Ergebnisse meiner Arbeit aus dem Projekt ETL Quadrat gezeigt. Es war eine interessante Reise in eine arabische Kultur.

Autoren: Marc Schreiber Kai Barkschat Bodo Kraft

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Erschienen in: 5th International Conference on Information and Communication Systems (ICICS)

ISBN: 978-1-4799-3022-7

Erscheinungsjahr: 2014

Seite: 1–6

Zusammenfassung

Applications in the World Wide Web aggregatevast amounts of information from different data sources. The aggregation process is often implemented with Extract, Transform and Load (ETL) processes. Usually ETL processes require information for aggregation available in structured formats, e. g. XML or JSON. In many cases the information is provided in natural language text which makes the application of ETL processes impractical.

Due to the fact that information is provided in natural language, Information Extraction (IE) systems have been evolved. They make use of Natural Language Processing (NLP) tools to derive meaning from natural language text. State-of-the-art NLP tools apply Machine Learning methods. These NLP tools perform on newspapers with good quality, but they drop accuracy in other domains.

However, to improve the quality for IE systems in specific domains often NLP tools are trained on domain specific text which is a time consuming process. This paper introduces an approach using a Continuous Integration pipeline for organizing and monitoring the annotation process on domain specific corpora.

Link

NLP2015 Sydney, Australien

Quick Pad Tagger: An Efficient Graphical User Interface for Building Annotated Corpora with Multiple Annotation Layers

Die NLP2015 fand in Sydney statt. Dort habe ich weitere Ergebnisse meiner Arbeit aus dem Projekt ETL Quadrat gezeigt. Natürlich durfte ein Trip durch die Blue Mountains nicht fehlen.

Autoren: Marc Schreiber Kai Barkschat Bodo Kraft Albert Zündorf

Publisher: Academy and Industry Research Collaboration Center

Erschienen in: Computer Science & Information Technology

ISBN: 978-1-921987-32-8 ISSN: 0975-3826

Erscheinungsjahr: 2015

Seite: 131–143

Zusammenfassung

More and more domain specific applications in the internet make use of Natural Language Processing (NLP) tools (e.g. Information Extraction systems). The output quality of these applications relies on the output quality of the used NLP tools. Often, the quality can be increased by annotating a domain specific corpus. However, annotating a corpus is a time consuming and exhaustive task. To reduce the annotation time we present a custom Graphical User Interface for different annotation layers.

Link

SER&IP 2016 Austin, Texas, USA

Cost-efficient Quality Assurance of Natural Language Processing Tools Through Continuous Monitoring with Continuous Integration

Die SER&IP 2016 fand in Austin, Texas statt, die in Kombination mit der ICSE 2016 veranstaltet wurde. Es war eine große Software-Engineering-Konferenz, die mich auf viele neue Ideen gebracht hat und die mir die Möglichkeit bot, die amerakanische Lebensweise zu erleben.

Autoren: Marc Schreiber Bodo Kraft Albert Zündorf

Publisher: Association for Computing Machinery (ACM)

Erschienen in: Proceedings of the 3rd International Workshop on Software Engineering Research and Industrial Practice

ISBN: 978-1-4503-4170-7

Erscheinungsjahr: 2016

Seite: 46–52

Zusammenfassung

More and more modern applications make use of natural language data, e. g. Information Extraction (IE) or Question Answering (QA) systems. Those application require preprocessing through Natural Language Processing (NLP) pipelines, and the output quality of these applications depends on the output quality of NLP pipelines. If NLP pipelines are applied in different domains, the output quality decreases and the application requires domain specific NLP training to improve the output quality.

Adapting NLP tools to specific domains is a time-consuming and expensive task, inducing two key questions: a) how many documents need to be annotated to reach good output quality and b) what NLP tools build the best performing NLP pipeline? In this paper we demonstrate a monitoring system based on principles of Continuous Integration which addresses those questions and guides IE or QA application developers to build high quality NLP pipelines in a cost-efficient way. This monitoring system is based on many common tools, used in many software engineering projects.

Link

Aachen, Deutschland

Mit Maximum-Entropie das Parsing natürlicher Sprache erlernen

Da weder der Lehrplan vom Studiengang Scientific Programming noch vom Studiengang Technomathematik die theoretische Informatik enthält, erhielt ich die Auflage im Rahmen meiner Promotion durch ein Seminar die Kenntnisse in theoretischer Informatik zu erlangen. Dazu verfasste ich diesen Aufsatz.

Autoren: Marc Schreiber

Publisher: FH Aachen

Erscheinungsjahr: 2016

Zusammenfassung

Für die Verarbeitung von natürlicher Sprache ist ein wichtiger Zwischenschritt das Parsing, bei dem für Sätze der natürlichen Sprache Ableitungsbäume bestimmt werden. Dieses Verfahren ist vergleichbar zum Parsen formaler Sprachen, wie z. B. das Parsen eines Quelltextes. Die Parsing-Methoden der formalen Sprachen, z. B. Bottom-up-Parser, können nicht auf das Parsen der natürlichen Sprache übertragen werden, da keine Formalisierung der natürlichen Sprachen existiert.

In den ersten Programmen, die natürliche Sprache verarbeiten, wurde versucht die natürliche Sprache mit festen Regelmengen zu verarbeiten. Dieser Ansatz stieß jedoch schnell an seine Grenzen, da die Regelmenge nicht vollständig sowie nicht minimal ist und wegen der benötigten Menge an Regeln schwer zu verwalten ist. Die Korpuslinguistik bot die Möglichkeit, die Regelmenge durch Supervised-Machine-Learning-Verfahren abzulösen.

Teil der Korpuslinguistik ist es, große Textkorpora zu erstellen und diese mit sprachlichen Strukturen zu annotieren. Zu diesen Strukturen gehören sowohl die Wortarten als auch die Ableitungsbäume der Sätze. Vorteil dieser Methodik ist es, dass repräsentative Daten zur Verfügung stehen. Diese Daten werden genutzt, um mit Supervised-Machine-Learning-Verfahren die Gesetzmäßigkeiten der natürliche Sprachen zu erlernen. Das Maximum-Entropie-Verfahren ist ein Supervised-Machine-Learning-Verfahren, das genutzt wird, um natürliche Sprache zu erlernen. Ratnaparkhi nutzt Maximum-Entropie, um Ableitungsbäume für Sätze der natürlichen Sprache zu erlernen. Dieses Verfahren macht es möglich, die natürliche Sprache (abgebildet als Σ∗) trotz einer fehlenden formalen Grammatik zu parsen.

Link

SER&IP 2017 Buenos Aires, Argentina

Metrics Driven Research Collaboration: Focusing on Common Project Goals Continuously

Ein zweites Mal zur ICSE! Die SER&IP 2017 fand in Buenos Aires in Kombination der ICSE 2017 statt. Es gab wieder viele interessante Vorträge rund um das Thema Softwareentwicklung zu hören und es machte viel Spaß nach den Vorträgen ein sehr grünes Buenos Aires zu entdecken.

Autoren: Marc Schreiber Bodo Kraft Albert Zündorf

Publisher: IEEE Press

Erschienen in: Proceedings of the 4th International Workshop on Software Engineering Research and Industrial Practice

ISBN: 978-1-5386-2797-6

Erscheinungsjahr: 2017

Seite: 41–47

Zusammenfassung

Research collaborations provide opportunities for both practitioners and researchers: practitioners need solutions for difficult business challenges and researchers are looking for hard problems to solve and publish. Nevertheless, research collaborations carry the risk that practitioners focus on quick solutions too much and that researchers tackle theoretical problems, resulting in products which do not fulfill the project requirements.

In this paper we introduce an approach extending the ideas of agile and lean software development. It helps practitioners and researchers keep track of their common research collaboration goal: a scientifically enriched software product which fulfills the needs of the practitioner's business model.

This approach gives first-class status to application-oriented metrics that measure progress and success of a research collaboration continuously. Those metrics are derived from the collaboration requirements and help to focus on a commonly defined goal.

An appropriate tool set evaluates and visualizes those metrics with minimal effort, and all participants will be pushed to focus on their tasks with appropriate effort. Thus project status, challenges and progress are transparent to all research collaboration members at any time.

Link

IPIN 2017 Sapporo, Japan

Multi-pedestrian tracking by moving Bluetooth-LE beacons and stationary receivers

Oliver brachtete im Rahmen seiner Masterarbeit, die ich betreut habe, wie man Personen mittels Bluetooth-Beacons tracken kann. Seine Ergebnisse haben wir in einem Paper für die IPIN 2017 zusammengefasst, das er in Sapporo vorgestellt hat.

Autoren: Oliver Schmidts Maik Boltes Bodo Kraft Marc Schreiber

Erschienen in: 2017 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 18-21 September 2017, Sapporo, Japan

Erscheinungsjahr: 2017

Seite: 1–4

Zusammenfassung

In this paper we propose an approach for tracking multiple pedestrians with head mounted Bluetooth low energy (LE) beacons in experiments for pedestrian dynamics. To simplify the setup and decrease the costs we invert the common setup for localization with stationary installed Bluetooth beacons for tracking smartphones. Our approach leads to multiple stationary installed receivers and moving Bluetooth beacons attached to peoples’ head. Thus we develop a common architecture setup for both scenarios where the independent positioning solver remains untouched even if the scenarios differ. We use fingerprinting based on stochastic regression for locating individuals in sub areas of rooms instead of determining their exact position.

Link

NAACL 2018 New Orleans, Louisiana

NLP Lean Programming Framework: Developing NLP Applications More Effectively

Die NAACL 2018 fand in New Orleans statt und ich durfte mein Open-Source-Framework NLPf, das die Entwicklung von NLP-Anwendungen vereinfachen und vereinheitlichen soll, dort der wissenschaftlichen Gemeinde präsentieren.

Autoren: Marc Schreiber Bodo Kraft Albert Zündorf

Publisher: Association for Computational Linguistics

Erschienen in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Erscheinungsjahr: 2018

Seite: 1–5

Zusammenfassung

This paper presents NLP Lean Programming framework (NLPf), a new framework for creating custom natural language processing (NLP) models and pipelines by utilizing common software development build systems. This approach allows developers to train and integrate domain-specific NLP pipelines into their applications seamlessly. Additionally, NLPf provides an annotation tool which improves the annotation process significantly by providing a well-designed GUI and sophisticated way of using input devices. Due to NLPf’s properties developers and domain experts are able to build domain-specific NLP applications more efficiently. NLPf is Open-source software and available at https://gitlab.com/schrieveslaach/NLPf.

Link

SER&IP 2018 Gothenburg, Sweden

Continuously evaluated research projects in collaborative decoupled environments

Autoren: Oliver Schmidts Bodo Kraft Marc Schreiber Albert Zündorf

Erschienen in: 2018 ACM/IEEE 5th International Workshop on Software Engineering Research and Industrial PracticePractice

Erscheinungsjahr: 2018

Seite: 1–9

Zusammenfassung

Often, research results from collaboration projects are not transferred into productive environments even though approaches are proven to work in demonstration prototypes. These demonstration prototypes are usually too fragile and error-prone to be transferred easily into productive environments. A lot of additional work is required.

Inspired by the idea of an incremental delivery process, we introduce an architecture pattern, which combines the approach of Metrics Driven Research Collaboration with microservices for the ease of integration. It enables keeping track of project goals over the course of the collaboration while every party may focus on their expert skills: researchers may focus on complex algorithms, practitioners may focus on their business goals.

Through the simplified integration (intermediate) research results can be introduced into a productive environment which enables getting an early user feedback and allows for the early evaluation of different approaches. The practitioners' business model benefits throughout the full project duration.

Link

Towards Effective Natural Language Application Development

Meine Dissertation habe ich im Mai 2019 in Kassel vor meinem Doktorvätern Albert Zündorf und Bodo Kraft sowie der Prüfungskommission verteidigt. Nach einigen Jahren Arbeit ist es endlich soweit gewesen und mir wurde der Doktortitel verliehen.

Autoren: Marc Schreiber

Publisher: Universität Kassel

Erscheinungsjahr: 2019

Zusammenfassung

There is a current trend that more and more computer programs analyze written or spoken natural language. For example, DVAs, IE systems, machine translation systems, and many other types of programs process natural language in order to solve specific use cases when interacting with humans via natural language. Amazon, Google, and Mycroft AI are just some of the companies that have produced DVAs capable of interacting with humans via voice. Such NLP applications use techniques from computer science and artificial intelligence to address their use cases. Additionally, many companies have begun to evaluate the capacity of NLP applications to improve their business processes, for instance by automatically processing customer requests received via e-mail.

The development of NLP applications requires years of experience in computer science, artificial intelligence, ML, linguistics, and similar disciplines. Due to this requirement, development is exclusively available to computer science experts with many years of experience in computer science and artificial intelligence. Years of training and experience are therefore required in order to develop an NLP application capable of, for instance, automatically processing customer e-mails. However, the demand for NLP applications continues to grow, while the quantity of such computer science experts remains limited. Due to this growing demand, companies must be able to develop such applications using in-house developers without years of training or Ph. D. in computer science and artificial intelligence.

Based on this limitation, this thesis identifies the main obstacles encountered by developers without many years of experience in computer science when creating NLP applications. These obstacles are identified through a research project named ETL Quadrat, which aims at building an IE system for gathering EC data from human-readable documents. The development of the IE system is hindered by a number of obstacles.

Link

PREvant (Preview Servant): Composing Microservices into Reviewable and Testable Applications

Autoren: Marc Schreiber

Publisher: Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik

Erschienen in: Joint Post-proceedings of the First and Second International Conference on Microservices (Microservices 2017/2019)

ISBN: 978-3-95977-137-5 ISSN: 2190-6807

Erscheinungsjahr: 2020

Seite: 5:1–5:16

Zusammenfassung

This paper introduces PREvant (preview servant), a software tool which provides a simple RESTful API for deploying and composing containerized microservices as reviewable applications. PREvant’s API serves as a connector between continuous delivery pipelines of microservices and the infrastructure that hosts the applications. Based on the REST API and a web interface developers and domain experts at aixigo AG developed quality assurance workflows that help to increase and maintain high microservice quality.

Link