SE2013 Aachen, Germany
Software in the City: Visual Guidance Through Large Scale Software Projects
Authors: Marc Schreiber, Stefan Hirtbach, Bodo Kraft, Andreas Steinmetzler
Publisher: Gesellschaft für Informatik (GI)
Published in: Software Engineering 2013: Fachtagung des GI-Fachbereichs Softwaretechnik
The size of software projects at Microsoft is constantly increasing. As a result, developers and managers at Microsoft struggle to comprehend and keep an overview of their own projects in detail. To address this, several research projects at Microsoft aim to facilitate analyses of software projects. These projects provide databases with metadata about the development process which developers, managers, and researchers can use. For example, the data can feed recommendation systems and bug analyses.
In the research field of software visualization there are many approaches that try to visualize large software projects. One approach that seems to reach this goal is the visualization of software with real-life metaphors. This paper combines existing research projects at Microsoft with a visualization approach that uses the city metaphor. The goal is to guide managers and developers in their day-to-day development decisions and to improve the comprehension of their software project.
ICICS2014 Irbid, Jordan
Using Continuous Integration to Organize and Monitor the Annotation Process of Domain Specific Corpora
Authors: Marc Schreiber, Kai Barkschat, Bodo Kraft
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Published in: 5th International Conference on Information and Communication Systems (ICICS)
Applications in the World Wide Web aggregate vast amounts of information from different data sources. The aggregation process is often implemented with Extract, Transform and Load (ETL) processes. Usually, ETL processes require the information to be available in structured formats, e.g. XML or JSON. In many cases, however, the information is provided as natural language text, which makes the application of ETL processes impractical.
Because information is provided in natural language, Information Extraction (IE) systems have evolved. They make use of Natural Language Processing (NLP) tools to derive meaning from natural language text. State-of-the-art NLP tools apply Machine Learning methods. These NLP tools perform well on newspaper text, but their accuracy drops in other domains.
To improve the quality of IE systems in specific domains, NLP tools are often trained on domain-specific text, which is a time-consuming process. This paper introduces an approach that uses a Continuous Integration pipeline for organizing and monitoring the annotation process of domain-specific corpora.
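The abstract does not spell out concrete checks; as an illustration only, such a CI pipeline could fail a build when inter-annotator agreement on the corpus drops below a quality gate. A minimal sketch (the entity labels, toy data, and threshold are hypothetical, not taken from the paper):

```python
# Hypothetical CI check on an annotated corpus: compute inter-annotator
# agreement (Cohen's kappa) and fail the build if it falls below a threshold.
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labeling the same tokens."""
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    # Observed agreement: fraction of tokens both annotators labeled identically.
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two annotators labeling the same eight tokens (made-up labels and data).
annotator_1 = ["DRUG", "O", "O", "DOSE", "O", "DRUG", "O", "O"]
annotator_2 = ["DRUG", "O", "DOSE", "DOSE", "O", "DRUG", "O", "O"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
if kappa < 0.6:  # threshold is an assumption, not from the paper
    raise SystemExit("annotation quality gate failed")
```

Running such a check on every corpus revision gives the monitoring described above: the build status itself tells the team when annotation guidelines need to be clarified.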
NLP2015 Sydney, Australia
Quick Pad Tagger: An Efficient Graphical User Interface for Building Annotated Corpora with Multiple Annotation Layers
Authors: Marc Schreiber, Kai Barkschat, Bodo Kraft, Albert Zündorf
Publisher: Academy and Industry Research Collaboration Center
Published in: Computer Science & Information Technology
ISBN: 978-1-921987-32-8 ISSN: 0975-3826
More and more domain-specific applications on the internet make use of Natural Language Processing (NLP) tools (e.g. Information Extraction systems). The output quality of these applications relies on the output quality of the NLP tools used. Often, the quality can be increased by annotating a domain-specific corpus. However, annotating a corpus is a time-consuming and exhausting task. To reduce the annotation time, we present a custom Graphical User Interface for different annotation layers.
SER&IP 2016 Austin, Texas, USA
Cost-efficient Quality Assurance of Natural Language Processing Tools Through Continuous Monitoring with Continuous Integration
Authors: Marc Schreiber, Bodo Kraft, Albert Zündorf
Publisher: Association for Computing Machinery (ACM)
Published in: Proceedings of the 3rd International Workshop on Software Engineering Research and Industrial Practice
More and more modern applications make use of natural language data, e.g. Information Extraction (IE) or Question Answering (QA) systems. These applications require preprocessing through Natural Language Processing (NLP) pipelines, and their output quality depends on the output quality of the NLP pipelines. If NLP pipelines are applied to different domains, the output quality decreases and the application requires domain-specific NLP training to improve it.
Adapting NLP tools to specific domains is a time-consuming and expensive task, raising two key questions: a) how many documents need to be annotated to reach good output quality, and b) which NLP tools build the best-performing NLP pipeline? In this paper we demonstrate a monitoring system based on principles of Continuous Integration which addresses these questions and guides developers of IE or QA applications to build high-quality NLP pipelines in a cost-efficient way. This monitoring system builds on common tools used in many software engineering projects.
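One way such a monitoring system could address question a) is to record an F1 learning curve after each annotation round, so developers see when additional annotated documents stop paying off. A toy sketch, not the paper's actual tooling (entity IDs, corpus sizes, and scores are made up):

```python
# Illustrative sketch: evaluate the NLP pipeline after training on growing
# slices of the corpus and record an F1 learning curve as a CI metric.
def f1_score(gold, predicted):
    """Entity-level F1 between gold and predicted entity sets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Gold vs. predicted entities after training on 50/100/200 documents (toy data).
corpus_slices = {
    50:  (["e1", "e2", "e3", "e4"], ["e1", "e5"]),
    100: (["e1", "e2", "e3", "e4"], ["e1", "e2", "e6"]),
    200: (["e1", "e2", "e3", "e4"], ["e1", "e2", "e3"]),
}

learning_curve = {n: f1_score(gold, pred) for n, (gold, pred) in corpus_slices.items()}
for n_docs, f1 in learning_curve.items():
    print(f"{n_docs:4d} annotated docs -> F1 = {f1:.2f}")
```

Plotting this curve on every CI run makes the trade-off between annotation cost and pipeline quality visible to the whole team.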
Mit Maximum-Entropie das Parsing natürlicher Sprache erlernen
Authors: Marc Schreiber
Publisher: FH Aachen
An important intermediate step in processing natural language is parsing, which determines derivation trees for sentences of a natural language. This procedure is comparable to parsing formal languages, e.g. parsing source code. The parsing methods of formal languages, such as bottom-up parsers, cannot be transferred to parsing natural language, because no formalization of natural languages exists.
The first programs that processed natural language attempted to do so with fixed sets of rules. This approach quickly reached its limits, because the rule set is neither complete nor minimal, and the sheer number of rules required makes it hard to maintain. Corpus linguistics offered the possibility of replacing the rule set with supervised machine learning methods.
Part of corpus linguistics is the creation of large text corpora annotated with linguistic structures. These structures include both part-of-speech tags and the derivation trees of sentences. The advantage of this methodology is that representative data becomes available. This data is used to learn the regularities of natural languages with supervised machine learning methods. Maximum entropy is one such supervised machine learning method used to learn natural language. Ratnaparkhi uses maximum entropy to learn derivation trees for sentences of natural language. This technique makes it possible to parse natural language (modeled as Σ∗) despite the lack of a formal grammar.
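For context (this is the standard log-linear form of a conditional maximum-entropy model, not an equation quoted from the thesis): the model scores each parser decision $a$ in context $b$ through weighted binary features $f_j$,

```latex
p(a \mid b) = \frac{1}{Z(b)} \exp\!\Big( \sum_{j} \lambda_j f_j(a, b) \Big),
\qquad
Z(b) = \sum_{a'} \exp\!\Big( \sum_{j} \lambda_j f_j(a', b) \Big)
```

where the weights $\lambda_j$ are chosen to maximize the likelihood of the annotated corpus, and $Z(b)$ normalizes over all decisions $a'$ available in context $b$.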
SER&IP 2017 Buenos Aires, Argentina
Metrics Driven Research Collaboration: Focusing on Common Project Goals Continuously
Authors: Marc Schreiber, Bodo Kraft, Albert Zündorf
Publisher: IEEE Press
Published in: Proceedings of the 4th International Workshop on Software Engineering Research and Industrial Practice
Research collaborations provide opportunities for both practitioners and researchers: practitioners need solutions for difficult business challenges, and researchers are looking for hard problems to solve and publish. Nevertheless, research collaborations carry the risk that practitioners focus too much on quick solutions and that researchers tackle overly theoretical problems, resulting in products which do not fulfill the project requirements.
In this paper we introduce an approach extending the ideas of agile and lean software development. It helps practitioners and researchers keep track of their common research collaboration goal: a scientifically enriched software product which fulfills the needs of the practitioner's business model.
This approach gives first-class status to application-oriented metrics that measure progress and success of a research collaboration continuously. Those metrics are derived from the collaboration requirements and help to focus on a commonly defined goal.
An appropriate tool set evaluates and visualizes those metrics with minimal effort, nudging all participants to focus on their tasks with appropriate effort. Thus, project status, challenges, and progress are transparent to all research collaboration members at any time.
IPIN 2017 Sapporo, Japan
Multi-pedestrian tracking by moving Bluetooth-LE beacons and stationary receivers
Authors: Oliver Schmidts, Maik Boltes, Bodo Kraft, Marc Schreiber
Published in: 2017 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 18-21 September 2017, Sapporo, Japan
In this paper we propose an approach for tracking multiple pedestrians with head-mounted Bluetooth Low Energy (LE) beacons in experiments on pedestrian dynamics. To simplify the setup and decrease the costs, we invert the common localization setup of stationary Bluetooth beacons tracking smartphones: our approach uses multiple stationary receivers and moving Bluetooth beacons attached to people's heads. We develop a common architecture for both scenarios in which the independent positioning solver remains untouched even though the scenarios differ. We use fingerprinting based on stochastic regression to locate individuals in sub-areas of rooms instead of determining their exact position.
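To illustrate the fingerprinting idea in its simplest form (nearest-neighbor matching as a stand-in for the paper's stochastic regression; all sub-area names and RSSI values below are made up):

```python
# Sketch of fingerprint-based localization: each sub-area of a room has a
# reference fingerprint of received signal strengths (RSSI, in dBm) at the
# stationary receivers; a live measurement is assigned to the sub-area
# whose fingerprint is closest in signal space.
import math

# Reference fingerprints: sub-area -> mean RSSI at receivers R1..R3 (toy values).
fingerprints = {
    "entrance": (-55.0, -78.0, -82.0),
    "center":   (-68.0, -60.0, -70.0),
    "exit":     (-80.0, -75.0, -58.0),
}

def locate(measurement):
    """Return the sub-area with the closest fingerprint (Euclidean distance)."""
    return min(fingerprints, key=lambda area: math.dist(measurement, fingerprints[area]))

print(locate((-57.0, -76.0, -80.0)))  # prints "entrance"
```

Assigning measurements to sub-areas rather than exact coordinates, as in the paper, keeps the solver robust against the noisy RSSI readings of Bluetooth LE.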
NAACL 2018 New Orleans, Louisiana
NLP Lean Programming Framework: Developing NLP Applications More Effectively
Authors: Marc Schreiber, Bodo Kraft, Albert Zündorf
Publisher: Association for Computational Linguistics
Published in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
This paper presents the NLP Lean Programming framework (NLPf), a new framework for creating custom natural language processing (NLP) models and pipelines by utilizing common software development build systems. This approach allows developers to train and integrate domain-specific NLP pipelines into their applications seamlessly. Additionally, NLPf provides an annotation tool which improves the annotation process significantly through a well-designed GUI and a sophisticated way of using input devices. Due to NLPf's properties, developers and domain experts are able to build domain-specific NLP applications more efficiently. NLPf is open-source software and available at https://gitlab.com/schrieveslaach/NLPf.
SER&IP 2018 Gothenburg, Sweden
Continuously evaluated research projects in collaborative decoupled environments
Authors: Oliver Schmidts, Bodo Kraft, Marc Schreiber, Albert Zündorf
Published in: 2018 ACM/IEEE 5th International Workshop on Software Engineering Research and Industrial Practice
Often, research results from collaboration projects are not transferred into productive environments, even though the approaches are proven to work in demonstration prototypes. These demonstration prototypes are usually too fragile and error-prone to be transferred easily into productive environments; a lot of additional work is required.
Inspired by the idea of an incremental delivery process, we introduce an architecture pattern, which combines the approach of Metrics Driven Research Collaboration with microservices for the ease of integration. It enables keeping track of project goals over the course of the collaboration while every party may focus on their expert skills: researchers may focus on complex algorithms, practitioners may focus on their business goals.
Through the simplified integration, (intermediate) research results can be introduced into a productive environment, which enables early user feedback and allows for the early evaluation of different approaches. The practitioners' business model benefits throughout the full project duration.
Towards Effective Natural Language Application Development
Authors: Marc Schreiber
Publisher: Universität Kassel
There is a current trend for more and more computer programs to analyze written or spoken natural language. For example, digital voice assistants (DVAs), Information Extraction (IE) systems, machine translation systems, and many other types of programs process natural language in order to solve specific use cases when interacting with humans via natural language. Amazon, Google, and Mycroft AI are just some of the companies that have produced DVAs capable of interacting with humans via voice. Such NLP applications use techniques from computer science and artificial intelligence to address their use cases. Additionally, many companies have begun to evaluate the capacity of NLP applications to improve their business processes, for instance by automatically processing customer requests received via e-mail.
The development of NLP applications requires years of experience in computer science, artificial intelligence, machine learning (ML), linguistics, and similar disciplines. Due to this requirement, development is effectively restricted to computer science experts with many years of experience in computer science and artificial intelligence. Years of training and experience are therefore required in order to develop an NLP application capable of, for instance, automatically processing customer e-mails. However, the demand for NLP applications continues to grow, while the number of such computer science experts remains limited. Due to this growing demand, companies must be able to develop such applications using in-house developers without years of training or a Ph.D. in computer science and artificial intelligence.
Based on this limitation, this thesis identifies the main obstacles encountered by developers without many years of experience in computer science when creating NLP applications. These obstacles are identified through a research project named ETL Quadrat, which aims at building an IE system for gathering EC data from human-readable documents. The development of the IE system is hindered by a number of obstacles.
PREvant (Preview Servant): Composing Microservices into Reviewable and Testable Applications
Autoren: Marc Schreiber
Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Published in: Joint Post-proceedings of the First and Second International Conference on Microservices (Microservices 2017/2019)
This paper introduces PREvant (preview servant), a software tool which provides a simple RESTful API for deploying and composing containerized microservices as reviewable applications. PREvant's API serves as a connector between the continuous delivery pipelines of microservices and the infrastructure that hosts the applications. Based on the REST API and a web interface, developers and domain experts at aixigo AG developed quality assurance workflows that help to increase and maintain high microservice quality.