Accepted Papers

Demo
Authors:

Andre Lourenco, Ana Priscila Alves, Carlos Carreiras, Rui Duarte, Ana Fred

Affiliation(s):
Instituto de Telecomunicacoes
CardioID - Technologies LDA

Abstract:
Monitoring of physiological signals while driving is a recent trend in the automotive industry. CardioWheel is a state-of-the-art machine learning solution for driver biometrics based on electrocardiographic signals (ECG). The system pervasively acquires heart signals from the hands of the user through sensors embedded in the steering wheel, allowing the recognition of the driver's identity. The implemented pipeline combines unsupervised and supervised machine learning algorithms, and is being tested in real-world scenarios, illustrating one of the potential uses of this technology.

Demo
Authors:

Ilaria Tiddi, Mathieu d'Aquin, Enrico Motta

Affiliation(s):
Knowledge Media Institute

Abstract:
In this paper we present the system Dedalo, whose aim is to generate explanations for data patterns using background knowledge retrieved from Linked Data. In many real-world scenarios, patterns are generally interpreted manually by experts, who must use their own background knowledge to explain and refine them; their workload could be relieved by exploiting the open, machine-readable knowledge now available on the Web. In light of this, we devised an automatic system that, given some patterns and some background knowledge extracted from Linked Data, reasons over them and creates well-structured candidate explanations for their grouping. In our demo, we show how the system provides a step towards automatising the interpretation process in KDD, by presenting scenarios with different domains, data and patterns.

Demo
Authors:

Pierre Houdyer, Albrecht Zimmermann, Mehdi Kaytoue, Marc Plantevit, Celine Robardet, Joseph Mitchell

Affiliation(s):
TAPASTREET LTD.
University Lyon 1
INSA de Lyon

Abstract:
We present 'Gazouille', a system for discovering local events in geo-localized social media streams. The system is based on three core modules: (i) social networks data acquisition on several urban areas, (ii) event detection through time series analysis, and (iii) a Web user interface to present events discovered in real-time in a city, associated with a gallery of social media that characterize the event.
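The abstract does not say which time series method Gazouille uses for module (ii). As a rough illustration only, the sketch below flags a time bin as a candidate event when its post count rises far above the recent average (a trailing-window z-score, which is a swapped-in technique, not the authors' method). The function name `detect_bursts`, the window size, the threshold, and the sample counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def detect_bursts(counts, window=6, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean
    by more than `threshold` standard deviations.

    Hypothetical helper for illustration; not part of Gazouille.
    """
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Guard against a flat history, where the deviation is zero.
        if sigma == 0:
            sigma = 1.0
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Made-up hourly post counts for one urban area; hour 7 has a spike.
hourly_posts = [12, 15, 11, 14, 13, 12, 14, 90, 13, 12]
print(detect_bursts(hourly_posts))
```

A real deployment would also need to handle daily and weekly seasonality, which a plain trailing window ignores.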

Demo
Authors:

Bijan Ranjbar-Sahraei, Julia Efremova, Hossein Rahmani, Toon Calders, Karl Tuyls, Gerhard Weiss

Affiliation(s):
Maastricht University
University of Liverpool
Université Libre de Bruxelles

Abstract:
Entity Resolution (ER) is the task of finding references that refer to the same entity across different data sources. Cleaning a data warehouse and applying ER on it is a computationally demanding task, particularly for large data sets that change dynamically. Therefore, a query-driven approach that analyses a small subset of the entire data set and integrates the results in real-time is significantly beneficial. Here, we present an interactive tool, called HiDER, which allows for query-driven ER in large collections of uncertain dynamic historical data. The input data includes civil registers such as birth, marriage and death certificates in the form of structured data, and notarial acts such as estate tax and property transfers in the form of free text. The outputs are family networks and event timelines visualized in an integrated way. HiDER is being used and tested at the Brabant Historical Information Center (BHIC); despite the uncertainties of the BHIC input data, the extracted entities have high certainty and are enriched by extra information.

Demo
Authors:

André Lamúrias, Luka Clarke, Francisco Couto

Affiliation(s):
LASIGE
Faculty of Sciences, University of Lisbon

Abstract:
Automatic methods are being developed and applied to transform textual biomedical information into machine-readable formats. Machine learning techniques have been a prominent approach to this problem. However, there is still a lack of systems that are easily accessible to users. For this reason, we developed a web tool to facilitate access to our text mining framework, IICE (Identifying Interactions between Chemical Entities). This tool annotates the input text with chemical entities and identifies the interactions described between these entities. Various options are available, which can be manipulated to control the algorithms employed by the framework and the output formats.

Demo
Authors:

Matt McVicar, Cédric Mesnage, Jefrey Lijffijt, Tijl De Bie

Affiliation(s):
University of Bristol

Abstract:
We present an exploratory data mining tool useful for finding patterns in the geographic distribution of independent UK-based music artists. Our system is interactive, highly intuitive, and entirely browser-based, meaning it can be used without any additional software installations from any device. The target audiences are artists, other music professionals, and the general public. Potential uses of our software include highlighting discrepancies in supply and demand of specific music genres in different parts of the country, and identifying at a glance which areas have the highest densities of independent music artists.

Demo
Authors:

Ujval Kamath, Ana Costa e Silva, Michael O'Connell

Affiliation(s):
TIBCO Software Inc.

Abstract:
Within the context of Remote Equipment Monitoring in the area of an upstream oil and gas process, the goal of this project was to improve the efficiency of ESP (Electric Submersible Pump) oil & gas production, by predicting (rather than just reacting to) ESP shutdown and failure and thus avoiding downtime which results in a loss of production as well as repair costs. A methodology and solution for real-time monitoring of production equipment in remote locations is presented. The solution is developed on sensor data, transmitted from equipment to field information systems. The solution uses a combination of Tibco Spotfire, Streambase, and TERR (Tibco Enterprise Run-Time for R), and performs remarkably well, identifying a variety of anomalous equipment behaviour states, and preventing multiple shutdowns and pump failures, with false positive rates close to zero.

Demo
Authors:

Matija Piškorec, Borut Sluban, Tomislav Šmuc

Affiliation(s):
Ruđer Bošković Institute
Jozef Stefan Institute

Abstract:
This paper presents MultiNets: a JavaScript library for multilayer network visualization. MultiNets provides reusable HTML components with functions for loading, manipulation and visualization of multilayered networks. These components can be easily incorporated into any web page, and they allow users to perform exploratory analysis of multilayer networks and prepare publication-quality network visualizations. MultiNets components are easily extendable to provide custom visualizations, such as embedding networks on geographical maps, and can be used for building complex web-based graphical user interfaces for data mining services that operate on multilayered networks and multirelational data in general.

Demo
Authors:

Tao Jiang, Zhanhuai Li, Qun Chen, Zhong Wang, Kaiwen Li, Wei Pan

Affiliation(s):
Northwestern Polytechnical University, China

Abstract:
Order-Preserving SubMatrix (OPSM) has been accepted as a significant tool for modelling biologically meaningful subspace clusters, to discover the general tendency of gene expressions across a subset of conditions. Existing OPSM processing tools focus on batch mining techniques, are time-consuming, and do not support OPSM queries. To address these problems, the paper presents and implements a prototype system for OPSM queries, called OMEGA (Order-preserving subMatrix mining, indExinG and seArch tool for biologists). It uses a Butterfly Network based BSP model to mine OPSMs in parallel. Further, it builds an index based on a prefix tree associated with two header tables for gene expression data or OPSM mining results. Then, it processes exact and fuzzy queries based on keywords. Meanwhile, the vital query results are saved for later use. It is demonstrated that OMEGA can improve the effectiveness of OPSM batch mining and queries.
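For readers unfamiliar with the term, an order-preserving submatrix is commonly defined as a set of rows (genes) and an ordering of columns (conditions) under which every selected row is strictly increasing. The sketch below only checks that property for a given candidate; it does not mine, index, or query OPSMs as OMEGA does. The function name `is_order_preserving` and the toy expression matrix are assumptions made for illustration.

```python
def is_order_preserving(matrix, rows, column_order):
    """Return True if every selected row is strictly increasing
    when its values are read in `column_order`.

    Hypothetical helper for illustration; not part of OMEGA.
    """
    for r in rows:
        values = [matrix[r][c] for c in column_order]
        # A single non-increasing adjacent pair breaks the pattern.
        if any(a >= b for a, b in zip(values, values[1:])):
            return False
    return True


# Made-up expression levels: rows are genes, columns are conditions.
expression = [
    [0.2, 0.9, 0.5, 0.1],  # gene 0
    [1.0, 3.0, 2.0, 0.4],  # gene 1
    [0.7, 0.3, 0.6, 0.9],  # gene 2
]

# Genes 0 and 1 both rise along the condition order 3, 0, 2, 1.
print(is_order_preserving(expression, [0, 1], [3, 0, 2, 1]))
# Gene 2 falls along that order, so adding it breaks the pattern.
print(is_order_preserving(expression, [0, 1, 2], [3, 0, 2, 1]))
```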

Demo
Authors:

David Tolpin, Jan-Willem van de Meent, Frank Wood

Affiliation(s):
University of Oxford

Abstract:
Anglican is a probabilistic programming system designed to interoperate with Clojure and other JVM languages. We describe the implementation of Anglican and illustrate how its design facilitates both explorative and industrial use of probabilistic programming.

Demo
Authors:

Anton Dries, Angelika Kimmig, Wannes Meert, Joris Renkens, Guy Van den Broeck, Jonas Vlasselaer, Luc De Raedt

Affiliation(s):
Katholieke Universiteit Leuven

Abstract:
We present ProbLog2, the state-of-the-art implementation of the probabilistic programming language ProbLog. The ProbLog language allows the user to intuitively build programs that encode not only complex interactions between large sets of heterogeneous components but also the inherent uncertainties that are present in real-life situations. The system provides efficient algorithms for querying such models as well as for learning their parameters from data. It is available as an online tool on the web and for download. The offline version offers both command line access to inference and learning and a Python library for building statistical relational learning applications from the system's components.

Demo
Authors:

Natalia Andrienko, Gennady Andrienko, Georg Fuchs, Salvatore Rinzivillo, Hans-Dieter Betz

Affiliation(s):
Fraunhofer Institute IAIS
CNR ISTI
nowcast GmbH

Abstract:
We demonstrate a system of tools for real-time detection of significant clusters of spatial events and observing their evolution. The tools include an incremental stream clustering algorithm, interactive techniques for controlling its operation, a dynamic map display showing the current situation, and displays for investigating the cluster evolution (time line and space-time cube).

Demo
Authors:

Michele Berlingerio, Stefano Braghin, Francesco Calabrese, Cody Dunne, Yiannis Gkoufas, Mauro Martino, Jamie Rasmussen, Steven Ross

Affiliation(s):
IBM Ireland Research Lab
IBM Software Group Watson

Abstract:
We present S&P360, a system to analyze and explore the impact of financial news regarding a set of companies, by means of multidimensional network analysis. The system is based on ABACUS, an innovative algorithm for community detection in multidimensional networks, grouping together nodes belonging to the same communities in different dimensions. The system is enriched by an interactive interface for visualization and exploration of the results, allowing the user to place visual queries by specifying companies, time interval, industry, or multidimensional connections. We demonstrate the usefulness of S&P360 on data obtained from Wikipedia, Twitter, and the New York Times, regarding the 500 companies in the Standard & Poor's index.

Demo
Authors:

Andrey Tyukin, Stefan Kramer, Jörg Wicker

Affiliation(s):
Johannes Gutenberg University Mainz

Abstract:
Machine learning methods and algorithms are often highly modular in the sense that they rely on a large number of subalgorithms that are in principle interchangeable. For example, it is often possible to use various kinds of pre- and post-processing and various base classifiers or regressors as components of the same modular approach. We propose a framework, called Scavenger, that allows evaluating whole families of conceptually similar algorithms efficiently. The algorithms are represented as compositions, couplings and products of atomic subalgorithms. This allows partial results to be cached and shared between different instances of a modular algorithm, so that potentially expensive partial results need not be recomputed multiple times. Furthermore, our framework deals with issues of parallel execution and load balancing, and with the backup of partial results in case of implementation or runtime errors. Scavenger is licensed under the GPLv3 and can be downloaded freely at https://github.com/jorro/scavenger.
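The caching idea in this abstract can be illustrated with plain memoization: when two variants of a modular algorithm share a preprocessing step, that step runs once and its result is reused. The sketch below uses Python's standard `functools.lru_cache` and is not Scavenger's API; the names `normalize`, `threshold_classifier`, and `call_log`, and the sample data, are assumptions made for this example.

```python
from functools import lru_cache

# Records each real execution of the shared step, for demonstration.
call_log = []


@lru_cache(maxsize=None)
def normalize(data):
    """Shared preprocessing step: min-max scale a tuple of numbers."""
    call_log.append("normalize")
    lo, hi = min(data), max(data)
    return tuple((x - lo) / (hi - lo) for x in data)


def threshold_classifier(data, cutoff):
    """One member of a family of algorithms that differ only in `cutoff`."""
    return [x > cutoff for x in normalize(data)]


# The input must be hashable (a tuple) so it can serve as a cache key.
samples = (2.0, 4.0, 6.0, 10.0)

strict = threshold_classifier(samples, 0.4)
lenient = threshold_classifier(samples, 0.2)

print(strict)    # labels from the first variant
print(lenient)   # labels from the second variant
print(call_log)  # shows how often the shared step actually ran
```

This covers only in-process reuse; the parallel execution, load balancing, and backup of partial results described in the abstract are outside the scope of the sketch.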

Demo
Authors:

Roland Assam, Simon Feiden, Thomas Seidl

Affiliation(s):
RWTH Aachen University

Abstract:
Massive amounts of geo-social data are generated daily. In this paper, we propose UrbanHubble, a location-based predictive analytics tool that comprises a broad range of state-of-the-art location prediction and recommendation algorithms. In addition, UrbanHubble includes a visualization component that depicts the real-time complex interactions of users on a map, the evolution of friendships over time, and how friendship triggers mobility.

Demo
Authors:

Matthijs van Leeuwen, Lara Cardinaels

Affiliation(s):
KU Leuven

Abstract:
We present VIPER, for Visual Pattern Explorer, an innovative, browser-based system for interactive pattern exploration, assisted by visualisation, recommendation, and algorithmic search. The target audience consists of domain experts who have access to data but not to (potentially expensive) data mining experts. The goal of the system is to enable the target audience to discover interesting patterns from data, with a focus on subgroup discovery but also facilitating frequent itemset mining. In other words, our goal is to develop an accessible system that allows for true exploratory data mining.

Demo
Authors:

Gennady Andrienko, Natalia Andrienko

Affiliation(s):
Fraunhofer Institute IAIS

Abstract:
We demonstrate interactive visual embedding of partition-based clustering of multidimensional data using methods from the open-source machine learning library Weka. According to the visual analytics paradigm, knowledge is gradually built and refined by a human analyst through iterative application of clustering with different parameter settings and to different data subsets. To show clustering results to the analyst, cluster membership is typically represented by color coding. Our tools support the color consistency between different steps of the process. We shall demonstrate two-way clustering of spatial time series, in which clustering will be applied to places and to time steps.