Accepted Papers

Nectar Track
Authors:

Giorgio Corani, Alessio Benavoli, Francesca Mangili, Marco Zaffalon

Affiliation(s):
IDSIA

Abstract:
Most hypothesis testing in machine learning is done using the frequentist null-hypothesis significance test, which has severe drawbacks. We review recent Bayesian tests which overcome the drawbacks of the frequentist ones.

Nectar Track
Authors:

Harald Bosch, Robert Krüger, Dennis Thom

Affiliation(s):
Institut für Visualisierung und Interaktive Systeme (VIS)

Abstract:
Geolocated social media data streams are challenging data sources due to volume, velocity, variety, and unorthodox vocabulary. However, they also are an unrivaled source of eye-witness accounts to establish remote situational awareness. In this paper we summarize some of our approaches to separate relevant information from irrelevant chatter using unsupervised and supervised methods alike. This allows the structuring of requested information as well as the incorporation of unexpected events into a common overview of the situation. A special focus is put on the interplay of algorithms, visualization, and interaction.

Nectar Track
Authors:

Mathis Boerner, Tim Ruhe, Katharina Morik, Wolfgang Rhode

Affiliation(s):
TU Dortmund University

Abstract:
Astrophysical experiments produce Big Data which need efficient and effective data analytics. In this paper we present a general data analysis process which has been successfully applied to data from IceCube, a cubic-kilometer neutrino detector located at the geographic South Pole. The goal of the analysis is to separate neutrinos from the background within the data in order to determine the muon neutrino energy spectrum. The presented process covers straight cuts, feature selection, classification, and unfolding. A major challenge in the separation is the unbalanced dataset. The expected signal-to-background ratio was worse than 1:1000 and, moreover, any surviving background would hinder further analysis of the data. The overall process was embedded in a multi-fold cross-validation to control its performance. A subsequent regularized unfolding yields the sought-after energy spectrum.

Nectar Track
Authors:
Markus Schedl, Johannes Kepler University

Abstract:
Music recommender systems have recently seen a sharp increase in popularity due to many novel commercial music streaming services. Most systems, however, do not adequately take their listeners into account when recommending music items. In this note, we summarize our recent work and report our latest findings on the topics of tailoring music recommendations to individual listeners and to groups of listeners sharing certain characteristics. We focus on two tasks: context-aware automatic playlist generation (also known as serial recommendation) using sensor data and music artist recommendation using social media data.

Nectar Track
Authors:

Stefano Ferilli, Floriana Esposito, Domenico Redavid

Affiliation(s):
University of Bari Aldo Moro

Abstract:
Manually building process models is complex, costly and error-prone, hence the interest in process mining. Incremental adaptation of the models, and the ability to express and learn complex conditions on the involved tasks, are also desirable. First-order logic provides a single comprehensive and powerful framework for supporting all of the above. This paper presents a first-order logic incremental method for inferring process models. Its efficiency and effectiveness were demonstrated with both controlled experiments and a real-world dataset.

Nectar Track
Authors:
Michele Berlingerio, IBM Research Ireland
Veli Bicer, IBM Research Ireland
Adi Botea, IBM Research Ireland
Stefano Braghin, IBM Research Ireland
Nuno Lopes, IBM Research Ireland
Riccardo Guidotti, KDD LAB - Univ. Pisa & ISTI-CNR
Francesca Pratesi, KDD LAB - Univ. Pisa

Abstract:
Smart Cities applications are fostering research in many fields including Computer Science and Engineering. Data Mining is used to support applications such as optimization of a public urban transit network, carpooling, event detection [2], and many more. Along these lines, the aim of the PErsonal TRansport Advisor (PETRA) EU FP7 project is to develop an integrated platform to supply urban travelers with smart journey and activity advice, on a multi-modal network, while taking into account uncertainty. Uncertainty is intrinsic in a transit network, and may come in different forms: delays in arrival times, impossibility of boarding a (full) bus, walking speed, as well as incidents, weather conditions, and so on. In this paper, we briefly describe the architecture of the PETRA platform, and present the results obtained by the embedded journey planner on thousands of planning requests, performed with and without the results coming from the Mobility Mining module. We show how, by integrating private transport routines into a public transit network, it is possible to devise better advice, measured both in terms of the number of requests satisfied and in terms of expected arrival times. These experiments are part of the validation for the PETRA use case on Rome, where we assess the quality of the advice coming from the innovative integrated platform.

Nectar Track
Authors:
Mehdi Kaytoue, INSA Lyon
Aleksey Buzmakov, INRIA
Victor Codocedo, INSA Lyon
Jaume Baixeries,
Sergei Kuznetsov, NRU HSE Moscow
Amedeo Napoli, Inria Nancy Grand Est / LORIA

Abstract:
This article presents recent advances in Formal Concept Analysis (2010-2015), especially for dealing with complex data (numbers, graphs, sequences, etc.) in domains such as databases (functional dependencies), data mining (local pattern discovery), information retrieval and information fusion. As these advances are mainly published in artificial intelligence and FCA-dedicated venues, dissemination towards the data mining and machine learning communities is worthwhile.

Nectar Track
Authors:
Alexandros Karakasidis, Hellenic Open University
Georgia Koloniari, University of Macedonia
Vassilios Verykios, Hellenic Open University

Abstract:
Record linkage refers to integrating data from heterogeneous sources to identify information regarding the same entity, and it provides the basis for sophisticated data mining. When privacy restrictions apply, the data sources may only have access to the merged records of the linkage process; this constitutes the problem of privacy-preserving record linkage. As data is often dirty, and there are no common unique identifiers, the linkage process requires approximate matching and becomes a very resource-demanding task, especially for large volumes of data. To speed up the linkage process, privacy-preserving blocking and meta-blocking techniques are deployed. Such techniques derive groups of records that are more likely to match with each other. In this nectar paper, we summarize our contributions to privacy-preserving blocking and meta-blocking.
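
As a rough illustration of the blocking idea described in this abstract (not the authors' technique), the following Python sketch groups records by a keyed hash of a blocking key, so that only records falling into the same block are compared. The record fields, the choice of blocking key, and the shared secret are all hypothetical assumptions made for this example. Note that an exact keyed hash does not tolerate typos inside the key itself, which is one reason real privacy-preserving schemes are more elaborate.

```python
import hashlib
import hmac
from collections import defaultdict
from itertools import product

# Hypothetical secret agreed on by both data holders (assumption for this sketch).
SHARED_SECRET = b"demo-secret"


def blocking_key(record):
    """Keyed hash of a hypothetical key: surname prefix plus birth year."""
    raw = f"{record['surname'][:3].lower()}|{record['birth_year']}"
    return hmac.new(SHARED_SECRET, raw.encode("utf-8"), hashlib.sha256).hexdigest()


def build_blocks(records):
    """Map each hashed blocking key to the ids of the records that share it."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    return blocks


def candidate_pairs(blocks_a, blocks_b):
    """Pair up record ids only within blocks that occur in both sources."""
    pairs = set()
    for key in blocks_a.keys() & blocks_b.keys():
        pairs.update(product(blocks_a[key], blocks_b[key]))
    return pairs


# Toy data, invented for illustration only.
source_a = [
    {"id": "a1", "surname": "Smith", "birth_year": 1980},
    {"id": "a2", "surname": "Jones", "birth_year": 1975},
]
source_b = [
    {"id": "b1", "surname": "Smithe", "birth_year": 1980},
    {"id": "b2", "surname": "Brown", "birth_year": 1990},
]

pairs = candidate_pairs(build_blocks(source_a), build_blocks(source_b))
```

In this toy case only one of the four possible cross-source pairs survives blocking and would be passed on to the more expensive approximate matching step.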

Nectar Track
Authors:
Qian Zhang, Northeastern University
Corrado Gioannini, ISI Foundation
Daniela Paolotti, ISI Foundation
Nicola Perra, Northeastern University
Daniela Perrotta, ISI Foundation
Marco Quaggiotto, ISI Foundation
Michele Tizzoni, ISI Foundation
Alessandro Vespignani, Northeastern University

Abstract:
FluOutlook is an online platform where multiple data sources are integrated to initialize and train a portfolio of epidemic models for influenza forecasting. During the 2014/15 season, the system was used to provide real-time forecasts for 7 countries in North America and Europe.

Nectar Track
Authors:

Ricardo Vilalta, Kinjal Dhar Gupta, Ashish Mahabal

Affiliation(s):
University of Houston

Abstract:
Astroinformatics is an interdisciplinary field of science that applies modern computational tools to the solution of astronomical problems. One relevant subarea is the use of machine learning for analysis of large astronomical repositories and surveys. In this paper we describe a case study based on the classification of variable Cepheid stars using domain adaptation techniques; our study highlights some of the emerging challenges posed by astroinformatics.

Nectar Track
Authors:
Yuxiao Dong, University of Notre Dame
Nitesh Chawla, University of Notre Dame
Jie Tang, Tsinghua University
Yang Yang, University of Notre Dame

Abstract:
In this work, we unveil the evolution of social relationships across the lifespan. This evolution reflects the dynamic social strategies that people use to fulfill their social needs. For this work we utilize a large mobile network complete with user demographic information. We find that while younger individuals are active in broadening their social relationships, seniors tend to keep small but closed social circles. We further demonstrate that opposite-gender interactions between two young individuals are much more frequent than those between young same-gender people, while the situation is reversed after around age 35. We also discover that while same-gender triadic social relationships are persistently maintained over a lifetime, the opposite-gender triadic circles become unstable upon entering middle age. Finally, we demonstrate a greater than 80% potential predictability for inferring users' gender and a 73% predictability for age from mobile communication behaviors.

Nectar Track
Authors:

Wouter Duivesteijn, Julia Thaele

Affiliation(s):
TU Dortmund University

Abstract:
FACT, the First G-APD Cherenkov Telescope, detects air showers induced by high-energy cosmic particles. It is desirable to classify a shower as being induced by a gamma ray or a background particle. Generally, it is nontrivial to get any feedback on the real-life training task, but we can attempt to understand how our classifier works by investigating its performance on Monte Carlo simulated data. To this end, in this paper we develop the SCaPE (Soft Classifier Performance Evaluation) model class for Exceptional Model Mining, which is a Local Pattern Mining framework devoted to highlighting unusual interplay between multiple targets. The SCaPE model class highlights subspaces of the search space where the classifier performs particularly well or poorly. These subspaces are expressed as conditions on attributes of the data, hence they come in a language a human understands, which should help us understand where our classifier does (not) work.

Nectar Track
Authors:
Natalia Andrienko, Fraunhofer Institute IAIS
Gennady Andrienko, Fraunhofer Institute IAIS
Georg Fuchs, Fraunhofer Institute IAIS
Piotr Jankowski, San Diego State University

Abstract:
People using mobile devices for making phone calls, accessing the internet, or posting georeferenced contents in social media create episodic digital traces of their presence in various places. Availability of personal traces over a long time period makes it possible to detect repeatedly visited places and identify them as home, work, place of social activities, etc. based on temporal patterns of the person’s presence. Such analysis, however, can compromise personal privacy. We propose a visual analytics approach to semantic analysis of mobility data in which traces of a large number of people are processed simultaneously without accessing individual-level data. After extracting personal places and identifying their meanings in this privacy-respectful manner, the original georeferenced data are transformed to trajectories in an abstract semantic space. The semantically abstracted data can be further analyzed without the risk of re-identifying people based on the specific places they attend.

Nectar Track
Authors:

Yuxiao Dong, Reid Johnson, Nitesh Chawla

Affiliation(s):
University of Notre Dame

Abstract:
A widely used measure of scientific impact is citations. However, due to their power-law distribution, citations are fundamentally difficult to predict. Instead, to characterize scientific impact, we address two analogous questions asked by many scientific researchers: "How will my h-index evolve over time, and which of my previously or newly published papers will contribute to it?" To answer these questions, we perform two related tasks. First, we develop a model to predict authors' future h-indices based on their current scientific impact. Second, we examine the factors that drive papers, either previously or newly published, to increase their authors' predicted future h-indices. By leveraging relevant factors, we can predict an author's h-index in five years with an R² value of 0.92 and whether a previously (newly) published paper will contribute to this future h-index with an F1-score of 0.99 (0.77). We find that topical authority and publication venue are crucial to these effective predictions, while topic popularity is surprisingly inconsequential. Further, we develop an online tool that allows users to generate informed h-index predictions. Our work demonstrates the predictability of scientific impact, and can help researchers to effectively leverage their scholarly position of "standing on the shoulders of giants."
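
For readers unfamiliar with the quantity being predicted, the h-index is the largest h such that an author has h papers with at least h citations each. The Python sketch below computes it from a list of citation counts. It only illustrates the standard definition and is not the authors' prediction model; the helper `contributes_to_h_index` encodes one possible reading of "contributes" (the paper has at least h citations), which is an assumption made for this example.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def contributes_to_h_index(citations, paper_index):
    """Assumed reading: a paper contributes if its citation count is at least h."""
    h = h_index(citations)
    return h > 0 and citations[paper_index] >= h


# Toy citation counts, invented for illustration only.
example_citations = [10, 8, 5, 4, 3]
example_h = h_index(example_citations)
```

With the toy counts above, four papers have at least four citations each, so `example_h` is 4, and the paper with 3 citations does not contribute under the assumed reading.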