Semantic data mining tutorial @ ECML/PKDD'2011

Introduction

The term semantic data mining denotes a data mining approach in which domain ontologies are used as background knowledge. Such an approach is motivated by the large amounts of data that are increasingly becoming openly available and described using real-life ontologies represented in Semantic Web languages, arguably most extensively in the domain of biology. This has recently opened up the possibility of interesting large-scale, real-world semantic applications.

The availability of semantically annotated data calls for new kinds of data mining approaches that can deal with the complexity and expressivity of semantic representation languages, leverage the availability of ontologies and the explicit semantics of the described resources, and account for novel assumptions (e.g., open world) that underlie reasoning services exploiting ontologies.
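The open-world assumption mentioned above can be illustrated with a minimal sketch. It is not taken from the tutorial material: the fact triples, the names `interacts_with`, `geneA` to `geneD`, and both helper functions are hypothetical, and a real reasoner would also derive entailed facts rather than only look up asserted ones.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Toy knowledge base (all names are hypothetical, for illustration only).
asserted = {("interacts_with", "geneA", "geneB")}   # facts known to hold
negated = {("interacts_with", "geneA", "geneC")}    # facts known not to hold


def closed_world(fact):
    """Closed world: anything that is not asserted is taken to be false."""
    return Answer.TRUE if fact in asserted else Answer.FALSE


def open_world(fact):
    """Open world: a fact is false only if its negation is known."""
    if fact in asserted:
        return Answer.TRUE
    if fact in negated:
        return Answer.FALSE
    return Answer.UNKNOWN


query = ("interacts_with", "geneA", "geneD")
closed_answer = closed_world(query)  # the missing fact is treated as false
open_answer = open_world(query)      # the missing fact stays undetermined
```

Under the open-world reading, a learner cannot simply treat missing facts as negative examples, which is one reason methods designed for closed-world databases do not carry over unchanged.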

The tutorial will address the above issues, focusing on three questions: how machine learning techniques can work directly on richly structured Semantic Web data and exploit ontologies and Semantic Web technologies; what value is added by machine learning methods that exploit ontologies; and what challenges face developers of semantic data mining methods. It will also contain demonstrations of tools supporting semantic data mining.

Outline

The tutorial will present the topic of semantic data mining from three complementary perspectives.

Firstly, it will present a general framework for semantic data mining, following the work [NVTL09]. The first part of the tutorial will also discuss a new method for semantic subgroup discovery, g-SEGS. It will be accompanied by a presentation of the developed tool, which is part of the Orange4WS environment.

The second part of the tutorial will cover the topic of learning from description logics (DL-learning), motivated by the fact that the standard Web ontology language, OWL, is theoretically based on description logics. This will include a demo of a tool supporting DL-learning (a plugin for the RapidMiner system).

Finally, the third part of the tutorial will cover the topic of semantic meta-mining. This approach has three features that distinguish it from its predecessors. First, more than in previous work, it adopts a process-oriented approach where meta-learning is applied to support design choices at different stages of the complete data mining process or workflow. Second, it complements dataset descriptions with an in-depth analysis and characterization of algorithms: their underlying assumptions, their optimization goals and strategies, and the models and patterns they generate. Third, it relies on a data mining ontology which distills extensive background knowledge concerning knowledge discovery itself.

A more detailed outline is presented below:

Part I Introduction to semantic data mining (presenters: Nada Lavrac, Anze Vavpetic)

  • Framework for semantic data mining
  • Semantic subgroup discovery
  • Presentation of developed tool: g-SEGS

Part II Learning from description logics (presenters: Agnieszka Lawrynowicz, Jedrzej Potoniec)

  • Refinement operators for DL-learning
  • Concept learning
  • Frequent pattern mining in DLs
  • Similarity-based learning (e.g. kernel methods, clustering)
  • An overview of example tasks (e.g. ontology evolution, semantic query results aggregation)
  • Presentation of developed tool for DL-learning

Part III Semantic meta-mining (presenters: Melanie Hilario, Alexandros Kalousis)

  • Meta-mining problem definition
  • Goals and applications of data mining ontologies
  • Representing data mining tasks, algorithms, and workflows in a DM ontology
  • Ontology-based pattern extraction from data mining workflows
  • Kernel-based approaches
  • Combining dataset, algorithm and workflow descriptors in meta-mining

Target audience

The target audience of the tutorial includes:

  • Researchers in machine learning and data mining with interest in the Semantic Web technologies/ontologies
  • Researchers interested in meta-mining
  • Researchers interested in relational data mining/inductive logic programming
  • Developers of data mining applications that would like to exploit Semantic Web technologies/ontologies to solve data mining and/or machine learning tasks

The tutorial requires no prior knowledge beyond that of an average ECML PKDD 2011 participant.

Information about the presenters

Prof. Nada Lavrac is Head of the Department of Knowledge Technologies (since 2004), was Head of the Intelligent Data Analysis and Computational Linguistics research group (1999-2003) at the Department of Intelligent Systems, and has been a researcher at the Jožef Stefan Institute, Ljubljana, Slovenia, since 1978. She is Full Professor at the University of Nova Gorica and Deputy Head of the Information and Communication Technologies Program at the Jožef Stefan International Postgraduate School. She was visiting professor at Bristol University, UK (1997-2002, teaching parts of the courses Introduction to Machine Learning and Learning from Structured Data) and at Klagenfurt University, Austria (1987-2002, teaching courses on Knowledge Acquisition, Data Mining and Decision Support). In 1984 she was in a group of researchers who were awarded a national prize for research excellence, in 1997 she was awarded the Ambassador of Science of Slovenia prize, and in 2007 she was elected ECCAI Fellow. Her main research interest is machine learning and data mining, in particular inductive logic programming and intelligent data analysis in medicine. She is coauthor of KARDIO: A Study in Deep and Qualitative Knowledge for Expert Systems, The MIT Press 1989, and Inductive Logic Programming: Techniques and Applications, Ellis Horwood 1994, and co-editor of Relational Data Mining, Springer 2001, and Intelligent Data Analysis in Medicine and Pharmacology, Kluwer 1997. She was a founder of the Solomon European Network and acted as co-coordinator of the EU 5th Framework project Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise (Sol-Eu-Net, IST-1999-11495, 2000-2003). She was coordinator of the European Scientific Network in Inductive Logic Programming ILPNET (1993-1996). She is a member of the editorial boards of Artificial Intelligence in Medicine, AI Communications, New Generation Computing, Applied AI, Machine Learning Journal, and Data Mining and Knowledge Discovery. She was vice-president of ECCAI (1996-98), and is a member of the International Machine Learning Society board (IMLS, since 2001) and the Artificial Intelligence in Medicine board (AIME, since 1999).

Anže Vavpetič currently works at the Department of Knowledge Technologies of the Jožef Stefan Institute. He is an undergraduate student at the Faculty of Computer and Information Science, University of Ljubljana, where he is working on his BSc thesis. He is interested in various topics of data mining and machine learning, such as relational data mining, ILP and subgroup discovery.

Agnieszka Ławrynowicz is Assistant Professor at the Institute of Computing Science at Poznan University of Technology, where she also did her Ph.D. on the topic of the tutorial (with distinction). Before joining academia, she worked in industry (Empolis, Bridgestone). She also holds a French-Polish DESS Certificate of Ability to Manage Companies (Poznan University of Economics & University of Rennes 1). She had an EU Marie-Curie fellowship within the PERSONET project on personalization and Web mining at the University of Ulster. Her research interests include data mining involving Semantic Web languages and ontology engineering. She has nearly 10 years of experience in academic teaching, including the subjects of computational logics, logic programming, artificial intelligence, Web technologies, and business process modeling, and recently co-authored lectures on Semantic technologies and Social Networking. She has initiated and co-organized a series of international workshops on Inductive Reasoning and Machine Learning from the Semantic Web (IRMLeS) co-located with the major European Semantic Web conference (ESWC'2009-2011), and is a co-chair of the ESWC'2011 track on Inductive and Probabilistic Approaches for the Semantic Web.

Melanie Hilario holds a Ph.D. in computer science from the University of Paris VI and currently works at the University of Geneva's Artificial Intelligence Laboratory. She has initiated several European research projects on neuro-symbolic integration, meta-learning, and biological text mining. She is the scientific coordinator of the ongoing EU project e-LICO, whose goal is to build a virtual data mining lab around a planner-based discovery assistant that self-improves through ontology-based meta-mining. She has served on the program committees of conferences and workshops in machine learning and data mining. She is currently an Associate Editor of the International Journal on Artificial Intelligence Tools and a member of the Editorial Board of the Intelligent Data Analysis journal.

Alexandros Kalousis did his PhD thesis on meta-learning for classification algorithm selection at the University of Geneva, where he continues as a senior researcher. He has published widely in the area of machine learning and knowledge discovery, in particular on meta-learning, data preprocessing, feature extraction, model selection and evaluation, and metric and kernel learning for complex structures. Currently his research interests include mining over learned models, meta-mining, and metric and kernel learning. He has served on the program committees of various data mining and machine learning conferences, such as ICML, KDD, and ECML/PKDD, and has participated in a number of European and Swiss research projects.
