The term semantic data mining denotes a data mining approach in which domain ontologies are used as background knowledge. Such an approach is motivated by the large amounts of data that are increasingly becoming openly available and described using real-life ontologies represented in Semantic Web languages, arguably most extensively in the domain of biology. This has recently opened up the possibility of interesting large-scale, real-world semantic applications.
The availability of semantically annotated data calls for new kinds of data mining approaches that can deal with the complexity and expressivity of semantic representation languages, leverage the availability of ontologies and the explicit semantics of the described resources, and account for novel assumptions (e.g., open world) that underlie reasoning services exploiting ontologies.
The tutorial will address the above issues, focusing on three questions: how machine learning techniques can work directly on richly structured Semantic Web data and exploit ontologies and Semantic Web technologies; what value machine learning methods gain from exploiting ontologies; and what challenges developers of semantic data mining methods face. It will also contain demonstrations of tools supporting semantic data mining.
The tutorial will present the topic of semantic data mining from three complementary perspectives.
Firstly, it will present a general framework for semantic data mining, following the work in [NVTL09]. The first part of the tutorial will also discuss a new method for semantic subgroup discovery, g-SEGS. It will be accompanied by a presentation of the developed tool, which is part of the Orange4WS environment.
The second part of the tutorial will cover the topic of learning from description logics (DL-learning), motivated by the fact that the standard Web ontology language, OWL, is theoretically based on description logics. This will include a demo of a tool supporting DL-learning (a plugin to the Rapid Miner system).
Finally, the third part of the tutorial will cover the topic of semantic meta-mining. This approach has three features that distinguish it from its predecessors. First, more than in previous work, it adopts a process-oriented approach where meta-learning is applied to support design choices at different stages of the complete data mining process or workflow. Second, it complements dataset descriptions with an in-depth analysis and characterization of algorithms: their underlying assumptions, optimization goals and strategies, and the models and patterns they generate. Third, it relies on a data mining ontology which distills extensive background knowledge concerning knowledge discovery itself.
A more detailed outline is presented below:
Part I Introduction to semantic data mining (presenters: Nada Lavrac, Anze Vavpetic)
  * Framework for semantic data mining
  * Semantic subgroup discovery
  * Presentation of developed tool: g-SEGS

Part II Learning from description logics (presenters: Agnieszka Lawrynowicz, Jedrzej Potoniec)
  * Refinement operators for DL-learning
  * Concept learning
  * Similarity-based learning (e.g. kernel methods, clustering)
  * Frequent pattern mining in DLs
  * An overview of example tasks (e.g. ontology evolution, semantic query results aggregation)
  * Presentation of developed tool for DL-learning

Part III Semantic meta-mining (presenters: Melanie Hilario, Alexandros Kalousis)
  * Meta-mining problem definition
  * Goals and applications of data mining ontologies
  * Representing data mining tasks, algorithms, and workflows in a DM ontology
  * Ontology-based pattern extraction from data mining workflows
  * Kernel based approaches
  * Combining dataset, algorithm and workflow descriptors in meta-mining