Author(s): Ruy Ramos
Rui Camacho
Title: Distributed generative data mining
Issue Date: 2007
Abstract: A process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be run on dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach the user usually specifies the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The generative feature of the technique is due to its capability of generating new sub-tasks of the Data Mining (DM) analysis process at execution time. The workflow of sub-tasks of the DM process is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Mining task. As a proof of concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records.
Subject: Computer engineering, Electrical engineering, Electronic engineering, Information engineering
Scientific areas: Engineering and technology::Electrical engineering, Electronic engineering, Information engineering
Source: 7th Industrial Conference on Data Mining (ICDM 2007)
Document Type: Article in International Conference Proceedings Book
Rights: openAccess
Appears in Collections: FEUP - Article in International Conference Proceedings Book

Files in This Item:
File: 64312.pdf
Description: Distributed Generative Data Mining
Size: 146.52 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License.