Please use this identifier to cite or link to this item: http://hdl.handle.net/10216/73310
Author(s): Ruy Ramos
Rui Camacho
Pedro Souto
Title: A commodity platform for Distributed Data Mining - the HARVARD System
Issue Date: 2006
Abstract: Systems performing Data Mining analysis are usually dedicated and expensive. They often require special-purpose machines to run the data analysis tool. In this paper we propose an architecture for distributed Data Mining running on general-purpose desktop computers. The proposed architecture was deployed in the HARVesting Architecture of idle machines foR Data mining (HARVARD) system. The HARVARD system has the following features. It does not require special-purpose or expensive machines, as it runs on general-purpose PCs. It is based on distributed computing using a set of PCs connected in a network. In a Condor fashion, it takes advantage of a distributed setting of available and idle computational resources and is adequate for problems that may be decomposed into coarse-grain subtasks. The system includes dynamic updating of the computational resources. It is written in Java and therefore runs on several different platforms, including Linux and Windows. It has fault-tolerant features that make it quite reliable. It may use a wide variety of data analysis tools without modification, since it is independent of the data analysis tool. It uses an easy but powerful task specification and control language. The HARVARD system was deployed using two data analysis tools: a decision tree tool called C4.5 and an Inductive Logic Programming (ILP) tool.
Subject: Computer engineering; Electrical, electronic and information engineering
URI: http://hdl.handle.net/10216/73310
Source: 6th Industrial Conference on Data Mining (ICDM 2006)
Document Type: Article in International Conference Proceedings Book
Rights: openAccess
License: https://creativecommons.org/licenses/by-nc/4.0/
Appears in Collections: FEUP - Article in International Conference Proceedings Book

Files in This Item:
File: 64802.pdf
Description: A commodity platform for Distributed Data Mining -- the HARVARD System
Size: 205.59 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.