Never Ending Language metaLearning: model management for CMU's ReadTheWeb project

Tiago Miguel Martins Vieira

doi:10.34626/aghe-df40

Please use this identifier to cite or link to this item: https://hdl.handle.net/10216/110259

Author(s):	Tiago Miguel Martins Vieira
Title:	Never Ending Language metaLearning: model management for CMU's ReadTheWeb project
Issue Date:	2015-07-20
Abstract:	The main goal of CMU's ReadTheWeb project is to build a new kind of machine learning system that continuously reads the web, 24 hours per day, 7 days per week. This system is called the Never-Ending Language Learner (NELL). While this goal is not unprecedented, NELL stands out for being able to improve the way it learns over time; that is, it learns to read the web better than it did the day before. To succeed in this arduous quest, NELL combines several subsystem components that implement complementary knowledge extraction methods, so it can apply different extraction methods to the same task. The performance of the components that use these methods, that is, the quality of the extracted knowledge, will however change over time. To maximize the performance of the system as a whole, it becomes necessary to choose the best component for a task at any given time. Due to the amount of data and the number of algorithms involved, traditional testing and selection methods are not a viable option. A preliminary approach that uses metalearning to address this issue was proposed by Santos. In this project, we extend that work. Our approach seeks to relate the innate (meta)features of the data to the performance of the algorithms. A first step will be to gather different sets of data (used in NELL) and test the performance of the aforementioned subsystem components on them. The results are then used to create a metalearning system that can select the best algorithm for future sets of data. If it proves successful, this system can then be implemented in NELL's framework to improve its learning capability.
Description:	The main goal of CMU's ReadTheWeb project is to build a new kind of learning system that reads the web continuously, 24 hours per day, 7 days per week. This system is called the "Never-Ending Language Learner" (NELL). Although this goal is not necessarily new, NELL stands out for being able to improve the way it learns over time, which is to say that it reads the web better today than it did the day before. To succeed in this arduous task, NELL combines several subsystem components that implement complementary knowledge extraction methods. For the same task, NELL is able to use different extraction methods. The performance of the components that use these methods, that is, the quality of the extracted knowledge, will vary over time. To maximize the performance of the system as a whole, it becomes necessary to choose, at any given moment, the best component for each task. Due to the large amount of information and the number of algorithms involved in the process, traditional testing and selection methods are not viable. A preliminary approach using metalearning to tackle this problem has already been proposed by Santos. This project sets out to extend that work. Our approach aims to relate the (meta)features of the data to the performance of the algorithms. A first step will be to collect different sets of data (used in NELL) and test the performance of the aforementioned subsystem components on that same data. The results obtained will be used to create a metalearning system capable of selecting the best algorithm for future sets of data. If successful, this system can then be implemented in NELL's system to improve its learning capabilities.
Subject:	Electrical engineering, Electronic engineering, Information engineering
Scientific areas:	Engineering and technology::Electrical engineering, Electronic engineering, Information engineering
DOI:	10.34626/aghe-df40
TID identifier:	201322269
URI:	https://hdl.handle.net/10216/110259
Document Type:	Dissertation
Rights:	openAccess
Appears in Collections:	FEUP - Dissertação
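
The selection idea described in the abstract (relate metafeatures of a dataset to the observed performance of each component, then pick a component for unseen data) can be illustrated with a minimal sketch. This is not the method used in the dissertation: the metafeatures (log size, mean, standard deviation), the nearest-neighbour meta-learner, the component names `component_a` and `component_b`, and all scores below are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def metafeatures(values: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical metafeatures of a dataset: log size, mean, std deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return (math.log(n), mean, math.sqrt(variance))


def best_component(scores: Dict[str, float]) -> str:
    """Name of the component with the highest measured score."""
    return max(scores, key=scores.get)


class NearestNeighbourMetaLearner:
    """Recommends the component that won on the most similar known dataset."""

    def __init__(self) -> None:
        self._examples: List[Tuple[Tuple[float, ...], str]] = []

    def fit(
        self,
        datasets: Sequence[Sequence[float]],
        performances: Sequence[Dict[str, float]],
    ) -> "NearestNeighbourMetaLearner":
        # Meta-level training set: (metafeatures, winning component) pairs.
        self._examples = [
            (metafeatures(data), best_component(scores))
            for data, scores in zip(datasets, performances)
        ]
        return self

    def predict(self, dataset: Sequence[float]) -> str:
        # Pick the label of the closest stored example in metafeature space.
        query = metafeatures(dataset)
        _, label = min(self._examples, key=lambda ex: math.dist(ex[0], query))
        return label


# Made-up datasets and made-up scores, purely illustrative.
known_datasets = [
    [0.1, 0.2, 0.1, 0.3],
    [5.0, 7.0, 6.0, 9.0, 8.0, 7.5],
]
known_performances = [
    {"component_a": 0.9, "component_b": 0.6},
    {"component_a": 0.5, "component_b": 0.8},
]

learner = NearestNeighbourMetaLearner().fit(known_datasets, known_performances)
recommendation_small = learner.predict([0.2, 0.1, 0.2])
recommendation_large = learner.predict([6.0, 8.0, 7.0, 9.0])
```

The point of the design is that, once the meta-level examples exist, recommending a component for a new dataset only requires computing its metafeatures, not rerunning every component on it.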

Files in This Item:

File	Description	Size	Format
35414.pdf	Never Ending Language Metalearning: model management for CMU's ReadTheWeb project	4.09 MB	Adobe PDF
