Please use this identifier to cite or link to this item: https://hdl.handle.net/10216/70210
Author(s): Sérgio Nunes
Cristina Ribeiro
Gabriel David
Title: Term frequency dynamics in collaborative articles
Issue Date: 2010
Abstract: Documents on the World Wide Web are dynamic entities. Mainstream information retrieval systems and techniques are primarily focused on the latest version a document, generally ignoring its evolution over time. In this work, we study the term frequency dynamics in web documents over their lifespan. We use the Wikipedia as a document collection because it is a broad and public resource and, more important, because it provides access to the complete revision history of each document. We investigate the progression of similarity values over two projection variables, namely revision order and revision date. Based on this investigation we find that term frequency in encyclopedic documents - i.e. comprehensive and focused on a single topic - exhibits a rapid and steady progression towards the document's current version. The content in early versions quickly becomes very similar to the present version of the document.
Subject: Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
Scientific areas: Ciências da engenharia e tecnologias::Engenharia electrotécnica, electrónica e informática
Engineering and technology::Electrical engineering, Electronic engineering, Information engineering
URI: https://hdl.handle.net/10216/70210
Source: DOCENG2010: PROCEEDINGS OF THE 2010 ACM SYMPOSIUM ON DOCUMENT ENGINEERING
Document Type: Artigo em Livro de Atas de Conferência Internacional
Rights: openAccess
License: https://creativecommons.org/licenses/by-nc/4.0/
Appears in Collections:FEUP - Artigo em Livro de Atas de Conferência Internacional

Files in This Item:
File Description SizeFormat 
60248.pdf362.03 kBAdobe PDFThumbnail
View/Open


This item is licensed under a Creative Commons License Creative Commons