Please use this identifier to cite or link to this item:
https://hdl.handle.net/10216/5255
Author(s): | Sérgio Nunes; Cristina Ribeiro; Gabriel David |
Title: | Using neighbors to date web documents |
Issue Date: | 2007 |
Abstract: | Time has been successfully used as a feature in web information retrieval tasks. In this context, estimating a document's inception date or last update date is a necessary task. Classic approaches have used HTTP header fields to estimate a document's last update time. The main problem with this approach is that it is applicable to a small part of web documents. In this work, we evaluate an alternative strategy based on a document's neighborhood. Using a random sample containing 10,000 URLs from the Yahoo! Directory, we study each document's links and media assets to determine its age. If we only consider isolated documents, we are able to date 52% of them. Including the document's neighborhood, we are able to estimate the date of more than 85% of the same sample. Also, we find that estimates differ significantly according to the type of neighbors used. The most reliable estimates are based on the document's media assets, while the worst estimates are based on incoming links. These results are experimentally evaluated with a real world application using different datasets. |
Subject: | Information technology |
URI: | https://repositorio-aberto.up.pt/handle/10216/5255 |
Source: | WIDM '07: Proceedings of the 9th annual ACM international workshop on Web information and data management |
Document Type: | Article in International Conference Proceedings |
Rights: | restrictedAccess |
License: | https://creativecommons.org/licenses/by-nc/4.0/ |
Appears in Collections: | FEUP - Article in International Conference Proceedings |
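Illustrative note on the abstract: the strategy it describes (use the document's own HTTP header when available, otherwise fall back on its neighbors) can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. The function names `parse_last_modified` and `estimate_date` are hypothetical, and taking the lower median of the neighbors' dates is an assumption made for illustration; the abstract does not state how neighbor dates are combined.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from statistics import median_low
from typing import Iterable, Optional, Tuple


def parse_last_modified(header: Optional[str]) -> Optional[datetime]:
    """Parse an HTTP Last-Modified value; return None if missing or malformed."""
    if not header:
        return None
    try:
        parsed = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    if parsed.tzinfo is None:
        # Treat naive results as UTC so all dates are comparable.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def estimate_date(
    own_header: Optional[str],
    neighbor_headers: Iterable[Optional[str]],
) -> Tuple[Optional[datetime], str]:
    """Return (estimated date, source of the estimate).

    Hypothetical helper: prefers the document's own header, then falls
    back on the lower median of its neighbors' dates (an assumption).
    """
    own = parse_last_modified(own_header)
    if own is not None:
        return own, "document"
    dates = [d for d in map(parse_last_modified, neighbor_headers) if d is not None]
    if not dates:
        return None, "undated"
    return median_low(dates), "neighbors"
```

Here `neighbor_headers` would hold the Last-Modified values collected from a document's media assets, outgoing links, or incoming links. Since the abstract reports that reliability differs by neighbor type, a fuller version would probably keep those groups separate instead of pooling them.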
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
54513.pdf (Restricted Access) | | 5.24 MB | Adobe PDF | Request a copy from the Author(s) |
This item is licensed under a Creative Commons License