Pérez Sechi, Carlos Ignacio (2021) Leveraging entities knowledge to bypass the Cold-Start recommender problem on Microsoft News Dataset. [Master's Thesis]
PDF (4MB). License: Creative Commons Attribution.
Abstract
Online news has been a hot topic since 2002, when the New York Times published its news RSS feed (Doree, 2007). At that time, users subscribed to the feed using their Netscape Navigator browser and received a daily update of the headlines published in the newspaper. The feed cost the user nothing and brought the newspaper no revenue. Nowadays, user engagement with services is one of the most profitable assets for information companies, so they pay closer attention to what they show their users.
To keep users engaged, news aggregators offer them relevant information based on their interests. In the early days, companies gathered preferences by asking users to name the specific sources they wanted to read. Later, aggregators asked users about the kinds and features of information that interested them. Most current systems, however, do not ask users for explicit information but model their behavior from their navigation history.
Recommendations are built on top of user interest models by matching interests with news features. This matching was first done by exact classification match and has become progressively fuzzier, up to the stickiness probabilities used today. The characterization of users and news has come a long way from manual classification to the machine learning classification techniques used here, improving the recommendations users receive. This study dives into the Microsoft News Dataset (Wu et al., 2020) and analyzes the behavior of its users. The main objective was to predict whether a user will click on a news item given the news that user clicked previously. This prediction helps any news aggregator portal show the news most relevant to the user's interests, saving the user time, reducing the resources the portal consumes, and, most importantly, improving its engagement score.
Showing users the most relevant news is a cold-start problem: portals cannot gather enough collaborative information about news articles before they become obsolete. News is a volatile asset that depreciates dramatically as time passes. An article does not live long enough to accumulate the feedback from the user community needed to build a collaborative filtering (CF) score. Therefore, this technique is excluded from the hypothesis described above. Instead, feature engineering and feature inference were used to predict the importance of a news article for a user.
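As a rough illustration of how content features can stand in for collaborative signals, the sketch below scores a brand-new article by how much its entities overlap with the entities in a user's click history. It is not the thesis's method: the function names, the entity IDs, and the scoring rule are all assumptions chosen for brevity.

```python
from collections import Counter


def build_user_profile(clicked_entities):
    """Count in how many clicked articles each entity appears."""
    profile = Counter()
    for entities in clicked_entities:
        profile.update(set(entities))  # count each entity once per article
    return profile


def entity_overlap_score(profile, candidate_entities):
    """Share of the profile's total weight carried by the candidate's entities."""
    total = sum(profile.values())
    if total == 0:
        return 0.0  # no click history: nothing to match against
    matched = sum(profile[e] for e in set(candidate_entities))
    return matched / total


# Hypothetical click history: one list of entity IDs per clicked article.
history = [["Q_nba", "Q_lakers"], ["Q_nba", "Q_playoffs"], ["Q_weather"]]
profile = build_user_profile(history)

# Two fresh articles that have received no clicks yet.
fresh_sports = ["Q_nba", "Q_lakers"]
fresh_finance = ["Q_stocks"]

sports_score = entity_overlap_score(profile, fresh_sports)
finance_score = entity_overlap_score(profile, fresh_finance)
```

Because the score depends only on the article's own entities and the user's past clicks, it can be computed the moment an article is published, without waiting for community feedback. A real system would likely feed such a score into a trained classifier as one feature among many rather than rank on it directly.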
Item Type: | Master's Thesis |
---|---|
Directors: | Portela García-Miguel, Javier (jportela@ucm.es) |
Uncontrolled Keywords: | Microsoft News Dataset, Databases, News |
Subjects: | Sciences > Statistics; Sciences > Statistics > Social sciences research |
Master's Degree Title: | Máster en Minería de Datos e Inteligencia de Negocios |
ID Code: | 66309 |
Deposited On: | 22 Jun 2021 10:07 |
Last Modified: | 22 Jun 2021 11:59 |