Yahoo Labs Machine Learning Dataset: Info From 20M Users Now Available to Researchers

A billboard for the technology company Yahoo is seen August 5, 2015 in Washington, D.C. (Karen Bleier/AFP/Getty Images)

Friday, 15 January 2016 11:00 AM EST

Yahoo announced Thursday that it is making its Yahoo Labs machine learning dataset available for academic research through the lab's Webscope program, TechCrunch reported.

The dataset is about 13.5 terabytes and holds the interactions of roughly 20 million users from February 2015 through May 2015, including activity on Yahoo's homepage, Yahoo News, Yahoo Sports, Yahoo Finance, and Yahoo Real Estate, according to TechCrunch.

"Data is the lifeblood of research in machine learning," Suju Rajan, Yahoo's director of personalization science at Yahoo Labs, said in a statement. "However, access to truly large-scale datasets is a privilege that has been traditionally reserved for machine learning researchers and data scientists working at large companies — and out of reach for most academic researchers."

The dataset also contains demographic information such as age range, gender, and generalized geographic data. Items in the dataset include title, summary, and key phrases of the news article in question, local timestamps, and some device information, the tech website noted.

"Research scientists at Yahoo Labs have long enjoyed working on large-scale machine learning problems inspired by consumer-facing products," Rajan said in the Yahoo statement. "This has enabled us to advance the thinking in areas such as search ranking, computational advertising, information retrieval, and core machine learning.

"A key aspect of interest to the external research community has been the application of new algorithms and methodologies to production traffic and to large-scale datasets gathered from real products," she continued.

According to ZDNet.com, the University of California, San Diego's Jacobs School of Engineering plans to use the data to improve research into machine learning, artificial intelligence, and big data applications.

"Access to datasets of this size is essential to design and develop machine learning algorithms and technology that scales to truly 'big' data," Gert Lanckriet, a professor in the department of electrical and computer engineering at the university, said in a statement, according to ZDNet.com.

© 2025 Newsmax. All rights reserved.

