Introduction to Hadoop by Bill Graham (@billgraham)

A very nice introduction to Big Data and Hadoop by Bill Graham (@billgraham).


The UC Berkeley School of Information offers a great course in which UC Berkeley professors and Twitter engineers lecture on the most cutting-edge algorithms and software tools for data analytics as applied to Twitter microblog data. Topics include applied natural language processing algorithms such as sentiment analysis, large-scale anomaly detection, real-time search, information diffusion and outbreak detection, trend detection in social streams, recommendation algorithms, and advanced frameworks for distributed computing.
Bill Graham (@billgraham), who is active in the Hadoop community and a Pig contributor, gave a very clear and detailed intro to Hadoop and outlined how it is used at Twitter. His slides can be found here.

Follow the course at:
UC Berkeley Course Lectures: Analyzing Big Data with Twitter

Nicolas Spiegelberg – Multi-tenant HBase Solutions at Facebook

Nicolas Spiegelberg – Multi-tenant HBase Solutions at Facebook from newthinking on Vimeo.

Facebook first started looking for a distributed OLTP database solution in 2010. We ultimately chose HBase as the best solution for a variety of our workloads. Since then, we have rolled out multiple large production systems using HBase. For example, our current Messages infrastructure runs on HBase and handles over 180 billion person-to-person messages per month. This talk will discuss multiple Facebook projects that are running on HBase now, our selection criteria in choosing HBase as a good fit, and the functionality we added to open source to optimize a growing variety of use cases.

More info: berlinbuzzwords.de/sessions/multi-tenant-hbase-solutions-facebook

Peter Voss – Analyzing Hadoop Source Code with Hadoop

Peter Voss – Analyzing Hadoop Source Code with Hadoop from newthinking on Vimeo.

Using Hadoop-based business intelligence analytics, we analyzed the Hadoop source code and its development over time and found some interesting and fun facts that we want to share with the community. This talk will illustrate text and related analytics with Hadoop on Hadoop to reveal the true hidden secrets of the elephant.
This entertaining session highlights the value of data correlation across multiple datasets and the visualization of those correlations to reveal hidden data relationships.

More info: berlinbuzzwords.de/sessions/analyzing-hadoop-source-code-hadoop

Creative Data Agency from Germany