Welcome to our Data Blog

Data Without Limits

Many Simple Models over One Complicated Model

Jurgen Appelo has an interesting theory: when people have invested time and energy in a model (tool, framework, method), they tend to make it more and more complicated. “Let’s add another dimension.” “Let’s deepen the domains.” “Let’s add some columns or swim lanes.” “Let’s draw an extra diagram.”

The main approach to solving Big Data challenges is to strip the complexity out of the data sets.

Complexity itself is anti-methodology. It is against “one size fits all.”
- Tom Petzinger, Interaction of Complexity and Management

This means it makes more sense to use multiple simple models than one complicated model. A toolkit of methods and frameworks, each of which fails in its own way, is a smarter approach than relying on a single method or framework to handle every situation.

Read more on:

Jurgen Appelo’s blog noop.nl

The Pragmatic Definition of Big Data by Mike Gualtieri

Mike Gualtieri says: forget about the three Vs.

Big data is not defined by how you can measure data in terms of volume, velocity, and variety. The three Vs are just measures of data: how much, how fast, and how diverse. A quaint definition of big data, to be sure, but not an actionable, complete definition for IT and business professionals. A more pragmatic definition of big data must acknowledge that:

  • Exponential data growth makes it continuously difficult to manage — store, process, and access.
  • Data contains nonobvious information that firms can discover to improve business outcomes.
  • Measures of data are relative; one firm’s big data is another firm’s peanuts.

A pragmatic definition of big data must be actionable for both IT and business professionals.

The Definition Of Big Data

Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.

To remember the pragmatic definition of big data, think SPA — the three questions of big data:

  • Store. Can you capture and store the data?
  • Process. Can you cleanse, enrich, and analyze the data?
  • Access. Can you retrieve, search, integrate, and visualize the data?

Hear me explain this definition on a special episode of Forrester TechnoPolitics: The Pragmatic Definition of Big Data Explained

Read more on: Forrester Blogs

Collaboration Is the Future, According to Amazon Web Services Chief Data Scientist Matt Wood

Once data makes its way to the cloud, it opens up entirely new methods of collaboration where researchers or even entire industries can access and work together on shared datasets too big to move around. “This sort of data space is something that’s becoming common in fields where there are very large datasets,” Wood said, citing as an example the 1000 Genomes project dataset that AWS houses.

(Figure: DNAnexus’s cloud-based architecture)

The genetics space is drooling over the promise of cloud computing. The 1000 Genomes database is only 200TB, Wood explained, but very few project leads could get the budget to store that much data and make it accessible to their peers, much less the computing power required to process it. And even in fields such as pharmaceuticals, Amazon CTO Werner Vogels told me during an earlier interview, companies are using the cloud to collaborate on certain datasets so they don’t have to spend time and money reinventing the wheel.
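How big is “too big to move around”? A quick back-of-the-envelope calculation helps. The sketch below is our own illustration, not something from Wood or the 1000 Genomes project: it assumes decimal units (1 TB = 10^12 bytes), a dedicated link running flat out at its nominal speed, and no protocol overhead. The 1 and 10 Gbit/s link speeds are hypothetical examples.

```python
# Rough estimate of how long it takes to copy a dataset over a network link.
# Assumptions (illustrative only): decimal units (1 TB = 10**12 bytes),
# a dedicated link at its full nominal rate, and no protocol overhead.

SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, link_gbps: float) -> float:
    """Return the days needed to send size_tb terabytes over a link_gbps link."""
    size_bits = size_tb * 10**12 * 8            # terabytes -> bits
    seconds = size_bits / (link_gbps * 10**9)   # bits / (bits per second)
    return seconds / SECONDS_PER_DAY            # seconds -> days


if __name__ == "__main__":
    for gbps in (1, 10):
        print(f"200 TB over {gbps} Gbit/s: {transfer_days(200, gbps):.1f} days")
```

Even under these optimistic assumptions, copying 200TB over a 1 Gbit/s link takes roughly 18.5 days (about 1.9 days at 10 Gbit/s), which is why bringing the computation to one shared copy in the cloud is so attractive.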

Please continue reading the article on gigaom.com.

Let’s Talk Your Data, Not Big Data

‘Big data’ is a buzzword that has gone in and out of popularity since data was measured in megabytes. Unfortunately, the sheer scale of its popularity in the current boom is doing some serious harm. Too many people are getting distracted by the ‘Big’ excitement and are only adding more friction on the way to their actual goal: analysis. A recent Microsoft Research investigation facetiously titled ‘No one ever got…

Read more: InnovationInsights from WIRED (the article was posted by Dave Fowler)

Introduction to Hadoop by Bill Graham (@billgraham)

A very nice introduction to Big Data and Hadoop by Bill Graham (@billgraham).


The UC Berkeley School of Information has a great course in which UC Berkeley professors and Twitter engineers lecture on the most cutting-edge algorithms and software tools for data analytics as applied to Twitter microblog data. Topics include applied natural language processing algorithms such as sentiment analysis, large-scale anomaly detection, real-time search, information diffusion and outbreak detection, trend detection in social streams, recommendation algorithms, and advanced frameworks for distributed computing.
Bill Graham (@billgraham), who is active in the Hadoop community and a Pig contributor, gave a very clear and detailed intro to Hadoop and outlined how it is used at Twitter. His slides can be found here.
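For readers new to Hadoop, the programming model at its core is MapReduce: a map step emits key/value pairs, the framework groups (shuffles) them by key, and a reduce step aggregates each group. The sketch below is a toy, single-process Python illustration of that model using the classic word-count example. It does not use Hadoop's API, and the sample “tweets” are made up.

```python
# Toy, in-memory illustration of the MapReduce model that Hadoop implements.
# This is NOT Hadoop's API; it only mimics the map -> shuffle -> reduce phases.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce over the input lines and return word counts."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    tweets = ["big data at twitter", "hadoop at twitter", "big hadoop"]
    print(word_count(tweets))
```

On a real Hadoop cluster the same three phases run in parallel across many machines, with the framework handling data distribution, the shuffle, and machine failures.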

Follow the course on:
UC Berkeley Course Lectures: Analyzing Big Data with Twitter

Big Data comes to Munich, with Keynote Speaker Philippe Souidi

Yesterday in Munich IBM held its first SmartCamp event in Germany. It was also the first SmartCamp with a specific focus on Big Data and Business Analytics.

Keynote Speaker Philippe Souidi, Founder of echofy.me, summarized this topic perfectly when he called Big Data the “Oil of the next Century”… fitting, isn’t it?

The Gate Garching, host of the event and a Munich Technology and Entrepreneur Center, was the perfect venue for sharing ideas about the next generation of cutting-edge startups: it is home to several innovative young companies and sits close to the campus of the Technical University of Munich.

Let’s learn a little more about the startups who participated. Three Big Data and Analytics startups received intensive mentoring from 15 mentors representing different backgrounds, industries, and perspectives. Mentors included VCs, angels, serial entrepreneurs and industry experts, all of whom shared a common interest in Smarter Analytics.

SmartCamp Participants:

Celonis Software Solutions is the leading vendor for the analysis of operational process data created by IT systems. Their unique analysis technology, Process Business Intelligence, enables customers to intuitively dive into their process data and use it to improve their operational performance.

HoneyTracks provides the deepest analytics solution for the monetization of online games, helping game companies use big data to understand the success factors of their games and how those games generate revenue.

JouleX Energy Manager Solutions reduces energy costs by up to 60% by monitoring, analyzing and managing the energy usage of all network-connected devices and systems, without the use of costly and unwieldy agents.

And the winner is… JouleX!

As expected, all the teams were very strong, but the judges selected JouleX as the winner. JouleX leverages big data to create business intelligence, aggregating and correlating energy information from all IP-enabled devices to provide unprecedented visibility into the energy consumption and utilization of those devices across distributed office, data center and facilities environments. JouleX takes this a step further by applying advanced analytics to identify energy, cost, and carbon savings opportunities, and by providing a management platform to implement policies that realize these savings.

JouleX has offices in Germany, the US and Japan and is headed by Tom Noonan, previously CEO of Internet Security Systems (ISS), which was acquired by IBM for $1.5 billion. We look forward to working with them in the coming months.

Congratulations again to JouleX, Celonis and HoneyTracks, an impressive set of Analytics and Big Data startups to kick off the very first SmartCamp in Germany!

A very big thanks also to all our partners, who were key to making the event such a success.

Read more on: IBM Smart Camp

Peter Voss (Datameer) interviewed by tecpunk

Peter Voss Datameer from newthinking on Vimeo.

Peter Voss is CTO at Datameer, with extensive experience in the software engineering and architecture of large-scale data processing. His focus has been largely on UNIX-based enterprise systems, with an extensive background in Java, Spring, Hadoop, Lucene and Eclipse plug-in development.

Prior to Datameer, Peter consulted on a number of big data business intelligence projects with companies such as EMI Music and Krugle. Earlier, he was architect and developer for Deutsche Post and its ePost project, a distributed production system that processed more than 1 billion letters per year. Peter studied biology and holds a Diplom (equivalent to a master's degree) in biochemistry and bioinformatics from the University of Köln.

Recorded at Berlin Buzzwords 2012.
More at berlinbuzzwords.de

Produced by Alexander Oelling and Philippe Souidi.

Nicolas Spiegelberg – Multi-tenant HBase Solutions at Facebook

Nicolas Spiegelberg – Multi-tenant HBase Solutions at Facebook from newthinking on Vimeo.

Facebook first started looking for a distributed OLTP database solution in 2010. We ultimately chose HBase as the best solution for a variety of our workloads. Since then, we have rolled out multiple large production systems using HBase. For example, our current Messages infrastructure runs on HBase and handles over 180 billion person-to-person messages per month. This talk will discuss multiple Facebook projects that are running on HBase now, our selection criteria in choosing HBase as a good fit, and the functionality we added to open source to optimize a growing variety of use cases.
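To get a feel for the scale of 180 billion messages per month, it helps to convert that figure into an average rate. This is our own rough arithmetic, not a number from the talk: it assumes a 30-day month and perfectly even traffic, so real peak load would be considerably higher than the average.

```python
# Back-of-the-envelope conversion of a monthly message count into an average rate.
# Assumptions: a 30-day month and evenly spread traffic (real load has daily peaks).

SECONDS_PER_DAY = 86_400


def average_rate_per_second(messages_per_month: float, days_per_month: int = 30) -> float:
    """Average messages per second implied by a monthly total."""
    return messages_per_month / (days_per_month * SECONDS_PER_DAY)


if __name__ == "__main__":
    rate = average_rate_per_second(180e9)
    print(f"~{rate:,.0f} messages per second on average")
```

The result is roughly 69,000 messages per second, sustained around the clock, before accounting for peaks.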

More info: berlinbuzzwords.de/sessions/multi-tenant-hbase-solutions-facebook

Peter Voss – Analyzing Hadoop Source Code with Hadoop

Peter Voss – Analyzing Hadoop Source Code with Hadoop from newthinking on Vimeo.

Using Hadoop based business intelligence analytics, we analyzed the Hadoop source code and its development over time and found some interesting and fun facts we want to share with the community. This talk will illustrate text and related analytics with Hadoop on Hadoop to reveal the true hidden secrets of the elephant.
This entertaining session highlights the value of data correlation across multiple datasets and the visualization of those correlations to reveal hidden data relationships.

More Info: berlinbuzzwords.de/sessions/analyzing-hadoop-source-code-hadoop

Creative Data Agency from Germany