Enable javascript in your browser for better experience. Need to know to enable it? Go here.

Big Data envy

Last updated : Mar 29, 2017
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar. Understand more
Mar 2017
Hold ?

We continue to see organizations chasing "cool" technologies, taking on unnecessary complexity and risk when a simpler choice would be better. One particular theme is using distributed, Big Data systems for relatively small data sets. This behavior prompts us to put Big Data envy on hold once more, with some additional data points from our recent experience. The Apache Cassandra database promises massive scalability on commodity hardware, but we have seen teams overwhelmed by its architectural and operational complexity. Unless you have data volumes that require a 100+ node cluster, we recommend against using Cassandra. The operational team you'll need to keep the thing running just isn't worth it. While creating this edition of the Radar, we discussed several new database technologies, many offering "10x" performance improvements over existing systems. We're always skeptical until new technology—especially something as critical as a database—has been properly proven. Jepsen provides analysis of database performance under difficult conditions and has found numerous bugs in various NoSQL databases. We recommend maintaining a healthy dose of skepticism and keeping an eye on sites such as Jepsen when you evaluate database tech.

Nov 2016
Hold ?

We continue to see organizations chasing "cool" technologies, taking on unnecessary complexity and risk when a simpler choice would be better. One particular theme is using distributed, Big Data systems for relatively small data sets. This behavior prompts us to put Big Data envy on hold once more, with some additional data points from our recent experience. The Apache Cassandra database promises massive scalability on commodity hardware, but we have seen teams overwhelmed by its architectural and operational complexity. Unless you have data volumes that require a 100+ node cluster, we recommend against using Cassandra. The operational team you’ll need to keep the thing running just isn’t worth it. While creating this edition of the Radar, we discussed several new database technologies, many offering "10x" performance improvements over existing systems. We’re always skeptical until new technology—especially something as critical as a database—has been properly proven. Jepsen provides analysis of database performance under difficult conditions and has found numerous bugs in various NoSQL databases. We recommend maintaining a healthy dose of skepticism and keeping an eye on sites such as Jepsen when you evaluate database tech.

Apr 2016
Hold ?

While we've long understood the value of Big Data to better understand how people interact with us, we've noticed an alarming trend of Big Data envy : organizations using complex tools to handle "not-really-that-big” Data. Distributed map-reduce algorithms are a handy technique for large data sets, but many data sets we see could easily fit in a single-node relational or graph database. Even if you do have more data than that, usually the best thing to do is to first pick out the data you need, which can often then be processed on such a single node. So we urge that before you spin up your clusters, take a realistic assessment of what you need to process, and if it fits—maybe in RAM—use the simple option.

Published : Apr 05, 2016

Download the PDF

 

 

 

English | Español | Português | 中文

Sign up for the Technology Radar newsletter

 

Subscribe now

Visit our archive to read previous volumes