pero on anything

hadoop

Integrating MySQL and Hadoop – or – A different approach on using CSV files in MySQL

We use both MySQL and Hadoop a lot. If you utilize each system to its strengths, this is a powerful combination. One problem we constantly face is making data extracted from our Hadoop cluster available in MySQL. The problem: Look at this simple example. Let’s say we have a table customer: CREATE [...]

Improve performance on small Hadoop clusters

Hadoop is designed to run on huge clusters containing several hundred machines. But some people just don’t need such a big cluster and can still benefit from HDFS and MapReduce on a smaller scale. We managed to improve the performance of our 10-node test cluster by almost 100% by adjusting the heartbeat intervals. Namenode and [...]

Simulating indexes in Hadoop

You should not try to use Hadoop as a “drop-in” replacement for your current (R)DBMS. That said, it is still possible to utilize the power of cluster computing while circumventing its weaknesses when it comes to ad-hoc or real-time queries. We use Hadoop as an online system tightly integrated with our application and use it [...]

Increasing Performance of Hadoop Unit Tests

Adding a lot of unit tests for our application, which uses Hadoop and its MapReduce engine, significantly increased integration build time. Hadoop comes with a LocalJobRunner, which is used by default, so you do not have to set up a complete cluster in order to run some unit tests. This is great! But the problem is: it [...]