Friday, February 19, 2010

Cassandra & Distributed File System

Unlike HBase, which stores its data in HDFS, Cassandra has no dependency on a distributed file system. In a sense, however, its partitioning strategy takes over the role that would otherwise require a hard dependence on distributed file system capabilities.
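To make the idea of a partitioned strategy concrete, here is a minimal Python sketch of consistent-hash partitioning on a token ring. It is loosely modelled on an MD5-based partitioner with replicas placed on the next nodes clockwise, but it is not Cassandra's code. The names `ring_position` and `TokenRing` are hypothetical, and hashing node names to pick their ring positions is an assumption made for simplicity.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Hash a key (or a node name) onto a 128-bit ring using MD5."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical token ring: a key belongs to the first node whose
    position is >= the key's position, wrapping around at the end."""

    def __init__(self, nodes, replication_factor=3):
        # Assumption: node positions come from hashing node names.
        self.ring = sorted((ring_position(n), n) for n in set(nodes))
        self.positions = [pos for pos, _ in self.ring]
        # Cannot keep more replicas than there are nodes.
        self.rf = min(replication_factor, len(self.ring))

    def replicas(self, key: str):
        """Return the owner of `key` followed by the next rf - 1 nodes clockwise."""
        start = bisect.bisect_left(self.positions, ring_position(key))
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.rf)]
```

For example, `TokenRing(["node-a", "node-b", "node-c", "node-d"]).replicas("user:42")` returns three of the four node names. Because every node can compute the same placement from the key alone, locating data needs no shared file system.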

This statement should not be taken as an endorsement of either system. It is simply an architectural observation that has a direct impact on the design of the overall system.

Monday, February 01, 2010

Hadoop Combiners and Map-Reduce

According to the Hadoop documentation, a Hadoop Combiner can be used to speed up Hadoop Map-Reduce if the Reduce function is both commutative and associative. It would be interesting to write a program that introspects a Hadoop Reduce function to decide whether it satisfies this constraint, and therefore whether it can safely be used as a Combiner, but doing so would not be trivial.
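A combiner runs the reduce logic on each map task's local output before the shuffle, and Hadoop may run it zero, one, or several times, so the final result must not depend on how values were grouped or ordered. The Python sketch that follows illustrates the constraint without Hadoop. The reducers, `reduce_with_combiner`, and `combiner_safe` are hypothetical names, keys are dropped for brevity, and instead of introspecting code (the hard part) it swaps in randomized testing, which can find a counterexample but can never prove a reducer safe.

```python
import math
import random


def sum_reducer(values):
    # Commutative and associative: safe to use as a combiner.
    return sum(values)


def max_reducer(values):
    # Also commutative and associative: safe to use as a combiner.
    return max(values)


def mean_reducer(values):
    # Not associative: the mean of partial means is generally not the overall mean.
    return sum(values) / len(values)


def reduce_with_combiner(reducer, splits):
    """Simulate a combiner: reduce each map task's split locally,
    then reduce the partial results in the final reduce step."""
    partials = [reducer(split) for split in splits]
    return reducer(partials)


def combiner_safe(reducer, trials=200, seed=0):
    """Randomized check (not a proof): compare reducing all values at once
    with reducing a shuffled, randomly cut copy through a combiner."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-100, 100) for _ in range(rng.randint(2, 12))]
        shuffled = values[:]
        rng.shuffle(shuffled)
        cut = rng.randint(1, len(shuffled) - 1)
        combined = reduce_with_combiner(reducer, [shuffled[:cut], shuffled[cut:]])
        if not math.isclose(combined, reducer(values), rel_tol=1e-9, abs_tol=1e-9):
            return False  # counterexample found
    return True  # no counterexample found in `trials` attempts


if __name__ == "__main__":
    for r in (sum_reducer, max_reducer, mean_reducer):
        print(r.__name__, combiner_safe(r))
```

For instance, `reduce_with_combiner(mean_reducer, [[1], [2, 3]])` gives 1.75 while the true mean of 1, 2, 3 is 2. A common fix for averages is to have the combiner emit (sum, count) pairs and divide only in the final reduce.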