Tuesday, March 15, 2011

Setting up a Hadoop development environment

From http://www.techrepublic.com
Hadoop is a platform for performing distributed computing. That’s easy enough to understand, right? There are some add-ons for things such as distributed file storage and distributed database access, but at the heart of it, Hadoop is a processing platform that partitions the work across multiple machines in a cluster.

Friday, March 11, 2011

Is BI Really for Everyone?

An interesting article on blog.technologyevaluation; read it.

RainStor 4.5 For Big Data


RainStor has announced the next generation of its online data repository, adding data deduplication capabilities and improved optimization for storing computer-generated historical data. Able to run on a storage area network (SAN) or network-attached storage (NAS) system as a repository for structured data, RainStor 4.5 is aimed at capturing and then serving up online transaction processing (OLTP) data sets, user log data and metadata. Computerworld reports that the software comes with an RDF interface to automatically join data from relational databases to the repository.

Implementing MapReduce with Akka and Jython

The actor model is commonly used to implement concurrent systems. A recent representative of such systems is MapReduce.
Read how Saeki tries to implement a MapReduce system using Akka.
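
To give a rough feel for the idea, here is a minimal sketch of a word count done in MapReduce style with actor-like workers. It is not Saeki's implementation and does not use Akka's API; it assumes plain Python threads and queues from the standard library as a stand-in for actors and their mailboxes. The names MapperActor and word_count are hypothetical, chosen only for this illustration.

```python
import queue
import threading
from collections import Counter


class MapperActor(threading.Thread):
    """Hypothetical actor: owns a mailbox and handles one message at a time."""

    def __init__(self, reducer_mailbox):
        super().__init__()
        self.mailbox = queue.Queue()          # messages sent to this actor
        self.reducer_mailbox = reducer_mailbox

    def run(self):
        while True:
            line = self.mailbox.get()
            if line is None:                  # stop message
                break
            # Map step: emit partial word counts for one line of input.
            self.reducer_mailbox.put(Counter(line.lower().split()))


def word_count(lines, num_mappers=2):
    """Count words by spreading lines over mapper actors, then merging."""
    reducer_mailbox = queue.Queue()
    mappers = [MapperActor(reducer_mailbox) for _ in range(num_mappers)]
    for mapper in mappers:
        mapper.start()

    # Partition the input round-robin across the mapper actors.
    for i, line in enumerate(lines):
        mappers[i % num_mappers].mailbox.put(line)

    # Tell every mapper to stop, then wait until all of them have finished.
    for mapper in mappers:
        mapper.mailbox.put(None)
    for mapper in mappers:
        mapper.join()

    # Reduce step: merge the partial counts into one total.
    totals = Counter()
    while not reducer_mailbox.empty():
        totals += reducer_mailbox.get()
    return dict(totals)
```

Calling word_count(["to be or not", "to be"]) returns a dictionary mapping each word to its total count. The mappers share no state and talk only through messages, which is the property that makes the actor model a natural fit for MapReduce.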