Showing posts with label DBMS. Show all posts
Showing posts with label DBMS. Show all posts

Friday, April 15, 2011

Setting up a Hadoop development environment

Hadoop is a platform for performing distributed computing. That's easy enough to understand, right? There are some add-ons for things such as distributed file storage and distributed database access, but at the heart of it, Hadoop is a processing platform that partitions the work across multiple machines in a cluster.
Read this post for setting it up.

Saturday, April 10, 2010

Intercept null value on a query

These last days working to create a Pentaho report with openERP data, I came across a serious problem:
if a query return a null value arithmetic operations are in error. The question was how to intercept the null value, surfing on Google I found this
SELECT IFNULL (1,0) this query return 1
SELECT IFNULL(null, 0) this query return 0
Fine my problem was solved NO!!  openERP use postgresql and postgresql doesn't support IFNULL
Again surfing on Google and I found COALESCE
The postgresql manual say:
The COALESCE function returns the first arguments of ITS THAT IS NOT NULL. Returned is null only if all arguments are null. Often it is used to substitute a default value for null values 

Finally my problem was solved my query now  is something like this
SELECT COALESCE(sum(aal.amount * -1) , 0) FROM account_analytic_line AS aal WHERE (aal.account_id=83) and aal.journal_id = 3

Rinaldo

Sunday, February 28, 2010

Optimizing db performance

A couple of weeks ago my colleague Daniel asked me tips and info on optimizing db.

With the documentation I have, I found a certain number of techniques, some available on opensource DBMS and  other only on proprietary DBMS.
  • Indexing normaly when you query a table all the table data will be read (full table scan). Indexing data is like indexes in a book. So primary key is a special kind of index, it's the unique identifier for a row on the db. How to build indexes? the answer is not easy, but we must always remember that creating a db is not a project but an ongoing process.
    If for example I want to get all the sales ordered by date I will create an index on date_sales field with descending sort order.
    In this way retrieving last week's sales won't be slow.
  • Bitmap indexing this is a sepcial form of index (unfortunatly  not available on MySql) normaly used in columns with a low number of unique values gender, color, ...). The idea is to create for the distinct number of the values in the column, a bit projection is created. This enable faster access to the indexed values.
  • Partitioning introduced in MySql 5.1 it's the capibility  to slice a table into multiple physical pieces. For example a large table containing sales data can be partioned by year so that queries for a given year will be performed only on the portion of the data for the requested year.
  • ...
 This is only a starting point, in future I will investigate Change Data Capture
techniques, use of products such as Kettle, Mondrian, ...
Good eveningRinaldo