Sunday, February 28, 2010

Optimizing db performance

A couple of weeks ago my colleague Daniel asked me tips and info on optimizing db.

With the documentation I have, I found a certain number of techniques, some available on opensource DBMS and  other only on proprietary DBMS.
  • Indexing normaly when you query a table all the table data will be read (full table scan). Indexing data is like indexes in a book. So primary key is a special kind of index, it's the unique identifier for a row on the db. How to build indexes? the answer is not easy, but we must always remember that creating a db is not a project but an ongoing process.
    If for example I want to get all the sales ordered by date I will create an index on date_sales field with descending sort order.
    In this way retrieving last week's sales won't be slow.
  • Bitmap indexing this is a sepcial form of index (unfortunatly  not available on MySql) normaly used in columns with a low number of unique values gender, color, ...). The idea is to create for the distinct number of the values in the column, a bit projection is created. This enable faster access to the indexed values.
  • Partitioning introduced in MySql 5.1 it's the capibility  to slice a table into multiple physical pieces. For example a large table containing sales data can be partioned by year so that queries for a given year will be performed only on the portion of the data for the requested year.
  • ...
 This is only a starting point, in future I will investigate Change Data Capture
techniques, use of products such as Kettle, Mondrian, ...
Good eveningRinaldo

Saturday, February 27, 2010

Nested Sets

It is often required to have tree structures, even if only to manage categories. A tree structure is a tree consisting of a single or multiple root which each node can have multiple nodes son, and that with infinite depth. 
Generally the method used to do this is to have a unique identifier for each node of the tree, and insert the identifier of the father node in the registration of the node.
The disadvantage of this technique is that one is obliged to make a SQL query for each node to retrieve all the son of a node. We can improve that by downloading all the nodes in the database, and rebuilding the tree by hand using an array indexed by parent node.
This technique is not really suitable for very large tree structures of several thousands of nodes.
So why don't trying by represent the tree differently, not as a tree, but as  Nested Sets.

The Nested Set is an adjacent list realized in SQL as a tree.

For example having the following tree structure:


We can represent it as a set



The SQL for create the table is :


CREATE TABLE `tree` (
`id` INT NOT NULL AUTO_INCREMENT ,
`name` VARCHAR( 32 ) NOT NULL ,
`parentid` INT NOT NULL ,
`lft` INT NOT NULL ,
`rgt` INT NOT NULL ,
PRIMARY KEY ( `id` ))

Selection of all nodes is very simple:
SELECT
  n0.id
FROM
  tree n0,
  tree n1
WHERE
    n1.id = 0
AND n0.lft BETWEEN n1.lft AND n1.rgt;


...
ASAP I will wrote a Java classes to maintain Nested Sets

keep in touch 
Rinaldo

references: http://www.mti.epita.fr/blogs/2008/07/06/avoir-une-structure-arborescente-en-sql/




  

Friday, February 26, 2010

OpenERP Eclipse XML (view) Template

OpenERP's XML based view templates to create view, wizard, report with predefined templates. 
Watch the video

Thursday, February 25, 2010

A PHP/Java Bridge

For those who want to integrate applications written in php with java I report a series of two articles. By working on the jAPS project I have to investigate about integrating a platform for e-learning written in PHP and the jAPS Intranet platform. Definitely a good basis to resume analysis

Monday, February 22, 2010

Customising the Pentaho User Console

For those who use the Pentaho User Console and want to customise it, I suggest reading the series of post from  Prashant Raju

Rinaldo