Wednesday, January 29, 2014

Move Column in Datastore

When working with ABT tables, the obstacles we face are a little different than regular DWH problems. Fact tables in a DWH are to be thin and long whereas we really don’t have that luxury to decide whether the table’s going to be fat or thin. Especially when the data size is “big”, it gets impossible to offer several ABT tables since the performance of other tools on joining “big” tables isn’t that great.

To the topic… We call a table fat, when the number of columns are a lot. By a lot, I mean more than 100. Imagine you need to add a new column, after the 10th (yeah, analysts can have weird requests) column. Now that task is time consuming, but doable. Edit a datastore, add the column and start clicking on move up button.  Well, that’s doable, but there should be an easier way to do this. I say more fun and reusable. Our dear ODI SDK comes into play.

Since 2011, several people posted the use of ODI SDK, but the number of examples out there is not that many. The idea is mostly the same with all of them, yet there are some blank points that can cause problems. There’s a move function in OdiColumn class. You can get the position of a column and set it (move it). But did you know that you cannot change the position of a column by move function if the file has fixed-length columns? For a fixed length file, the information is hold within FileDescriptor of each OdiColumn. The move column function in OdiColumn class sets the position of a column very well, but for fixed length files, the position is not defined by the “position” of the OdiColumn. It’s the FileDescriptor of the OdiColumn. If you try to getPosition() of the columns in such a datastore, you will get 0. If you add a column to the end as default, the position of that column will be 0. This FileDescriptor has a start position and bytes (length). To move a column in a datastore (of a fixed-length file) you have to loop through each column until you reach the position you would like to move the column to. And then keep calculating the new starting positions of the rest of the columns. Anyways, here’s what we do.

To make things simple, I’ll be using the following table and I would like to move C4 to 51.

Column Name
Starting Position
Length
C1
1
50
C2
51
50
C3
101
50
C4
151
50
C5
201
50

Basically, it’s a simple math problem where we all think that’s quite easy, yet fail to code on top of our head. Let’s say we would like to move column C4 to the second position. That would be pushing C2 and C3 by the length of C4. Also keep in mind that since we’re moving C4, C5 can/should remain unchanged. Our plan is as follows:

Loop through columns of the datastore:
      If the starting position of the current column is the same as START_POSITION then
Push the following columns by LENGTH.
      If the column name is the same as COLUMN_NAME then
Set the starting position to START_POSITION
            Set the bytes to LENGTH
            Don’t move the rest of the columns.

Let’s get to it. We start by importing some packages from SDK. After that we need some info on the model, datastore and new column.

import oracle.odi.blablabla
TABLE_A = 'MY_FILE';
MODEL_A = 'MY_MODEL'
NEW_COLUMN_INFO = ['C4',     51,   50]
Then we add the regular transaction definition etc.
ITransactionDefinition txnDef = new DefaultTransactionDefinition();
ITransactionManager tm = odiInstance.getTransactionManager();
ITransactionStatus txnStatus = tm.getTransaction(txnDef);
IOdiEntityManager mgr = odiInstance.getTransactionalEntityManager();

In order to make a change, we need to find our object within the repository.

OdiDataStore table_a_datastore = ((IOdiDataStoreFinder)mgr.getFinder(OdiDataStore.class)).findByName(TABLE_A, MODEL_A);
println(table_a_datastore.getName());

The code above does a few things. Firstly, we find objects by finders within the classes. OdiDataStore.class is our class and we get finder from it. The finder that we get with “getFinder()” function, has several other sub functions such as findByName(). Since we have the model name and the datastore name, we move on with those. By printing the datastore’s name, we actually make sure that table_a_datastore variable is not null. I know that in regular programming, we should be checking and printout warnings etc. but we are not here to write a code that does things efficiently nor by rules. This is groovy, and just like any Python users say “we’re here to hack it”. If the variable is null, then we would have an error, so be it, and nothing would get hurt. This is also a good place to state that, since we are to perform a task and this task only, we can make some assumptions. And the most important one is that we assume the column we would like to move is not already where we would like it to be (We’re not trying to move C4 to the 4th position). And the second most important assumption we made is that the position we would like to move the column is above the column (We’re not trying to move C1 to the 3rd position).
It’s also a good idea to run what we have up until now on ODI.
Let’s continue with setting up what we have;

COLUMN_NAME = NEW_COL_INFO[0]
START_POSITION = NEW_COL_INFO[1]
LENGTH = NEW_COL_INFO[2]

As planned, we need to go through each column. But the catch is, we need to make the changes in the FileDescriptors.

OdiColumn[] col_list = table_a_datastore.getColumns();
move_the_rest = 0
for(OdiColumn col : col_list){
      if(col.getFileDescriptor().getStartPosition() == START_POSITION){
            move_the_rest = 1
      }
      if(col.getName().equals(COLUMN_NAME)){
            col.getFileDescriptor().setStartPosition(START_POSITION);
            col.getFileDescriptor().setBytes(LENGTH);
            move_the_rest = 0
      } else if(move_the_rest == 1){
            old_start_position = col.getFileDescriptor().getStartPosition()
            col.getFileDescriptor().setStartPosition(old_start_position + LENGTH);
      }
}
println('Saving...');
tm.commit(txnStatus);
println('Completed.');

So that does it. With a few alterations, you can move multiple columns to the positions where they should be.

1 comment:

  1. Good job.
    I think there should be more examples about ODI SDK.

    ReplyDelete