When working with ABT tables, the obstacles we face
are a little different than regular DWH problems. Fact tables in a DWH are to
be thin and long whereas we really don’t have that luxury to decide whether the
table’s going to be fat or thin. Especially when the data size is “big”, it
gets impossible to offer several ABT tables since the performance of other
tools on joining “big” tables isn’t that great.
To the topic… We call a table fat, when the number of
columns are a lot. By a lot, I mean more than 100. Imagine you need to add a
new column, after the 10th (yeah, analysts can have weird requests)
column. Now that task is time consuming, but doable. Edit a datastore, add the
column and start clicking on move up button. Well, that’s doable, but there should be an
easier way to do this. I say more fun and reusable. Our dear ODI SDK comes into
play.
Since 2011, several people posted the use of ODI SDK,
but the number of examples out there is not that many. The idea is mostly the
same with all of them, yet there are some blank points that can cause problems.
There’s a move function in OdiColumn class. You can get the position of a
column and set it (move it). But did you know that you cannot change the
position of a column by move function if the file has fixed-length columns? For
a fixed length file, the information is hold within FileDescriptor of each
OdiColumn. The move column function in OdiColumn class sets the position of a
column very well, but for fixed length files, the position is not defined by the
“position” of the OdiColumn. It’s the FileDescriptor of the OdiColumn. If you
try to getPosition() of the columns in such a datastore, you will get 0. If you
add a column to the end as default, the position of that column will be 0. This
FileDescriptor has a start position and bytes (length). To move a column in a
datastore (of a fixed-length file) you have to loop through each column until
you reach the position you would like to move the column to. And then keep
calculating the new starting positions of the rest of the columns. Anyways,
here’s what we do.
To make things simple, I’ll be using the following
table and I would like to move C4 to 51.
Column
Name
|
Starting
Position
|
Length
|
C1
|
1
|
50
|
C2
|
51
|
50
|
C3
|
101
|
50
|
C4
|
151
|
50
|
C5
|
201
|
50
|
Basically, it’s a simple math problem where we all
think that’s quite easy, yet fail to code on top of our head. Let’s say we
would like to move column C4 to the second position. That would be pushing C2
and C3 by the length of C4. Also keep in mind that since we’re moving C4, C5
can/should remain unchanged. Our plan is as follows:
Loop
through columns of the datastore:
If the starting position of the current
column is the same as START_POSITION then
Push the following columns by
LENGTH.
If the column name is the same as
COLUMN_NAME then
Set the starting position to
START_POSITION
Set the bytes to LENGTH
Don’t move the rest of the columns.
Let’s get to it. We start by importing some packages
from SDK. After that we need some info on the model, datastore and new column.
import
oracle.odi.blablabla
TABLE_A =
'MY_FILE';
MODEL_A =
'MY_MODEL'
NEW_COLUMN_INFO
= ['C4', 51, 50]
Then we
add the regular transaction definition etc.
ITransactionDefinition
txnDef = new DefaultTransactionDefinition();
ITransactionManager
tm = odiInstance.getTransactionManager();
ITransactionStatus
txnStatus = tm.getTransaction(txnDef);
IOdiEntityManager
mgr = odiInstance.getTransactionalEntityManager();
In order to make a change, we need to find our object
within the repository.
OdiDataStore
table_a_datastore =
((IOdiDataStoreFinder)mgr.getFinder(OdiDataStore.class)).findByName(TABLE_A,
MODEL_A);
println(table_a_datastore.getName());
The code above does a few things. Firstly, we find
objects by finders within the classes. OdiDataStore.class is our class and we
get finder from it. The finder that we get with “getFinder()”
function, has several other sub functions such as findByName(). Since
we have the model name and the datastore name, we move on with those. By
printing the datastore’s name, we actually make sure that table_a_datastore
variable is not null. I know that in regular programming, we should be checking
and printout warnings etc. but we are not here to write a code that does things
efficiently nor by rules. This is groovy, and just like any Python users say
“we’re here to hack it”. If the variable is null, then we would have an error, so
be it, and nothing would get hurt. This is also a good place to state that,
since we are to perform a task and this task only, we can make some
assumptions. And the most important one is that we assume the column we would
like to move is not already where we would like it to be (We’re not trying to
move C4 to the 4th position). And the second most important
assumption we made is that the position we would like to move the column is
above the column (We’re not trying to move C1 to the 3rd position).
It’s also a good idea to run what we have up until now
on ODI.
Let’s continue with setting up what we have;
COLUMN_NAME
= NEW_COL_INFO[0]
START_POSITION
= NEW_COL_INFO[1]
LENGTH =
NEW_COL_INFO[2]
As planned, we need to go through each column. But the
catch is, we need to make the changes in the FileDescriptors.
OdiColumn[]
col_list = table_a_datastore.getColumns();
move_the_rest
= 0
for(OdiColumn
col : col_list){
if(col.getFileDescriptor().getStartPosition()
== START_POSITION){
move_the_rest = 1
}
if(col.getName().equals(COLUMN_NAME)){
col.getFileDescriptor().setStartPosition(START_POSITION);
col.getFileDescriptor().setBytes(LENGTH);
move_the_rest = 0
} else if(move_the_rest == 1){
old_start_position =
col.getFileDescriptor().getStartPosition()
col.getFileDescriptor().setStartPosition(old_start_position
+ LENGTH);
}
}
println('Saving...');
tm.commit(txnStatus);
println('Completed.');
So that does it. With a few alterations, you can move
multiple columns to the positions where they should be.
Good job.
ReplyDeleteI think there should be more examples about ODI SDK.