data integration: January 2014

Thursday, January 30, 2014

Move Column in Datastore v2

Someone asked via email about my post on moving columns upwards. The question was about having several columns, not just one.

I’m not gonna go over the whole code now, but instead of having a list of information as a NEW_COL_INFO, have a list of column infos.

NEW_COL_INFO = [

['C5', 101, 10]

, ['C6', 99, 10]

, ['C7', 120, 10]

];

Now you gotta put the whole main part within a loop, so that each column is processed. But there’s another problem where the columns are not ordered. If we try to move C5 before C6, then the START_POSITION and LENGTH calculation would be all wrong. That’s why we need to sort the list by START_POSITION.

This is the code that I ended up with. As I said in the earlier post; it gets the job done.

import oracle.odi.blablabla;

TABLE_A = 'MY_FILE';

MODEL_A = 'MY_MODEL'

//COLUMN_NAME, START_POSITION, LENGTH

NEW_COL_INFO = [

['C5', 101, 10]

, ['C6', 99, 10]

, ['C7', 120, 10]

];

//sort

NEW_COL_INFO=NEW_COL_INFO.sort{p1, p2 -> p1[1] <=> p2[1] }

ITransactionDefinition txnDef = new DefaultTransactionDefinition();

ITransactionManager tm = odiInstance.getTransactionManager();

ITransactionStatus txnStatus = tm.getTransaction(txnDef);

IOdiEntityManager mgr = odiInstance.getTransactionalEntityManager();

OdiDataStore table_a_datastore = ((IOdiDataStoreFinder)mgr.getFinder(OdiDataStore.class)).findByName(TABLE_A, MODEL_A);

println(table_a_datastore.getName());

for(int j=0; j<NEW_COL_INFO.size(); j++){

COLUMN_NAME = NEW_COL_INFO[j][0]

START_POSITION = NEW_COL_INFO[j][1]

LENGTH = NEW_COL_INFO[j][2]

OdiColumn[] col_list = table_a_datastore.getColumns();

is_move_up = -1

for(OdiColumn col : col_list){

if(col.getName().equals(COLUMN_NAME)){

if(col.getFileDescriptor().getStartPosition() > START_POSITION){

is_move_up = 1

} else if(col.getFileDescriptor().getStartPosition() == START_POSITION){

is_move_up = 0

}

if(is_move_up == 1){

move_the_rest = 0

for(OdiColumn col : col_list){

if(col.getFileDescriptor().getStartPosition() == START_POSITION){

move_the_rest = 1

}

if(col.getName().equals(COLUMN_NAME)){

col.getFileDescriptor().setStartPosition(START_POSITION);

col.getFileDescriptor().setBytes(LENGTH);

move_the_rest = 0

} else if(move_the_rest == 1){

old_start_position = col.getFileDescriptor().getStartPosition()

col.getFileDescriptor().setStartPosition(old_start_position + LENGTH);

}

} else if(is_move_up == 0) {

println('Kolon zaten olmasi gereken yerde')

} else {

println('Kolonu yukari tasiyabiliyoruz. Asagi indiremiyoruz.')

}

//error checking

col_list = table_a_datastore.getColumns();

i = 1

err = 0

for(OdiColumn col : col_list){

println('Calculated pos:' + i + '\tCol Name:' + col.getName() + '\tCol Index:' + col.getPosition() + '\tStart Pos:' + col.getFileDescriptor().getStartPosition() + '\tLength:' + col.getFileDescriptor().getBytes());

if(i != col.getFileDescriptor().getStartPosition()){

err = 1

println('err')

}

i = i + col.getFileDescriptor().getBytes()

}

if(err == 0){

println('Saving...');

tm.commit(txnStatus);

println('Completed.');

}

Wednesday, January 29, 2014

Move Column in Datastore

When working with ABT tables, the obstacles we face are a little different than regular DWH problems. Fact tables in a DWH are to be thin and long whereas we really don’t have that luxury to decide whether the table’s going to be fat or thin. Especially when the data size is “big”, it gets impossible to offer several ABT tables since the performance of other tools on joining “big” tables isn’t that great.

To the topic… We call a table fat, when the number of columns are a lot. By a lot, I mean more than 100. Imagine you need to add a new column, after the 10^th (yeah, analysts can have weird requests) column. Now that task is time consuming, but doable. Edit a datastore, add the column and start clicking on move up button. Well, that’s doable, but there should be an easier way to do this. I say more fun and reusable. Our dear ODI SDK comes into play.

Since 2011, several people posted the use of ODI SDK, but the number of examples out there is not that many. The idea is mostly the same with all of them, yet there are some blank points that can cause problems. There’s a move function in OdiColumn class. You can get the position of a column and set it (move it). But did you know that you cannot change the position of a column by move function if the file has fixed-length columns? For a fixed length file, the information is hold within FileDescriptor of each OdiColumn. The move column function in OdiColumn class sets the position of a column very well, but for fixed length files, the position is not defined by the “position” of the OdiColumn. It’s the FileDescriptor of the OdiColumn. If you try to getPosition() of the columns in such a datastore, you will get 0. If you add a column to the end as default, the position of that column will be 0. This FileDescriptor has a start position and bytes (length). To move a column in a datastore (of a fixed-length file) you have to loop through each column until you reach the position you would like to move the column to. And then keep calculating the new starting positions of the rest of the columns. Anyways, here’s what we do.

To make things simple, I’ll be using the following table and I would like to move C4 to 51.

Column Name	Starting Position	Length
C1	1	50
C2	51	50
C3	101	50
C4	151	50
C5	201	50

Basically, it’s a simple math problem where we all think that’s quite easy, yet fail to code on top of our head. Let’s say we would like to move column C4 to the second position. That would be pushing C2 and C3 by the length of C4. Also keep in mind that since we’re moving C4, C5 can/should remain unchanged. Our plan is as follows:

Loop through columns of the datastore:

If the starting position of the current column is the same as START_POSITION then

Push the following columns by LENGTH.

If the column name is the same as COLUMN_NAME then

Set the starting position to START_POSITION

Set the bytes to LENGTH

Don’t move the rest of the columns.

Let’s get to it. We start by importing some packages from SDK. After that we need some info on the model, datastore and new column.

import oracle.odi.blablabla

TABLE_A = 'MY_FILE';

MODEL_A = 'MY_MODEL'

NEW_COLUMN_INFO = ['C4', 51, 50]

Then we add the regular transaction definition etc.

ITransactionDefinition txnDef = new DefaultTransactionDefinition();

ITransactionManager tm = odiInstance.getTransactionManager();

ITransactionStatus txnStatus = tm.getTransaction(txnDef);

IOdiEntityManager mgr = odiInstance.getTransactionalEntityManager();

In order to make a change, we need to find our object within the repository.

OdiDataStore table_a_datastore = ((IOdiDataStoreFinder)mgr.getFinder(OdiDataStore.class)).findByName(TABLE_A, MODEL_A);

println(table_a_datastore.getName());

The code above does a few things. Firstly, we find objects by finders within the classes. OdiDataStore.class is our class and we get finder from it. The finder that we get with “getFinder()” function, has several other sub functions such as findByName(). Since we have the model name and the datastore name, we move on with those. By printing the datastore’s name, we actually make sure that table_a_datastore variable is not null. I know that in regular programming, we should be checking and printout warnings etc. but we are not here to write a code that does things efficiently nor by rules. This is groovy, and just like any Python users say “we’re here to hack it”. If the variable is null, then we would have an error, so be it, and nothing would get hurt. This is also a good place to state that, since we are to perform a task and this task only, we can make some assumptions. And the most important one is that we assume the column we would like to move is not already where we would like it to be (We’re not trying to move C4 to the 4^th position). And the second most important assumption we made is that the position we would like to move the column is above the column (We’re not trying to move C1 to the 3^rd position).

It’s also a good idea to run what we have up until now on ODI.

Let’s continue with setting up what we have;

COLUMN_NAME = NEW_COL_INFO[0]

START_POSITION = NEW_COL_INFO[1]

LENGTH = NEW_COL_INFO[2]

As planned, we need to go through each column. But the catch is, we need to make the changes in the FileDescriptors.

OdiColumn[] col_list = table_a_datastore.getColumns();

move_the_rest = 0

for(OdiColumn col : col_list){

if(col.getFileDescriptor().getStartPosition() == START_POSITION){

move_the_rest = 1

}

if(col.getName().equals(COLUMN_NAME)){

col.getFileDescriptor().setStartPosition(START_POSITION);

col.getFileDescriptor().setBytes(LENGTH);

move_the_rest = 0

} else if(move_the_rest == 1){

old_start_position = col.getFileDescriptor().getStartPosition()

col.getFileDescriptor().setStartPosition(old_start_position + LENGTH);

}

println('Saving...');

tm.commit(txnStatus);

println('Completed.');

So that does it. With a few alterations, you can move multiple columns to the positions where they should be.

Tuesday, January 28, 2014

How to Debug ORA-01438 errors with ODI

Hello Everybody,

When you work in large enterprises where source systems sometimes forget to inform the datawarehouse about changes on data format / size or you are aggregating a continuously enlarging data, Getting an ORA-01438 becomes routine in once in a month.

Within this post i'll try to explain a practical method to make your ODI scenarios debug themselves .

ORA-01438: value larger than specified precision allows for this column

Without ODI:

When dealing with sql statements we can use the benefits of DBMS_ERRLOG package.
[doc. http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_errlog.htm#ARPLS680]

if you have a huge Insert statement which throws an ORA-01438 you can simply create a log table via

EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('<TARGET_TABLE_SCHEMA>.<TARGET_TABLE_NAME>','<LOG_TABLE_NAME>');
/*LOG_TABLE_NAME is usually named with a convention like ERR_<TARGET_TABLE_NAME>*/

then all you have to do is call your Insert statement by adding "LOG ERRORS INTO ...."
as shown below to log all the erroneous records.

INSERT INTO <TARGET_TABLE_SCHEMA>.<TARGET_TABLE_NAME> (............)
SELECT ...........
FROM <SOURCE_TABLE_SCHEMA>.<SOURCE_TABLE_NAME>
LOG ERRORS INTO <TARGET_TABLE_SCHEMA>.<LOG_TABLE_NAME> ('Label to specify for which run the records are logged') REJECT LIMIT <Number of rejected records>;

Then you can examine the log to specify the problem.

But it is important to remember that running all insert statements with LOG ERRORS adds an expense at the moment of inserting. This method should be used only for detecting the problem.

With ODI:
So we can not just embed the phrases into a KM and wait for the error happen because of the performance. But we can integrate this facility into our KM's and use parameter to invoke or revoke error logging on demand.

But how?
Simplest way is adding a declare type of variable to your packages to manage on demand run type. Then pass this variable to KM through an option. When we edit the KM in the proper way then we are done.

Below are this steps in detail:

1. Create the variable:

Select The KM to edit. I am editing the basic IKM SQL Control Append. Don not forget duplicating the KM and renaming it to make your remember its function.

I renamed it in order to make it understandable that this KM has Oracle Specific template.

Add an option with Value Type and set the default value as the variable:

Add a new detail to the KM for creating the log tables if necessary This detail must be added to the first order:

BEGIN
<@ if ("1".equals("<%=odiRef.getOption("DEBUG_MODE")%>") ) { @>
DBMS_ERRLOG.CREATE_ERROR_LOG('<%=odiRef.getTable("L","TARG_NAME","A")%>','ERR_<%=odiRef.getTargetTable("RES_NAME")%>');
<@ } @>
NULL;
EXCEPTION
WHEN OTHERS THEN
IF SQLCODE = -955 THEN
NULL;
END IF;
NULL;

END;

Modify "insert new rows" detail the way the logging statement will be added to the regular insert statement when requested via #DEBUG_MODE=1

<@ if ("1".equals("<%=odiRef.getOption("DEBUG_MODE")%>") ) { @>
LOG ERRORS INTO <%=odiRef.getTargetTable("SCHEMA")%>.ERR_<%=odiRef.getTargetTable("RES_NAME")%>('<%=odiRef.getSession("SESS_NO")%>') REJECT LIMIT 10

<@ } @>

Caution! Once you add the variable and KM to your Packages do not forget to send #DEBUG_MODE=0 to the scenarios on regular runs!.