Decision on Source data Format

gbehel
Posts: 49
Joined: Fri Dec 27, 2013 8:35 am
OLAP Product: TM1, Cognos EP, Contributor
Version: 10.2.2
Excel Version: 2013
Location: India

Decision on Source data Format

Post by gbehel »

Hello All,

Was wondering which would be the best form of data source for faster data loading to a TM1 cube using a TI process:

Available Data Set :
Table columns : Product, Channel, Region, Division, Jan01, Feb01, Mar01 till Dec15
Columns Jan01 through Dec15 are data columns and may contain NULL or a default value "#MI" that needs to be skipped before the CellPutN statement. The number of CellPutN statements will be equal to the number of measure columns. I have already loaded around 48 million records using this data set but would like to look into the alternate approach as well.
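For reference, with this wide layout the Data tab of the TI needs one guarded CellPutN per month column, along these lines (the cube name 'Sales' and the variable names are assumptions for illustration):

```
# Data tab - wide layout: one guarded CellPutN per measure column.
# Assumes the month columns are read as string variables, so a
# database NULL arrives as an empty string.
IF ( vJan01 @<> '' & vJan01 @<> '#MI' );
   CellPutN ( NUMBR ( vJan01 ), 'Sales', vProduct, vChannel, vRegion, vDivision, 'Jan01' );
ENDIF;
IF ( vFeb01 @<> '' & vFeb01 @<> '#MI' );
   CellPutN ( NUMBR ( vFeb01 ), 'Sales', vProduct, vChannel, vRegion, vDivision, 'Feb01' );
ENDIF;
# ... repeated for every column up to Dec15 (180 blocks in total)
```

Note the repeated blocks have to be edited by hand whenever the column range changes.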

Alternate Approach:
Table columns : Product, Channel, Region, Division, Period, Data
The Period column will have values between Jan01 and Dec15. In this case there won't be any NULL or #MI values, since these can be filtered out while populating the table using PL/SQL.
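With this tall layout the Data tab collapses to a single statement, e.g. (cube and variable names again assumed for illustration):

```
# Data tab - tall layout: Period is just another record variable,
# so one CellPutN handles every month and no per-column checks are needed.
CellPutN ( vData, 'Sales', vProduct, vChannel, vRegion, vDivision, vPeriod );
```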

I have the option of designing the alternate approach, and I'm planning to do a test run before finalising the approach. The advantage of the second approach is that the NULL and #MI values can be avoided; however, the number of records in the second approach will be much higher because the column data is converted to rows.

Any advice/opinion on which approach I should prefer?

Kind Regards
Gbehel
declanr
MVP
Posts: 1817
Joined: Mon Dec 05, 2011 11:51 am
OLAP Product: Cognos TM1
Version: PA2.0 and most of the old ones
Excel Version: All of em
Location: Manchester, United Kingdom

Re: Decision on Source data Format

Post by declanr »

If you are purely interested in which is faster why don't you just build both and test them?
The logs will show a start and end time.
Declan Rodger
rmackenzie
MVP
Posts: 733
Joined: Wed May 14, 2008 11:06 pm

Re: Decision on Source data Format

Post by rmackenzie »

In the abstract, the alternate approach is preferred IMO.

Primarily, if in the future you need to load Jan03 through Dec18, there's a good chance you will have to go in and tinker with the TI code when using the first approach. With the second approach, you ought to be able to write more generic code to handle dynamic time periods, meaning the TI will be able to deal with loads well into the future.

From a performance point of view, in the first approach you will probably be making use of the ISNULL function. If you have (15 years x 12 months) 180 columns of monthly data in your TI, and for each you write a CELLPUTN wrapped in an IF ( ISNULL... ) condition, then I imagine the overhead of those checks may cost more than the effort required to process the increased volume of data in the second approach. For products that only sell a couple of times a year, you will get a few extra rows instead of wasting processor cycles checking for null data.

As Declan advises - you should test both methods. Depending on the data texture, it might be that the performance hit means it is worth the maintenance overhead of tinkering with the TI every time the dataset changes. But, then again, it might not! YMMV.
Robin Mackenzie