Techniques for handling very large models

David Usherwood
Site Admin
Posts: 1454
Joined: Wed May 28, 2008 9:09 am

Techniques for handling very large models

Post by David Usherwood »

I'm looking for ideas on how to push a very large set of data through a complex TM1 model. It's doing insurance projections, taking policies in force and pushing them out over 120 months x 180 'Ages' x 300 lines (50 of the latter being fed and stored, the remainder being rate-type lines which are used for calculation but not fed).
The client has 240k model points to put through the model. I can only bring the model up by using very selective feeder control to restrict how much of the model is active. We have a 48 GB server, with a 128 GB server due very soon, but it seems we will need to push the data through in sections.
I've built some of this, with a set of flags dividing up the largest dimension (Scheme) which I turn on and off. But I am finding it hard to keep the feeders down. I don't think CubeProcessFeeders removes existing feeders unless I run CubeUnload first, but if I do that, I find the server can't work with the unloaded cube and crashes. Restarting clears the feeders, but we would like to avoid/minimise this to get a better production process.
I can manage the calculation cache by sending a number to an irrelevant cell, but I'd much appreciate it if fellow forumers could suggest reliable techniques for managing the feeder cache stably.
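For reference, the cache-flush step is no more than this; the cube and element names are invented for illustration, only the technique is real:

Code:
# Housekeeping TI step - writing any number to an otherwise irrelevant cell
# is what I use to drop the calculation cache
CellPutN(1, 'Housekeeping', 'Cache Flush', 'Value');

# What I have been trying for the feeder cache, which is where the
# instability comes in:
# CubeUnload('Projection Cube');
# CubeProcessFeeders('Projection Cube');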
Version - client has CX 9 = 9.4, and I port it to 9.5.1 on our own servers. Haven't detected any relevant differences between the two.
tomok
MVP
Posts: 2832
Joined: Tue Feb 16, 2010 2:39 pm
OLAP Product: TM1, Palo
Version: Beginning of time thru 10.2
Excel Version: 2003-2007-2010-2013
Location: Atlanta, GA

Re: Techniques for handling very large models

Post by tomok »

David Usherwood wrote: I'm looking for ideas on how to push a very large set of data through a complex TM1 model. It's doing insurance projections, taking policies in force and pushing them out over 120 months x 180 'Ages' x 300 lines (50 of the latter being fed and stored, the remainder being rate-type lines which are used for calculation but not fed).
When you are dealing with really large models like this you have to make sure of several things:
1) Minimise calculations. Don't do intermediate calculations unless necessary; if you can combine several formulas algebraically, doing so will save memory.
2) Only feed cells that are actually going to have data for the time period, line of business, policy type, etc. you are feeding. You may be able to reduce feeders using conditional logic, even though doing so brings other issues (see the sketch below).
3) Ask whether you really need that much detail. Does the client really need to project policy data by policy for 10 years? Perhaps you can do your calculations by policy for 36 months, summarise to just zip codes for the next 36, and then summarise further to just states for the last 48.
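To illustrate point 2 with the kind of flag David already has, a conditional feeder can switch feeding off for whole slices of the model. Every cube, dimension and element name below is made up, and the usual caveat applies: the IF is only evaluated when feeders are (re)processed, so changing a flag means reprocessing them.

Code:
SKIPCHECK;
# ... projection rules ...
FEEDERS;
# Feed the projection only for schemes flagged as active in a control cube,
# instead of feeding everything unconditionally
['Policies In Force'] => DB(IF(DB('Scheme Control', !Scheme, 'Active') = 1,
   'Projection Cube', ''), !Scheme, !Month, !Age, 'Projected Premium');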
Tom O'Kelley - Manager Finance Systems
American Tower
http://www.onlinecourtreservations.com/
David Usherwood
Site Admin
Posts: 1454
Joined: Wed May 28, 2008 9:09 am

Re: Techniques for handling very large models

Post by David Usherwood »

Thanks. I'm a bit too buried at present to give a proper response and proper thanks.

Further findings.

a) Combining CubeUnload and CubeProcessFeeders in the same TI, or in different TIs in the same chore, leads to a server crash. Running them separately by hand does not.

b) I have found that if I ensure the input data is held against a measure which doesn't fire the feeders, the system stays small. My plan is therefore to copy segments of the data into the fed area, calculate the model, drop the output into an unruled cube, then clear the input, rinse and repeat (roughly as sketched below). But at present I can manage the calculation cache but not the feeder cache, because of the stability issue noted at point a).
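For what it's worth, the skeleton of the segment process looks something like this. Every cube, view, line and variable name is a placeholder, not the real model:

Code:
# Data source: a view over the 'Staging Input' line (which nothing feeds from)
# for one Scheme segment, giving variables vScheme, vMonth, vAge and vValue

# Data tab - copy the segment into the line that fires the feeders
CellPutN(vValue, 'Projection Cube', vScheme, vMonth, vAge, 'Policies In Force');

# A follow-up process reads the rule-calculated results with CellGetN and
# writes them into an unruled results cube; its epilog then clears the fed
# line ready for the next segment (which shrinks the calculation, though not
# the feeders already established):
ViewZeroOut('Projection Cube', 'zSegment Policies In Force');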
Gabor
MVP
Posts: 170
Joined: Fri Dec 10, 2010 4:07 pm
OLAP Product: TM1
Version: [2.x ...] 11.x / PAL 2.0.9
Excel Version: Excel 2013-2016
Location: Germany

Re: Techniques for handling very large models

Post by Gabor »

Hi David,
My question would be: is there duplication in your feeder information? I guess yes.
Usually there is a source dataset which is small enough to be loaded, but feeding goes into one or more dims/elements which represent derived versions of your initial dataset (calculated of course).

One possible solution is to feed into 1 element per dim only and put this element below each element you would feed traditionally. This brings down your feeding and it will not cause overfeeding, since you would apply the same information anyway (my question above). Zero suppression still works even for those C-elements, because consolidations are naturally fed by underlying elements.
There is a disadvantage of course, you cannot write N-element rules to such cubes/elements anymore. This makes the rule creation a little harder, sometimes a little more :-), but it works fine.
Functions like ConsolidateChildren are made for the model I am talking about.
I have a lot of cubes (up to 16 dims) with millions of variants of the original data volume and sophisticated rules; I am far beyond any traditional feeder volume. I usually create a derived cube without any physical data (a virtual cube), but having all the dims of my source data plus extra dims for my calculated variants (for example timing compressions and expansions ... squeezing a cube). Then it's up to you to play with the rules and the number of dummy feeder elements to get good performance and correct results. Both are possible; I have been using this since version 2.5!
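A tiny sketch of the structure I mean, with invented cube and element names: assume a Line dimension where each calculated line is now a C element with the single shared N element 'Feed' underneath it.

Code:
SKIPCHECK;
# The calculation sits on the C element, so it needs no feeder of its own;
# it survives zero suppression because its 'Feed' child is fed
['Projected Premium'] = C: ['Policies In Force'] * DB('Rates', !Age, !Month, 'Premium Rate');

# Higher consolidations must add their immediate (rule-calculated) children
# rather than reaching down to the leaves
['Total Premium'] = C: ConsolidateChildren('Line');

FEEDERS;
# One feeder into the single shared 'Feed' element replaces the feeders
# into every individual calculated line
['Policies In Force'] => ['Feed'];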

By the way, I have asked several times for an inbuilt feature, "feeding into a complete dim without duplication". The last conversation happened about 7 months ago, when the TM1 Product Manager wanted to forward this once again to the technical folks in the background; maybe it's time for a reminder.

Hopefully I was clear enough with my "TM1 thoughts"
Gabor
lotsaram
MVP
Posts: 3654
Joined: Fri Mar 13, 2009 11:14 am
OLAP Product: TableManager1
Version: PA 2.0.x
Excel Version: Office 365
Location: Switzerland

Re: Techniques for handling very large models

Post by lotsaram »

I would second what Gabor has said. If feeders are the problem then you can change the structure of the measure dimension to "feed" calculations from the values of N-level descendants. This means having C-level rules rather than N-level rules (and it may also mean having to use additive rules or ConsolidateChildren), but it does mean that you can virtually eliminate feeders. This is fantastic for multi-currency conversion models, where each currency just becomes a parent of "Local", and for many retail or FMCG-type measure dimensions where, for example, the presence of data in "sales units" is the trigger for many downstream calculations.
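For the currency case, a minimal version of what I mean; 'Local' is as described above, while the 'FX Rates' cube and the other names are just for illustration.

Code:
SKIPCHECK;
# 'USD', 'EUR', etc. are consolidated parents of the single N element 'Local',
# so they count as populated wherever local-currency data exists
['USD'] = C: ['Local'] * DB('FX Rates', !Month, 'USD');
['EUR'] = C: ['Local'] * DB('FX Rates', !Month, 'EUR');

FEEDERS;
# Nothing to feed: the conversion rules live on consolidations and the
# 'Local' input is real data, not a calculation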

Of course the caveat is that this is most applicable to large models where the requirement to feed a calculation is (mostly) driven off a single measure dimension. If you have a complex situation where the intersection of two or more dimensions plus one or more conditions is required to figure out whether to perform a calculation, then it might not work (or it might be much more difficult to get it to work).

Like Tomok, I would also question whether the model is falling into the precision vs. accuracy trap. Why so detailed for so many time periods? Why not condense the last 5 years of the time horizon to yearly level rather than monthly? Does it really make a difference, considering that a model like this will always be very sensitive to its starting assumptions, which are much more important to the model's accuracy than minutiae around the precise calculation mechanism?
David Usherwood
Site Admin
Posts: 1454
Joined: Wed May 28, 2008 9:09 am

Re: Techniques for handling very large models

Post by David Usherwood »

Thanks to all for the responses.

I have raised a PMR on the CubeUnload/CubeProcessFeeders crash. I did stagger under the weight of support emails, but fortunately IBM Support were able to reproduce the problem and it is wending its merry way towards an APAR (fix), sadly probably not in time to be used on the current phase of the project.

I spent some time investigating the 'inverse consolidation' approach suggested by Gabor and Lotsaram. From my tests, it can be used to cause a value to appear as if fed, but since it is not fed, it is not possible to feed the value onwards - when you want to add it up further, the consolidation logic 'looks past' the calculated value to its child and thus delivers garbage.

But I do have a number of calculations where I need to add up rates, or calculate an average, and which don't require onward consolidation. I recall encountering ConsolidateChildren in 8.2.10 and that, back then, it behaved very badly, causing a most unwelcome explosion in RAM and calc time. Because of that I have not used it since, but hey, they might have sorted it out. So I tested a couple of the calculations, taking them out of the fed lines and using ConsolidateChildren. My tests indicate a saving in feeder size and time (well, yeah) but a larger increase in calculation size and time. Given the issues with managing the feeder cache I think it is probably worthwhile, but only on balance, so I have extended it further.
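In outline, a rate line goes from a fed N-level rule to an unfed C-level one. The line names are invented and this is only the shape of the change, not the actual rules; the '\' is TM1's zero-safe divide.

Code:
# Before: N-level rule, which has to be fed
# ['Average Rate'] = N: ['Rate Sum'] \ ['Rate Count'];
# ['Rate Count'] => ['Average Rate'];

# After: 'Average Rate' is a consolidation of its inputs, so the rule sits at
# C level and needs no feeder; ConsolidateChildren makes the parent add up the
# calculated children rather than looking past them to the leaves
['Average Rate'] = C: ['Rate Sum'] \ ['Rate Count'];
['All Rate Lines'] = C: ConsolidateChildren('Line');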

On granularity - I do agree. The model isn't _quite_ at individual policy level, but 45% of the model points hold a single policy. But the client believes that this level of detail is required - this is a Solvency II related project, with all the baggage that carries. I haven't been able to move them on this. Accordingly, a batch process to move chunks of the model through is the best I can devise. I am currently trying out doing a CubeUnload at the _end_ of the copy and then queuing up a separate job to start (with a CubeProcessFeeders) when the first has finished.
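In TI terms that simply means the two calls end up in separate jobs, never in the same TI or chore (the combination that crashes). Cube name as before, purely illustrative:

Code:
# --- End of the segment-copy process (epilog) ---
CubeUnload('Projection Cube');

# --- First step of the separately queued job (prolog) ---
# Reprocess the cube's feeder statements before the next calculation pass
CubeProcessFeeders('Projection Cube');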

And finally - I did meet up with Steve Rowe for a Christmas pint or three. He suggested splitting the whole process up across a set of parallel cubes, then running a set of jobs. Lot of work, but nice if it came off. Fortunately I had a way of establishing whether this might work, since the server holds three completely independent models for different lines of business. They share only some interest rates etc, all of which are stable during the runs. So I tried setting up a pair of timed chores, one for each model. I recalled being advised that setting the logging status would cause data loaders to wait (both wanting to write to }cubes), so I took that out. Sadly, quite definitely, the second job waited for the first. I recall hearing lots of stuff about parallel loading but never about parallel calculation. I really believe this needs to be addressed if the product is going to scale properly.

The work continues....