952 - Memory Leak?

Posted: Thu May 26, 2011 12:45 pm
by Steve Vincent
As is always the case, testing 9.5.2 never showed any issues with memory consumption, but as soon as our production models were moved from 9.0 to 9.5.2 we started experiencing out-of-memory issues. All 3 production models have now crashed, showing similar symptoms but never any indication of the underlying problem.

I have performance monitor running on all 3 and archive the stats once every 30 minutes. There are noticeable spikes in RAM used (especially in garbage memory), but at the time of each spike there are no TIs running to explain it. No data is being loaded by other means (DBRA etc.), but it's possible that users are trying to grab data from the cubes. Has anyone found a memory leak in 9.5.2 that IBM is looking into, or even better, one that has been fixed but not yet communicated? I've had 3 years of rock-solid stability on these servers and I'll be damned if I'll let that reputation get ruined. I've been through that before and know just how hard it is to get back.

TIA

Re: 952 - Memory Leak?

Posted: Thu May 26, 2011 1:03 pm
by jim wood
Steve,

There is a 952 thread with people's findings in it. The main issue found so far is TI crashing the server when trying to preview a file longer than 1,200 lines. There is a hotfix available for this. It might be worth applying the hotfix to see if it also fixes your memory issue.

Jim.

Re: 952 - Memory Leak?

Posted: Thu May 26, 2011 3:03 pm
by Steve Vincent
Yeah, I know, I was one of those who reported it :lol:

Those hotfixes are on my hit list, but the notes that came with them made no mention of this specific issue. Thanks to a user who broke one server today, I can now reproduce the problem, and I'm about to try to replicate it in the Planning Sample demo.

Currently in our models I have a reporting cube that pulls data in with rules from 2 different cubes. When a user browses that cube, as soon as every dimension has a level 0 element picked, the "building cube view" dialog appears and the server munches RAM until none is left. It behaves as if SKIPCHECK is not working, but if I can replicate it then I should be able to get IBM support onto the case.

Re: 952 - Memory Leak?

Posted: Thu May 26, 2011 5:04 pm
by Steve Vincent
  • Can't replicate the issue in the Planning Sample database
  • Only an issue when using rules to drag data from one of the 2 cubes
  • Even if I use rules to pull just one value from that cube, it crashes (so it's not over-fed)
  • The affected cube has been re-ordered, but putting it back in its original order (and restarting) makes no difference
  • Created a new source cube and copied the data to it, then changed the rules to use the new cube: same result
None of this makes sense; it was all fine in 9.0. I'm loath to write TIs to do the job, as that then relies on humans remembering to run them, but at this rate I may not have an option.

Re: 952 - Memory Leak?

Posted: Fri May 27, 2011 2:40 pm
by image2x
Steve,

We too have been experiencing aberrant yet consistent memory issues since we moved to 9.5.2.

Symptoms:
- Slow but continuous increase in tm1s.exe memory footprint (18%->24% over three days with minimal data activity).
- Swap jumps to full as TM1 somehow gets entangled despite available physical RAM (at least when I look at RAM after the event)
- TM1 will run (in swapped fashion) until the next “event” occurs, at which point everything goes south.

My hypothesis is that something (likely TM1) is momentarily spiking memory (which would mean a spike of ~60GB, 2x the startup size, to max out physical RAM), at which point the OS (AIX in our case) begins using swap. Swap (“only 1.5GB”) jumps immediately to 100%, as it’s now entangled with TM1 data from whichever TM1 instance next requested more memory. The spike ends and physical RAM usage returns to normal, but usable swap is 0%. On the next physical memory spike, as there is no available swap or RAM, processes begin dying or stalling. At this point I cannot even ssh into the server, and instead need to shut down TM1 instance(s) remotely in order for sshd to spawn.

I am going to try setting up more continuous logging of memory activity as I haven’t been able to catch the spike so far using a 5-minute sampling interval.
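The sampling-interval problem can be illustrated with a minimal Python sketch. It assumes memory readings have already been collected somehow (gathering them on AIX is not shown); `Sample`, `find_spikes` and `downsample` are hypothetical helpers, and the numbers are made up for illustration. A short excursion that is obvious in 10-second samples vanishes once only one sample every 5 minutes is kept.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    seconds: int    # seconds since logging started
    used_mb: float  # memory in use at that moment, in MB


def find_spikes(samples: Sequence[Sample], jump_mb: float) -> List[Sample]:
    """Return each sample whose memory rose by more than jump_mb since the previous one."""
    return [cur for prev, cur in zip(samples, samples[1:])
            if cur.used_mb - prev.used_mb > jump_mb]


def downsample(samples: Sequence[Sample], every: int) -> List[Sample]:
    """Keep every n-th sample, to mimic a coarser sampling interval."""
    return list(samples[::every])


if __name__ == "__main__":
    # Illustrative data only: a 30 GB baseline sampled every 10 s for 10 minutes,
    # with a 30-second excursion to 60 GB starting at t = 130 s.
    fine = [Sample(t, 60000.0 if 130 <= t <= 150 else 30000.0)
            for t in range(0, 600, 10)]
    coarse = downsample(fine, 30)  # one sample every 300 s (5 minutes)

    print([s.seconds for s in find_spikes(fine, 10000)])    # [130]
    print([s.seconds for s in find_spikes(coarse, 10000)])  # []
```

The jump threshold is a judgment call, and a rolling baseline would be more robust than comparing neighbouring samples, but the simple version is enough to show why the sampling interval matters.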

I’ve opened a ticket with IBM but thus far haven’t gotten anywhere.

-- John

Re: 952 - Memory Leak?

Posted: Fri May 27, 2011 3:23 pm
by jim wood
I've not seen any memory leakage yet, but I have seen some very strange behaviour around TI and security.

Details:

I created a TI process based on a view of a security control cube. When I ran it a few times, it only randomly updated the control cube using ElementSecurityPut. I then changed the process so that it used CellPutS instead, and the cube was fully updated. I refreshed security. I then cleared the control cube out using the same process and refreshed security. After that I ran the original process and it worked as expected. Very strange indeed.

Re: 952 - Memory Leak?

Posted: Wed Jun 01, 2011 6:43 pm
by Steve Vincent
Thanks John. I've been hammering at this for over a week now and just hit wall after wall. I've a list as long as the Great Wall of China of things it's not, and I've as near as damn it run out of ideas.

My suspicion lies with something behind the scenes in the calculation algorithms. I do know that if I take a small amount of data and use rules to copy it from one cube to another, a view of all N-level elements takes significantly less time to calculate when the dimensions have no hierarchies than the same view does when the dimensions have our real hierarchies. My manager likens it to trying to view data without zero suppression on. If you drag every dimension onto either the row or column axis and then select all N-level elements in every dimension, I could in theory be asking for around 5 x 2 x 700 x 2500 x 100 x 250 x 18000 = 7,875,000,000,000,000 cells. Our data is extremely sparse, so in reality it's nothing like that, but add the detailed consolidations we have in some dimensions and the number increases dramatically. If it tries to calculate all of that despite only a tiny proportion being selected, that could explain the issue. In general our users are complaining that response times have got worse, and I tend to agree with them, especially where rules are involved.
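The view-size arithmetic above can be checked with a few lines of Python (3.8+ for `math.prod`). The populated-cell figure is invented purely for illustration; the point is that the potential cell space of such a view dwarfs any realistic amount of data, which is why anything that makes the engine visit more than the populated (fed) cells could hurt so badly.

```python
from math import prod

# N-level element counts per dimension, as listed in the post.
leaf_counts = [5, 2, 700, 2500, 100, 250, 18000]

# With every dimension on a row/column axis and all N-level elements selected,
# the number of potential cells is the product of the dimension sizes.
potential_cells = prod(leaf_counts)
print(f"{potential_cells:,}")  # 7,875,000,000,000,000

# Hypothetical populated-cell count, purely to show the scale of the sparsity.
populated_cells = 1_000_000
fraction = populated_cells / potential_cells
print(f"{fraction:.2e}")  # 1.27e-10
```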

I need to work on the data (it's sensitive) so that I can pass it to IBM in a 9.0 server to show what it should be doing, then take the same files, put them untouched into 9.5.2 and watch it wither and die. Only that way can I be sure it's something they can look at and fix. In the meantime I have logged it with IBM and explained the situation, with the plan to provide them with files to test at a later date. I hope they can start work on their own testing while I sanitise our data; I get the feeling this won't be a quick one to solve.

Re: 952 - Memory Leak?

Posted: Thu Jun 02, 2011 1:18 pm
by Steve Vincent
NAILED IT :D

It's taken so long to find, but I have finally located the issue. There is a parameter in the server config called "AllRuleCalcStargateOptimization". According to the help files:
AllRuleCalcStargateOptimization
The AllRuleCalcStargateOptimization parameter can improve performance in calculating views that contain only rule-calculated values.

Parameter type:

Optional

Static

If you change this parameter value, restart the TM1® server to apply the new value.

Typically, TM1 performs calculations for standard consolidations and then calculates values for rule-based consolidations, which may end up overriding values in the standard consolidations. The AllRuleCalcStargateOptimization parameter provides optimization that first checks if every value in the view is rule-calculated and then proceeds as follows:

If every value in the view is rule-calculated, then TM1 skips the unnecessary calculations for standard consolidations and just performs the rule-calculated consolidations.

If the view contains even a single value which is not rule-calculated, then this optimization parameter will have no effect.

When this parameter is set to True, some additional processing will take place for every view that is requested to first check if the view contains only rule-calculated values. For most views, this additional processing is minimal since the optimization is stopped after the first value in the view is found to be not rule-calculated.

To enable this parameter, set the parameter's value to T in the TM1 server configuration file, Tm1s.cfg, as follows:

AllRuleCalcStargateOptimization=T
The default setting is disabled (F).
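The short-circuit the help text describes can be sketched in a few lines of Python. This is not TM1's internal code, which isn't documented here; it simply models each cell of a view as a hypothetical `True`/`False` "is rule-calculated" flag. It shows why the check is cheap for most views (it stops at the first non-rule cell) and also that, for a view where every cell really is rule-calculated, it has to visit every cell.

```python
from typing import Iterable, Tuple


def all_rule_calculated(cells: Iterable[bool]) -> Tuple[bool, int]:
    """Check whether every cell in a view is rule-calculated.

    `cells` yields one hypothetical flag per cell (True = rule-calculated).
    Returns (result, number_of_cells_inspected); the scan stops at the first
    cell that is not rule-calculated, as the help text describes.
    """
    inspected = 0
    for is_rule_calc in cells:
        inspected += 1
        if not is_rule_calc:
            return False, inspected  # optimization has no effect for this view
    return True, inspected  # all rule-calculated: optimization would apply


# A mixed view is rejected after inspecting only the first two cells.
print(all_rule_calculated([True, False, True, True]))  # (False, 2)
# An all-rule view must be scanned completely before the optimization applies.
print(all_rule_calculated([True] * 5))                 # (True, 5)
```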

I had that set to T on every production server. On my test server I had it as F. In all cases I had the same problem, but if I remove the parameter completely the servers are just fine. The paragraph about "additional processing" may be the area that is borked. If the checks it makes are flawed to the point that it gets into a circular calculation, that would explain the view never returning a value and the server consuming all its RAM. If anyone is using 9.5.2, I suggest removing that parameter completely in case it causes other problems. I'll be updating my ticket with IBM to see if they can resolve it.
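Removing the parameter by hand is trivial on one server; for several, a minimal Python sketch can comment it out instead, which this thread treats as equivalent to removing it. It assumes tm1s.cfg ignores lines starting with `#` and that the parameter name appears exactly as in the help text; `comment_out_param` and `patch_file` are hypothetical helpers. The parameter is static, so the server must be restarted for the change to take effect.

```python
import re
from pathlib import Path

PARAM = "AllRuleCalcStargateOptimization"


def comment_out_param(cfg_text: str, param: str = PARAM) -> str:
    """Prefix any active 'param=...' line with '#', leaving all other lines as they are."""
    active = re.compile(rf"^\s*{re.escape(param)}\s*=")
    lines = cfg_text.splitlines(keepends=True)  # keepends keeps each line's own terminator
    return "".join("#" + line if active.match(line) else line for line in lines)


def patch_file(path: Path) -> None:
    """Back up tm1s.cfg to tm1s.cfg.bak, then write the patched text in place."""
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(comment_out_param(original))


if __name__ == "__main__":
    sample = "[TM1S]\nServerName=Prod\nAllRuleCalcStargateOptimization=T\n"
    print(comment_out_param(sample))
```

Lines that are already commented out are left alone, so running it twice is harmless.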

Re: 952 - Memory Leak?

Posted: Thu Jun 02, 2011 2:05 pm
by David Usherwood
That's good to know, given that IBM are planning to get mulish about taking calls for old, stable, working versions.

Re: 952 - Memory Leak?

Posted: Wed Aug 10, 2011 12:31 pm
by Steve Vincent
IBM have now managed to replicate the issue with test models I provided and have passed it to the engineering team as a defect. I'll move this to Bugs now that it is confirmed.

____________________________PMR UPDATE_________________________________ 
Replicated in-house Defect COGCQ00623786 created - to be promoted by    
3LS/Keith                                                               
APAR PM45430  created 

Re: 952 - Memory Leak?

Posted: Thu Aug 11, 2011 3:46 pm
by harrytm1
Hi Steve,

I believe I'm experiencing the same issue now. In my P&L cube and BS cube, whenever I try to generate a view that displays all Cost Centres for 1 Company, it takes more than 1 minute. It behaves as if SKIPCHECK is not there. I have reviewed the feeders, and they do not explain why view generation takes so long compared with other implementations of similar scale.

I'm using 9.5.2. Even after I have set the AllRuleCalcStargateOptimization parameter to F, it does not help. Sparse consolidation does not seem to work anymore.

FYI, I have STET and all the characteristics that you listed. I just want to be sure that this is a 9.5.2 issue.

Harry

Re: 952 - Memory Leak?

Posted: Thu Aug 11, 2011 4:14 pm
by Steve Vincent
harrytm1 wrote: "I'm using 9.5.2. Even after I have set the AllRuleCalcStargateOptimization parameter to F, it does not help. Sparse consolidation does not seem to work anymore."

Have you tried commenting out that parameter with # in the server config? That's what I reported, and at the moment it's the only thing stopping our servers from crashing. My only hope is that IBM find a fix for that and the other 5 issues I have logged with them...

Re: 952 - Memory Leak?

Posted: Sat Aug 13, 2011 7:18 am
by harrytm1
Hi Steve,

Yes, I tried both setting the parameter to F and commenting it out with # in the tm1s.cfg file. Still crawling... it behaves as if the SKIPCHECK statement is not in the rule, even though it is.

FYI, I'm using an XDI rule worksheet that was created in Excel 2003 but is currently applied and saved using Excel 2010.

Harry

Re: 952 - Memory Leak?

Posted: Wed Sep 12, 2012 2:24 am
by BigG
Hi image2x,

We are suffering the same symptoms you described in 9.5.2, but we do not have AllRuleCalcStargateOptimization in the tm1s.cfg file at all. Did you resolve this in any other way?

Cheers