Server crash due to subset corruption

DevGreg
Posts: 14
Joined: Tue Sep 09, 2008 4:12 pm

Server crash due to subset corruption

Post by DevGreg »

Dear all,

We're experiencing a very annoying issue with our TM1 server, so I'd like to know if you've ever encountered the same problem...

Our server is v9.0 SP3, and is running under Windows Server 2003. It is rebooted every night, as recommended by the former Applix team.

However, for no particular reason, some time during the working day, a subset starts to block everything.
I mean that, if I click on this subset, or if I try to reach it via a TI process, or if I launch a view that uses it, my action blocks every other action for all users.

We use the TM1 Top Custom application to terminate the action, but we have no solution other than rebooting the server to get everything back to normal.

This problem is occurring more and more often as the number of users grows, and it is becoming critical for us.

Have you ever experienced it?
What do you think is the cause of the problem? (corrupted RAM maybe?)
Is there any way to investigate this kind of problem in TM1? (I'm not the one operating the server)
Will an upgrade of TM1 solve it? Moving to Unix? Changing the hardware configuration? Rebooting more often?

Thanks in advance for your answers!

Regards,

Greg
jim wood
Site Admin
Posts: 3951
Joined: Wed May 14, 2008 1:51 pm
OLAP Product: TM1
Version: PA 2.0.7
Excel Version: Office 365
Location: 37 East 18th Street New York

Re: Server crash due to subset corruption

Post by jim wood »

Hi Greg,

I have not experienced this problem before. The way I would get around it is:

1) Bring the server down.
2) Delete the subset's file from the hard drive.
3) Restart the server.
4) Rebuild the subset as soon as the server is back up.

This should solve any file corruption problems. If it still occurs, then you need to look at what is in the subset. Is there a particular measure that has a complicated or large calculation? If there is, can the calculation be simplified? Also check any feeders to make sure you have not overfed a measure. To check this, you may want to repeat the four steps above but rebuild the subset one element at a time after the restart, to identify the culprit.
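
The file-handling part of steps 1 to 4 could be scripted roughly as follows. This is only a sketch in Python, not part of any TM1 tooling. It assumes the server is already stopped and that public subsets are stored as `<subset>.sub` files in a `<dimension>}subs` folder inside the data directory (check your own data directory first). It also moves the file to a backup folder rather than deleting it outright, so it can be inspected or restored later. The function name `quarantine_subset` and all paths are made up for illustration.

```python
import shutil
from pathlib import Path


def quarantine_subset(data_dir, dimension, subset, backup_dir):
    """Move one subset file out of the data directory while the server is down.

    Assumption: public subsets live at <data_dir>/<dimension>}subs/<subset>.sub.
    Returns the new location of the file, or None if no such file exists.
    """
    # "}}" inside an f-string is a literal "}" character.
    source = Path(data_dir) / f"{dimension}}}subs" / f"{subset}.sub"
    if not source.is_file():
        return None
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    # Move instead of delete, so the suspect file is kept for later analysis.
    shutil.move(str(source), str(target))
    return target
```

A call such as `quarantine_subset(r"D:\TM1Data", "Region", "Europe", r"D:\TM1Backup")` (hypothetical paths and names) would then be followed by the restart and the manual rebuild of the subset.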

I hope this helps,

Jim.
Struggling through the quagmire of life to reach the other side of who knows where.
OS: Mac OS 11 PA Version: 2.0.7

Re: Server crash due to subset corruption

Post by DevGreg »

Hello Jim,

Thanks for the quick answer.

The problem is that it's not one particular subset that creates this problem, but random subsets!

They can be either dynamic or static. Until now they haven't been on measure dimensions, but that's the whole point of the problem: there's no way for us to identify how their handle gets "corrupted" at some point...

That's why I was suspecting hardware issues: maybe there are some addresses in our RAM that don't work properly, and when TM1 uses these defective bits of RAM, it blocks the system...
Do you think this explanation is plausible?

Re: Server crash due to subset corruption

Post by jim wood »

Sounds like a viable explanation. Do you have a backup server that you can test this theory on? If not, you could always try smaller chunks, say on a laptop. The latter is not ideal, as it is not a true reflection of your live environment, but it may lead you to a more specific cause (if there is one).

Jim.

Re: Server crash due to subset corruption

Post by jim wood »

Just thought of something else. Have you tried turning the logging up to maximum to make sure you track every calculation? This may also give you something that you can pass on to Cognos Support. (To find out how to change your logging settings, check out the operations manual that you'll find in the installation directory on your server.)

Jim.

Re: Server crash due to subset corruption

Post by DevGreg »

Thanks again Jim for your feedback.

We only have one server at our disposal, but I'll be pushing my colleagues and management so that we get another machine to perform these tests...

Logging is already enabled for all our cubes. We haven't analysed it, but I doubt that we'll be able to find anything in it...

Indeed, these crashes also occurred on our development server (based on the same machine), and they were triggered by various events, none of which was data entry: saving a rule, changing access rights, launching a TI process...

Re: Server crash due to subset corruption

Post by jim wood »

Hello again. When I say logging, I mean server logging. If you set this to the lowest level, every calculation and action is traced. It is as near to a server dump as you can get.

Jim.

PS. As I said earlier, have a quick look through the operations manual. It goes through all the logging options.
paulsimon
MVP
Posts: 808
Joined: Sat Sep 03, 2011 11:10 pm
OLAP Product: TM1
Version: PA 2.0.5
Excel Version: 2016

Re: Server crash due to subset corruption

Post by paulsimon »

Greg,

I have had cases of rogue MDX subsets in that version causing crashes, but never random issues. I would search the data directory for *.??$ (star dot question question dollar) to see if it finds any files. These can be left over from failed attempts by TM1 to save a file.
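
The search described above can also be scripted. A minimal sketch in Python, assuming the leftover files sit somewhere under the data directory and match the `*.??$` pattern. The helper name `find_leftover_files` is made up, and it only lists files without deleting anything:

```python
from pathlib import Path


def find_leftover_files(data_dir):
    """Return all files under data_dir whose names match *.??$, sorted by path.

    In glob syntax '?' matches exactly one character and '$' is a literal,
    so the pattern matches a three-character extension ending in '$'.
    """
    return sorted(p for p in Path(data_dir).rglob("*.??$") if p.is_file())
```

Printing each path returned by `find_leftover_files(r"D:\TM1Data")` (a hypothetical data directory) gives a list that can be reviewed while the server is stopped.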

Another possibility is that there is a virus checker locking the files in the directory.

Regards


Paul Simon
Steve Vincent
Site Admin
Posts: 1054
Joined: Mon May 12, 2008 8:33 am
OLAP Product: TM1
Version: 10.2.2 FP1
Excel Version: 2010
Location: UK

Re: Server crash due to subset corruption

Post by Steve Vincent »

paulsimon wrote: "Another possibility is that there is a virus checker locking the files in the directory."
If it's random subsets, then the virus checker is where I'd be looking first. We had all sorts of problems with ours. You need to ensure the entire data directory is excluded from any virus checking on the server.
If this were a dictatorship, it would be a heck of a lot easier, just so long as I'm the dictator.
Production: Planning Analytics 64 bit 2.0.5, Windows 2016 Server. Excel 2016, IE11 for t'internet
laurent
Posts: 1
Joined: Tue Nov 04, 2008 2:42 pm

Re: Server crash due to subset corruption

Post by laurent »

I experienced the same TM1 server hang using an MDX dynamic subset (TM1 9.0 SP3 U5).
What do you suggest exactly?
- Remove the *.??$ files?
- Exclude the database directory from virus scanning?

Thank you for your help.
Laurent
mce
Community Contributor
Posts: 352
Joined: Tue Jul 20, 2010 5:01 pm
OLAP Product: Cognos TM1
Version: Planning Analytics Local 2.0.x
Excel Version: 2013 2016
Location: Istanbul, Turkey

Re: Server crash due to subset corruption

Post by mce »

A similar issue, after so many years:
PH17265: TM1 SERVER CRASH DUE TO CORRUPT SUBSET
https://www.ibm.com/support/pages/node/3799731