PMR69311 999 866: Corrupt Log Files > Server Crash

Post bug reports and the status of reported bugs
Steve Vincent
Site Admin
Posts: 1049
Joined: Mon May 12, 2008 8:33 am
OLAP Product: TM1
Version: 10.2.2 FP1
Excel Version: 2010
Location: UK

PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Steve Vincent » Wed Jul 13, 2011 1:51 pm

Feel like this is my primary role in life at the minute; logging faults with IBM :roll:

Problem description
Since upgrading from 9.0 to 9.5.2 we have found our servers crashing
while replicating data at what seem to be random times. An issue I
stumbled upon during testing appears to affect more than just what I
discovered at the time, but due to its nature it is impossible for me to
replicate in a testing environment.

Out of a month of log files, about 4 had a corrupted first data
record. Instead of beginning "", like all the other lines, it read ",
(one quote instead of two). As replication uses log files to determine
what data to copy, I believe the way it parses the logs is unable to
cope with the malformed log entry, and this causes the server to crash.

During testing I found issues with the transaction log viewer where
data I knew was logged did not appear in the viewer. I eventually
traced it to the malformed logs; once I manually corrected them using
Notepad, the viewer could pick them up. Time was tight, and as this did
not crash a server and was a minor issue for a few admins, it was not
logged. Now it appears the same issue also hurts replication (I was
unable to test replication due to lack of servers & time), which makes
it a much more serious issue.
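
For anyone who would rather script the Notepad fix than do it by hand, here is a minimal Python sketch. It assumes the only damage is a missing opening quote on the first data record (a `","` start instead of `"",`), which is what was seen here but has not been confirmed as the full extent of the corruption. The function names are made up for illustration, lines starting with `#` are skipped on the assumption that they are not data records, and the script writes a repaired copy rather than touching the original log.

```python
from pathlib import Path

# Assumption: a damaged record starts with quote, comma, quote
# instead of the usual quote, quote, comma.
BAD_PREFIX = b'","'


def repair_first_record(lines):
    """Prepend the missing quote to the first data record if it is damaged.

    `lines` is a list of bytes objects. Returns (new_lines, changed).
    """
    fixed = list(lines)
    for i, line in enumerate(fixed):
        if not line.strip() or line.startswith(b"#"):
            continue  # skip blank lines and '#' entries
        if line.startswith(BAD_PREFIX):
            fixed[i] = b'"' + line  # restore the opening quote
            return fixed, True
        break  # first data record is not damaged in this way
    return fixed, False


def repair_log_copy(src, dst):
    """Write a repaired copy of `src` to `dst`; the original is left untouched."""
    lines = Path(src).read_bytes().splitlines(keepends=True)
    fixed, changed = repair_first_record(lines)
    if changed:
        Path(dst).write_bytes(b"".join(fixed))
    return changed
```

The file is handled as bytes so that nothing else in the log is re-encoded. Whether TM1 then loads the repaired copy cleanly is something to verify on a test box first.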

Business impact
Without manually checking every single log file there is no way to know
or predict if a log is OK or not. We only find out when a replication
is run and a server crashes.
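
The manual check can be scripted outside TM1. The sketch below is a rough Python example, not an IBM tool: it looks at the first data record of each log file and flags those that open with one quote instead of two. The `tm1s*.log` file pattern, the skipping of `#` lines, and the `utf-8-sig` decoding are all assumptions to adjust for the actual environment.

```python
from pathlib import Path


def first_data_record(lines):
    """Return the first non-blank line that does not start with '#', or None."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            return line.rstrip("\r\n")
    return None


def is_malformed(record):
    """True when the record opens with one quote instead of two (`","...`)."""
    return record is not None and record.startswith('","')


def find_suspect_logs(log_dir, pattern="tm1s*.log"):
    """List log files under `log_dir` whose first data record looks damaged."""
    suspects = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes in it
        with open(path, "r", encoding="utf-8-sig", errors="replace") as handle:
            if is_malformed(first_data_record(handle)):
                suspects.append(path)
    return suspects
```

Calling `find_suspect_logs` with the logging directory before a replication run would give a list of files to inspect. It only catches the one pattern described above, so an empty list does not prove the logs are clean.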
If this were a dictatorship, it would be a heck of a lot easier, just so long as I'm the dictator.
Production: TM1 64 bit 10.2.2, Windows 2008/2012 Server. Excel 2010, IE11 for t'internet

jay9
Posts: 2
Joined: Wed Aug 19, 2009 4:09 am
OLAP Product: TM1
Version: 9.1.4
Excel Version: 2000

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by jay9 » Thu Jul 21, 2011 3:58 am

I had a similar issue yesterday when the TM1 server wouldn't restart due to a corrupted log file. It looks like log files created by a SaveDataAll process result in a missing " in the first line of the log file. The log files created by a manual Save Data seem to be OK.

Martin Erlmoser
Community Contributor
Posts: 122
Joined: Wed May 28, 2008 1:22 pm
OLAP Product: TM1, Cognos Express,..
Version: 9.1.4 FP1
Excel Version: 2010
Location: Vienna

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Martin Erlmoser » Tue Aug 02, 2011 7:01 pm

Same issue here: the log starts with a single double-quote, and sometimes it looks like the sample below.
I also found mixed entries from 2011-07-26 and 2011-08-01/02 in the log file of 2011-08-01/02.


good?:

Code: Select all

"","20110802142223","20110802142223","USER","S","WHATEVER","WHATEVER2","CUBE","USER","WHATEVER",""
bad?:

Code: Select all

",20110802023131,20110802023131.","20110802142828","20110802142828","USER","N","1234.123","0.","CUBE","2010","00","AA1","WW","0032","Whatever","333","23456000",""
#",20110802023131,20110802023131.","20110802142828","Whatev er ,20110802023131,20110802023131. yeah : 1"
But maybe it's working as intended.
OK, it seems the transaction log query tool displays everything correctly, but the server can't start with such a tm1s.log.

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Steve Vincent » Wed Aug 03, 2011 8:09 am

IBM have asked me to run some diagnostics to try and catch the server crash. I use SaveDataAll but it doesn't always cause a corrupted log. Will have to see how this one pans out.

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Martin Erlmoser » Fri Aug 12, 2011 6:19 am

I was able to test a bit yesterday.
The lines starting with #" (changeset entries) are not the bad guys in the "server fails loading tm1s.log on start" issue. I can confirm that the problem is only related to data entries that start with a single double-quote, which can (sometimes) be found in line 1.

bad!
","20110802142223","20110802142223","USER","S","WHATEVER","WHATEVER2","CUBE","USER","WHATEVER",""

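To repeat this kind of test on a whole file, a small Python sketch like the one below can tally the line shapes discussed in this post. The labels are invented for illustration and rest on the findings above: `#` lines are treated as changeset entries, and a `","` start is treated as the damaged record. Lines like the earlier `",2011...` sample are counted as ordinary data, since the thread does not settle whether they are a problem.

```python
from collections import Counter


def classify(line):
    """Label one log line using the patterns reported in this thread."""
    if not line.strip():
        return "blank"
    if line.startswith("#"):
        return "changeset"   # '#"...' entries, reported as harmless
    if line.startswith('","'):
        return "malformed"   # single leading quote: the reported culprit
    if line.startswith('"'):
        return "data"        # record that opens with a quoted field
    return "other"


def summarize(lines):
    """Return (counts per label, 1-based line numbers of malformed records)."""
    counts = Counter()
    bad_lines = []
    for number, line in enumerate(lines, start=1):
        label = classify(line)
        counts[label] += 1
        if label == "malformed":
            bad_lines.append(number)
    return counts, bad_lines
```

Passing an open file handle to `summarize` gives the tally plus the line numbers worth looking at, which also shows whether the damage ever appears anywhere other than line 1.
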
Bob Stuecheli
Posts: 12
Joined: Fri Dec 12, 2008 8:53 pm
OLAP Product: TM1
Version: 10.2.2, 8.4.5, 2.5 (in 1987)
Excel Version: 2013
Location: Troy, MI

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Bob Stuecheli » Thu Aug 18, 2011 7:49 pm

Just ran into the same problem this afternoon. We use the SaveDataAll in a TI process every night before the LAN server is backed up. A TM1 Admin user was viewing the transaction logs beginning with several days ago and the server crashed. Our TM1 service is set to restart if it goes down on its own, and an email is sent to a group of us TM1 admins and the LAN admin group. Well we started getting messages every minute that the server was up, then down, then up, then down, etc. Traced it back to the corrupt tm1s.log file with a missing " in the first row of data. Every time the server came back up it tried to reload the tm1s.log and crashed again.

We noticed that an occasional saved tm1s.log file has this error. We can't tell why or when it happens; there is no recognizable pattern that we can find. I submitted a PMR with IBM, but I haven't heard back from them yet.

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Steve Vincent » Fri Aug 19, 2011 7:55 am

Can you ask them to link your PMR with mine? I can't replicate the issue reliably and currently I am just awaiting the next crash to provide some debug files to IBM. If there is debug data you can provide in the meantime then we might be able to get something to their engineering team to look at :)

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Bob Stuecheli » Fri Aug 19, 2011 5:53 pm

Will do. I mentioned your PMR when I logged mine. The person I talked to didn't think that the crash dumps would show the root cause of the problem, just that the problem was from the tm1s.log file. They didn't even suggest that we create any debug files. We went live on 7/29/11 and 3 of the daily log files have had the problem.

By the way, my PMR number is 15662,033,000.

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by jay9 » Mon Aug 29, 2011 12:47 am

I'm not sure if this is useful in debugging this issue, but here is the code from my SaveDataAll process:

Prolog:

Code: Select all

ProcessName = 'zSaveDataAll';
#Enter the Start Time in the Parameter Cube
StartVal= TIMST(Now, '\d/\m/\Y  \h:\i:\s');
CellPutS(StartVal, 'Parameters','1',ProcessName,'Start Time');
Epilog:

Code: Select all

SaveDataAll;
#Enter the End Time in the Parameter Cube
EndVal= TIMST(Now, '\d/\m/\Y  \h:\i:\s');
CellPutS(EndVal, 'Parameters','1',ProcessName,'End Time');
This results in the missing character in the log file:
","20110827081558","20110827081558","R*Save_Data_615pm","S","27/08/2011 12:15:00","27/08/2011 18:15:00","Parameters","1","zSaveDataAll","Start Time",""
"","20110827081558","20110827081558","R*Save_Data_615pm","S","27/08/2011 12:15:47","27/08/2011 18:15:58","Parameters","1","zSaveDataAll","End Time",""
I run this as the only process in a chore. When I remove the CellPutS from the Prolog tab the log file looks good:
"","20110828232000","20110828232000","R*zSave_test","S","29/08/2011 09:16:57","29/08/2011 09:20:00","Parameters","1","zSaveDataAll","End Time",""
Could it be that the SaveDataAll needs to be the first line of code in the process to not cause this issue?
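
To see why a reader of the log might trip over that first record, a record of the same shape can be fed through a generic CSV parser. This is only an illustration with Python's `csv` module, which is certainly not what TM1 uses internally; the `BAD` record is built by dropping the opening quote from the good "End Time" line above.

```python
import csv

# Well-formed record copied from the log excerpt in this post.
GOOD = (
    '"","20110827081558","20110827081558","R*Save_Data_615pm","S",'
    '"27/08/2011 12:15:47","27/08/2011 18:15:58","Parameters","1",'
    '"zSaveDataAll","End Time",""'
)
# Same record with the opening quote removed, mimicking the damaged line.
BAD = GOOD[1:]


def parse(record):
    """Split one log record into fields with Python's csv module."""
    return next(csv.reader([record]))


good_fields = parse(GOOD)
bad_fields = parse(BAD)

# Compare the field counts and the first few fields of each record.
print(len(good_fields), good_fields[:3])
print(len(bad_fields), bad_fields[:3])
```

With this lenient parser the damaged record comes back with fewer fields than the good one, because the lone quote swallows the first comma and the timestamps shift position. A stricter parser could reasonably reject the line outright, which would fit the crash-on-load behaviour reported here, though that is a guess about TM1's internals.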

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Steve Vincent » Wed Aug 31, 2011 8:25 am

Not likely, my TI has just that line of code and I've seen it happen on a manual save data too. Hoping IBM can nail the reason and solve it...

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Bob Stuecheli » Wed Aug 31, 2011 2:51 pm

jay9 wrote:Could it be that the SaveDataAll needs to be the first line of code in the process to not cause this issue?
My TI process has only 1 line in it. The Prolog has the following:
#****Begin: Generated Statements***
#****End: Generated Statements****

SAVEDATAALL;

That is the only code in the process. I call it from a chore that runs nightly. I haven't had the problem show up in the past week, and we are running 6 servers (several are test only) that run the process nightly.

What is the LOGID that sometimes populates that first field?

asutcliffe
Regular Participant
Posts: 162
Joined: Tue May 04, 2010 10:49 am
OLAP Product: Cognos TM1
Version: 9.4.1 - 10.1
Excel Version: 2003 and 2007

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by asutcliffe » Fri Sep 02, 2011 9:14 am

I ran into this yesterday too. Upon restart after an unrelated (I presume) crash, the server crashed citing a similarly corrupt log. We're not using replication, so saved logs haven't caused us a problem before and I have no idea how prevalent this is.

Note the most recent prior save I did was done manually, not via TI.

Cheers,
Alex

Re: PMR69311 999 866: Corrupt Log Files > Server Crash

Post by Martin Erlmoser » Mon Oct 03, 2011 5:29 pm

TM1 9.5.2 FP1 released
IBM says that this issue has been solved.

maybe a link to release notes:
https://www-304.ibm.com/support/docview ... s=swgimgmt#

https://www-304.ibm.com/support/docview ... wg1PM39604

But APAR status = OPEN...
