Making log analysis easier with Unix commands
A couple of weeks ago, I presented a webinar on the topic of analyzing logs for SAP BusinessObjects BI4. The objective for that webinar was to share my opinion of the skills and framework required for being successful at troubleshooting the BI platform with the assistance of logs. Based upon the initial feedback, it seems like this was a useful endeavor. As such, I wanted to start a short series focused on providing more concrete examples of analyzing logs and troubleshooting the platform. This will be the first in the series.
I would like to begin by providing information on the various tools that I use when engaged in a troubleshooting exercise. As discussed in my webinar, I find that most of these engagements fall into one of three categories: tracking down a user error, diagnosing a performance or stability issue, or implementing a monitoring plan for long-term analysis of a recurring issue.
Unix commands and the UnxUtils package for Windows
If you’re using Unix, Linux, or a Mac to do your log analysis, then you probably have most of these commands already installed on your system. If you’re using Windows, then you should immediately download UnxUtils. It makes your analysis much quicker by providing Windows ports of some of the most useful Unix commands.
The first command that I use frequently is grep. It performs keyword searches against any file, plain text or not. When searching for a known error, I would typically use it like so: grep "ORA-" *.glf. This allows me to see all instances of an Oracle-related error in the traces from my SAP BusinessObjects services. An example of the output is below.
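As a minimal sketch of this in action, the following creates a sample trace file and runs the same search. The file contents and column layout here are made up for illustration; real GLF traces have many more pipe-delimited fields.

```shell
# Create a demo directory with a hypothetical GLF-style trace file
mkdir -p /tmp/glf_demo && cd /tmp/glf_demo
cat > sample_trace.glf <<'EOF'
t1|2024 11 05 10:12:01.123|I|Connection established
t2|2024 11 05 10:12:02.456|E|ORA-12154: TNS could not resolve the connect identifier
t3|2024 11 05 10:12:03.789|I|Query submitted
EOF

# Show every line containing an Oracle error code, across all GLF files
grep "ORA-" *.glf
```

Because grep prefixes each match with the filename when given multiple files, the output also tells you which service's trace contained the error.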
If I’m searching for an error but I’m not sure of the specific error message, then I’ll usually search for the "|E|" text first, which is inserted into the logs when an error occurs. The command would be grep "|E|" *.glf. An example of the output is below.
We could further refine this by using the Importance and Severity columns in the logs to pull out only critical errors. If we wanted to search for all errors that were an exception, then the command would become grep "|E|" *.glf | grep "|X|". You could modify this to search only for assertions or fatal exceptions by using the grep "|E|" *.glf | grep "|A|" or grep "|E|" *.glf | grep "|F|" commands respectively.
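Here is a sketch of that two-stage filter against a fabricated log. The severity codes (X, A) and column positions are assumptions for the example, not the exact GLF layout.

```shell
# Demo log with hypothetical importance (|E|, |W|) and severity (|X|, |A|) markers
mkdir -p /tmp/glf_sev && cd /tmp/glf_sev
cat > errors.glf <<'EOF'
2024 11 05 10:12:01|E|X|Unhandled exception in processing thread
2024 11 05 10:13:44|E|A|Assertion failed: cache handle is null
2024 11 05 10:14:02|W|-|Slow RFC response
EOF

# First grep keeps only error rows; second keeps only those flagged as exceptions
grep "|E|" errors.glf | grep "|X|"
```

Chaining greps like this is often easier to read (and to adjust on the fly) than building one combined regular expression.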
Another useful command is find. This allows you to search for files in the filesystem that meet specific criteria. I typically use this when I am investigating an issue in real time or need to find entries in the logs from a specific period of time. For example, if I wanted to limit my search to any logs that were updated in the past minute, then I could run the find . -mmin -1 command to do so. You can see the results below. This is useful for restricting a search to only those logs that were updated during the timeframe in which the error occurred.
I could also further augment this command to search for a keyword within the resulting logs. For example, with the find . -mmin -5 -exec grep "runtime_error" {} \; command, I could search all of the logs updated in the last five minutes for the keyword "runtime_error."
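The sketch below demonstrates the combined find-and-grep step. The file names and contents are invented, and the touch -d syntax used to backdate one file is GNU-specific (on macOS/BSD you would use touch -t instead).

```shell
# Two demo logs: one fresh, one backdated outside the five-minute window
mkdir -p /tmp/glf_recent && cd /tmp/glf_recent
echo "runtime_error: division by zero" > fresh.glf
echo "runtime_error: stale entry" > stale.glf
touch -d "30 minutes ago" stale.glf   # GNU touch syntax assumed

# Grep only the files modified within the last five minutes
find . -mmin -5 -exec grep "runtime_error" {} \;
```

Only the fresh file is searched, so the stale entry never appears, which is exactly the behavior you want when narrowing in on the window of an incident.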
The final command that I’ll discuss in this post is awk. It is actually an interpreted programming language, with the awk command being the interpreter for the code that you pass to it. When it comes to analyzing logs and text files, there is quite a bit that it can do; however, I typically start out at a basic level by using awk to show me only the fields of information in which I’m interested. For example, you can use the awk -F"[|]" '/\|E\|/ {print $3,$10,$23,$34,$36}' *.glf command against the logs from the BI Launchpad application with the following results.
With the above command, awk splits each line of the GLF files on the pipe character (i.e., "|"), and for every line containing "|E|" it prints fields 3, 10, 23, 34, and 36. The result is that you see only the date, time, calling application, application being called, and the error or message generated from the request. This limits the amount of information you must consume.
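A scaled-down sketch of the same technique is below. To keep the sample readable, this made-up layout has only seven pipe-delimited fields, so the print statement references $3, $4, and $7 rather than the field numbers used against real GLF traces.

```shell
# Demo log with a simplified seven-field pipe-delimited layout
mkdir -p /tmp/glf_awk && cd /tmp/glf_awk
cat > launchpad.glf <<'EOF'
t42|proc|2024/11/05|10:12:02.456|E|BILaunchpad|ORA-12154: TNS error
t43|proc|2024/11/05|10:12:03.001|I|BILaunchpad|request ok
EOF

# For lines containing |E|, print only the date, time, and message fields
awk -F"[|]" '/\|E\|/ {print $3,$4,$7}' launchpad.glf
```

The -F"[|]" option sets the input field separator; the /\|E\|/ pattern restricts the action to error rows, and print joins the selected fields with spaces.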
You can further augment this command to provide unique results rather than showing every instance of an error that has occurred. This is useful in determining how many distinct errors are actually occurring within the environment. An example of this command would be awk -F"[|]" '/\|E\|/ {print $10,$23,$34,$36}' *.glf | sort | uniq. Note that uniq only collapses adjacent duplicate lines, which is why the output is passed through sort first.
These results show you all of the unique error messages that have occurred in the environment without the date and time. I drop those fields because the timestamps, which go down to the millisecond, would make every line unique.
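The following sketch shows the deduplication step on an invented log with a repeated, non-adjacent error. Since uniq only removes duplicates that sit next to each other, sorting first is what makes the result truly distinct.

```shell
# Demo log where the same error appears twice, separated by a different error
mkdir -p /tmp/glf_uniq && cd /tmp/glf_uniq
cat > dup.glf <<'EOF'
|E|ConnectionServer|ORA-12154
|E|WebIProcessing|timeout waiting for CMS
|E|ConnectionServer|ORA-12154
EOF

# Print service and message for error rows, then collapse to distinct entries
awk -F"[|]" '/\|E\|/ {print $3,$4}' dup.glf | sort | uniq
```

Three error lines reduce to two distinct errors, giving you a quick count of how many separate problems are actually present.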
I hope that you have found this post useful for determining which commands are the most useful for beginning your analysis of an event in your SAP BusinessObjects deployment. In the next post, I will be diving more into the actual analysis of the log entries.
Thanks for reading.