We like logs – no shock there – and so do system administrators. Some of the most difficult logs to work with come from the Microsoft world. I’ve seen DNS debug logs in Active Directory, IIS and Message Tracking logs in Exchange, and Windows Event Logs from just about everything else. My current focus is on Unified Logging System (ULS) logs. These trace logs are produced by several products, including Microsoft SharePoint and Project Server, and they are among the formats I am asked to support most often. They are useful for diagnosing problems in your Microsoft IT infrastructure since they get down to the individual .NET calls that happen under the covers, but they are far from easy to understand. They are so problematic that Microsoft released ULSViewer to help make sense of them. Of course, it only handles one file at a time, has rudimentary search, and offers no statistical analysis – but that’s why you use Splunk. Splunk offers the SharePoint administrator some really cool features, but the one we need here is the most basic function of Splunk – getting your logs into one place for troubleshooting.
Let’s start by taking a look at a typical log entry.
06/11/2013 15:15:49.90 PSConfigUI.exe (0x0624) 0x04D0 SharePoint Foundation Upgrade fbv7 Medium [psconfigui] [SPDelegateManager] [DEBUG] [6/11/2013 3:15:49 PM]: Waiting for mutex to initialize type dictionary
This is from my test SharePoint 2010 environment and shows a debug message. There isn’t anything special about it – it tells you the executable, the PID, the session ID, the product, the severity, and a message – all good stuff. We can decode this relatively easily with some regular expression magic. Here is a typical multi-line log entry.
06/11/2013 15:15:49.87 PSConfigUI.exe (0x0624) 0x04D0 SharePoint Foundation Topology 8xqz Medium Updating SPPersistedObject SPFarm Name=SharePoint_Config. Version: -1 Ensure: False, HashCode: 37121646, Id: 19eb72de-6a41-4c79-9210-1b4ae749c790, Stack: at Microsoft.SharePoint.Administration.SPPersistedObject.BaseUpdate() at Microsoft.SharePoint.Administration.SPFarm.Update() at Microsoft.SharePoint.Administration.SPConfigurationDatabase.RegisterDefaultDatabaseServices(SqlConnectionStringBuilder connectionString) at Microsoft.SharePoint.Administration.SPConfigurationDatabase.Provision(SqlConnectionStringBuilder connectionString) at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, IdentityType identityType, String farmUser, SecureString farmPassword, SecureString...
06/11/2013 15:15:49.87* PSConfigUI.exe (0x0624) 0x04D0 SharePoint Foundation Topology 8xqz Medium ... masterPassphrase) at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, String farmUser, SecureString farmPassword, SecureString masterPassphrase) at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.CreateOrConnectConfigDb() at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.Run() at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask() at System.Threading.ExecutionContext.runTryCode(Object userData) at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData) at System.Threading.ExecutionContext.Run(ExecutionContext execu...
06/11/2013 15:15:49.87* PSConfigUI.exe (0x0624) 0x04D0 SharePoint Foundation Topology 8xqz Medium ...tionContext, ContextCallback callback, Object state) at System.Threading.ThreadHelper.ThreadStart()
Notice how the embedded continuation token (three dots in a row on either side) sits in the middle of the line. Before the three dots, each continuation line repeats the same header information, except for an asterisk next to the date. It’s relatively easy to handle events that have a continuation token at the beginning of the line, but in the middle? Aside from this continuation aspect (about which – yes – I’m just a little bitter), we have a header to contend with and line-breaking issues because of the asterisk.
Of course, Splunk can handle all of this. Here are the steps we need to get through:
- Read the data through a tailing file monitor
- Remove or ignore the header area we don’t need
- Separate the data into events through a custom line breaker
- Remove the data continuation so the event is “correct”
Let’s take a look at the first bit. I’m receiving my ULS logs from my Microsoft SharePoint 2010 farm servers, so I will tag the data with the MSSharePoint:2010:ULSAudit source type. You will need to install a Splunk Universal Forwarder on each SharePoint server and create an app to read the data and forward it to your indexer. Then add the following to inputs.conf:
[monitor://C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS]
whitelist=.*-\d+-\d+\.log$
sourcetype=MSSharePoint:2010:ULSAudit
queue=parsingQueue
disabled=false
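If you want to check the whitelist before deploying it, the pattern is easy to test on its own. Here is a minimal Python sketch; the file names in it are made up for illustration, though ULS files typically follow a SERVERNAME-YYYYMMDD-HHMM.log pattern:

import re

# Sanity-check the whitelist pattern against a couple of illustrative file
# names: a ULS-style trace log (matched) and a made-up example of another
# log type that can appear alongside it (skipped).
WHITELIST = re.compile(r'.*-\d+-\d+\.log$')

for name in ["SP2010APP01-20130611-1515.log", "PSCDiagnostics_6_11_2013.log"]:
    print(name, "->", "monitored" if WHITELIST.search(name) else "skipped")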
Note that we need a whitelist because Microsoft places other types of data in this directory, known as the “14 Hive”; the whitelist restricts the data we are reading to just the ULS log files. Our second step is to create a transforms.conf entry that ignores the header line. The header line contains the field names and begins with “Timestamp”, whereas every data line begins with a date-time stamp, so we can drop any line that starts with “Timestamp”:
[uls_remove_comments]
REGEX=^Timestamp
DEST_KEY=queue
FORMAT=nullQueue
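In plain terms, the transform routes any event whose raw text starts with “Timestamp” to the nullQueue, which means it is discarded before indexing. A rough Python equivalent of that filter, using abbreviated mock lines, looks like this:

import re

# Lines whose raw text begins with "Timestamp" are the ULS column header and
# get routed to the nullQueue (dropped); data lines begin with a date-time
# stamp and are kept. The sample lines are abbreviated mock-ups.
HEADER = re.compile(r'^Timestamp')

for line in ["Timestamp\tProcess\tTID\t...",
             "06/11/2013 15:15:49.90\tPSConfigUI.exe (0x0624)\t..."]:
    print("dropped" if HEADER.match(line) else "indexed", "->", line)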
This is linked into the events using a props.conf entry:
[MSSharePoint:2010:ULSAudit]
SHOULD_LINEMERGE=false
CHECK_FOR_HEADER=false
TRANSFORMS-ulscomment=uls_remove_comments
We next need to set up an appropriate line breaker. The date format is always the same, so we can break on that:
LINE_BREAKER=([\r\n]+)\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}\.\d{2}\s
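To see what that breaker does, here is a rough Python emulation – a sketch only, since Splunk’s line breaking discards whatever the first capture group matches, which the lookahead below approximates by leaving the timestamp attached to the next event:

import re

# Break on a run of newlines only when the next line begins with a ULS
# timestamp; wrapped lines that do not start with a fresh timestamp stay
# glued to the event they belong to.
BREAKER = re.compile(r'[\r\n]+(?=\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}\.\d{2}\s)')

def break_events(raw):
    return [chunk for chunk in BREAKER.split(raw) if chunk.strip()]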
The final piece of this caused me the most concern. How do I make the ULS log events look like they should, without the continuation tokens? Fortunately, I was in our European headquarters recently, and I tackled one of our senior professional services guys, who (after just a little bit of work) came up with the following props.conf addition:
SEDCMD-cleanup=s/(\.\.\.([^\*]+).*?\.\.\.)//g
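If you want to convince yourself of what that substitution does before pointing it at production data, the same expression behaves the same way in Python’s re.sub. The event below is an abbreviated, made-up version of a continued entry, not a real ULS line:

import re

# Delete everything between a "..." pair, i.e. the repeated timestamp/process/
# severity header that the continuation injects, leaving one clean stack
# trace. The sample event is abbreviated and made up.
CONTINUATION = re.compile(r'(\.\.\.([^\*]+).*?\.\.\.)')

merged = ("at Microsoft.SharePoint.Administration.SPFarm.Update() "
          "... 06/11/2013 15:15:49.87* PSConfigUI.exe (0x0624) Medium ... "
          "at System.Threading.ThreadHelper.ThreadStart()")
print(CONTINUATION.sub('', merged))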
Our completed props.conf entry looks like this:
[MSSharePoint:2010:ULSAudit]
SHOULD_LINEMERGE=false
CHECK_FOR_HEADER=false
LINE_BREAKER=([\r\n]+)\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}\.\d{2}\s
TRANSFORMS-ulscomment=uls_remove_comments
SEDCMD-cleanup=s/(\.\.\.([^\*]+).*?\.\.\.)//g
You will still need to do field extractions on the resulting events, but the heavy lifting of getting the events into Splunk is now done. Each event is complete and no longer carries the repeated continuation headers from the log file, so, as a bonus, the indexed events are actually smaller than the original log.
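As a starting point for those extractions: the raw ULS files are, as far as I have seen, tab-delimited with space padding – the examples above simply render those tabs as spaces – so a delimiter-based split is usually enough. The field names below are my own labels, not anything SharePoint or Splunk mandates:

# Sketch of a delimiter-based field extraction for a single ULS event,
# assuming tab-separated columns; rename the fields to suit your own schema.
FIELDS = ["timestamp", "process", "tid", "area", "category",
          "event_id", "level", "message", "correlation"]

def extract_fields(event):
    values = [value.strip() for value in event.split("\t")]
    return dict(zip(FIELDS, values))

In Splunk itself you would express the same idea as a search-time extraction in props.conf and transforms.conf (a DELIMS/FIELDS report, for example), with the same column list.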
Now, what will you do with SharePoint ULS logs?