The dangers of going live without thorough investment in testing

This post is a little more generic than my usual focus on SharePoint technologies, because it applies to any IT project delivering a solution.

As you may have read in the news, a couple of high-profile problems hit the headlines over the last couple of weeks: the failure of Tesco’s banking platform migration (http://www.bbc.co.uk/news/business-13888891) and the Olympics ticketing system for London 2012 (http://www.guardian.co.uk/sport/2011/jun/24/olympic-tickets-sale-second-round).

Both cases suggest that the respective systems were not tested thoroughly enough to avoid these unfortunate outcomes. With the advances in virtualisation technologies there should be no reason why realistic scenarios can’t be simulated beforehand to forecast and avoid problems like these.

One of the main reasons testing gets cut back is the financial overhead of such intensive testing. This includes provisioning hardware, additional software licenses, resources to commission the system and testing the systems themselves; the list goes on, and the testing effectively becomes a project in itself.

What the system glitches did highlight, especially in the case of the Olympics ticket application system, was that the systems simply didn't cater for the volume of applicants. This raises the question: why not? Surely this many applicants were expected?

This brings me on to the next point: systems should be tested against every ‘what if’ eventuality, so that the outcome and behaviour in each scenario is known in advance.

Personally, with the Olympic application (in which I was unsuccessful first time round), I was frustrated by how the results were displayed when searching for an event to attend. The results listed every event, even those that had sold out! All this did was make it harder for me to find events with availability, and in the end I gave up. What I believe should have happened is that as soon as an event sold out it was removed from the results, making availability easy to navigate. This is a classic ‘what if’ scenario: how easy is it for potential applicants to apply for tickets, especially as second time round it was first come, first served, so people would have been panicking to find availability as it rapidly disappeared?
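As a rough illustration of the behaviour I would have expected, here is a minimal Python sketch that hides sold-out events from a search result. The Event class, its fields and the sample events are all hypothetical; I have no knowledge of how the real ticketing system stored its data.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical shape of a search result; not taken from the real system.
    name: str
    tickets_left: int


def available_events(events):
    """Return only the events that still have tickets, keeping the original order."""
    return [event for event in events if event.tickets_left > 0]


# Made-up sample data purely for illustration.
search_results = [
    Event("Athletics final", 0),
    Event("Handball heats", 120),
    Event("Archery qualifying", 0),
]

for event in available_events(search_results):
    print(event.name, event.tickets_left)
```

The point is simply that the filter is cheap to apply at display time, so the applicant only ever sees events they can still apply for.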

So when the financial cost of testing migrations and go-live scenarios was discussed, were the risks of not testing thoroughly pointed out? Were those risks understood and accepted?

IT may have been the ones having the finger pointed at them; however, if they highlighted the risks and issues of not testing thoroughly, is it really IT’s fault? I certainly don’t think so, and it is all too easy to blame IT.

User Profile Synchronization Service failed to start due to Kerberos issue

If you’ve reached this page and it’s the first one you’ve read about setting up the SharePoint User Profile Synchronization service, I would highly recommend reading this article in detail first: http://technet.microsoft.com/en-us/library/ee721049.aspx.

If you have read it and are still having issues, read on.

The issue I was having was that, after an in-place upgrade of a MOSS 2007 server to SharePoint 2010, the User Profile Synchronization service would attempt to start and then, after 5 minutes or so, return to ‘stopped’. At this point I want to make clear that no part of my farm is set up for Kerberos authentication.

Looking into the server logs I found the following entry that looked suspicious:

Security ID:                DOMAIN\farmaccount
Account Name:                farmaccount
Account Domain:                DOMAIN
Logon ID:                0x58732
Logon Type:                        3
Account For Which Logon Failed:
Security ID:                NULL SID
Account Name:               
Account Domain:               
Failure Information:
Failure Reason:                An Error occured during Logon.
Status:                        0xc000005e
Sub Status:                0x0
Process Information:
Caller Process ID:        0xed8
Caller Process Name:        C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN\OWSTIMER.EXE
Network Information:
Workstation Name:        SHAREPOINTSERVER
Source Network Address:        -
Source Port:                -
Detailed Authentication Information:
Logon Process:                C
Authentication Package:        Kerberos
Transited Services:        -
Package Name (NTLM only):        -
Key Length:                0

I also found an entry in the SharePoint Logs:

06/23/2011 10:43:23.73     OWSTIMER.EXE (0x1E2C)    0x1FE8    SharePoint Portal Server     User Profiles   9q15    High    UserProfileApplication.SynchronizeMIIS: Failed to configure ILM, will attempt during next rerun. Exception: System.Security.SecurityException: There are currently no logon servers available to service the logon request.       at System.Security.Principal.WindowsIdentity.KerbS4ULogon(String upn)     at System.Security.Principal.WindowsIdentity..ctor(String sUserPrincipalName, String type)     at System.Security.Principal.WindowsIdentity..ctor(String sUserPrincipalName)     at Microsoft.IdentityManagement.SetupUtils.IlmWSSetup.GetDomainAccountSIDHexString(String domainName, String accountName)     at Microsoft.IdentityManagement.SetupUtils.IlmWSSetup.GrantSQLRightsToServiceAccount()     at Microsoft.IdentityManagement.SetupUtils.IlmWSSetup.IlmBuildDatabase()     at Microsoft.Office.Server.Us...   
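The two entries tie together: 0xC000005E is the NTSTATUS code STATUS_NO_LOGON_SERVERS, whose message text is the same "no logon servers available" wording that appears in the SharePoint exception. As a minimal sketch (not a general event log parser), the following Python pulls the status and authentication package out of text pasted from the security event. The sample is a shortened copy of the entry above and the lookup table deliberately holds only this one code.

```python
# Deliberately tiny lookup table: only the code seen in this post.
NTSTATUS_NAMES = {
    0xC000005E: "STATUS_NO_LOGON_SERVERS",
}


def parse_event_fields(text):
    """Split 'Key: value' lines into a dict, keeping the first value seen for each key."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


# Shortened copy of the security event quoted above.
sample_event = """\
Account Name:                farmaccount
Logon Type:                        3
Status:                        0xc000005e
Authentication Package:        Kerberos
"""

fields = parse_event_fields(sample_event)
status_code = int(fields["Status"], 16)
print(fields["Authentication Package"], NTSTATUS_NAMES.get(status_code, hex(status_code)))
```

Seeing Kerberos named as the authentication package on a farm that is not configured for Kerberos is what made the entry stand out.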

From researching this I managed to find this blog which suggested the issue is related to Kerberos.

Now this seemed strange, as neither my farm nor its web applications run Kerberos, so I could quite easily have discarded the blog. However, stranger things have happened, so I followed the blog mentioned and added a Kerberos SPN for the farm account to AD.

Sure enough, this fixed the issue: the User Profile Synchronization service started and has worked perfectly since.

One thing to note: if you remove the SPN after starting the service and the service for some reason returns to the stopped state, you will need to re-enter the SPN before it will start again.

Credit to my colleague James Brennan for assisting in resolving this issue.

SharePoint databases report ‘Database is in compatibility range and upgrade is recommended’ after upgrade

Following on from my previous blog, on day 2 of the upgrade I found that some of the databases were reporting the following error:

Database is in compatibility range and upgrade is recommended

This was appearing on non-content databases (that is, SharePoint system databases).

This seemed strange, as I mentioned in my previous blog that the upgrade had been successful, so I was a little perplexed to find this issue.

After a little research, nothing I found seemed to answer my problem.

Although this blog refers to Windows 7, it helped my train of thought and also made me aware that PowerShell can only upgrade content databases, not the standard SharePoint system databases: http://blogs.technet.com/b/blairb/archive/2010/07/16/patching-sharepoint-2010-on-windows-7.aspx.

Another article relating to this can be found here, although no solution had been found at the time of blogging: http://social.technet.microsoft.com/Forums/en-IE/sharepoint2010setup/thread/7d3ee7cf-56f8-46fe-b2c9-b8662e88825a

Since I could only upgrade content databases this way, the above did not help resolve the issue.

The two databases in question were:

Database – Type

WSS_Search - SPSearchDatabase

WSS_UsageApplication - SPUsageDatabase

I finally managed to resolve the problem by running a b2b (build-to-build) upgrade command:

PSConfig.exe -cmd upgrade -inplace b2b -force -cmd applicationcontent -install -cmd installfeatures

Credit to this blog (http://blog.techgalaxy.net/archives/2585) for pointing out the b2b command, as I had been having issues running the upgrade before finding it.

People Picker searchadcustomquery vs. searchadcustomfilter property

I wanted to share an issue I recently had with the people picker. I have to admit it was down to user error on my part, but you may fall into the same trap!

The trap I fell into was assuming that the SharePoint stsadm peoplepicker properties offered only one configuration option for querying AD, and this is where I went wrong!

If you follow this TechNet guide for the people picker, you may see why it wasn't as clear as I thought: http://technet.microsoft.com/en-us/library/cc263318(v=office.12).aspx (this is written for MOSS, but it was suggested that it still applies to SharePoint 2010).

The guide suggests that stsadm -o setproperty -pn peoplepicker-searchadcustomquery is the only property available for AD queries, so I pursued this route and got very strange results when searching for users. For example, say I set the property to return only users who have a phone number: the people picker then returned everyone with a phone number, regardless of the search term entered.

After lots of head scratching and taking a step back, I realised that rather than looking at the parameters appended to the command I needed to look at the makeup of the command itself. On closer inspection there are 2 AD-related properties applicable to the people picker: -pn peoplepicker-searchadcustomquery and -pn peoplepicker-searchadcustomfilter (the latter being the one I actually needed). I found this in a further TechNet article, which does the opposite of the previous one and does not mention the query property: http://technet.microsoft.com/en-us/library/gg602075.aspx#section8.

Hopefully you can see how easy it was to get confused. After the light bulb moment I quickly changed the command and, after hours of failure spent doubting the structure of my LDAP query, realised it had been a simple mistake; everything then worked like a dream.

So to conclude on the commands:

Stsadm -o setproperty -pn peoplepicker-searchadcustomquery – THIS WILL SHOW RESULTS BASED UPON THE QUERY YOU PLACE HERE REGARDLESS OF THE VALUE ENTERED INTO THE PEOPLEPICKER

Stsadm -o setproperty -pn peoplepicker-searchadcustomfilter – THIS WILL SHOW RESULTS FOR THE VALUE ENTERED INTO THE PEOPLEPICKER, NARROWED BY THE FILTER YOU PLACE HERE.
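To make the difference concrete, here is a small Python model of the behaviour I observed. It is only a sketch of the symptoms described in this post: the function name is hypothetical, the (cn=...) clause is a made-up stand-in for whatever name matching the people picker really performs, and I am not claiming this is how SharePoint builds its LDAP query internally.

```python
def build_ldap_filter(search_term, custom_query=None, custom_filter=None):
    """Model of the observed behaviour only; not SharePoint's real query builder."""
    if custom_query is not None:
        # Observed with searchadcustomquery: the typed search term had no effect.
        return custom_query
    # Hypothetical stand-in for the people picker's own name matching.
    term_clause = f"(cn={search_term}*)"
    if custom_filter is not None:
        # Observed with searchadcustomfilter: the term match AND the filter must both hold.
        return f"(&{term_clause}{custom_filter})"
    return term_clause


# LDAP presence filter: any entry with a telephoneNumber attribute set.
HAS_PHONE = "(telephoneNumber=*)"

print(build_ldap_filter("smith", custom_query=HAS_PHONE))
print(build_ldap_filter("smith", custom_filter=HAS_PHONE))
```

In the first call the search term disappears entirely, which matches the "everyone with a phone number" results I was seeing; in the second it is combined with the filter, which is what I wanted all along.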

I’m not entirely sure why you would use the customquery option, which is why I didn’t even give a second thought to its existence. If anyone has any scenarios where it could be used, I’d be interested to hear from you.

Further reading:

http://technet.microsoft.com/en-us/library/gg602068.aspx

Thanks to Tristan Watkins and Glyn Clough for their input and ideas on Twitter.

In place upgrade from MOSS 2007 to SharePoint 2010 fails

It’s been too long! I just realised I haven’t blogged for over 3 months – must try harder!

So today I was working on a client site on an in-place upgrade from MOSS 2007 Standard to SharePoint 2010 Standard. Firstly, I must say I’m not a fan of any in-place upgrade, whether for SharePoint or any other Microsoft product; however, due to time constraints and hardware availability this was the best way forward.

Background to the upgrade process

I followed the TechNet article on in-place upgrade to SharePoint 2010: http://technet.microsoft.com/en-us/library/cc263212(v=office.14).aspx. MOSS 2007 was installed with Service Pack 2 (build number 12.0.0.6421) and the pre-upgrade checker was run, reporting no errors. (It is worth noting that the October 2009 CU updates the pre-upgrade checker; however, due to timescales this CU was not installed.)

The SharePoint 2010 media was slipstreamed with the August 2010 CU, which would take the new SharePoint build to 14.0.5123.5000.

The Upgrade

The upgrade failed with 3 errors. Two of these were due to features; the features were deactivated and their solutions retracted to remove the errors in the short term. However, 1 error remained after running the command:

psconfig -cmd upgrade -inplace v2v -passphrase <passphrase> -wait

The errors in the generated log (the log file is typically named Upgrade-Date-Time-error.log) were as follows:

[PSCONFIG] [SPUpgradeSession] [INFO] [6/22/2011 1:45:30 PM]: No context object
[PSCONFIG] [SPUpgradeSession] [ERROR] [6/22/2011 1:45:30 PM]: This upgrade session has been stopped. Possible causes include the process being terminated abruptly or the OS has rebooted. Please restart the upgrade again.

As you can see, the error wasn't very helpful: neither suggested cause applied, and I had already retried the upgrade several times. Further errors were also reported when running the command from the prompt:

Failed to upgrade SharePoint Products.

An exception of type Microsoft.SharePoint.Administration.SPUpdatedConcurrencyException was thrown. Additional exception information: An update conflict has occurred, and you must re-try this action. The object SPUpgradeSession Name=xxxxxxxxxxxxxxxxxxx was updated by domain\user, in the PSCONFIG (10056) process, on machine SERVER. View the tracing log for more information about the conflict.

Microsoft.SharePoint.Administration.SPUpdatedConcurrencyException: An update conflict has occurred, and you must re-try this action. The object SPUpgradeSession Name=xxxxxxxxxxxxxxxxxxx was updated by domain\user, in the PSCONFIG (10056) process, on machine SERVER. View the tracing log for more information about the conflict.
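Upgrade logs can get long, so it helps to pull out just the failures. The following Python sketch assumes every line follows the bracketed layout of the two log lines quoted earlier ([source] [category] [level] [timestamp]: message); that layout is inferred from those two lines only, and the second sample message is shortened.

```python
import re

# Layout inferred from the two log lines quoted in this post.
LOG_LINE = re.compile(
    r"^\[(?P<source>[^\]]+)\] \[(?P<category>[^\]]+)\] "
    r"\[(?P<level>[^\]]+)\] \[(?P<when>[^\]]+)\]: (?P<message>.*)$"
)


def error_entries(lines):
    """Return (timestamp, message) for every line whose level is ERROR."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append((match.group("when"), match.group("message")))
    return errors


sample_log = [
    "[PSCONFIG] [SPUpgradeSession] [INFO] [6/22/2011 1:45:30 PM]: No context object",
    # Message shortened for the example.
    "[PSCONFIG] [SPUpgradeSession] [ERROR] [6/22/2011 1:45:30 PM]: This upgrade session has been stopped.",
]

for when, message in error_entries(sample_log):
    print(when, message)
```

Lines that do not match the layout, such as multi-line stack traces, are simply skipped in this sketch.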

From researching other blogs I found a Microsoft article, http://support.microsoft.com/kb/939308, suggesting that when upgrading you may need to clear the cache on the WFE servers. The article is written for MOSS 2007 rather than SharePoint 2010, and at this stage the server already had the SharePoint 2010 binaries installed, although this blog by Stefan Goßner suggests it may also apply to SharePoint 2010: http://blogs.technet.com/b/stefan_gossner/archive/2011/03/04/delayed-february-2011-cu-for-sharepoint-2010-is-now-available.aspx (see Stefan’s comments).

I tried this procedure several times with no luck!

I found a further article here http://social.technet.microsoft.com/Forums/en-US/sharepoint2010setup/thread/8406c0f6-5de9-4079-bf37-8640a60d1a19/ and, based on it, detached all content databases from the farm, but this did not work either.

Time and time again I received the same breakdown at the end of each attempt:

Total number of configuration settings run: 3

Total number of successful configuration settings: 2

Total number of unsuccessful configuration settings: 1

Successfully stopped the configuration of SharePoint Products.
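If you are retrying the upgrade repeatedly, a quick check of this summary saves reading the whole output each time. This is a minimal Python sketch that assumes the three "Total number of" lines appear exactly as quoted above; the function name is my own.

```python
import re

# Matches the three summary lines exactly as they are quoted in this post.
SUMMARY_LINE = re.compile(
    r"Total number of (configuration settings run"
    r"|successful configuration settings"
    r"|unsuccessful configuration settings): (\d+)"
)


def parse_summary(output):
    """Return a dict mapping each summary label to its count."""
    return {label: int(count) for label, count in SUMMARY_LINE.findall(output)}


sample_output = """\
Total number of configuration settings run: 3
Total number of successful configuration settings: 2
Total number of unsuccessful configuration settings: 1
"""

summary = parse_summary(sample_output)
upgrade_ok = summary.get("unsuccessful configuration settings", 0) == 0
print(summary, upgrade_ok)
```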

At this point I was starting to run out of ideas, short of reinstalling, which I really didn't want to do.

Solution

I finally managed to resolve my issue by taking the brave step of upgrading the farm to a later cumulative update (as noted above, I was very short of ideas and time at this point; had this failed, I had a complete backup taken prior to the upgrade to fall back on). Not being a fan of installing the latest CU just for the sake of it being the latest, I chose the February 2011 CU (build number 14.0.5136.5002). After applying this CU the upgrade succeeded, with a concluding statement along the lines of 4 out of 4 tasks completed successfully.
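When juggling CUs it is easy to lose track of which build is the later one. A small Python sketch using only the two SharePoint 2010 build numbers mentioned in this post; the helper name is hypothetical.

```python
def parse_build(build):
    """Turn '14.0.5136.5002' into (14, 0, 5136, 5002) so parts compare numerically."""
    return tuple(int(part) for part in build.split("."))


# Build numbers as quoted in this post.
builds = {
    "August 2010 CU": "14.0.5123.5000",
    "February 2011 CU": "14.0.5136.5002",
}

newest = max(builds, key=lambda name: parse_build(builds[name]))
print(newest, builds[newest])
```

Comparing tuples of integers rather than the raw strings avoids mistakes when two parts have a different number of digits.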

As always, I hope this helps if you end up reading this. Please do leave me a comment on your experiences.