Holiday season preparation

Preparing for the Holiday season is a year-round task for all of us here at Bazaarvoice. This year we saw many retailers extending their seasonal in-store specials to their websites as well. We also saw retailers going as far as closing physical stores on Thanksgiving (Nordstrom, Costco, Home Depot, etc.) and Black Friday (REI). Regardless of which of these strategies a retailer chose, the one common theme was the increase in online sales.

This trend is not new. Online sales are catching up to in-store sales (http://www.usnews.com/news/business/articles/2015/11/28/black-friday-store-sales-fall-as-americans-buy-more-online) over the holiday season. Along with the growth in online sales came an increase in demand on the Bazaarvoice network.

So here are just a few of the metrics that the Bazaarvoice network saw in the busiest week of online shopping:

blog_unique_visitors

Unique Visitors throughout 2013-2015

blog_impressions

Pageviews and Impressions 2013-2015

So how does the Bazaarvoice team prepare for the Holiday Season?

As soon as online traffic settles from its peak levels, the R&D team begins preparing for the next year’s Holiday Season. First, we look back at the numbers and at how we did as a team through various retrospectives, taking inventory of what went well and what we can improve upon for the next year. Before you know it, the team gets together in June to begin preparations for the next year’s efforts. I want to touch on just a few of the key areas the team focused on this past year to prepare for a successful Holiday Season:

  1. Client communication
  2. Disaster Recovery
  3. Load/Performance Testing
  4. Freeze Plan

Client Communication

One key improvement this year was communication, both between R&D and other internal teams and externally to clients. This was identified last year as an area we could improve. Internally, a response communication plan was developed. This plan made sure that key representatives in R&D and support teams were on call at all times and that everyone understood escalation paths and procedures should an issue occur. It was then the responsibility of the on-call representative to communicate any needs to the different engineers and client support staff. The on-call period lasted from November 24th to Tuesday, December 1st.

A small, focused team was identified for the creation and delivery of all client communication. As early as August, “Holiday Preparedness” communications were delivered to clients informing them of our service level objectives. Monthly client communications followed, containing load target calculations, freeze plans, and disaster recovery preparations, along with instructions on how to contact Bazaarvoice in the event of an issue and how we would communicate the current status of our network during this critical holiday season.

Internally there was also an increased emphasis on the creation and evaluation of runbooks. Runbooks are ‘play by play’ instructions which engineers should carry out for different scenarios. This collection of procedures and operations was vital to the team’s disaster recovery planning.

Disaster Recovery

To improve our operational excellence, we needed to ensure our teams were conducting exploratory disaster scenario testing to know for certain how our apps and services behaved, and to improve our DevOps code, monitoring/alerting, runbooks, etc. Documenting the procedures was completed in the late summer. That quickly moved into evaluating our assumptions and correcting them where necessary.

All teams were responsible for:

  • documenting the test plan
  • documenting the results
  • capturing the MTTR (mean time to recovery) when appropriate

Sign-off was required for all test plans and results shared amongst the teams. We also executed a full set of Disaster Recovery scenarios and performed additional Green Flag fire drills to ensure all systems and personnel were prepared for any contingencies during the holiday season.
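As a rough illustration of the MTTR capture mentioned above, the calculation is simply the average time between detecting a failure and recovering from it. The sketch below assumes each drill is recorded as a (detected, recovered) timestamp pair; the function name and the drill timings are hypothetical and are not taken from our actual test results.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average recovery time over (detected_at, recovered_at) datetime pairs."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    total = timedelta()
    for detected_at, recovered_at in incidents:
        if recovered_at < detected_at:
            raise ValueError("recovery cannot precede detection")
        total += recovered_at - detected_at
    return total / len(incidents)


# Hypothetical fire-drill timings, for illustration only.
drills = [
    (datetime(2015, 9, 1, 10, 0), datetime(2015, 9, 1, 10, 12)),   # 12 minutes
    (datetime(2015, 9, 8, 14, 0), datetime(2015, 9, 8, 14, 20)),   # 20 minutes
    (datetime(2015, 9, 15, 9, 30), datetime(2015, 9, 15, 9, 34)),  # 4 minutes
]

mttr = mean_time_to_recovery(drills)
print(mttr)  # 0:12:00
```

Tracking this number per scenario makes it easier to compare one drill with the next and to see whether runbook changes actually shorten recovery.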

Load/Performance Testing

For an added layer of insurance, we pre-scaled our environment ahead of the anticipated holiday load profile. Analysis of 3 years of previous holiday traffic showed a predictable increase of approximately 2.5x the highest load average over the past 10 months. For this holiday season we tested at 4x the highest load average over that time period to ensure we were covered. The load test allowed the team to test beyond the expected target traffic profile to ensure all systems would perform above expected levels.
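The arithmetic behind those targets can be sketched in a few lines. The 2.5x and 4x multipliers come from the paragraph above; the function name, the requests-per-second unit, and the baseline value of 10,000 are assumptions made up for the example.

```python
def load_targets(peak_load_average, expected_multiplier=2.5, test_multiplier=4.0):
    """Derive holiday load targets from the highest recent load average.

    Returns (expected_load, test_load, headroom), where headroom is how far
    the test target sits above the expected holiday load.
    """
    if peak_load_average <= 0:
        raise ValueError("peak load average must be positive")
    expected_load = peak_load_average * expected_multiplier
    test_load = peak_load_average * test_multiplier
    headroom = test_load / expected_load
    return expected_load, test_load, headroom


# Hypothetical baseline: 10,000 requests/second as the 10-month peak average.
expected, test, headroom = load_targets(10_000)
print(f"expected holiday load: {expected:,.0f} req/s")
print(f"load test target:      {test:,.0f} req/s")
print(f"headroom over expected: {headroom:.1f}x")
```

Testing at 4x against an expected 2.5x leaves a margin of 1.6x over the forecast peak, which is the "covered" buffer described above.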

Load testing was initially isolated to each system. Conducting tests in such an environment helped quickly identify any failure points. As satisfactory results were obtained, complexity was introduced by running systems in tandem. This simulated an environment more representative of what would be encountered in the holiday season.

One benefit of this testing was the identification and evaluation of other key metrics to ensure the systems were operating and performing successfully. Also, a predictive model was created to evaluate our expected results. The accuracy of the daily model was within 5% of the expected daily results and overall, for the 2015 season, was within 3%. This new model will be an essential tool when preparing for the next holiday season.
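One way to read the accuracy figures above is as a percentage error between what the model predicted and what was observed, checked per day and again over the season total. The sketch below takes that interpretation as an assumption, since the exact accuracy measure is not spelled out here; the function names and traffic numbers are invented for illustration.

```python
def percent_error(predicted, actual):
    """Absolute error of a prediction, as a percentage of the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0


def evaluate_forecast(predicted_daily, actual_daily):
    """Return (per-day percent errors, percent error of the season total)."""
    if len(predicted_daily) != len(actual_daily):
        raise ValueError("predicted and actual series must have the same length")
    daily_errors = [percent_error(p, a) for p, a in zip(predicted_daily, actual_daily)]
    overall_error = percent_error(sum(predicted_daily), sum(actual_daily))
    return daily_errors, overall_error


# Hypothetical daily impressions (in millions), not real network data.
predicted = [100, 208, 190]
actual = [104, 200, 195]

daily_errors, overall_error = evaluate_forecast(predicted, actual)
print([round(e, 2) for e in daily_errors])
print(round(overall_error, 2))
```

Note that daily over- and under-predictions partly cancel in the total, which is one reason a seasonal figure can be tighter than the daily one.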

Freeze Plan

Once again, we locked down the code prior to the holiday season. Limiting the number of ‘moving parts’ and thoroughly testing the code in place increased our confidence that we would not experience any major issues. As the image below demonstrates, two critical time periods were identified:

  • critical change freeze – code changes were introduced only if sites were down.
  • general change freeze – priority one bug fixes were accepted. Additional risk assessments were performed on all changes.

As you can see, the critical times coincide with the times we see increased online traffic.

blog_change_freeze
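The two freeze levels amount to a simple policy check that a deployment process could apply before accepting a change. The sketch below is only an illustration of that idea: the window dates are placeholders chosen for the example, not the actual dates from our plan, and the names are hypothetical. It also assumes that a fix for a site outage would be accepted during the general freeze.

```python
from datetime import date
from enum import Enum


class Freeze(Enum):
    NONE = "none"
    GENERAL = "general"
    CRITICAL = "critical"


# Placeholder windows for illustration; the real dates are not stated in the text.
CRITICAL_WINDOWS = [(date(2015, 11, 26), date(2015, 11, 30))]
GENERAL_WINDOWS = [(date(2015, 11, 16), date(2015, 12, 31))]


def freeze_level(day):
    """Return the strictest freeze level in effect on the given day."""
    if any(start <= day <= end for start, end in CRITICAL_WINDOWS):
        return Freeze.CRITICAL
    if any(start <= day <= end for start, end in GENERAL_WINDOWS):
        return Freeze.GENERAL
    return Freeze.NONE


def change_allowed(day, priority_one_fix=False, site_down=False):
    """Apply the freeze rules: critical allows only site-down fixes,
    general allows priority one fixes (and, by assumption, site-down fixes)."""
    level = freeze_level(day)
    if level is Freeze.CRITICAL:
        return site_down
    if level is Freeze.GENERAL:
        return priority_one_fix or site_down
    return True


# A priority one fix is rejected on Black Friday but accepted a week earlier.
print(change_allowed(date(2015, 11, 27), priority_one_fix=True))
print(change_allowed(date(2015, 11, 20), priority_one_fix=True))
```

The additional risk assessment required during the general freeze is a human review step and is deliberately left out of the check.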

Summary

A substantial amount of the work was completed in the months prior to Black Friday and Cyber Monday. The team’s coordinated efforts prior to the holiday season ensured that our clients’ online operations ran smoothly. Over half of the year was spent ensuring performance and scalability for these critical times in the holiday season. Data from as far back as three years was also used to forecast web traffic and ensure we would scale appropriately. This metrics-driven perspective also provided insightful new models to be used in future years’ forecasts.

The preparation paid off, and Bazaarvoice was able to handle 8.398 billion impressions over Black Friday through Cyber Monday (11/27-11/30), a new record for our network.