Tag Archives: Quality Assurance

How We Scale to 16+ Billion Calls

The holiday season brings a huge spike in traffic for many companies. While increased traffic is great for retail business, it also puts infrastructure reliability to the test. At times when every second of uptime is of elevated importance, how can engineering teams ensure zero downtime and performant applications? Here are some key strategies and considerations we employ at Bazaarvoice as we prepare our platform to handle over 16 Billion API calls during Cyber Week.

Key to approaching readiness for peak load events is defining the scope of testing: identify which services need to be tested and be clear about success requirements. A common trade-off is between reliability and cost; when making this choice, reliability is always the top priority. ‘Customer is Key’ is a core value at Bazaarvoice and drives our decisions and behavior. Service Level Objectives (SLOs) bring clarity to the reliability requirements for each of our services.

Reliability is always the top priority

When customer traffic is at its peak, reliability and uptime must take precedence over all other concerns. While cost efficiency is important, the customer experience is key during these critical traffic surges. Engineers should have the infrastructure resources they need to maintain stability and performance, even if it means higher costs in the short-term.

Thorough testing and validation well in advance is essential to surfacing any issues before the holidays. All critical customer-facing services undergo load and failover simulations to identify performance bottlenecks and points of failure. In a serverless-first architecture, it is valuable to validate that configuration such as reserved concurrency and quota limits is sufficient for autoscaling requirements. Often these simulations uncover problems you have not previously encountered. For example, in this year’s preparations our load simulations uncovered scale limitations in our Redis cache which required fixes prior to Black Friday.

“It’s not only about testing the ability to handle peak load”

It’s important to note readiness is not only about testing the ability to handle peak load. Disaster recovery plans are validated through simulated scenarios. Runbooks are verified as up-to-date, to ensure efficient incident response in the event something goes wrong. The instrumentation and infrastructure that support operability are also tested, ensuring our tooling works when we need it most.

Similarly, ensuring the appropriate tooling and processes are in place to address security is another key consideration; for example, DDoS attacks could easily overwhelm the system and impact service availability if not identified and mitigated.

Predicting the future

Observability through actionable monitoring, logging, and metrics provides the essential visibility to detect and isolate emerging problems early. It also provides historical traffic data and growth trends over time, which help forecast capacity needs and establish performance baselines that align with real production usage. In addition to these quantitative measures, proactively reaching out to clients keeps us in step with their expected traffic, helping align testing to actual usage patterns. This data lets us simulate real-world traffic patterns based on what has gone before, and has enabled us to accurately predict Black Friday traffic trends. Even so, it’s important that our systems are architected to scale with demand and handle unpredicted load; key to this is observing and understanding how our systems behave in production.

Traffic Trends

What did it look like this year? Consumer shopping patterns remained quite consistent on an elevated scale. Black Friday continues to be the largest shopping day of the year, and consumers continue to shop online in increasing numbers. During Cyber Week alone, Bazaarvoice handled over 16 Billion API calls.

Solving common problems once

While individual engineering teams own service readiness, having a coordinated effort ensures all critical dependencies are covered. Sharing forecasts, requirements, and learnings across teams enables better preparation. Testing surprises on dependent teams should be avoided through clear communication.

Automating performance testing, failover drills, and monitoring checks as part of regular release cycles or scheduled pipelines reduces the overhead of peak traffic preparation. Following site reliability principles and instilling always-ready operational practices makes services far more resilient year-round. 

For example, we recently put in place a shared dev pattern for continuous performance testing. This involves a quick setup of a k6 performance script, an example GitHub Actions pipeline, and observability configured to monitor performance over time. We also use an in-house Tech Radar to converge on common tooling, so more teams can learn from and stand on the shoulders of teams who have already tried and tested tooling in their context.

Other examples include adding automation to performance tests to replay production requests for a given load profile, which makes tests easier to maintain and more accurately reflects production behavior, as well as making use of automated fault injection tooling, chaos engineering and automated runbooks.

Adding automation and ensuring these practices are part of your everyday way of working are key to reducing the overhead of preparing for the holidays.

Consistent, continuous training conditions us to always be ready

Moving to an always-ready posture ensures our infrastructure is scalable, reliable and robust all year round. Continuous performance testing with regular baseline tests provides feedback on performance from release to release. Automated operational readiness checks ensure principles and expectations for production services are in place and continuously verified: for example, automated checking of expected monitors, alerts, runbooks and incident escalation policy requirements.

At Bazaarvoice our engineering teams align on shared System Standards, which give engineers technical direction and guidance on commonly solved problems, continuously evolving our systems and increasing our innovation velocity. To use a trail running analogy, System Standards define the preferred paths and, combined with the Tech Radar, provide recommendations to help you succeed: which trail running shoes should I choose, what refuelling strategy should I use, how should I monitor performance? The same is true for building resilient, reliable software: as teams solve these common problems, they share the learnings for the teams that come after.

Looking Ahead

With a relentless focus on reliability, scalability, continuous testing, enhanced observability, and cross-team collaboration, engineering organizations can optimize performance and minimize downtime during critical traffic surges. 

Don’t forget, after the peak has passed and we have descended from the summit, to analyze the data: what went well, what didn’t, and what opportunities there are to improve for the next peak.

Maintaining Test Data with the “someObject” Test Structure

Language: Scala
TestTool: Scalatest

How did we get here?

When systems become reasonably complex, tests must manage cumbersome amounts of data. A test case that covers a small bit of functionality may start to require a large amount of domain knowledge about the system being tested, often expressed through the mock data used to set up the test. Maintenance of this data becomes cumbersome, monotonous and can feel Sisyphean. To solve these problems we created “someObject”, a modular system that allows us to maintain data in only one location while providing the flexibility to create specific data for our tests.

Let’s Do A Code Time!

To start this post, we’re going to build a system without the “someObject” test structure to provide context for its use. (To skip straight to the “someObject” structure, jump to the “someObject Test Structure” section below.) Suppose we are building a service that reports on advertising campaigns. We may create a class that describes an advertising campaign and call it `Campaign`.

case class Campaign(id: String,
    name: String,
    client: String,
    startDate: UTCDate,
    endDate: UTCDate,
    deleted: Boolean)

Now we are going to store this campaign in a database, and we need to write some integration tests to make sure this operation is performed properly. A test that confirms a campaign is stored properly might look like this:

it should "properly store a single campaign" in {
    Given("we have a proper campaign")
    val campaignToStore = Campaign(
        id = "someId",
        name = "someName",
        client = "someClient",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)

    And("we store this campaign in the database")
    database.storeCampaign(campaignToStore)

    When("we fetch the given campaign")
    val fetchedCampaign: Campaign = database.fetchCampaign(campaignToStore.id)

    Then("All the campaign values were stored properly")
    fetchedCampaign.id shouldBe campaignToStore.id
    fetchedCampaign.name shouldBe campaignToStore.name
    fetchedCampaign.client shouldBe campaignToStore.client
    fetchedCampaign.startDate shouldBe campaignToStore.startDate
    fetchedCampaign.endDate shouldBe campaignToStore.endDate
    fetchedCampaign.deleted shouldBe campaignToStore.deleted
}

BlogIntro.scala

Add some functionality:

Now we add the ability to update some values for this campaign in the database, and we need to test this new piece of functionality. That test might look something like this:

it should "properly update a single campaign" in {
    Given("we have a proper campaign")
    val campaignToStore = Campaign(
        id = "someId",
        name = "someName",
        client = "someClient",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)

    And("some update parameters for our campaign")
    val updateParameters = UpdateParameters(name = Some("someNewName"))

    And("we store this campaign in the database")
    database.storeCampaign(campaignToStore)

    When("we update this campaign")
    database.updateCampaign("someId", updateParameters)

    val fetchedCampaign: Campaign = database.fetchCampaign("someId")

    Then("All the campaign values were stored properly")
    //Unchanged values
    fetchedCampaign.id shouldBe campaignToStore.id
    fetchedCampaign.client shouldBe campaignToStore.client
    fetchedCampaign.startDate shouldBe campaignToStore.startDate
    fetchedCampaign.endDate shouldBe campaignToStore.endDate
    fetchedCampaign.deleted shouldBe campaignToStore.deleted
    //Changed values
    fetchedCampaign.name shouldBe updateParameters.name.get
}

BlogUpdateFunction.scala
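The `UpdateParameters` type used above isn’t defined in these snippets. A minimal sketch of what such a class might look like, with every field optional so callers set only what they want to change, could be the following (this definition is an assumption for illustration, not taken from the original code):

//Hypothetical sketch: each field is optional so an update only touches what the caller sets
case class UpdateParameters(name: Option[String] = None,
    client: Option[String] = None,
    startDate: Option[UTCDate] = None,
    endDate: Option[UTCDate] = None,
    deleted: Option[Boolean] = None)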

But here we see the duplication of test boilerplate code in `campaignToStore`. We don’t want to have to copy over `campaignToStore` into every test, so we might want to abstract that out to be used all over the suite.

object MyCampaignTestObjects {
    val campaignToStore = Campaign(
        id = "someId",
        name = "someName",
        client = "someClient",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)
}

BlogAbstractToSuite.scala

Now we can use the same data in every test!

Add a Test that requires unique data:

Suppose we now write a function to fetch all the campaigns that are stored in the database. We might need a test that involves uniqueness in the data we store, such as the following example:

it should "properly fetch all stored campaigns" in {
    Given("we store several unique campaigns")
    val anotherCampaignToStore = Campaign(
        id = "someSecondId",
        name = "someSecondName",
        client = "someSecondClient",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)
    database.storeCampaign(campaignToStore)
    database.storeCampaign(anotherCampaignToStore)

    When("we fetch all campaigns")
    val allCampaigns = database.fetchAllCampaigns()

    Then("all campaigns are returned")
    allCampaigns shouldBe List(campaignToStore, anotherCampaignToStore)
}

BlogTestWithUniqueTestData.scala

In the example, we re-used the campaign we abstracted out earlier for conciseness, but this makes it unclear that `anotherCampaignToStore` needs to be unique. What if someone else changes `campaignToStore` and it happens to match data from `anotherCampaignToStore`? This test would then become flaky, and nobody likes flaky tests. We might decide to make all data used in this test local to the test, but then we would need to maintain the test data in both this test and `MyCampaignTestObjects`.

Add Some Arbitrary Data Constraints:

Suppose now that there is a new design constraint on how campaigns can be stored in the database: all client names must now be lowercased in all campaigns:

object MyCampaignTestObjects {
    val campaignToStore = Campaign(
        id = "someId",
        name = "someName",
        //We change the client name to match our new requirement
        client = "some_client",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)
}

BlogNewConstraint.scala

Now we start to see the issue with maintaining test data across the whole suite that we’ve been constructing. We need to find every mock campaign used in our suite and ensure that its client field is lowercased. Realistically, many of our tests (specifically, in this example, the `fetchAllCampaigns` test) don’t care about the client field of their campaign, so we shouldn’t need to care about the client field value while setting up our mock test data. Because this example is small, it’s not cumbersome to directly update the value to satisfy the new constraint. Now imagine a large set of suites, each containing hundreds of unique test cases. Suddenly this single constraint requires a large amount of work to refactor one field across every test case. Nobody wants to do that monotonous maintenance. To address this, our team adopted the “someObject” structure to minimize this data maintenance within our tests.

someObject Test Structure:

When designing this test structure, we wanted to make our test data extendable for use anywhere it is needed. We used a Scala `trait` to mix the necessary test-object factory functions into the objects used by our tests, such as the `MyCampaignTestObjects` object above:

trait CampaignTestObjects {
    def someCampaign(id: String = "someId",
                     name: String = "someName",
                     client: String = "some_client",
                     startDate: UTCDate = UTCDate(1955, 10, 6),
                     endDate: UTCDate = UTCDate(1956, 10, 6),
                     deleted: Boolean = false): Campaign =
        Campaign(
            id = id,
            name = name,
            client = client,
            startDate = startDate,
            endDate = endDate,
            deleted = deleted)
}

object MyCampaignTestObjects extends CampaignTestObjects {
    //Any other setup methods for test data
}

Now we can revisit our `fetchAllCampaigns` test example.

it should "properly fetch all stored campaigns" in {
    Given("we store several unique campaigns")
    val campaignToStore = someCampaign(id = "someId")
    val anotherCampaignToStore = someCampaign(id = "someNewId")
    database.storeCampaign(campaignToStore)
    database.storeCampaign(anotherCampaignToStore)

    When("we fetch all campaigns")
    val allCampaigns = database.fetchAllCampaigns()

    Then("all campaigns are returned")
    allCampaigns shouldBe List(campaignToStore, anotherCampaignToStore)
}

BlogBasicSomeObject.scala

Inside this test, we’ve set up two unique campaigns by calling the `someCampaign` method from our test data trait. This populates the returned campaign with dummy data that we don’t care about. All we need out of this method is “some campaign” with “some data”. Now, instead of obscuring the intent of the test case by setting up cumbersome, overly-expressive mock data, we can simply override the available mock objects with only the necessary data. For the unique campaigns needed in the `fetchAllCampaigns` test, we only really care about each campaign’s identifier. We don’t update the name, client, startDate, etc. because this test doesn’t care about any of that data. We only need the campaigns to be unique for our database. Under this test structure, when we receive the design change about client names being lowercased, we don’t need to update our `fetchAllCampaigns` test at all.
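As a hypothetical illustration of the same idea, a test that only cared about the client field would override just that default and leave everything else as “some data”:

//Only the field this test cares about is specified; everything else keeps its default
val clientSpecificCampaign = someCampaign(client = "client_under_test")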

Another Example:

Let’s provide another example that our team encountered. Campaigns inside our database now need to also store the amount of money spent on each ad campaign. We’re adding a new column to our database, and changing the database schema.

case class Campaign(id: String,
    name: String,
    client: String,
    totalAdSpend: Int,
    startDate: UTCDate,
    endDate: UTCDate,
    deleted: Boolean)

Now, every test that has a campaign involved needs to be updated to include a new field; but under the “someObject” structure we only need to add two lines and all existing tests should be working fine again:

trait CampaignTestObjects {

    def someCampaign(id: String = "someId",
                     name: String = "someName",
                     client: String = "some_client",
                     totalAdSpend: Int = 123456,
                     startDate: UTCDate = UTCDate(1955, 10, 6),
                     endDate: UTCDate = UTCDate(1956, 10, 6),
                     deleted: Boolean = false): Campaign =
        Campaign(
            id = id,
            name = name,
            client = client,
            totalAdSpend = totalAdSpend,
            startDate = startDate,
            endDate = endDate,
            deleted = deleted)
}

BlogPostSchemaChange.scala


Behavior Driven Tests:

The purpose of the “someObject” structure is to minimize data maintenance within tests. We want to ensure that we’re disciplined about only setting data that the tests need to care about. There are cases where data might seem necessary for what we are testing, but we can abstract that data away to de-couple the test’s reliance on hard coded values. For example, suppose we have a function that returns the sum of all the `totalAdSpend` across our database.

def sumAllSpend(campaignsToSum: List[Campaign]): Int =
    campaignsToSum.map(_.totalAdSpend).sum

To test this function, we might write a test like this:

it should "properly sum all totalAdSpend" in {
    Given("we store several unique campaigns")
    val campaignToStore = someCampaign(id = "someId", totalAdSpend = 123)
    val anotherCampaignToStore = someCampaign(id = "someNewId", totalAdSpend = 456)
    database.storeCampaign(campaignToStore)
    database.storeCampaign(anotherCampaignToStore)

    When("we sum all ad spend")
    val totalTotalAdSpend = sumAllSpend(database.fetchAllCampaigns())

    Then("the result is the sum of all ad spend")
    totalTotalAdSpend shouldBe 579
}

BlogPostNonBehaviorTest.scala

While this test does work, and it utilizes this “someObject” structure, it still forces data management at the test level.

`sumAllSpend` doesn’t really care about any one campaign’s `totalAdSpend` value. It only cares that we add all of the `totalAdSpend` values up correctly. We could instead write our test to assert on this behavior instead of doing the math ourselves and taking on the responsibility of managing more data.

it should "properly sum all totalAdSpend" in {
   Given("we store several unique campaigns")
   val campaignToStore = someCampaign(id = "someId")
   val anotherCampaignToStore = someCampaign(id = "someNewId")

   val allCampaignsToStore = List(campaignToStore,anotherCampaignToStore)
   allCampaignsToStore.foreach(campaign => database.storeCampaign(campaign))

   When("we sum all ad spend")
   val totalTotalAdSpend = sumAllSpend(database.fetchAllCampaigns())

   Then("the result is the sum of all ad spend")
   totalTotalAdSpend shouldBe allCampaignsToStore.map(_.totalAdSpend).sum
}

BlogPostBehaviorTest.scala

With this test, we don’t care what any individual campaign’s ad spend was, we don’t care how many campaigns are stored, and we don’t care about any constant value. The test simply verifies that the result is the sum of the `totalAdSpend` values of all the campaigns we store in the database.

Conclusion:

In this introductory blog post, we explored the “someObject” testing structure in Scala, but the concept is not intended to be language specific. Scala makes it easy to implement through the use of default parameter values, but in a future post I’ll show how it can be implemented through the Builder Pattern in a language like Java. Another unexplored “someObject” concept is the granularity of control in setting default data. This post introduces “global” and test-specific settings of default data, but doesn’t explore how to set suite-level data for our test objects, or the cases in which that might be useful. I’ll discuss that in a future post as well.

Code snippets:

BlogIntro.scala
BlogUpdateFunction.scala
BlogAbstractToSuite.scala
BlogTestWithUniqueTestData.scala
BlogNewConstraint.scala
BlogBasicSomeObject.scala
BlogPostSchemaChange.scala
BlogPostNonBehaviorTest.scala
BlogPostBehaviorTest.scala

Database Migration


(Always One More Thing…)

Who Are We?

The Ad Management team here at Bazaarvoice grew out of an incubator team. The goal of our incubator is to quickly iterate on ideas, producing prototypes and “proof of concept” projects to be iterated on if they validate a customer need. The project of interest here generates reports based on aggregations of event data gathered from several other teams at the company. As our project gained traction, it grew in size and scope, eventually leading to the need to revisit some of the design decisions made in the prototyping phase. Specifically, we found the original database system we chose, EmoDB, to not meet our needs as our requirements evolved.

Why Migrate?

When this project was started, it began as a prototype designed to get the project rolling as quickly and easily as possible. The initial team chose EmoDB since they were familiar with the in-house technology from their other projects and it fit our initial needs. As the project gained traction and we had more data to operate against, we encountered scalability issues, initially resolved with caching and some refactoring. We found that we were querying EmoDB as if it were a typical relational database, when it’s not actually designed for that use case. (EmoDB is an eventually consistent JSON blob store with a change notification databus that spans multiple AWS AZs and Regions. EmoDB powers many of our solutions at Bazaarvoice and is now open-source and available at https://github.com/bazaarvoice/emodb.)

We chose to switch to MySql to leverage the Relational Data Model for rolling up aggregations of data we collect and calculate. We ran into problems previously when we retrieved whole documents to perform aggregations on our data, leading us to decide that a technology that is optimized for relational models would suit the project much better.

How to Migrate?

Since our project already had trained users by the time we wanted to migrate database systems, we needed to design our migration with a no-downtime approach, “seamlessly” changing out the back-end implementation for our users. We also made these transitions configurable, so that rather than one large switch to master from the new system, we could choose which services were ready to be cut over to the new data back-end.
The following image is our design document describing how we planned the migration. On the left is our original code base, named “legacy”. On the right is the proposed design for the new service stack. In the middle sits the “Service Facade”, where we intended to run our quality assurance against live data between the legacy technology stack and our new technology stack.

[Design diagram: ad-management migration to MySql-based stack]

How to Maintain Data Consistency?

Depending on the size of the data being diffed and migrated between databases, it can be expensive to run the necessary migrations. Our solution was a combination of writing specific tasks that backfilled data and directly migrating data sets to the new data source. This allowed us to smoke test that our services were working without expending large amounts of time or money finding bugs along the way. As our confidence in our custom tooling and services grew, we backfilled and migrated larger chunks of data, until we had migrated everything necessary to master from our new service.

What is a Service Facade?

The service facade layer is responsible for executing the respective operations on both the legacy and new stacks. This is where we placed our diffing logic to compare the results returned from Emo and MySql for the same operation. The facade returns data from whichever stack is configured as the source of truth, meaning certain areas of the application could be sourcing from MySql while other areas that we weren’t yet confident in continued to source from Emo. For example, our CampaignRoiReportBuilderServiceFacade, written in Scala, looked something like this:


class CampaignRoiReportBuilderServiceFacade @Inject()(
    private val campaignRoiReportBuilderServiceLegacy: CampaignRoiReportBuilderServiceLegacy,
    private val campaignRoiReportService: CampaignRoiReportService,
    private val campaignConfig: CampaignConfiguration,
    private val facadeDiffTool: FacadeDiffTool) {
  ...
  def buildReport(...): Future[Option[CampaignRoiReport]] = {
    val roiReportFromEmoDbFuture: Future[Option[CampaignRoiReport]] =
      campaignRoiReportBuilderServiceLegacy.buildReport(...)
    val roiReportFromMySqlFuture: Future[Option[CampaignRoiReport]] =
      campaignRoiReportService.buildReport(...)

    //Extract data from the scala Futures
    for {
      roiReportFromEmoDbMaybe <- roiReportFromEmoDbFuture
      roiReportFromMySqlMaybe <- roiReportFromMySqlFuture
    } {
      //Pattern matching to extract values from the scala Options
      (roiReportFromEmoDbMaybe, roiReportFromMySqlMaybe) match {
        case (None, None) => //this is an impossible case, but listed to avoid compilation warnings
        case (Some(_), None) => LOG.warn("/*Report missing/mismatched data*/")
        case (None, Some(_)) => LOG.warn("/*Report missing/mismatched data*/")
        case (Some(roiReportFromEmo), Some(roiReportFromMySql)) =>
          val mismatches = facadeDiffTool.campaignROIReportLegacyDiff(roiReportFromEmo, roiReportFromMySql)
          if (mismatches.nonEmpty)
            LOG.warn("/*Report missing/mismatched data*/")
      }
    }

    //This is how we configure which source we return to the resource.
    campaignConfig.masteringFrom match {
      case EmoDb =>
        roiReportFromMySqlFuture.onFailure {
          case e: Throwable => LOG.warn("Failed to build an ROI report on the MySql side", e)
        }
        roiReportFromEmoDbFuture
      case MySql =>
        roiReportFromEmoDbFuture.onFailure {
          case e: Throwable => LOG.warn("Failed to build an ROI report on the EmoDb side", e)
        }
        roiReportFromMySqlFuture
    }
  }
}

The original resource classes are modified to call the new facade layer, but no other functionality changes. If constructed properly, the facade layer acts the same way as the original service because it mimics the public functions available in the original service class. These duplicated functions call the methods of the legacy service class as well as the new service. With the responses from both the legacy and new services, the facade layer can assess the differences between the two service stacks. To report the differences such that we could be notified during API usage, we log them out to our log management and monitoring system.
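The `masteringFrom` switch above implies a small configuration type that records which back-end each area of the application should master from. A minimal sketch of what that might look like (the names mirror the snippet above, but this particular definition is assumed rather than taken from our code base):

//Hypothetical sketch of the per-service mastering configuration read by the facade
sealed trait MasteringSource
case object EmoDb extends MasteringSource
case object MySql extends MasteringSource

case class CampaignConfiguration(masteringFrom: MasteringSource)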

How Did We Capture Mismatches?

Logging was a large concern of ours. We knew that there would be many differences per call while we were debugging our new service stack. As an example, on one call, we reported 2000+ differences. We wanted to compose all differences into one log per call in a meaningful way. For this, we wrote custom diff tooling that would return differences in the data as sets of MismatchedField classes.


case class MismatchedField[T](name: String,
                              legacyValue: T,
                              newValue: T)


This generic class holds the value returned from the legacy service (legacyValue), the value returned from the new stack’s service (newValue), and a meaningful tag with which to identify where the mismatch came from (name). We then compose all mismatches for any given call into a single log entry through our custom diff tool. Every function within the diff tool returns a Set[MismatchedField[Any]]; we compose each set into a single set of differences so that one log call can write out all of the differences in one entry.
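To illustrate the composition, a diff function in this style might compare fields one at a time and union the per-field mismatch sets, so the caller can emit a single log entry. This is only a sketch with assumed field names, not our production diff tool:

//Hypothetical sketch: compare individual fields and fold the mismatches into one set
def campaignROIReportLegacyDiff(legacy: CampaignRoiReport,
                                updated: CampaignRoiReport): Set[MismatchedField[Any]] = {
    def check[T](name: String, legacyValue: T, newValue: T): Set[MismatchedField[Any]] =
        if (legacyValue == newValue) Set.empty[MismatchedField[Any]]
        else Set(MismatchedField[Any](name, legacyValue, newValue))

    //Field names below are assumed for illustration
    check("totalSpend", legacy.totalSpend, updated.totalSpend) ++
        check("impressions", legacy.impressions, updated.impressions)
}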

An Interesting Finding:

One of the most interesting findings from this migration wasn’t a bug introduced while constructing the new service stack for the new database, but bugs we found in our original database stack. One take-away was to investigate any mismatches all the way down to the source data. During the code migration we found that some of our legacy functionality was written incorrectly. As an example, in our legacy code we were storing some aggregated data in sets, unintentionally masking duplicate data. When re-implementing these same aggregations for the new service stack, they were correctly implemented as a list, producing a mismatch in the data. In our investigation, instead of simply matching the data to how our legacy service worked, we went back to our source data and ran the calculations manually through the Scala REPL. In doing so, we found that the new service was correct and our legacy code was wrong. Fortunately, the bug in the legacy code was a simple fix; we implemented it and the mismatch disappeared.
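As a tiny, made-up illustration of that class of bug: collecting values into a Set silently drops duplicates, so any aggregation over it under-counts.

//Two distinct events happen to carry the same value
val eventSpends = List(100, 250, 100)
val listTotal = eventSpends.sum        //450, the correct aggregate
val setTotal = eventSpends.toSet.sum   //350, the duplicate was masked by the Set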

Other Take-aways:

An important team take-away was to be very upfront and declarative about the work that the migration would require. Our investigation into the migration not only involved setting up a new technology stack for MySql, but also changing our build tool from Maven to SBT, introducing a Flyway + Jooq plugin to enforce type safety throughout the migration, designing a new data model (which was ultimately the driving factor for doing the migration in the first place), and upgrading our code to the newest Scala version to leverage all of the previous changes. Ultimately, we severely underestimated, and under-ticketed, the work necessary to start our migration.

It is also important to keep in mind that every team is different and has different needs. When having conversations about database migrations, take the time to do a proper risk assessment for the work ahead. Keep these conversations going during the migration as well. As a team, we ended up prioritizing new feature requests and non-migration related bugs because the migration felt orthogonal to our production environment.

A further takeaway is that we could have saved ourselves a lot of time if we had more realistically assessed our users. In retrospect, the users of these reports were internal and would have been more lenient with small service outages, which would have allowed us to leverage our configurable services to migrate much sooner. At the expense of some stability, we believe we could have had a quicker migration by forcing ourselves to fix problems forward instead of maintaining our legacy code for as long as we did. Still, most scenarios don’t have this luxury, and we hope the facade-based approach is of help to you.

Front End Application Testing with Image Recognition

One of the many challenges of software testing has always been cross-browser testing. Despite the web’s overall move to more standards-compliant browser platforms, we still struggle with the fact that sometimes certain CSS values or certain JavaScript operations don’t translate well in some browsers (cough, cough IE 8).

In this post, I’m going to show how the Curations team has upgraded their existing automation tools to let us automate spot-checking the visual display of the Curations front end across multiple browsers, saving us time while helping to build a better product for our clients.

The Problem: How to save time and test all the things

The Curations front end is a highly configurable product that allows our clients to implement the display of moderated UGC made available through the API from a Curations instance.

This flexibility, combined with BV’s browser support guidelines, means there are a very large number of ways Curations content can be rendered on the web.

Initially, rather than attempt to test ‘all the things’, we’ve codified a set of possible configurations that represent general usage patterns of how Curations is implemented. Functionally, we can test that content can be retrieved and displayed; however, when it comes to whether the end result has the right look and feel in Chrome, Firefox and other browsers, our testing is largely manual (and time consuming).

How can we better automate this process without sacrificing consistency or stability in testing?

Our solution: Sikuli API

Sikuli is an open-source Java-based application and API that allows users to automate web, mobile and OS applications across multiple platforms using image recognition. It’s platform-based rather than browser-specific, so it enables us to circumvent limitations with the screen-capture-and-compare features in other automation tools like Webdriver.

Imagine writing a test script that starts with clicking the home button within an iOS simulator, simply by providing the script a .png of the home button itself. That’s what Sikuli can do.

You can read more about Sikuli at sikuli.org, and check out the project on GitHub.

Installation:

Sikuli provides two different products for your automation needs: a stand-alone scripting engine and an API. For our purposes, we’re interested in the Sikuli API, with the goal of implementing it within our existing Saladhands test framework, which uses both Webdriver and Cucumber.

Assuming you have Java 1.6 or greater installed on your workstation, from Sikuli.org’s download page, follow the link to their standalone setup JAR

http://www.sikuli.org/download.html

Download the JAR file and place it in your local workstation’s home directory, then open it.

Here, you’ll be prompted by the installer to select an installation type. Select option 3 if you wish to use Sikuli in your Java or Jython project as well as have access to its command line options. Select option 4 if you only plan on using Sikuli within the scope of your Java or Jython project.

Once the installation is complete, you should have a sikuli.jar file in your working directory. You will want to add this to your collection of external JARs for your installed JRE.

For example, if you’re using Eclipse, go to Preferences > Java > Installed JREs, select your JRE version, click Edit and add Sikuli.jar to the collection.

Alternately, if you are using Maven to build your project, you can add Sikuli’s API to your project by adding the following to your pom.xml file:

<dependency>
    <groupId>org.sikuli</groupId>
    <artifactId>sikuli-api</artifactId>
    <version>1.2.0</version>
</dependency>

Clean then build your project and now you’re ready to roll.

Implementation:

Ultimately, we wanted a method we could control through Cucumber that drives a web application with Webdriver, takes a screen shot of it (in this case, an instance of Curations), and compares that against a static screen shot of specific web elements (e.g. Ratings and Review stars within the Curations display).

This test method asserts that a match to the static screen element can be found within the live web application, and has TestNG throw an exception (test failure) if no match can be found.

First, now that we have the ability to use Sikuli, we created a new helper class that instantiates an object from their API so we can compare screen output.

import org.sikuli.api.*;
import java.io.IOException;
import java.io.File;

/**
 * Created by gary.spillman on 4/9/15.
 */
public class SikuliHelper {

    public boolean screenMatch(String targetPath) {

Once we import the Sikuli API, we create a simple class with a single method. In this case, screenMatch accepts a path (relative to the Java project) to a static image that we are going to compare against the live browser window. True or false is returned depending on whether we find a match.

//Sets the screen region Sikuli will try to match to full screen
ScreenRegion fullScreen = new DesktopScreenRegion();

//Set your target to compare from
Target target = new ImageTarget(new File(targetPath));

The main object type Sikuli wants to handle everything with is ScreenRegion. In this case, we are instantiating a new screen region relative to the entire desktop screen area of whatever OS our project will run on. Without passing any arguments to DesktopScreenRegion(), we will be defining the region’s dimension as the entire viewable area of our screen.

double fuzzPercent = .9;

try {
    fuzzPercent = Double.parseDouble(PropertyLoader.loadProperty("fuzz.factor"));
}
catch (IOException e) {
    e.printStackTrace();
}

Sikuli allows you to define a fuzzing factor (if you’ve ever used ImageMagick, this should be a familiar concept). Essentially, rather than requiring a 1:1 exact match, you can define a minimum acceptable percentage you wish your screen comparison to match. For Sikuli, you can define this within a range from 0.1 to 1 (i.e. a 10% match up to a 100% match).

Here we are defining a default minimum match (or fuzz factor) of 90%. Additionally, we load a value from Saladhands’ test.properties file which, if present, can override the default 90% match, should we wish to increase or decrease the severity of the test criteria.

target.setMinScore(fuzzPercent);

Now that we know what fuzzing percentage we want to test with, we use target’s setMinScore method to set that property.

ScreenRegion found = fullScreen.find(target);

//According to code examples, if the image isn't found, the screen region is undefined
//So... if it remains null at this point, we're assuming there's no match.

if(found == null) {
    return false;
}
else {
    return true;
}

This is where the magic happens. We create a new screen region called found and assign it the result of fullScreen’s find method, passing in the image target we defined earlier (target).

What happens here is that Sikuli will take the provided image (target) and attempt to locate any instance within the current visible screen that matches target, within the lower bound of the fuzzing percentage we set and up to a full, 100% match.

The find method either returns a new screen region object or returns nothing. Thus, if we are unable to find a match for target, found will remain undefined (null). So in this case, we simply return false if found is null (no match), or true if found is assigned a new screen region (we had a match).

Putting it all together:

To completely incorporate this behavior into our test framework, we write a simple Cucumber step definition that calls our Sikuli helper method and provides a local image file as an argument to compare against the current, active screen.

Here’s what the cucumber step looks like:

public class ScreenShotSteps {

    SikuliHelper sk = new SikuliHelper();

    //Given the image "X" can be found on the screen
    @Given("^the image \"([^\"]*)\" can be found on the screen$")
    public void the_image_can_be_found_on_the_screen(String arg1) {

        String screenShotDir = null;

        try {
            screenShotDir = PropertyLoader.loadProperty("screenshot.path").toString();
        }
        catch (IOException e) {
            e.printStackTrace();
        }

        Assert.assertTrue(sk.screenMatch(screenShotDir + arg1));
    }
}

The image file name is captured via the regex in the step definition. The step then makes an assertion using TestNG that the value returned from our SikuliHelper’s screenMatch method is true (Success!!!). If not, TestNG throws an exception and our test will be marked as having failed.

Finally, since we already have cucumber steps that let us invoke and direct Webdriver to a live site, we can write a test that looks like the following:

Feature: Screen Shot Test
As a QA tester
I want to do screen compares
So I can be a boss ass QA tester

Scenario: Find the nav element on BV's home page
Given I visit "http://www.bazaarvoice.com"
Then the image "screentest1.png" can be found on the screen

In this case, the image we are attempting to find is a portion of the nav element on BV’s home page:

[Image: screentest1.png, a cropped screenshot of the nav element on bazaarvoice.com]

Considerations:

This is not a full-stop solution to cross-browser UI testing. Instead, we want to use Sikuli and tools like it to reduce overall manual testing as much as reasonably possible, by giving us the option to pre-warn product development teams of UI discrepancies. This can help us make better decisions on how to organize and allocate testing resources, manual and otherwise.

There are caveats to using Sikuli. The most explicit caveat is that tests designed with it cannot run headlessly; the test tool requires a real, visible screen to capture and manipulate.

Obviously, the other possible drawback is the required maintenance of local image files you will need to check into your automation project as test artifacts. How deep you will be able to go with this type of testing may be tempered by how large of a file collection you will be able to reasonably maintain or deploy.

Despite that, Sikuli seems to have a large number of powerful features, not limited to being able to provide some level of mobile device testing. Check out the project repository and documentation to see how you might be able to incorporate similar automation code into your project today.