Tag Archives: Test Automation

How We Scale to 16+ Billion Calls

The holiday season brings a huge spike in traffic for many companies. While increased traffic is great for retail business, it also puts infrastructure reliability to the test. At times when every second of uptime is of elevated importance, how can engineering teams ensure zero downtime and performant applications? Here are some key strategies and considerations we employ at Bazaarvoice as we prepare our platform to handle over 16 Billion API calls during Cyber Week.

Key to approaching readiness for peak load events is defining the scope of testing: identify which services need to be tested and be clear about success requirements. A common trade-off is choosing between reliability and cost, and when making that choice, reliability is always the top priority. ‘Customer is Key’ is a core value at Bazaarvoice and drives our decisions and behavior. Service Level Objectives (SLOs) bring clarity to the reliability requirements of each of our services.

Reliability is always the top priority

When customer traffic is at its peak, reliability and uptime must take precedence over all other concerns. While cost efficiency is important, the customer experience is key during these critical traffic surges. Engineers should have the infrastructure resources they need to maintain stability and performance, even if it means higher costs in the short-term.

Thorough testing and validation well in advance is essential to surfacing any issues before the holidays. All critical customer-facing services undergo load and failover simulations to identify performance bottlenecks and points of failure. In a serverless-first architecture, it is valuable to validate that configuration such as reserved concurrency and quota limits is sufficient for autoscaling requirements. Often these simulations uncover problems you have not previously encountered. For example, in this year’s preparations our load simulations uncovered scale limitations in our Redis cache which required fixes prior to Black Friday.

“It’s not only about testing the ability to handle peak load”

It’s important to note that readiness is not only about testing the ability to handle peak load. Disaster recovery plans are validated through simulated scenarios. Runbooks are verified as up-to-date, to ensure efficient incident response in the event something goes wrong. The instrumentation and infrastructure that support operability are also tested, ensuring our tooling works when we need it most.

Similarly, ensuring the appropriate tooling and processes are in place to address security threats is another key consideration. A DDoS attack, for example, could easily overwhelm the system if not identified and mitigated, impacting service availability.

Predicting the future

Observability through actionable monitoring, logging, and metrics provides the essential visibility to detect and isolate emerging problems early. It also provides historical context on how traffic has grown over time, which helps forecast capacity needs and establish performance baselines that align with real production usage. In addition to these quantitative measures, proactively reaching out to clients keeps us in step with their expected traffic, helping align testing to actual usage patterns. This data lets us simulate real-world traffic based on what has gone before, and has enabled us to accurately predict Black Friday traffic trends. However, it’s important that our systems are architected to scale with demand and handle unpredicted load if need be; key to this is observing and understanding how our systems behave in production.

Traffic Trends

What did it look like this year? Consumer shopping patterns remained quite consistent on an elevated scale. Black Friday continues to be the largest shopping day of the year, and consumers continue to shop online in increasing numbers. During Cyber Week alone, Bazaarvoice handled over 16 Billion API calls.

Solving common problems once

While individual engineering teams own service readiness, having a coordinated effort ensures all critical dependencies are covered. Sharing forecasts, requirements, and learnings across teams enables better preparation. Testing surprises on dependent teams should be avoided through clear communication.

Automating performance testing, failover drills, and monitoring checks as part of regular release cycles or scheduled pipelines reduces the overhead of peak traffic preparation. Following site reliability principles and instilling always-ready operational practices makes services far more resilient year-round. 

For example, we recently put in place a shared dev pattern for continuous performance testing. This involves a quick setup of a k6 performance script, an example GitHub Actions pipeline, and observability configured to monitor performance over time. We also use an in-house Tech Radar to converge on common tooling, so a greater number of teams can learn from and stand on the shoulders of teams who have already tried and tested tooling in their context.
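As an illustration of what that quick setup can look like, here is a minimal k6 baseline script. It is a sketch only: the endpoint, virtual user count and thresholds are placeholders, not our production configuration.

// baseline.js - a minimal k6 baseline test (hypothetical endpoint and thresholds)
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 20,            // a small, constant number of virtual users for a baseline run
  duration: '5m',
  thresholds: {
    http_req_failed: ['rate<0.01'],   // fail the run if more than 1% of requests error
    http_req_duration: ['p(95)<500'], // fail the run if p95 latency exceeds 500ms
  },
};

export default function () {
  const res = http.get('https://api.example.com/reviews?productId=123'); // placeholder URL
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}

A scheduled GitHub Actions workflow can then run a script like this on every release or on a cron schedule and push the results to whatever observability tooling the team already uses, so regressions show up as a trend rather than a surprise.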

Other examples include adding automation to performance tests to replay production requests for a given load profile, which makes tests easier to maintain and more accurately reflects production behavior. Additionally, make use of automated fault injection tooling, chaos engineering and automated runbooks.
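As a sketch of the replay idea (the requests.json capture file and its shape are hypothetical), a k6 script can drive itself from recorded production requests instead of hand-written ones:

// replay.js - replays previously captured production requests (hypothetical capture format)
import http from 'k6/http';
import { SharedArray } from 'k6/data';

// e.g. [{ "method": "GET", "path": "/reviews?productId=123" }, ...] exported from access logs
const requests = new SharedArray('recorded requests', () =>
  JSON.parse(open('./requests.json'))
);

export const options = { vus: 50, duration: '10m' };

export default function () {
  // pick a random recorded request each iteration to approximate the captured load profile
  const req = requests[Math.floor(Math.random() * requests.length)];
  http.request(req.method, `https://api.example.com${req.path}`); // placeholder host
}

Regenerating the capture file from production logs keeps the load profile honest as client usage patterns change.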

Adding automation and ensuring these practices are part of your everyday way of working are key to reducing the overhead of preparing for the holidays.

Consistent, continuous training conditions us to always be ready

Moving to an always-ready posture ensures our infrastructure is scalable, reliable and robust all year round. Implementing continuous performance testing with regular baseline tests provides feedback on performance from release to release. Automated operational readiness checks ensure principles and expectations for production services are in place and continuously verified: for example, automated checking of expected monitors, alerts, runbooks and incident escalation policy requirements.
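To make the idea concrete, a readiness check can be as simple as a script in the pipeline that fails when required operational metadata is missing. This is a purely illustrative sketch; the manifest file and required fields here are hypothetical, not our actual tooling.

// readiness-check.js - fails the build if a service manifest lacks operational basics (hypothetical format)
const fs = require('fs');

const manifest = JSON.parse(fs.readFileSync('service-manifest.json', 'utf8'));
const required = ['owner', 'runbookUrl', 'escalationPolicy', 'dashboards', 'alerts'];

// collect any required fields that are absent or empty
const missing = required.filter((field) => !manifest[field] || manifest[field].length === 0);

if (missing.length > 0) {
  console.error(`Operational readiness check failed, missing: ${missing.join(', ')}`);
  process.exit(1);
}
console.log('Operational readiness check passed');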

At Bazaarvoice our engineering teams align on shared System Standards, which give technical direction and guidance to engineers on commonly solved problems, continuously evolving our systems and increasing our innovation velocity. To use a trail running analogy, System Standards define the preferred paths and, combined with Tech Radar, provide recommendations to help you succeed: what trail running shoes should I choose, what energy refuelling strategy should I use, how should I monitor performance? The same is true for building resilient, reliable software: as teams solve these common problems, they share the learnings with the teams that come after.

Looking Ahead

With a relentless focus on reliability, scalability, continuous testing, enhanced observability, and cross-team collaboration, engineering organizations can optimize performance and minimize downtime during critical traffic surges. 

Don’t forget, after the peak has passed and we have descended from the summit, to analyze the data: what went well, what didn’t go well, and what opportunities are there to improve for the next peak?

Looking Good While Testing: Automated Testing With a Visual Regression Service

A lot of (virtual) ink has been spilled on this blog about automated testing (no, really). This post is another in a series of dives into different automated testing tools and how you can use them to deliver a better, higher-quality web application.

Here, we’re going to focus on tools and services specific to ‘visual regression testing‘ – specifically cross-browser visual testing of an application front end.

What?

By visual regression testing, we mean regression testing applied specifically to how an app’s appearance may have changed across browsers or over time, as opposed to its functional behavior.

Why?

One of the most common starting points in testing a web app is to simply fire up a given browser, navigate to the app in your testing environment, and note any discrepancy in appearance (“oh look, the login button is upside down. Who committed this!?”).

american_psycho

A spicy take on how to enforce code quality – we won’t be going here.

The strength of visual regression testing is that you’re testing the application against a very humanistic set of conditions (how does the application look to the end user versus how it should look). The drawback is that doing this is generally time consuming and tedious.  But that’s what we have automation for!

hamburger_ad_vs_reality

How our burger renders in Chrome vs how it renders in IE 11…

How?

A wise person once said, ‘In software automation, there is no such thing as a method call for “if != ugly, return true”’.

For the most part, this statement is true. There really isn’t a ‘silver bullet’ for fully automating testing of your web application’s appearance across a given browser support matrix. At least, not without some caveats.

The methods and tools for doing so can run afoul of at least one of the following:

  • They’re Expensive (in terms of time, money or both)
  • They’re Fragile (tests emit false negatives, can be unreliable)
  • They’re Limited (covers only a subset of supported browsers)

faberge_egg

Sure, you can support delicate tools. Just keep in mind the total cost.

Tools

We’re going to show how you can quickly set up a set of tools using WebdriverIO, some simple JavaScript test code and the visual-regression-service module to create snapshots of your app front end and perform a test against its look and feel.

Setup

Assuming you already have a web app ready for testing in your choice of environment (hopefully not in production) and that you are familiar with NodeJS, let’s get down to writing our testing solution:

interesting_man_tests_code

This meme is so old, I can’t even…

1. From the command line, create a new project directory and do the following in order:

  • ‘npm init’ (follow the initialization prompts – feel free to use the defaults and update them later)
  • ‘npm install --save-dev webdriverio’
  • ‘npm install --save-dev wdio-visual-regression-service’
  • ‘npm install --save-dev chai’

2. Once you’ve finished installing your modules, you’ll need to configure your instance of WebdriverIO. You can do this manually by creating the file ‘wdio.conf.js’ and placing it in your project root (refer to the WebdriverIO developers guide on what to include in your configuration file) or you can use the wdio automated configuration script.

3. To quickly configure your tools, kick off the automated configuration script by running ‘npm run wdio’ from your project root directory.  During the configuration process, be sure to select the following (or include this in your wdio.conf.js file if you’re setting things up manually):

  • Under frameworks, be sure to enable Mocha (we’ll use this to handle things like assertions)
  • Under services be sure to enable the following:
    • visual-regression
    • Browserstack (we’ll leverage Browserstack to handle all our browser requests from WebdriverIO)

Note that in this case, we won’t install the selenium standalone service or any local testing binaries like Chromedriver. The purpose of this exercise is to quickly package together some tools with a very small footprint that can handle some high-level regression testing of any given web app front end.

Once you have completed the configuration script, you should have a wdio.conf.js file in your project configured to use WebdriverIO and the visual-regression service.

Next, we need to create a test.

Writing a Test

First, make a directory within your project’s main source called tests/. Within that directory, create a file called homepage.js.

Set the contents of the file to the following:

describe('home page', () => {
  beforeEach(function () {
    browser.url('/');
  });

  it('should look as expected', () => {
    browser.checkElement('#header');
  });
});

That’s it. Within the single test function, we are calling a method from the visual-regression service, ‘checkElement()’. In our code, we are providing the selector ‘#header’ as an argument, but you should replace this with the ID or CSS selector of a container element on the page you wish to check.

When executed, WebdriverIO will open the root URL it is provided for our web application and then execute its check element comparison operation. On the first run, this generates a series of reference screen shots of the application. The regression service will then generate screen shots of the app in each browser it is configured to test with and provide a delta between these screens and the reference image(s).

A More Complex Test:

You may have a need to articulate part of your web application before you wish to execute your visual regression test. You also may need to execute the checkElement() function multiple times with multiple arguments to fully vet your app front end’s look and feel in this manner.

Fortunately, since we are simply inheriting the visual regression service’s operations through WebdriverIO, we can combine WebriverIO-based method calls within our tests to manipulate and verify our application:

describe('home page', () => {
  beforeEach(function () {
  browser.url('/');
  });

  it('should look as expected', () => {
    browser.waitForVisible('#header');
    browser.checkElement('#header');
  });

  it('should look normal after I click the button', () => {
    browser.waitForVisible('.big_button');
    browser.click('.big_button');
    browser.waitForVisible('#main_content');
    browser.checkElement('#main_content');
  });

  it('should have a footer that looks normal too', () => {
    browser.scroll('#footer');
    browser.checkElement('#footer');
  });
});

Broad vs. Narrow Focus:

One of several factors that can add to the fragility of a visual test like this is attempting to account for minor changes in your visual elements. This can be a lot to bite off and chew at once.

A large and heavily populated container (e.g. the body tag) is likely to contain so many possible variations across browsers that your test will always throw an exception. Conversely, if you narrow your tests’ focus to something very marginal (e.g. a single instance of a single button), that element may never be touched by front end code changes, and you may miss crucial changes to your app UI.

flight_search_results

This is waaay too much to test at once.

The visual-regression service’s magic is that it allows you to target testing to specific areas of a given web page or path within your app – based on web selectors that can be parsed by Webdriver.

price_link

And this is too little…

Ideally, you should be choosing web selectors with a scope of content that is not too large nor too small but in between. A test that focuses on comparing content of a specific div tag that contains 3-4 widgets will likely deliver much more value than one that focuses on the selector of a single button or a div that contains 30 widgets or assorted web elements.

Alternatively, some of your app front end may be generated by templating or scaffolding that never receives updates and is siloed away from code that changes frequently. In this case, marshalling tests around these areas may result in a lot of misspent time.

menu_bar

But this is just about right!

Choose your area of focus accordingly.

Back to the Config at Hand:

Before we run our tests, let’s make a few updates to our config file to make sure we are ready to roll with our initial homepage verification script.

First, we will need to add some helper functions to facilitate screenshot management. At the very top of the config file, add the following code block:

var path = require('path');
var VisualRegressionCompare = require('wdio-visual-regression-service/compare');

function getScreenshotName(basePath) {
  return function(context) {
    var type = context.type;
    var testName = context.test.title;
    var browserVersion = parseInt(context.browser.version, 10);
    var browserName = context.browser.name;
    var browserViewport = context.meta.viewport;
    var browserWidth = browserViewport.width;
    var browserHeight = browserViewport.height;
 
    return path.join(basePath, `${testName}_${browserName}_v${browserVersion}_${browserWidth}x${browserHeight}.png`);
  };
}

This function will be utilized to build the paths for our various screen shots we will be taking during the test.

As stated previously, we are leveraging Browserstack with this example to minimize the amount of code we need to ship (given we would like to pull this project in as a resource in a Jenkins task) while allowing us greater flexibility in which browsers we can test with. To do this, we need to make sure a few changes in our config file are in place.

Note that if you are using a different browser provisioning service (SauceLabs, Webdriver’s grid implementation), see WebdriverIO’s online documentation for how to set your wdio configuration for your respective service.

Open your wdio.conf.js file and make sure this block of code is present:

user: process.env.BSTACK_USERNAME,
key: process.env.BSTACK_KEY,
host: 'hub.browserstack.com',
port: 80,

This allows us to pass our Browserstack authentication information into our wdio script via the command line.

Next, let’s set up which browsers we wish to test with. This is also done within our wdio config file under the ‘capabilities’ object. Here’s an example:

capabilities: [
{
  browserName: 'chrome',
  os: 'Windows',
  project: 'My Project - Chrome',
  'browserstack.local': false,
},
{
  browserName: 'firefox',
  os: 'Windows',
  project: 'My Project - Firefox',
  'browserstack.local': false,
},
{
  browserName: 'internet explorer',
  browser_version: 11,
  project: 'My Project - IE 11',
  'browserstack.local': false,
},
],

Where to Put the Screens:

While we are here, be sure you have set up your config file to specifically point to where you wish to have your screen shots copied to. The visual-regression service will want to know the paths to 4 types of screenshots it will generate and manage:

lots_a_screens

Yup… Too many screens

References: This directory will contain the reference images the visual-regression service will generate on its initial run. This will be what our subsequent screen shots will be compared against.

Screens: This directory will contain the screen shots generated per browser type/view by tests.

Errors: If a given test fails, an image will be captured of the app at the point of failure and stored here.

Diffs: If a comparison performed by the visual-regression service between an element captured during a browser run and the reference images results in a discrepancy, a ‘heat-map’ image of the difference will be captured and stored here. Consider the content of this directory to be your test exceptions.

Things Get Fuzzy Here:

fozzy

Fuzzy… Not Fozzy

Finally, before kicking off our tests, we need to enable our visual-regression service instance within our wdio.conf.js file. This is done by adding a block of code to our config file that instructs the service on how to behave. Here is an example of the code block taken from the WebdriverIO developer guide:

visualRegression: {
  compare: new VisualRegressionCompare.LocalCompare({
    referenceName: getScreenshotName(path.join(process.cwd(), 'screenshots/reference')),
    screenshotName: getScreenshotName(path.join(process.cwd(), 'screenshots/screen')),
    diffName: getScreenshotName(path.join(process.cwd(), 'screenshots/diff')),
    misMatchTolerance: 0.20,
  }),
  viewportChangePause: 300,
  viewports: [{ width: 320, height: 480 }, { width: 480, height: 320 }, { width: 1024, height: 768 }],
  orientations: ['portrait'],
},

Place this code block within the ‘services’ object in your file and edit it as needed. Pay attention to the following attributes and adjust them based on your testing needs:

‘viewports’:
This is an array of width/height pairs to test the application at. This is very handy if you have an app with specific responsive design constraints. For each pair, the test will be executed per browser – resizing the browser to each set of dimensions.

‘orientations’: This allows you to configure the tests to execute using portrait and/or landscape view if you happen to be testing in a mobile browser (default orientation is portrait).

‘viewportChangePause’: This value (in milliseconds) pauses the test at each point the service is instructed to change viewport sizes. You may need to adjust this depending on app performance across browsers.

‘misMatchTolerance’: Arguably the most important setting here. This floating-point value defines the ‘fuzzy factor’ the service uses to determine at what point a visual difference between references and screen shots should fail. The default value of 0.10 indicates that a diff will be generated if a given screen shot differs from its reference by 10% or more. The greater the value, the greater the tolerance.

Once you’ve finished modifying your config file, let’s execute a test.

Running Your Tests:

Provided your config file is set to point to the root of where your test files are located within the project, edit your package.json file and modify the ‘test’ descriptor in the scripts portion of the file.

Set it to the following:

‘./node_modules/.bin/wdio wdio.conf.js’
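For reference, the relevant scripts portion of package.json would then look something like this (assuming the wdio.conf.js file name used earlier in this post):

{
  "scripts": {
    "test": "./node_modules/.bin/wdio wdio.conf.js"
  }
}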

To run your test, from the command line, do the following:

‘BSTACK_USERNAME= BSTACK_KEY= npm run test -- --baseUrl=’

Now, just sit back and wait for the test results to roll in. If this is the first time you are executing these tests, the visual-regression service can fail while trying to capture initial references for various browsers via Browserstack. You may need to increase your test’s global timeout on the first run, or simply re-run your tests in this case.
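If you do need to loosen timeouts, the relevant knobs live in wdio.conf.js. The following excerpt is only a sketch; the values are arbitrary, not recommendations:

// wdio.conf.js (excerpt) - relax timeouts while initial references are generated
exports.config = {
  // ...existing configuration...
  waitforTimeout: 30000,         // implicit wait used by waitFor* commands (ms)
  connectionRetryTimeout: 90000, // how long to wait when establishing a session (ms)
  mochaOpts: {
    ui: 'bdd',
    timeout: 120000,             // per-test timeout for Mocha (ms)
  },
};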

Reviewing Results:

If you’re used to your standard JUnit or Jest-style test execution output, you won’t necessarily see similar test output here.

If there is a functional error present during a test (an object you are attempting to inspect isn’t available on screen) a standard Webdriver-based exception will be generated. However, outside of that, your tests will pass – even if a discrepancy is visually detected.

However, examine the screen shot folder structure we mentioned earlier. Note the number of files that have been generated. Open a few of them to view what has been captured through IE 11 vs. Chrome while testing through Browserstack.

Note that the files have names appended to them descriptive of the browser and viewport dimensions they correspond to.

example1

Example of a screen shot from a specific browser

Make note of whether the ‘Diff’ directory has been generated. If so, examine its contents. These are your test results – specifically, your test failures.

example2

Example of a diff’ed image

There are plenty of other options to explore with this basic set of tooling we’ve set up here. However, we’re going to pause here and bask in the awesomeness of being able to perform this level of browser testing with just 5-10 lines of code.

Is there More?

This post really just scratches the surface of what you can do with a set of visual regression test tools.  There are many more options to use these tools such as enabling mobile testing, improving error handling and mating this with your build tools and services.

We hope to cover these topics in a bit more depth in a later post.   For now, if you’re looking for additional reading, feel free to check out a few other related posts on visual regression testing here, here and here.

Maintaining Test Data with the “someObject” Test Structure

Language: Scala
TestTool: Scalatest

How did we get here?

When systems become reasonably complex, tests must manage cumbersome amounts of data. A test case that exercises only a small bit of functionality may start to require large amounts of domain knowledge about the system being tested, often expressed through the mock data used to set up the test. Maintenance of this data becomes cumbersome, monotonous and can feel Sisyphean. To solve these problems we created “someObject”, a modular system that allows us to maintain data in only one location while providing the flexibility to create specific data for our tests.

Let’s Do A Code Time!

To start this post, we’re going to build a system without the “someObject” test structure to provide context for its use. (To skip to the “someObject” structure, jump to here!). Suppose we are building a service that reports on advertising campaigns. We may create a class that describes an advertising campaign and call it `Campaign`.

case class Campaign(id: String,
    name: String,
    client: String,
    startDate: UTCDate,
    endDate: UTCDate,
    deleted: Boolean)

Now we are going to store this campaign in a database, and we need to write some integration tests to make sure this operation is performed properly. A test that confirms a campaign is stored properly might look like this:

it should "properly store a single campaign" in {
    Given("we have a proper campaign")
    val campaignToStore = Campaign(
        id = "someId",
        name = "someName",
        client = "someClient",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)

    And("we store this campaign in the database")
    database.storeCampaign(campaignToStore)

    When("we fetch the given campaign")
    val fetchedCampaign:Campaign = database.fetchCampaign(campaignToStore.id)

    Then("All the campaign values were stored properly")
    fetchedCampaign.id shouldBe campaignToStore.id
    fetchedCampaign.name shouldBe campaignToStore.name
    fetchedCampaign.client shouldBe campaignToStore.client
    fetchedCampaign.startDate shouldBe campaignToStore.startDate
    fetchedCampaign.endDate shouldBe campaignToStore.endDate
    fetchedCampaign.deleted shouldBe campaignToStore.deleted
}

BlogIntro.scala

Add some functionality:

Now we add the ability to update some values for this campaign in the database, and we need to test this new piece of functionality. That test might look something like this:

it should "properly update a single campaign" in {
    Given("we have a proper campaign")
    val campaignToStore = Campaign(
        id = "someId",
        name = "someName",
        client = "someClient",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)

    And("some update parameters for our campaign")
    val updateParameters = UpdateParameters(name = Some("someNewName"))

    And("we store this campaign in the database")
    database.storeCampaign(campaignToStore)

    When("we update this campaign")
    database.updateCampaign("someId", updateParameters)

    val fetchedCampaign:Campaign = database.fetchCampaign("someId")

    Then("All the campaign values were stored properly")
    //Unchanged values
    fetchedCampaign.id shouldBe campaignToStore.id
    fetchedCampaign.client shouldBe campaignToStore.client
    fetchedCampaign.startDate shouldBe campaignToStore.startDate
    fetchedCampaign.endDate shouldBe campaignToStore.endDate
    fetchedCampaign.deleted shouldBe campaignToStore.deleted
    //Changed values
    fetchedCampaign.name shouldBe updateParameters.name.get
}

BlogUpdateFunction.scala

But here we see the duplication of test boilerplate code in `campaignToStore`. We don’t want to have to copy over `campaignToStore` into every test, so we might want to abstract that out to be used all over the suite.

object MyCampaignTestObjects {
    val campaignToStore = Campaign(
        id = "someId",
        name = "someName",
        client = "someClient",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)
}

BlogAbstractToSuite.scala

Now we can use the same data in every test!

Add a Test that requires unique data:

Suppose we now write a function to fetch all the campaigns that are stored in the database. We might need a test that involves uniqueness in the data we store, such as the following example:

it should "properly fetch all stored campaigns" in {
    Given("we store several unique campaigns")
    val anotherCampaignToStore = Campaign(
        id = "someSecondId",
        name = "someSecondName",
        client = "someSecondClient",
        startDate = UTCDate(1955, 10, 6),
        endDate = UTCDate(1956, 10, 6),
        deleted = false)
    database.storeCampaign(campaignToStore)
    database.storeCampaign(anotherCampaignToStore)

    When("we fetch all campaigns")
    val allCampaigns = database.fetchAllCampaigns()

    Then("all campaigns are returned")
    allCampaigns shouldBe List(campaignToStore, anotherCampaignToStore)
}

BlogTestWithUniqueTestData.scala

In the example, we re-used the campaign we abstracted out earlier for conciseness, but this makes it unclear that `anotherCampaignToStore` is unique. What if someone else comes in and changes `campaignToStore` so that it happens to match the data in `anotherCampaignToStore`? This test would then become flaky, and nobody likes flaky tests. We might decide to just make all data used in this test local to this test, but then we would need to maintain the test data in both this test and `MyCampaignTestObjects`.

Add Some Arbitrary Data Constraints:

Suppose now that there is a new design constraint on how campaigns can be stored in the database. Now all client names must be lowercased in all campaigns:

object MyCampaignTestObjects {
    val campaignToStore = Campaign(
    id = "someId",
    name = "someName",
    //We change the client name to match our new requirement
    client = "some_client",
    startDate = UTCDate(1955, 10, 6),
    endDate = UTCDate(1956, 10, 6),
    deleted = false)
}

BlogNewConstraint.scala

Now we start to see the issue with maintaining test data across the whole suite we’ve been constructing. We need to find every mock campaign used in our suite and ensure that its client field is lowercased. Realistically, many of our tests (specifically, in this example, the `fetchAllCampaigns` test) don’t care about the client field of their campaign, so we shouldn’t need to care about the client field value while setting up our mock test data. Because this example is small, it’s not cumbersome to directly update the value to satisfy the new constraint. Now imagine a large set of suites, each containing hundreds of unique test cases. Suddenly a single suite requires a large amount of work to refactor one field across each test case. Nobody wants to do that monotonous maintenance. To address this, our team adopted the “someObject” structure to minimize this data maintenance within our tests.

someObject Test Structure:

When designing this test structure, we wanted to make our test data extendable for use anywhere it is needed. We used Scala’s `trait` to mix in necessary functions to provide test objects to the objects inside our tests, such as the `MyCampaignTestObjects` object above:

trait CampaignTestObjects {
    def someCampaign(id: String = "someId",
                     name: String = "someName",
                     client: String = "some_client",
                     startDate: UTCDate = UTCDate(1955, 10, 6),
                     endDate: UTCDate = UTCDate(1956, 10, 6),
                     deleted: Boolean = false): Campaign =
        Campaign(
            id = id,
            name = name,
            client = client,
            startDate = startDate,
            endDate = endDate,
            deleted = deleted)
}
 object MyCampaignTestObjects extends CampaignTestObjects {
    //Any other setup methods for test data
}

Now we can revisit our `fetchAllCampaigns` test example.

it should "properly fetch all stored campaigns" in {
    Given("we store several unique campaigns")
    val campaignToStore = someCampaign(id = "someId")
    val anotherCampaignToStore = someCampaign(id = "someNewId")
    database.storeCampaign(campaignToStore)
    database.storeCampaign(anotherCampaignToStore)

    When("we fetch all campaigns")
    val allCampaigns = database.fetchAllCampaigns()

    Then("all campaigns are returned")
    allCampaigns shouldBe List(campaignToStore, anotherCampaignToStore)
}

BlogBasicSomeObject.scala

Inside this test, we’ve set up two unique campaigns by calling the `someCampaign` method from our test data trait. This populates the returned campaign with dummy data that we don’t care about. All we need out of this method is “some campaign” with “some data”. Now, instead of obscuring the intent of the test case by setting up cumbersome, overly-expressive mock data, we can simply override the implicitly available mock objects with only the necessary data. For the unique campaigns needed in the `fetchAllCampaigns` test, we only really care about each campaign’s identifier. We don’t update the name, client, startDate, etc. because this test doesn’t care about any of that data. We only need the campaigns to be unique in our database. Under this test structure, when we receive the design change about client names being lowercased, we don’t need to update our `fetchAllCampaigns` test.

Another Example:

Let’s provide another example that our team encountered. Campaigns inside our database now need to also store the amount of money spent on each ad campaign. We’re adding a new column to our database, and changing the database schema.

case class Campaign(id: String,
    name: String,
    client: String,
    totalAdSpend: Int,
    startDate: UTCDate,
    endDate: UTCDate,
    deleted: Boolean)

Now, every test that has a campaign involved needs to be updated to include a new field; but under the “someObject” structure we only need to add two lines and all existing tests should be working fine again:

trait CampaignTestObjects {

    def someCampaign(id: String = "someId",
                     name: String = "someName",
                     client: String = "some_client",
                     totalAdSpend: Int = 123456,
                     startDate: UTCDate = UTCDate(1955, 10, 6),
                     endDate: UTCDate = UTCDate(1956, 10, 6),
                     deleted: Boolean = false): Campaign =
        Campaign(
            id = id,
            name = name,
            client = client,
            totalAdSpend = totalAdSpend,
            startDate = startDate,
            endDate = endDate,
            deleted = deleted)
}

BlogPostSchemaChange.scala

Behavior Driven Tests:

The purpose of the “someObject” structure is to minimize data maintenance within tests. We want to ensure that we’re disciplined about only setting data that the tests need to care about. There are cases where data might seem necessary for what we are testing, but we can abstract that data away to de-couple the test’s reliance on hard coded values. For example, suppose we have a function that returns the sum of all the `totalAdSpend` across our database.

def sumAllSpend(campaignsToSum: List[Campaign]): Int =
    campaignsToSum.map(_.totalAdSpend).sum

To test this function, we might write a test like this:

it should "properly sum all totalAdSpend" in {
    Given("we store several unique campaigns")
    val campaignToStore = someCampaign(id = "someId", totalAdSpend = 123)
    val anotherCampaignToStore = someCampaign(id = "someNewId", totalAdSpend = 456)
    database.storeCampaign(campaignToStore)
    database.storeCampaign(anotherCampaignToStore)

    When("we sum all ad spend")
    val totalTotalAdSpend = sumAllSpend(database.fetchAllCampaigns())

    Then("the result is the sum of all ad spend")
    totalTotalAdSpend shouldBe 579
}

BlogPostNonBehaviorTest.scala

While this test does work, and it utilizes this “someObject” structure, it still forces data management at the test level.

`sumAllSpend` doesn’t really care about any one campaign’s `totalAdSpend` value. It only cares that we add all of the `totalAdSpend` values up correctly. We could instead write our test to assert on this behavior instead of doing the math ourselves and taking on the responsibility of managing more data.

it should "properly sum all totalAdSpend" in {
   Given("we store several unique campaigns")
   val campaignToStore = someCampaign(id = "someId")
   val anotherCampaignToStore = someCampaign(id = "someNewId")

   val allCampaignsToStore = List(campaignToStore,anotherCampaignToStore)
   allCampaignsToStore.foreach(campaign => database.storeCampaign(campaign))

   When("we sum all ad spend")
   val totalTotalAdSpend = sumAllSpend(database.fetchAllCampaigns())

   Then("the result is the sum of all ad spend")
   totalTotalAdSpend shouldBe allCampaignsToStore.map(_.totalAdSpend).sum
}

BlogPostBehaviorTest.scala

With this test, we don’t care what each campaign’s ad spend was, we don’t care how many campaigns are stored, and we don’t depend on any constant value. This test simply verifies that the result is the sum of the `totalAdSpend` values of every campaign we store in the database.

Conclusion:

In this introductory blog post, we explored the someObject testing structure in Scala, but this concept is not intended to be language specific. Scala makes this concept easy to implement through the use of Default Parameter Values, but in a future post I’ll show how it can be implemented through the Builder Pattern in a language like Java. Another unexplored “someObject” concept is the granularity of control in setting default data. This post introduces the “global” and test-specific setting of default data, but doesn’t explore how to set test-suite-level data for our test objects, or the cases in which that might be useful. I’ll discuss that in a future post as well.

Code snippets:

BlogIntro.scala
BlogUpdateFunction.scala
BlogAbstractToSuite.scala
BlogTestWithUniqueTestData.scala
BlogNewConstraint.scala
BlogBasicSomeObject.scala
BlogPostSchemaChange.scala
BlogPostNonBehaviorTest.scala
BlogPostBehaviorTest.scala

Database Migration

MalcolmInTheMiddleGif

(Always One More Thing…)

Who Are We?

The Ad Management team here at Bazaarvoice grew out of an incubator team. The goal of our incubator is to quickly iterate on ideas, producing prototypes and “proof of concept” projects to be iterated on if they validate a customer need. The project of interest here generates reports based on aggregations of event data gathered from several other teams at the company. As our project gained traction, it grew in size and scope, eventually leading to the need to revisit some of the design decisions made in the prototyping phase. Specifically, we found the original database system we chose, EmoDB, to not meet our needs as our requirements evolved.

Why Migrate?

When this project was started, it began as a prototype designed to get the project rolling as quickly and easily as possible. The initial team chose EmoDB since they were familiar with the in-house technology from their other projects and it fit our initial needs. As the project gained traction, and we had more data to operate against, we encountered scalability issues, initially resolved with caching and some refactoring. We found that we were querying EmoDB as if it were a typical relational database, when it’s not actually designed for that use case. (EmoDB is an eventually consistent JSON blob store with a change notification databus that spans multiple AWS AZs and Regions. EmoDB powers many of our solutions at Bazaarvoice and is now open-source and available at https://github.com/bazaarvoice/emodb.)

We chose to switch to MySql to leverage the Relational Data Model for rolling up aggregations of data we collect and calculate. We ran into problems previously when we retrieved whole documents to perform aggregations on our data, leading us to decide that a technology that is optimized for relational models would suit the project much better.

How to Migrate?

Since our project already had trained users by the time we wanted to migrate database systems, we needed to design our migration with a no-down-time approach; “seamlessly” changing out the back-end implementation for our users. We also made these transitions configurable, such that we wouldn’t need to make one large switch to master from the new system, but we could choose which services were ready to be cut over to the new data back-end.
The following image is our design document that describes how we planned our migration. On the left is our origin code base named “legacy”. On the right was the proposed design for our new service stack for the migration. Inserted into the middle is the “Service Facade” where we intended to run our quality assurance against live data between the legacy technology stack and our new technology stack.

Copy of ad-management migration to MySql based stack - Page 1

How to Maintain Data Consistency?

Depending on the size of the data being diffed and migrated between databases, it can be expensive to run the necessary migrations. Our solution was to write specific tasks that either backfilled data or directly migrated data sets to the new data source. This allowed us to smoke test that our services were working without expending large amounts of time or money finding bugs along the way. As our confidence grew in our custom tooling and services, we would backfill and migrate larger chunks of data, until we had migrated everything necessary to master from our new service.

What is a Service Facade?

The service facade layer is responsible for executing the respective operations out of the legacy and new stacks. This is where we placed our diffing logic to compare the results returned from Emo and Mysql for the same operation. The facade returns data from the pre-defined configured stack. This meant that certain areas of the application could be sourcing from Mysql, while other areas, that we weren’t confident in, continued to source from Emo. For example, our CampaignRoiReportBuilderServiceFacade written in Scala looked something like this:


class CampaignRoiReportBuilderServiceFacade @Inject()(
private val campaignRoiReportBuilderServiceLegacy:CampaignRoiReportBuilderServiceLegacy,
private val campaignRoiReportService:CampaignRoiReportService,
private val campaignConfig: CampaignConfiguration,
private val facadeDiffTool: FacadeDiffTool) {
...
  def buildReport(...): Future[Option[CampaignRoiReport]] = {
    val roiReportFromEmoDbFuture: Future[Option[CampaignRoiReport]] = campaignRoiReportBuilderServiceLegacy.buildReport(...)
    val roiReportFromMySqlFuture: Future[Option[CampaignRoiReport]] = campaignRoiReportService.buildReport(...)

    // Extract data from scala futures
    for {
      roiReportFromEmoDbMaybe <- roiReportFromEmoDbFuture
      roiReportFromMySqlMaybe <- roiReportFromMySqlFuture
    } {
      // Pattern matching to extract values from the scala Options
      (roiReportFromEmoDbMaybe, roiReportFromMySqlMaybe) match {
        case (None, None) => // this is an impossible case, but listed to avoid a compilation warning
        case (Some(_), None) => LOG.warn("/*Report missing/mismatched data*/")
        case (None, Some(_)) => LOG.warn("/*Report missing/mismatched data*/")
        case (Some(roiReportFromEmo), Some(roiReportFromMySql)) =>
          val mismatches = facadeDiffTool.campaignROIReportLegacyDiff(roiReportFromEmo, roiReportFromMySql)
          if (mismatches.nonEmpty)
            LOG.warn("/*Report missing/mismatched data*/")
      }
    }

//This is how we configure what source we return to the resource.
    campaignConfig.masteringFrom match {
      case EmoDb =>
        roiReportFromMySqlFuture.onFailure{
          case e:Throwable => LOG.warn("Failed to build an ROI report on the MySql side", e)
        }
        roiReportFromEmoDbFuture
      case MySql =>
        roiReportFromEmoDbFuture.onFailure{
          case e:Throwable => LOG.warn("Failed to build an ROI report on the EmoDb side", e)
        }
        roiReportFromMySqlFuture
    }
  }
}

The original resource classes will be modified to call from the new facade layer, but no other functionality should change. If constructed properly, the facade layer will act in the same way as the original service because the facade mimics the public functions available in the original service class. These duplicated functions will call to the methods from the legacy service class as well as the new service. With the responses from both the legacy and new services, the facade layer can make an assessment on the differences between the two service stacks. To report our differences such that we could be notified during API usage, we would log them out to our log management and monitoring system.

How Did We Capture Mismatches?

Logging was a large concern of ours. We knew that there would be many differences per call while we were debugging our new service stack. As an example, on one call, we reported 2000+ differences. We wanted to compose all differences into one log per call in a meaningful way. For this, we wrote custom diff tooling that would return differences in the data as sets of MismatchedField classes.


case class MismatchedField[T](name: String,
                              legacyValue: T,
                              newValue: T)


This generic class holds the values returned from both the legacy service (legacyValue) and the new stack’s service (newValue), as well as a meaningful tag identifying where the mismatch came from (name). We then compose all mismatches for any given call through our custom diff tool: every function within the tool returns Set[MismatchedField[Any]], and we merge each set into a single set of differences, so that one log call can write out the whole set of differences in one log entry.

An Interesting Finding:

One of the most interesting findings from this migration wasn’t a bug in the new service stack we were constructing for the new database, but the bugs we found in our original database stack. One take-away from this was to make sure to investigate any mismatches found down to the source data. We found during the code migration process that some of our legacy functionality was written incorrectly. As an example, in our legacy code we were storing some aggregated data in sets, unintentionally masking duplicate data. When re-implementing these same aggregations for our new service stack, they were correctly implemented as a list, producing a mismatch in the data. Through our investigation, instead of simply matching the data to how our legacy service worked, we went back to our origin data and ran the calculations manually through the Scala REPL. In doing so, we found that the new service was correct and our legacy code was wrong. Fortunately, the bug within our legacy code was a simple fix. We implemented the fix within the legacy code and our mismatch disappeared.

Other Take-aways:

An important team take-away was to be very upfront and declarative about the work the migration would require. Our investigation into the migration not only involved setting up a new technology stack for MySql, but also changing our build tool from Maven to SBT, introducing a Flyway + Jooq plugin to enforce type safety throughout the migration, designing a new data model (which was ultimately the driving factor for doing the migration in the first place), and upgrading our code to the newest Scala version to leverage all of the previous changes. Ultimately, we severely underestimated, and under-ticketed, the work necessary to start our migration.

It is also important to keep in mind that every team is different and has different needs. When having conversations about database migrations, take the time to do a proper risk assessment for the work ahead. Keep these conversations going during the migration as well. As a team, we ended up prioritizing new feature requests and non-migration related bugs because the migration felt orthogonal to our production environment.

A further takeaway is that we could have saved ourselves a lot of time if we had more realistically assessed our users. In retrospect, the users of these reports were internal and would have been more lenient with smaller service outages, which would have allowed us to leverage our configurable services to migrate much sooner. At the expense of stability, we believe that we could have had a quicker migration by forcing ourselves to fix problems forward, instead of maintaining our legacy code for as long as we did. Still, most scenarios don’t have this luxury and we hope the façade based approach is of help to you.

Front End Application Testing with Image Recognition

One of the many challenges of software testing has always been cross-browser testing. Despite the web’s overall move to more standards compliant browser platforms, we still struggle with the fact that sometimes certain CSS values or certain JavaScript operations don’t translate well in some browsers (cough, cough IE 8).

In this post, I’m going to show how the Curations team has upgraded their existing automation tools to allow for us to automate spot checking the visual display of the Curations front end across multiple browsers in order to save us time while helping to build a better product for our clients.

The Problem: How to save time and test all the things

The Curations front end is a highly configurable product that allows our clients to implement the display of moderated UGC made available through the API from a Curations instance.

This flexibility combined with BV’s browser support guidelines means there are a very large number of ways Curations content can be rendered on the web.

Initially, rather than attempt to test ‘all the things’, we’ve codified a set of possible configurations that represent general usage patterns of how Curations is implemented. Functionally, we can test that content can be retrieved and displayed; however, when it comes to whether the end result has the right look-and-feel in Chrome, Firefox and other browsers, our testing is largely manual (and time consuming).

How can we better automate this process without sacrificing consistency or stability in testing?

Our solution: Sikuli API

Sikuli is an open-source Java-based application and API that allows users to automate web, mobile and OS applications across multiple platforms using image recognition. It’s platform based and not browser specific, so it enables us to circumvent limitations with screen capture and compare features in other automation tools like Webdriver.

Imagine writing a test script that starts with clicking the home button within an iOS simulator, simply by providing the script a .png of the home button itself. That’s what Sikuli can do.

You can read more about Sikuli here. You can check out their project here on github.

Installation:

Sikuli provides two different products for your automation needs – their stand-alone scripting engine and their API. For our purposes, we’re interested in the Sikuli API with the goal to implement it within our existing Saladhands test framework, which uses both Webdriver and Cucumber.

Assuming you have Java 1.6 or greater installed on your workstation, from Sikuli.org’s download page, follow the link to their standalone setup JAR

http://www.sikuli.org/download.html

Download the JAR file and place it in your local workstation’s home directory, then open it.

Here, you’ll be prompted by the installer to select an installation type. Select option 3 if you wish to use Sikuli in your Java or Jython project as well as have access to its command line options. Select option 4 if you only plan on using Sikuli within the scope of your Java or Jython project.

Once the installation is complete, you should have a sikuli.jar file in your working directory. You will want to add this to your collection of external JARs for your installed JRE.

For example, if you’re using Eclipse, go to Preferences > Java > Installed JREs, select your JRE version, click Edit and add Sikuli.jar to the collection.

Alternately, if you are using Maven to build your project, you can add Sikuli’s API to your project by adding the following to your POM.XML file:

<dependency>
    <groupId>org.sikuli</groupId>
    <artifactId>sikuli-api</artifactId>
    <version>1.2.0</version>
</dependency>

Clean then build your project and now you’re ready to roll.

Implementation:

Ultimately, we wanted a method we could control using Cucumber that allows us to articulate a web application using Webdriver, take a screen shot of that application (in this case, an instance of Curations) and compare it to a static screen shot of specific web elements (e.g. Ratings and Review stars within the Curations display).

This test method would then assert that we can find a match to the static screen element within the live web application, or have TestNG throw an exception (test failure) if no match can be found.

First, now that we have the ability to use Sikuli, we created a new helper class that instantiates an object from their API so we can compare screen output.

import org.sikuli.api.*;
import java.io.IOException;
import java.io.File;
/**
* Created by gary.spillman on 4/9/15.
*/
public class SikuliHelper {

public boolean screenMatch(String targetPath) {

Once we import the Sikuli API, we create a simple class with a single class method. In this case, screenMatch is going to accept a path within the Java project relative to a static image we are going to compare against the live browser window. True or false will be returned depending on if we have a match or not.

//Sets the screen region Sikuli will try to match to full screen
ScreenRegion fullScreen = new DesktopScreenRegion();

//Set your target to compare from
Target target = new ImageTarget(new File(targetPath));

The main object type Sikuli wants to handle everything with is ScreenRegion. In this case, we are instantiating a new screen region relative to the entire desktop screen area of whatever OS our project will run on. Without passing any arguments to DesktopScreenRegion(), we will be defining the region’s dimension as the entire viewable area of our screen.

double fuzzPercent = .9;

try {
    fuzzPercent = Double.parseDouble(PropertyLoader.loadProperty("fuzz.factor"));
}
catch (IOException e) {
    e.printStackTrace();
}

Sikuli allows you to define a fuzzing factor (if you’ve ever used ImageMagick, this should be a familiar concept). Essentially, rather than requiring a 1:1 exact match, you can define a minimum acceptable percentage you wish your screen comparison to match. For Sikuli, you can define this within a range from 0.1 to 1 (i.e. a 10% match up to a 100% match).

Here we are defining a default minimum match (or fuzz factor) of 90%. Additionally, we load in from a set of properties in Saladhand’s test.properties file a value which, if present can override the default 90% match – should we wish to increase or decrease the severity of test criteria.

target.setMinScore(fuzzPercent);

Now that we know what fuzzing percentage we want to test with, we use target’s setMinScore method to set that property.

ScreenRegion found = fullScreen.find(target);

//According to code examples, if the image isn't found, the screen region is undefined
//So... if it remains null at this point, we're assuming there's no match.

if(found == null) {
    return false;
}
else {
    return true;
}

This is where the magic happens. We create a new screen region called found. We then define that using fullScreen’s find method, providing the path to the image file we will use as comparison (target).

What happens here is that Sikuli will take the provided image (target) and attempt to locate any instance within the current visible screen that matches target, within the lower bound of the fuzzing percentage we set and up to a full, 100% match.

The find method either returns a new screen region object or returns nothing. Thus, if we are unable to find a match for the file referenced by target, found will remain undefined (null). So in this case, we simply return false if found is null (no match) or true if found is assigned a new screen region (we had a match).

Putting it all together:

To completely incorporate this behavior into our test framework, we write a simple cucumber step definition that allows us to call our Sikuli helper method, providing a local image file as an argument to compare against the current, active screen.

Here’s what the cucumber step looks like:

public class ScreenShotSteps {

    SikuliHelper sk = new SikuliHelper();

    //Given the image "X" can be found on the screen
    @Given("^the image \"([^\"]*)\" can be found on the screen$")
    public void the_image_can_be_found_on_the_screen(String arg1) {

        String screenShotDir=null;

        try {
            screenShotDir = PropertyLoader.loadProperty("screenshot.path").toString();
        }
        catch (IOException e) {
            e.printStackTrace();
        }

        Assert.assertTrue(sk.screenMatch(screenShotDir + arg1));
    }
}

We’re referring to the image file via regex. The step definition makes an assertion using TestNG that the value returned from our instance of SikuliHelper’s screen match method is true (Success!!!). If not, TestNG throws an exception and our test will be marked as having failed.

Finally, since we already have cucumber steps that let us invoke and direct Webdriver to a live site, we can write a test that looks like the following:

Feature: Screen Shot Test
As a QA tester
I want to do screen compares
So I can be a boss ass QA tester

Scenario: Find the nav element on BV's home page
Given I visit "http://www.bazaarvoice.com"
Then the image "screentest1.png" can be found on the screen

In this case, the image we are attempting to find is a portion of the nav element on BV’s home page:

screentest1

Considerations:

This is not a full-stop solution to cross browser UI testing. Instead, we want to use Sikuli and tools like it to reduce overall manual testing as much as possible (as reasonably as possible) by giving the option to pre-warn product development teams of UI discrepancies. This can help us make better decisions on how to organize and allocate testing resources – manual and otherwise.

There are caveats to using Sikuli. The most explicit caveat is that tests designed with it cannot run headlessly – the test tool requires a real, actual screen to capture and manipulate.

Obviously, the other possible drawback is the required maintenance of local image files you will need to check into your automation project as test artifacts. How deep you will be able to go with this type of testing may be tempered by how large of a file collection you will be able to reasonably maintain or deploy.

Despite that, Sikuli seems to have a large number of powerful features, not limited to being able to provide some level of mobile device testing. Check out the project repository and documentation to see how you might be able to incorporate similar automation code into your project today.