Author Archives: Matthew Bogner

About Matthew Bogner

Matthew Bogner was a Senior Software Engineer at Bazaarvoice.

Using the Cloud to Troubleshoot End-User Performance (Part 2)

This is the second of a two-part article that outlines how we used a variety of tools to improve our page load performance. If you haven’t read part 1, go ahead and give it a read before continuing.


We opted not to use our normal staging environment for this project, since our staging environment doesn’t run experimental code.

In order to iterate rapidly on our changes and to provide a location that is publicly accessible over the web (so that WebPageTest can see it), we set up an Amazon EC2 instance running a complete copy of all of our software. It effectively behaved exactly like a local developer instance, with the exception that it could be hit from any external resource on the web. (Heh… I make this sound really easy.)

So now that we have a server on the web running a customized version of our software, the next problem is getting requests that normally go to our production datacenter redirected to our EC2 instance, without redirecting real end-users. In my opinion, this is where the capabilities of WebPageTest really shined and began to flex their muscles.

Let’s say that under normal conditions, your production application runs at a particular hostname, and that the EC2 instance you created running a customized version of your app answers at 123.456.789.2. WebPageTest will allow you to override DNS resolution for that hostname to 123.456.789.2. This works in a similar manner to a hosts-file override, except that WPT will still have the browser perform a DNS lookup of your production host, so you still get accurate timings for the DNS resolution.

To take advantage of this, you need to provide the following “script” to your test execution, where www.example.com is a placeholder for your production hostname:

setDns www.example.com 123.456.789.2
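A WPT script can also combine a navigation step with several overrides at once, which is handy when your page pulls static resources from more than one of your hostnames. A hypothetical example (the hostnames, path, and IP are all placeholders):

```text
navigate http://www.example.com/some/page.html
setDns static.example.com 123.456.789.2
setDns media.example.com 123.456.789.2
```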

The other cool thing about WebPageTest is that test execution and results parsing can be scripted via its REST-like APIs. In fact, check out this Java class gist I wrote (embedded below) that makes use of this API to run some aggregated stats on the Twitter resources used by a page. This class allows you to more easily view aggregated statistics for a narrow set of the resources that are actually used by a page, assuming that you only care about the resources being delivered by your servers into another page.

package com.bazaarvoice.mbogner.utils;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringUtils;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/**
 * @author Matthew Bogner
 * @date 10/25/11
 */
public class WebPageTestRunner {

    public static HttpClient client = new HttpClient();
    public static XPath xpath = XPathFactory.newInstance().newXPath();
    public static Set<String> hostnameFilter = new HashSet<String>(); // optionally add hostnames to restrict the analysis

    public static void main(String[] args) throws Exception {
        PostMethod post = new PostMethod("http://www.webpagetest.org/runtest.php");
        post.addParameter("url", ""); // The page to test
        post.addParameter("runs", "1"); // Only request the page once
        post.addParameter("fvonly", "1"); // Only look at the FirstView
        post.addParameter("location", "Dulles_IE7.DSL"); // Dulles, VA with IE7 and DSL connection
        post.addParameter("f", "xml"); // Respond with XML
        post.addParameter("k", args[0]); // API key

        // Now we need to (optionally) send over a script so that we can
        // override the DNS entries for certain hosts that the page will
        // attempt to reference.
        // post.addParameter("script",
        //         "navigate\n" +
        //         "setDns 123.456.789.2\n" +
        //         "setDns 123.456.789.2");

        String responseBody = executeHttpMethod(post);
        Node statusCodeNode = (Node) xpath.evaluate("/response/statusCode", getXmlSrc(responseBody), XPathConstants.NODE);
        String statusCode = statusCodeNode.getTextContent();
        System.out.println("StatusCode = " + statusCode + "\n");

        if ("200".equals(statusCode)) {
            // Request was successful. Wait for the test to complete.
            Node testIdNode = (Node) xpath.evaluate("/response/data/testId", getXmlSrc(responseBody), XPathConstants.NODE);
            waitForTestCompletion(testIdNode.getTextContent());
        }
    }

    private static InputSource getXmlSrc(String content) throws Exception {
        return new InputSource(new ByteArrayInputStream(content.getBytes("UTF-8")));
    }

    public static String executeHttpMethod(HttpMethod method) throws Exception {
        int responseCode;
        String responseBody;
        try {
            responseCode = client.executeMethod(method);
            responseBody = IOUtils.toString(method.getResponseBodyAsStream());
        } finally {
            method.releaseConnection();
        }
        if (responseCode != 200) {
            throw new Exception("Invalid server response. \nResponse code: " + responseCode + "\nResponse body: " + responseBody);
        }
        return responseBody;
    }

    private static void waitForTestCompletion(String testId) throws Exception {
        PostMethod post = new PostMethod("http://www.webpagetest.org/testStatus.php");
        post.addParameter("f", "xml"); // Respond with XML
        post.addParameter("test", testId);

        String responseBody = executeHttpMethod(post);
        Node statusCodeNode = (Node) xpath.evaluate("/response/statusCode", getXmlSrc(responseBody), XPathConstants.NODE);
        String statusCode = statusCodeNode.getTextContent();

        // 200 indicates the test is completed. 1XX means the test is still in progress. And 4XX indicates some error.
        if (statusCode.startsWith("4")) {
            throw new Exception("Error getting test results.");
        } else if (statusCode.startsWith("1")) {
            System.out.println("Test not completed. Waiting for 30 seconds and retrying...");
            Thread.sleep(30000); // Wait for 30sec
            waitForTestCompletion(testId);
        } else if ("200".equals(statusCode)) {
            obtainTestResults(testId);
        } else {
            throw new Exception("Unknown statusCode in response");
        }
    }

    private static void obtainTestResults(String testId) throws Exception {
        GetMethod get = new GetMethod("http://www.webpagetest.org/xmlResult/" + testId + "/");
        String responseBody = executeHttpMethod(get);
        Node statusCodeNode = (Node) xpath.evaluate("/response/statusCode", getXmlSrc(responseBody), XPathConstants.NODE);
        String statusCode = statusCodeNode.getTextContent();

        if (!"200".equals(statusCode)) {
            throw new Exception("Unable to obtain raw test results");
        }

        NodeList requestsDataUrlNodes = (NodeList) xpath.evaluate("/response/data/run/firstView/rawData/requestsData",
                getXmlSrc(responseBody), XPathConstants.NODESET);
        for (int nodeCtr = 0; nodeCtr < requestsDataUrlNodes.getLength(); ++nodeCtr) {
            Node requestsDataNode = requestsDataUrlNodes.item(nodeCtr);
            String requestsDataUrl = requestsDataNode.getTextContent().trim();
            analyzeTestResult(requestsDataUrl);
        }
    }

    private static void analyzeTestResult(String requestsDataUrl) throws Exception {
        System.out.println("\n\nAnalyzing results for " + requestsDataUrl);

        /*
            Things we want to track for each hostname:
                Total # requests
                Total # of requests for each content type
                Total # of bytes for each content type
                Total Time to First Byte
                Total DNS Time
                Total bytes
                Total connection time
        */
        HashMap<String, Integer> numRequestsPerHost = new HashMap<String, Integer>();
        HashMap<String, HashMap<String, Integer>> numRequestsPerHostPerContentType = new HashMap<String, HashMap<String, Integer>>();
        HashMap<String, Integer> totalTTFBPerHost = new HashMap<String, Integer>();
        HashMap<String, Integer> totalDNSLookupPerHost = new HashMap<String, Integer>();
        HashMap<String, Integer> totalInitialCnxnTimePerHost = new HashMap<String, Integer>();
        HashMap<String, HashMap<String, Integer>> totalBytesPerHostPerContentType = new HashMap<String, HashMap<String, Integer>>();
        HashMap<String, Integer> totalBytesPerHost = new HashMap<String, Integer>();

        String responseBody = executeHttpMethod(new GetMethod(requestsDataUrl)); // Unlike the rest, this response will be tab-delimited
        String[] lines = StringUtils.split(responseBody, "\n");
        for (int lineCtr = 1; lineCtr < lines.length; ++lineCtr) {
            String line = lines[lineCtr];
            String[] columns = StringUtils.splitPreserveAllTokens(line, "\t");

            String hostname = columns[5];
            String contentType = columns[18];
            String ttfb = StringUtils.isBlank(columns[9]) ? "0" : columns[9];
            String dns = StringUtils.isBlank(columns[47]) ? "0" : columns[47];
            String cnxn = StringUtils.isBlank(columns[48]) ? "0" : columns[48];
            String bytes = StringUtils.isBlank(columns[13]) ? "0" : columns[13];

            if ("0".equals(bytes) || (!hostnameFilter.isEmpty() && !hostnameFilter.contains(hostname))) {
                continue; // Skip empty responses and hosts we aren't interested in
            }

            // Track total # requests per host
            if (!numRequestsPerHost.containsKey(hostname)) {
                numRequestsPerHost.put(hostname, new Integer(1));
            } else {
                numRequestsPerHost.put(hostname, numRequestsPerHost.get(hostname) + 1);
            }

            // Track total # requests per host per content-type
            if (!numRequestsPerHostPerContentType.containsKey(hostname)) {
                HashMap<String, Integer> tmp = new HashMap<String, Integer>();
                tmp.put(contentType, new Integer(1));
                numRequestsPerHostPerContentType.put(hostname, tmp);
            } else if (!numRequestsPerHostPerContentType.get(hostname).containsKey(contentType)) {
                numRequestsPerHostPerContentType.get(hostname).put(contentType, new Integer(1));
            } else {
                numRequestsPerHostPerContentType.get(hostname).put(contentType, numRequestsPerHostPerContentType.get(hostname).get(contentType) + 1);
            }

            // Track total # bytes per host per content-type
            if (!totalBytesPerHostPerContentType.containsKey(hostname)) {
                HashMap<String, Integer> tmp = new HashMap<String, Integer>();
                tmp.put(contentType, Integer.valueOf(bytes));
                totalBytesPerHostPerContentType.put(hostname, tmp);
            } else if (!totalBytesPerHostPerContentType.get(hostname).containsKey(contentType)) {
                totalBytesPerHostPerContentType.get(hostname).put(contentType, Integer.valueOf(bytes));
            } else {
                totalBytesPerHostPerContentType.get(hostname).put(contentType, totalBytesPerHostPerContentType.get(hostname).get(contentType) + Integer.valueOf(bytes));
            }

            // Track total TTFB for host
            if (!totalTTFBPerHost.containsKey(hostname)) {
                totalTTFBPerHost.put(hostname, Integer.valueOf(ttfb));
            } else {
                totalTTFBPerHost.put(hostname, totalTTFBPerHost.get(hostname) + Integer.valueOf(ttfb));
            }

            // Track total DNS lookup time for host
            if (!totalDNSLookupPerHost.containsKey(hostname)) {
                totalDNSLookupPerHost.put(hostname, Integer.valueOf(dns));
            } else {
                totalDNSLookupPerHost.put(hostname, totalDNSLookupPerHost.get(hostname) + Integer.valueOf(dns));
            }

            // Track total initial connection time for host
            if (!totalInitialCnxnTimePerHost.containsKey(hostname)) {
                totalInitialCnxnTimePerHost.put(hostname, Integer.valueOf(cnxn));
            } else {
                totalInitialCnxnTimePerHost.put(hostname, totalInitialCnxnTimePerHost.get(hostname) + Integer.valueOf(cnxn));
            }

            // Track total bytes for host
            if (!totalBytesPerHost.containsKey(hostname)) {
                totalBytesPerHost.put(hostname, Integer.valueOf(bytes));
            } else {
                totalBytesPerHost.put(hostname, totalBytesPerHost.get(hostname) + Integer.valueOf(bytes));
            }
        }

        printMap("Total # requests per host", numRequestsPerHost);
        printMap("Total # requests per host per content-type", numRequestsPerHostPerContentType);
        printMap("Total # bytes per host per content-type", totalBytesPerHostPerContentType);
        printMap("Total TTFB per host", totalTTFBPerHost);
        printMap("Total DNS lookup per host", totalDNSLookupPerHost);
        printMap("Total Initial Connection Time per host", totalInitialCnxnTimePerHost);
        printMap("Total Bytes per host", totalBytesPerHost);
    }

    private static void printMap(String title, HashMap stats) {
        System.out.println("\t" + title);
        Iterator keyItr = stats.keySet().iterator();
        while (keyItr.hasNext()) {
            Object key = keyItr.next();
            Object value = stats.get(key);
            System.out.println("\t\t" + key.toString() + ": " + value.toString());
        }
    }
}


Let’s recap what we have available to us…

  • EC2 instance running a set of our code that can be recompiled and relaunched on a whim
  • The CharlesProxy tool for a cursory look at the whole transaction of the page from your desktop
  • WebPageTest tool for a third-party view of your app’s performance
  • And lastly, a custom Java class that can invoke your WebPageTest runs and consolidate/aggregate the results programmatically

With these tools and tactics we had an externally facing environment that we could use to iterate on new ideas quickly.

Using the Cloud to Troubleshoot End-User Performance (Part 1)

Debugging performance issues is hard. Debugging end-user performance issues from a distributed production software stack is even harder, especially if you are a third-party service provider and your client is the one actually in control of how your code is integrated into their site. There are lots of articles on the web about performance best practices, but few, if any, discuss the tactics of actually developing and iterating on those improvements.

Primarily, the challenges stem from the fact that troubleshooting performance issues is always iterative. If your production operations can handle deploying test code to production on a rapidly iterative schedule, then you can stop reading this post — you are already perfect, and worthy of a cold brewski for your impressive skills.

As our team and software stack continued to grow in both size and complexity, we restructured our client (JS, CSS, and HTML) code a while back so that we deliver only the code that is actually needed by each client’s site and configuration. We did this by reorganizing our code into modules that can be loaded asynchronously by the browser using require.js. Effectively, this took us from a monolithic JS and CSS include that contained a bunch of unused code and styles to a much smaller average deliverable that was consumed in smaller chunks by the browser.

This technique is a double-edged sword, and like all things, is best when done in moderation. Loading multiple modules asynchronously in the browser results in multiple http requests. Every HTTP request made by the browser results in some overhead spent doing the following:

  1. DNS Lookup – Only performed once per unique hostname that the browser encounters.
  2. Initial Connection – This is simply the time that the browser takes to establish a TCP socket connection with the web server.
  3. SSL Negotiation – This step is omitted if the connection is not intended to be secure. Otherwise, it is the SSL handshake and certificate exchange.
  4. Time to First Byte (TTFB) – This is the time starting from after the browser has sent the request to when the first byte of the response is received by the browser.
  5. Content Download – This is the time spent downloading all the bytes of the response from the web server.
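To build intuition for the first two phases, you can time them directly from a JVM. This is a rough illustrative sketch (not part of our toolchain, and the class and method names are mine) using plain java.net:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PhaseTimer {

    /** Rough time (ms) for the DNS lookup of a hostname (phase 1). */
    public static long dnsMs(String host) throws Exception {
        long t0 = System.nanoTime();
        InetAddress.getByName(host); // the resolver call itself
        return (System.nanoTime() - t0) / 1_000_000;
    }

    /** Rough time (ms) to open a TCP socket to host:port (phase 2, "Initial Connection"). */
    public static long connectMs(String host, int port) throws Exception {
        InetAddress addr = InetAddress.getByName(host); // resolve up front so DNS isn't counted here
        long t0 = System.nanoTime();
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(addr, port), 5000); // 5s timeout
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DNS lookup of localhost: " + dnsMs("localhost") + " ms");
        if (args.length >= 2) { // e.g. java PhaseTimer www.example.com 80
            System.out.println("Connect: " + connectMs(args[0], Integer.parseInt(args[1])) + " ms");
        }
    }
}
```

Note that the JVM caches DNS results, so a second call to dnsMs for the same host will report near zero; browsers cache lookups similarly, which is why the DNS cost is paid only once per unique hostname.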

There are many great resources from Yahoo! and Google which discuss the details of best-practices for improving page performance. I won’t re-hash that great information, nor dispute any of the recommendations that the respective authors make. I will, on the other hand, discuss some tactics and tools that I have found beneficial in analyzing and iterating on performance-related enhancements to a distributed software stack.


A few months ago, we challenged ourselves to improve our page load performance with IE7 in a bandwidth constrained (let’s call it “DSL-ish”) environment. DSL connections vary in speed and latency with each ISP.

I won’t bore you with the details of the changes we ended up making, but I want to give you a flavor of the situation we were trying to solve before I talk about the tactics and tools we used to iterate on the best practices that are available all over the web.

The unique challenge here is that IE7 only allows 2 simultaneous connections to the same host. Since our software distributes multiple small modules of JS, CSS, and images, we were running into this 2-connections-per-hostname limit with a vengeance. Commonly accepted solutions to this involve image spriting, file concatenation, and distributing or "sharding" requests for static resources across multiple domains. The sharding tactic made us realize the other constraining factor we were dealing with: the longer latency of HTTP requests on a DSL connection, which is exacerbated when making DNS lookups for a larger set of distinct hostnames.
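To get a feel for this tradeoff, here is a deliberately crude back-of-envelope model (all numbers are made up for illustration, and it ignores real-world effects like resource discovery order and parallel DNS resolution): resources are split evenly across hostnames, each hostname costs one DNS lookup before its first request, and each hostname serves its share two requests at a time, per the IE7 limit.

```java
public class ShardingEstimate {
    /**
     * Crude estimate of page fetch time in ms. Each host's timeline is one DNS
     * lookup plus its share of requests served over 2 parallel connections;
     * hosts download in parallel, so the slowest (here, any) host dominates.
     */
    public static double estimateMs(int resources, int hostnames, double requestMs, double dnsMs) {
        double perHost = Math.ceil(resources / (double) hostnames);
        double serialRounds = Math.ceil(perHost / 2.0); // 2 connections per hostname (IE7)
        return dnsMs + serialRounds * requestMs;
    }

    public static void main(String[] args) {
        // 20 small resources, 200 ms per request and 150 ms per DNS lookup ("DSL-ish" latency).
        for (int hosts = 1; hosts <= 4; hosts++) {
            System.out.printf("%d hostname(s): ~%.0f ms, with %d ms total spent on DNS%n",
                    hosts, estimateMs(20, hosts, 200, 150), hosts * 150);
        }
    }
}
```

Even in this toy model the returns diminish quickly: going from 1 to 2 shards roughly halves the serialized rounds, but every additional shard adds another high-latency DNS lookup to the page.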


The tools that we used to measure and evaluate our changes affected the tactics we used, so I’ll discuss them first.

Charles Proxy

Charles Proxy is a tool that runs on all platforms and provides some key features that really aided our analysis. Primarily, it has a built-in bandwidth throttling capability which allowed us to simulate specific latency and upload/download conditions from our local machines. We used Charles for a rougher, on-the-spot analysis of changes. It also allowed us to easily and quickly see aggregate numbers for the specific metrics we were interested in: the total number of requests, the total duration of all requests, and the total response size of all requests. Since these numbers are affected by the rest of the code (not ours) on our client’s site, Charles allowed us to filter out the resources that were not ours while still letting us see how our software behaved in the presence of our client’s code.

However, since we had multiple developers working on the project — each making isolated changes — we wanted a way to run a sort of “integration” test of all the changes at once in a manner that more closely aligned with how our software is delivered from our production servers. This led us to our next tool of choice – one that we’d never used until now.

WebPageTest

In its own words:

WebPagetest is a tool that was originally developed by AOL for use internally and was open-sourced in 2008 under a BSD license. The platform is under active development by several companies and community contributors on Google code. The software is also packaged up periodically and available for download if you would like to run your own instance.

In our case, WebPageTest provided two key things:

  • It’s free
  • It is a useful 3rd party mediator between ourselves and others for spot-checking page performance

At a high level, WebPageTest controls a bunch of compute agents in various geographic locations across the US that are able to simulate bandwidth conditions according to your specifications (under the hood it uses DummyNet). It allows you to request that one of its agents load your page, interact with your site by simulating link clicks (if necessary), and monitor and capture the results for detailed analysis by you later. This tool is a great way to use an external entity to verify your changes and have a consistent pre- and post-change benchmark of your page’s performance.

Of course, having some random machine on the web poke your site means that your changes must be publicly accessible over the web. Password protection is fine, since you can use WPT to script the login, but IMHO it’s non-ideal, as a login is not part of the normal end-user experience.


Now that we have a good handle on the tools we used, we should discuss how we put them to work. Stay tuned for part 2, where we will explore the tactics for using these tools together effectively.