Using the Cloud to Troubleshoot End-User Performance (Part 2)

This is the second of a two-part article that outlines how we used a variety of tools to improve our page load performance. If you haven’t read part 1, go ahead and give it a read before continuing.

Tactics

We opted not to use our normal staging environment for this project, since it doesn’t run experimental code.

To iterate rapidly on our changes, and to provide a location that is publicly accessible over the web (so that WebPageTest can see it), we set up an Amazon EC2 instance running a complete copy of all of our software. It behaved exactly like a local developer instance, except that it could be reached from any external resource on the web. (Heh… I make this sound really easy.)

So now that we have a server on the web running a customized version of our software, the problem becomes getting requests that normally go to our production datacenter redirected to our EC2 instance, without redirecting real end-users. In my opinion, this is where WebPageTest really shined and began to flex its muscle.

Let’s say that under normal conditions, your production application runs at foo.com (IP address 123.456.789.1), and that the EC2 instance running a customized version of your app is at ec2-123-456-789-2.aws.com (IP address 123.456.789.2). WebPageTest will allow you to override DNS resolution for foo.com so that it resolves to 123.456.789.2. This works in a similar manner to a hosts-file override, except that WPT will still have the browser perform a DNS lookup of your production host so that you still get accurate timings for the DNS resolution.

To take advantage of this, you need to provide the following “script” to your test execution:

setDns foo.com 123.456.789.2
navigate http://www.external-site-that-includes-your-code.com
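If you have several hosts to override, the script can be assembled programmatically. The following is a minimal sketch, not part of the original gist: `WptScriptBuilder` and its `build` method are hypothetical names. It emits the `setDns` lines before `navigate`, on the assumption that WebPageTest runs script commands in order, and it separates fields with a single space as in the script above (swap in a tab if your WebPageTest instance expects tab-delimited script parameters).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles a WebPageTest script with DNS overrides. */
final class WptScriptBuilder {

    /**
     * Emits one setDns line per override, followed by the navigate command,
     * so that every override is registered before the page is requested.
     */
    static String build(String url, Map<String, String> dnsOverrides) {
        StringBuilder script = new StringBuilder();
        for (Map.Entry<String, String> override : dnsOverrides.entrySet()) {
            script.append("setDns ")
                  .append(override.getKey())
                  .append(' ')
                  .append(override.getValue())
                  .append('\n');
        }
        script.append("navigate ").append(url);
        return script.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the overrides in insertion order.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("foo.com", "123.456.789.2"); // placeholder address from the article
        System.out.println(build("http://www.external-site-that-includes-your-code.com", overrides));
    }
}
```

The resulting string is what you would pass as the script when submitting the test.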

The other cool thing about WebPageTest is that the test execution and results parsing can be scripted via its REST-like APIs. In fact, check out this Java class I wrote (embedded below), which uses the API to compute aggregated stats on the usage of Twitter on AOL.com. The class makes it easier to view aggregated statistics for a narrow set of the resources that a page actually uses, assuming that you only care about the resources your servers deliver into another page.

package com.bazaarvoice.mbogner.utils;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringUtils;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/**
 * @author: Matthew Bogner
 * @date: 10/25/11
 */
public class WebPageTestRunner {
    public static HttpClient client = new HttpClient();
    public static XPath xpath = XPathFactory.newInstance().newXPath();
    public static Set<String> hostnameFilter = new HashSet<String>();
    /** Submits a test run to WebPageTest and, if it is accepted, waits for the results. */
    public static void main(String[] args) throws Exception {
        hostnameFilter.add("cdn.api.twitter.com");
        hostnameFilter.add("platform.twitter.com");
        PostMethod post = new PostMethod("http://www.webpagetest.org/runtest.php");
        post.addParameter("url", "www.aol.com");
        post.addParameter("runs", "1"); // Only request the page once
        post.addParameter("fvonly", "1"); // Only look at the FirstView
        post.addParameter("location", "Dulles_IE7.DSL"); // Dulles, VA with IE7 and DSL connection
        post.addParameter("f", "xml"); // Respond with XML
        post.addParameter("k", args[0]); // API Key from WebPageTest.org
        // Now we need to (optionally) send over a script so that we can
        // override the DNS entries for certain hosts that the page will
        // attempt to reference. The overrides come before the navigate command.
        //post.addParameter("script",
        //        "setDns platform.twitter.com 123.456.789.2\n" +
        //        "setDns cdn.api.twitter.com 123.456.789.2\n" +
        //        "navigate http://www.aol.com");
        String responseBody = executeHttpMethod(post);
        System.out.println(responseBody);
        Node statusCodeNode = (Node) xpath.evaluate("/response/statusCode", getXmlSrc(responseBody), XPathConstants.NODE);
        String statusCode = statusCodeNode.getTextContent();
        System.out.println("StatusCode = " + statusCode + "\n");
        if ("200".equals(statusCode)) {
            // Request was successful. Wait for the test to complete.
            Node testIdNode = (Node) xpath.evaluate("/response/data/testId", getXmlSrc(responseBody), XPathConstants.NODE);
            waitForTestCompletion(testIdNode.getTextContent());
        }
    }
    /** Wraps a response body so that it can be handed to the XPath evaluator. */
    private static InputSource getXmlSrc(String content) throws Exception {
        return new InputSource(new ByteArrayInputStream(content.getBytes("UTF-8")));
    }
    /** Executes the request and returns the response body, failing on any non-200 HTTP status. */
    public static String executeHttpMethod(HttpMethod method) throws Exception {
        int responseCode;
        String responseBody;
        try {
            responseCode = client.executeMethod(method);
            responseBody = IOUtils.toString(method.getResponseBodyAsStream());
        } finally {
            method.releaseConnection();
        }
        if (responseCode != 200) {
            throw new Exception("Invalid server response. \nResponse code: " + responseCode + "\nResponse body: " + responseBody);
        }
        return responseBody;
    }
    /** Polls the test status every 30 seconds until the test completes or fails. */
    private static void waitForTestCompletion(String testId) throws Exception {
        PostMethod post = new PostMethod("http://www.webpagetest.org/testStatus.php");
        post.addParameter("f", "xml"); // Respond with XML
        post.addParameter("test", testId);
        String responseBody = executeHttpMethod(post);
        Node statusCodeNode = (Node) xpath.evaluate("/response/statusCode", getXmlSrc(responseBody), XPathConstants.NODE);
        String statusCode = statusCodeNode.getTextContent();
        // 200 indicates test is completed. 1XX means the test is still in progress. And 4XX indicates some error.
        if (statusCode.startsWith("4")) {
            System.err.println(responseBody);
            throw new Exception("Error getting test results.");
        } else if (statusCode.startsWith("1")) {
            System.out.println("Test not completed. Waiting for 30 seconds and retrying…");
            Thread.sleep(30000); // Wait for 30sec
            waitForTestCompletion(testId);
        } else if ("200".equals(statusCode)) {
            obtainTestResults(testId);
        } else {
            System.err.println(responseBody);
            throw new Exception("Unknown statusCode in response");
        }
    }
    /** Fetches the XML result summary and analyzes each first-view request data file it lists. */
    private static void obtainTestResults(String testId) throws Exception {
        GetMethod get = new GetMethod("http://www.webpagetest.org/xmlResult/" + testId + "/");
        String responseBody = executeHttpMethod(get);
        Node statusCodeNode = (Node) xpath.evaluate("/response/statusCode", getXmlSrc(responseBody), XPathConstants.NODE);
        String statusCode = statusCodeNode.getTextContent();
        if (!"200".equals(statusCode)) {
            System.err.println(responseBody);
            throw new Exception("Unable to obtain raw test results");
        }
        NodeList requestsDataUrlNodes = (NodeList) xpath.evaluate("/response/data/run/firstView/rawData/requestsData",
                getXmlSrc(responseBody),
                XPathConstants.NODESET);
        for (int nodeCtr = 0; nodeCtr < requestsDataUrlNodes.getLength(); ++nodeCtr) {
            Node requestsDataNode = requestsDataUrlNodes.item(nodeCtr);
            String requestsDataUrl = requestsDataNode.getTextContent().trim();
            analyzeTestResult(requestsDataUrl);
        }
    }
    /** Downloads the tab-delimited per-request data and prints per-host aggregates. */
    private static void analyzeTestResult(String requestsDataUrl) throws Exception {
        System.out.println("\n\nAnalyzing results for " + requestsDataUrl);
        /*
          Things we want to track for each hostname.
            Total # requests
            Total # of requests for each content type
            Total # of bytes for each content type
            Total Time to First Byte
            Total DNS Time
            Total bytes
            Total connection time
        */
        HashMap<String, Integer> numRequestsPerHost = new HashMap<String, Integer>();
        HashMap<String, HashMap<String, Integer>> numRequestsPerHostPerContentType = new HashMap<String, HashMap<String, Integer>>();
        HashMap<String, Integer> totalTTFBPerHost = new HashMap<String, Integer>();
        HashMap<String, Integer> totalDNSLookupPerHost = new HashMap<String, Integer>();
        HashMap<String, Integer> totalInitialCnxnTimePerHost = new HashMap<String, Integer>();
        HashMap<String, HashMap<String, Integer>> totalBytesPerHostPerContentType = new HashMap<String, HashMap<String, Integer>>();
        HashMap<String, Integer> totalBytesPerHost = new HashMap<String, Integer>();
        String responseBody = executeHttpMethod(new GetMethod(requestsDataUrl)); // Unlike the rest, this response will be tab-delimited
        String[] lines = StringUtils.split(responseBody, "\n");
        // Start at index 1, skipping the first line (presumably a header row)
        for (int lineCtr = 1; lineCtr < lines.length; ++lineCtr) {
            String line = lines[lineCtr];
            String[] columns = StringUtils.splitPreserveAllTokens(line, "\t");
            if (columns.length <= 48) {
                continue; // Skip short or malformed rows that lack the columns read below
            }
            String hostname = columns[5];
            String contentType = columns[18];
            String ttfb = StringUtils.isBlank(columns[9]) ? "0" : columns[9];
            String dns = StringUtils.isBlank(columns[47]) ? "0" : columns[47];
            String cnxn = StringUtils.isBlank(columns[48]) ? "0" : columns[48];
            String bytes = StringUtils.isBlank(columns[13]) ? "0" : columns[13];
            // Ignore zero-byte requests and hosts outside the (optional) filter
            if ("0".equals(bytes) || (!hostnameFilter.isEmpty() && !hostnameFilter.contains(hostname))) {
                continue;
            }
            // Track total # requests per host
            if (!numRequestsPerHost.containsKey(hostname)) {
                numRequestsPerHost.put(hostname, 1);
            } else {
                numRequestsPerHost.put(hostname, numRequestsPerHost.get(hostname) + 1);
            }
            // Track total # requests per host per content-type
            if (!numRequestsPerHostPerContentType.containsKey(hostname)) {
                HashMap<String, Integer> tmp = new HashMap<String, Integer>();
                tmp.put(contentType, 1);
                numRequestsPerHostPerContentType.put(hostname, tmp);
            } else if (!numRequestsPerHostPerContentType.get(hostname).containsKey(contentType)) {
                numRequestsPerHostPerContentType.get(hostname).put(contentType, 1);
            } else {
                numRequestsPerHostPerContentType.get(hostname).put(contentType, numRequestsPerHostPerContentType.get(hostname).get(contentType) + 1);
            }
            // Track total # bytes per host per content-type
            if (!totalBytesPerHostPerContentType.containsKey(hostname)) {
                HashMap<String, Integer> tmp = new HashMap<String, Integer>();
                tmp.put(contentType, Integer.valueOf(bytes));
                totalBytesPerHostPerContentType.put(hostname, tmp);
            } else if (!totalBytesPerHostPerContentType.get(hostname).containsKey(contentType)) {
                totalBytesPerHostPerContentType.get(hostname).put(contentType, Integer.valueOf(bytes));
            } else {
                totalBytesPerHostPerContentType.get(hostname).put(contentType, totalBytesPerHostPerContentType.get(hostname).get(contentType) + Integer.valueOf(bytes));
            }
            // Track total TTFB for host
            if (!totalTTFBPerHost.containsKey(hostname)) {
                totalTTFBPerHost.put(hostname, Integer.valueOf(ttfb));
            } else {
                totalTTFBPerHost.put(hostname, totalTTFBPerHost.get(hostname) + Integer.valueOf(ttfb));
            }
            // Track total DNS lookup time for host
            if (!totalDNSLookupPerHost.containsKey(hostname)) {
                totalDNSLookupPerHost.put(hostname, Integer.valueOf(dns));
            } else {
                totalDNSLookupPerHost.put(hostname, totalDNSLookupPerHost.get(hostname) + Integer.valueOf(dns));
            }
            // Track total initial connection time for host
            if (!totalInitialCnxnTimePerHost.containsKey(hostname)) {
                totalInitialCnxnTimePerHost.put(hostname, Integer.valueOf(cnxn));
            } else {
                totalInitialCnxnTimePerHost.put(hostname, totalInitialCnxnTimePerHost.get(hostname) + Integer.valueOf(cnxn));
            }
            // Track total bytes for host
            if (!totalBytesPerHost.containsKey(hostname)) {
                totalBytesPerHost.put(hostname, Integer.valueOf(bytes));
            } else {
                totalBytesPerHost.put(hostname, totalBytesPerHost.get(hostname) + Integer.valueOf(bytes));
            }
        }
        printMap("Total # requests per host", numRequestsPerHost);
        printMap("Total # requests per host per content-type", numRequestsPerHostPerContentType);
        printMap("Total # bytes per host per content-type", totalBytesPerHostPerContentType);
        printMap("Total TTFB per host", totalTTFBPerHost);
        printMap("Total DNS lookup per host", totalDNSLookupPerHost);
        printMap("Total Initial Connection Time per host", totalInitialCnxnTimePerHost);
        printMap("Total Bytes per host", totalBytesPerHost);
    }
    /** Prints a titled block with one "key: value" line per map entry. */
    private static void printMap(String title, HashMap<String, ?> stats) {
        System.out.println("\t" + title);
        Iterator<String> keyItr = stats.keySet().iterator();
        while (keyItr.hasNext()) {
            String key = keyItr.next();
            Object value = stats.get(key);
            System.out.println("\t\t" + key + ": " + value.toString());
        }
    }
}
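The class above keeps seven parallel maps in step by hand. As an alternative, here is a rough sketch (not the original gist) that folds the same counters into one object per host, using `Map.computeIfAbsent` and `Map.merge` from Java 8. The names `HostStatsSketch`, `HostStats`, `aggregate`, and `row` are all hypothetical, and the column positions are simply copied from the class above. The sketch expects data rows only (no header line) and leaves out the hostname filter for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HostStatsSketch {
    // Column positions copied from the WebPageTestRunner class above.
    static final int HOST = 5, TTFB = 9, BYTES = 13, CONTENT_TYPE = 18, DNS = 47, CONNECT = 48;

    /** All counters for a single hostname (hypothetical type). */
    static final class HostStats {
        int requests;
        long bytes, ttfb, dns, connect;
        final Map<String, Integer> requestsByType = new TreeMap<>();
        final Map<String, Long> bytesByType = new TreeMap<>();
    }

    /** Reads a numeric cell; blank cells count as zero, as in the class above. */
    static long num(String[] cols, int index) {
        String cell = cols[index].trim();
        return cell.isEmpty() ? 0L : Long.parseLong(cell);
    }

    /** Aggregates tab-delimited data rows into one HostStats per hostname. */
    static Map<String, HostStats> aggregate(List<String> rows) {
        Map<String, HostStats> byHost = new TreeMap<>();
        for (String row : rows) {
            String[] cols = row.split("\t", -1); // -1 keeps trailing empty cells
            if (cols.length <= CONNECT) continue; // short or malformed row
            long bytes = num(cols, BYTES);
            if (bytes == 0) continue; // zero-byte requests are skipped, as above
            HostStats stats = byHost.computeIfAbsent(cols[HOST], h -> new HostStats());
            String type = cols[CONTENT_TYPE];
            stats.requests++;
            stats.bytes += bytes;
            stats.ttfb += num(cols, TTFB);
            stats.dns += num(cols, DNS);
            stats.connect += num(cols, CONNECT);
            stats.requestsByType.merge(type, 1, Integer::sum);
            stats.bytesByType.merge(type, bytes, Long::sum);
        }
        return byHost;
    }

    /** Builds a synthetic 49-column row, for demonstration only. */
    static String row(String host, String type, long ttfb, long bytes, long dns, long connect) {
        String[] cols = new String[CONNECT + 1];
        Arrays.fill(cols, "");
        cols[HOST] = host;
        cols[CONTENT_TYPE] = type;
        cols[TTFB] = Long.toString(ttfb);
        cols[BYTES] = Long.toString(bytes);
        cols[DNS] = Long.toString(dns);
        cols[CONNECT] = Long.toString(connect);
        return String.join("\t", cols);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                row("platform.twitter.com", "application/javascript", 40, 1200, 15, 30),
                row("platform.twitter.com", "image/png", 25, 800, 0, 0));
        for (Map.Entry<String, HostStats> e : aggregate(rows).entrySet()) {
            HostStats s = e.getValue();
            System.out.println(e.getKey() + ": " + s.requests + " requests, " + s.bytes + " bytes");
        }
    }
}
```

Grouping the counters per host means adding a new metric touches one field and one line in the loop rather than a new map and a new if/else block.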

Summary

Let’s recap what we have available to us…

  • EC2 instance running a set of our code that can be recompiled and relaunched on a whim
  • The CharlesProxy tool for a cursory look at the whole transaction of the page from your desktop
  • WebPageTest tool for a third-party view of your app’s performance
  • A custom Java class that can invoke your WebPageTest runs and consolidate/aggregate the results programmatically

With these tools and tactics we had an externally facing environment that we could use to iterate on new ideas quickly.