Category Archives: Uncategorized

SELECT developers FROM Bazaarvoice UNION SELECT flags FROM Stripe_CTF;

Stripe (https://stripe.com/) held their second capture the flag event, this time the CTF was dedicated to web-based vulnerabilities and exploits. As a new Security Engineer here at BV the timing of this was perfect. It allowed me to use it as a vehicle for awareness and to ramp up curiosity, interest and even excitement for web application security (webappsec).

Let’s start with some definitions to get us all on the same page.

Capture the flag is a traditional outdoor game where two teams each have a flag and the objective is to capture the other team’s flag, located at the team’s base. In computer security the flag is a piece of data stored somewhere on a vulnerable computer and the goal is to use tools and know how to “hack” the machine to find the flag.

Webappsec is the acronym for Web Application Security a branch of Information Security that deals specifically with security of websites and web applications. In my opinion there is more to it than just that. As we see from the OWASP Top 10 (https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project) and the emerging DevOps movement, web developers are doing more than just coding. They are now configuring full stacks to support their applications. So, webappsec needs to encompass more than just the application. It has to deal with developer and management education, configuration management, networking security, OS hardening, just to name a few areas, and of course not to forget the code.

Now that we know what CTF and webappsec are lets move on to the details of the event.

The CTF started on Wednesday, August 22nd, 2PM CDT today at 2PM CDT and ran for a week until Wednesday, August 29th, 2012 2PM CDT. One week == Seven Days == 168 hours and, for me at least, this was not enough. This capture the flag included 9 levels (0-8) to conquer before arriving at the actual flag. Each level was progressively more difficult and when completed provided you with the password for the next level.

The CTF was very challenging, yet no security specific training/knowledge was needed in order to participate. In my opinion the minimum requirements to participate and find it fun and enjoyable was some knowledge of web application programming and a willingness to learn and research possible vulnerabilities and potential exploits.

In total about 15 or so Bazaarvoice cohorts tried their hand at the CTF. Four actually captured the flag! When did they have the time? Well, we used our 20% time while at work and personal time when away from work. Late nights, lunch hours, weekend, etc.…believe me once you get started you are hooked. You end up thinking about it all day. You think about the level you are on, how to solve it and whether the previous levels offer up any hints or possible staging areas for potential exploits. If you are anything like me, first a developer at heart then a security professional, this type of activity is a great way to test out your chops and have some fun while learning new things and potentially bringing some of the lessons back to your teams. This is the perfect avenue for awareness – code wins after all!

Here are quotes from some of the BV participants:

I got to 8 last night. I don’t think I’ll get it in the next 4 hours, but it was a fun challenge. – Cory

Made it to level 8 last night, but not until after midnight. Don’t think I’ll be finishing, but it was fun all the same. Level 8 is really interesting, and I look forward to hearing how folks finished it. – RC

I’ve been having a lot of fun with it. – Jeremy

Overall everyone that tried it had a good time with it no matter how far they actually got.

All in all a total of four BVers completed the CTF and captured the final flag!

 

Congratulations to the following BVers for successfully capturing the flag:

 

 

For their last CTF Stripe made available the solutions and a working VM with each level. I hope they do the same for this one; this will give everyone the opportunity to learn and see what vulnerabilities were found and how they were exploited in order to complete each level and eventually capture the flag.

For now I leave you with a quote which I think embodies web application security, its education and use in the development community:

Secure programming is a mind-set. It may start with a week or two of training, but it will require constant reinforcement. – Karl Keller, President of IS Power

As with many things here at Bazaarvoice, the education and growth of our developers and their skills often take on a fun and enjoyable approach.

MongoDB Arrays and Atomicity

Over time, even technologies that are tried and true begin to show their age. This is especially true for data stores as the shear amount of data explodes and site traffic increases. Because of this, we are continually working with new technologies to determine whether they have a place in our primary stack.

To that end, we began using MongoDB for one of our new internal systems several months ago. Its quick setup and ease of use make it a great data store for starting a new project that’s constantly in flux. Its robustness and scalability make it a worthy contender for our primary data store, if it should prove capable of handling our use cases.

Being traditionally a MySQL shop, our team is used to working with databases in a certain way. When we write code that interacts with the DB, we use transactions and locks rather liberally. Therefore using MongoDB is a rather significant paradigm shift for our developers.

Though MongoDB doesn’t have true ACID compliance, it does support atomic updates to an individual object. And because a “single” object in Mongo can actually be quite complex, this is sufficient for many operations. For example, for the following complex object, setting values, incrementing numbers, adding and removing items from an array, can all be done at the same time atomically:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {firstName: "Bobby", lastName: "Schmidt"}}
);

Even setting values of child objects can be done atomically:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"position.title": "Software Developer", "position.id": 1015}}
)

In fact, individual items in an array can be changed:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"employmentHistory.0.company": "Google, Inc."}}
);

However, one thing you may notice is that the code above to modify a specific item in the array requires us to know the specific index of the item in the array we wish to modify. That seems simple, because we can pull down the object, find the index of the item, and then change it. The difficulty with this is that other operations on an array do not necessarily guarantee that the order of the elements will be the same. For example, if an item in an array is removed and then added later, it will always be appended to the array. The only way to keep an array in a particular order is to $set it (replacing all the objects in it), which for a large array may not have the best performance.

This problem is best demonstrated with the following race conditions:

One solution to this problem is to $pull a particular item from the array, change it, and then $push it back onto the array. In order to avoid the race condition above, we decided that this was a safer solution. After all, these two operations should be doable atomically.

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$pull: {badges: {type: "TOP_EMPLOYEE"}}, $push: {badges: {type: "TOP_EMPLOYEE", date: null}}}
);

So, what’s the problem? The issue is that MongoDB doesn’t allow multiple operations on the same property in the same update call. This means that the two operations must happen in two individually atomic operations. Therefore it’s possible for the following to occur:

That is bad enough, but it can be dealt with by updating an object only if it is in the correct state. Unfortunately, the following can happen as well:

That looks very odd to the reader, because at one moment it would see the item, then the next read would lose it, and then it would come back. To an end user that’s bad enough, but to another system that is taking actions based on the data, the results can be inconsistent at best, and deleterious at worst.

What we really want is the ability to modify an item (or items) in an array by query, effectively $set with $pull semantics:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$update: {badges: {$query: {type: "TOP_EMPLOYEE"}, $set: {date: null}}}}
);

Since that’s not yet supported by MongoDB, we decided to use a map. This makes a lot of sense for our use case since the structure happened to be a map in Java. Because MongoDB uses JSON-like structures, the map is an object just like any other, and the keys are simply the property names of the object.

This means that individual elements can be accessed and updated from the map:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"badges.TOP_EMPLOYEE.date": null}}
);

This seems like a rather elegant solution, so why didn’t we start out this way? One pretty major problem: if you don’t know all of the possible keys to the map, there is no good way to index the fields for faster querying. Indexes to child objects in MongoDB are created using dot notation, just like the query syntax. This works really well with arrays because MongoDB supports indexing fields of objects in arrays just like any other child object field. But for a map, the key is part of the path to the child field, so each key is another index:

// Simple fields
db.employees.ensureIndex({lastName: 1, firstName: 1})

// Child object fields
db.employees.ensureIndex({"position.id": 1})

// Array object fields
db.employees.ensureIndex({"employmentHistory.company": 1})

// Map fields, treating map values like child objects
db.employees.ensureIndex({"badges.TOP_EMPLOYEE.date": 1})

For our purposes, we are actually able to index every key individually, and treat the map as an object with known properties.

The sinister part of all of this is that you won’t normally run into any problems while doing typical testing of the system. And in the case of the pull/read/push diagram above, even typical load testing wouldn’t necessarily exhibit the problem unless the reader is checking every response and looking for inconsistencies between subsequent reads. In a complex system, where data is expected to change constantly, this can be a difficult bug to notice, and even more difficult to track down the root cause.

There are ways to make MongoDB work really well as the primary store, especially since there are so many ways to update individual records atomically. However, working with MongoDB and other NoSQL solutions requires a shift in how developers think about their data storage and the way they write their query/update logic. So far it’s been a joy to use, and getting better all the time as we learn to avoid these types of problems in the future.

Third-Party Front-end Performance: Part 1

Bazaarvoice is a third-party application provider. We have a growing number of applications running on our own domain, but our core business is providing user-generated content applications and widgets that are hosted by us, but run on our clients’ webpages. Scaling an application platform of our size certainly has its challenges at the data layer. We use all the coolest noSQL tools and post up big request per seconds numbers every month, but an oft forgotten part of the story lives long after all of that is cached and delivered:

On the front end.

Steve Souders, the guy who (literally) wrote the book on web performance, estimates that 80% of average page load times actually occur after the markup is completely downloaded. That means a 50% speed up in your front-end code is going to mean a lot more than a 50% speed up in your backend code. This is important stuff!

I’d like to go through some of our front-end performance considerations and explain the optimizations that we take, or the ones we’d like to start taking soon.

I recently read that “the history of web development is just increasingly difficult ways of concatenating strings together.” So we’ll start at that point: We’ve got everything ready to go on the server and now we have to put it on client’s site and make it work.

From this point, we’ll identify three distinct areas of the front-end performance:

  • The network
  • Parse and evaluate
  • Application responsiveness

This post will cover The Network and the other two will be posted as continued parts.

The Network

The network is by far the most commonly discussed area of optimization on the client. That’s for good reason, though, as it’s often the bottleneck. There is really no great ‘secret’ to making this fast, though. The smaller your payload is and the fewer amount of files that you request, the faster your page will generally be. There are lots of articles on reducing image sizes, and spriting css, and you should read those, but I wanted to focus on some less commonly talked about techniques that are especially important in third party applications.

The only non-obvious part of this equation to me (in regards to 3rd party code) is the caching as well as the cacheability of your resources. There are a few unique problems when doing this as a third party, but more or less, the higher percentage of cacheable static resources that make up your app, the more opportunity you have to make the network problem a lot less scary/slow. There are whole books and companies who specialize in this stuff, but we’ll go through at a high level.

— Obligatory Wu-tang Cache Rules Everything Around Me reference. —

Edge caching

One of the easiest wins is using a CDN that implements what’s known as ‘edge-caching.’ This is the practice of a putting a distributed system of cache servers around the world (or country, etc) and serving content from the geographically closest server to each request. This not only decreases load on individual servers, but it also vastly improves latency of requests.

Note that edge-caching is still not quite cached on a user’s computer, but still on a server in the CDN. Even if you have amazing cacheability in your app, you can still benefit from edge-caching for the initial load of the application (when the cache isn’t primed).

Bazaarvoice does this with almost all of its content, even relatively dynamic content. We are one of the largest users of the Akamai CDN, and well over 90% of our total traffic actually just hits Akamai servers and never makes it back to our origin servers.

You may be familiar with something like Amazon S3 to serve your static files. S3 is not distributed by default, but adding on the CloudFront option to your Amazon S3 buckets will achieve many of the same results (assuming you serve from the CloudFront URLs).

Maximum Cacheability

The content in most web apps changes frequently. However, the application really only changes when new code is written and pushed (read: hopefully far less frequently). In an ideal world, everything except for the data would be cached for a length of time (in the end-user’s browser).

The larger the percentage of your application that you can statically generate, the more you can cache.

This seems pretty easy at a glance, but can actually get a bit hairy in a third party environment. Due to the fact that the application is included via JavaScript, the markup for the page is not generally served in the initial request and is injected afterwards instead. This differs from normal webapps that can send their rendered markup along with the initial page request.

Dynamic Content

Currently, Bazaarvoice’s template system integrates with our Java backend, so after the static resources are loaded, we make a request for the rendered content of the page we’re on. This comes back to us as a string, which is saved to a javascript variable.

// simplified example - uncacheable data and templates
var html = "<ul id=\"reviews\"><li>review 1</li><li>review 2</li></ul> ";

Exactly how this works is unimportant, but the key is that this entire additional request is not cacheable (at least for any significant amount of time). Since the data in the rendered templates changes, we not only have to send the fresh data on each request, we also have to send up all the templates each time, multiplied times the number of times they are invoked. Given a significant amount of data, this can add quite a bit of weight to these requests. GZip can help quite a bit here with repeated data, but if you also factor in all the escaping characters to save these as strings in JavaScript, you can see where this could reduce performance by increasing the file size of noncacheable requests.

For the most part, in practice, we don’t serve so much data in an individual request to be effected by this problem on an alarming level, but it’s always nice to optimize further, and we’d love to save the money on bandwidth costs.

In our simple example above, we had the added duplication of a single set of <li> tags. The solution to only sending that once and caching it after the initial load is client-side templating.

If we are able to include our templates in our cached, static resources, you only need to request them once. So even though the library to render them has some cost, it usually pays off quickly (often still on the initial load).

// cacheable template
var tmpl = "<ul id=\"reviews\">{{#each reviews}}<li>{{name}}</li>{{/each}}</ul>";

// uncacheable data 
{ reviews: [{ name: "review 1" }, { name: "review 2" }] };

We’ll get into the actual performance and responsiveness of the app in a later post, but note that in production, these templates can actually be ‘pre-compiled’ into javascript functions that run very quickly (more or less as quickly as string concatenation can take you). Also in production, the templates and data are much larger, so this reduction pays off at scale.

In a third-party application that can cache its templates alongside the application code, the data (usually JSON) is the only unique and the only non-cached bytes going over the wire after the initial load. This is great. I’ll leave optimizing data payloads to a different blog post, but just make sure you only grab the minimum amount of data that you need.

For extreme performance (and SEO to boot!), a third party tool could also integrate the markup injection on the server-side. The markup would go in the initial payload and be instantly visible upon page load (great perceived performance). You would have to write your JavaScript in a way that could bind to pre-rendered templates, but it’s all possible. This is usually a more difficult thing to get clients to agree to. You end up duplicating the markup in loops, like in our first example, but you also save an http request.

Since the client can cache the request for every end user on their app server this ends up being by far the fastest integration in practice. This is a good complement for a high-performance third-party system, even if it can be tough to get clients to agree to.

Data Caching too!

If you want to go even further (and I very much encourage it), you may want to look into caching your data too. This wouldn’t do much for new page loads, but it could really help for common widgets across pages, or reloads/revisits of the same page. Anything that doesn’t take a significant amount of code that can reduce origin requests is probably going to be worth it eventually at scale.

Consider a “Most Recently Reviewed Products” widget that goes across every page on a site. This widget would list 5 products or so. The recency of the products in the list is important, but the exact accuracy is likely not important within a few minute period (unless you’re an unusually high-review-volume webpage). Unfortunately, though, people usually visit quite a few pages in that same time period.

As a third-party, you’re more than likely relying on JSONP to request this information for the widget. Each subsequent page load is highly cached (static resources) because you’re doing all the things we’ve mentioned previously, but you’re forced to re-request this redundant data each time.

I know what you’re thinking…

“Since JSONP injects a script tag under the hood, it is just a GET request for a “javascript file” that can be cached by the browser.” — you

While this is technically correct, it’s usually difficult to actually realize in the real-world for a couple of reasons.

At Bazaarvoice, and in many (read: most of the ones I’ve worked with) APIs, all requests are sent with a ‘no-cache’ header. This stops all caching from occuring. If you do have an API that allows browser caching you’re one step closer, but still usually have another issue.

Most people are using jQuery or other AJAX libraries in order to make JSONP requests to the server. Unfortunately for fans of API caching, these libraries almost always bust your cache, either intentionally, or unintentionally.

This code:

$.ajax({
  url : "http://bestreviewapiever.com/api", 
  data : {"filter": "recent"}, 
  dataType : "jsonp" 
});

actually results in a request to:

http://bestreviewapiever.com/api?filter=recents&callback=jQuery171984098_234234&_=98723498723

And every time you request it, those last two numbers will change. Even if you set the cache option to false – you still end up with unique callback values in each request. This will still bust your cache.

There’s a touch too much code to include here, but the solution is to set cache to false as well as override the jsonpCallback with a value that stays the same. However, you must then manage callback collisions yourself. That has some significant code behind it.

If you successfully implement all that, this means that, as long as edge cases don’t arise, the url will be the same each time, and the jsonp requests would be cache-hits for the cache TTL of the api server.

Another approach that we are in the process of testing and implementing at Bazaarvoice is utilizing localStorage and sessionStorage.

There is not a lot of detail needed, but in browsers that support these DOM Storage utilities, you can write logic to check them for data before making JSONP requests, and invalidate the data after a given time. Make sure you clean up old data when you invalidate, because space is limited, but this is a nice solution, because the fallback is just the natural case.

Currently I’d suggest using sessionStorage with an appropriate TTL on your data. That way it would get cleaned up automatically at the end of the ‘session.’ This solution seems a lot more elegant to me than api cache-headers and JSONP overrides.

Now, after the first request for our data, each subsequent page load has a cache-hit (for our allotted interval) and saves a request from occurring. FASTER!

BONUS TIP!!

If you have very real-time data, don’t count out this method.

One very slick ‘realtime-y’ touch is to have visibly updating data. In our example, even if we wanted to make an uncached request each time, we could instantly display the old data on page load from sessionStorage, wait until all the critical page elements were fully loaded, as to not block their speed, and then load in the new data. If the data was updated, you can put in a smooth update transition to let the user know that your app is awesome and real time. Best of both worlds!

Cache-busting, or “how long do I cache?”

“But Alex! How long do I cache my resources? If I cache them too long, then when I update my app, everyone will have different files!” — you, again.

Very insightful of you. In Bazaarvoice’s current system, each file has a relatively low cache max-age (it varies, but usually between no-cache and ~2 hours). This means that we can be sure that all the files that people have after 2 hours of pushing a change will be the most current one.

This actually ends up being fairly practical. For the most part, people are done browsing a site well before their cache runs out. Then the next time they just have to get it again one time. But we can do better! 😀

We’re working on a new system for caching that affords us more update control, and much higher cache-hit rates. Win-win!

The general idea is to have a single “bootstrap” JavaScript file that we refer to as a “scout” file. This file is usually unique for each our clients, and kept as small as we can keep it. It’s usually included via a script tag (partially for ease of implementation), but could be injected asynchronously, as well.

** Including it as a script tag (synchronously) has the added benefit of caching the dns lookup for the domain in a single request, which should actually speed up subsequent requests in many browsers.

This scout file should have a very low cache max-age — in our case, somewhere in the ballpark of 0 to 5 minutes — depending on the need for control.

<script src="http://bazaarvoice.com/static/scout.js"></script>

EVERYTHING ELSE IS CACHED FOREVER.

In this “scout” file, we kick off our real application by asynchronously injecting our built core javascript file. Concatenation of this main application file and its dependencies is well understood and necessary, so I’m not covering it in this post, but it is essential to good performance.

The trick is that at build time, we embed a version number into our scout file. Then the asynchronous inject function uses this version number as part of the request for the file. This can be done by putting each build in a folder with its version as the name of the folder, or it can just be done via url-rewrites.

I would encourage you to not use the query string (get params) in order to do cache busting, as VPNs and other old browser situations can sometimes ruin the cache, and get you into weird situations.

The affect of this setup is that we have a single, very small file that is constantly looking for updates (hence: “scout file”). It generally is still cached for the average ‘time on site’ of a consumer on one of our clients’ pages, but not much longer. However, if the version of the build doesn’t change, then all the requested scripts after that will be cached forever, or until the version changes again.

If you don’t push a new build, a visitor can still have cached content from years prior and only have to download the “scout” file, but at any point in time, if you run a build, their cache is busted and they get all the new files within the amount of time the scout file was cached for.

// An example of a simple scout file 
(function(){ 
  var VERSION = "12345";

  function injectScript(url){ /*… async script injection …*/} 

  injectScript("http://bazaarvoice.com/static/"+VERSION+"/widget.build.js"); 
})();

This obviously becomes more complex when you have multiple scripts loading. Bazaarvoice uses the require.js library, and its baseUrl configuration option is more or less the only place we end up having to update to make this work for us. It may be more complicated in more hand-rolled solutions, but still very worth it.

The numbers in the end are generally very much in your favor when taking average browsing habits into account. You end up with a few small non-cached hits, but a much better long tail of cacheability of the large application files — with the added benefit of quickly being able to force an update in the event of an emergency.

Conclusion

We glazed over the reducing file-size and concatenation section in half of a paragraph, but I’d like to reiterate that this is just not optional if you care about performance. Your mechanism for doing this may vary, and we don’t mind that, as long as it works for you.

I would however like to point out that you should be constantly testing the performance of your app and identifying your bottlenecks specifically. So often I see people byte-shaving licenses at the tops of their libraries while simultaneously sending improper cache-headers.

The key to tackling performance is always tackling the bottleneck first. Then you’ll have a new bottle neck to tackle.

You can measure network performance of all these techniques in your browser. Google Chrome (and other webkit based browsers) as well as the Firebug extension on Firefox can give you great insight in to the latency, time-to-load, and file size of your files (in the Network tab). You can export this data as a .har file and then import it into a variety of other evaluation tools if you need more info.

At Bazaarvoice we’re tackling our performance issues so we can help give the best experience for our clients’ users.

Remember: after you have a responsible amount of http requests and total file size, cache as much as you can, and then cache more. It’s as simple as that.

Next

Stay tuned for information on Parsing and evaluation, and application responsiveness.

Ember.js — the framework formerly known as SproutCore 2.0 (Amber.js)

Update: It looks like there was a conflict with the name Amber with another JavaScript project, so the SproutCore folks have graciously decided to change their name to Ember.js so as not to be in conflict. Check out the details at Yehuda’s blog.

The following is the original blog post:

Yehuda Katz has just officially announced the rebranding of SproutCore 2.0 to Amber.js.

This is refreshing for all the reasons he mentions in his post, especially when attempting to learn new things about the framework. It gets confused with SproutCore 1.0 quite a bit, and I can imagine trying to find answers about Amber.js will be a much easier experience!

Bazaarvoice’s newest product, Customer Intelligence, has been using SproutCore 2.0 since well before its release, and one of the primary developers on that team, Daniel Marcotte, spent a fair bit of time over several months working with the nascent framework and helping to get Beta 2 out the door. Look for an upcoming post where Daniel goes into much more detail on that process.

Will it be usable over dualip? (Vista always started downloading stuff)Will it be possible to turn off indexing? (Can’t concentrate on my work with the chattering the whole time)

Daniel Marcotte on SproutCore 2.0 (now Ember.js)

Daniel Marcotte is one of the top developers on our latest product offering, Customer Intelligence (CI). He spent quite a bit of time evaluating SproutCore 2.0 for the complex user interface requirements of CI, helping to work out some of the kinks in the product as it moved toward beta.

Daniel has recently moved to San Francisco to work on a new project — still for Bazaarvoice, of course! Before he left, I had a chance to sit down with him to chat about his experience working with the framework.

MDN: Welcome, Daniel! Thanks for letting me take some time out of your hectic schedule to get you on record, so to speak.

DM: Ah, my pleasure, Michael. Thank you for doing all the hard work.

MDN: No, no, not at all. You did all the hard work already — I’m just gonna put my name on it!

DM: I’m a team player; you go right ahead!

MDN: Awesome! So let’s get right down to it. You switch seamlessly between front-end and back-end development, and you’re very passionate about the user experience. A lot of developers in the industry want to work on back-end systems and don’t seem to get as excited about building great UIs. So, what do you think contributes to your passion? I’m really interested to hear what drives you because it also tends to drive the Web UI frameworks people choose.

DM: Yeah, that’s fair. I’ve actually given this a little bit of thought because my new manager asked me something about this recently. Something like, ‘Wow, you know, you’re really a UI guy. That’s great; how did you get into that?’ And I had kind of a funny moment, thinking it might be fruitful to start thinking of myself as a UI guy because then it’ll focus me on taking time to even go the extra level. Maybe read some stuff about user experience and even more wacky technologies. Because, I hadn’t traditionally done that. So how did I even end up in this position where someone would think, ‘of course you’re a UI guy’?

I think it’s because I don’t wanna do a shitty job. So if you’ve been tasked with building a UI feature, it needs to work. And it needs to be attractive. It needs to be usable. You need to put yourself in the user’s shoes.

MDN: It’s amazing, but most people don’t think that way. It seems like you actually know a little bit more. Have you actually done any reading and learning about any of this?

DM: I have, yeah…some. But nothing extensive. It’s a bit intuitive which I think gets people relatively far. And I think it has a lot to do with just being honest with yourself, of trying to put yourself in someone’s shoes maybe.

I have a bit of a teaching background. It’s one of my great pleasures. (Let’s ramble a little bit.) And one of my favorite things was being able to find a way to explain something to someone who it was completely new and novel to. To never forget what that was like to learn it for the first time. ‘Cause you can get lost and be an expert and think, ‘oh, it’s obvious, it’s fine.’ And I think that bleeds into UI sometimes. ‘Oh, I know how to use this.’ And that’s not the bar we’re trying to meet here.

MDN: You were instrumental in choosing SproutCore 2.0 for CI. What other Web UI frameworks did you and the team look at before deciding on Sproutcore?

DM: I looked at what Dojo was offering because it has the same full app/infrastructure stuff. It was a touch inscrutable, but really appealing. But it kind of seemed to go down the Ext-JS packaged-widget route. And I knew that Alex Zelenak, our UX designer, would come in and do a lot of custom stuff. And the other big driver, though I’m perceived as choosing this, I was really just the person who made it possible. It was really Alex Sexton that said, ‘this is the one; this is it for these reasons.’ So that kind of focused it. And it sort of became the one, and it needed to lose for some reason as opposed to beat something else.

MDN: Did Alex talk about Handlebars at all, because I know that he’s using it in other places.

DM: Oh, that’s interconnected. You use Handlebars templating inside SproutCore.

MDN: I’m just showing my ignorance!

DM: It’s like a specialized version of Handlebars that’s bindings-aware. So as your values change, your templates re-render.

MDN: You talked about this a little bit, but does SproutCore have widgets too?

DM: No. They’d like to get there, but it’d be like a UI layer. This is more like the structure/spine of the app.

MDN: So other than Alex’s stamp of approval, what are the other reasons you might use it in the future?

DM: I like the way it makes me think. I like the flavor of thinking it lends itself to. This disconnect with the bindings…. (Let me see if I can package this a little bit.) Trying to wrangle web UI using HTML, CSS, JavaScript without making the classic disaster, the classic mess, has been a mission of mine for a while. What’s the answer? What’s the secret? How do we make this maintainable and obvious to the next developer? Extensible, that sort of stuff.

And one of the big realizations I had a while ago was, with Don’t Repeat Yourself, the DOM, it’s the boss in so many things. I have a list of things I’m showing the user, and I don’t wanna duplicate in JavaScript having another list of things and try to synchronize those two. It’s much more a matter of saying something like… DOM: you’re the boss. You’re displaying it. You have these nodes that represent my things. So if I need to reevaluate, I’m just gonna ask you, ‘Hey dude, what’s my list look like?’ OK, I’m gonna modify it inside you ’cause I wanna display it. ‘Cause you’re the boss. And so in practice this works out really well.

MDN: ‘Cause that’s kind of how JQuery works because you have this separate set of code over here and it’s overlayed on top of the DOM.

DM: Yeah exactly. But for some richer apps and much more complex apps, asking the DOM those questions gets more complicated ’cause stuff is kinda all over the place. And so you end up with these really bonkers selectors. And it’s slow to be constantly asking the DOM these questions to see what’s going on with it. So then you start to pull back and say, ‘Oh, crap, I wish I had an object representation of this in my hand!’ Is it worth writing this connective tissue? And so now, enter SproutCore, and the app is sort of pure JS and then you bind it to the UI and you don’t maintain that connection; SproutCore does. Which is cool.

So the classic collections example that ties into the list example earlier, where before the DOM was the boss, now javaScript’s the boss. I have this list and I’m gonna monkey with it. Using observers and bindings, SproutCore’s gonna hold onto the fact that the list corresponds to something in the DOM, that’s rendered by a template. When the data changes, the template is called to re-render that piece.

MDN: So SproutCore owns the DOM then.

DM: Effectively, and it can do smart stuff like buffer your updates. So it’ll just be like manually updating the DOM, but will be a lot faster. Then you kinda get into their flavor of MVC. SproutCore MVC gets quite appealing where you can really build an app that does its thing and then you can write unit tests against that. And when you bind the UI on it, that’s just another view. But independently, you have this app that you can validate does its thing and is independent of how it’s displayed. You do have to make sure you structure it carefully enough that you’re not getting into circular references in your bindings, but it’s super-appealing that your templates are HTML and you just bind them in with the app and there’s a lot of really great ideas. It’s technically very satisfying.

MDN: And do you just bind it on top of a div element, because you’re going to have to put it in somewhere, right? So there’s some ID where you say, hey put this over here in this spot?

DM: You have a script block that has a type “text/html” or “text/x-handlebars”. And one of the first things SproutCore does is it looks at the page and any script block of that type, if it has an ID on it, then it assumes you’re gonna reference this later and so it will just pull it out of the DOM and store it compiled. Otherwise, it’ll replace it inline. You get to write very naturally, right inline, if you’re not reusing that component. It’s got a nice flow to it where it will just replace that piece with the markup it corresponds to.

MDN: Did you run into any problems with SproutCore’s architecture as you were building out the system at all?

DM: So, to even decide if we wanted to do this, I threw together a prototype that recreated what the pilot’s UI — the interested parts — did. And I was really able to see the value because we’re managing different views on the same data. And the user gets to toggle displayed data and interact with what they’re seeing. And so this idea of an app that will hold that state and where you can just swap out these views, it’s great because you can just say “app, here’s some data, do your thing” and now you’re not fetching from the server again and again if you just wanna re-slice something or have a different look at it, you have all that power easily in the client side.

So that was all going really well, but then I’d run into mundane stuff. Like: you’ve sent in a collection and you’re running over it and the designer wants to number each row. Perfectly reasonable request, but the context that you have in the collection rendering is just the collection item that you’re on. So do you artificially start adding indexes to your objects externally? Or do you say we should really have access to the index? And so that was kinda fun; I submitted a patch which exposed the current index when rendering a collection. Another example is that Tapestry [the server side web framework in this project] hosed us because you have almost no access to the head of the documents that it generates, which in general isn’t a problem, but in one of the early versions of SproutCore, it was only looking for stored templates in the head, which was an artificial restriction.

MDN: But not really any big pains or roadblocks. Nothing you couldn’t work around or fix.

DM: There was naturally a bit of instability in a few of the alpha versions I was using. And then there’s a tougher learning curve that goes with that: you can imagine being unable to distinguish between a flaw in understanding and a flaw in the library. But that’s come a long way.

There are also things that it really super-facilitates incredibly, but as with a lot of these things, it can mean limiting you in other ways, like there being hoops to jump through to integrate with say a select control.

And then we got into trouble where other developers started building out what I had there, and it wasn’t obvious that this code worked differently than they were used to. I hadn’t quite gotten it polished to the point of “this is how it’s done and you grow it from here and follow these patterns.” I hadn’t quite gotten there. The schedule comes in and you end up with this problem. So there was a bit of: “I wanna do this thing, and I can picture that vision in the ‘JQuery way.’ Well now, what’s the ‘SproutCore way’ of doing exactly that?” And you end up kind of cheating and creating artificial properties that you can bind to and kind of tricking it into doing things the ‘JQuery way’ instead of doing what is really the ‘SproutCore way’. And that’s really challenging.

MDN: Well, it’s the speed of sort of doing it versus doing it in a way that it’s gonna be good later.

DM: Yeah. Do you want the speed now, or more of it later? And that’s a really hard choice to make. And I’ve been ruminating on whether that’s a flaw on our part, or on the library’s part. How do we keep pushing that stuff forward and make it even better? I’m not sure.

MDN: You made a pretty concerted effort to get SproutCore out of beta and into mainstream for release 2.0. So, you actually fixed a couple of issues. How did that process go? How did you work with the team and get them to take your patches?

DM: Contributing to open source had been mysterious to me for a long time. And I’ve met a lot of people who are also like, ‘oh I want to contribute to open source, but I kinda like don’t know how. And IRC makes me cross-eyed.’ What do you do? In a lot of ways, this was just sort of really kinda lucky — a nice sort of confluence of events. And in my mind some of the things that made it possible was actually meeting Tom Dale and Yehuda Katz (who are great, by the way) and shaking their hands [at TXJS]. And they don’t have to remember it, but that’s kinda your way in later to politely say, ‘Hey, remember when I expressed interest in your framework? I’m using it and I don’t need anything from you. I’m just sharing good news. I’m not after you for anything.’

And then, GitHub makes a lot of this brilliantly easier. You make your fork, you monkey around, you learn something. Figuring out how Github works is pretty straightforward. You still see some trolling and joke pulls and this sort of stuff, but it’s kind of neat and friendly once you get your head around it. So then to contribute to a project: Keep it small. Keep it focused. Explain why. Write a unit test for whatever you’re putting up. And then you just post it, and then you wait.

I also started a bit of not-constant (it was a real priority for me not to bother these guys) but still regular e-mail correspondence with Tom Dale, so I think for my initial patches, the SproutCore guys were able to connect them with me. So I think it sort of gets you in the door that you’re not just some random unknown pull request. And then it was pretty easy. It was super satisfying to get even simple messages like, ‘Hey, thanks!’ And then they merge in your code. and that’s pretty cool.

Being able to take some time to work on that at BV is brilliant. People knew that I was investigating some of this stuff and there was a lot of support. It was cool. And John [Daniel’s manager] got a lot of satisfaction seeing my patches going in and that sort of stuff. So yeah, it was really cool.

I still think navigating all this is kind of hard. It’s social. You get self-conscious someone’s probably going to be a dick to you. Someone’s probably going to ignore you ’cause you’re nobody. But this experience was really good so it’s worth trying more. And I would encourage other people too: find something you’re passionate about, some little tool you use you wish worked a bit different. You’re probably bright enough at this point; you can actually figure out how to make a useful change, even just for yourself initially.

MDN: What kind of support did you get from the rest of the R&D organization here? You mentioned you could do some of it during work hours, which is great.

DM: I just tried to not be stupid about it. As long as I was being productive and watching my diminishing returns level, I was free to put in the time I needed. The VP of Engineering said we should at least try and do something interesting. He wanted to know “Is that feasible?” And so that’s the kind of support. I’m like yeah, sure, I believe that it’s feasible and I have a vision for walling it off so we’re not betting the whole product on it. I can try, but you’ve just introduced some risk that we may have to pay for, we may have to mitigate. And he was fully supportive, and so when that directive to try something fresh comes from that high up, that’s kind of the definition of support. You know what I mean? And I felt like I had the freedom to at any time make the call that we were going to bail on that course of action (the diminishing returns thing). I don’t think anyone would have turned on me and been like, ‘what the hell, Daniel, you didn’t even use that technology. You spent all that time on it.’ I think they would have been like, ‘Thank you for actually evaluating this and making a technical decision.’

MDN: So, would you recommend SproutCore 2.0 for everybody?

DM: I have no idea. It was still in its infancy during this project, so “everybody” seems a bit strong. But I really enjoyed using it, and I’m looking forward to using it more and watching it grow.

MDN: Thanks Daniel, for taking the time to chat with me! It was very educational.

 

Daniel brought in his laptop and showed me https://github.com/dmarcotte/ember-chess. Specifically, we walked through his very informative presentation in just a few minutes: https://github.com/dmarcotte/ember-chess/blob/master/presentation.js.

 

Thank you for sharing this interview. I’m evaluating ember now, and it really excites me that we are finally seeing an appeal to structure and proper separation of concerns on the client-side in JavaScript.

I too am a “hybrid” developer of sorts, with the user experience being at the forefront of my passion and enjoyment. It’s a good place to be nowadays; I remember not too long ago when the art of UI was not getting the respect it deserves. I think Steve Jobs did a good job of helping correct that.

Anyway, having a fellow UX afficianado come off a successful project with ember is pretty encouraging. I’m now pretty confident I’m heading down the right path.

P.S. ctrl + h, sproutcore, tab, ember, enter 😉

Using the Cloud to Troubleshoot End-User Performance (Part 2)

This is is the second of a two-part article that outlines how we used a various set of tools to improve our page load performance. If you haven’t read part 1, go ahead and give it a read before continuing.

Tactics

We opted to not use our normal staging environment for this project, since our staging environment doesn’t run experimental code.

In order to iterate rapidly on our changes and to provide a location that is publicly accessible over the web (so that WebPageTest can see it), we set up an Amazon EC2 instance running a complete copy of all of our software so that it effectively behaved exactly like a local developer instance with the exception that it could be hit from any external resource on the web. (Heh… I make this sound really easy)

So now that we have a server on the web running a customized version of our software, the problem is now making requests that normally go to our production datacenter get redirected to our EC2 instance without redirecting real end-users. In my opinion, this is where the capabilities of WebPageTest really shined and began to flex it’s muscle.

Let’s say that under normal conditions, your production application runs at foo.com/123.456.789.1 and that the EC2 instance you created and that is running a customized version of your app is running at ec2-123-456-789-2.aws.com/123.456.789.2. WebPageTest will allow you to override DNS resolution for foo.com to 123.456.789.2. This works in a similar manner to a host override except that WPT will still have the browser perform a DNS lookup of your production host so that you still get accurate timings for the DNS resolution.

To take advantage of this, you need to provide the following “script” to your test execution:

navigate http://www.external-site-that-includes-your-code.com
setDns foo.com 123.456.789.2

The other cool thing about WebPageTest is that the test execution and results parsing can be scripted via their REST-like APIs. In fact, check out this java class gist I wrote (embedded below) that makes use of this API to run some aggregated stats of the usage of Twitter on AOL.com. This class allows you to more easily view aggregated statistics for a narrow set of the resources that are actually used by a page — assuming that you only care about the resources being delivered by your servers into another page.

package com.bazaarvoice.mbogner.utils;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringUtils;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
/**
* @author: Matthew Bogner
* @date: 10/25/11
*/
public class WebPageTestRunner {
public static HttpClient client = new HttpClient();
public static XPath xpath = XPathFactory.newInstance().newXPath();
public static Set<String> hostnameFilter = new HashSet<String>();
public static void main(String[] args) throws Exception {
hostnameFilter.add(“cdn.api.twitter.com”);
hostnameFilter.add(“platform.twitter.com”);
PostMethod post = new PostMethod(“http://www.webpagetest.org/runtest.php”);
post.addParameter(“url”, “www.aol.com”);
post.addParameter(“runs”, “1”); // Only request the page once
post.addParameter(“fvonly”, “1”); // Only look at the FirstView
post.addParameter(“location”, “Dulles_IE7.DSL”); // Dulles, VA with IE7 and DSL connection
post.addParameter(“f”, “xml”); // Respond with XML
post.addParameter(“k”, args[0]); // API Key from WebPageTest.org
// Now we need to (optionally) send over a script so that we can
// override the DNS entries for certain hosts that the page will
// attempt to reverence.
//post.addParameter(“script”,
// “navigate http://www.aol.com\n” +
// “setDns platform.twitter.com 123.456.789.2\n” +
// “setDns cdn.api.twitter.com 123.456.789.2”);
String responseBody = executeHttpMethod(post);
System.out.println(responseBody);
Node statusCodeNode = (Node) xpath.evaluate(“/response/statusCode”, getXmlSrc(responseBody), XPathConstants.NODE);
String statusCode = statusCodeNode.getTextContent();
System.out.println(“StatusCode = “ + statusCode + “\n”);
if (“200”.equals(statusCode)) {
// Request was successful. Wait for the test to complete.
Node testIdNode = (Node) xpath.evaluate(“/response/data/testId”, getXmlSrc(responseBody), XPathConstants.NODE);
waitForTestCompletion(testIdNode.getTextContent());
}
}
private static InputSource getXmlSrc(String content) throws Exception {
return new InputSource(new ByteArrayInputStream(content.getBytes(“UTF-8”)));
}
public static String executeHttpMethod(HttpMethod method) throws Exception {
int responseCode;
String responseBody;
try {
responseCode = client.executeMethod(method);
responseBody = IOUtils.toString(method.getResponseBodyAsStream());
} finally {
method.releaseConnection();
}
if (responseCode != 200) {
throw new Exception(“Invalid server response. \nResponse code: “ + responseCode + “\nResponse body: “ + responseBody);
}
return responseBody;
}
private static void waitForTestCompletion(String testId) throws Exception {
PostMethod post = new PostMethod(“http://www.webpagetest.org/testStatus.php”);
post.addParameter(“f”, “xml”); // Respond with XML
post.addParameter(“test”, testId);
String responseBody = executeHttpMethod(post);
Node statusCodeNode = (Node) xpath.evaluate(“/response/statusCode”, getXmlSrc(responseBody), XPathConstants.NODE);
String statusCode = statusCodeNode.getTextContent();
// 200 indicates test is completed. 1XX means the test is still in progress. And 4XX indicates some error.
if (statusCode.startsWith(“4”)) {
System.err.println(responseBody);
throw new Exception(“Error getting test results.”);
} else if (statusCode.startsWith(“1”)) {
System.out.println(“Test not completed. Waiting for 30 seconds and retrying…”);
Thread.sleep(30000); // Wait for 30sec
waitForTestCompletion(testId);
} else if (“200”.equals(statusCode)) {
obtainTestResults(testId);
} else {
System.err.println(responseBody);
throw new Exception(“Unknown statusCode in response”);
}
}
private static void obtainTestResults(String testId) throws Exception {
GetMethod get = new GetMethod(“http://www.webpagetest.org/xmlResult/” + testId + “/”);
String responseBody = executeHttpMethod(get);
Node statusCodeNode = (Node) xpath.evaluate(“/response/statusCode”, getXmlSrc(responseBody), XPathConstants.NODE);
String statusCode = statusCodeNode.getTextContent();
if (!“200”.equals(statusCode)) {
System.err.println(responseBody);
throw new Exception(“Unable to obtain raw test results”);
}
NodeList requestsDataUrlNodes = (NodeList) xpath.evaluate(“/response/data/run/firstView/rawData/requestsData”,
getXmlSrc(responseBody),
XPathConstants.NODESET);
for(int nodeCtr = 0; nodeCtr < requestsDataUrlNodes.getLength(); ++nodeCtr) {
Node requestsDataNode = requestsDataUrlNodes.item(nodeCtr);
String requestsDataUrl = requestsDataNode.getTextContent().trim();
analyzeTestResult(requestsDataUrl);
}
}
private static void analyzeTestResult(String requestsDataUrl) throws Exception {
System.out.println(“\n\nAnalyzing results for “ + requestsDataUrl);
/*
Things we want to track for each hostname.
Total # requests
Total # of requests for each content type
Total # of bytes for each content type
Total Time to First Byte
Total DNS Time
Total bytes
Total connection time
*/
HashMap<String, Integer> numRequestsPerHost = new HashMap<String, Integer>();
HashMap<String, HashMap<String, Integer>> numRequestsPerHostPerContentType = new HashMap<String, HashMap<String, Integer>>();
HashMap<String, Integer> totalTTFBPerHost = new HashMap<String, Integer>();
HashMap<String, Integer> totalDNSLookupPerHost = new HashMap<String, Integer>();
HashMap<String, Integer> totalInitialCnxnTimePerHost = new HashMap<String, Integer>();
HashMap<String, HashMap<String, Integer>> totalBytesPerHostPerContentType = new HashMap<String, HashMap<String, Integer>>();
HashMap<String, Integer> totalBytesPerHost = new HashMap<String, Integer>();
String responseBody = executeHttpMethod(new GetMethod(requestsDataUrl)); // Unlike the rest, this response will be tab-delimited
String[] lines = StringUtils.split(responseBody, “\n”);
for (int lineCtr = 1; lineCtr < lines.length; ++lineCtr) {
String line = lines[lineCtr];
String[] columns = StringUtils.splitPreserveAllTokens(line, “\t”);
String hostname = columns[5];
String contentType = columns[18];
String ttfb = StringUtils.isBlank(columns[9]) ? “0” : columns[9];
String dns = StringUtils.isBlank(columns[47]) ? “0” : columns[47];
String cnxn = StringUtils.isBlank(columns[48]) ? “0” : columns[48];
String bytes = StringUtils.isBlank(columns[13]) ? “0” : columns[13];
if (“0”.equals(bytes) || (!hostnameFilter.isEmpty() && !hostnameFilter.contains(hostname))) {
continue;
}
// Track total # requests per host
if (!numRequestsPerHost.containsKey(hostname)) {
numRequestsPerHost.put(hostname, new Integer(1));
} else {
numRequestsPerHost.put(hostname, numRequestsPerHost.get(hostname) + 1);
}
// Track total # requests per host per content-type
if (!numRequestsPerHostPerContentType.containsKey(hostname)) {
HashMap<String, Integer> tmp = new HashMap<String, Integer>();
tmp.put(contentType, new Integer(1));
numRequestsPerHostPerContentType.put(hostname, tmp);
} else if (!numRequestsPerHostPerContentType.get(hostname).containsKey(contentType)) {
numRequestsPerHostPerContentType.get(hostname).put(contentType, new Integer(1));
} else {
numRequestsPerHostPerContentType.get(hostname).put(contentType, numRequestsPerHostPerContentType.get(hostname).get(contentType) + 1);
}
// Track total # bytes per host per content-type
if (!totalBytesPerHostPerContentType.containsKey(hostname)) {
HashMap<String, Integer> tmp = new HashMap<String, Integer>();
tmp.put(contentType, Integer.valueOf(bytes));
totalBytesPerHostPerContentType.put(hostname, tmp);
} else if (!totalBytesPerHostPerContentType.get(hostname).containsKey(contentType)) {
totalBytesPerHostPerContentType.get(hostname).put(contentType, Integer.valueOf(bytes));
} else {
totalBytesPerHostPerContentType.get(hostname).put(contentType, totalBytesPerHostPerContentType.get(hostname).get(contentType) + Integer.valueOf(bytes));
}
// Track total TTFB for host
if (!totalTTFBPerHost.containsKey(hostname)) {
totalTTFBPerHost.put(hostname, Integer.valueOf(ttfb));
} else {
totalTTFBPerHost.put(hostname, totalTTFBPerHost.get(hostname) + Integer.valueOf(ttfb));
}
// Track total DNS lookup time for host
if (!totalDNSLookupPerHost.containsKey(hostname)) {
totalDNSLookupPerHost.put(hostname, Integer.valueOf(dns));
} else {
totalDNSLookupPerHost.put(hostname, totalDNSLookupPerHost.get(hostname) + Integer.valueOf(dns));
}
// Track total initial connection time for host
if (!totalInitialCnxnTimePerHost.containsKey(hostname)) {
totalInitialCnxnTimePerHost.put(hostname, Integer.valueOf(cnxn));
} else {
totalInitialCnxnTimePerHost.put(hostname, totalInitialCnxnTimePerHost.get(hostname) + Integer.valueOf(cnxn));
}
// Track total bytes for host
if (!totalBytesPerHost.containsKey(hostname)) {
totalBytesPerHost.put(hostname, Integer.valueOf(bytes));
} else {
totalBytesPerHost.put(hostname, totalBytesPerHost.get(hostname) + Integer.valueOf(bytes));
}
}
printMap(“Total # requests per host”, numRequestsPerHost);
printMap(“Total # requests per host per content-type”, numRequestsPerHostPerContentType);
printMap(“Total # bytes per host per content-type”, totalBytesPerHostPerContentType);
printMap(“Total TTFB per host”, totalTTFBPerHost);
printMap(“Total DNS lookup per host”, totalDNSLookupPerHost);
printMap(“Total Initial Connection Time per host”, totalInitialCnxnTimePerHost);
printMap(“Total Bytes per host”, totalBytesPerHost);
}
private static void printMap(String title, HashMap stats) {
System.out.println(“\t” + title);
Iterator keyItr = stats.keySet().iterator();
while (keyItr.hasNext()) {
Object key = keyItr.next();
Object value = stats.get(key);
System.out.println(“\t\t” + key.toString() + “: “ + value.toString());
}
}
}

Summary

Let’s recap what we have available to us…

  • EC2 instance running a set of our code that can be recompiled and relaunched on a whim
  • The CharlesProxy tool for a cursory look at the whole transaction of the page from your desktop
  • WebPageTest tool for a third-party view of your app’s performance
  • and lastly, a custom java class that can invoke your WebPageTest runs and consolidate/aggregate the results programmatically

With these tools and tactics we had an externally facing environment that we could use to iterate on new ideas quickly.

You’ve lost that startup feeling…

Huge opportunity for impact. A sense of ownership. Collaboration with people across every function in the organization. An understanding of the big picture and a real opportunity to shape the future.

These are some of the best and most exciting qualities of working at a software start-up. In my personal experience, working at both a number of start-ups as well as in my nearly 4 years at Google, environments with these qualities attract some of the highest-quality software engineers, and subsequently allows them to be maximally effective once they’re on board.

Now that Bazaarvoice is six years old and technically a “big company” (in fact the best big company to work for in Austin), one of the biggest challenges we face from an engineering team culture standpoint is maintaining these qualities through growth of both the team and the scope of business. All too often in technology companies, especially those that experience fast success and growth, the culture is sacrificed at the altar of the business. This can be deadly to companies that thrive in large part on the strength of their culture, and highlights an important and often overlooked question about organizational culture: Is your culture scalable?

As VP of Engineering, a big part of my job is shepherding our culture and making sure that we create the kind of environment that allows us to attract and retain the best engineering talent. This brings us to the main topic of this blog post: Bazaarvoice engineering’s approach to building a scalable engineering culture and organization.

The central question I try to tackle is this: how can I build an engineering organization and culture that scales to hundreds of engineers where every engineer experiences, on a daily basis, those key qualities of a start-up? To answer this question, we focus on four major areas.

1. Team Size

This one seems obvious, but it never ceases to amaze me how many organizations miss the boat on this one. The bigger the team is, the less significant each individual is going to feel. The math is simple: it’s hard to feel like there’s a big opportunity for impact when you’re one of fifteen people on a team as compared to when you’re one of three people on a team. Also often overlooked is how much easier it is to hide an underperformer on a larger team than on a smaller team, and having to work with (and make up for) underperforming teammates can be toxic to an engineering culture.

In my experience, some of the most effective and high-powered engineering teams were 3- and 4-person teams working on Really Big Problems. I like to create environments populated by small teams, led by high-potential people, tackling problems that on their face are just slightly bigger than the team should, on paper, be able to handle. When done properly, this creates a sense of urgency and creates an opportunity for growth and self-discovery for the team members. (When done improperly, it makes people feel like you’re asking the impossible, which is obviously very demotivating, and it’s a fine line to walk between the two. This highlights how important it is that, as a leader, you know your team well and are very in tune with their state of mind.)

2. Cross-Functional Structure

Every organization has to choose how it’s going to fundamentally structure itself: functionally or cross-functionally (sort of like choosing your clustered index in SQL Server). I believe in a functional organization structure in the sense that every engineer reports to an engineering manager and on up through a VP of Engineering or CTO, for individual performance management and career development reasons.

However, on a daily basis, I think the organization should operate logically and physically as if it were cross-functionally structured. What I mean by this is that I believe we should have teams that work together daily consist of members from different functions (e.g. engineering, product management, design, operations, etc), that those teams should by physically co-located (i.e. sit together), and that the success of the team as a whole should be weighed more heavily than individual success.

The reason for this is simple: I feel that every engineer should have both visibility into why they’re building what they’re building, as well as having input into what gets built and how it gets built. Engineers, by and large, are creative people and they have a lot to bring to the table when it comes to creative problem solving. That stuff shouldn’t be solely the domain of the product managers, it should be a highly collaborative process. It’s similarly important that the engineers get visibility into how their products are being used when out in the wild. All of these inputs are hugely valuable to making sure the engineering team builds the right product for the market. As an important side benefit, exposure to these other areas gives the engineers themselves an opportunity o learn, makes them smarter and stronger and more likely to make a good decision when faced with uncertainty in the future.

3. Power! Unlimited Power!

(Not really, I just wanted an excuse to link to this video.) Seriously though, this bullet would be better titled: “Uncapped Potential”, or “Roles, Not Titles”. Every engineer ultimately wants to feel like they’re in control of their own destiny, and I want to create an environment where that’s largely true.

This boils down to one major question: can an engineer with talent, potential and ambition take the ball and run with it as far as her talent will let her? If not, what’s getting in her (or his) way? Maybe it’s an overbearing management structure, or lack of visibility into how the product is used in the field, or how it’s sold. Whatever it is, find it and get it out of the way as quickly as you can. We want to create an environment in which engineers with talent, potential and ambition can be as effective as possible, and nothing is more frustrating than an artificial barrier.

One simple organization structure change that helps this along is organizing teams around roles, not titles. First, some definitions. In my opinion, a job “title” should do one thing and one thing only: define the expectations by which someone’s work should be judged, vis-a-vis the organization’s overall values. For instance, the job title of “senior software engineer” may entail the expectation that an individual be capable of owning moderate-to-large components of a big system and operating in a self-directed manner. Notably, it says nothing about the function of that individual, or his relationship to his peers.

A job “role”, on the other hand, is all about the function of the individual in some specific context. For instance, a technical lead (TL) for a team may be responsible for the technical delivery of a project, including the day-to-day work assignments for the engineers on that team. Notably, it says nothing about the title or level of the individual. Every team can and should have a TL, regardless of whether the TL is the only individual on the team or if the team has dozens of members (in which case there may be multiple TLs).

What this means, as an extreme example, is that you can have situations where you’ve got a more junior engineer that is the TL of a team that includes much more senior engineers. Under the right circumstances, this can be exactly what you want (I’ve seen it work out spectacularly well on multiple occasions in my career).

More importantly though, is that you want your organization and culture to support this kind of flexibility because it provides an opportunity for high-potential engineers, regardless of seniority, to take ownership over something and stretch themselves.

4. A Voice Heard

The last bullet point is in many ways the most important, and really runs as an undercurrent to each of the others. Ultimately, the engineers on the team need to feel like they have a voice, and that that voice will be heard when they’ve got something to say. Whether that means having a voice in the design of the product, the technology direction of the team, the company’s core values, or anything else, engineers are ultimately people, and they want to feel like their opinion matters. As team leaders, it’s our job to make sure both that they feel this way and that it is actually true!

Conclusion

As always with these types of philosophical discussions, your mileage may vary, and there are always exceptions to the rules. However, I’ve seen these approaches and techniques be successful repeatedly throughout my career, perhaps most notably during my time at Google, and I believe that they are generally applicable.

What are your thoughts? I’d love to hear them and discuss them in the comments section below.

Like the main 4 constraints you are focusing for building up a healthy orgainzation culture giving efficient productivity. Also would like to add to give balanced amount of feedback including criticism and appreciation must be given to each and every employee so that they should feel that there work is being observed.

Engineers Giving Back at Random Hacks of Kindness

On December 3rd and 4th Bazaarvoice was the lead sponsor on an event in Austin called Random Hacks of Kindness (RHoK), a coordinated worldwide hackathon for social good. The event started Friday night with a reception for all of the hackers at the Volstead Lounge where over over 60 people celebrated, heard a few quick thoughts on how technology could solve some big problems, and of course had a drink. Representatives of the Chicago Community Emergency Response Team, Williamson County Office of Emergency Management, NASA (a global sponsor for RHoK) and others gave a quick preview of the projects they hoped would be completed during the hackathon, and our own Scott Bonneau (VP of Engineering) spoke about the power that engineers have to change the world. Scott reminded us all how only a few short years ago, creating a new technology meant years of R&D efforts by large teams of highly educated scientists, and that now, anyone with a laptop and a credit card can launch an application over the course of days.

Saturday morning kicked off with coffee and presentations by each subject matter expert on the problems they had been researching to the over 50 hackers who were eager to get started. Additionally, NASA had a special surprise for the Austin attendees, and had arranged for Astronaut Ron Garan to speak about how his time on the international space station had provided him with a unique view into the power RHoK hackers have and the need for greater collaboration on the biggest problems of the world. The hackers organized themselves into teams based on their skillsets and desires, and before lunch the product design had begun. Teams were well fed throughout the weekend with great local food from P. Terry’s, The Peached Tortilla, and Freebirds, ensuring that no one ever went hungry or was lacking caffeine. Several teams coded late into the evening at the Capital City Event’s space, lounging on couches, and some even stayed the night.

By the end of the hackathon Sunday afternoon, the teams had built a number of amazing applications, and you can read more about the applications on the RHoK web site. All of the teams presented their work, and the top teams as selected by our judges received some amazing prizes (iPads, Kindles, and Buckyballs). Overall we’re very proud to have helped support this amazing opportunity, and we couldn’t have done it without the generous help of our sponsors and partners Homeaway.com, Freebirds, Capital City Events Center, and Github, as well as the numerous volunteers from tech companies throughout the community.

Not that our event went perfectly, but we certainly learned a few lessons along the way:

  1. Lead the leaders. You cannot run something of this size on your own. You need a team of leaders who’ll run alongside you and carry the ball for you in specific areas.
  2. It takes an army. You can never have enough volunteers. Find them early and have roles clearly lined out.
  3. It’s all about the network. It’s not about who you know, it’s about who they know. Find the connectors in your target demographic and pursue them. They’ll connect you with the masses.

To find out more about the next RHoK Austin, just follow the Pixadillo @rhokaustin. Want to help run RHoK Austin in the future? Send us a message @rhokaustin and we’ll connect you with the steering team for the next RHok Austin.

Using the Cloud to Troubleshoot End-User Performance (Part 1)

Debugging performance issues is hard. Debugging end-user performance issues from a distributed production software stack is even harder, especially if you are the 3rd-party service provider for one of your clients that actually is in control of how your code is integrated into their site. There are lots of articles on the web regarding what the performance best practices are, but few, if any, that discuss the tactics of developing and improving them.

Primarily, the challenges stem from the fact that troubleshooting performance issues is always iterative. If your production operations can handle deploying test code to production on a rapidly iterative schedule, then you can stop reading this post — you are already perfect, and worthy of a cold brewski for your impressive skills.

As our team and software stack continued to grow in both size and complexity, we underwent a restructuring of our client (js, css and html) code a while back that allows us to only deliver the code that is actually needed by our client’s site and configuration. We did this by reorganizing our code into modules that can be loaded asynchronously by the browser using require.js. Effectively this took us from a monolithic js & css include that contained a bunch of unused code and styles, to something that averaged out to a much smaller deliverable that was consumed in smaller chunks by the browser.

This technique is a double-edged sword, and like all things, is best when done in moderation. Loading multiple modules asynchronously in the browser results in multiple http requests. Every HTTP request made by the browser results in some overhead spent doing the following:

  1. DNS Lookup – Only performed once per unique hostname that the browser encounters.
  2. Initial Connection – This is simply the time that the browser takes to establish a TCP socket connection with the web server.
  3. SSL Negotiation – This step is omitted if the connection is not intended to be secure. Otherwise, it is just the SSL handshake of certificates.
  4. Time to First Byte (TTFB) – This is the time starting from after the browser has sent the request to when the first byte of the response is received by the browser.
  5. Content Download – This is the time spent downloading all the bytes of the response from the web server.

There are many great resources from Yahoo! and Google which discuss the details of best-practices for improving page performance. I won’t re-hash that great information, nor dispute any of the recommendations that the respective authors make. I will, on the other hand, discuss some tactics and tools that I have found beneficial in analyzing and iterating on performance-related enhancements to a distributed software stack.

Mission

A few months ago, we challenged ourselves to improve our page load performance with IE7 in a bandwidth constrained (let’s call it “DSL-ish”) environment. DSL connections vary in speed and latency with each ISP.

I won’t bore you with the details of the changes we ended up making, but I want to give you a flavor of the situation we were trying to solve before I talk about the tactics and tools we used to iterate on the best practices that are available all over the web.

The unique challenges here are that IE7 only allows 2 simultaneous connections to the same host at a time. Since our software distributes multiple modules of js, css and images that are very small, we were running into this 2-connection-per-hostname issue with a vengeance. Commonly accepted solutions to this involve image spriting, file concatenation and distributing or “sharding” requests for static resources across multiple domains. The sharding tactic made us realize the other constraining factor we were dealing with — the longer latency of HTTP requests on a DSL connection that gets exaggerated when making multiple DNS lookups to a larger set of distinct host names.

Tools

The tools that we used to measure and evaluate our changes affected the tactics we used – so I’ll discuss them first.

Charles Proxy

Charles Proxy is a tool that runs on all platforms and provides some key features that really aided us in our analysis. Primarily, it had a built-in bandwidth throttling capability which allowed us to simulate specific latency and upload/download conditions from our local machine. We used CharlesProxy for a rougher on-the-spot analysis of changes. CharlesProxy also allowed us to easily and quickly see some aggregate numbers of specific metrics we were interested in. In particular, we were looking for the total # of requests, total durations of all requests and the total response size of all requests. Since these numbers are affected by the rest of the code (not ours) on our client’s site – Charles allowed us to filter out the resources that were not ours, but still allowed us to see how our software behaved in the presence of our client’s code.

However, since we had multiple developers working on the project — each making isolated changes — we wanted a way to run a sort of “integration” test of all the changes at once in a manner that more closely aligned with how our software is delivered from our production servers. This led us to our next tool of choice – one that we’d never used until now.

WebPageTest.org

In it’s own words:

WebPagetest is a tool that was originally developed by AOL for use internally and was open-sourced in 2008 under a BSD license. The platform is under active development by several companies and community contributors on Google code. The software is also packaged up periodically and available for download if you would like to run your own instance.

In our case, WebPageTest provided two key things:

  • It’s Free
  • It is a useful 3rd party mediator between ourselves and others for spot-checking page performance

At a high level, WebPageTest.org controls a bunch of compute agents that live in various geographic locations of the US that are able to simulate bandwidth conditions according to your specifications (under the hood it uses DummyNet). It allows you to request one of it’s agents to load your page and interact with your site by simulating link clicks (if necessary) and monitors and captures the results for detailed analysis by you later. This tool is a great way for you to use an external entity to verify your changes and have a consistent pre & post benchmark of your page’s performance.

Of course, having some random machine on the web poke your site means that your changes must be publicly accessible over the web. Password protection is fine since you can use WPT to script the login, but IMHO is non-ideal as that is not part of the normal end-user experience.

Tactics

Now that we have a good handle on the tools we used – we should discuss how we put them to work. Stay tuned for part 2, where we will explore the tactics for using these tools together effectively.

Grilling up an API

BBQ is a religion in Austin. Everyone has their opinion on who serves up the best BBQ. Debates between people defending their choices have been known to last into the wee hours of the night. Friendships have been ruined, and neighbors turned into enemies (okay, I might have made that last bit up…but you get my point).

APIs are also like a religion to many in the developer community. Developers spend their precious time using the tools and APIs that companies create. The easiest tools to use will be the ones they turn to consistently – and tell their friends about. At Bazaarvoice, we are hyper-focused on how to make our API and Platform the best set of tools around.

But how do you “serve up” good API? To answer that, let’s borrow an analogy from the world of BBQ.

Imagine you wanted to make a BBQ dinner, and you came to Bazaarvoice for help. We could help you in a few different ways:

Method #1: We can provide you with the raw ingredients & materials you need – e.g. spices to make your sauce, sticks to build your fire, and of course – a cow.

Method #2: We can provide you with some pre-packaged ingredients & items – e.g. a bottle of BBQ sauce, a grill, and some prime cuts of meat.

Method #3: We can provide you with a menu from the Salt Lick, as well as the number to their delivery service.

So what does this translate to (aside from a yummy BBQ dinner)?

Method #1: High innovation, high support costs, and low adoption.

Method #2: Medium innovation, medium support costs, and medium adoption.

Method #3: Low innovation, low support costs, and high adoption.

At Bazaarvoice, we aim to provide the developer community with tools that support all three of the methods above:

Method #1: Our API gives developers fine-grained control over the information they can request, the filters they can specify, etc. However, this flexibility comes at a cost. Developers will have to understand our object model and syntax to take full advantage of the API, and we at Bazaarvoice need to provide training and documentation to help with this process.

Method #2: Our API documentation always starts off with example API calls and popular use cases. These “pre-packaged” examples can help you skip straight to the API calls that will get the job done.

Method #3: We have reference apps available to download, and we will continue to add more over time. These reference apps serve two purposes. First, you can download the apps, enter your API credentials, and be off and running (just like BBQ takeout!). Second, you can use these apps as a learning tool to help you get familiar with the Bazaarvoice API faster.

Like grilling up BBQ, it is hard to satisfy everyone. But when you get your product just right, you can turn customers into dedicated fans. So in conclusion – when you see an employee of Bazaarvoice feasting away at one of the many popular local BBQ joints, feel confident knowing that we are hard at work.

Extremenly superb way of comparing APIs with BBQ though firstly was quite confussed while reading this article.. So finally can say enjoyed both Bazaarvoice API and Austin BBQ 😉