5 Ways to Improve Your Mobile Submission Form

Here at Bazaarvoice, we’re constantly focused on improving the user experience for our products. From the initial email invitation, to the submission form, to the way in which reviews are presented, we want to make sure that our interfaces are as flexible and intuitive as possible.

Part of my job on the mobile team at Bazaarvoice is to make sure that our products reflect best practices when displayed on mobile devices. In practice, that means running hands-on user tests, A/B testing different designs, and gathering detailed information about how users interact with our products.

Recently, we ran a test with Buckle, one of our partner clients, to experiment with various mobile-friendly submission forms. What follows are some of the takeaways from those experiments.

1. Handle Landscape Mode Gracefully

Users must be able to navigate forms easily in landscape mode, and supporting landscape matters most for fields that solicit text input. We found that mobile users, on average, enter about 20% fewer words in their reviews than desktop users, so the last thing we want is to make entering text even more difficult. Many users prefer to type in landscape mode because it provides a larger keyboard.

2. Make Interactions Easy

Generally, a desktop user with a mouse can interact much more precisely than a mobile user with a clumsy finger. Therefore, it is important to make sure that elements are large enough to be tapped easily. Apple recommends that tappable elements be at least 44×44 points. In our experimental form, we intentionally oversized our radio buttons, selection drop-downs, and sliders to make the form easier to interact with and to prevent form errors.

Additionally, mobile devices provide a number of different keyboard layouts for different types of inputs. For instance, an input type of “email” might surface the @ symbol to make it more readily accessible. In order to take advantage of the various keyboard layouts, be sure to properly specify the input type on your form elements.
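Below is a minimal sketch of specifying input types. It assumes a hypothetical renderInput helper that builds form markup as strings. The field kinds and the mapping table are illustrative and are not taken from our experimental form.

```javascript
// Hypothetical mapping from a field's kind to an HTML5 input type, so that
// mobile browsers can surface a matching keyboard layout.
var INPUT_TYPES = {
  email: "email",  // keyboard with @ and . readily available
  phone: "tel",    // numeric dial pad
  age: "number",   // numeric keyboard
  website: "url"   // keyboard tuned for typing addresses
};

// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

function renderInput(field) {
  // Unknown kinds fall back to a plain text input.
  var type = Object.prototype.hasOwnProperty.call(INPUT_TYPES, field.kind)
    ? INPUT_TYPES[field.kind]
    : "text";
  var name = escapeAttr(field.name);
  return '<input type="' + type + '" name="' + name + '" id="' + name + '">';
}
```

The fallback is safe: browsers that do not recognize an input type treat the field as type="text".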

3. Snappy Over Flashy

The first version of our experimental form used a lot of JavaScript for effects such as opacity (alpha) animations and transforms. While the animations generally ran smoothly on a desktop, they became sluggish on the iPhone and lower-end Android devices.

When designing for mobile, be sure to prioritize function over flashiness. Slick animations can greatly improve the usability and “wow” factor of a site, but they should be used sparingly. If necessary, use hardware-accelerated transforms to minimize sluggishness.
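As an illustration, here is a minimal sketch that moves an element with a CSS transform instead of changing its left position. The slideTo name is hypothetical. Browsers can usually composite transforms on the GPU, whereas changing left forces a layout on every frame. translate3d() was a common way to nudge mobile WebKit into hardware acceleration. Browsers of that era also needed the -webkit- prefixed property (webkitTransform).

```javascript
// Hypothetical helper: slide a panel horizontally using a transform.
// Animating `transform` avoids re-running layout on each frame, which is
// what makes animating `left`/`top` sluggish on slower phones.
function slideTo(element, xPixels) {
  element.style.transition = "transform 200ms ease-out";
  // translate3d (rather than translate) is the classic hint for mobile
  // WebKit to promote the element to its own hardware-accelerated layer.
  element.style.transform = "translate3d(" + xPixels + "px, 0, 0)";
}
```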

4. Choose The Most Efficient Form Path

Overall, our goal is to allow the user to complete our form in the quickest, simplest manner possible. In our testing, we found that a surprising number of users preferred to navigate and exit form elements via the “done” button rather than using the next/previous buttons. This has several interesting consequences.

First, short forms are better than tall forms. While some users “tab” through fields, most users scroll. By minimizing the vertical spacing between elements, users do not need to scroll as far to get to the next field.

Second, for most users, interacting with a select element takes three taps: open, select, and done. Therefore, if a user is choosing between just a few options, oversized radio buttons are better than select elements.
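Here is a sketch of that rule. It assumes a hypothetical renderChoice helper and an assumed cutoff of five options, which you would tune with your own testing. The big-radio class name is also hypothetical.

```javascript
// Assumed threshold: up to this many options render as one-tap radio buttons.
var MAX_RADIO_OPTIONS = 5;

// Options are assumed to be trusted, already-escaped strings.
function renderChoice(name, options) {
  if (options.length <= MAX_RADIO_OPTIONS) {
    // One tap per choice; style .big-radio with a generous tap area.
    return options.map(function (opt) {
      return '<label class="big-radio"><input type="radio" name="' + name +
        '" value="' + opt + '"> ' + opt + "</label>";
    }).join("");
  }
  // Long lists still use a <select> (open, select, done).
  return '<select name="' + name + '">' + options.map(function (opt) {
    return '<option value="' + opt + '">' + opt + "</option>";
  }).join("") + "</select>";
}
```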

5. Provide Instant Feedback

If a user submits an invalid form value such as a malformed email address, provide a clear error message that instructs the user how to fix the error. If possible, provide an error near the offending field. Additionally, once the form field becomes valid, notify the user immediately rather than requiring the user to submit the form again.

For our experimental form, we used the jQuery Validation plugin, which makes basic form validation dead simple. Since it runs entirely on the client, validation is snappy as well.
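The jQuery Validation plugin handles this for you. As a library-free sketch of the same idea, the code below re-validates a field on every input event, so the error clears the moment the value becomes valid. The field names, rules, and messages are hypothetical. The email pattern is deliberately loose.

```javascript
// Hypothetical rules: each returns an error message, or null when valid.
var RULES = {
  email: function (value) {
    return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)
      ? null
      : "Enter an email address like name@example.com.";
  },
  nickname: function (value) {
    return value.trim().length >= 4
      ? null
      : "Nicknames must be at least 4 characters.";
  }
};

function validateField(name, value) {
  var rule = RULES[name];
  return rule ? rule(value) : null;
}

// Browser wiring: update the message next to the field on every keystroke,
// so the user sees it disappear as soon as the input becomes valid.
function attachLiveValidation(input, messageElement) {
  input.addEventListener("input", function () {
    var error = validateField(input.name, input.value);
    messageElement.textContent = error || "";
  });
}
```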

Our tests are ongoing, so be on the lookout for more updates soon. Until then, hopefully these insights will be valuable to others as the Internet becomes more mobile-friendly.

SELECT developers FROM Bazaarvoice UNION SELECT flags FROM Stripe_CTF;

Stripe (https://stripe.com/) held their second capture the flag event; this time the CTF was dedicated to web-based vulnerabilities and exploits. As I am a new Security Engineer here at BV, the timing was perfect. It allowed me to use the event as a vehicle for awareness and to ramp up curiosity, interest, and even excitement for web application security (webappsec).

Let’s start with some definitions to get us all on the same page.

Capture the flag is a traditional outdoor game in which two teams each have a flag, and the objective is to capture the other team’s flag, located at that team’s base. In computer security, the flag is a piece of data stored somewhere on a vulnerable computer, and the goal is to use tools and know-how to “hack” the machine and find the flag.

Webappsec is short for web application security, a branch of information security that deals specifically with the security of websites and web applications. In my opinion, there is more to it than just that. As we see from the OWASP Top 10 (https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project) and the emerging DevOps movement, web developers are doing more than just coding. They are now configuring full stacks to support their applications. So webappsec needs to encompass more than just the application. It has to deal with developer and management education, configuration management, network security, and OS hardening, to name just a few areas, and of course the code itself.

Now that we know what CTF and webappsec are, let’s move on to the details of the event.

The CTF started on Wednesday, August 22nd, 2012 at 2PM CDT and ran for a week, until Wednesday, August 29th, 2012 at 2PM CDT. One week == Seven Days == 168 hours and, for me at least, this was not enough. This capture the flag included 9 levels (0-8) to conquer before arriving at the actual flag. Each level was progressively more difficult and, when completed, provided you with the password for the next level.

The CTF was very challenging, yet no security-specific training or knowledge was needed to participate. In my opinion, the minimum requirements to participate and enjoy it were some knowledge of web application programming and a willingness to learn about and research possible vulnerabilities and potential exploits.

In total, about 15 Bazaarvoice cohorts tried their hand at the CTF. Four actually captured the flag! When did they have the time? Well, we used our 20% time at work and personal time away from work: late nights, lunch hours, weekends, etc. Believe me, once you get started you are hooked. You end up thinking about it all day: the level you are on, how to solve it, and whether the previous levels offer any hints or possible staging areas for potential exploits. If you are anything like me, a developer at heart first and a security professional second, this type of activity is a great way to test your chops and have some fun while learning new things, and potentially to bring some of the lessons back to your teams. This is the perfect avenue for awareness; code wins after all!

Here are quotes from some of the BV participants:

I got to 8 last night. I don’t think I’ll get it in the next 4 hours, but it was a fun challenge. – Cory

Made it to level 8 last night, but not until after midnight. Don’t think I’ll be finishing, but it was fun all the same. Level 8 is really interesting, and I look forward to hearing how folks finished it. – RC

I’ve been having a lot of fun with it. – Jeremy

Overall, everyone who tried it had a good time, no matter how far they actually got.

All in all a total of four BVers completed the CTF and captured the final flag!


Congratulations to the following BVers for successfully capturing the flag:


For their previous CTF, Stripe released the solutions and a working VM for each level. I hope they do the same for this one; that would give everyone the opportunity to learn what vulnerabilities were found and how they were exploited to complete each level and eventually capture the flag.

For now I leave you with a quote which I think embodies web application security, its education and use in the development community:

Secure programming is a mind-set. It may start with a week or two of training, but it will require constant reinforcement. – Karl Keller, President of IS Power

As with many things here at Bazaarvoice, the education and growth of our developers and their skills often take on a fun and enjoyable approach.

Platform API release notes, version 5.3

We are pleased to announce that the following functionality has been developed for version 5.3:

  • Hosted authentication – email
  • Feedback submission for comments
  • RatingDistribution (Histogram data) and SecondaryRatingsAverages added to review statistics
  • Time zone changed to UTC
  • Error codes added to form errors
  • Syndication attribution on reviews

More detailed information on each of these items is listed below. For complete documentation, refer to the Platform API documentation, version 5.3.

Hosted authentication – email

Hosted email authentication can be used during submission to confirm the identity of a content submitter. When submitting content for the first time, a user receives an email containing a link. When the link is clicked, the user is directed to a landing page that calls back to the API to confirm their identity. This call results in the generation of an encrypted user token that can be used in subsequent submission calls. Depending on your configuration, the submitter’s content might not be accepted until the confirmation call is submitted. In order to use this feature, you must have hosted authentication enabled for your submission process. If you need more information, read the “Bazaarvoice hosted authentication reference guide” and the submission method documentation for details on the required parameters.

Feedback submission for comments

Feedback submission for review comments and story comments is now supported in addition to the existing support for feedback submission on reviews, questions, answers, and stories. For complete documentation, see the Feedback Submission method page.

RatingDistribution and SecondaryRatingsAverages added to review statistics

New RatingDistribution and SecondaryRatingsAverages blocks have been added to the ReviewStatistics block. You can now see the distribution of ratings for each product, which allows you to construct a rating histogram. You can also see the average rating of your secondary rating dimensions for reviews in relation to products and authors.
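Here is a minimal sketch of turning RatingDistribution into histogram rows. It assumes the block is an array of objects with RatingValue and Count fields. Check the Platform API documentation for the exact shape. Ratings with no reviews are assumed to be possibly absent, so the sketch fills them in with zero.

```javascript
// Assumed input shape: [{ RatingValue: 5, Count: 12 }, { RatingValue: 4, Count: 3 }, ...]
function histogramRows(ratingDistribution, maxRating) {
  var counts = {};
  var total = 0;
  ratingDistribution.forEach(function (bucket) {
    counts[bucket.RatingValue] = bucket.Count;
    total += bucket.Count;
  });
  var rows = [];
  // Highest rating first, the usual histogram order.
  for (var rating = maxRating; rating >= 1; rating--) {
    var count = counts[rating] || 0;
    rows.push({
      rating: rating,
      count: count,
      // Share of all reviews, usable as a bar width in percent.
      percent: total === 0 ? 0 : Math.round((count / total) * 100)
    });
  }
  return rows;
}
```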

Time zone changed to UTC

The API now returns all time data using UTC (+00:00) to avoid the confusion of multiple time zones. The date format has not changed.
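Because every timestamp now carries the same offset, client code can parse them uniformly and convert to the visitor's time zone only for display. This sketch assumes ISO 8601 strings with an explicit +00:00 offset, such as "2012-08-22T19:00:00.000+00:00". That format is an assumption, so confirm it against the documentation.

```javascript
// Parse an API timestamp (assumed ISO 8601 with a +00:00 offset).
function parseApiTime(timestamp) {
  var date = new Date(timestamp);
  if (isNaN(date.getTime())) {
    throw new Error("Unparseable timestamp: " + timestamp);
  }
  return date;
}

var submitted = parseApiTime("2012-08-22T19:00:00.000+00:00");
// toISOString() always renders in UTC.
console.log(submitted.toISOString()); // 2012-08-22T19:00:00.000Z
// For display in the browser, convert to local time: submitted.toLocaleString()
```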

Error codes added to form errors

The API response has been updated to return error codes in addition to the existing error message for all form errors. A complete listing of the error codes can be found in each submission method.
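Codes are more stable to match on than message strings, which can change or be localized. Below is a sketch that maps known codes to your own copy and falls back to the API's message otherwise. It assumes a response shape with FormErrors.FieldErrors keyed by field name, where each entry has Code and Message. The error codes in the table are illustrative, so use the listings in each submission method.

```javascript
// Assumed response shape:
// { FormErrors: { FieldErrors: { usernickname: { Field: "usernickname",
//     Code: "ERROR_FORM_REQUIRED", Message: "..." } } } }
// Illustrative codes; replace with the documented ones.
var MESSAGES_BY_CODE = {
  ERROR_FORM_REQUIRED: "This field is required.",
  ERROR_FORM_INVALID_EMAILADDRESS: "Please check your email address."
};

// Returns { fieldName: messageToShow } for every field error.
function fieldMessages(response) {
  var fieldErrors = (response.FormErrors && response.FormErrors.FieldErrors) || {};
  var result = {};
  Object.keys(fieldErrors).forEach(function (field) {
    var error = fieldErrors[field];
    result[field] = MESSAGES_BY_CODE[error.Code] || error.Message;
  });
  return result;
}
```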

Syndication attribution on reviews

All reviews have an “isSyndicated” field set to true or false. If the review is syndicated, a SyndicationSource block is displayed with details of where the review is being syndicated from. Syndicated content can only be returned if the API key is configured to show syndicated content.

Platform API Release Notes, Version 5.2

We are pleased to announce that the following functionality has been developed for version 5.2:

  • Helpfulness and inappropriate content feedback submission enabled
  • ContentLocale no longer filtered implicitly by default
  • Product and category attributes populated as a map
  • Hosted video submission and display updated
  • Inline ratings data exposed for product-based review statistics

More detailed information on each of these items is listed below. For complete documentation, refer to the Platform API Documentation, version 5.2.

Helpfulness and inappropriate feedback

Submission of helpfulness votes and inappropriate feedback can now be done through the API. The response for the user-generated content has also been updated to display the actual inappropriate feedback and total vote and feedback counts. In order for inappropriate feedback to be populated, the API key must be updated.

Implicit default ContentLocale filter removed

There is no longer any implicit ContentLocale filter if none is specified as an argument. If no filter is provided, all content will be returned, regardless of what locale the content is in. There is a default locale defined for every API key. Prior to version 5.2, if the locale parameter was used, it caused an implied ContentLocale filter to be used.

Note that version 5.2 does not change the behavior of explicitly supplied ContentLocale filters. In addition, you can now ask for labels in any locale and specify a different content locale. Therefore, if you request a locale of en_US and a ContentLocale of fr_FR, you get English labels and French content.
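Here is a sketch of the English-labels, French-content request described above. The base URL and key are hypothetical. The parameter names apiversion, passkey, Locale, and Filter, and the Filter=ContentLocale:value syntax, are assumptions to verify against the Platform API documentation.

```javascript
// Build a reviews request with label and content locales set independently.
function reviewsUrl(baseUrl, passkey, labelLocale, contentLocale) {
  var params = new URLSearchParams({
    apiversion: "5.2",
    passkey: passkey,
    Locale: labelLocale // locale used for labels
  });
  if (contentLocale) {
    // Since 5.2, content is only filtered by locale when asked explicitly.
    params.append("Filter", "ContentLocale:" + contentLocale);
  }
  return baseUrl + "?" + params.toString();
}
```

Omitting contentLocale now returns content in every locale rather than only the key's default.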

Product and category attributes

Products and categories now have a new attributes field populated. This field contains a map of attributes provided to Bazaarvoice from a product feed import.

Hosted video submission and display

Video elements of all content now contain URLs that can be used to embed the video into an HTML page. Bazaarvoice provides boilerplate HTML tags for use with embedding these videos. For more information, see the API Basics page.

Inline ratings data

A new method has been created to provide a quick way to access inline ratings data for products. For complete documentation, see the Statistics Display method page.

Scaling on MySQL: Sometimes you’ve gotta break a few eggs

We recently delivered a presentation titled “How to Scale Big on MySQL? Break a Few Rules!” as part of Database Week here in New York City. The presentation is a lighthearted and informative take on how Bazaarvoice Engineering has been able to take MySQL to billions of requests per month. The slides and video are available over at LeadIt.us. In the presentation we cover denormalization, query planning, partitioning, MySQL replication, Infobright’s take on data storage, and thinking beyond the RDBMS. The presentation runs a little over an hour and is full of great questions. I had a great time delivering it, and I got a lot of very good feedback, so I hope it proves useful for you as well.

MongoDB Arrays and Atomicity

Over time, even technologies that are tried and true begin to show their age. This is especially true for data stores, as the sheer amount of data explodes and site traffic increases. Because of this, we are continually evaluating new technologies to determine whether they have a place in our primary stack.

To that end, we began using MongoDB for one of our new internal systems several months ago. Its quick setup and ease of use make it a great data store for starting a new project that’s constantly in flux. Its robustness and scalability make it a worthy contender for our primary data store, if it should prove capable of handling our use cases.

Being traditionally a MySQL shop, our team is used to working with databases in a certain way. When we write code that interacts with the DB, we use transactions and locks rather liberally. Therefore using MongoDB is a rather significant paradigm shift for our developers.

Though MongoDB doesn’t have true ACID compliance, it does support atomic updates to an individual object. And because a “single” object in Mongo can actually be quite complex, this is sufficient for many operations: setting values, incrementing numbers, and adding and removing items from an array can all be done atomically in a single update. For example, setting top-level fields:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {firstName: "Bobby", lastName: "Schmidt"}}
);

Even setting values of child objects can be done atomically:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"position.title": "Software Developer", "position.id": 1015}}
);

In fact, individual items in an array can be changed:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"employmentHistory.0.company": "Google, Inc."}}
);

However, one thing you may notice is that the code above to modify a specific item in the array requires us to know the specific index of the item in the array we wish to modify. That seems simple, because we can pull down the object, find the index of the item, and then change it. The difficulty with this is that other operations on an array do not necessarily guarantee that the order of the elements will be the same. For example, if an item in an array is removed and then added later, it will always be appended to the array. The only way to keep an array in a particular order is to $set it (replacing all the objects in it), which for a large array may not have the best performance.

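This problem is best demonstrated with a race condition. One client reads the object and finds the index of the item it wants to change. Meanwhile, another client removes that item and adds it back, so it is appended to the end of the array. The first client then $sets the field at its now-stale index and modifies the wrong item.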
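Here is an in-memory simulation of that interleaving. It is plain JavaScript rather than real MongoDB calls. Each step stands in for a separate, individually atomic update. The MENTOR badge is a hypothetical second array item.

```javascript
function simulateStaleIndexRace() {
  var doc = {
    badges: [
      { type: "TOP_EMPLOYEE", date: "2012-01-01" },
      { type: "MENTOR", date: "2012-02-01" }
    ]
  };

  // 1. Client A reads the object and finds TOP_EMPLOYEE at index 0.
  var staleIndex = doc.badges.findIndex(function (b) {
    return b.type === "TOP_EMPLOYEE";
  });

  // 2. Client B removes TOP_EMPLOYEE ($pull) and later adds it back ($push),
  //    which appends it to the end of the array.
  var item = doc.badges.find(function (b) { return b.type === "TOP_EMPLOYEE"; });
  doc.badges = doc.badges.filter(function (b) { return b.type !== "TOP_EMPLOYEE"; });
  doc.badges.push(item);

  // 3. Client A applies {$set: {"badges.0.date": null}} with its stale index.
  doc.badges[staleIndex].date = null;
  return doc;
}

var result = simulateStaleIndexRace();
console.log(result.badges[0].type, result.badges[0].date); // MENTOR null
```

Client A meant to clear the TOP_EMPLOYEE date but changed MENTOR instead, and no error is raised.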

One solution to this problem is to $pull a particular item from the array, change it, and then $push it back onto the array. In order to avoid the race condition above, we decided that this was a safer solution. After all, these two operations should be doable atomically.

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$pull: {badges: {type: "TOP_EMPLOYEE"}}, $push: {badges: {type: "TOP_EMPLOYEE", date: null}}}
);

So, what’s the problem? The issue is that MongoDB doesn’t allow multiple operations on the same property in the same update call. This means that the two operations must happen in two individually atomic operations. Therefore it’s possible for the following to occur:

That is bad enough, but it can be dealt with by updating an object only if it is in the correct state. Unfortunately, a reader can also fetch the object between the $pull and the $push:

That looks very odd to the reader, because at one moment it would see the item, then the next read would lose it, and then it would come back. To an end user that’s bad enough, but to another system that is taking actions based on the data, the results can be inconsistent at best, and deleterious at worst.
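The pull/read/push interleaving can be simulated in memory the same way. This is plain JavaScript standing in for two separate atomic updates, with a reader polling between them.

```javascript
function simulatePullReadPush() {
  var doc = { badges: [{ type: "TOP_EMPLOYEE", date: "2012-01-01" }] };
  var reads = [];
  function read() {
    // The reader records whether the badge is currently visible.
    reads.push(doc.badges.some(function (b) { return b.type === "TOP_EMPLOYEE"; }));
  }

  read();                                          // before the writer starts
  doc.badges = doc.badges.filter(function (b) {    // update 1: $pull
    return b.type !== "TOP_EMPLOYEE";
  });
  read();                                          // lands between the two updates
  doc.badges.push({ type: "TOP_EMPLOYEE", date: null }); // update 2: $push
  read();                                          // after the writer finishes
  return reads;
}

console.log(simulatePullReadPush()); // [ true, false, true ]
```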

What we really want is the ability to modify an item (or items) in an array by query, effectively $set with $pull semantics:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$update: {badges: {$query: {type: "TOP_EMPLOYEE"}, $set: {date: null}}}}
);

Since that’s not yet supported by MongoDB, we decided to use a map. This makes a lot of sense for our use case since the structure happened to be a map in Java. Because MongoDB uses JSON-like structures, the map is an object just like any other, and the keys are simply the property names of the object.

This means that individual elements can be accessed and updated from the map:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"badges.TOP_EMPLOYEE.date": null}}
);
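Because the map key becomes part of the dotted path, keys need some care. A key containing "." would be read as deeper nesting, and a leading "$" looks like an operator. Below is a sketch of a hypothetical helper that builds the update document and rejects such keys.

```javascript
// Build {$set: {"badges.<type>.date": date}} for one map entry.
function setBadgeDate(badgeType, date) {
  if (badgeType.indexOf(".") !== -1 || badgeType.charAt(0) === "$") {
    throw new Error("Unsafe map key: " + badgeType);
  }
  var update = { $set: {} };
  update.$set["badges." + badgeType + ".date"] = date;
  return update;
}

// Usage in the shell, for example:
// db.employees.update({_id: ObjectId("...")}, setBadgeDate("TOP_EMPLOYEE", null));
```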

This seems like a rather elegant solution, so why didn’t we start out this way? One pretty major problem: if you don’t know all of the possible keys to the map, there is no good way to index the fields for faster querying. Indexes to child objects in MongoDB are created using dot notation, just like the query syntax. This works really well with arrays because MongoDB supports indexing fields of objects in arrays just like any other child object field. But for a map, the key is part of the path to the child field, so each key needs its own index:

// Simple fields
db.employees.ensureIndex({lastName: 1, firstName: 1})

// Child object fields
db.employees.ensureIndex({"position.id": 1})

// Array object fields
db.employees.ensureIndex({"employmentHistory.company": 1})

// Map fields, treating map values like child objects
db.employees.ensureIndex({"badges.TOP_EMPLOYEE.date": 1})

For our purposes, we are actually able to index every key individually, and treat the map as an object with known properties.

The sinister part of all of this is that you won’t normally run into any problems during typical testing of the system. And in the case of the pull/read/push interleaving described above, even typical load testing wouldn’t necessarily exhibit the problem unless the reader checks every response and looks for inconsistencies between subsequent reads. In a complex system, where data is expected to change constantly, this can be a difficult bug to notice, and its root cause is even more difficult to track down.

There are ways to make MongoDB work really well as the primary store, especially since there are so many ways to update individual records atomically. However, working with MongoDB and other NoSQL solutions requires a shift in how developers think about their data storage and the way they write their query/update logic. So far it’s been a joy to use, and getting better all the time as we learn to avoid these types of problems in the future.

Third-Party Front-end Performance: Part 1

Bazaarvoice is a third-party application provider. We have a growing number of applications running on our own domain, but our core business is providing user-generated content applications and widgets that are hosted by us but run on our clients’ webpages. Scaling an application platform of our size certainly has its challenges at the data layer. We use all the coolest NoSQL tools and post big requests-per-second numbers every month, but an oft-forgotten part of the story lives long after all of that is cached and delivered:

On the front end.

Steve Souders, the guy who (literally) wrote the book on web performance, estimates that 80% of average page load times actually occur after the markup is completely downloaded. That means a 50% speed up in your front-end code is going to mean a lot more than a 50% speed up in your backend code. This is important stuff!
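A quick worked example of that claim uses the 80/20 split quoted above. It assumes an illustrative 5-second page load.

```javascript
// totalSeconds: whole page load; frontEndShare: fraction spent after the
// markup arrives; *Gain: fractional speed-up applied to each part.
function loadTimeAfterSpeedup(totalSeconds, frontEndShare, frontEndGain, backEndGain) {
  var frontEnd = totalSeconds * frontEndShare;
  var backEnd = totalSeconds - frontEnd;
  return frontEnd * (1 - frontEndGain) + backEnd * (1 - backEndGain);
}

console.log(loadTimeAfterSpeedup(5, 0.8, 0.5, 0)); // 3   (40% faster overall)
console.log(loadTimeAfterSpeedup(5, 0.8, 0, 0.5)); // 4.5 (10% faster overall)
```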

I’d like to go through some of our front-end performance considerations and explain the optimizations that we make, or the ones we’d like to start making soon.

I recently read that “the history of web development is just increasingly difficult ways of concatenating strings together.” So we’ll start at that point: we’ve got everything ready to go on the server, and now we have to put it on the client’s site and make it work.

From this point, we’ll identify three distinct areas of front-end performance:

  • The network
  • Parse and evaluate
  • Application responsiveness

This post covers the network; the other two will follow in later parts.

The Network

The network is by far the most commonly discussed area of optimization on the client, and for good reason: it’s often the bottleneck. There is really no great ‘secret’ to making it fast, though. The smaller your payload and the fewer files you request, the faster your page will generally be. There are lots of articles on reducing image sizes and spriting CSS, and you should read those, but I want to focus on some less commonly discussed techniques that are especially important in third-party applications.

The only non-obvious part of this equation to me (with regard to third-party code) is caching, along with the cacheability of your resources. There are a few unique problems when doing this as a third party, but, more or less, the higher the percentage of cacheable static resources that make up your app, the more opportunity you have to make the network a lot less scary and slow. There are whole books and companies that specialize in this stuff, but we’ll go through it at a high level.

— Obligatory Wu-tang Cache Rules Everything Around Me reference. —

Edge caching

One of the easiest wins is using a CDN that implements what’s known as ‘edge-caching.’ This is the practice of putting a distributed system of cache servers around the world (or country, etc.) and serving content from the server geographically closest to each request. This not only decreases load on individual servers, but also vastly improves request latency.

Note that edge-cached content is not cached on the user’s computer, but on a server in the CDN. Even if your app has excellent cacheability, you can still benefit from edge-caching for the initial load of the application (when the browser cache isn’t primed).

Bazaarvoice does this with almost all of its content, even relatively dynamic content. We are one of the largest users of the Akamai CDN, and well over 90% of our total traffic actually just hits Akamai servers and never makes it back to our origin servers.

You may be familiar with something like Amazon S3 to serve your static files. S3 is not distributed by default, but adding on the CloudFront option to your Amazon S3 buckets will achieve many of the same results (assuming you serve from the CloudFront URLs).

Maximum Cacheability

The content in most web apps changes frequently. However, the application really only changes when new code is written and pushed (read: hopefully far less frequently). In an ideal world, everything except for the data would be cached for a length of time (in the end-user’s browser).

The larger the percentage of your application that you can statically generate, the more you can cache.

This seems pretty easy at a glance, but can actually get a bit hairy in a third party environment. Due to the fact that the application is included via JavaScript, the markup for the page is not generally served in the initial request and is injected afterwards instead. This differs from normal webapps that can send their rendered markup along with the initial page request.

Dynamic Content

Currently, Bazaarvoice’s template system integrates with our Java backend, so after the static resources are loaded, we make a request for the rendered content of the page we’re on. This comes back to us as a string, which is saved to a JavaScript variable.

// simplified example - uncacheable data and templates
var html = "<ul id=\"reviews\"><li>review 1</li><li>review 2</li></ul> ";

Exactly how this works is unimportant; the key point is that this entire additional request is not cacheable (at least not for any significant amount of time). Since the data in the rendered templates changes, we not only have to send the fresh data on each request, but also all of the template markup each time, multiplied by the number of times each template is invoked. Given a significant amount of data, this can add quite a bit of weight to these requests. GZip helps a lot with repeated markup, but once you also factor in the escape characters needed to store the markup as JavaScript strings, you can see how this increases the size of non-cacheable requests and reduces performance.

For the most part, in practice, we don’t serve enough data in an individual request to be affected by this problem to an alarming degree, but it’s always nice to optimize further, and we’d love to save on bandwidth costs.

In our simple example above, we had the added duplication of a single set of <li> tags. The solution to only sending that once and caching it after the initial load is client-side templating.

If we are able to include our templates in our cached, static resources, you only need to request them once. So even though the library to render them has some cost, it usually pays off quickly (often still on the initial load).

// cacheable template
var tmpl = "<ul id=\"reviews\">{{#each reviews}}<li>{{name}}</li>{{/each}}</ul>";

// uncacheable data
var data = { reviews: [{ name: "review 1" }, { name: "review 2" }] };

We’ll get into the actual performance and responsiveness of the app in a later post, but note that in production, these templates can actually be ‘pre-compiled’ into javascript functions that run very quickly (more or less as quickly as string concatenation can take you). Also in production, the templates and data are much larger, so this reduction pays off at scale.
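To show why precompiled templates are fast, here is a hand-written stand-in for what a compiler might emit for the template above. It is plain string concatenation plus HTML escaping, as {{name}} escapes its value. Real Handlebars output looks different, and the reviewsTemplate name is hypothetical.

```javascript
// Escape HTML-significant characters, as {{name}} would.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;");
}

// Cacheable: ships once with the static resources.
function reviewsTemplate(data) {
  var out = '<ul id="reviews">';
  for (var i = 0; i < data.reviews.length; i++) {
    out += "<li>" + escapeHtml(data.reviews[i].name) + "</li>";
  }
  return out + "</ul>";
}

// Only this JSON changes between requests.
var html = reviewsTemplate({ reviews: [{ name: "review 1" }, { name: "review 2" }] });
```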

In a third-party application that can cache its templates alongside the application code, the data (usually JSON) is the only unique, non-cached content going over the wire after the initial load. This is great. I’ll leave optimizing data payloads to a different blog post, but make sure you only request the minimum amount of data you need.

For extreme performance (and SEO to boot!), a third-party tool could also inject the markup on the server side. The markup would go in the initial payload and be visible immediately on page load (great perceived performance). You would have to write your JavaScript so that it can bind to pre-rendered markup, but it’s all possible. This is usually harder to get clients to agree to. You end up duplicating the markup in loops, as in our first example, but you also save an HTTP request.

Since the client can cache the response on their app server and serve it to every end user, this ends up being by far the fastest integration in practice. It is a good complement to a high-performance third-party system, even if it can be tough to get clients to agree to.

Data Caching too!

If you want to go even further (and I very much encourage it), look into caching your data too. This wouldn’t do much for first page loads, but it could really help with widgets that appear on many pages, or with reloads and revisits of the same page. At scale, any change that reduces origin requests without requiring a lot of code is probably worth it eventually.

Consider a “Most Recently Reviewed Products” widget that goes across every page on a site. This widget would list 5 products or so. The recency of the products in the list is important, but the exact accuracy is likely not important within a few minute period (unless you’re an unusually high-review-volume webpage). Unfortunately, though, people usually visit quite a few pages in that same time period.

As a third-party, you’re more than likely relying on JSONP to request this information for the widget. Each subsequent page load is highly cached (static resources) because you’re doing all the things we’ve mentioned previously, but you’re forced to re-request this redundant data each time.

I know what you’re thinking…

“Since JSONP injects a script tag under the hood, it is just a GET request for a “javascript file” that can be cached by the browser.” — you

While this is technically correct, it’s usually difficult to actually realize in the real-world for a couple of reasons.

At Bazaarvoice, and in many (read: most of the ones I’ve worked with) APIs, all responses are sent with a ‘no-cache’ header. This prevents any caching from occurring. If you do have an API that allows browser caching, you’re one step closer, but you usually still have another issue.

Most people are using jQuery or other AJAX libraries in order to make JSONP requests to the server. Unfortunately for fans of API caching, these libraries almost always bust your cache, either intentionally, or unintentionally.

This code:

$.ajax({
  url : "http://bestreviewapiever.com/api", 
  data : {"filter": "recent"}, 
  dataType : "jsonp" 
});

actually results in a request to:

http://bestreviewapiever.com/api?filter=recent&callback=jQuery171984098_234234&_=98723498723

And every time you make the request, those last two values will change. Even if you set the cache option to true, which drops the _= timestamp parameter, you still end up with a unique callback value in each request. That still busts your cache.

There’s a touch too much code to include here, but the solution is to set cache to true and override jsonpCallback with a value that stays the same. However, you then have to manage callback collisions yourself, and that takes significant code.
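A condensed, library-free sketch of the idea follows. It is not jQuery's implementation. Parameters are sorted and no timestamp is added, so the same request always yields the same URL. The callback name is derived from a hash of the URL, so it stays stable too. Concurrent identical requests share one in-flight call, which avoids callback collisions. The bvJsonp_ prefix is hypothetical. The global object and the loadScript function are injected, so the sketch can run outside a browser.

```javascript
// djb2 string hash, kept as an unsigned 32-bit integer.
function hashString(s) {
  var h = 5381;
  for (var i = 0; i < s.length; i++) {
    h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  }
  return h.toString(36);
}

// Same base URL + params -> same URL and callback name, every time.
function stableJsonpUrl(baseUrl, params) {
  var query = Object.keys(params).sort().map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
  }).join("&");
  var callbackName = "bvJsonp_" + hashString(baseUrl + "?" + query);
  return {
    callbackName: callbackName,
    url: baseUrl + "?" + query + (query ? "&" : "") + "callback=" + callbackName
  };
}

var pending = {}; // callbackName -> handlers waiting on that response

// `global` is where JSONP responses call into (window in a browser);
// `loadScript` injects a <script src=url> tag.
function jsonp(global, loadScript, baseUrl, params, handler) {
  var req = stableJsonpUrl(baseUrl, params);
  if (pending[req.callbackName]) {
    pending[req.callbackName].push(handler); // piggyback on the in-flight call
    return;
  }
  pending[req.callbackName] = [handler];
  global[req.callbackName] = function (data) {
    var handlers = pending[req.callbackName];
    delete pending[req.callbackName];
    delete global[req.callbackName];
    handlers.forEach(function (h) { h(data); });
  };
  loadScript(req.url);
}
```

The API must still send cacheable response headers for the browser or CDN to reuse the response.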

If you implement all that successfully, then, barring edge cases, the URL will be the same each time, and the JSONP requests will be cache hits for as long as the API server’s cache TTL allows.
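Here is a minimal sketch of that idea without jQuery, assuming you inject the script tag yourself. The helpers buildJsonpUrl, stableCallbackName, and registerCallback are hypothetical: the callback name is derived from a hash of the request URL, so identical requests produce identical, cacheable URLs, and concurrent identical requests share one global callback instead of colliding. With jQuery, the rough equivalent is passing cache: true and a fixed jsonpCallback string to $.ajax, after which collision handling is up to you.

```javascript
// Hypothetical helpers, not part of jQuery or the Bazaarvoice API.

// Derive the callback name from the request itself, so the same request
// always yields the same URL and can be served from the browser cache.
function stableCallbackName(baseUrl) {
  let hash = 0;
  for (let i = 0; i < baseUrl.length; i++) {
    hash = (hash * 31 + baseUrl.charCodeAt(i)) >>> 0; // 32-bit unsigned rolling hash
  }
  return "bvJsonp_" + hash.toString(36);
}

// Assumes `base` has no query string of its own.
function buildJsonpUrl(base, params) {
  // Sort keys so {a, b} and {b, a} produce byte-identical URLs.
  const query = Object.keys(params).sort()
    .map(k => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
    .join("&");
  const withoutCallback = query ? base + "?" + query : base;
  const callback = stableCallbackName(withoutCallback);
  const sep = query ? "&" : "?";
  // No "_=<timestamp>" parameter is added, unlike jQuery's cache: false.
  return { url: withoutCallback + sep + "callback=" + callback, callback: callback };
}

// Collision handling: widgets asking for the same data at the same time
// share one global callback, so queue every waiter and resolve them together.
const pendingCallbacks = {};

function registerCallback(scope, callback, onData) {
  if (pendingCallbacks[callback]) {
    pendingCallbacks[callback].push(onData);
    return false; // a request for this URL is already in flight; don't inject another
  }
  pendingCallbacks[callback] = [onData];
  scope[callback] = function (data) {
    const waiters = pendingCallbacks[callback] || [];
    delete pendingCallbacks[callback];
    waiters.forEach(fn => fn(data));
  };
  return true; // caller should inject the script tag now
}
```

In the browser you would call registerCallback(window, req.callback, render) and inject a script tag for req.url only when it returns true. The sketch deliberately leaves out error and timeout handling.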

Another approach that we’re currently testing and implementing at Bazaarvoice is to use localStorage and sessionStorage.

The idea is simple: in browsers that support these DOM Storage APIs, you can check storage for the data before making a JSONP request, and invalidate the data after a given time. Make sure you clean up old data when you invalidate it, because storage space is limited. This is a nice solution because the fallback, in browsers without DOM Storage, is just the normal uncached request.

For now, I’d suggest using sessionStorage with an appropriate TTL on your data. That way, the data is cleaned up automatically at the end of the ‘session.’ This solution seems a lot more elegant to me than API cache headers and JSONP overrides.
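A minimal sketch of the sessionStorage-plus-TTL approach follows. The names getCached, setCached, and loadWithCache are hypothetical, and the storage object is passed in so the same code works with anything exposing the Web Storage getItem/setItem/removeItem methods. requestData stands in for whatever JSONP call you already make.

```javascript
// Hypothetical TTL cache in front of a JSONP request, backed by Web Storage.

function getCached(storage, key, ttlMs, now) {
  const raw = storage.getItem(key);
  if (raw === null) return null;
  let entry;
  try {
    entry = JSON.parse(raw);
  } catch (e) {
    storage.removeItem(key); // corrupt entry: drop it
    return null;
  }
  if (!entry || typeof entry.storedAt !== "number" || now - entry.storedAt > ttlMs) {
    storage.removeItem(key); // expired or malformed: clean up, space is limited
    return null;
  }
  return entry.data;
}

function setCached(storage, key, data, now) {
  try {
    storage.setItem(key, JSON.stringify({ storedAt: now, data: data }));
  } catch (e) {
    // Quota exceeded or storage disabled: skip caching; the request still works.
  }
}

// Serve from storage while fresh; otherwise fall back to the normal request.
function loadWithCache(storage, key, ttlMs, requestData, onData) {
  const cached = storage ? getCached(storage, key, ttlMs, Date.now()) : null;
  if (cached !== null) {
    onData(cached);
    return;
  }
  requestData(function (data) {
    if (storage) setCached(storage, key, data, Date.now());
    onData(data);
  });
}
```

In a page this might look like loadWithCache(window.sessionStorage, "bv:recentProducts", 5 * 60 * 1000, requestRecentProducts, renderWidget), where the key and both functions are your own. Accessing window.sessionStorage can throw in some browsers when storage is disabled, so it is worth wrapping that lookup in try/catch and passing null on failure, which sends every call down the uncached path.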

Now, after the first request for our data, each subsequent page load gets a cache hit (within our allotted interval) and skips the request entirely. FASTER!

BONUS TIP!!

If you have very real-time data, don’t count out this method.

One very slick ‘realtime-y’ touch is visibly updating data. In our example, even if we wanted to make an uncached request each time, we could instantly display the old data from sessionStorage on page load, wait until all the critical page elements have fully loaded (so as not to slow them down), and then load in the new data. If the data has changed, you can add a smooth update transition to let the user know that your app is awesome and real time. Best of both worlds!
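The “show old data now, refresh later” pattern might be sketched like this. showThenRefresh is a hypothetical name, and its callbacks are assumptions: render(data, isUpdate) draws the widget (animating when isUpdate is true), requestFresh(cb) makes the uncached JSONP request, and afterCriticalLoad(fn) defers work until the rest of the page has loaded.

```javascript
// Hypothetical sketch: paint cached data instantly, refresh after page load.
function showThenRefresh(storage, key, render, requestFresh, afterCriticalLoad) {
  const raw = storage.getItem(key);
  let stale = null;
  if (raw !== null) {
    try { stale = JSON.parse(raw); } catch (e) { stale = null; }
  }
  if (stale !== null) render(stale, false); // instant paint from storage

  afterCriticalLoad(function () {
    requestFresh(function (fresh) {
      const freshJson = JSON.stringify(fresh);
      try { storage.setItem(key, freshJson); } catch (e) { /* quota: skip caching */ }
      // Re-render only if something changed; animate only when replacing old data.
      if (freshJson !== raw) render(fresh, stale !== null);
    });
  });
}
```

In a browser, afterCriticalLoad could be fn => document.readyState === "complete" ? fn() : window.addEventListener("load", fn), which also covers the case where the load event has already fired.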

Cache-busting, or “how long do I cache?”

“But Alex! How long do I cache my resources? If I cache them too long, then when I update my app, everyone will have different files!” — you, again.

Very insightful of you. In Bazaarvoice’s current system, each file has a relatively low cache max-age (it varies, but is usually between no-cache and ~2 hours). This means we can be sure that, within 2 hours of pushing a change, everyone has the most current version of every file.

This actually ends up being fairly practical. For the most part, people are done browsing a site well before their cache expires, and on their next visit they just have to download the files once more. But we can do better! 😀

We’re working on a new system for caching that affords us more update control, and much higher cache-hit rates. Win-win!

The general idea is to have a single “bootstrap” JavaScript file that we refer to as a “scout” file. This file is usually unique to each of our clients, and we keep it as small as we can. It’s usually included via a script tag (partly for ease of implementation), but it could be injected asynchronously as well.

** Including it as a script tag (synchronously) has the added benefit of resolving the DNS lookup for the domain in a single request, which should speed up subsequent requests to that domain in many browsers.

This scout file should have a very low cache max-age — in our case, somewhere in the ballpark of 0 to 5 minutes — depending on the need for control.

<script src="http://bazaarvoice.com/static/scout.js"></script>

EVERYTHING ELSE IS CACHED FOREVER.
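On the server, this split comes down to two Cache-Control policies. The helper below is a hypothetical sketch, not Bazaarvoice’s configuration: it assumes the scout file lives at /static/scout.js and that every other file is served from a versioned path such as /static/12345/, and it uses one year as the conventional stand-in for “forever.” The immutable directive is optional and tells supporting browsers not to revalidate.

```javascript
// Hypothetical: pick a Cache-Control header for each static path.
const SCOUT_MAX_AGE = 5 * 60;               // seconds; short, so new builds are noticed quickly
const FOREVER_MAX_AGE = 365 * 24 * 60 * 60; // one year, the usual stand-in for "forever"

function cacheControlFor(path) {
  if (path === "/static/scout.js") {
    return "public, max-age=" + SCOUT_MAX_AGE;
  }
  // Anything under /static/<version>/ never changes once published.
  if (/^\/static\/[^/]+\//.test(path)) {
    return "public, max-age=" + FOREVER_MAX_AGE + ", immutable";
  }
  return "no-cache"; // unexpected paths: revalidate every time
}
```

With Node’s built-in http module, for instance, you could call res.setHeader("Cache-Control", cacheControlFor(req.url)) before sending each static file.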

In this “scout” file, we kick off our real application by asynchronously injecting our built core JavaScript file. Concatenating this main application file and its dependencies is well understood and necessary; I’m not covering it in this post, but it is essential to good performance.

The trick is that, at build time, we embed a version number in our scout file. The asynchronous inject function then uses this version number as part of the file’s URL. This can be done by putting each build in a folder named after its version, or simply via URL rewrites.

I’d encourage you not to use the query string (GET params) for cache busting, as VPNs and some older browser setups can sometimes ruin the cache and get you into weird situations.

The effect of this setup is that we have a single, very small file that is constantly looking for updates (hence: “scout file”). It is generally still cached for the average ‘time on site’ of a consumer on one of our clients’ pages, but not much longer. However, if the build version doesn’t change, all the scripts requested after it will be cached forever, or until the version changes again.

If you don’t push a new build, a visitor can keep cached content from years prior and only has to download the “scout” file. But whenever you do run a build, their cache is busted, and they get all the new files within the scout file’s cache lifetime.

// An example of a simple scout file 
(function(){ 
  var VERSION = "12345";

  function injectScript(url){ /*… async script injection …*/} 

  injectScript("http://bazaarvoice.com/static/"+VERSION+"/widget.build.js"); 
})();
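The injectScript body is elided above. One common way to write such a function is to create an async script element and append it to the head; the version below is a generic sketch, not the body of Bazaarvoice’s scout file, and its optional doc parameter exists only so the function can be exercised without a real browser.

```javascript
// Generic async script injection (sketch).
function injectScript(url, doc) {
  doc = doc || document;
  var script = doc.createElement("script");
  script.src = url;
  script.async = true; // dynamically inserted scripts don't block page parsing
  var parent = doc.head || doc.getElementsByTagName("head")[0] || doc.documentElement;
  parent.appendChild(script);
  return script;
}
```

Inside the scout file it would be called exactly as in the example above, with the build version embedded in the URL.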

This obviously becomes more complex when you have multiple scripts loading. Bazaarvoice uses the require.js library, and its baseUrl configuration option is more or less the only thing we have to update to make this work for us. It may be more complicated in hand-rolled solutions, but it’s still well worth it.
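As a sketch of what that looks like, the snippet below points RequireJS at a versioned build folder. The URL layout mirrors the scout example above and is an assumption, as is the helper name versionedRequireConfig; requirejs.config and the baseUrl option are part of RequireJS itself.

```javascript
// Sketch: route every module request through the versioned build folder.
var VERSION = "12345"; // embedded at build time, as in the scout file

function versionedRequireConfig(version) {
  return {
    // Module IDs resolve relative to baseUrl, so bumping the version
    // sends all subsequent module requests to the new build.
    baseUrl: "http://bazaarvoice.com/static/" + version + "/"
  };
}

if (typeof requirejs !== "undefined") {
  requirejs.config(versionedRequireConfig(VERSION));
}
```

One caveat from RequireJS’s own rules: module IDs that end in .js, start with a slash, or contain a protocol bypass baseUrl, so those would need versioning separately.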

The numbers in the end are generally very much in your favor when taking average browsing habits into account. You end up with a few small non-cached hits, but a much better long tail of cacheability of the large application files — with the added benefit of quickly being able to force an update in the event of an emergency.

Conclusion

We glossed over reducing file size and concatenation in half of a paragraph, but I’d like to reiterate that this is just not optional if you care about performance. Your mechanism for doing this may vary, and we don’t mind that, as long as it works for you.

I would, however, like to point out that you should constantly test the performance of your app and identify your specific bottlenecks. So often I see people byte-shaving the license comments at the tops of their libraries while simultaneously sending improper cache headers.

The key to tackling performance is always tackling the bottleneck first. Then you’ll have a new bottleneck to tackle.

You can measure the network performance of all these techniques in your browser. Google Chrome (and other WebKit-based browsers), as well as the Firebug extension for Firefox, can give you great insight into the latency, load time, and size of your files in the Network tab. You can export this data as a .har file and import it into a variety of other evaluation tools if you need more info.
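If you would rather script the analysis, a .har export is plain JSON in the HAR 1.2 format. The hypothetical summarizeHar function below counts requests, totals response body bytes, and lists the slowest requests, which is a quick way to find the bottleneck worth tackling first. It relies only on the standard log.entries[].request.url, response.status, response.bodySize, and time fields.

```javascript
// Hypothetical sketch: summarize an exported HAR file to spot bottlenecks.
function summarizeHar(har, topN) {
  const entries = har.log.entries;
  let totalBytes = 0;
  for (const e of entries) {
    // Per HAR 1.2, bodySize is -1 when unknown and 0 for responses served from cache.
    if (e.response.bodySize > 0) totalBytes += e.response.bodySize;
  }
  const slowest = entries
    .slice() // copy so the caller's entry order is preserved
    .sort((a, b) => b.time - a.time) // entry.time is total elapsed ms
    .slice(0, topN)
    .map(e => ({ url: e.request.url, ms: e.time, status: e.response.status }));
  return { requests: entries.length, totalBytes: totalBytes, slowest: slowest };
}
```

In Node, for example: const har = JSON.parse(require("fs").readFileSync("page.har", "utf8")); console.log(summarizeHar(har, 5)); where page.har is whatever file you exported.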

At Bazaarvoice we’re tackling our performance issues so we can help give the best experience for our clients’ users.

Remember: once you have a responsible number of HTTP requests and total file size, cache as much as you can, and then cache more. It’s as simple as that.

Next

Stay tuned for information on parsing and evaluation, and on application responsiveness.

Bazaarvoice Platform API Tutorial

We recently released our new Bazaarvoice Platform API. This is a new RESTful API that allows access to much more data and provides responses in XML and JSON. We are really excited to see the types of applications our clients will be building on the API.

For a quick introduction to the API, we created the API Tutorial (you must be logged in to view it), which walks through creating a JavaScript-based widget for displaying Bazaarvoice content on any webpage. The tutorial explores the API’s JSONP response format and how it can be used with JavaScript templates to inject content onto the page.

You can read the API tutorial or grab the code from GitHub and start checking out the new API.
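To illustrate the template idea in general terms, here is a tiny string-template renderer. It is a sketch, not the tutorial’s code, and the field names Title, Rating, and UserNickname are placeholders rather than the Platform API’s documented schema. Escaping matters here because reviews are user-generated content.

```javascript
// Escape user-generated text before inserting it into HTML.
function escapeHtml(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, ch => map[ch]);
}

// Replace {{Name}} placeholders with escaped values; unknown names become "".
function renderTemplate(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? escapeHtml(values[name]) : "");
}

// Placeholder field names; adjust to the real response format.
const reviewTemplate =
  '<li class="review"><strong>{{Title}}</strong> ({{Rating}}/5) by {{UserNickname}}</li>';

function renderReviews(reviews) {
  return "<ul>" + reviews.map(r => renderTemplate(reviewTemplate, r)).join("") + "</ul>";
}
```

In a JSONP callback you would pass the array of reviews from the response to renderReviews and assign the returned string to a container’s innerHTML.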

Home Depot using the Platform API to show off their UGC

Clients like the Home Depot are using ideas from our Inspiration Gallery to find innovative ways to show off their user-generated content (UGC) and demonstrate the importance of listening to their customers.

The following image was taken at a Home Depot store managers’ meeting attended by all store managers as well as suppliers. It shows real-time reviews on a globe, each positioned with a Google Earth placemark based on the reviewer’s IP address.


Want to try it for yourself?

Click here to apply for an API key, then click here to download the reference app. Don’t forget to send us a picture or video of your app in motion.

Ember.js — the framework formerly known as SproutCore 2.0 (Amber.js)

Update: It looks like the name Amber conflicted with another JavaScript project, so the SproutCore folks have graciously decided to change the name to Ember.js. Check out the details at Yehuda’s blog.

The following is the original blog post:

Yehuda Katz has just officially announced the rebranding of SproutCore 2.0 to Amber.js.

This is refreshing for all the reasons he mentions in his post, especially when you’re trying to learn new things about the framework. SproutCore 2.0 gets confused with SproutCore 1.0 quite a bit, and I imagine finding answers about Amber.js will be a much easier experience!

Bazaarvoice’s newest product, Customer Intelligence, has been using SproutCore 2.0 since well before its release, and one of the primary developers on that team, Daniel Marcotte, spent a fair bit of time over several months working with the nascent framework and helping to get Beta 2 out the door. Look for an upcoming post where Daniel goes into much more detail on that process.
