SELECT developers FROM Bazaarvoice UNION SELECT flags FROM Stripe_CTF;

Stripe (https://stripe.com/) held their second capture the flag event, this time the CTF was dedicated to web-based vulnerabilities and exploits. As a new Security Engineer here at BV the timing of this was perfect. It allowed me to use it as a vehicle for awareness and to ramp up curiosity, interest and even excitement for web application security (webappsec).

Let’s start with some definitions to get us all on the same page.

Capture the flag is a traditional outdoor game where two teams each have a flag and the objective is to capture the other team’s flag, located at the team’s base. In computer security the flag is a piece of data stored somewhere on a vulnerable computer and the goal is to use tools and know how to “hack” the machine to find the flag.

Webappsec is the acronym for Web Application Security a branch of Information Security that deals specifically with security of websites and web applications. In my opinion there is more to it than just that. As we see from the OWASP Top 10 (https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project) and the emerging DevOps movement, web developers are doing more than just coding. They are now configuring full stacks to support their applications. So, webappsec needs to encompass more than just the application. It has to deal with developer and management education, configuration management, networking security, OS hardening, just to name a few areas, and of course not to forget the code.

Now that we know what CTF and webappsec are lets move on to the details of the event.

The CTF started on Wednesday, August 22nd, 2PM CDT today at 2PM CDT and ran for a week until Wednesday, August 29th, 2012 2PM CDT. One week == Seven Days == 168 hours and, for me at least, this was not enough. This capture the flag included 9 levels (0-8) to conquer before arriving at the actual flag. Each level was progressively more difficult and when completed provided you with the password for the next level.

The CTF was very challenging, yet no security specific training/knowledge was needed in order to participate. In my opinion the minimum requirements to participate and find it fun and enjoyable was some knowledge of web application programming and a willingness to learn and research possible vulnerabilities and potential exploits.

In total about 15 or so Bazaarvoice cohorts tried their hand at the CTF. Four actually captured the flag! When did they have the time? Well, we used our 20% time while at work and personal time when away from work. Late nights, lunch hours, weekend, etc.…believe me once you get started you are hooked. You end up thinking about it all day. You think about the level you are on, how to solve it and whether the previous levels offer up any hints or possible staging areas for potential exploits. If you are anything like me, first a developer at heart then a security professional, this type of activity is a great way to test out your chops and have some fun while learning new things and potentially bringing some of the lessons back to your teams. This is the perfect avenue for awareness – code wins after all!

Here are quotes from some of the BV participants:

I got to 8 last night. I don’t think I’ll get it in the next 4 hours, but it was a fun challenge. – Cory

Made it to level 8 last night, but not until after midnight. Don’t think I’ll be finishing, but it was fun all the same. Level 8 is really interesting, and I look forward to hearing how folks finished it. – RC

I’ve been having a lot of fun with it. – Jeremy

Overall everyone that tried it had a good time with it no matter how far they actually got.

All in all a total of four BVers completed the CTF and captured the final flag!

 

Congratulations to the following BVers for successfully capturing the flag:

 

 

For their last CTF Stripe made available the solutions and a working VM with each level. I hope they do the same for this one; this will give everyone the opportunity to learn and see what vulnerabilities were found and how they were exploited in order to complete each level and eventually capture the flag.

For now I leave you with a quote which I think embodies web application security, its education and use in the development community:

Secure programming is a mind-set. It may start with a week or two of training, but it will require constant reinforcement. – Karl Keller, President of IS Power

As with many things here at Bazaarvoice, the education and growth of our developers and their skills often take on a fun and enjoyable approach.

Platform API release notes, version 5.3

We are pleased to announce that the following functionality has been developed for version 5.3:

  • Hosted authentication – email
  • Feedback submission for comments
  • RatingDistribution (Histogram data) and SecondaryRatingsAverages added to review statistics
  • Time zone changed to UTC
  • Error codes added to form errors
  • Syndication attribution on reviews

More detailed information on each of these items is listed below. For complete documentation, refer to the Platform API documentation, version 5.3.

Hosted authentication – email

Hosted email authentication can be used during submission to confirm the identity of a content submitter. When submitting content for the first time, a user receives an email containing a link. When the link is clicked, the user is directed to a landing page that calls back to the API to confirm their identity. This call results in the generation of an encrypted user token that can be used in subsequent submission calls. Depending on your configuration, the submitter’s content might not be accepted until the confirmation call is submitted. In order to use this feature, you must have hosted authentication enabled for your submission process. If you need more information, read the “Bazaarvoice hosted authentication reference guide” and the submission method documentation for details on the required parameters.

Feedback submission for comments

Feedback submission for review comments and story comments is now supported in addition to the existing support for feedback submission on reviews, questions, answers, and stories. For complete documentation, see the Feedback Submission method page.

RatingDistribution and SecondaryRatingsAverages added to review statistics

New RatingDistribution and SecondaryRatingsAverages blocks have been added to the ReviewStatistics block. You can now see the distribution of ratings for each product, which allows you to construct a rating histogram. You can also see the average rating of your secondary rating dimensions for reviews in relation to products and authors.

Time zone changed to UTC

The API now returns all time data using UTC (+00:00) to avoid the confusion of multiple time zones. The date format has not changed.

Error codes added to form errors

The API response has been updated to return error codes in addition to the existing error message for all form errors. A complete listing of the error codes can be found in each submission method.

Syndication attribution on reviews

All reviews have an “isSyndicated” field set to true or false. If the review is syndicated, a SyndicationSource block is displayed with details of where the review is being syndicated from. Syndicated content can only be returned if the API key is configured to show syndicated content.

Platform API Release Notes, Version 5.2

We are pleased to announce that the following functionality has been developed for version 5.2:

  • Helpfulness and inappropriate content feedback submission enabled
  • ContentLocale no longer filtered implicitly by default
  • Product and category attributes populated as a map
  • Hosted video submission and display updated
  • Inline ratings data exposed for product-based review statistics

More detailed information on each of these items is listed below. For complete documentation, refer to the Platform API Documentation, version 5.2.

Helpfulness and inappropriate feedback

Submission of helpfulness votes and inappropriate feedback can now be done through the API. The response for the user-generated content has also been updated to display the actual inappropriate feedback and total vote and feedback counts. In order for inappropriate feedback to be populated, the API key must be updated.

Implicit default ContentLocale filter removed

There is no longer any implicit ContentLocale filter if none is specified as an argument. If no filter is provided, all content will be returned, regardless of what locale the content is in. There is a default locale defined for every API key. Prior to version 5.2, if the locale parameter was used, it caused an implied ContentLocale filter to be used.

Note that version 5.2 does not change the behavior of explicitly supplied ContentLocale filters. In addition, you can now ask for labels in any locale and specify a different content locale. Therefore, if you request a locale of en_US and a ContentLocale of fr_FR, you get English labels and French content.

Product and category attributes

Products and categories now have a new attributes field populated. This field contains a map of attributes provided to Bazaarvoice from a product feed import.

Hosted video submission and display

Video elements of all content now contain URLs that can be used to embed the video into an HTML page. Bazaarvoice provides boilerplate HTML tags for use with embedding these videos. For more information, see the API Basics page.

Inline ratings data

A new method has been created to provide a quick way to access inline ratings data for products. For complete documentation, see the Statistics Display method page.

Scaling on Mysql: Sometimes you’ve gotta break a few eggs

We recently delivered this presentation titled “How to Scale Big on MySQL? Break a Few Rules!” as part of Database Week here in New York City. The presentation is a lighthearted, and informative take on how Bazaarvoice Engineering has been able to take MySQL to billions of requests per month. The slides and video are available over at LeadIt.us. In the presentation we cover denormalization, query planning, partitioning, MySQL replication, InfoBright’s take on a data storage, and thinking beyond the RDBMS. Overall the presentation is a little over an hour, and is littered with great questions. I had a great time delivering the presentation, and I got a lot of very good feedback so I hope it proves useful for you as well.

MongoDB Arrays and Atomicity

Over time, even technologies that are tried and true begin to show their age. This is especially true for data stores as the shear amount of data explodes and site traffic increases. Because of this, we are continually working with new technologies to determine whether they have a place in our primary stack.

To that end, we began using MongoDB for one of our new internal systems several months ago. Its quick setup and ease of use make it a great data store for starting a new project that’s constantly in flux. Its robustness and scalability make it a worthy contender for our primary data store, if it should prove capable of handling our use cases.

Being traditionally a MySQL shop, our team is used to working with databases in a certain way. When we write code that interacts with the DB, we use transactions and locks rather liberally. Therefore using MongoDB is a rather significant paradigm shift for our developers.

Though MongoDB doesn’t have true ACID compliance, it does support atomic updates to an individual object. And because a “single” object in Mongo can actually be quite complex, this is sufficient for many operations. For example, for the following complex object, setting values, incrementing numbers, adding and removing items from an array, can all be done at the same time atomically:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {firstName: "Bobby", lastName: "Schmidt"}}
);

Even setting values of child objects can be done atomically:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"position.title": "Software Developer", "position.id": 1015}}
)

In fact, individual items in an array can be changed:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"employmentHistory.0.company": "Google, Inc."}}
);

However, one thing you may notice is that the code above to modify a specific item in the array requires us to know the specific index of the item in the array we wish to modify. That seems simple, because we can pull down the object, find the index of the item, and then change it. The difficulty with this is that other operations on an array do not necessarily guarantee that the order of the elements will be the same. For example, if an item in an array is removed and then added later, it will always be appended to the array. The only way to keep an array in a particular order is to $set it (replacing all the objects in it), which for a large array may not have the best performance.

This problem is best demonstrated with the following race conditions:

One solution to this problem is to $pull a particular item from the array, change it, and then $push it back onto the array. In order to avoid the race condition above, we decided that this was a safer solution. After all, these two operations should be doable atomically.

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$pull: {badges: {type: "TOP_EMPLOYEE"}}, $push: {badges: {type: "TOP_EMPLOYEE", date: null}}}
);

So, what’s the problem? The issue is that MongoDB doesn’t allow multiple operations on the same property in the same update call. This means that the two operations must happen in two individually atomic operations. Therefore it’s possible for the following to occur:

That is bad enough, but it can be dealt with by updating an object only if it is in the correct state. Unfortunately, the following can happen as well:

That looks very odd to the reader, because at one moment it would see the item, then the next read would lose it, and then it would come back. To an end user that’s bad enough, but to another system that is taking actions based on the data, the results can be inconsistent at best, and deleterious at worst.

What we really want is the ability to modify an item (or items) in an array by query, effectively $set with $pull semantics:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$update: {badges: {$query: {type: "TOP_EMPLOYEE"}, $set: {date: null}}}}
);

Since that’s not yet supported by MongoDB, we decided to use a map. This makes a lot of sense for our use case since the structure happened to be a map in Java. Because MongoDB uses JSON-like structures, the map is an object just like any other, and the keys are simply the property names of the object.

This means that individual elements can be accessed and updated from the map:

db.employees.update(
  {_id: ObjectId("4e5e4c0e945a66e30892e4e3")},
  {$set: {"badges.TOP_EMPLOYEE.date": null}}
);

This seems like a rather elegant solution, so why didn’t we start out this way? One pretty major problem: if you don’t know all of the possible keys to the map, there is no good way to index the fields for faster querying. Indexes to child objects in MongoDB are created using dot notation, just like the query syntax. This works really well with arrays because MongoDB supports indexing fields of objects in arrays just like any other child object field. But for a map, the key is part of the path to the child field, so each key is another index:

// Simple fields
db.employees.ensureIndex({lastName: 1, firstName: 1})

// Child object fields
db.employees.ensureIndex({"position.id": 1})

// Array object fields
db.employees.ensureIndex({"employmentHistory.company": 1})

// Map fields, treating map values like child objects
db.employees.ensureIndex({"badges.TOP_EMPLOYEE.date": 1})

For our purposes, we are actually able to index every key individually, and treat the map as an object with known properties.

The sinister part of all of this is that you won’t normally run into any problems while doing typical testing of the system. And in the case of the pull/read/push diagram above, even typical load testing wouldn’t necessarily exhibit the problem unless the reader is checking every response and looking for inconsistencies between subsequent reads. In a complex system, where data is expected to change constantly, this can be a difficult bug to notice, and even more difficult to track down the root cause.

There are ways to make MongoDB work really well as the primary store, especially since there are so many ways to update individual records atomically. However, working with MongoDB and other NoSQL solutions requires a shift in how developers think about their data storage and the way they write their query/update logic. So far it’s been a joy to use, and getting better all the time as we learn to avoid these types of problems in the future.

Third-Party Front-end Performance: Part 1

Bazaarvoice is a third-party application provider. We have a growing number of applications running on our own domain, but our core business is providing user-generated content applications and widgets that are hosted by us, but run on our clients’ webpages. Scaling an application platform of our size certainly has its challenges at the data layer. We use all the coolest noSQL tools and post up big request per seconds numbers every month, but an oft forgotten part of the story lives long after all of that is cached and delivered:

On the front end.

Steve Souders, the guy who (literally) wrote the book on web performance, estimates that 80% of average page load times actually occur after the markup is completely downloaded. That means a 50% speed up in your front-end code is going to mean a lot more than a 50% speed up in your backend code. This is important stuff!

I’d like to go through some of our front-end performance considerations and explain the optimizations that we take, or the ones we’d like to start taking soon.

I recently read that “the history of web development is just increasingly difficult ways of concatenating strings together.” So we’ll start at that point: We’ve got everything ready to go on the server and now we have to put it on client’s site and make it work.

From this point, we’ll identify three distinct areas of the front-end performance:

  • The network
  • Parse and evaluate
  • Application responsiveness

This post will cover The Network and the other two will be posted as continued parts.

The Network

The network is by far the most commonly discussed area of optimization on the client. That’s for good reason, though, as it’s often the bottleneck. There is really no great ‘secret’ to making this fast, though. The smaller your payload is and the fewer amount of files that you request, the faster your page will generally be. There are lots of articles on reducing image sizes, and spriting css, and you should read those, but I wanted to focus on some less commonly talked about techniques that are especially important in third party applications.

The only non-obvious part of this equation to me (in regards to 3rd party code) is the caching as well as the cacheability of your resources. There are a few unique problems when doing this as a third party, but more or less, the higher percentage of cacheable static resources that make up your app, the more opportunity you have to make the network problem a lot less scary/slow. There are whole books and companies who specialize in this stuff, but we’ll go through at a high level.

— Obligatory Wu-tang Cache Rules Everything Around Me reference. —

Edge caching

One of the easiest wins is using a CDN that implements what’s known as ‘edge-caching.’ This is the practice of a putting a distributed system of cache servers around the world (or country, etc) and serving content from the geographically closest server to each request. This not only decreases load on individual servers, but it also vastly improves latency of requests.

Note that edge-caching is still not quite cached on a user’s computer, but still on a server in the CDN. Even if you have amazing cacheability in your app, you can still benefit from edge-caching for the initial load of the application (when the cache isn’t primed).

Bazaarvoice does this with almost all of its content, even relatively dynamic content. We are one of the largest users of the Akamai CDN, and well over 90% of our total traffic actually just hits Akamai servers and never makes it back to our origin servers.

You may be familiar with something like Amazon S3 to serve your static files. S3 is not distributed by default, but adding on the CloudFront option to your Amazon S3 buckets will achieve many of the same results (assuming you serve from the CloudFront URLs).

Maximum Cacheability

The content in most web apps changes frequently. However, the application really only changes when new code is written and pushed (read: hopefully far less frequently). In an ideal world, everything except for the data would be cached for a length of time (in the end-user’s browser).

The larger the percentage of your application that you can statically generate, the more you can cache.

This seems pretty easy at a glance, but can actually get a bit hairy in a third party environment. Due to the fact that the application is included via JavaScript, the markup for the page is not generally served in the initial request and is injected afterwards instead. This differs from normal webapps that can send their rendered markup along with the initial page request.

Dynamic Content

Currently, Bazaarvoice’s template system integrates with our Java backend, so after the static resources are loaded, we make a request for the rendered content of the page we’re on. This comes back to us as a string, which is saved to a javascript variable.

// simplified example - uncacheable data and templates
var html = "<ul id=\"reviews\"><li>review 1</li><li>review 2</li></ul> ";

Exactly how this works is unimportant, but the key is that this entire additional request is not cacheable (at least for any significant amount of time). Since the data in the rendered templates changes, we not only have to send the fresh data on each request, we also have to send up all the templates each time, multiplied times the number of times they are invoked. Given a significant amount of data, this can add quite a bit of weight to these requests. GZip can help quite a bit here with repeated data, but if you also factor in all the escaping characters to save these as strings in JavaScript, you can see where this could reduce performance by increasing the file size of noncacheable requests.

For the most part, in practice, we don’t serve so much data in an individual request to be effected by this problem on an alarming level, but it’s always nice to optimize further, and we’d love to save the money on bandwidth costs.

In our simple example above, we had the added duplication of a single set of <li> tags. The solution to only sending that once and caching it after the initial load is client-side templating.

If we are able to include our templates in our cached, static resources, you only need to request them once. So even though the library to render them has some cost, it usually pays off quickly (often still on the initial load).

// cacheable template
var tmpl = "<ul id=\"reviews\">{{#each reviews}}<li>{{name}}</li>{{/each}}</ul>";

// uncacheable data 
{ reviews: [{ name: "review 1" }, { name: "review 2" }] };

We’ll get into the actual performance and responsiveness of the app in a later post, but note that in production, these templates can actually be ‘pre-compiled’ into javascript functions that run very quickly (more or less as quickly as string concatenation can take you). Also in production, the templates and data are much larger, so this reduction pays off at scale.

In a third-party application that can cache its templates alongside the application code, the data (usually JSON) is the only unique and the only non-cached bytes going over the wire after the initial load. This is great. I’ll leave optimizing data payloads to a different blog post, but just make sure you only grab the minimum amount of data that you need.

For extreme performance (and SEO to boot!), a third party tool could also integrate the markup injection on the server-side. The markup would go in the initial payload and be instantly visible upon page load (great perceived performance). You would have to write your JavaScript in a way that could bind to pre-rendered templates, but it’s all possible. This is usually a more difficult thing to get clients to agree to. You end up duplicating the markup in loops, like in our first example, but you also save an http request.

Since the client can cache the request for every end user on their app server this ends up being by far the fastest integration in practice. This is a good complement for a high-performance third-party system, even if it can be tough to get clients to agree to.

Data Caching too!

If you want to go even further (and I very much encourage it), you may want to look into caching your data too. This wouldn’t do much for new page loads, but it could really help for common widgets across pages, or reloads/revisits of the same page. Anything that doesn’t take a significant amount of code that can reduce origin requests is probably going to be worth it eventually at scale.

Consider a “Most Recently Reviewed Products” widget that goes across every page on a site. This widget would list 5 products or so. The recency of the products in the list is important, but the exact accuracy is likely not important within a few minute period (unless you’re an unusually high-review-volume webpage). Unfortunately, though, people usually visit quite a few pages in that same time period.

As a third-party, you’re more than likely relying on JSONP to request this information for the widget. Each subsequent page load is highly cached (static resources) because you’re doing all the things we’ve mentioned previously, but you’re forced to re-request this redundant data each time.

I know what you’re thinking…

“Since JSONP injects a script tag under the hood, it is just a GET request for a “javascript file” that can be cached by the browser.” — you

While this is technically correct, it’s usually difficult to actually realize in the real-world for a couple of reasons.

At Bazaarvoice, and in many (read: most of the ones I’ve worked with) APIs, all requests are sent with a ‘no-cache’ header. This stops all caching from occuring. If you do have an API that allows browser caching you’re one step closer, but still usually have another issue.

Most people are using jQuery or other AJAX libraries in order to make JSONP requests to the server. Unfortunately for fans of API caching, these libraries almost always bust your cache, either intentionally, or unintentionally.

This code:

$.ajax({
  url : "http://bestreviewapiever.com/api", 
  data : {"filter": "recent"}, 
  dataType : "jsonp" 
});

actually results in a request to:

http://bestreviewapiever.com/api?filter=recents&callback=jQuery171984098_234234&_=98723498723

And every time you request it, those last two numbers will change. Even if you set the cache option to false – you still end up with unique callback values in each request. This will still bust your cache.

There’s a touch too much code to include here, but the solution is to set cache to false as well as override the jsonpCallback with a value that stays the same. However, you must then manage callback collisions yourself. That has some significant code behind it.

If you successfully implement all that, this means that, as long as edge cases don’t arise, the url will be the same each time, and the jsonp requests would be cache-hits for the cache TTL of the api server.

Another approach that we are in the process of testing and implementing at Bazaarvoice is utilizing localStorage and sessionStorage.

There is not a lot of detail needed, but in browsers that support these DOM Storage utilities, you can write logic to check them for data before making JSONP requests, and invalidate the data after a given time. Make sure you clean up old data when you invalidate, because space is limited, but this is a nice solution, because the fallback is just the natural case.

Currently I’d suggest using sessionStorage with an appropriate TTL on your data. That way it would get cleaned up automatically at the end of the ‘session.’ This solution seems a lot more elegant to me than api cache-headers and JSONP overrides.

Now, after the first request for our data, each subsequent page load has a cache-hit (for our allotted interval) and saves a request from occurring. FASTER!

BONUS TIP!!

If you have very real-time data, don’t count out this method.

One very slick ‘realtime-y’ touch is to have visibly updating data. In our example, even if we wanted to make an uncached request each time, we could instantly display the old data on page load from sessionStorage, wait until all the critical page elements were fully loaded, as to not block their speed, and then load in the new data. If the data was updated, you can put in a smooth update transition to let the user know that your app is awesome and real time. Best of both worlds!

Cache-busting, or “how long do I cache?”

“But Alex! How long do I cache my resources? If I cache them too long, then when I update my app, everyone will have different files!” — you, again.

Very insightful of you. In Bazaarvoice’s current system, each file has a relatively low cache max-age (it varies, but usually between no-cache and ~2 hours). This means that we can be sure that all the files that people have after 2 hours of pushing a change will be the most current one.

This actually ends up being fairly practical. For the most part, people are done browsing a site well before their cache runs out. Then the next time they just have to get it again one time. But we can do better! 😀

We’re working on a new system for caching that affords us more update control, and much higher cache-hit rates. Win-win!

The general idea is to have a single “bootstrap” JavaScript file that we refer to as a “scout” file. This file is usually unique for each our clients, and kept as small as we can keep it. It’s usually included via a script tag (partially for ease of implementation), but could be injected asynchronously, as well.

** Including it as a script tag (synchronously) has the added benefit of caching the dns lookup for the domain in a single request, which should actually speed up subsequent requests in many browsers.

This scout file should have a very low cache max-age — in our case, somewhere in the ballpark of 0 to 5 minutes — depending on the need for control.

<script src="http://bazaarvoice.com/static/scout.js"></script>

EVERYTHING ELSE IS CACHED FOREVER.

In this “scout” file, we kick off our real application by asynchronously injecting our built core javascript file. Concatenation of this main application file and its dependencies is well understood and necessary, so I’m not covering it in this post, but it is essential to good performance.

The trick is that at build time, we embed a version number into our scout file. Then the asynchronous inject function uses this version number as part of the request for the file. This can be done by putting each build in a folder with its version as the name of the folder, or it can just be done via url-rewrites.

I would encourage you to not use the query string (get params) in order to do cache busting, as VPNs and other old browser situations can sometimes ruin the cache, and get you into weird situations.

The affect of this setup is that we have a single, very small file that is constantly looking for updates (hence: “scout file”). It generally is still cached for the average ‘time on site’ of a consumer on one of our clients’ pages, but not much longer. However, if the version of the build doesn’t change, then all the requested scripts after that will be cached forever, or until the version changes again.

If you don’t push a new build, a visitor can still have cached content from years prior and only have to download the “scout” file, but at any point in time, if you run a build, their cache is busted and they get all the new files within the amount of time the scout file was cached for.

// An example of a simple scout file 
(function(){ 
  var VERSION = "12345";

  function injectScript(url){ /*… async script injection …*/} 

  injectScript("http://bazaarvoice.com/static/"+VERSION+"/widget.build.js"); 
})();

This obviously becomes more complex when you have multiple scripts loading. Bazaarvoice uses the require.js library, and its baseUrl configuration option is more or less the only place we end up having to update to make this work for us. It may be more complicated in more hand-rolled solutions, but still very worth it.

The numbers in the end are generally very much in your favor when taking average browsing habits into account. You end up with a few small non-cached hits, but a much better long tail of cacheability of the large application files — with the added benefit of quickly being able to force an update in the event of an emergency.

Conclusion

We glazed over the reducing file-size and concatenation section in half of a paragraph, but I’d like to reiterate that this is just not optional if you care about performance. Your mechanism for doing this may vary, and we don’t mind that, as long as it works for you.

I would however like to point out that you should be constantly testing the performance of your app and identifying your bottlenecks specifically. So often I see people byte-shaving licenses at the tops of their libraries while simultaneously sending improper cache-headers.

The key to tackling performance is always tackling the bottleneck first. Then you’ll have a new bottle neck to tackle.

You can measure network performance of all these techniques in your browser. Google Chrome (and other webkit based browsers) as well as the Firebug extension on Firefox can give you great insight in to the latency, time-to-load, and file size of your files (in the Network tab). You can export this data as a .har file and then import it into a variety of other evaluation tools if you need more info.

At Bazaarvoice we’re tackling our performance issues so we can help give the best experience for our clients’ users.

Remember: after you have a responsible amount of http requests and total file size, cache as much as you can, and then cache more. It’s as simple as that.

Next

Stay tuned for information on Parsing and evaluation, and application responsiveness.

Bazaarvoice Platform API Tutorial

We recently released our new Bazaarvoice Platform API. This is a new RESTful API that allows access to much more data and provides responses in XML and JSON. We are really excited to see the types of applications our clients will be building on the API.

For a quick introduction to the API, we created the API Tutorial (you must be logged in to view) that walks through creating a Javascript-based widget for displaying Bazaarvoice content on any webpage. The tutorial explores the JSONP response format of the API and how it can be used with Javascript templates to inject content onto the page.

You can read the API tutorial or go grab the code from Github and start checking out the new API.

Home Depot using the Platform API to show off their UGC

Clients like the Home Depot are using ideas from our Inspiration Gallery to find innovative ways to show off their user-generated content (UGC) and demonstrate the importance of listening to their customers.

The following image is taken from a Home Depot Store Managers meeting which had all store managers as well as suppliers in attendance. It shows real-time reviews on a globe using a Google Earth placemark based on the reviewer’s IP address.

homedepot_earth_app

Want to try it for yourself?

Click here to apply for an API key, then click here to download the reference app. Don’t forget to send us a picture or video of your app in motion.

Ember.js — the framework formerly known as SproutCore 2.0 (Amber.js)

Update: It looks like there was a conflict with the name Amber with another JavaScript project, so the SproutCore folks have graciously decided to change their name to Ember.js so as not to be in conflict. Check out the details at Yehuda’s blog.

The following is the original blog post:

Yehuda Katz has just officially announced the rebranding of SproutCore 2.0 to Amber.js.

This is refreshing for all the reasons he mentions in his post, especially when attempting to learn new things about the framework. It gets confused with SproutCore 1.0 quite a bit, and I can imagine trying to find answers about Amber.js will be a much easier experience!

Bazaarvoice’s newest product, Customer Intelligence, has been using SproutCore 2.0 since well before its release, and one of the primary developers on that team, Daniel Marcotte, spent a fair bit of time over several months working with the nascent framework and helping to get Beta 2 out the door. Look for an upcoming post where Daniel goes into much more detail on that process.

Will it be usable over dualip? (Vista always started downloading stuff)Will it be possible to turn off indexing? (Can’t concentrate on my work with the chattering the whole time)

Daniel Marcotte on SproutCore 2.0 (now Ember.js)

Daniel Marcotte is one of the top developers on our latest product offering, Customer Intelligence (CI). He spent quite a bit of time evaluating SproutCore 2.0 for the complex user interface requirements of CI, helping to work out some of the kinks in the product as it moved toward beta.

Daniel has recently moved to San Francisco to work on a new project — still for Bazaarvoice, of course! Before he left, I had a chance to sit down with him to chat about his experience working with the framework.

MDN: Welcome, Daniel! Thanks for letting me take some time out of your hectic schedule to get you on record, so to speak.

DM: Ah, my pleasure, Michael. Thank you for doing all the hard work.

MDN: No, no, not at all. You did all the hard work already — I’m just gonna put my name on it!

DM: I’m a team player; you go right ahead!

MDN: Awesome! So let’s get right down to it. You switch seamlessly between front-end and back-end development, and you’re very passionate about the user experience. A lot of developers in the industry want to work on back-end systems and don’t seem to get as excited about building great UIs. So, what do you think contributes to your passion? I’m really interested to hear what drives you because it also tends to drive the Web UI frameworks people choose.

DM: Yeah, that’s fair. I’ve actually given this a little bit of thought because my new manager asked me something about this recently. Something like, ‘Wow, you know, you’re really a UI guy. That’s great; how did you get into that?’ And I had kind of a funny moment, thinking it might be fruitful to start thinking of myself as a UI guy because then it’ll focus me on taking time to even go the extra level. Maybe read some stuff about user experience and even more wacky technologies. Because, I hadn’t traditionally done that. So how did I even end up in this position where someone would think, ‘of course you’re a UI guy’?

I think it’s because I don’t wanna do a shitty job. So if you’ve been tasked with building a UI feature, it needs to work. And it needs to be attractive. It needs to be usable. You need to put yourself in the user’s shoes.

MDN: It’s amazing, but most people don’t think that way. It seems like you actually know a little bit more. Have you actually done any reading and learning about any of this?

DM: I have, yeah…some. But nothing extensive. It’s a bit intuitive which I think gets people relatively far. And I think it has a lot to do with just being honest with yourself, of trying to put yourself in someone’s shoes maybe.

I have a bit of a teaching background. It’s one of my great pleasures. (Let’s ramble a little bit.) And one of my favorite things was being able to find a way to explain something to someone who it was completely new and novel to. To never forget what that was like to learn it for the first time. ‘Cause you can get lost and be an expert and think, ‘oh, it’s obvious, it’s fine.’ And I think that bleeds into UI sometimes. ‘Oh, I know how to use this.’ And that’s not the bar we’re trying to meet here.

MDN: You were instrumental in choosing SproutCore 2.0 for CI. What other Web UI frameworks did you and the team look at before deciding on Sproutcore?

DM: I looked at what Dojo was offering because it has the same full app/infrastructure stuff. It was a touch inscrutable, but really appealing. But it kind of seemed to go down the Ext-JS packaged-widget route. And I knew that Alex Zelenak, our UX designer, would come in and do a lot of custom stuff. And the other big driver, though I’m perceived as choosing this, I was really just the person who made it possible. It was really Alex Sexton that said, ‘this is the one; this is it for these reasons.’ So that kind of focused it. And it sort of became the one, and it needed to lose for some reason as opposed to beat something else.

MDN: Did Alex talk about Handlebars at all, because I know that he’s using it in other places.

DM: Oh, that’s interconnected. You use Handlebars templating inside SproutCore.

MDN: I’m just showing my ignorance!

DM: It’s like a specialized version of Handlebars that’s bindings-aware. So as your values change, your templates re-render.

MDN: You talked about this a little bit, but does SproutCore have widgets too?

DM: No. They’d like to get there, but it’d be like a UI layer. This is more like the structure/spine of the app.

MDN: So other than Alex’s stamp of approval, what are the other reasons you might use it in the future?

DM: I like the way it makes me think. I like the flavor of thinking it lends itself to. This disconnect with the bindings…. (Let me see if I can package this a little bit.) Trying to wrangle web UI using HTML, CSS, JavaScript without making the classic disaster, the classic mess, has been a mission of mine for a while. What’s the answer? What’s the secret? How do we make this maintainable and obvious to the next developer? Extensible, that sort of stuff.

And one of the big realizations I had a while ago was, with Don’t Repeat Yourself, the DOM, it’s the boss in so many things. I have a list of things I’m showing the user, and I don’t wanna duplicate in JavaScript having another list of things and try to synchronize those two. It’s much more a matter of saying something like… DOM: you’re the boss. You’re displaying it. You have these nodes that represent my things. So if I need to reevaluate, I’m just gonna ask you, ‘Hey dude, what’s my list look like?’ OK, I’m gonna modify it inside you ’cause I wanna display it. ‘Cause you’re the boss. And so in practice this works out really well.

MDN: ‘Cause that’s kind of how JQuery works because you have this separate set of code over here and it’s overlayed on top of the DOM.

DM: Yeah exactly. But for some richer apps and much more complex apps, asking the DOM those questions gets more complicated ’cause stuff is kinda all over the place. And so you end up with these really bonkers selectors. And it’s slow to be constantly asking the DOM these questions to see what’s going on with it. So then you start to pull back and say, ‘Oh, crap, I wish I had an object representation of this in my hand!’ Is it worth writing this connective tissue? And so now, enter SproutCore, and the app is sort of pure JS and then you bind it to the UI and you don’t maintain that connection; SproutCore does. Which is cool.

So the classic collections example that ties into the list example earlier, where before the DOM was the boss, now javaScript’s the boss. I have this list and I’m gonna monkey with it. Using observers and bindings, SproutCore’s gonna hold onto the fact that the list corresponds to something in the DOM, that’s rendered by a template. When the data changes, the template is called to re-render that piece.

MDN: So SproutCore owns the DOM then.

DM: Effectively, and it can do smart stuff like buffer your updates. So it’ll just be like manually updating the DOM, but will be a lot faster. Then you kinda get into their flavor of MVC. SproutCore MVC gets quite appealing where you can really build an app that does its thing and then you can write unit tests against that. And when you bind the UI on it, that’s just another view. But independently, you have this app that you can validate does its thing and is independent of how it’s displayed. You do have to make sure you structure it carefully enough that you’re not getting into circular references in your bindings, but it’s super-appealing that your templates are HTML and you just bind them in with the app and there’s a lot of really great ideas. It’s technically very satisfying.

MDN: And do you just bind it on top of a div element, because you’re going to have to put it in somewhere, right? So there’s some ID where you say, hey put this over here in this spot?

DM: You have a script block that has a type “text/html” or “text/x-handlebars”. And one of the first things SproutCore does is it looks at the page and any script block of that type, if it has an ID on it, then it assumes you’re gonna reference this later and so it will just pull it out of the DOM and store it compiled. Otherwise, it’ll replace it inline. You get to write very naturally, right inline, if you’re not reusing that component. It’s got a nice flow to it where it will just replace that piece with the markup it corresponds to.

MDN: Did you run into any problems with SproutCore’s architecture as you were building out the system at all?

DM: So, to even decide if we wanted to do this, I threw together a prototype that recreated what the pilot’s UI — the interested parts — did. And I was really able to see the value because we’re managing different views on the same data. And the user gets to toggle displayed data and interact with what they’re seeing. And so this idea of an app that will hold that state and where you can just swap out these views, it’s great because you can just say “app, here’s some data, do your thing” and now you’re not fetching from the server again and again if you just wanna re-slice something or have a different look at it, you have all that power easily in the client side.

So that was all going really well, but then I’d run into mundane stuff. Like: you’ve sent in a collection and you’re running over it and the designer wants to number each row. Perfectly reasonable request, but the context that you have in the collection rendering is just the collection item that you’re on. So do you artificially start adding indexes to your objects externally? Or do you say we should really have access to the index? And so that was kinda fun; I submitted a patch which exposed the current index when rendering a collection. Another example is that Tapestry [the server side web framework in this project] hosed us because you have almost no access to the head of the documents that it generates, which in general isn’t a problem, but in one of the early versions of SproutCore, it was only looking for stored templates in the head, which was an artificial restriction.

MDN: But not really any big pains or roadblocks. Nothing you couldn’t work around or fix.

DM: There was naturally a bit of instability in a few of the alpha versions I was using. And then there’s a tougher learning curve that goes with that: you can imagine being unable to distinguish between a flaw in understanding and a flaw in the library. But that’s come a long way.

There are also things that it really super-facilitates incredibly, but as with a lot of these things, it can mean limiting you in other ways, like there being hoops to jump through to integrate with say a select control.

And then we got into trouble where other developers started building out what I had there, and it wasn’t obvious that this code worked differently than they were used to. I hadn’t quite gotten it polished to the point of “this is how it’s done and you grow it from here and follow these patterns.” I hadn’t quite gotten there. The schedule comes in and you end up with this problem. So there was a bit of: “I wanna do this thing, and I can picture that vision in the ‘JQuery way.’ Well now, what’s the ‘SproutCore way’ of doing exactly that?” And you end up kind of cheating and creating artificial properties that you can bind to and kind of tricking it into doing things the ‘JQuery way’ instead of doing what is really the ‘SproutCore way’. And that’s really challenging.

MDN: Well, it’s the speed of sort of doing it versus doing it in a way that it’s gonna be good later.

DM: Yeah. Do you want the speed now, or more of it later? And that’s a really hard choice to make. And I’ve been ruminating on whether that’s a flaw on our part, or on the library’s part. How do we keep pushing that stuff forward and make it even better? I’m not sure.

MDN: You made a pretty concerted effort to get SproutCore out of beta and into mainstream for release 2.0. So, you actually fixed a couple of issues. How did that process go? How did you work with the team and get them to take your patches?

DM: Contributing to open source had been mysterious to me for a long time. And I’ve met a lot of people who are also like, ‘oh I want to contribute to open source, but I kinda like don’t know how. And IRC makes me cross-eyed.’ What do you do? In a lot of ways, this was just sort of really kinda lucky — a nice sort of confluence of events. And in my mind some of the things that made it possible was actually meeting Tom Dale and Yehuda Katz (who are great, by the way) and shaking their hands [at TXJS]. And they don’t have to remember it, but that’s kinda your way in later to politely say, ‘Hey, remember when I expressed interest in your framework? I’m using it and I don’t need anything from you. I’m just sharing good news. I’m not after you for anything.’

And then, GitHub makes a lot of this brilliantly easier. You make your fork, you monkey around, you learn something. Figuring out how Github works is pretty straightforward. You still see some trolling and joke pulls and this sort of stuff, but it’s kind of neat and friendly once you get your head around it. So then to contribute to a project: Keep it small. Keep it focused. Explain why. Write a unit test for whatever you’re putting up. And then you just post it, and then you wait.

I also started a bit of not-constant (it was a real priority for me not to bother these guys) but still regular e-mail correspondence with Tom Dale, so I think for my initial patches, the SproutCore guys were able to connect them with me. So I think it sort of gets you in the door that you’re not just some random unknown pull request. And then it was pretty easy. It was super satisfying to get even simple messages like, ‘Hey, thanks!’ And then they merge in your code. and that’s pretty cool.

Being able to take some time to work on that at BV is brilliant. People knew that I was investigating some of this stuff and there was a lot of support. It was cool. And John [Daniel’s manager] got a lot of satisfaction seeing my patches going in and that sort of stuff. So yeah, it was really cool.

I still think navigating all this is kind of hard. It’s social. You get self-conscious someone’s probably going to be a dick to you. Someone’s probably going to ignore you ’cause you’re nobody. But this experience was really good so it’s worth trying more. And I would encourage other people too: find something you’re passionate about, some little tool you use you wish worked a bit different. You’re probably bright enough at this point; you can actually figure out how to make a useful change, even just for yourself initially.

MDN: What kind of support did you get from the rest of the R&D organization here? You mentioned you could do some of it during work hours, which is great.

DM: I just tried to not be stupid about it. As long as I was being productive and watching my diminishing returns level, I was free to put in the time I needed. The VP of Engineering said we should at least try and do something interesting. He wanted to know “Is that feasible?” And so that’s the kind of support. I’m like yeah, sure, I believe that it’s feasible and I have a vision for walling it off so we’re not betting the whole product on it. I can try, but you’ve just introduced some risk that we may have to pay for, we may have to mitigate. And he was fully supportive, and so when that directive to try something fresh comes from that high up, that’s kind of the definition of support. You know what I mean? And I felt like I had the freedom to at any time make the call that we were going to bail on that course of action (the diminishing returns thing). I don’t think anyone would have turned on me and been like, ‘what the hell, Daniel, you didn’t even use that technology. You spent all that time on it.’ I think they would have been like, ‘Thank you for actually evaluating this and making a technical decision.’

MDN: So, would you recommend SproutCore 2.0 for everybody?

DM: I have no idea. It was still in its infancy during this project, so “everybody” seems a bit strong. But I really enjoyed using it, and I’m looking forward to using it more and watching it grow.

MDN: Thanks Daniel, for taking the time to chat with me! It was very educational.

 

Daniel brought in his laptop and showed me https://github.com/dmarcotte/ember-chess. Specifically, we walked through his very informative presentation in just a few minutes: https://github.com/dmarcotte/ember-chess/blob/master/presentation.js.

 

Thank you for sharing this interview. I’m evaluating ember now, and it really excites me that we are finally seeing an appeal to structure and proper separation of concerns on the client-side in JavaScript.

I too am a “hybrid” developer of sorts, with the user experience being at the forefront of my passion and enjoyment. It’s a good place to be nowadays; I remember not too long ago when the art of UI was not getting the respect it deserves. I think Steve Jobs did a good job of helping correct that.

Anyway, having a fellow UX afficianado come off a successful project with ember is pretty encouraging. I’m now pretty confident I’m heading down the right path.

P.S. ctrl + h, sproutcore, tab, ember, enter 😉