In the previous post, I provided a rundown on what Bazaarvoice Labs is, our process and why it is important to have flexibility in our toolset choices. I now want to give you some tool examples in the following categories:
- Operational Tools
- Server-side Application Development Environments
- Data Storage and Management
- Client-side Tools
- Measurement Tools
Operational Tools
- Amazon EC2: Well, duh. I mentioned that we need to seamlessly transition from internal prototypes to live running pilots and by using EC2, Elastic Load Balancer and creating a set of mostly standardized AMIs, we’re able to get a machine up and running to demo a prototype or scale out to supporting hundreds of thousands of requests almost instantly. Key to our use of the EC2 is the fact that it has a very robust API and tools like boto so we can automate just about everything that we do. This is important since it’s well documented that EC2 instances can go up and down without rhyme or reason. Which brings me to my next operational tool…
- Cloudkick: We use Cloudkick for basic monitoring. Its UI is simple and it just plain works. Given how frequently we take services and applications up and down in EC2, it’s really nice to have an easily configurable, straightforward monitoring solution to rely on.
Server-side Application Development Environments
- Ruby on Rails and Django: While we’ve experimented with microframeworks like Flask, sometimes when you’re moving fast and prototyping, you don’t know exactly what you need or when you’re going to need it. You may not want to think about what ORM or templating language to use or want to re-invent how user sessions are handled and it’s times like these that a nice full-stack web application framework comes in handy. Why both though? Well, quite simply, some engineers on our team prefer Ruby and some (most) prefer Python. This is where our one engineer, one project comes in handy. We work with the tools that will make us fastest. Ultimately, if someone needs to step up and lend a hand on a project when someone is on vacation, we’re all polyglots and can get our hands dirty in any language or framework necessary. The Facebook apps referenced above were written in Rails and the very, very high traffic pilot that we ran with TurboTax was written with Django (as was our Customer Intelligence product).
- Node.js: The evented asynchronous server built on Google’s V8 Javascript engine. Node is a great tool to use when you’re building an application that needs to pull data in from multiple HTTP-based APIs and mash it together. Its performance is remarkable and it allows a developer to work in the same language in both the client and on the server. While some people think server-side JS is a fad, I think Node is leading a revolution in how people build and think about web applications. Please note that Node is so much more useful than for just building webapps. It can be used, for example as a very effective proxy as well (see Joe Stump’s answer about what technologies SimpleGeo uses on Quora). Data for Travelocity’s Social Connect Discovery pilot is served from Node.js backed with the Bazaarvoice Developer API and custom indices stored in Redis.
Data Storage and Management
- ElasticSearch: We’re no strangers to Lucene-based search and data stores at Bazaarvoice. Most of our core platform’s displays are backed by queries made to SOLR. However, unlike SOLR, ElasticSearch is schema free and therefore really nice to use for prototyping and pilots where you’re not sure of the kinds of data that you’ll be wanting to index. There are some gotchas with this approach but for Labs projects, we’ll take the flexibility it offers. As a side note, it’s amazing how often Lucene-based tools are left out of the NoSQL discussion (In fact, my colleague RC Johnson did a SXSWi presentation on this). The search functionality in our Ask and Answer for Facebook pilot with Nikon is driven out of ElasticSearch.
- MongoDB: We’ve used MongoDB in any number of Labs pilots at this point. Most notably, it drives the leaderboard and newsfeed functionality in our Ratings and Reviews for Facebook pilot with Benefit Cosmetics and also the majority of our new product discovery pilot application that we’re running with Sam’s Club in Facebook.
Client-side Tools
- Dust: Dust is a Javascript templating library well suited to asynchronous applications. We like Dust because it’s a flexible and easy to use templating language, it integrates well with server-side JS tools like Node and allows you to pre-compile your templates for great performance.
- Protovis: Protovis is an excellent visualization library. It’s declarative and very easy to build complex, interactive visualizations while still having a high degree of flexibilty over how those visualizations are rendered. We use Protovis to create what I believe are visualizations that are way beyond typical for an analytics tool in our Customer Intelligence product.
Measurement Tools
- Google Analytics: It’d be tough to tell where we’d be without Google Analytics. It’s got its obvious uses, but also has comprehensive APIs that allow you to call custom events, set variables and then suck the data back out as necessary. This allows us to track specific actions that a user takes and to set up funnels based on those actions (even when the actions are clicks within a page vs. full page views).
- Mixpanel: Mixpanel is a great alternative to Google Analytics. Many of our projects in Bazaarvoice Labs take the form of Javascript plugins or widgets that don’t conform to the traditional page-view-first mentality of most web analytics. Mixpanel focuses much more on tracking individual events that a user takes either in-page or across pages. Their API for doing this is very easy to use and it has the added benefit of being realtime which means you don’t have to wait a few hours to start seeing results from that code change that you just launched.
Of course, no project, prototype or pilot would get off the ground in Bazaarvoice Labs if we couldn’t get at our customer’s data. In order to maintain agility, all Bazaarvoice Labs projects are written as free-standing applications that are not part of our core application stack (a somewhat traditional J2EE application built on Spring MVC). Early on in Labs, even though we had direct access to our databases, we knew we needed to maintain separation between our core stack and Labs applications. Since we maintain a very complex set of business rules that are configurable on a per client basis around content submission and display, if we were to write directly to the databases, there’d be a high risk that we’d compromise data integrity. Generally, we’d use our existing XML API for submission (because it was obvious that trying to write data into the DBs from a separate application was a recipe for disaster) but we’d still use replicas of our core MySQL database clusters for display. That was okay but there were still some business logic mistakes made in the display of content (unacceptable when your pilot clients are some of the biggest online retailers around). In order to get around this, we created a new API that supported significantly higher degree of queryability, JSON and JSON-P data formats and had much lighter weight responses. This allows Bazaarvoice Labs to talk to our core data sets in a much more efficient manner and be assured that business rules are followed. This new API has now be productized as The Bazaarvoice Developer API. We will often create new, experimental method calls or create application local data indexes, but every single Bazaarvoice Labs project leverages this API heavily.
I hope I’ve given you a good overview of how Bazaarvoice Labs operates and the tools that keep us humming. It’s great to be able to work in an environment where exploration of new ideas and technologies are supported and encouraged. By operating the Bazaarvoice Labs team off-stack, it gives the Labs Engineers a chance not only to give input into what new products get built but what technologies get used to build them in a very low risk way.