In the previous post, I provided a rundown on what Bazaarvoice Labs is, our process and why it is important to have flexibility in our toolset choices. I now want to give you some tool examples in the following categories:
- Operational Tools
- Server-side Application Development Environments
- Data Storage and Management
- Client-side Tools
- Measurement Tools
Operational Tools
- Amazon EC2: Well, duh. I mentioned that we need to seamlessly transition from internal prototypes to live running pilots. By using EC2, Elastic Load Balancer and a set of mostly standardized AMIs, we’re able to get a machine up and running to demo a prototype, or scale out to support hundreds of thousands of requests, almost instantly. Key to our use of EC2 is the fact that it has a very robust API and tools like boto, so we can automate just about everything that we do. This is important since it’s well documented that EC2 instances can go up and down without rhyme or reason. Which brings me to my next operational tool…
- Cloudkick: We use Cloudkick for basic monitoring. Its UI is simple and it just plain works. Given how frequently we take services and applications up and down in EC2, it’s really nice to have an easily configurable, straightforward monitoring solution to rely on.
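To make the automation point concrete, here is a minimal sketch of the kind of check a script might run when instances disappear. It is an illustration only, not our actual tooling: the function name, the instance ids and the idea of a fixed desired count are all assumptions, and the real calls to EC2 (through boto, for example) are deliberately left out so that the logic stands on its own.

```python
from collections import Counter

# EC2 instance state names that count toward serving capacity.
HEALTHY_STATES = {"pending", "running"}


def plan_capacity(instance_states, desired_count):
    """Return how many instances to launch (positive) or retire (negative).

    instance_states maps a (hypothetical) instance id to its EC2 state name.
    """
    counts = Counter(instance_states.values())
    healthy = sum(counts[state] for state in HEALTHY_STATES)
    return desired_count - healthy


if __name__ == "__main__":
    fleet = {
        "i-aaa": "running",
        "i-bbb": "terminated",  # went away without warning
        "i-ccc": "pending",
    }
    # One instance short of the desired three, so one more should be launched.
    print(plan_capacity(fleet, desired_count=3))
```

A script along these lines would feed the result into whatever launches or terminates instances, which is the part the EC2 API makes easy to automate.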
Server-side Application Development Environments
- Ruby on Rails and Django: While we’ve experimented with microframeworks like Flask, sometimes when you’re moving fast and prototyping, you don’t know exactly what you need or when you’re going to need it. You may not want to think about which ORM or templating language to use, or to re-invent how user sessions are handled. It’s times like these that a nice full-stack web application framework comes in handy. Why both though? Well, quite simply, some engineers on our team prefer Ruby and some (most) prefer Python. This is where our one engineer, one project approach comes in handy. We work with the tools that will make us fastest. Ultimately, if someone needs to step up and lend a hand on a project when someone is on vacation, we’re all polyglots and can get our hands dirty in any language or framework necessary. The Facebook apps referenced above were written in Rails and the very, very high traffic pilot that we ran with TurboTax was written with Django (as was our Customer Intelligence product).
Data Storage and Management
- ElasticSearch: We’re no strangers to Lucene-based search and data stores at Bazaarvoice. Most of our core platform’s displays are backed by queries made to Solr. However, unlike Solr, ElasticSearch is schema-free and therefore really nice to use for prototyping and pilots where you’re not sure what kinds of data you’ll want to index. There are some gotchas with this approach but for Labs projects, we’ll take the flexibility it offers. As a side note, it’s amazing how often Lucene-based tools are left out of the NoSQL discussion (in fact, my colleague RC Johnson did a SXSWi presentation on this). The search functionality in our Ask and Answer for Facebook pilot with Nikon is driven out of ElasticSearch.
- MongoDB: We’ve used MongoDB in any number of Labs pilots at this point. Most notably, it drives the leaderboard and newsfeed functionality in our Ratings and Reviews for Facebook pilot with Benefit Cosmetics and also the majority of our new product discovery pilot application that we’re running with Sam’s Club in Facebook.
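As a rough illustration of what schema-free buys you when prototyping, the sketch below builds a newline-delimited body in the shape ElasticSearch’s bulk endpoint (`_bulk`) accepts, from documents that do not share the same fields. The index name, document ids and field names are made up for the example, and the HTTP call itself is left out. Some older ElasticSearch versions also expect a `_type` in the action line.

```python
import json


def build_bulk_body(index_name, documents):
    """Build a newline-delimited bulk request body.

    documents is an iterable of (doc_id, doc) pairs. Each document is
    preceded by an action line naming the target index, and the documents
    do not need to share the same set of fields.
    """
    lines = []
    for doc_id, doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a newline after the last line as well.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical documents with different shapes going into one index.
    docs = [
        ("q1", {"kind": "question", "text": "Does this lens fit my camera?"}),
        ("a1", {"kind": "answer", "question_id": "q1", "text": "Yes.", "helpful_votes": 3}),
    ]
    print(build_bulk_body("ask_and_answer", docs), end="")
```

No mapping has to be declared up front for the extra `helpful_votes` field, which is exactly the flexibility (and the source of some of the gotchas) mentioned above.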
Client-side Tools
- Protovis: Protovis is an excellent visualization library. It’s declarative and makes it very easy to build complex, interactive visualizations while still giving you a high degree of flexibility over how those visualizations are rendered. We use Protovis in our Customer Intelligence product to create what I believe are visualizations that go way beyond what’s typical for an analytics tool.
Measurement Tools
- Google Analytics: It’d be tough to tell where we’d be without Google Analytics. It’s got its obvious uses, but it also has comprehensive APIs that allow you to track custom events, set variables and then pull the data back out as necessary. This allows us to track specific actions that a user takes and to set up funnels based on those actions (even when the actions are clicks within a page vs. full page views).
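To show what a funnel over in-page actions actually computes, here is a small sketch. Google Analytics does this for you once the events are tracked; the code below is only an illustration of the idea, and the function name, user ids and action names are all hypothetical.

```python
def funnel_counts(events, steps):
    """Count how many users reached each funnel step, in order.

    events: iterable of (user_id, action) tuples in chronological order.
    steps: ordered list of action names that define the funnel.
    """
    progress = {}  # user_id -> index of the next step the user must complete
    for user_id, action in events:
        next_step = progress.get(user_id, 0)
        if next_step < len(steps) and action == steps[next_step]:
            progress[user_id] = next_step + 1
    return [
        sum(1 for reached in progress.values() if reached > i)
        for i in range(len(steps))
    ]


if __name__ == "__main__":
    steps = ["open_widget", "write_review", "submit_review"]
    events = [
        ("u1", "open_widget"),
        ("u1", "write_review"),
        ("u1", "submit_review"),
        ("u2", "open_widget"),
        ("u2", "submit_review"),  # out of order, so it does not advance u2
        ("u3", "write_review"),   # u3 never entered the funnel
    ]
    print(funnel_counts(events, steps))
```

Because each step is an event rather than a page view, the same approach works for clicks that never leave the page.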
Of course, no project, prototype or pilot would get off the ground in Bazaarvoice Labs if we couldn’t get at our customers’ data. In order to maintain agility, all Bazaarvoice Labs projects are written as free-standing applications that are not part of our core application stack (a somewhat traditional J2EE application built on Spring MVC). Early on in Labs, even though we had direct access to our databases, we knew we needed to maintain separation between our core stack and Labs applications. We maintain a very complex set of business rules around content submission and display that are configurable on a per-client basis, so if we were to write directly to the databases, there’d be a high risk that we’d compromise data integrity. Generally, we’d use our existing XML API for submission (because it was obvious that trying to write data into the DBs from a separate application was a recipe for disaster) but we’d still use replicas of our core MySQL database clusters for display. That was okay, but there were still some business logic mistakes made in the display of content (unacceptable when your pilot clients are some of the biggest online retailers around). In order to get around this, we created a new API that supported a significantly higher degree of queryability, offered JSON and JSON-P data formats and had much lighter-weight responses. This allows Bazaarvoice Labs to talk to our core data sets in a much more efficient manner and be assured that business rules are followed. This new API has now been productized as The Bazaarvoice Developer API. We will often create new, experimental method calls or create application-local data indexes, but every single Bazaarvoice Labs project leverages this API heavily.
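For readers unfamiliar with the difference between the two formats, here is a minimal sketch of how an endpoint might serve the same payload as JSON or JSON-P. This is not the Bazaarvoice Developer API’s implementation; the function name, the payload and the callback-name rule are assumptions made for the example.

```python
import json
import re

# Allow only simple, optionally dotted JavaScript identifiers as callbacks,
# so that a caller cannot inject arbitrary script through the parameter.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*")


def render_response(payload, callback=None):
    """Return (content_type, body): plain JSON, or JSON-P if a callback is given."""
    body = json.dumps(payload)
    if callback is None:
        return "application/json", body
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name")
    # JSON-P wraps the JSON in a function call that a <script> tag can execute.
    return "application/javascript", f"{callback}({body});"


if __name__ == "__main__":
    reviews = {"total": 1, "results": [{"rating": 5}]}  # hypothetical payload
    print(render_response(reviews))
    print(render_response(reviews, callback="handleReviews"))
```

The wrapping is what lets a page on another domain load the data through a script tag, which matters when the consuming application runs entirely in the browser.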
I hope I’ve given you a good overview of how Bazaarvoice Labs operates and the tools that keep us humming. It’s great to be able to work in an environment where exploration of new ideas and technologies is supported and encouraged. Operating the Bazaarvoice Labs team off-stack gives the Labs engineers a very low-risk way to give input not only into what new products get built but also into what technologies get used to build them.