Scoutfile: A module for generating a client-side JS app loader

A couple of years ago, my former colleague Alex Sexton wrote about the techniques that we use at Bazaarvoice to deploy client-side JavaScript applications and then load those applications in a browser. Alex went into great detail, and it’s a good, if long, read. The core idea, though, is pretty simple: an application is bootstrapped by a “scout” file that lives at a URL that never changes, and that has a very short TTL. Its job is to load other static resources with long TTLs that live at versioned URLs — that is, URLs that change with each new deployment of the application. This strategy balances two concerns: the bulk of application resources become highly cacheable, while still being easy to update.

In order for a scout file to perform its duty, it needs to load JavaScript, load CSS, and host the config that says which JS and CSS to load. Depending on the application, other functionality might be useful: the ability to detect old IE; the ability to detect DOM ready; the ability to queue calls to the application’s methods, so they can be invoked for real when the core application resources arrive.

At Bazaarvoice, we’ve been building a lot of new client-side applications lately — internal and external — and we’ve realized two things: one, it’s very silly for each application to reinvent this particular wheel; two, there’s nothing especially top secret about this wheel that would prevent us from sharing it with others.

To that end, I’m happy to release scoutfile as an NPM module that you can use in your projects to generate a scout file. It’s a project that Lon Ingram and I worked on, and it provides both a Grunt task and a Node interface for creating a scout file for your application. With scoutfile, your JavaScript application can specify the common functionality required in your scout file — for example, the ability to load JS, load CSS, and detect old IE. Then, you provide any code that is unique to your application that should be included in your scout file. The scoutfile module uses Webpack under the hood, which means you can use loaders like json! and css! for common tasks.

The most basic usage is to npm install scoutfile, then create a scout file in your application. In your scout file, you specify the functionality you need from scoutfile:

// Pull in the scoutfile modules this application needs.
var App = require('scoutfile/lib/browser/application');
var loader = require('scoutfile/lib/browser/loader');

// The json! loader bundles the deployment config into the scout file.
var config = require('json!./config.json');
var MyApp = App('MyApp');

MyApp.config = config;

// Load the versioned application resources named in the config.
loader.loadScript(config.appJS);
loader.loadStyleSheet(config.appCSS);

Next, you can generate your scout file using a simple Node script:

var scout = require('scoutfile');
scout.generate({
  appModules: [
    {
      name: 'MyApp',
      path: './app/scout.js'
    }
  ],

  // Specify `pretty` to get un-uglified output.
  pretty: true
}).then(function (scout) {
  console.log(scout);
});

The README contains a lot more details, including how to use flags to differentiate production vs. development builds; how to configure the Grunt task; how to configure the “namespace” that is occupied on window (a necessary evil if you want to queue calls before your main application renders); and more.

There are also several open issues to improve or add functionality. You can check out the developer README if you’re interested in contributing.

Analyzing our global shopper network (part one)

Every holiday season, the virtual doors of your favorite retailer are blown open by a torrent of shoppers who are eager to find the best deal, whether they’re looking for a Turbo Man action figure or a ludicrously discounted 4K flat screen. This series focuses on our Big Data analytics platform, which is used to learn more about how people interact with our network.

The challenge

Within the Reporting & Analytics group, we use Big Data analytics to help some of the world’s largest brands and retailers understand how to most effectively serve their customers, as well as provide those customers with the information they need to make informed buying decisions. The amount of clickstream traffic we see during the holidays – over 45,000 events per second, produced by 500 million monthly unique visitors from around the world – is tremendous.

In fact, if we reserved a seat at the Louisiana Superdome for each collected analytics event, we would fill it up in about 1.67 seconds. And, if we wanted to give each of our monthly visitors their own seat in a classic Beetle, we’d need about 4.64 times the total number produced between 1938 and 2003. That’s somewhere in the neighborhood of a hundred million cars!

Fortunately for us, we live in the era of Big Data and high scalability. Our platform, which is based on the principles outlined in Nathan Marz’s Lambda architecture design, addresses the requirements of ad-hoc, near real-time, and batch applications. Before we could analyze any data, however, we needed a way to reliably collect it. That’s where our in-house event collection service, which we named “Cookie Monster,” came into the picture.

Collecting the data

When investigating how clients would send events to us, our engineers knew that the payload had to fit within the query string of an HTTP GET request. They settled upon a lightweight serialization format called Rison, which expresses JSON data structures, but is designed to support URI encoding semantics. (Our Rison plugin for Jackson, which we use to process Rison-encoded events, is available on GitHub.)
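
To make the format concrete, here is a minimal sketch of decoding a Rison payload with Jackson. The RisonFactory class name and the sample payloads are assumptions based on the plugin’s documented usage, not code lifted from Cookie Monster:

import com.bazaarvoice.jackson.rison.RisonFactory; // assumed class name from the Rison plugin
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class RisonExample {
  public static void main(String[] args) throws Exception {
    // Rison is compact and URI-friendly; the JSON equivalent of this payload is
    // {"type":"PageView","attrs":{"page":"home","ref":"email"}}
    String payload = "(type:PageView,attrs:(page:home,ref:email))";

    // A batched payload is simply a Rison array of events: !((type:PageView),(type:Click))

    // The plugin supplies a drop-in JsonFactory, so the familiar ObjectMapper API works unchanged.
    ObjectMapper rison = new ObjectMapper(new RisonFactory());
    JsonNode event = rison.readTree(payload);
    System.out.println(event.get("attrs").get("page").asText()); // prints "home"
  }
}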

In addition, we decided to implement support for client-side batching logic, which would allow a web browser to send multiple events within the payload of a single request. By sending fewer requests, we reduced the amount of HTTP transaction overhead, which minimized the amount of infrastructure required to support a massive network audience. Meanwhile, because each browser only needed to send a single request, end users saw a performance benefit as well.

Because the service itself needed a strong foundation, we chose the ubiquitous Dropwizard framework, which accelerated development by providing the basic ingredients needed to create a maintainable, scalable, and performant web service. Dropwizard glues together Jetty (a high-performance web server), Jersey (a framework for RESTful web services), and Jackson (a JSON processor).
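
As a rough illustration (not Cookie Monster’s actual code), a collection endpoint in a Dropwizard service might look something like the sketch below; the path, parameter name, and EventCollector interface are hypothetical stand-ins:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;

@Path("/events")
public class EventResource {

  // Hypothetical hand-off point; the real EventCollector (described below) feeds the ring buffer.
  public interface EventCollector {
    void submit(String risonPayload);
  }

  private final EventCollector collector;

  public EventResource(EventCollector collector) {
    this.collector = collector;
  }

  // Events arrive as GET requests with a Rison-encoded payload in the query string.
  @GET
  public Response collect(@QueryParam("data") String risonPayload) {
    collector.submit(risonPayload); // return immediately; the heavy lifting happens off the request thread
    return Response.noContent().build();
  }
}

Depending on the Dropwizard version, a resource like this would be registered with the environment in the Application (or Service) class, e.g. via environment.jersey().register(...).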

[Diagram: BDAP - Cookie Monster Event Collection]

Perhaps most importantly, we used the Disruptor library’s ring buffer implementation to facilitate very fast inter-thread messaging. When a new event arrives, it is submitted to the EventQueue by the EventCollector. Two event handler classes, which listen for ring events, ensure that the event is delivered properly. The first event handler acts as a producer for Kafka, publishing the event to the appropriate topic. (Part two of this series will discuss Kafka in further detail.)

The second is a “fan out” logging sink, which applies a modulus to specific event metadata and delivers each event to the appropriate logger. At the top of every hour, the previous hour’s batch logs are delivered to S3, and then consumed by downstream processes.
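
A rough sketch of that wiring, assuming the Disruptor 3.x API, is shown below; the event class, buffer size, and handler bodies are illustrative, with the real handlers wrapping a Kafka producer and the hourly batch loggers:

import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class EventPipelineSketch {

  // Mutable holder that lives in a pre-allocated ring buffer slot.
  static class EventSlot {
    String payload;
  }

  public static void main(String[] args) {
    // Stand-ins for the two handlers described above: a Kafka producer and a logging sink.
    EventHandler<EventSlot> kafkaHandler =
        (slot, sequence, endOfBatch) -> System.out.println("publish to Kafka topic: " + slot.payload);
    EventHandler<EventSlot> loggingHandler =
        (slot, sequence, endOfBatch) -> System.out.println("append to hourly batch log: " + slot.payload);

    // Pre-allocated, power-of-two-sized ring buffer; both handlers see every event independently.
    Disruptor<EventSlot> disruptor =
        new Disruptor<>(EventSlot::new, 1024, DaemonThreadFactory.INSTANCE);
    disruptor.handleEventsWith(kafkaHandler, loggingHandler);
    RingBuffer<EventSlot> ringBuffer = disruptor.start();

    // The collector publishes by copying each payload into the next free slot.
    ringBuffer.publishEvent((slot, seq, payload) -> slot.payload = payload, "(type:PageView)");

    disruptor.shutdown();
  }
}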

In the real world

When building Cookie Monster, we knew that our service would need to maintain as little state as possible, and accommodate the volatility of cloud infrastructure.

Because EC2 is built on low-cost, commodity hardware, we knew that we couldn’t “cheat” with sophisticated hardware RAID – everything would run on machines that were naturally prone to failure. In our case, we deemed those trade-offs acceptable, as our design goals for a distributed system aligned perfectly with the intent of EC2 auto-scaling groups.

Even though the service was designed for EC2, there were a few hiccups along the way, and we’ve learned many valuable lessons. For example, the Elastic Load Balancer, which distributes HTTP requests to instances within the Auto Scaling group, must be “pre-warmed” before accepting a large volume of traffic. Although that’s by design, it means that good communication with AWS prior to deployment must be a crucial part of our process.

Also, Cookie Monster was designed prior to the availability of EBS-optimized instances and provisioned IOPS, which allow for more consistent performance of an I/O-bound process when using EBS volumes. Even in today’s world, where both of those features could be enabled, ephemeral (i.e. host-local) volumes remain a fiscally compelling – if brittle – alternative for transient storage. (AWS generally discourages the use of ephemeral storage where data loss is a concern, as such volumes are prone to failure.)

Ultimately, our choice to deploy into EC2 paid off, and it allowed us to scale the service to stratospheric heights without a dedicated operations team. Today, Cookie Monster remains an integral service within our Big Data analytics platform, successfully collecting and delivering many billions of events from all around the world.

Open sourcing cloudformation-ruby-dsl

CloudFormation is a powerful tool for building large, coordinated clusters of AWS resources. It has a sophisticated API, capable of supporting many different enterprise use cases and scaling to thousands of stacks and resources. However, there is a downside: the JSON interface for specifying a stack can be cumbersome to manipulate, especially as your organization grows and code reuse becomes more necessary.

To address this and other concerns, Bazaarvoice engineers have built cloudformation-ruby-dsl, which turns your static CloudFormation JSON into dynamic, refactorable Ruby code.

https://github.com/bazaarvoice/cloudformation-ruby-dsl

The DSL closely mimics the structure of the underlying API, but with enough syntactic sugar to make building CloudFormation stacks less painful.

We use cloudformation-ruby-dsl in many projects across Bazaarvoice. Now that it’s proven its value, and gained some degree of maturity, we are releasing it to the larger world as open source, under the Apache 2.0 license. It is still an early-stage project, and may undergo some further refactoring prior to its v1.0 release, but we don’t anticipate major API changes. Please download it, try it out, and let us know what you think (in the comments below, or as issues or pull requests on GitHub).

A big thanks to Shawn Smith, Dave Barcelo, Morgan Fletcher, Csongor Gyuricza, Igor Polishchuk, Nathaniel Eliot, Jona Fenocchi, and Tony Cui, for all their contributions to the code base.

Output from bv.io

Looks like everyone had a blast at bv.io this year! Thanks go out to the conference speakers and hackathon participants for making this year outstanding. Here are some tweets and images from the conference:


https://twitter.com/bentonporter/status/451362916181090304

HTTP/RESTful API troubleshooting tools

As a developer I’ve used a variety of APIs, and as a Developer Advocate at Bazaarvoice I help developers use our APIs. As a result, I am keenly aware of the importance of good tools and of using the right tool for the right job. The right tool can save you time and frustration. With the recent release of the Conversations API Inspector, an in-house web app built to help developers use our Conversations API, it seemed like the perfect time to survey tools that make using APIs easier.

The tools

This post is a survey covering several tools for interacting with HTTP-based APIs. In it I introduce the tools and briefly explain how to use them. Each one has its advantages, and all do some combination of the following:

  • Construct and execute HTTP requests
  • Make requests other than GET, like POST, PUT, and DELETE
  • Define HTTP headers, cookies and body data in the request
  • See the response, possibly formatted for easier reading

Firefox and Chrome

Yes, a web browser can be a tool for experimenting with APIs, so long as the API request only requires basic GET operations with query string parameters. At our developer portal we embed sample URLs in our documentation where possible to make seeing examples super easy for developers.

Basic GET

http://api.example.com/resource/1?passkey=12345&apiversion=2

Some browsers don’t necessarily present the response in a format easily readable by humans. Firefox users already get nicely formatted XML. To see similarly formatted JSON, there is an extension called JSONView. To see the response headers, LiveHTTP Headers will do the trick. Chrome also has a version of JSONView, and for XML there’s XML Tree. Both browsers offer built-in consoles that provide network information like headers and cookies.

cURL

The venerable cURL is possibly the most flexible while at the same time being the least usable. As a command-line tool, some developers will balk at using it, but cURL’s simplicity and portability (*nix, PC, Mac) make it an appealing tool. cURL can make just about any request, assuming you can figure out how. These tutorials provide some easy-to-follow examples, and the man page has all the gory details.

I’ll cover a few common usages here.

Basic GET

Note the use of quotes.

$ curl "http://api.example.com/resource/1?passkey=12345&apiversion=2"

Basic POST

Much more useful is making POST requests. The following submits data the same as if a web form were used (the default Content-Type: application/x-www-form-urlencoded). Note that the quoted string passed to -d is sent as the request body.

$ curl -d "key1=some value&key2=some other value" http://api.example.com/resource/1

POST with JSON body

Many APIs expect data formatted in JSON or XML instead of encoded key=value pairs. This cURL command sends JSON in the body by using -H 'Content-Type: application/json' to set the appropriate HTTP header.

$ curl -H 'Content-Type: application/json' -d '{"key": "some value"}' http://api.example.com/resource/1

POST with a file as the body

The previous example can get unwieldy quickly as the size of your request body grows. Instead of adding the data directly to the command line you can instruct cURL to upload a file as the body. This is not the same as a “file upload.” It just tells cURL to use the contents of a file as the request body.

$ curl -H 'Content-Type: application/json' -d @myfile.json http://api.example.com/resource/1

One major drawback of cURL is that the response is displayed unformatted. The next command line tool solves that problem.

HTTPie

HTTPie is a Python-based command-line tool similar to cURL in usage. According to the GitHub page, “Its goal is to make CLI interaction with web services as human-friendly as possible.” This is accomplished with “simple and natural syntax” and “colorized responses.” It supports Linux, Mac OS X, and Windows, and handles JSON, uploads, and custom headers, among other things.

The documentation seems pretty thorough, so I’ll just cover the same examples as with cURL above.

Basic GET

$ http "http://api.example.com/resource/1?passkey=12345&apiversion=2"

Basic POST

HTTPie assumes JSON as the default content type. Use --form to indicate Content-Type: application/x-www-form-urlencoded.

$ http --form POST api.example.com/resource/1 key1='some value' key2='some other value'

POST with JSON body

The = is for strings and := indicates raw JSON.

$ http POST api.example.com/resource/1 key='some value' parameter2:=2 parameter3:=false parameter4:='["http", "pies"]'

POST with a file as the body

HTTPie looks for a local file to include in the body after the < symbol.

$ http POST api.example.com/resource/1 < resource.json

Postman Chrome extension

My personal favorite is the Postman extension for Chrome. In my opinion it hits the sweet spot between functionality and usability by providing most of the HTTP functionality needed for testing APIs via an intuitive GUI. It also offers built-in support for several authentication protocols, including OAuth 1.0. There are a few things it can’t do because of restrictions imposed by Chrome, although there is a Python-based proxy to get around that if necessary.

Basic GET

The column on the left stores recent requests so you can redo them with ease. The results of any request will be displayed in the bottom half of the right column.

[Screenshot: Postman GET request]

Basic POST

It’s possible to POST files, application/x-www-form-urlencoded data, and your own raw data.

[Screenshot: Postman POST request]

POST with JSON body

Postman doesn’t support loading a request body from a local file, but doing so isn’t necessary thanks to its easy-to-use interface.

[Screenshot: Postman POST request with a JSON body]

Runscope.com

Runscope is a little different from the others, but no less useful. It’s a web service instead of a tool, and it isn’t open source, although they do offer a free option. It can be used much like the other tools to manually create and execute various HTTP requests, but that is not what makes it so useful.

Runscope acts as a proxy for API requests. Requests are made to Runscope, which passes them on to the API provider and then passes the responses back. In the process, Runscope logs the requests and responses. At that point, to use their words, “you can view the request/response details, share requests with others, edit and retry requests from the web.”

Below is a quick example of what a Runscopeified request looks like. Read their official documentation to learn more.

before: $ curl "http://api.example.com/resource/1?passkey=12345&apiversion=2"
after: $ curl "http://api-example-com-bucket_key.runscope.net/resource/1?passkey=12345&apiversion=2"

Conclusion

If you’re an API consumer, you should use some or all of these tools. When I’m helping developers troubleshoot their Bazaarvoice API requests, I use the browser when I can get away with it and switch to Postman when things start to get hairy. There are other tools out there; I know because I omitted some of them. Feel free to mention your favorite in the comments.

(A version of this post was previously published at the author’s personal blog)

BV I/O: Nick Bailey – Cassandra

Every year Bazaarvoice holds an internal technical conference for our engineers. Each conference has a theme, and we invite noted experts in fields related to that theme to give presentations. The latest conference was themed “unlocking the power of our data.” You can read more about it here.

Nick Bailey is a software developer at DataStax, the company that develops commercially supported, enterprise-ready solutions based on the open source Apache Cassandra database. In his BV I/O talk, he introduces Cassandra, discusses several useful approaches to data modeling, and presents a couple of real-world use cases.