BV I/O: Unlocking the power of our data

At Bazaarvoice, we strongly believe that our people are our most important asset. We hire extremely smart and passionate people, let them loose on complex problems, and watch all the amazing things they create. We had another opportunity to see that innovation engine in full effect last week at our internal technical conference and two-day hackathon.

Every year we hold an internal technical conference for our engineers and technical community. If you are lucky enough to have been at Bazaarvoice, you remember our conference last year called BV.JS, which was all about front-end UI and JavaScript; in years past we held Science Fairs. Last year at BV.JS we were focused on redesigning our consumer-facing ratings and reviews product (Conversations), so we gathered some amazing JavaScript gurus such as Paul Irish (@paul_irish), Rebecca Murphey (@rmurphey), Andrew Dupont (@andrewdupont), Alex Sexton (@SlexAxton) and Garann Means (@garannm) to school us on all the latest in JavaScript.

This year our event was called BV.IO and we focused on “unlocking the power of our data”, so we asked some great minds in big data analytics and data visualization to come inspire our engineering team.

The event kicked off with a day at the Alamo Drafthouse. Bazaarvoice is powered by tacos, so of course there were tons of breakfast tacos to get us ready for a fun-filled day of learning and mind-opening presentations. There was also a colorful pants competition, but I digress and will get to that in a minute.

First up was Adrian Cockcroft (@adrianco), cloud architect from Netflix. We are big fans of Netflix’s architecture and we use and have added to several of their open source packages. Some of the projects we use are Curator, Priam and Astyanax. Adrian gave us an update on some of the new advancements in Netflix’s architecture and scale as well as details on their new open source projects. Netflix is also running an open source competition called NetflixOSS and they have some cool prizes for the best contributions to their projects. The competition is open until September 15, 2013, so get coding.

Jason Baldridge (@jasonbaldridge), Ph.D. and associate professor in Computational Linguistics at the University of Texas, presented on scaling models for text analysis. He shared some really interesting insights into things that can be done with geotagged, temporal, and toponym data. Nick Bailey (@nickmbailey), an engineer at DataStax, presented on Cassandra best practices, new features, and some interesting real-world use cases. And Peter Wang (@pwang), Co-founder and President of Continuum Analytics, gave a really entertaining talk about planning and architecting for big data as well as some interesting Python packages for data analysis and visualization.

Ok, and now back to the most important aspect of the day, the Colorful Pants Competition. Qingqing, one of our amazing directors of engineering, organized this hilarious competition. Can you guess who was the winner?
[Photo: Colorful Pants Competition]

We really enjoyed all the speakers, and we know that you will too, so we will be sharing their presentations on this blog in the coming days and weeks.

Check back regularly for the videos.

Jolt released to the world

We are pleased to announce a new open source contribution, a Java-based JSON-to-JSON transformation tool named Jolt.

Jolt grew out of a BV Platform API project to migrate the backend from Solr/MySQL to Cassandra/ElasticSearch. That migration meant we were going to be doing a lot of data transformations from the new ElasticSearch JSON format to the BV Platform API JSON format.

Prior to Jolt, there were three general strategies for doing JSON-to-JSON transforms:

  1. Convert to XML, use XSLT, convert back to JSON
  2. Use your input JSON and a template language to build your output JSON
  3. Write custom code

Those options were rather unpalatable, so we went with option “4”, write reusable custom code.

The key insight was that there are actually separable concerns when doing a transform, and that part of the reason the XSLT or template approaches are unpalatable is that they force you to deal with them all together.

Jolt tackles each separate concern individually:

  1. Identify the pieces of the input data that you care about and place them in the output JSON
    • Jolt provides a transform, “shift”, that has its own JSON based declarative DSL (domain specific language)
  2. Make sure the output JSON looks correct (apply defaults to the output JSON)
    • Jolt provides a transform, “default”, with its own JSON based declarative DSL
  3. Handle all the JSON text formatting (commas, closing curly brackets, etc.)
    • Jolt operates on “hydrated” JSON data (Map<String,Object> and List<Object>) and leverages the Jackson library to handle serialization / JSON text formatting
  4. Verify the transform for data and format correctness
    • Jolt provides a test tool called Diffy so that you can unit test your transforms for data and format correctness
    • For format correctness, this is not as good an answer as an XML DTD, but you could pull in JSON Schema if you wanted
  5. Perform arbitrary custom data manipulations like adding fields together or performing date conversions
    • Jolt provides an interface where you can implement your own custom logic to be run in series with the other transforms
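
To make the first two concerns concrete, here is a minimal sketch in Python rather than Java. It is not Jolt's DSL or API: the dotted-path spec format, the function names, and the field names (rating.primary.value, Rating, RatingRange, Labels) are all assumptions invented for illustration. It only shows the idea of shifting values on hydrated data, then applying defaults, and leaving the text formatting to a JSON library.

```python
import json


def get_path(data, path):
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(data, path, value):
    """Create intermediate dicts as needed and store value at the dotted path."""
    keys = path.split(".")
    node = data
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def shift(source, spec):
    """Concern 1: copy the input values we care about to their output locations."""
    out = {}
    for src_path, dst_path in spec.items():
        value = get_path(source, src_path)
        if value is not None:
            set_path(out, dst_path, value)
    return out


def apply_defaults(target, defaults):
    """Concern 2: fill in keys that the shift step left missing."""
    for key, default in defaults.items():
        if isinstance(default, dict):
            existing = target.setdefault(key, {})
            if isinstance(existing, dict):
                apply_defaults(existing, default)
        else:
            target.setdefault(key, default)
    return target


def transform(raw_json):
    """Concern 3: parse and serialize with the json module, never by hand."""
    hydrated = json.loads(raw_json)
    # Hypothetical specs; a real transform would load these from its own files.
    shift_spec = {"rating.primary.value": "Rating", "rating.primary.max": "RatingRange"}
    default_spec = {"RatingRange": 5, "Labels": {"Source": "unknown"}}
    shifted = shift(hydrated, shift_spec)
    return json.dumps(apply_defaults(shifted, default_spec), sort_keys=True)
```

Because each step takes and returns plain dictionaries, the steps can be unit tested separately, which is the separation of concerns described above.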

The code is now available on GitHub, and jar artifacts are now being published to Maven Central.

How Bazaarvoice Weathered The AWS Storm

Greetings all! In the world of SaaS, wiser men than I have referred to Operations as the “Secret Sauce” that distinguishes you from your competition. As manager of one of our DevOps teams, I wanted to talk to you about how Bazaarvoice uses the cloud and how we engineer our systems for maximum reliability.

You may have heard about the AWS Storm and the Leapocalypse, two events that made the weekend of June 29th last year a sad one for many Internet companies. Electrical storms in the Northeast knocked out one of Amazon Web Services’ availability zones in their US East region Friday night, knocking many services off the air (Netflix, Mozilla, Pinterest, Heroku, LinkedIn, Foursquare, Yelp, Instagram, Reddit, and many more). Then on Saturday a “leap second” caused Java virtual machines across the planet to freak out and peg CPUs. Guess what two technologies we use heavily here at Bazaarvoice? That’s right, Amazon Web Services and Java.

Here’s a great graph from alerting service PagerDuty showing the impact these two events had across the Internet:

[Image: PagerDuty AWS outage graph]

But here’s the Keynote graph we use to continually monitor our customer facing services for the same time period:

[Image: Bazaarvoice Keynote graph]

It’s really five different graphs for a set of our major customers overlaid, but it’s hard to tell because they are all flatlined on top of each other. That’s right – we had 100% availability on all our properties for the entire crisis period.

Were we “untouched?” By no means. We lost 77 servers, 37 of which were production servers, during this time. But by architecting for resilience, we have constructed a system to avoid customer impact even when an outage hits our systems like a shotgun blast.

As you know from previous blog posts, we’re a big Solr and MySQL shop. We shard our customers into distinct “clusters” for scalability (we’re up to seven). But then each cluster is mirrored into the AWS East region, AWS West region, and Rackspace. Inside each region, we make use of multiple availability zones and levels of load balancing (HAProxy in the cloud, F5 in Rackspace). Here’s the view inside one region (1/3 of a cluster):

[Image: cluster diagram for one region]

Then we use Neustar GTM for DNS-based traffic balancing across all three parts of the cluster in AWS East, AWS West, and Rackspace. This means we can lose zones within a region, or even a full region, without downtime – though in a case like this, we definitely had to expand our capacity in AWS West when AWS East went down so that we wouldn’t have performance issues, and we did have engineers working over the weekend to clean up the “debris” left over from the outage. We are working on engineering our clusters to dynamically scale up and clean up behind themselves to avoid that manual work.
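
To see why losing a region forces a capacity increase elsewhere, here is a back-of-the-envelope sketch. It assumes traffic is split evenly across the three sites, which this post does not actually state, and it is not how Neustar GTM is configured; the site names and the function are hypothetical.

```python
from fractions import Fraction


def redistribute(shares, down):
    """Drop unhealthy sites and renormalize the remaining traffic shares to sum to 1."""
    healthy = {site: share for site, share in shares.items() if site not in down}
    total = sum(healthy.values())
    if total == 0:
        raise RuntimeError("no healthy sites left to serve traffic")
    return {site: share / total for site, share in healthy.items()}


# Assumed even three-way split; exact fractions avoid floating-point noise.
baseline = {
    "aws-east": Fraction(1, 3),
    "aws-west": Fraction(1, 3),
    "rackspace": Fraction(1, 3),
}

after_outage = redistribute(baseline, down={"aws-east"})

# How much more traffic each surviving site carries relative to normal.
growth = {site: after_outage[site] / baseline[site] for site in after_outage}
```

Under that even-split assumption, each surviving site carries one and a half times its usual load, which is why headroom (or the ability to add capacity quickly) matters as much as the failover itself.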

But what about the data, you ask? Well, the other key to this setup is that we use a message queue for all writes instead of writing synchronously to the database. This gives us a huge amount of flexibility in how and where we process them. We use a master/slave relationship for each cluster, where MySQL and Solr are mastered out of Rackspace. With that architecture, if Rackspace were completely down, all that would do is delay content submission: nothing is lost and the user experience is still good.
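
The "delayed, not lost" behavior can be sketched with a toy in-memory queue. This is only an illustration of the pattern: a production queue would be durable and distributed, and the class and method names here are hypothetical rather than anything from our codebase.

```python
from collections import deque


class WriteQueue:
    """Buffers writes so submission succeeds even while the master store is offline."""

    def __init__(self):
        self._pending = deque()

    def submit(self, record):
        # Accepting a write only means enqueueing it; the caller never waits on the database.
        self._pending.append(record)

    def drain(self, store, master_available):
        """Apply queued writes in arrival order; return how many were applied."""
        if not master_available:
            return 0  # writes stay queued: delayed, not lost
        applied = 0
        while self._pending:
            store.append(self._pending.popleft())
            applied += 1
        return applied

    def __len__(self):
        return len(self._pending)
```

A caller would keep calling submit() during an outage and run drain() again once the master is reachable, at which point the backlog is applied in the order it arrived.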

This architecture also allows us to scale quickly and gives us a lot of control over shifting traffic around to meet specific challenges (like Black Friday). More on that in a later post, but I hope this gives you some insight into how we make our service as reliable as we can!

Technical Talk: How Bazaarvoice is using Cassandra and Elastic Search

In late May the Bazaarvoice team was delighted to speak again as a part of Database Month in New York City, and we were excited to speak about our recent work with Cassandra & Elastic Search. We discussed our goals for replacing the Bazaarvoice data infrastructure, as well as our hopes for the new system, and then we dove into the internal details of how we’re using Cassandra and Elastic Search to handle the scale needed by the myriad Bazaarvoice applications. We had a lot of fun at the talk as well as answering questions for quite some time afterwards, and we’re always excited to talk about this even more.

Hello, World: Thoughts from an Undergrad Intern

Last summer I was given an internship opportunity in the R&D department of Bazaarvoice as a software engineer. Having only finished my freshman year in college, I had no idea what to expect from a tech company of this size. I would have never guessed that on my first day I would be handed a MacBook Pro with better specs than any computer I had used before. I certainly didn’t anticipate anything like the All Hands conference downtown, or the Seven Year anniversary party. And although I suspected I would pick up a few new skills during my employment, I could have never imagined the breadth of my learning over the past few months.

My first week was spent setting up my development environment, and performing the dev onboarding — writing a game of tic-tac-toe in GWT. For those of you who do not know, GWT is the Google Web Toolkit, which Wikipedia identifies as “an open source set of tools that allows web developers to create and maintain complex JavaScript front-end applications in Java.” Virtually all of my programming experience is in Java, so I was grateful that I would be able to apply that experience. However, all of the programs I had written before had been completely text based. None of my previous projects involved a graphical interface, even one as simple as tic-tac-toe. I recall each day of my first week following the same pattern — I arrived in the morning feeling overwhelmed at the alien concept I was supposed to learn that day; by the late afternoon I had struggled through enough tutorials and examples to feel that I had a decent understanding of the new skill, and before leaving the office I would begin to look at the next subject to learn, and would feel overwhelmed all over again.

Overall I feel that my first week was analogous to any of the computer science classes I have taken in school, albeit at an accelerated pace. I would write some code, and once I felt comfortable with my progress, I would show it to my mentor, who would point out the things I had done correctly, and offer advice for improvement in the areas where it was needed. Like any assignment for school, the tic-tac-toe project was exclusively for my own benefit; no one was going to use my code for anything, but there is no better teacher than practice. The project served its purpose, because by the end of my first week I was developing code for the product.

Development was a totally new experience for me. All of my previous programming involved starting from scratch, and at completion, every line of code was my own. At Bazaarvoice, I was a developer jumping into a code base thousands of lines deep, and there were a dozen other developers constantly shaping it. It was an awesome experience working as a member of a team, rather than working on a project on my own. As a team member I not only gained experience in programming, but also in the team dynamics of software development. I feel that I was able to be a boon to my team, having contributed code in the development of several features. Because I was writing code for features that the team was familiar with, it was very easy to get help from the other members of the team. I was only one degree of separation away from the developer who had the answer to one of my many questions; if the first person I asked did not know the answer, they could immediately direct me to someone who did. This high availability of assistance was a key factor in my ability to put useful code into the product. The code review system — in which every line of code was looked at by at least one other developer before entering the product’s codebase — ensured that my code conformed to the proper coding practices. Because of this, I not only improved the functionality of my code, but also the style.

My education went beyond the software itself. I attended all of the team meetings, which taught me about the “real life” side of software development. I attended meetings which showed me how the developers made sure that the product they were producing was in line with the needs of the customers. Every week involved multiple meetings to estimate the time it would take to implement various features, so that the team’s progress could be projected into the future, and we could know how long it would take for the product to have certain degrees of functionality. The meetings were one of the most informative aspects of my internship, because they addressed issues that I had never even considered as a CS student.

This was my first internship, so while I can’t offer a personal comparison between working on a personal project and working as a team member, I can say that last summer was an enriching opportunity, which not only expanded my computer science knowledge, but taught me first hand what it means to work as a software engineer, and in my mind, that’s the most valuable thing I can get out of an internship.

This summer I have returned to Bazaarvoice, and this time I will be working on an independent project from the ground up. I am excited to see how this internship compares and contrasts to my first.

Mashery-powered Bazaarvoice Developer portal is LIVE!

Welcome to the Mashery-powered Bazaarvoice Developer portal. We strive to give you the tools you need to develop cutting-edge applications on the Bazaarvoice platform.

Some changes you’ll notice:

  • You no longer have to log in to see documentation. Just click the Expand icon to drill down to the information you need.
  • If you want to request an API key or need to contact us with a support question, you will need to create and use a Mashery ID (or use your existing one if you access other Mashery-powered APIs). Your current Bazaarvoice developer portal ID will no longer work.

Note that none of your existing API keys are affected by this transition. They will continue to work without interruption.

Thanks for your support of the Bazaarvoice platform.

Bazaarvoice developer portal moving to Mashery effective 3/1/13

We are pleased to announce that on March 1, 2013, we are moving our developer portal hosting to Mashery.

What does this mean for you, our developer community?

  • None of your existing API keys will be affected by this transition. They will continue to work without interruption.
  • You no longer need to log in to see API documentation or any public content on the site.
  • To request a new API key, you will need to register with a Mashery ID (or use your existing one if you access other Mashery-powered APIs). Your current Bazaarvoice developer portal ID will no longer work after March 1st.
  • There is no change to the developer portal URL (http://developer.bazaarvoice.com).

We know that the increased security and faster key generation will enhance your development experience. As always, thank you for developing on the Bazaarvoice platform. We look forward to seeing your applications.

Platform API release notes, version 5.4

We are pleased to announce that the following functionality has been developed for version 5.4:

  • Submission forms pre-filled for non-anonymous users
  • Full text search on all UGC and on includes
  • Product family queries
  • Photo upload accepts URLs
  • Brightcove Smart Player JavaScript integration
  • Story rating field exposed in the response
  • Special product attributes exposed in the response
  • New filtering capabilities

More detailed information on each of these items is listed below. For complete documentation, refer to the Platform API documentation, version 5.4.

Submission forms pre-filled for non-anonymous users

When submitting content, the values of all known submission fields are now returned in the submission response fields. This only affects submissions where the user is not anonymous and the user/userid parameter is provided with the GET request.

Full text search on all UGC and on includes

The following content types were added to the existing search capabilities:

  • reviews
  • answers
  • comments (story and review)
  • stories

All content is now searchable. For a list of all the fields that are searched for any given content type, see the API Basics page.

Product family queries

When filtering by product id, all content from that product’s product family is also returned by default. There is a new excludeFamily parameter that you can set to not return product family content. For examples and full documentation, see the Product Display method page.
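
As a rough illustration, a request that opts out of family content might be assembled like this. Only the excludeFamily parameter name comes from these release notes; the host, path, the other parameter names, the filter syntax, and the "true" value are assumptions made for the sketch, so check the Product Display method page for the real request format.

```python
from urllib.parse import urlencode


def build_query(base_url, product_id, exclude_family=False, api_version="5.4"):
    """Build a display query URL; every name except excludeFamily is a placeholder."""
    params = {
        "apiversion": api_version,            # assumed parameter name
        "filter": "productid:" + product_id,  # assumed filter syntax
    }
    if exclude_family:
        # Leaving the flag off keeps the default behavior: family content is included.
        params["excludeFamily"] = "true"
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only to show the shape of the URL.
family_url = build_query("https://api.example.com/reviews.json", "abc123")
single_url = build_query("https://api.example.com/reviews.json", "abc123", exclude_family=True)
```

The point of the sketch is the default: callers who do nothing get the whole product family, and only callers who want a single product's content need to send the extra parameter.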

Photo upload accepts URLs

The uploadphoto endpoint now accepts HTTP URLs of images in addition to locally stored photos from the client side. For examples and full documentation, see the Photo Submission method page.

Brightcove Smart Player JavaScript integration

Brightcove videos can be loaded in a variety of ways. The information necessary to load these videos in the browser is now returned in the Videos block of the response elements. See the API Basics page for details on the new response items that were added to support Brightcove videos.

Story rating field exposed in the response

The story display response has a new block called “StoryRating” that contains two fields:

  • Average score – average of the rating feedback score displayed for each story ID
  • Range – range of the average score

Special product attributes exposed in the response

The product display response has new fields for each of the following five product attributes:

  • EANs
  • UPCs
  • ISBNs
  • ModelNumbers
  • ManufacturerPartNumbers

New filtering capabilities

The following new filters are available:

  • Affiliation filter on reviews
  • Brand answer filters on questions and answers
  • Brand external ID filter for reviews, stories, questions, and products
  • Content locale filter on inline ratings (statistics.json)

For more information, see the appropriate method’s documentation.

Interns and graduates – Keys to job search success

Bazaarvoice R&D had a great year of intensive university recruiting with 12 interns joining our teams last summer and working side-by-side with the developers on our products. We have further expanded the program this year to accommodate two co-op positions for students from the University of Waterloo. The influx of fresh ideas and additional energy from these students has been great for the whole organization!

For many students, looking for an internship or graduate employment may be their first time experiencing the interview process and creating resumes, and I’d like to offer some advice for those of you in this position. These guidelines are intended to help you think about how to present your capabilities in the best possible light, and in my experience, apply to tech interviews at most companies.

What we’re looking for

For new graduate and internship positions, it often surprises students that tech companies are, in general, less focused on them knowing specific technologies or languages. They are more focused on determining whether you have:

  • solid CS fundamentals (data structures, algorithms, etc.)
  • passion for problem-solving and software development
  • an ability to learn quickly

It is generally expected that you have confidence in at least one programming language, with a solid grip on its syntax and common usage. It is also helpful for you to demonstrate an understanding of object-oriented concepts and design but, again, this can be independent of any specific language.

Resumes

Your resume is a critical chance to sell the reader on your abilities. While it can be uncomfortable to ‘toot your own horn’ it is important that you use your resume to try to differentiate yourself from the sea of other candidates. A dry list of courses or projects is unlikely to do this, so it really is worth investing a lot of thought in expressing on the resume what was particularly interesting, important, or impressive about what you did.

  • Definitely include details of any side projects that you’ve worked on, and don’t be afraid to demo them if you get the chance (mobile apps, websites, etc.). Some students are embarrassed because they are just hobby-projects and not commercial-grade applications – this doesn’t matter!
  • Include details of anything that you are proud of, or that you did differently or better than others.
  • If you mention group/team projects be sure to make it clear what YOU did, rather than just talking about what the team did. Which bits were you responsible for?
  • Don’t emphasize anything on your resume that you are not prepared to discuss in technical detail. For example, if you make a point of calling out a multi-threaded C programming project, you should be confident talking about threading and able to do impromptu coding on a whiteboard using threads. We’re not expecting perfection, but are looking for a solid grasp of fundamentals and syntax.
  • Leave out cryptic course numbers – it’s unlikely that the person reading your resume knows what ‘CS252’ means, but they will understand ‘Data Structures’.
  • Make sure you have good contact info – we do occasionally see resumes with no contact info, or where the contact info had typos.

Interview technique

While an interview can be nerve-wracking, interviewers love to see people do well and are there to be supportive.

  • Coding on a whiteboard is difficult (but the interviewer knows that) – a large chunk of most technical interviews is problem-solving or coding on a whiteboard. Interviewers are very understanding that coding on a whiteboard is not easy, so don’t worry about building neat code from the outset.
  • Don’t rush to put code on the board – think about the problem, ask clarifying questions, and maybe jot a few examples down to help you get oriented.
  • Talk through what you are thinking – a large part of a technical interview is understanding how the person is thinking, even if you’re running through approaches to eliminate. Getting to an answer is only part of what the interviewer is looking for, and they want to see what your thought process is.
  • Ask for help (but not too quickly!) – it’s OK that you don’t know everything, and sometimes get stuck. If you get stuck, explain what you are stuck on and the interviewer will be prepared to guide you.
  • Use what you are familiar with – you will likely be asked to code in the language you are most comfortable with. Do it! Some students think the interviewer is ‘expecting’ them to use a certain language because it’s one we use at the hiring company, but that’s not the case.
  • Perfection is not required – while no interviewer is ever going to complain if you do everything perfectly, forgetting a little piece of syntax, or a particular library function name is not fatal. It’s more important that you write code that is easy to follow and logically reasoned through. Also remember it’s OK to ask for help if you are truly stuck. At the same time, if your syntax is way off, and you’re asking for help on every line of code then you’re probably not demonstrating the level of mastery that is expected.
  • Consider bringing a laptop if you have code to show – while the interviewer may choose to focus on whiteboard problem-solving, it is a nice option to be able to offer showing them code you’ve written. Be sure to bring your best examples; ones that show off your strengths or originality. Make sure it is code you know inside-out as you will likely be questioned in detail about why you did things a certain way.
  • Come prepared with questions for the interviewer – the interview is an opportunity for you to get to know the company in more detail, and see if it’s somewhere you’d like to work. Think about the things that are important to you, and that you’d use to decide between different employment/internship offers.

Over my career I’ve found that these rules of thumb apply well in all technical interview/application processes, and hopefully they are useful guidance for students out there. Any other advice from readers?

5 Ways to Improve Your Mobile Submission Form

Here at Bazaarvoice, we’re constantly focused on improving the user experience for our products. From the initial email invitation, to the submission form, to the way in which reviews are presented, we want to make sure that our interfaces are as flexible and intuitive as possible.

Part of my job on the mobile team at Bazaarvoice is to make sure that our products reflect best practices when displayed on mobile devices. In reality, that means running hands-on user tests, A/B testing different designs, and gathering detailed information about the way in which users interact with our products.

Recently, we ran a test with Buckle, one of our partner clients, to experiment with various mobile-friendly submission forms. What follows are some of the takeaways from those experiments.

1. Handle Landscape Mode Gracefully

It is important that users are able to navigate forms easily while in landscape mode. It becomes particularly important to support landscape for form fields that solicit text input. We found that mobile users will, on average, input about 20% fewer words in their reviews than desktop users, so the last thing we want to do is to make it even more difficult to enter text. Many users prefer to type in landscape mode as it provides for a larger keyboard.

2. Make Interactions Easy

Generally, a desktop user with a mouse can interact much more precisely than a mobile user with a clumsy finger. Therefore, it is important to make sure that elements are large enough to be either clicked or tapped. Apple recommends that tappable elements be at least 44×44 points. In our experimental form, we intentionally oversized our radio buttons, selection drop-downs and sliders to make the form easier to interact with and to prevent form errors.
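
A quick way to keep yourself honest here is to audit control sizes against the 44×44 minimum. The sketch below assumes you can export each control's rendered width and height from your own tooling; the Control type, the control names, and the sizes are made up for illustration and are not measurements from our form.

```python
from dataclasses import dataclass

MIN_TAP_SIZE = 44  # minimum width and height from the guideline cited above


@dataclass
class Control:
    name: str
    width: int
    height: int


def undersized(controls, minimum=MIN_TAP_SIZE):
    """Return the names of controls that are too small in either dimension."""
    return [c.name for c in controls if c.width < minimum or c.height < minimum]


# Hypothetical sizes: a browser-default radio button next to oversized controls.
form_controls = [
    Control("radio-default", 13, 13),
    Control("radio-oversized", 48, 48),
    Control("rating-slider", 280, 44),
]

too_small = undersized(form_controls)
```

Checking both dimensions matters: a wide but very short control is still hard to hit with a finger.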

Additionally, mobile devices provide a number of different keyboard layouts for different types of inputs. For instance, an input type of “email” might surface the @ symbol to make it more readily accessible. In order to take advantage of the various keyboard layouts, be sure to properly specify the input type on your form elements.

3. Snappy Over Flashy

The first version of our experimental form involved a heavy amount of JavaScript to do things like alpha animations and transforms. While our animations generally ran smoothly on a desktop, they became sluggish on the iPhone and lower end Android devices.

When designing for mobile, be sure to prioritize function over flashiness. Slick animations can greatly improve the usability and “wow” factor of a site, but they should be used sparingly. If necessary, use hardware-accelerated transforms to minimize sluggishness.

4. Choose The Most Efficient Form Path

Overall, our goal is to allow the user to complete our form in the quickest, simplest manner possible. In our testing, we found that a surprising number of users preferred to navigate and exit form elements via the “done” button rather than using the next/previous buttons. This has several interesting consequences.

First, short forms are better than tall forms. While some users “tab” through fields, most users scroll. By minimizing the vertical spacing between elements, users do not need to scroll as far to get to the next field.

Second, for most users, the interaction with a select element will involve 3 clicks: open, select, and done. Therefore, if a user is selecting between just a few options, it is better to use oversized radio buttons than select elements.

5. Provide Instant Feedback

If a user submits an invalid form value such as a malformed email address, provide a clear error message that instructs the user how to fix the error. If possible, provide an error near the offending field. Additionally, once the form field becomes valid, notify the user immediately rather than requiring the user to submit the form again.

For our experimental form, we used the jQuery validation library, which makes basic form validation dead simple. Since it is all client side, it makes validation snappy as well.
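
The feedback loop described above is easy to sketch independent of any library. The following is written in Python purely to show the logic, not the client-side code we shipped and not the validation library's own rules; the deliberately loose email pattern, the messages, and the function names are assumptions.

```python
import re

# Loose on purpose: something@something.something, with no spaces or extra @ signs.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(value):
    """Return a message telling the user how to fix the value, or None if it is valid."""
    value = value.strip()
    if not value:
        return "Please enter your email address."
    if "@" not in value:
        return "Email addresses need an @ symbol, for example name@example.com."
    if not EMAIL_PATTERN.match(value):
        return "Please check the part after the @, for example name@example.com."
    return None


def on_field_change(errors, field, value):
    """Re-validate one field on every change so its error clears as soon as it is fixed."""
    message = validate_email(value)
    if message is None:
        errors.pop(field, None)
    else:
        errors[field] = message
    return errors
```

Keeping errors keyed by field is what lets the interface show the message next to the offending input and remove it the moment the value becomes valid, without waiting for another submit.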

Our tests are ongoing, so be on the lookout for more updates soon. Until then, hopefully these insights will be valuable to others as the Internet becomes more mobile-friendly.