Summer 2013 Intern Science Fair

Every year at Bazaarvoice we bring on a new class of summer interns and put them to work creating innovative (and educational) projects. At the beginning of the summer, interns choose from a list of projects and teams that interest them. From there they are embedded in a team, where they spend the rest of the summer working on their chosen project with help from seasoned professional mentors. At the end of the summer they set up demos, and we invite the whole company to visit with them and see what they have accomplished. Read on to learn about our interns and their fantastic and innovative projects.

Below are some of the 2013 intern science fair projects in the creators’ own words:

Devin Carr
School: Texas A&M
Field of study: Electrical Engineering

I built a product review template using the Bazaarvoice Platform API as well as some of the latest front-end libraries and architectural patterns. It provides a simple, best-practice model that gives potential and existing clients and their developers a base for building an efficient product page integrated with Bazaarvoice Ratings & Reviews.
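To give a flavor of what such an integration involves, here is a minimal, hypothetical sketch of pulling review data for a product over HTTP in Java. The endpoint version, passkey, and filter syntax are assumptions for illustration based on the public Conversations API, not Devin’s actual template code.

    // Hypothetical sketch: fetching Ratings & Reviews data for a product page.
    // The endpoint, "passkey", and filter syntax below are assumptions for
    // illustration, not the actual template code from this project.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    public class ReviewFetchSketch {
        public static void main(String[] args) throws Exception {
            String passkey = "YOUR_API_PASSKEY";   // placeholder credential
            String productId = "product123";       // placeholder product identifier
            String url = "https://api.bazaarvoice.com/data/reviews.json"
                    + "?apiversion=5.4"
                    + "&passkey=" + URLEncoder.encode(passkey, StandardCharsets.UTF_8.name())
                    + "&filter=ProductId:eq:" + URLEncoder.encode(productId, StandardCharsets.UTF_8.name());

            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");

            // Read the raw JSON response; a real page would parse it and render review markup.
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line);
                }
            }
            System.out.println(body);
        }
    }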


Lewis Ren
School: UC Berkeley
Field of study: Electrical Engineering & Computer Science

Genie is a product recommendation tool based on user clustering. The idea is that any product in the network could potentially be recommended to any user; the challenge is ranking those candidates so that the most relevant ones surface for each specific user. By clustering users based on modularity, we can determine how big an impact a specific user will have on the rest of the network. For example, a high-modularity node that makes a decision will have a stronger impact on its neighbors, but its influence will propagate out more slowly and have a small to negligible effect on outer nodes. Similarly, a low-modularity node that makes a decision will have a lesser impact, but its influence will radiate outward much more quickly.
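The sketch below is a heavily simplified, hypothetical illustration of the propagation idea, not Genie’s actual modularity clustering: it scores every product for one user by letting each reviewer’s contribution decay with that reviewer’s distance from the user in the user graph.

    // Simplified illustration of network-based recommendation ranking. Instead of
    // modularity clustering, this toy version lets each reviewer's "vote" for a
    // product decay with that reviewer's graph distance from the target user.
    import java.util.ArrayDeque;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Queue;
    import java.util.Set;

    public class GenieSketch {

        /** Breadth-first distances from one user to every reachable user. */
        static Map<String, Integer> distances(Map<String, Set<String>> userGraph, String start) {
            Map<String, Integer> dist = new HashMap<>();
            Queue<String> queue = new ArrayDeque<>();
            dist.put(start, 0);
            queue.add(start);
            while (!queue.isEmpty()) {
                String u = queue.poll();
                for (String v : userGraph.getOrDefault(u, Collections.emptySet())) {
                    if (!dist.containsKey(v)) {
                        dist.put(v, dist.get(u) + 1);
                        queue.add(v);
                    }
                }
            }
            return dist;
        }

        /** Score every product by summing the distance-decayed weights of its reviewers. */
        static Map<String, Double> rankProducts(Map<String, Set<String>> userGraph,
                                                Map<String, List<String>> reviewsByUser,
                                                String targetUser) {
            Map<String, Integer> dist = distances(userGraph, targetUser);
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, List<String>> entry : reviewsByUser.entrySet()) {
                Integer d = dist.get(entry.getKey());
                if (d == null) {
                    continue;                    // unreachable users contribute nothing
                }
                double weight = 1.0 / (1 + d);   // closer users have stronger influence
                for (String product : entry.getValue()) {
                    scores.merge(product, weight, Double::sum);
                }
            }
            return scores;                        // sort by value descending for a ranked list
        }
    }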


Matt Leibowitz
School: University of Dallas
Field of study: Physics / Computer Science

My project is called Flynn, a web page for accessing documents in our data store. Flynn allows developers to bypass a fairly long and involved process (up to ten minutes of looking up documentation and performing curl requests) with a quick and simple web interface.


Perry Shuman

My project is a bookmarklet built to provide debug information about a page with injected BV content: information that could help a client fix implementation issues that they might have, without having to know the inner workings of our code. It also allows previously hidden errors to be displayed to a client, giving them a first line of information to debug broken pages.


Ralph Pina
School: UT Austin
Field of study: Computer Science

“Over this past summer I interned with Bazaarvoice’s mobile team. In its effort to make it easier to integrate BV’s services into every platform and outlet its clients use, the team has built and distributed mobile SDKs (software development kits) for displaying and submitting data to the BV API. I worked to improve the Android SDK and, along with another intern, Devin Carr, wrote a new Windows Phone 8 SDK. I also wrote two Windows Phone 8 sample applications to demonstrate how to implement the SDK in an app. Afterwards, we open sourced all our mobile SDKs under the permissive Apache 2.0 license.”


Rishi Bajekal
School: University of Illinois at Urbana-Champaign
Field of study: Computer Science

Built a RESTful web service for serving photos from the data stack and worked on the Conversations 2013 API as part of that team.


Ryan Morris
School: McCombs School of Business at the University of Texas at Austin
Field of study: Management Information Systems

My project’s theme was predictive modeling, focused on using a client’s web metrics, review count, and average rating to predict orders for products. To create the predictive model, I pulled the client’s visits, orders, and revenue for a certain time period from their web analytics account. I then pulled the client’s number of approved reviews and average rating for each product from the Bazaarvoice database using MySQL. Finally, I ran the data through the statistical program R to come up with a predictive model. With this model, I was able to analyze the effects that varying review count and average rating had on predicted orders for a product.
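Ryan built his model in R; purely as an illustration of the same idea, here is a rough Java sketch that fits an ordinary-least-squares model with Apache Commons Math on made-up numbers (not the actual client data or model):

    // Illustrative only: predict orders from review count and average rating
    // with ordinary least squares (Apache Commons Math). The data is made up.
    import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

    public class OrdersModelSketch {
        public static void main(String[] args) {
            // One row per product: observed orders (y) and [reviewCount, avgRating] (x).
            double[] orders = {120, 95, 310, 40, 210};
            double[][] features = {
                {15, 4.2},
                {8, 3.9},
                {60, 4.7},
                {2, 3.1},
                {35, 4.4}
            };

            OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
            ols.newSampleData(orders, features);

            // beta[0] = intercept, beta[1] = review-count coefficient, beta[2] = rating coefficient.
            double[] beta = ols.estimateRegressionParameters();
            double predicted = beta[0] + beta[1] * 25 + beta[2] * 4.5;
            System.out.printf("Predicted orders for 25 reviews at a 4.5 average: %.1f%n", predicted);
        }
    }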


Schukey Shan
School: University of Waterloo
Field of study: Software Engineering

My project was the brand search summary, which provides both an API and a UI to summarize the data we collect on brands. For example, if we search for the brand “Acme”, the API looks for all the brand names that contain the word “Acme”, creates a composite of the matched brand values, and returns the counts for page views, UGC impressions, and unique visitors for each client of each brand value in the composite. This is useful as an overview of how brand values perform across the network.
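Conceptually the summary is a filter-then-aggregate over brand metrics. Here is a minimal sketch of the “composite” idea in plain Java; the record type, field names, and numbers are hypothetical, and the real API also breaks the totals down per client.

    // Minimal sketch of the "composite" idea: find brand values matching a search
    // term and total up one of their metrics. Types and field names are hypothetical.
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class BrandSummarySketch {
        static class BrandMetrics {
            final String brandValue;
            final long pageViews;
            final long ugcImpressions;
            final long uniqueVisitors;
            BrandMetrics(String brandValue, long pageViews, long ugcImpressions, long uniqueVisitors) {
                this.brandValue = brandValue;
                this.pageViews = pageViews;
                this.ugcImpressions = ugcImpressions;
                this.uniqueVisitors = uniqueVisitors;
            }
        }

        public static void main(String[] args) {
            List<BrandMetrics> metrics = Arrays.asList(
                new BrandMetrics("Acme", 1000, 400, 250),
                new BrandMetrics("Acme Outdoors", 300, 120, 90),
                new BrandMetrics("Other Brand", 500, 200, 150));

            String query = "acme";
            // Gather every matching brand value into the composite and sum page views;
            // the real summary does this per client and for the other metrics as well.
            Map<String, Long> pageViewsByBrand = metrics.stream()
                .filter(m -> m.brandValue.toLowerCase().contains(query))
                .collect(Collectors.groupingBy(m -> m.brandValue,
                         Collectors.summingLong(m -> m.pageViews)));

            System.out.println(pageViewsByBrand);
        }
    }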


Steven Cropp
School: Rensselaer Polytechnic Institute
Field of study: Computer Systems Engineering and Computer Science

This summer I worked on an XML file validation service. Users can set up “rules” that dictate how XML files must be formed and then apply those rules to our clients’ XML feeds to generate user-friendly reports that outline what is wrong with the XML, along with some details about how to fix any issues. This should help smooth out the process our clients go through when importing feeds, saving the support team and our clients time.
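As a toy illustration of what a single “rule” might look like, the sketch below uses only the JDK’s XML parsing to flag products in a feed that are missing an identifier; the element and attribute names are hypothetical, and the real service supports configurable rules and much richer reports.

    // Toy illustration of a feed "rule": every <Product> must have a non-empty
    // ExternalId attribute. Element and attribute names are hypothetical.
    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class FeedRuleCheckSketch {
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File(args[0]));

            NodeList products = doc.getElementsByTagName("Product");
            for (int i = 0; i < products.getLength(); i++) {
                Element product = (Element) products.item(i);
                String externalId = product.getAttribute("ExternalId");
                if (externalId.isEmpty()) {
                    // A friendly, actionable message rather than a raw parser error.
                    System.out.println("Product #" + (i + 1)
                            + " is missing its ExternalId attribute; add one so the feed can be imported.");
                }
            }
        }
    }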


Thomas Poepping
School: Carnegie Mellon University
Field of study: Computer Science

1. A tool to inject spreadsheets of UGC into the system.
2. An addition to the web app that generates a payroll report for all moderators, streamlining the payroll process.
3. A pilot WordPress plugin that moderates WordPress comments, then allows administrators to act on rejected comments.

BV I/O: Imagination unlocked

What do you get when you lock 100+ engineers, product managers, designers and other techies in a building for 2 days and ask them to come up with new and creative ways to “unlock the power of our data”? Well, I could tell you, but then I would have to… yeah that’s top secret awesome product roadmap stuff now. (and even redacted)


As an extension of our BV.IO internal tech conference that I recently blogged about, we held an engineering-wide Hackathon for everyone in our technical community to go nuts with our data and try to come up with some of the next big ideas for Bazaarvoice. Over 100 folks participated and formed teams of 3, and after 2 days we had 31 really cool prototypes that they demo’d to the entire company. It was such a great experience to see so many smart and passionate people singularly focused on innovation, building cool new ideas and value for Bazaarvoice. Here is a quick summary of how things went down:

Tuesday = BV.IO tech conference & speakers, present Hackathon ideas
Wednesday = Form teams & brainstorm
Thursday = Hackathon & XBOX Call of Duty: Black Ops 2 tourney
Friday = Pancake breakfast, Hackathon, Demos, Prizes
Saturday = Sleep

We started the Hackathon as a continuation of the BV.IO event at the Alamo Drafthouse, where anyone with a proposal could come on stage and try to sell the idea. Think of it like pitching a group of engineers to come work with you on your startup. After we heard all the interesting ideas, everyone went off, self-formed teams, and started brainstorming.


On Thursday we kicked off the Hackathon, and it was eerily quiet in engineering. Everyone split off into small teams and was heads-down coding or holed up in a breakout room whiteboarding designs. We had tons of food and snacks brought in for breakfast, lunch, and dinner to keep everyone energized, and by the end of the day everyone had made amazing progress and was ready to blow off some steam…and by “blow off”, I mean blow up, and by “steam”, I mean COD:BO2.


We set up 8 portable flat-screen monitors and 8 Xbox 360s, and we got our game on. It was so simple and so much fun, I don’t know why we hadn’t done it earlier. It was a huge hit, and we are thinking about how we can keep that setup in place all the time. Everyone self-rated their skill level, and we balanced teams for a round-robin Call of Duty: Black Ops 2 tournament.


Friday morning, all the managers got together to show our appreciation for the team with pancakes and bacon. I really think there is no better way to show your appreciation than with bacon.


“That’s way too much bacon,” said no one ever…

Coding continued throughout the day, and at 3pm it was pencils down, and time for demos. The company filled the All Hands, and it was rapid fire through 31 demos.



We even had a really slick Google Glass hackathon project.

The energy was awesome, the ideas were awesome, and the conversations they inspired across the entire company were awesome. After all the teams had demo’d, the company voted, and winners across several categories were selected. We had a few cool prizes for the winners like iPad Minis, Parrot Drones, Rokus, cash money, and of course totally custom Lego trophies.


If you haven’t done a full-on Hackathon at your company in a while, I highly recommend it. Every time we do it here, I am amazed by the creativity and the innovative ideas and solutions that are created in such a short time. And the ripple effect continues for months afterward, as the business internalizes the ideas and roadmaps and direction start to shift based on them. The key is to not let the ideas die on the vine. Champion them, advocate for them, and push them forward, and you too can change the world one authentic conversation at a time.

This hackathon is CTO approved.

SQL, NoSQL… What now? NewSQL

It has not been long since the holy war between SQL and NoSQL database technologies faded, and now we see a new contender, NewSQL, rising. What is it? Will it cause another round of the war?

Recently at Bazaarvoice we hosted an informational session on VoltDB, one of the better-known NewSQL solutions, with several engineers and technical managers from our Austin and San Francisco offices participating. The question we needed answered: what is VoltDB, and why might it be an interesting datastore technology for us?

In short, it may be very good for real-time and near-real-time analytics where SQL and ACID compliance are desirable. Personalized ad serving, real-time measurement of marketing campaign effectiveness, and electronic trading systems are some of the reference applications that VoltDB provides.

VoltDB is an in-memory database, which makes it extremely fast. However, that is just a small part of the story. Besides residing in memory, VoltDB has a few performance-improving architectural solutions based on research by well-known database technologists, including the famous Michael Stonebraker, who was involved in the creation of Ingres, Postgres, and Vertica.

The creators of VoltDB wanted to preserve all the good features of a traditional RDBMS, such as SQL, ACID compliance, and data integrity, but they also wanted to drastically improve performance and scalability. All modern commercial and open-source RDBMSs are built on the same principles, which were laid down more than 40 years ago for an era of small memories and slow disks. The researchers analyzed the bottlenecks of a traditional RDBMS and found that at high load about 88% of the server capacity is wasted on traditional RDBMS overhead and only about 12% is used for doing actual useful work.

[Figure: breakdown of traditional RDBMS overhead vs. useful work]

VoltDB’s architectural solutions eliminate the traditional RDBMS overheads:

  • Computing power is brought close to the data: partitions have affinity to specific memory regions and CPU cores (virtual nodes) in a shared-nothing cluster;
  • Data lives in main memory, which eliminates buffer-management overhead, let alone access to painfully slow disks;
  • Single-threaded virtual nodes operate on their partitions autonomously, which eliminates locking and latching overhead;
  • A combination of continuous snapshots and command logging, instead of writing database blocks and a transaction log to disk for durability, drastically reduces logging overhead.

These architectural solutions combine the advantages of a traditional RDBMS with scalability features usually associated only with NoSQL databases: automatic sharding across a shared-nothing cluster, elimination of many traditional overheads, automatic replication, and log-less durability for high availability. With these features, VoltDB claims to be one of the fastest databases on the market today.
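To make the partitioned, single-threaded execution model more concrete, here is a rough sketch of a VoltDB stored procedure in Java. The table, columns, and workload are invented for illustration; check the VoltDB documentation for the exact API.

    // Rough sketch of a VoltDB stored procedure. Work routed to a single partition
    // runs serially against that partition's in-memory data, so no locks are needed.
    // Table and column names are invented; consult the VoltDB docs for the exact API.
    import org.voltdb.SQLStmt;
    import org.voltdb.VoltProcedure;
    import org.voltdb.VoltTable;

    public class RecordAdImpression extends VoltProcedure {

        final SQLStmt insertImpression = new SQLStmt(
            "INSERT INTO impressions (campaign_id, user_id, ts) VALUES (?, ?, ?);");

        final SQLStmt countForCampaign = new SQLStmt(
            "SELECT COUNT(*) FROM impressions WHERE campaign_id = ?;");

        public VoltTable[] run(long campaignId, long userId, long timestamp) {
            // Queue the statements, then execute them as one transactional batch.
            voltQueueSQL(insertImpression, campaignId, userId, timestamp);
            voltQueueSQL(countForCampaign, campaignId);
            return voltExecuteSQL(true); // 'true' marks this as the final batch
        }
    }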

VoltDB’s impressive performance is illustrated by the results of the TPC-C-like benchmark below, in which VoltDB and a well-known OLTP DBMS were compared running the same test on identical hardware (Dell R610, 2x 2.66GHz quad-core Xeon 5550 with 12x 4GB (48GB) DDR3-1333 registered ECC DIMMs, 3x 72GB 15K RPM 2.5in enterprise SAS 6Gbps drives):

[Figure: TPC-C-like benchmark results comparing VoltDB with a well-known OLTP DBMS on identical hardware]

So, will the rising NewSQL technology cause another religious database war? Probably not. VoltDB positions its database as a niche product that does a few things really well. It doesn’t try to be a one-size-fits-all database; VoltDB’s philosophy is that an organization should use a few datastore technologies, applying each where it fits best. For example, you cannot use VoltDB if your data does not fit into the combined memory of your cluster. Long-running transactions are also not supported.

Hopefully, if a team needs a very fast but consistent, ACID- and SQL-compliant database for a new project, they will consider VoltDB as an option.

BV I/O: Unlocking the power of our data

At Bazaarvoice, we strongly believe that our people are our most important asset. We hire extremely smart and passionate people, let them loose on complex problems, and watch all the amazing things they create. We had another opportunity to see that innovation engine in full effect last week at our internal technical conference and 2-day hackathon.

Every year we hold an internal technical conference for our engineers and technical community. If you were lucky enough to be at Bazaarvoice last year, you remember our conference BV.JS, which was all about front-end UI and JavaScript; in years past we did Science Fairs. Last year at BV.JS we were focused on redesigning our consumer-facing ratings and reviews product (Conversations), so we gathered some amazing JavaScript gurus such as Paul Irish (@paul_irish), Rebecca Murphey (@rmurphey), Andrew Dupont (@andrewdupont), Alex Sexton (@SlexAxton) and Garann Means (@garannm) to school us on all the latest in JavaScript.

This year our event was called BV.IO, and we focused on “unlocking the power of our data,” so we asked some great minds in big data analytics and data visualization to come inspire our engineering team.

The event kicked off with a day at the Alamo Drafthouse. Bazaarvoice is powered by tacos, so of course there were tons of breakfast tacos to get us ready for a fun-filled day of learning, mind-opening presentations, and a colorful pants competition, but I digress and will get to that in a minute.

First up was Adrian Cockcroft (@adrianco), cloud architect at Netflix. We are big fans of Netflix’s architecture, and we use and have contributed to several of their open source packages, including Curator, Priam, and Astyanax. Adrian gave us an update on some of the new advancements in Netflix’s architecture and scale, as well as details on their new open source projects. Netflix is also running an open source competition called NetflixOSS, with some cool prizes for the best contributions to their projects. The competition is open until September 15, 2013, so get coding.

Jason Baldridge (@jasonbaldridge), Ph.D. and associate professor in computational linguistics at the University of Texas, presented on scaling models for text analysis. He shared some really interesting insights into what can be done with geotagged, temporal, and toponym data. Nick Bailey (@nickmbailey), an engineer at DataStax, presented on Cassandra best practices, new features, and some interesting real-world use cases. And Peter Wang (@pwang), co-founder and president of Continuum Analytics, gave a really entertaining talk about planning and architecting for big data, along with some interesting Python packages for data analysis and visualization.

Ok, and now back to the most important aspect of the day, the Colorful Pants Competition. Qingqing, one of our amazing directors of engineering, organized this hilarious competition. Can you guess who was the winner?

We really enjoyed all the speakers, and we know that you will too, so we will be sharing their presentations on this blog in the coming days and weeks.

Check back regularly for the videos.

Jolt released to the world

We are pleased to announce a new open source contribution, a Java-based JSON-to-JSON transformation tool named Jolt.

Jolt grew out of a BV Platform API project to migrate the backend from Solr/MySQL to Cassandra/ElasticSearch. As part of that migration, we knew we were going to be doing a lot of data transformations from the new ElasticSearch JSON format to the BV Platform API JSON format.

Prior to Jolt, there were three general strategies for doing JSON-to-JSON transforms:

  1. Convert to XML, use XSLT, convert back to JSON
  2. Use your input JSON and a template language to build your output JSON
  3. Write custom code

Those options were rather unpalatable, so we went with option “4”: write reusable custom code.

The key insight was that there are actually separable concerns when doing a transform, and part of the reason the XSLT and template approaches are unpalatable is that they force you to deal with all of those concerns at once.

Jolt tackles each concern individually:

  1. Identify the pieces of the input data that you care about and place them in the output JSON
    • Jolt provides a transform, “shift”, that has its own JSON-based declarative DSL (domain-specific language)
  2. Make sure the output JSON looks correct (apply defaults to the output JSON)
    • Jolt provides a transform, “default”, with its own JSON-based declarative DSL
  3. Handle all the JSON text formatting (commas, closing curly brackets, etc.)
    • Jolt operates on “hydrated” JSON data (Map<String,Object> and List<Object>) and leverages the Jackson library to handle serialization / JSON text formatting
  4. Verify the transform for data and format correctness
    • Jolt provides a test tool called Diffy so that you can unit test your transforms for data and format correctness
    • For format correctness, this is not as good an answer as an XML DTD, but you could pull in JSON Schema if you wanted
  5. Perform arbitrary custom data manipulations like adding fields together or performing date conversions
    • Jolt provides an interface where you can implement your own custom logic to be run in series with the other transforms

The code is now available on GitHub, and JAR artifacts are being published to Maven Central.
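For a feel of how these pieces fit together, here is a minimal usage sketch built around Jolt’s Chainr entry point. The spec file, input shape, and field names are hypothetical; the GitHub README has the authoritative examples.

    // Minimal Jolt sketch: run a chain of transforms (e.g. "shift" then "default")
    // defined in a JSON spec file against hydrated JSON input. The spec path and
    // input shape here are hypothetical; see the Jolt README for real examples.
    import com.bazaarvoice.jolt.Chainr;
    import com.bazaarvoice.jolt.JsonUtils;

    public class JoltSketch {
        public static void main(String[] args) {
            // chainrSpec.json would contain a list of transforms, each with an
            // "operation" ("shift", "default", ...) and its own declarative "spec".
            Chainr chainr = Chainr.fromSpec(JsonUtils.classpathToList("/chainrSpec.json"));

            // Hydrated JSON in (Maps and Lists), hydrated JSON out; Jackson handles
            // the text formatting at the edges.
            Object input = JsonUtils.jsonToMap("{ \"rating\": { \"value\": 4 } }");
            Object output = chainr.transform(input);

            System.out.println(JsonUtils.toJsonString(output));
        }
    }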

How Bazaarvoice Weathered The AWS Storm

Greetings all! In the world of SaaS, wiser men than I have referred to Operations as the “Secret Sauce” that distinguishes you from your competition. As manager of one of our DevOps teams, I wanted to talk to you about how Bazaarvoice uses the cloud and how we engineer our systems for maximum reliability.

You may have heard about the AWS Storm and the Leapocalypse, two events that made the weekend of June 29th last year a sad one for many Internet companies. Electrical storms in the Northeast knocked out one of Amazon Web Services’ availability zones in the US East region on Friday night, taking many services off the air (Netflix, Mozilla, Pinterest, Heroku, LinkedIn, Foursquare, Yelp, Instagram, Reddit, and many more). Then on Saturday a “leap second” caused Java virtual machines across the planet to freak out and peg CPUs. Guess what two technologies we use heavily here at Bazaarvoice? That’s right, Amazon Web Services and Java.

Here’s a great graph from alerting service PagerDuty showing the impact these two events had across the Internet:

[PagerDuty graph: impact of the outage weekend across the Internet]

But here’s the Keynote graph we use to continually monitor our customer-facing services, over the same time period:

[Keynote availability graph for Bazaarvoice customer-facing services]

It’s really five different graphs for a set of our major customers overlaid, but it’s hard to tell because they are all flatlined on top of each other. That’s right – we had 100% availability on all our properties for the entire crisis period.

Were we “untouched?” By no means. We lost 77 servers, 37 of which were production servers, during this time. But by architecting for resilience, we have constructed a system to avoid customer impact even when an outage hits our systems like a shotgun blast.

As you know from previous blog posts, we’re a big Solr and MySQL shop. We shard our customers into distinct “clusters” for scalability (we’re up to seven). Each cluster is then mirrored into the AWS East region, the AWS West region, and Rackspace. Inside each region we make use of multiple availability zones and multiple levels of load balancing (haproxy in the cloud, F5 in Rackspace). Here’s the view inside one region (1/3 of a cluster):

[Diagram: one region of a cluster]

Then we use Neustar GTM for DNS-based traffic balancing across all three parts of the cluster in AWS East, AWS West, and Rackspace. This means we can lose zones within a region, or even a full region, without downtime – though in a case like this we definitely had to expand our capacity in AWS West when AWS East went down so that we wouldn’t have performance issues, and we did have engineers working over the weekend to clean up the “debris” left over from the outage. We are working on engineering our clusters to dynamically scale up and clean up after themselves to avoid that manual work.

But what about the data, you ask? Well, the other key to this setup is that we use a message queue for all writes instead of writing synchronously to the database. This gives us a huge amount of flexibility in how and where we process them – we use a master/slave relationship for each cluster, where MySQL and Solr are mastered out of Rackspace, but with that architecture, if Rackspace were completely down, all that happens is a delay in content submission – nothing is lost and the user experience is still good.
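As a bare-bones sketch of that decoupling, here’s roughly what the write path looks like with an in-process queue standing in for the real message broker; the submission type and persistence call are hypothetical.

    // Bare-bones sketch of write decoupling: the front end enqueues a submission
    // and returns immediately; a separate consumer applies it to MySQL/Solr later.
    // The ReviewSubmission type and in-process queue here are hypothetical stand-ins.
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class WritePathSketch {
        static class ReviewSubmission {
            final String productId;
            final int rating;
            final String text;
            ReviewSubmission(String productId, int rating, String text) {
                this.productId = productId;
                this.rating = rating;
                this.text = text;
            }
        }

        // In production this would be a durable message queue, not an in-process one.
        static final BlockingQueue<ReviewSubmission> WRITE_QUEUE = new LinkedBlockingQueue<>();

        /** Called on the request path: acknowledge the user without touching the master datastore. */
        static void submitReview(ReviewSubmission submission) throws InterruptedException {
            WRITE_QUEUE.put(submission);
        }

        /** Runs near the master datastore; if it is down, submissions simply wait in the queue. */
        static void consumeLoop() throws InterruptedException {
            while (true) {
                ReviewSubmission s = WRITE_QUEUE.take();
                // writeToMySqlAndSolr(s);  // hypothetical persistence call
                System.out.println("Persisting review for product " + s.productId);
            }
        }
    }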

This architecture also allows us to scale quickly and gives us a lot of control over shifting traffic around to meet specific challenges (like Black Friday). More on that in a later post, but I hope this gives you some insight into how we make our service as reliable as we can!

Technical Talk: How Bazaarvoice is using Cassandra and Elastic Search

In late May the Bazaarvoice team was delighted to speak again as part of Database Month in New York City, and we were excited to talk about our recent work with Cassandra and Elastic Search. We discussed our goals for replacing the Bazaarvoice data infrastructure, as well as our hopes for the new system, and then we dove into the internal details of how we’re using Cassandra and Elastic Search to handle the scale needed by the myriad Bazaarvoice applications. We had a lot of fun giving the talk, as well as answering questions for quite some time afterwards, and we’re always excited to talk about this even more.

Hello, World: Thoughts from an Undergrad Intern

Last summer I was given an internship opportunity in the R&D department of Bazaarvoice as a software engineer. Having only finished my freshman year in college, I had no idea what to expect from a tech company of this size. I would never have guessed that on my first day I would be handed a MacBook Pro with better specs than any computer I had used before. I certainly didn’t anticipate anything like the All Hands conference downtown, or the seven-year anniversary party. And although I suspected I would pick up a few new skills during my employment, I could never have imagined the breadth of my learning over the past few months.

My first week was spent setting up my development environment and performing the dev onboarding — writing a game of tic-tac-toe in GWT. For those of you who do not know, GWT is the Google Web Toolkit, which Wikipedia describes as “an open source set of tools that allows web developers to create and maintain complex JavaScript front-end applications in Java.” Virtually all of my programming experience is in Java, so I was grateful that I would be able to apply that experience. However, all of the programs I had written before had been completely text based; none of my previous projects involved a graphical interface, even one as simple as tic-tac-toe. I recall each day of my first week following the same pattern — I arrived in the morning feeling overwhelmed by the alien concept I was supposed to learn that day; by late afternoon I had struggled through enough tutorials and examples to feel that I had a decent understanding of the new skill; and before leaving the office I would begin to look at the next subject to learn, and would feel overwhelmed all over again.
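For readers who haven’t seen GWT, here is a generic snippet (not the actual onboarding project) showing its basic flavor: UI widgets and click handlers written in Java that GWT compiles to JavaScript.

    // Generic GWT flavor, not the actual onboarding project: a 3x3 grid of buttons
    // whose click handlers are written in Java and compiled to JavaScript by GWT.
    import com.google.gwt.core.client.EntryPoint;
    import com.google.gwt.event.dom.client.ClickEvent;
    import com.google.gwt.event.dom.client.ClickHandler;
    import com.google.gwt.user.client.ui.Button;
    import com.google.gwt.user.client.ui.Grid;
    import com.google.gwt.user.client.ui.RootPanel;

    public class TicTacToeSketch implements EntryPoint {
        private boolean xTurn = true;

        @Override
        public void onModuleLoad() {
            Grid board = new Grid(3, 3);
            for (int row = 0; row < 3; row++) {
                for (int col = 0; col < 3; col++) {
                    final Button cell = new Button("");
                    cell.addClickHandler(new ClickHandler() {
                        @Override
                        public void onClick(ClickEvent event) {
                            // Only mark empty cells, then alternate turns.
                            if (cell.getText().isEmpty()) {
                                cell.setText(xTurn ? "X" : "O");
                                xTurn = !xTurn;
                            }
                        }
                    });
                    board.setWidget(row, col, cell);
                }
            }
            RootPanel.get().add(board);
        }
    }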

Overall I feel that my first week was analogous to any of the computer science classes I have taken in school, albeit at an accelerated pace. I would write some code, and once I felt comfortable with my progress, I would show it to my mentor, who would point out the things I had done correctly, and offer advice for improvement in the areas where it was needed. Like any assignment for school, the tic-tac-toe project was exclusively for my own benefit; no one was going to use my code for anything, but there is no better teacher than practice. The project served its purpose, because by the end of my first week I was developing code for the product.

Development was a totally new experience for me. All of my previous programming involved starting from scratch, and at completion, every line of code was my own. At Bazaarvoice, I was a developer jumping into a code base thousands of lines deep, and there were a dozen other developers constantly shaping it. It was an awesome experience working as a member of a team, rather than working on a project on my own. As a team member I not only gained experience in programming, but also in the team dynamics of software development. I feel that I was able to be a boon to my team, having contributed code in the development of several features. Because I was writing code for features that the team was familiar with, it was very easy to get help from the other members of the team. I was only one degree of separation away from the developer who had the answer to one of my many questions; if the first person I asked did not know the answer, they could immediately direct me to someone who did. This high availability of assistance was a key factor in my ability to put useful code into the product. The code review system — in which every line of code was looked at by at least one other developer before entering the product’s codebase — ensured that my code conformed to the proper coding practices. Because of this, I not only improved the functionality of my code, but also the style.

My education went beyond the software itself. I attended all of the team meetings, which taught me about the “real life” side of software development. I attended meetings that showed me how the developers made sure the product they were producing was in line with the needs of the customers. Every week involved multiple meetings to estimate the time it would take to implement various features, so that the team’s progress could be projected into the future and we could know how long it would take for the product to reach certain degrees of functionality. The meetings were one of the most informative aspects of my internship, because they addressed issues that I had never even considered as a CS student.

This was my first internship, so while I can’t offer a personal comparison between working on a personal project and working as a team member, I can say that last summer was an enriching opportunity, one that not only expanded my computer science knowledge but also taught me firsthand what it means to work as a software engineer, and in my mind, that’s the most valuable thing I can get out of an internship.

This summer I have returned to Bazaarvoice, and this time I will be working on an independent project from the ground up. I am excited to see how this internship compares and contrasts to my first.

Mashery-powered Bazaarvoice Developer portal is LIVE!

Welcome to the Mashery-powered Bazaarvoice Developer portal. We strive to give you the tools you need to develop cutting-edge applications on the Bazaarvoice platform.

Some changes you’ll notice:

  • You no longer have to log in to see documentation. Just click the Expand icon to drill down to the information you need.
  • If you want to request an API key or need to contact us with a support question, you will need to create and use a Mashery ID (or use your existing one if you access other Mashery-powered APIs). Your current Bazaarvoice developer portal ID will no longer work.

Note that none of your existing API keys are affected by this transition. They will continue to work without interruption.

Thanks for your support of the Bazaarvoice platform.

Bazaarvoice developer portal moving to Mashery effective 3/1/13

We are pleased to announce that on March 1, 2013, we are moving our developer portal hosting to Mashery.

What does this mean to you, our developer community:

  • None of your existing API keys will be affected by this transition. They will continue to work without interruption.
  • You no longer need to log in to see API documentation or any public content on the site.
  • To request a new API key, you will need to register with a Mashery ID (or use your existing one if you access other Mashery-powered APIs). Your current Bazaarvoice developer portal ID will no longer work after March 1st.
  • There is no change to the developer portal URL (http://developer.bazaarvoice.com).

We know that the increased security and faster key generation will enhance your development experience. As always, thank you for developing on the Bazaarvoice platform. We look forward to seeing your applications.