Category Archives: Talks

BVIO 2015 Summary and Presentations

Every year Bazaarvoice R&D throws BVIO, an internal technical conference followed by a two-day hackathon. These conferences are an opportunity for us to focus on unlocking the power of our network, data, APIs, and platforms, and to have some fun in the process. We invite keynote speakers from within BV, from companies that use our data in inspiring ways, and from companies that are successfully using big data to solve cool problems. After a full day of learning we engage in an intense, two-day hackathon to create new applications, visualizations, and insights into our extensive data.

Continue reading for pictures of the event and videos of the presentations.


This year we held the conference at the palatial Omni Barton Creek Resort in one of their well-appointed ballrooms.


Participants arrived around 9am (some of us a little later). After breakfast, provided by Bazaarvoice, we got started with the speakers followed by lunch, also provided by Bazaarvoice, followed by more speakers.


After the speakers came a “pitchfest” during which our Product team presented hackathon ideas and participants started forming teams and brainstorming.


Finally it was time for 48 hours of hacking, eating, and gaming (not necessarily in that order) culminating in project presentations and prizes.


Presentations

Sephora: Consumer Targeted Content

Venkat Gopalan
Director of Architecture & Devops @ Sephora.com

Venkat presented on the work Sephora is doing around serving relevant, targeted content to their consumers in both the mobile and in-store space. It was a fascinating speech and we love to see how our clients are innovating with us. Unfortunately, due to technical difficulties, we don’t have a recording 🙁

Philosophy & Design of The BV System of Record

John Roesler & Fahd Siddiqui
Bazaarvoice Engineers

This talk was about the overarching design of Bazaarvoice’s innovative data architecture. According to John and Fahd, some aspects of it may seem unexpected at first glance (especially to those not coming from a big data background) but are actually surprisingly powerful. The first innovation is the separation of storage and query, and the second is choosing a knowledge-base-inspired data model. By making these two choices, we guarantee that our data infrastructure will be robust and durable.

Realtime Bidding: Predicting the future, 10,000 times per second

Ian Clarke
Co-Founder and CTO at OneSpot

Ian has built and manages a team of world-class software engineers and data scientists at OneSpot™. In his presentation he discusses how he applied machine learning and game theory to architect a sophisticated realtime bidding engine for OneSpot™ capable of predicting the behavior of tens of thousands of people per second.

New Amazon Machine Learning and Lambda architectures

Jeff Nun
Amazon Solutions Architect

In his presentation Jeff discusses the history of Amazon Machine Learning and the Lambda architecture, how Amazon uses them, and how you can use them. This isn’t just a presentation; Jeff walks us through the AWS UI for building and training a model.

Thanks to Sharon Hasting, Dan Heberden, and the presenters for contributing to this post.

Output from bv.io

Looks like everyone had a blast at bv.io this year! Thank yous go out to the conference speakers and hackathon participants for making this year outstanding. Here are some tweets and images from the conference:


BV I/O: Peter Wang – Architecting for Data

Every year Bazaarvoice holds an internal technical conference for our engineers. Each conference has a theme and as a part of these conferences we invite noted experts in fields related to the theme to give presentations. The latest conference was themed “unlocking the power of our data.” You can read more about it here.

In this presentation Peter Wang, co-founder and president of Continuum Analytics, discusses data analysis, the challenges presented by big data, and the opportunities technology provides to overcome those challenges. He also discusses the importance of performance and visualization and advances the concept of “engineering on principle,” which he demonstrates by discussing the design of the A-10 Thunderbolt and the SAGE computerized command and control center for United States air defense. Peter ends his talk by discussing the Python programming language and its suitability for data analysis tasks. The full talk is below.

BV I/O: Dr. Jason Baldridge – Scaling Models for Text Analysis

Every year Bazaarvoice holds an internal technical conference for our engineers. Each conference has a theme and as a part of these conferences we invite noted experts in fields related to the theme to give presentations. The latest conference was themed “unlocking the power of our data.” You can read more about it here.

The following video is of Dr. Jason Baldridge, currently an associate professor in the Linguistics Dept. at the University of Texas and co-founder of People Pattern. Dr. Baldridge presented on the subject of text analysis. During his hour-long talk he identified the desirable traits of a good text analysis function and focused on the problems of performing text categorization tasks given different amounts of labeled data. Big thanks to Dr. Baldridge for his informative presentation. The full talk is below:

BV I/O: Adrian Cockcroft

As part of our internal BV I/O conference, which we’ve previously profiled on the blog, we had Adrian Cockcroft, Cloud Architect at Netflix, come give us an overview of Netflix’s architecture as well as information on their multitude of open source projects and the ways Netflix is engaging the community to contribute. He also talked about some of his personal sources of inspiration when it comes to the things his teams have developed at Netflix. Big thanks to Adrian for taking the time to come visit with us in Austin. The full talk is below:

SQL, NoSQL,… What’s now? New SQL

It has not been long since the holy war between SQL and NoSQL database technologies faded, and now we see a new contender, NewSQL, rising. What is it? Will it cause another round of the war?

Recently at Bazaarvoice we hosted an informational session on VoltDB, one of the better known NewSQL solutions, with several engineers and technical managers from our Austin and San Francisco offices participating. The question we needed answered: what is VoltDB, and why might it be an interesting datastore technology for us?

In short, it may be very good for real-time and near-real-time analytics where SQL and ACID compliance are desirable. The reference applications that VoltDB provides include personalized ad serving, real-time measurement of marketing campaign effectiveness, and electronic trading systems.

VoltDB is an in-memory database, which makes it extremely fast. However, this is just a small portion of the story. Besides residing in memory, VoltDB has a few performance-improving architectural solutions based on research by well known database technologists, including the famous Michael Stonebraker, who was involved in the creation of Ingres, Postgres, and Vertica.

The creators of VoltDB wanted to preserve all the good features of a traditional RDBMS, such as SQL, ACID compliance, and data integrity, but they also wanted to drastically improve performance and scalability. All the modern commercial and open source RDBMSs are built on the same principles, which were created more than 40 years ago for the era of small memory and slow disks. The researchers analyzed the bottlenecks of a traditional RDBMS and found that at high load about 88% of the server capacity is wasted on traditional RDBMS overheads and only about 12% of the capacity is used for doing actual useful work.


VoltDB’s architectural solutions eliminate the traditional RDBMS overheads:

  • The computing power is brought close to the data. Data partitions have affinity to specific memory regions and CPU cores (virtual nodes) in a shared-nothing cluster;
  • Data is located in main memory, which eliminates buffer management overhead, let alone access to painfully slow disks;
  • Single-threaded virtual nodes operate on partitions autonomously to eliminate locking and latching overhead;
  • A combination of continuous snapshots and command logging provides durability in place of writing database blocks and a transaction log to disk, which drastically reduces logging overhead.
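
The last point can be illustrated with a minimal sketch. This is a toy model of the general idea (snapshot plus command replay), not VoltDB's actual implementation; the class `CommandLoggedStore` and its command names are hypothetical and exist only for illustration. Instead of logging changed data blocks, the store logs the commands themselves and rebuilds state by replaying them on top of the latest snapshot. This only works if commands are deterministic.

```python
import copy


class CommandLoggedStore:
    """Toy in-memory store made recoverable by snapshots plus a command log.

    Hypothetical illustration only; not VoltDB's API or implementation.
    """

    COMMANDS = {"put", "add", "delete"}

    def __init__(self):
        self.data = {}         # live in-memory state
        self.snapshot = {}     # state captured by the last snapshot
        self.command_log = []  # commands executed since the last snapshot

    @staticmethod
    def _apply(state, command, key, value):
        # Every command is deterministic, so replaying it gives the same result.
        if command == "put":
            state[key] = value
        elif command == "add":
            state[key] = state.get(key, 0) + value
        elif command == "delete":
            state.pop(key, None)

    def execute(self, command, key, value=None):
        if command not in self.COMMANDS:
            raise ValueError(f"unknown command: {command}")
        # Log the command itself (not the modified data), then apply it.
        self.command_log.append((command, key, value))
        self._apply(self.data, command, key, value)

    def take_snapshot(self):
        # Capture the full state; earlier log entries are no longer needed.
        self.snapshot = copy.deepcopy(self.data)
        self.command_log.clear()

    def recover(self):
        # Start from the snapshot and replay the logged commands in order.
        state = copy.deepcopy(self.snapshot)
        for command, key, value in self.command_log:
            self._apply(state, command, key, value)
        return state


store = CommandLoggedStore()
store.execute("put", "a", 1)
store.take_snapshot()
store.execute("add", "a", 4)
store.execute("put", "b", 2)
store.data = {}              # simulate losing the in-memory state
recovered = store.recover()  # {"a": 5, "b": 2}
```

In a real system the snapshot and the log would live on disk; the in-memory attributes here only stand in for them.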

These architectural solutions allow combining all the advantages of a traditional RDBMS with scalability features usually associated only with NoSQL databases: automatic sharding across a shared-nothing cluster, elimination of many overheads, and automatic replication and log-less durability for high availability. With these features, VoltDB claims to be one of the fastest databases on the market today.
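
To make the sharding and single-threaded execution ideas concrete, here is a rough sketch under stated assumptions: the `Partition` and `Cluster` classes are hypothetical, the hash routing by CRC32 is chosen only for illustration, and threads in one process stand in for the CPU cores and nodes of a real cluster. Each key is routed to exactly one partition, and only that partition's thread ever touches its data, so the data itself needs no locks.

```python
import queue
import threading
import zlib


class Partition:
    """One 'virtual node': a private slice of data served by a single thread."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Only this thread reads or writes self.data, so no lock guards it.
        while True:
            task = self.inbox.get()
            if task is None:  # shutdown signal
                break
            key, amount = task
            self.data[key] = self.data.get(key, 0) + amount


class Cluster:
    """Routes each key to one partition using a deterministic hash."""

    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]

    def partition_for(self, key):
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def increment(self, key, amount=1):
        # Work is sent to the partition that owns the key.
        self.partitions[self.partition_for(key)].inbox.put((key, amount))

    def shutdown(self):
        # Ask every partition to finish its queued work, then wait for it.
        for partition in self.partitions:
            partition.inbox.put(None)
        for partition in self.partitions:
            partition.thread.join()

    def totals(self):
        merged = {}
        for partition in self.partitions:
            merged.update(partition.data)
        return merged


cluster = Cluster(partition_count=4)
for i in range(1000):
    cluster.increment(f"user-{i % 10}")
cluster.shutdown()
totals = cluster.totals()
```

The inbox queue is itself synchronized, so this sketch removes locking from the data path rather than from the whole system. Operations that span several partitions would need extra coordination, which is not shown here.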

VoltDB’s impressive performance is illustrated by the results of the TPC-C-like benchmark below, in which VoltDB and a well known OLTP DBMS were compared running the same test on identical hardware (Dell R610, 2x 2.66GHz Quad-Core Xeon 5550 with 12x 4GB (48GB) DDR3-1333 Registered ECC DIMMs, 3x 72GB 15K RPM 2.5in Enterprise SAS 6Gbps Drives):

[Figure: TPC-C-like benchmark results for VoltDB and a well known OLTP DBMS]

So, will the rising NewSQL technology cause another religious database war? Probably not. VoltDB positions their database as a niche database doing a few things really well. It doesn’t try to be a one-size-fits-all database, and VoltDB’s philosophy is “an organization should use a few datastore technologies, using each for the case where it plays the best.” For example, you cannot use VoltDB if your data does not fit into the combined memory of your cluster. Long-running transactions are also not supported on VoltDB.
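
The memory constraint lends itself to a quick back-of-the-envelope check. The helper below is a hypothetical sketch, not a VoltDB sizing tool: the function name, the `copies` parameter (how many replicas of each row are kept), and the `usable_fraction` default are all assumptions made for illustration.

```python
def fits_in_memory(data_gb, node_count, ram_per_node_gb,
                   copies=1, usable_fraction=0.8):
    """Return True if the data set fits in the cluster's combined memory.

    All parameters are illustrative assumptions:
      copies          - number of replicas kept of every row
      usable_fraction - share of RAM left for data after other overhead
    """
    usable_gb = node_count * ram_per_node_gb * usable_fraction
    return data_gb * copies <= usable_gb


# Three 48GB nodes hold 100GB of data only if each row is stored once.
single_copy = fits_in_memory(100, node_count=3, ram_per_node_gb=48)
two_copies = fits_in_memory(100, node_count=3, ram_per_node_gb=48, copies=2)
```

With these assumed numbers the first check passes and the second fails, which shows how replication for high availability reduces the amount of data a cluster of a given size can hold.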

Hopefully, if some team needs a very fast but consistent, ACID- and SQL-compliant database for a new project, they will consider VoltDB as an option.