
Getting Started with Dependency Security and NodeJS

Internet security is a topic that receives more attention every day.  If you’re reading this article in early 2018, issues like Meltdown, Spectre and the Equifax breach are no doubt fresh in your mind.

Cybersecurity is a massive concern and can seem overwhelming.  Where do you start?  Where do you go?  What do you do if you’re a small application team with limited resources?  How can you better engineer your projects with security in mind?

Tackling the full scope of this topic, including OWASP and more, is beyond this article.  However, if you’re part of a small front-end team that leverages services like AWS, has gone through the OWASP checklist and is wondering, ‘what now?’, this article is for you (especially if you are developing in NodeJS, which is what we’ll focus on here).

Let’s Talk About Co(de) Dependency:

A tiger and her baby pigs

We’re talking about claiming dependents, right?


One way we can quickly impact the security of our applications is through better dependency management.  Whether you develop in Python, JavaScript, Ruby or even compiled languages like Java and C#, library packages and modules are part of your workflow.

Sure, it’s third-party code, but that doesn’t mean it can’t have a major impact on your project (everyone should remember the left-pad ordeal from just a few years ago).

As it turns out, your app’s dependencies can be your app’s greatest vulnerability.

Managing Code you Didn’t Write:

Below, we’ll outline some steps you can take to tackle at least one facet of secure development – dependency management.  In this article, we’ll cover how you can incorporate a handful of useful tools into a standard NodeJS project to protect against, remediate and warn about potential security issues you may have introduced into your application via its dependency stack.

Your code has issues

We’ve all had to deal with other people’s problematic code at one time or another


Using Dependency-Check:

Even if you’re not using NodeJS, if you are building any project that pulls in one or more third-party libraries, you should have some form of dependency checking in place for that project.

For NodeJS apps, Dependency-Check is the easiest, lowest-hanging fruit you can likely reach for to better secure your development process.

Dependency-Check is a command-line tool that can scan your project and warn you of any modules that are included in your app’s manifest but not actually used in any functioning code (e.g. modules listed in your package.json file but never ‘required’ anywhere in your app).  After all, why import code that you do not require?

Installation is a snap.  From your project directory you can:

npm install -g dependency-check

dependency-check package.json --unused

If you have any unused packages, Dependency-Check will warn you via a notice such as:

Fail! Modules in package.json not used in code: chai, chai-http, mocha

Armed with a list of any unused packages in hand, you can prune your application before pushing it to production.  This tool can be triggered within your CI process per commit to master or even incorporated into your project’s pre-commit hooks to scan your applications further upstream.
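
For example, a minimal gate script along these lines could run the check from a Git pre-commit hook or as an early CI build step (the file name check-deps.js and the messages are our own, hypothetical choices; it assumes dependency-check is installed as shown above):

// check-deps.js - a sketch of a dependency-check gate for a pre-commit hook or CI step
const { execSync } = require('child_process');

try {
  // --unused reports modules listed in package.json that are never required in code
  execSync('dependency-check ./package.json --unused', { stdio: 'inherit' });
} catch (err) {
  // execSync throws when the command exits non-zero, i.e. when unused packages are found
  console.error('Unused dependencies detected - failing this check.');
  process.exit(1);
}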

RetireJS:

‘What you require, you must retire’ as the saying goes – at least as it does when it comes to RetireJS.  Retire is a suite of tools that can be used for a variety of application types to help identify dependencies your app has incorporated that have known security vulnerabilities.

Retire is suited to JavaScript-based projects in general, not just NodeJS.

For the purpose of this article, and since we are dealing primarily with the command line, we’ll be working with the CLI portion of Retire.

npm install -g retire

then from the root of your project just do:

retire 

If you are scanning a NodeJS project’s dependencies only, you can narrow the scan by using the following arguments:

retire --nodepath node_modules

By default, the tool will return something like the following, provided you have vulnerabilities RetireJS can identify:

ICanHandlebarz/test/jquery-1.4.4.min.js
↳ jquery 1.4.4.min has known vulnerabilities:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2011-4969
http://research.insecurelabs.org/jquery/test/
http://bugs.jquery.com/ticket/11290

Retire scans your app and compares packages used within it with an updated online database of packages/package versions that have been identified by the security world as having exploitable code.

Note that the scan output above identified that the version of jQuery we imported via the ICanHandlebarz module has a known, exploitable bug (see the link to CVE-2011-4969 – the vulnerability in question).

Armed with this list, we can identify any unnecessary dependencies within our app, as well as any exploits that could be utilized given the code we are currently importing and using.

NSP:

There are a multitude of package scanning utilities for NodeJS apps out on the web. Among them, NSP (Node Security Platform) is arguably the most popular.

Where the previously covered Retire is a dependency scanner suited for JavaScript projects in general, NSP, as the name implies, is specifically designed for Node applications.

The premise is identical: this command line tool can scan your project’s package manifest and identify any commonly known web exploits that can be leveraged given the 3rd party packages you have included in your app.

Utilizing NSP and Retire may sound redundant but, much like diagnosing a serious medical condition, it’s often worth seeking a second opinion.  It’s also equally easy to get up and running with NSP:

npm install -g nsp

nsp check --reporter <report type (e.g. HTML)>

Running the above within the root of your Node application will generate a report in your chosen output format.

Again, wiring this up into a CI job should be straightforward.  You can even perform back-to-back scans using both Retire and NSP.

Snyk:

Yes, this is yet another dependency scanner and, like NSP, one that is specifically NodeJS-oriented, but with an added twist: Snyk.

Snyk home page

With Retire and NSP, we can quickly and freely generate a list of vulnerabilities that could be leveraged against our app.  However, what about remediation?  What if you have a massive Node project and can’t quickly patch or collate its dependency issues?  This is where Snyk comes in handy.

Snyk can generate detailed and presentable reports (perfect for team members who may not be elbow deep in your project’s code).  The service also provides other features such as automated email notifications and monitoring for your app’s dependency issues.

Typical Snyk report

Cost: These features sound great (they also sound expensive).  Snyk’s cost depends on your app’s size: for small projects or open source apps, the service is essentially free.  For teams with larger needs, you will want to consult their current pricing.

To install and get running with Snyk, first visit https://snyk.io and register for a user account (if you have a private or open source project in GitHub or Bitbucket, you’ll want to link your Snyk account with your code management tool’s account).

Next, from your project root in the command line console:

npm install -g snyk

snyk auth

Read through the console prompt.  Once you receive a success message, your project is now ready to report to your Snyk account.  Next:

snyk test

Once Snyk is finished, you will be presented with a URL that directs you to a report containing all the vulnerable dependency information Snyk was able to find for your project.

The report is handy in that it not only identifies vulnerabilities and the packages they’re associated with but also steps for remediation (e.g. – update a given package to a specific version, remove it, etc.).

Snyk has a few more tricks up its sleeve as well.  If you wish to dive headlong into securing your application, simply run:

snyk wizard

This will rescan your application in an interactive wizard mode that walks you through each vulnerable package and prompts you for the action to take on it.

Note: using Snyk’s wizard mode is not recommended for large-scale applications, as it can be very time consuming.

It will look like this:

Snyk monitor output

Additionally, you can utilize Snyk’s monitoring feature which, in addition to scanning your application, will send email notifications to you or your team when there are updates to vulnerable packages associated with your project.

Putting it all together in CI:

Of course we’re going to arrange these tools into some form of CI process.  And why not?  Given that we have a series of easy-to-implement command line tools, adding them to our project’s build process should be straightforward.

Below is an example of a shell script we added as a build step in a Jenkins job to install and run each of these scan tools as well as output some of their responses to artifacts our job can archive:

#Installs our tools
npm install -g snyk
npm install -g dependency-check
npm install -g retire

#Runs dependency-check and outputs results to a text file
dependency-check ./package.json >> depcheck_results.txt

#Runs retire and outputs results to a text file
retire --nodepath node_modules >> retire_results.txt

#Authenticates use with Snyk
snyk auth $SNYK_TOKEN 

#Runs Snyk's monitor task and outputs results to a text file
snyk monitor >> snyk_raw_monitor.txt

#Uses printf, grep and cut CLI tools to retrieve the Snyk report URL
#and saves that URL to a text file
printf http: >> snykurl.txt
grep https snyk_raw_monitor.txt | cut -d ":" -f 2 >> snykurl.txt
cat snykurl.txt

#Don't forget to set your post-job task in your CI service to save the above .txt
#files as artifacts for our security check run.

Note – depending on how you stage your application’s build lifecycle in your CI service, you may want to break the above script up into separate scripts within the job, or into separate jobs entirely.  For example, you may want to scan your app with Dependency-Check on every commit, but run the NSP or Snyk scans only when the application is built nightly or deployed to your staging environment, in which case you can divide this script accordingly.

Note on Snyk:

In order to use Snyk in your CI instance, you will need to reference your Snyk authentication key when executing this tool (see the API key reference in the script).

This key can be passed to the job as an environment variable or secret (the $SNYK_TOKEN referenced in the script above).  Your auth key for Snyk can be obtained by logging in at Snyk.io and retrieving your API key from the account settings page after clicking on the My Account link in Snyk’s main menu.

Next Steps:

This is only the tip of the iceberg when it comes to security and web application development.  Hopefully, this article has given you some ideas and hints of where to start in writing more secure software.  For more info on how to secure your Node-based apps, here’s some additional reading to check out:

Telling Your Data To “Back Off!” (or How To Effectively Use Streams)

Foreword

Our Curations engineering team makes heavy use of serverless architecture. While this typically gives us the benefit of reduced costs, flexibility, and rapid development, it also requires us to ensure that our processes will run within the tight memory and lifecycle constraints of serverless instances.

In this article, I will describe an actual case where a scheduled job had started to fail, the discovery of the root cause and the refactor that resolved the issue. I will assume you have at least a rudimentary knowledge of Node JS and Amazon Web Services.

Content Export

Months previously, we built a scheduled job using AWS Lambdas that would kick off an export of our social content every 24 hours. In a nutshell, its purpose was to:

  1. Query for social content documents stored in a MongoDB server.
  2. Transform each document to a row in CSV format.
  3. Write out the CSV rows to a file in S3.

The resulting CSV files were meant for ingestion into an analytics tool to measure web site conversion impact.

It all worked fine for months. Then recently, this job began to fail.

Drowning in Content

For your viewing pleasure, here is an excerpted, condensed section of the main code:

function runExport(query) {
  MongoClient.connect(MONGODB_URL)
    .then((db) => {
      let exportCount = 0;
      db
        .collection('content')
        .find(query)
        .forEach(
          // iterator function to perform on each document
          (content) => {
            // transform content doc to a csv row, then push to S3 writer
            s3.write(csvTransform(content));
            exportCount++;
          },
          // done function, no more documents
          () => {
            db.close();
            s3.close();
            logger.info(`documents exported: ${exportCount}`);
          }
        );
    });
}

Each document from the database query is transformed into a CSV row, then pushed to our S3 writer, which buffers all content until s3.close() is called. At that point, the CSV data is flushed out to the S3 destination bucket.

It was obvious from this implementation that heap usage would grow unbounded, as documents from the database would be loaded into resident memory as CSV data. As we aggregated content over the previous months, the data pushed into our export process began to generate frequent “out of heap memory” errors in our logs.
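
To make the problem concrete, here is a simplified sketch (our illustration, not the actual writer implementation) of what a buffer-until-close S3 writer effectively does:

// Illustrative only: every row is held in memory until close() is called,
// so heap usage grows in step with the number of documents exported.
class BufferingS3Writer {
  constructor() {
    this.rows = [];
  }

  write(csvRow) {
    this.rows.push(csvRow); // nothing leaves the process yet
  }

  close() {
    // imagine a single large upload of this.rows.join('') to the S3 bucket here
    this.rows = [];
  }
}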

Memory Profile

To gain visibility into the memory usage, I wrote a quick and dirty MemProfiler class whose purpose was to sample memory usage using process.memoryUsage(), collecting the maximum heap used over the process lifetime.

class MemProfiler {
  constructor(collectInterval) {
    this.heapUsed = {
      min : null,
      max : null,
    };
    if (collectInterval) {
      this.start(collectInterval);
    }
  }

  start(collectInterval = 500) {
    this.interval = setInterval(() => this.sample(), collectInterval);
  }

  stop() {
    if (this.interval) {
      clearInterval(this.interval);
    }
  }

  reset() {
    this.stop();
    this.heapUsed = {
      min : null,
      max : null,
    };
  }

  sample() {
    const mem = process.memoryUsage();
    if (this.heapUsed.min === null || mem.heapUsed < this.heapUsed.min) {
      this.heapUsed.min = mem.heapUsed;
    }
    if (this.heapUsed.max === null || mem.heapUsed > this.heapUsed.max) {
      this.heapUsed.max = mem.heapUsed;
    }
  }
}

I then added the MemProfiler to the runExport code:

function runExport(query) {
  memProf.start();
  MongoClient.connect(MONGODB_URL)
    .then((db) => {
      let exportCount = 0;
      db
        .collection('content')
        .find(query)
        .forEach(
          (content) => {
            // transform content doc to a csv row, then push to S3 writer
            s3.write(csvTransform(content));
            exportCount++;
            memProf.sample();
          },
          () => {
            db.close();
            s3.close();
            logger.info(`documents exported: ${exportCount}`);
            console.log('heapUsed:');
            console.log(` max: ${memProf.heapUsed.max/1024/1024} MB`);
            console.log(` min: ${memProf.heapUsed.min/1024/1024} MB`);
          }
        );
    });
}

Running the export against a small set of 16,900 documents yielded this result:

documents exported: 16900
heapUsed:
max: 30.32764434814453 MB
min: 8.863059997558594 MB

Then running against a larger set of 34,000 documents:

documents exported: 34000
heapUsed:
max: 36.204505920410156 MB
min: 8.962921142578125 MB

Testing with additional documents increased the heap memory usage linearly as we expected.

Enter the Stream

The necessary code rewrite had one goal in mind: ensure the memory profile was constant, whether a total of one document was processed or a million. Since this was a batch process, we could trade off memory usage for processing time.

Let’s reiterate the primary operations:

  1. Fetch a document
  2. Transform the document to a CSV row
  3. Write the CSV row to S3

If I may present an analogy, we have widgets moving down the assembly line from one station to the next. The conveyor belt on which the widgets are transported is moving at a constant rate, which means the throughput at each station is constant. We could increase the speed of that conveyor belt, but only as fast as can be handled by the slowest station.

In Node JS, streams are the assembly line stations, and pipes are the conveyor belt. Let’s examine the three primary types of streams:

  1. Readable streams: This is typically an upstream data source that feeds data to a writable or transform stream. In our case, this is the MongoDB find query.
  2. Transform streams: This typically takes upstream data, performs a calculation or transformation operation on it and feeds it out downstream. In our case, this would take a MongoDB document and transform it to a CSV row.
  3. Writable streams: This is typically the terminating point of a data flow. In our case this is the writing of a CSV row to S3.

Readable Stream

Conveniently, the MongoDB Node JS driver’s find() function returns a Cursor object that extends the Node JS Readable stream. How convenient!

const streamContentReader = db.collection('content').find(query);

Transform Stream

All we need to do is extend the Transform stream class, and override the _transform() method to transform the content to a CSV row, push it out the other end, and signal upstream that it is ready for the next content.

class CsvTransformer extends Transform {
  constructor(options) {
    super(options);
  }

  _transform(content, encoding, callback) {
    // write row
    const { _id, authoredAt, client, channel, mediaType, permalink, text, textLanguage } = content;
    const csvRow = `${_id},${authoredAt},${client},${channel},${mediaType},${permalink},${text},${textLanguage}\n`;
    this.push(csvRow);
    // signal upstream that we are ready to take more data
    callback();
  }
}

Writable Stream

Let’s purposely mock the S3 Writer for now. It does nothing here, but realistically, we would buffer up incoming data, then flush out the buffer in chunks over the network for best throughput.

class S3Writer extends Writable {
  constructor(options) {
    super(options);
  }

  _write(data, encoding, callback) {
    // TODO: probably use a fixed size buffer
    // that we write to and then flush out to S3 using
    // multipart data upload
    callback();
  }
}

Pipe ‘Em Up

We now create the three streams:

const streamContentReader = db.collection('content').find(query);
const streamCsvTransformer = new CsvTransformer({ objectMode: true });
const streamS3Writer = new S3Writer();

Streams alone do nothing. What makes them useful is connecting the output from one stream to the input of another. In stream terminology, this is called piping, and is accomplished using the stream’s pipe method:

streamContentReader.pipe(streamCsvTransformer).pipe(streamS3Writer)
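
Note that pipe does not forward errors between streams, so with the native method each stream needs its own handler; a quick sketch (handleError is a hypothetical handler of ours):

// With native pipe, each stream's 'error' event must be handled individually,
// and a failure in one stream does not automatically tear down the others.
const handleError = (err) => console.error('export failed:', err);

streamContentReader.on('error', handleError);
streamCsvTransformer.on('error', handleError);
streamS3Writer.on('error', handleError);

streamContentReader.pipe(streamCsvTransformer).pipe(streamS3Writer);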

Although the native pipe method is perfectly functional, I highly recommend using the pump library. In addition to being more readable (we pass in an array of streams to pipe together), pump accepts a callback that is invoked when the pipe has finished, as well as when a close or error event is emitted by any of the streams:

pump([ streamContentReader, streamCsvTransformer, streamS3Writer ], (err) => {
  if (err) {
    return console.error(err);
  }
  console.log(`documents exported: ${exportCount}`);
});

CSV and S3 Write Streams

I purposely did not implement the S3Writer above, because npm is a treasure trove of solutions! The s3-write-stream library will take care of buffering, using multipart upload and handling retries, so we don’t have to get our hands too dirty. And wouldn’t you know it, there is also the csv-write-stream library that will generate properly escaped CSV rows.

As an exercise, you the reader may want to try using the s3-write-stream and csv-write-stream. Trust me, once you get the hang of streaming, you will enjoy it!

Back Off!

Let’s touch back on the issue of memory usage.

What if a write stream becomes blocked for a period of time? Maybe we are writing data to a third-party service over HTTP and network congestion or service throttling is slowing the write rate. The readable stream will happily pump data in as fast as it can. Won’t that cause increased heap usage as all that data gets backed up and buffered?

The quick answer is “no”. In the implementation of the _write() method of your writable stream, when done with the data chunk (after writing to a file for example), call callback(). This signals to the incoming stream that it is ready to receive the next chunk. You can also provide a highWaterMark to the writable stream constructor to give a specific number of objects to buffer before pausing the incoming stream. This is all handled internally by the Node JS streams implementation. No more unbounded buffering!
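
As a minimal sketch of that signalling (our own example, reusing the S3Writer class from above), this is the pause-and-resume dance that pipe and pump perform for us internally:

// Buffer at most 16 queued rows before write() starts returning false.
const cautiousWriter = new S3Writer({ objectMode: true, highWaterMark: 16 });

const ok = cautiousWriter.write('some,csv,row\n');
if (!ok) {
  // The internal buffer is at capacity; hold off until 'drain' fires.
  cautiousWriter.once('drain', () => {
    // safe to resume writing here
  });
}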

For a deep dive into the concept of backpressure control, read this great article on Backpressuring in Streams.

Stream ‘Em If You Got ’em!

Our team is now in the process of utilizing Node JS streams whenever we read in volumes of data, process that data, then write out the results. We are now seeing great gains in both stability and maintainability!

References

Node JS Streams API: https://nodejs.org/api/stream.html

Article: Backpressuring in Streams: https://nodejs.org/en/docs/guides/backpressuring-in-streams/

MongoDB Cursor Stream: http://mongodb.github.io/node-mongodb-native/2.2/api/Cursor.html

Pump Module: https://www.npmjs.com/package/pump

CSV Write Stream Module: https://www.npmjs.com/package/csv-write-stream

S3 Write Stream Module: https://www.npmjs.com/package/s3-write-stream