dry.ly

Full Streams Ahead

October 6th 2013

This is a companion article to my "Full Streams Ahead" talk at JS.LA.

Why Streams?

Streams are node's best and most misunderstood idea...
Dominic Tarr

There are two advantages to using streams: speed and efficiency.

Let's look at a typical scenario: a single page app that gets objects from a JSON API and presents them to the user in a table. We have a web client (the browser) that hits an API (a web server), which fetches the data from a data source (a database).

Without using streams, this is what would happen:

  1. The browser makes a request to the server.
  2. The server makes a request to the database.
  3. The database begins returning objects to the server.
  4. The server receives them and stores them in memory.
  5. When the server has all of the records from the database, it prepares a response and sends it back to the browser.
  6. The browser receives the response and parses it.
  7. The browser displays the objects to the user in a table.

This is roughly what the process looks like:

I want to call your attention to two points where the data is being buffered: (1) the server is buffering records from the database before passing them on to the client and (2) the client is buffering the response from the server before inserting records into the table.

If we use streams, the server doesn't wait for all of the records from the database to arrive before passing them on, and the client doesn't wait for the entire response from the server to arrive before placing records in the table.

This gives us speed: the time between the user's click and the user seeing data is reduced (and this matters). It also gives us efficiency: the web server doesn't have to keep all the data in memory; it only holds each chunk long enough to send it to the client.
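The difference between the two flows can be sketched in Node. This is a minimal sketch under stated assumptions, not a production server: fakeDbCursor, sendBuffered and sendStreamed are hypothetical names, the "database" is an in-memory object-mode Readable, and res can be any writable stream (an http.ServerResponse, for instance). The buffered version collects every record before writing anything; the streamed version writes a JSON array piece by piece as records arrive.

```javascript
const { Readable } = require('stream');
const { once } = require('events');

// Hypothetical stand-in for a database cursor: an object-mode Readable
// that produces `count` records, one per read() call.
function fakeDbCursor(count) {
  let i = 0;
  return new Readable({
    objectMode: true,
    read() {
      if (i < count) this.push({ id: i++ });
      else this.push(null); // no more records
    },
  });
}

// Without streams (steps 4 and 5): hold every record in memory,
// then send a single response.
async function sendBuffered(cursor, res) {
  const all = [];
  for await (const record of cursor) all.push(record);
  res.end(JSON.stringify(all));
}

// With streams: forward each record as soon as it arrives, framed as a
// JSON array, so the server holds only a small window of records at a time.
async function sendStreamed(cursor, res) {
  res.write('[');
  let first = true;
  for await (const record of cursor) {
    const ok = res.write((first ? '' : ',') + JSON.stringify(record));
    first = false;
    // Respect backpressure: if the client is slow, wait for 'drain'.
    if (!ok) await once(res, 'drain');
  }
  res.end(']');
}
```

In an http.createServer handler you would call something like sendStreamed(cursor, res).catch((err) => res.destroy(err)). Waiting for 'drain' whenever res.write returns false is what stops a slow client from forcing the server to buffer everything anyway.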

Here's the process again, but with the streaming approach on the bottom. Notice how (1) the first record gets to the client much faster, and (2) the web server holds much less data at a time.

So What is a "Stream" Anyway?

A stream is actually a very simple object. At its core, a stream is just an event emitter. There are a couple of things an event emitter needs in order to be a stream, and they depend on whether the stream is "readable", "writable", or both.
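To make the "just an event emitter" point concrete, here is a small sketch. The first line checks that Node's built-in Readable is an EventEmitter; wordEmitter is a hypothetical, hand-rolled object that behaves like a bare-bones readable by emitting "data" and then "end" events. Node's real streams add buffering, pausing and backpressure on top of this.

```javascript
const EventEmitter = require('events');
const { Readable } = require('stream');

// Node's built-in streams inherit from EventEmitter.
console.log(new Readable({ read() {} }) instanceof EventEmitter); // prints true

// Hypothetical hand-rolled "readable": just an EventEmitter that emits
// 'data' once per item and then 'end'.
function wordEmitter(words) {
  const emitter = new EventEmitter();
  // Emit on the next tick so the caller can attach listeners first.
  process.nextTick(() => {
    for (const word of words) emitter.emit('data', word);
    emitter.emit('end');
  });
  return emitter;
}

const received = [];
wordEmitter(['full', 'streams', 'ahead'])
  .on('data', (word) => received.push(word))
  .on('end', () => console.log(received.join(' '))); // prints "full streams ahead"
```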

Readable Streams

If a stream is "readable", the most important condition it must satisfy is that it emits "data" events along with some data.

Here is a readable stream. It emits data every so often, and blinks when it does.

Like any other readable stream, we could use this stream's .on method to listen for its data events. In this case, we would be notified of the time every second.
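The clock stream could be written roughly like this with Node's stream module. This is a sketch under assumptions: createClockStream and its limit parameter are hypothetical, and the stream runs in object mode so it can push Date objects rather than strings or Buffers.

```javascript
const { Readable } = require('stream');

// Hypothetical clock stream: pushes a Date every `intervalMs` milliseconds,
// and ends after `limit` dates (never, by default).
function createClockStream(intervalMs = 1000, limit = Infinity) {
  let sent = 0;
  let timer = null;
  const clock = new Readable({
    objectMode: true, // push Date objects instead of strings/Buffers
    read() {},        // data comes from the timer, not from read() calls
    destroy(err, callback) {
      clearInterval(timer); // stop the timer so the process can exit
      callback(err);
    },
  });
  timer = setInterval(() => {
    clock.push(new Date());
    if (++sent >= limit) {
      clearInterval(timer);
      clock.push(null); // no more data: the stream will emit 'end'
    }
  }, intervalMs);
  return clock;
}

// Listen to 'data' events, as described above (three ticks, then stop).
createClockStream(1000, 3)
  .on('data', (date) => console.log('tick', date.toISOString()))
  .on('end', () => console.log('done'));
```

Because the data is driven by a timer rather than by consumer demand, read() is left empty; the custom destroy handler clears the interval so a destroyed stream doesn't keep the process alive.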

Writable Streams

There are also writable streams. When you call a writable stream's "write" method with some data, it does something with that data, such as saving it to a file or sending it over a network socket. Here's a simple writable stream that changes the color of a div element.

When the text in the input changes, we call the stream's write method with the contents of the input. For example, if you changed the text in the input to 33, 66, 99, this would run: writeStream.write('33, 66, 99').
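A Node version of the color-changing writable stream might look like the sketch below. There is no div outside the browser, so the hypothetical createColorStream takes a setColor callback instead; in a page, that callback could set a div's style.backgroundColor. The toRgb helper and its parsing rules (three comma-separated numbers, clamped to 0-255) are also assumptions.

```javascript
const { Writable } = require('stream');

// Hypothetical parser: '33, 66, 99' -> 'rgb(33, 66, 99)', or null if the
// text isn't exactly three numbers. Values are rounded and clamped to 0-255.
function toRgb(text) {
  const parts = text.split(',').map((s) => (s.trim() === '' ? NaN : Number(s)));
  if (parts.length !== 3 || parts.some(Number.isNaN)) return null;
  const [r, g, b] = parts.map((n) => Math.min(255, Math.max(0, Math.round(n))));
  return `rgb(${r}, ${g}, ${b})`;
}

// A writable stream that "does something" with each write: it hands the
// parsed color to setColor and ignores input it can't parse.
function createColorStream(setColor) {
  return new Writable({
    decodeStrings: false, // keep written strings as strings
    write(chunk, encoding, callback) {
      const color = toRgb(String(chunk));
      if (color) setColor(color);
      callback(); // tell the stream this chunk has been handled
    },
  });
}

const colorStream = createColorStream((color) => console.log('div color ->', color));
colorStream.write('33, 66, 99'); // logs "div color -> rgb(33, 66, 99)"
```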

Pipes

Up to this point we've omitted the coolest thing about streams: you can pipe them together. If we make another stream that is both writable and readable, we can turn the date objects into colors and "write" them to the div.

What's going on here is that each date object from our readable stream is written to the transform stream. The transform stream changes the date into a color string and passes it along to our write stream. Instead of wiring this up by hand with .on and .write, we simply use .pipe. It looks like this:

    readableStream
      .pipe(transformStream)
      .pipe(writeStream)
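Here is one way to fill in the three streams in that pipeline. It is a sketch, not the demo's actual code: dateToColor and its mapping (seconds, minutes and hours onto red, green and blue) are invented for illustration, and readableStream and writeStream are simple stand-ins for the clock and the div. The Transform uses object mode on both sides so it can accept Date objects and emit one color string per date.

```javascript
const { Readable, Transform, Writable } = require('stream');

// Hypothetical mapping from a Date to an "r, g, b" string:
// seconds drive red, minutes drive green, hours drive blue.
function dateToColor(date) {
  const r = Math.round((date.getSeconds() / 59) * 255);
  const g = Math.round((date.getMinutes() / 59) * 255);
  const b = Math.round((date.getHours() / 23) * 255);
  return `${r}, ${g}, ${b}`;
}

// Both writable (accepts Date objects) and readable (emits color strings).
const transformStream = new Transform({
  writableObjectMode: true, // input side receives Date objects
  readableObjectMode: true, // output side emits one string per date
  transform(date, encoding, callback) {
    callback(null, dateToColor(date)); // pass the converted value downstream
  },
});

// Stand-ins for the clock stream and the div-coloring stream.
const readableStream = Readable.from([new Date(2013, 9, 6, 12, 30, 15)]);
const written = [];
const writeStream = new Writable({
  objectMode: true,
  write(color, encoding, callback) {
    written.push(color);
    console.log('color:', color); // in the browser demo, this would recolor the div
    callback();
  },
});

readableStream
  .pipe(transformStream)
  .pipe(writeStream);
```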
  

In the Wild

I hope the previous examples illustrate how simple streams are, and how easy they are to use. Let's see what some "real world" code could look like.

Here's a simple example inspired by a tweet by John Resig:

In this example requestStream is just like our readable stream above, but instead of emitting date objects, it emits chunks of text as they come through.

Much like our transform stream, parser has a readable stream (requestStream) piped into it, and it "transforms" what comes out the other end. In this case, the output is a subset of the requested document's JSON.

Instead of listening to the data event, we could pipe the output to a file (a writable stream) or to another transform (a stream that is both readable and writable).
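The same shape can be sketched with only Node's built-in modules. This swaps in a simpler technique than a general streaming JSON parser: the hypothetical createHeadlineParser assumes newline-delimited JSON (one object per line) and emits each object's title field, and requestStream here is an in-memory Readable whose chunks split records at arbitrary points, as network chunks can.

```javascript
const { Readable, Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Parse one line of JSON and push its `title` (if any) downstream.
function pushTitle(stream, line) {
  if (!line.trim()) return; // skip blank lines
  const item = JSON.parse(line);
  if (item && typeof item.title === 'string') stream.push(item.title);
}

// Hypothetical parser for newline-delimited JSON.
function createHeadlineParser() {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte characters intact across chunks
  let leftover = '';                          // incomplete line carried to the next chunk
  return new Transform({
    readableObjectMode: true, // emit one title string per 'data' event
    transform(chunk, encoding, callback) {
      const lines = (leftover + decoder.write(chunk)).split('\n');
      leftover = lines.pop(); // the last piece may be a partial line
      try {
        for (const line of lines) pushTitle(this, line);
        callback();
      } catch (err) {
        callback(err); // malformed JSON errors the stream
      }
    },
    flush(callback) {
      try {
        pushTitle(this, leftover + decoder.end()); // final line may lack '\n'
        callback();
      } catch (err) {
        callback(err);
      }
    },
  });
}

// Simulated requestStream: the second record straddles a chunk boundary.
const requestStream = Readable.from([
  '{"title":"First headline"}\n{"title":"Sec',
  'ond headline"}\n{"title":"Third headline"}\n',
]);

requestStream
  .pipe(createHeadlineParser())
  .on('data', (headline) => console.log('headline:', headline));
```

Because the parser only keeps the current chunk plus one partial line, memory use is bounded by the chunk size rather than by the size of the whole document.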

We can see the two benefits of using streams here: (1) speed: we don't have to wait for the whole document to download before we start seeing headlines, and (2) efficiency: the whole requested document doesn't have to fit in memory -- we process bits of it at a time. There's also a third benefit here: look how clean that is!

Resources