Adblock Plus and (a little) more

Using asynchronous file I/O in Gecko · 2012-04-05 20:38 by Wladimir Palant

I’ve finally decided to start using asynchronous file I/O in Adblock Plus (probably about time). I expected it to be messy because of all the callbacks, but not really complicated. Well, I was mistaken. I will write down what I figured out; maybe it will help somebody.

Reading a file

NetUtil.jsm has a nice helper that lets you read a file: you simply call NetUtil.asyncFetch() with a file and a callback, and it calls your callback with an input stream that you can read synchronously (for example with NetUtil.readInputStreamToString()). The catch wasn’t immediately obvious to me: the callback is called only once. In other words, asyncFetch() reads the entire file into memory (into a pipe buffer, to be precise) and only then hands it to you. What if you want to count the number of lines in a huge file and don’t want to load it into memory in full? Bad luck…

Fortunately, it’s not too complicated to implement this yourself and process the data as it arrives, instead of collecting all of it in the pipe first. Here is how you would count the lines in a file:

// Shorthands used below (omit if your code already defines them)
var Ci = Components.interfaces, Cu = Components.utils;
Cu.import("resource://gre/modules/XPCOMUtils.jsm");
Cu.import("resource://gre/modules/NetUtil.jsm");
Cu.import("resource://gre/modules/FileUtils.jsm");

// Open a channel for the file (prefs.js in the profile directory)
var file = FileUtils.getFile("ProfD", ["prefs.js"]);
var uri = NetUtil.ioService.newFileURI(file);
var channel = NetUtil.ioService.newChannelFromURI(uri);
var lineCount = 0;

channel.asyncOpen({
  QueryInterface: XPCOMUtils.generateQI([Ci.nsIRequestObserver, Ci.nsIStreamListener]),
  onStartRequest: function(request, context) {},

  // Called repeatedly, each time with the next count bytes of the file
  onDataAvailable: function(request, context, stream, offset, count)
  {
    var data = NetUtil.readInputStreamToString(stream, count);
    var newLines = data.match(/\n/g);
    if (newLines)
      lineCount += newLines.length;
  },

  // Called once when reading is done; result is an nsresult status code
  onStopRequest: function(request, context, result)
  {
    if (Components.isSuccessCode(result))
      alert("Line count: " + lineCount);
    else
      alert("Failed to read file, error code " + result);
  }
}, null);
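
Counting per chunk works even though a line may be split across two onDataAvailable() calls: a newline is a single byte, so it always falls entirely inside one chunk. The following is a minimal plain-JavaScript sketch (standard ES2015, no Gecko APIs, runnable in Node.js) that simulates chunked delivery with a hypothetical countLines() helper taking the chunks as an array. It also counts a final line that has no trailing newline, which a listener counting only newline characters would miss.

```javascript
// Hypothetical helper: counts lines in text delivered as a sequence of
// chunks, the way onDataAvailable() receives it. Chunk boundaries may fall
// anywhere, including in the middle of a line.
function countLines(chunks) {
  let lineCount = 0;
  let lastChar = "";
  for (const data of chunks) {
    // Same per-chunk counting as in onDataAvailable()
    const newLines = data.match(/\n/g);
    if (newLines)
      lineCount += newLines.length;
    if (data.length > 0)
      lastChar = data[data.length - 1];
  }
  // A last line without a trailing newline still counts as a line
  if (lastChar !== "" && lastChar !== "\n")
    lineCount++;
  return lineCount;
}

// The same text split at different points gives the same result
console.log(countLines(["line1\nli", "ne2\nline3"]));     // 3
console.log(countLines(["line1\n", "line2\n", "line3"])); // 3
console.log(countLines(["line1\nline2\nline3\n"]));       // 3
```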

The drawback of this solution: the channel seems to stat the file synchronously, on the main thread, so for a non-existing file you get an NS_ERROR_FILE_NOT_FOUND exception immediately. So while the bulk of the I/O work is moved to a background thread, we still hit the file system on the main thread. If somebody knows a better approach, I would like to hear about it.
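
The main-thread stat cannot be avoided this way, but the error handling can at least be unified: a missing file surfaces as an exception thrown right away, while other failures arrive as a result code in onStopRequest(). Below is a minimal plain-JavaScript sketch (runnable in Node.js or a browser, using setTimeout) that funnels both into one callback. callAsync() is a hypothetical helper, not part of NetUtil.jsm, and it does nothing about the synchronous file system access itself.

```javascript
// Hypothetical helper: `start` begins an asynchronous operation and passes
// `callback` on to it, but it may also throw right away (like opening a
// channel for a missing file). Either way `callback` is called
// asynchronously, so the caller needs only one error-handling path.
function callAsync(start, callback) {
  try {
    start(callback);
  } catch (e) {
    // Report the synchronous failure on a later tick, like an async one
    setTimeout(function() { callback(e); }, 0);
  }
}

// Usage: an operation that fails immediately
callAsync(function(done) {
  throw new Error("NS_ERROR_FILE_NOT_FOUND");
}, function(error) {
  console.log(error ? "Failed: " + error.message : "Success");
});
```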

Writing to a file

NetUtil.jsm also has a helper for writing to a file asynchronously: NetUtil.asyncCopy(). It copies data from your source stream to the target stream, very nice. The problem, however, is getting an nsIInputStream instance for your data. If you have all the data as a string, you can simply use nsIStringInputStream. But what if you want to generate data on the fly, without keeping all of it in memory? You cannot implement nsIInputStream in JavaScript, so you have to resort to tricks.

I found only one way to supply data chunk-wise from JavaScript: nsIAsyncOutputStream. It lets you write a block of data and then wait until the stream can accept more. Unfortunately, file streams don’t implement it; only the streams provided by nsIPipe do. So you have to create a pipe, connect one end of it to the file stream via NetUtil.asyncCopy(), and write your data to the other end. Here is the actual code:

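// Shorthands used below (omit if your code already defines them)
var Cc = Components.classes, Ci = Components.interfaces, Cu = Components.utils;
Cu.import("resource://gre/modules/XPCOMUtils.jsm");
Cu.import("resource://gre/modules/NetUtil.jsm");
Cu.import("resource://gre/modules/FileUtils.jsm");
Cu.import("resource://gre/modules/Services.jsm");

// The data to be written, produced on the fly by a (legacy SpiderMonkey)
// generator; data.next() throws StopIteration once it is exhausted
var data = (function()
{
  for (var i = 0; i < 10000; i++)
    yield "line" + i + "\n";
})();

var file = FileUtils.getFile("ProfD", ["test.data"]);
var fileStream = FileUtils.openFileOutputStream(file,
  FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_TRUNCATE);

// Non-blocking input and output ends, default segment size (0),
// at most 0x8000 segments
var pipe = Cc["@mozilla.org/pipe;1"].createInstance(Ci.nsIPipe);
pipe.init(true, true, 0, 0x8000, null);

// Connect the pipe to the file: whatever is written to pipe.outputStream
// gets copied to the file in the background
NetUtil.asyncCopy(pipe.inputStream, fileStream, function(result)
{
  if (Components.isSuccessCode(result))
    alert("Success, file written");
  else
    alert("Error writing file: " + result);
});

// Now write to the pipe whenever it has room for more data
var pending = "";
function writeNextChunk()
{
  pipe.outputStream.asyncWait({
    QueryInterface: XPCOMUtils.generateQI([Ci.nsIOutputStreamCallback]),
    onOutputStreamReady: function()
    {
      try
      {
        if (!pending)
          pending = data.next();

        // A non-blocking write may accept only part of the data;
        // keep the rest for the next call
        var written = pipe.outputStream.write(pending, pending.length);
        pending = pending.substr(written);
        writeNextChunk();
      }
      catch (e if e instanceof StopIteration)
      {
        // No more data: closing the output end lets asyncCopy() finish
        pipe.outputStream.close();
      }
    }
  }, 0, 0, Services.tm.currentThread);
}
writeNextChunk();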

Of course, in a real extension you would write larger chunks to the pipe — XPConnect overhead is prohibitive when making so many calls from JavaScript to XPCOM and back.
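
One way to get larger chunks without changing the data source is to wrap the generator so that it concatenates small strings until a size threshold is reached. The sketch below is plain standard JavaScript (ES2015 generators, runnable in Node.js); batchChunks() is a hypothetical helper and the 0x8000 threshold is an arbitrary assumption. Note that it measures string length in characters, which equals the byte count only for ASCII data like the lines in the example.

```javascript
// Hypothetical helper: combines many small strings from `source` into
// chunks of at least `minSize` characters (only the last one may be
// shorter), so that each chunk costs one write() call instead of many.
function* batchChunks(source, minSize) {
  let buffer = "";
  for (const str of source) {
    buffer += str;
    if (buffer.length >= minSize) {
      yield buffer;
      buffer = "";
    }
  }
  // Flush whatever is left over
  if (buffer.length > 0)
    yield buffer;
}

// Same data as in the example above: 10000 short lines
function* lines(count) {
  for (let i = 0; i < count; i++)
    yield "line" + i + "\n";
}

const chunks = Array.from(batchChunks(lines(10000), 0x8000));
console.log(chunks.length); // 3 chunks instead of 10000 separate strings
```

In the extension, the same idea would apply inside onOutputStreamReady(): pull and concatenate several pieces of data before each write() to the pipe.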

And the other operations?

While reading and writing files are the most common file system operations, there are a number of other operations as well: creating a directory, checking file existence or modification time, renaming or removing files. While these are supposed to be fast, they can still cause a significant delay if the file system is busy. So it would be nice to run them asynchronously as well, off the main thread. Unfortunately, I didn’t find any way to do this. Well, if it gets really bad I can use ChromeWorker with js-ctypes to call operating system functions on a different thread, but that would be just crazy.


Comment [3]

  1. David Rajchenbach-Teller · 2012-04-06 00:15 · #

    You will be happy to learn that I am working on it at this moment. I hope to land a js-ctypes-based version of file I/O that works in chrome worker threads this month, and an asynchronous version usable from the main thread a bit later. I have a few prototypes lying around if you are interested.

  2. David Rajchenbach-Teller · 2012-04-06 00:16 · #

    Forgot to add: no equivalent to asyncOpen/asyncCopy yet, but it is somewhere in the pipes, too. Pun intended.

  3. Taras Glek · 2012-04-06 00:16 · #

    See https://bugzilla.mozilla.org/show_bug.cgi?id=563742 we are almost done adding some really nice jsctypes wrappers for low level io.

    Great minds think alike…not too crazy at all.
