How the F does Sprockets Load an Asset?

How does an asset get compiled? It’s less of a pipeline and more of a recursive ball of, well, assets. To understand the process we will start off with an asset with no directives (no require at the top). We’ll then walk through all the steps Sprockets goes through until a usable asset is loaded into memory. For this example we will use a js.erb file to see how a “complex” file type (i.e. one with multiple extensions) gets compiled. All examples are with Sprockets 4 (i.e. master branch). Here’s the file:

$ cat assets/users.js.erb
var Users = {
  find: function(id) {
    var t = '<%= Time.now %>';
  }
};

When we compile this asset we get:

var Users = {
  find: function(id) {
    var t = '2016-12-13 11:01:00 -0600';
  }
};

This is with the simplest Sprockets setup:

@env = Sprockets::Environment.new
@env.append_path(fixture_path('asset'))
@env.cache = {}

What happens first? We call

@env.find_asset("users.js")

This calls the find_asset method in Sprockets::Base. The contents are deceptively simple

uri, _ = resolve(*args)
if uri
  load(uri)
end

The resolve method comes from sprockets/resolve.rb and the load method comes from sprockets/loader.rb. Resolve will find where the asset is on disk and give us a “uri”. We’ll skip over exactly how resolve works; its task is relatively straightforward: find an asset on disk that satisfies the requirement of resolving to a users.js file. We can go into it in detail some other time.

A “uri” in sprockets looks like this:

"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript"

It has a scheme with the type of thing it is (in this case a file). We can tell that it is an absolute path because after the scheme file:// it starts with a slash. The full path to this file is /projects/sprockets/test/fixtures/asset/users.js.erb. Then in the query params we carry extra info; in this case we are storing the mime type, which is application/javascript. While the file itself is a .js.erb, the expected result of loading (compiling) this file is a .js file.
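To make the anatomy of that uri concrete, here is a minimal sketch that splits one into its path and params using plain string operations. The split_asset_uri helper is hypothetical and written just for this post; Sprockets has its own uri utilities, and this only mimics the shape of the result. It deliberately does not form-decode the query, because the + in a mime type like application/javascript+ruby has to stay a +.

```ruby
# Hypothetical helper (not part of Sprockets): split an asset uri into
# the absolute path and a hash of query params.
def split_asset_uri(uri)
  scheme, rest = uri.split("://", 2)
  raise ArgumentError, "expected a file uri, got: #{uri}" unless scheme == "file"

  path, query = rest.split("?", 2)

  # Split the query by hand so "+" is kept as-is.
  params = (query || "").split("&").each_with_object({}) do |pair, hash|
    key, value = pair.split("=", 2)
    hash[key.to_sym] = value
  end

  [path, params]
end

path, params = split_asset_uri(
  "file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript"
)

puts path
# => /projects/sprockets/test/fixtures/asset/users.js.erb
puts params[:type]
# => application/javascript
```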

Internally Sprockets mostly doesn’t care about file extensions, it really cares about mime types. It only uses file extensions to generate mime types. When you register any processors, you register via a mime type.

The body of the load method from sprockets/loader.rb is fairly complicated. It handles a few cases.

We’re going to assume a fresh cache for our example. That means that we hit the fetch_asset_from_dependency_cache method and fall back to the “not found based on history” case, so we have to load it.

Loading an unloaded asset (pipeline = nil/:default)

The bulk of the work happens in the method load_from_unloaded. We’re going to start getting really technical and low level, so follow along in the code to better understand what I’m talking about. We first generate a “load” and a “logical” path:

puts load_path.inspect
# => "/projects/sprockets/test/fixtures/asset"

puts logical_path.inspect
# => "users.js.erb"

There is an edge case that is handled next. In Sprockets, foo/index.js can be resolved to foo.js; it’s a convention in some NPM libraries. That doesn’t apply to this case. Next we generate an extname and a file_type:

puts extname.inspect
# => ".js.erb"

puts file_type.inspect
# => "application/javascript+ruby"

The file_type is the mime type for our .js.erb extension. Note the +ruby which designates that this is an erb file. I think this is a Sprockets convention. This mime type will be very important.
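Here is a rough sketch of how a compound extension can be mapped to a mime type: try the longest suffix first and fall back to shorter ones. The EXTENSION_TYPES table and the match_extname helper are made up for illustration and only hold the two entries this example needs; the real lookup lives inside Sprockets and is driven by its registered mime types.

```ruby
# Hypothetical lookup table; Sprockets builds its own from registered mime types.
EXTENSION_TYPES = {
  ".js"     => "application/javascript",
  ".js.erb" => "application/javascript+ruby"
}.freeze

# Returns [extname, mime_type] for the longest known extension, or nil.
def match_extname(filename, table = EXTENSION_TYPES)
  basename = File.basename(filename)
  offset = basename.index(".")

  while offset
    extname = basename[offset..-1]
    return [extname, table[extname]] if table.key?(extname)

    # Drop the leading segment and try the next, shorter suffix.
    offset = basename.index(".", offset + 1)
  end

  nil
end

p match_extname("/projects/sprockets/test/fixtures/asset/users.js.erb")
# => [".js.erb", "application/javascript+ruby"]
p match_extname("jquery.min.js")
# => [".js", "application/javascript"]
```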

In this case the only params we have are {:type=>"application/javascript"} so we skip over the pipeline case.

We do have a :type so we’ll run that part. The logical_path was trimmed down to remove the extension

puts logical_path.chomp(extname)
# => "users"

Now we pull an extension based off of our mime type and add it to the logical path

puts config[:mime_types][type][:extensions].first
# => ".js"

Putting these together our new logical path is:

"users.js"

We’ll use this later. This should match the original thing we looked for when we used @env.find_asset.

Next comes a sanity check. Either we’re working with the mime type we’re requesting, or we’re working with a mime type that can be converted to the one we’re requesting. We check our transformers, which are an internal concept in Sprockets; see guides/extending_sprockets.md for more info on building a transformer. They essentially allow you to convert one file into another. Sprockets mostly cares about mime types, so we check the transformers to see if it’s possible to transform the existing mime type into the desired mime type, i.e. we want to convert application/javascript+ruby to application/javascript.
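The sanity check boils down to a lookup in a nested table keyed by “from” type and then “to” type. The sketch below assumes that shape; the TRANSFORMERS constant, the :erb_processor placeholder and the loadable_as? helper are all invented for this illustration and are not the names Sprockets uses.

```ruby
# Hypothetical transformer table: TRANSFORMERS[from_type][to_type] => processor
TRANSFORMERS = {
  "application/javascript+ruby" => { "application/javascript" => :erb_processor }
}.freeze

# True when the file on disk already has the requested type,
# or when a transformer exists to convert it.
def loadable_as?(file_type, type, transformers = TRANSFORMERS)
  return true if file_type == type

  transformers.fetch(file_type, {}).key?(type)
end

puts loadable_as?("application/javascript+ruby", "application/javascript") # => true
puts loadable_as?("application/javascript+ruby", "text/css")               # => false
```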

Next we grab the “processors” for our mime type. These can be transformers as mentioned earlier, or they can be processors such as DirectiveProcessor which is responsible for expanding directives such as //= require foo.js in the top of your file.

Into this processors_for method we also pass a “pipeline”. For now it is nil, which means that the :default pipeline is used.

A pipeline is registered like a transformer or a processor. They’re an internal concept. Here is what the default one looks like

register_pipeline :default do |env, type, file_type|
  # TODO: Hack for to inject source map transformer
  if (type == "application/js-sourcemap+json" && file_type != "application/js-sourcemap+json") ||
      (type == "application/css-sourcemap+json" && file_type != "application/css-sourcemap+json")
    [SourceMapProcessor]
  else
    env.default_processors_for(type, file_type)
  end
end

Here if we’re requesting a sourcemap we only want to run the [SourceMapProcessor] otherwise we find the “default” processors that are valid to our type (in this case application/javascript) from our file_type (in this case application/javascript+ruby). Default processors are defined here:

def default_processors_for(type, file_type)
  bundled_processors = config[:bundle_processors][type]
  if bundled_processors.any?
    bundled_processors
  else
    self_processors_for(type, file_type)
  end
end

Either we return any “bundled” processors for the type or we return “self” processors. In our case there is a bundle processor registered, Sprockets::Bundle. It was registered in sprockets.rb:

require 'sprockets/bundle'
register_bundle_processor 'application/javascript', Bundle

Now we’re back to the loader.rb file. We have our processors array which is simply [Sprockets::Bundle]. We call build_processors_uri. This generates a string like:

"processors:type=application/javascript&file_type=application/javascript+ruby"

This string gets added to the “dependencies”. This array is used for determining cache keys, so if a processor gets added or removed the cache key will change (I think).
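For a feel of how such a string can be put together, here is a small sketch. The processors_uri helper is my own approximation, not the actual build_processors_uri implementation; it just joins the non-nil params in order, which is enough to reproduce the strings shown in this post.

```ruby
# Hypothetical approximation of the "processors:" dependency string.
def processors_uri(type:, file_type:, pipeline: nil)
  params = { type: type, file_type: file_type, pipeline: pipeline }

  # Skip params that were not given (a nil pipeline means :default).
  query = params.reject { |_key, value| value.nil? }
                .map { |key, value| "#{key}=#{value}" }
                .join("&")

  "processors:#{query}"
end

puts processors_uri(type: "application/javascript",
                    file_type: "application/javascript+ruby")
# => processors:type=application/javascript&file_type=application/javascript+ruby

puts processors_uri(type: "application/javascript+ruby",
                    file_type: "application/javascript+ruby",
                    pipeline: :source)
# => processors:type=application/javascript+ruby&file_type=application/javascript+ruby&pipeline=source
```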

Now we have to call each of our processors. First we resolve! the original filename, but with a different pipeline i.e. pipeline: :source. The resolve! method raises an error if the file cannot be found.

After we resolve the file we get a source_uri that looks like this:

"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript+ruby&pipeline=source"

Now here’s where things get complicated (I know right). We load the exact same file that is already being loaded with this new pipeline=source.

Recursive asset loading is recursive (pipeline=source)

At this point we get recursive: we repeat everything in load_from_unloaded but with pipeline=source. The results should be the same but with a different pipeline. The :source pipeline looks like this:

register_pipeline :source do |env|
  []
end

In this case the processors returned is an empty array [].

We skip over the processor section, and instead hit this:

dependencies << build_file_digest_uri(unloaded.filename)
metadata = {
  digest: file_digest(unloaded.filename),
  length: self.stat(unloaded.filename).size,
  dependencies: dependencies
}

The file is digested to create a “digest” and the length is added via stat. Also “dependencies” are recorded which look like this:

#<Set: {"environment-version", "environment-paths", "processors:type=application/javascript+ruby&file_type=application/javascript+ruby&pipeline=source", "file-digest:///projects/sprockets/test/fixtures/asset/users.js.erb"}>

After this we build an asset hash:

asset = {
  uri:          unloaded.uri,
  load_path:    load_path,
  filename:     unloaded.filename,
  name:         name,
  logical_path: logical_path,
  content_type: type,
  source:       source,
  metadata:     metadata,
  dependencies_digest:
                DigestUtils.digest(resolve_dependencies(metadata[:dependencies]))
}

Which is then used to generate a Sprockets::Asset and is returned by our load method.
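The digest and length parts of that metadata are easy to reproduce with the standard library. This is a sketch under assumptions: I’m using SHA-256 and a hex string for readability, and source_metadata is a hypothetical helper, not a Sprockets method. The temp file just stands in for users.js.erb.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: digest the file contents and record its size on disk.
def source_metadata(filename)
  {
    digest: Digest::SHA256.file(filename).hexdigest,
    length: File.stat(filename).size
  }
end

contents = "var Users = {};\n"

# Tempfile.create returns the value of the block and removes the file afterwards.
metadata = Tempfile.create(["users", ".js.erb"]) do |file|
  file.write(contents)
  file.flush
  source_metadata(file.path)
end

puts metadata[:length]        # => 16
puts metadata[:digest].length # => 64
```

If the contents of the file change, the digest changes with them, which is what makes it useful as a cache dependency.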

Jumping back up the stack (pipeline=default)

Now that we have a “source” asset we can go back and finish running the processors for pipeline=default

We did all that work, just to get a digest path:

source_uri, _ = resolve!(unloaded.filename, pipeline: :source)
source_asset = load(source_uri)

source_path = source_asset.digest_path
# => "users.source.js-798a333a5596e1495e1cc4870f11c7729f168350ee5972637053f9691c8dc326.erb"

Which kinda seems insane. Maybe we don’t need to go all recursive to get this tiny piece of information, but whatevs. If there’s one thing I’ve learned from working on Sprockets, it’s that the code resists refactoring and most of the seemingly “clever” code is actually a very clean way of accomplishing tasks. That is to say, I’m not going to change this without a lot more research.

Now we execute call_processors, passing in our array of processors [Sprockets::Bundle] and our asset hash:

{
  environment:  self,
  cache:        self.cache,
  uri:          unloaded.uri,
  filename:     unloaded.filename,
  load_path:    load_path,
  source_path:  source_path,
  name:         name,
  content_type: type,
  metadata: {
    dependencies: dependencies
  }
}

If we had more than one processor this would call each of them in reverse order and merge the results before calling the next. In this case there’s only one processor. Guess it’s time to figure out what that one does.
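To show what “reverse order and merge the results” means in practice, here is a simplified sketch. The run_processors method is my own stripped-down take, not the real call_processors, and the two procs are toy processors invented for the example.

```ruby
# Simplified sketch: run processors last-to-first, feeding each one the
# data and metadata produced so far.
def run_processors(processors, input)
  data = input[:data] || ""
  metadata = (input[:metadata] || {}).dup

  processors.reverse_each do |processor|
    result = processor.call(input.merge(data: data, metadata: metadata))
    next unless result.is_a?(Hash)

    # :data replaces the running source; every other key is merged as metadata.
    data = result[:data] if result.key?(:data)
    metadata.merge!(result.reject { |key, _value| key == :data })
  end

  metadata.merge(data: data)
end

# Toy processors: the last one in the array runs first.
reader = proc { |_input| { data: "hello" } }
shout  = proc { |input| { data: input[:data].upcase, shouted: true } }

result = run_processors([shout, reader], {})
puts result[:data] # => HELLO
```

Note that reader sits at the end of the array but runs first, which is the same reason FileReader ends up last in the list later on.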

Bundle processor (still on pipeline=default)

The bundle processor is defined in sprockets/bundle.rb. Open it up to follow along. We pull out dependencies from the hash. For now it is very simple:

#<Set: {"environment-version", "environment-paths", "processors:type=application/javascript&file_type=application/javascript+ruby"}>

The next thing we do is we resolve the file (yes, again) this time using pipeline=self

processed_uri, deps = env.resolve(input[:filename], accept: type, pipeline: :self)

puts processed_uri.inspect
# => "file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"

puts deps.inspect
# => #<Set: {"file-digest:///projects/sprockets/test/fixtures/asset/users.js.erb"}>

We merge this deps with the dependencies from earlier. The file-digest:// that was returned from the resolve method indicates that there is a dependency on the contents of the file on disk, if the contents change, the digest should change.

You ready for some more recursion? You better hold onto your butts.

The next thing that happens is we build a proc

find_required = proc { |uri| env.load(uri).metadata[:required] }

This proc takes in a uri and loads it, then returns a set of “required” files. Sprockets uses this proc to do a depth first search of our processed_uri (i.e. the pipeline=self uri). We can look at the dfs now:

def dfs(initial)
  nodes, seen = Set.new, Set.new
  stack = Array(initial).reverse

  while node = stack.pop
    if seen.include?(node)
      nodes.add(node)
    else
      seen.add(node)
      stack.push(node)
      stack.concat(Array(yield node).reverse)
    end
  end

  nodes
end

The purpose of this search is to make sure we evaluate each file once and only once. Otherwise, if we had an a.js that required a b.js that required a c.js that required a.js and we didn’t keep track, we would be stuck in an infinite loop. There is more involved in making sure infinite loops don’t happen, but that’s maybe for another post.
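You can try that method outside of Sprockets. The sketch below reuses the dfs body from above and swaps the load call for a lookup in a plain hash; the REQUIRES and CYCLE tables and their file names are made up for the demo.

```ruby
require "set"

# Same algorithm as the dfs method shown above.
def dfs(initial)
  nodes, seen = Set.new, Set.new
  stack = Array(initial).reverse

  while (node = stack.pop)
    if seen.include?(node)
      nodes.add(node)
    else
      seen.add(node)
      stack.push(node)
      stack.concat(Array(yield node).reverse)
    end
  end

  nodes
end

# Made-up require graph standing in for metadata[:required].
REQUIRES = {
  "app.js" => ["b.js", "c.js"],
  "b.js"   => ["d.js"],
  "c.js"   => [],
  "d.js"   => []
}.freeze

order = dfs("app.js") { |name| REQUIRES.fetch(name) }
p order.to_a
# => ["d.js", "b.js", "c.js", "app.js"]

# A circular require still terminates because of the seen set.
CYCLE = { "a.js" => ["b.js"], "b.js" => ["c.js"], "c.js" => ["a.js"] }.freeze
cycle = dfs("a.js") { |name| CYCLE.fetch(name) }
puts cycle.size # => 3
```

Each file shows up exactly once, and a file’s requirements come out before the file itself.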

For the first iteration, the stack contains only our URI:

puts stack.inspect
# => ["file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"]

It then adds this uri to the “seen” set and puts it back on the stack. The next line is a little tricky

stack.concat(Array(yield node).reverse)

Here the node is:

"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"

So we call the block with that node, remembering our block is

find_required = proc { |uri| env.load(uri).metadata[:required] }

So our DFS method invokes this block and passes it our pipeline=self uri, which invokes our load method again.

Load recursion kicked off from within bundle (pipeline=self)

I feel like we can’t get out of this load method, here we are again. This is what our pipeline=self looks like:

register_pipeline :self do |env, type, file_type|
  env.self_processors_for(type, file_type)
end

This method, self_processors_for, is non-trivial:

def self_processors_for(type, file_type)
  processors = []

  processors.concat config[:postprocessors][type]
  if type != file_type && processor = config[:transformers][file_type][type]
    processors << processor
  end
  processors.concat config[:preprocessors][file_type]

  if processors.any? || mime_type_charset_detecter(type)
    processors << FileReader
  end

  processors
end

First we grab any postprocessors that are registered for application/javascript mime type. There are no postprocessors registered by default, so I don’t know why they exist, but you can register one using register_postprocessor.

Next up, we pull out a transformer for our file type. This returns us a Sprockets::ProcessorUtils::CompositeProcessor. This is a meta processor that contains possibly several transformers. It is generated via a call to register_transformer. In this case the full processor looks like this:

#<struct Sprockets::ProcessorUtils::CompositeProcessor
  # ...
  processors=
   [#<Sprockets::Preprocessors::DefaultSourceMap:0x007fb24d3271a0>,
    #<Sprockets::DirectiveProcessor:0x007fb24d356400
     @header_pattern=/\A(?:(?m:\s*)(?:(?:\/\/.*\n?)+|(?:\/\*(?m:.*?)\*\/)))+/>,
    Sprockets::ERBProcessor]>

It’s doing some things with source maps, and you can see now we have our ERBProcessor in there as well as a DirectiveProcessor.

Next up, we gather any preprocessors, of which there are none. Finally, if there are any processors in our list we add a FileReader if we detect that it is not binary. Sprockets assumes a text file if the mime type has a charset defined. This is pretty standard.

So now we have our meta CompositeProcessor as well as our FileReader processor.

Now we call each of the processors in reverse order. First up is the FileReader.

class FileReader
  def self.call(input)
    env = input[:environment]
    data = env.read_file(input[:filename], input[:content_type])
    dependencies = Set.new(input[:metadata][:dependencies])
    dependencies += [env.build_file_digest_uri(input[:filename])]
    { data: data, dependencies: dependencies }
  end
end

It takes in a filename, reads that file from disk, and adds the contents to the :data key of the hash. It also adds a dependency on the file, in case there isn’t already one:

"file-digest:///projects/sprockets/test/fixtures/asset/users.js.erb"

After the file is done being read from disk, next up is the CompositeProcessor. This delegates to its other processors in reverse order so these get called

Sprockets::ERBProcessor
#<Sprockets::DirectiveProcessor:0x007f85b1322448 @header_pattern=/\A(?:(?m:\s*)(?:(?:\/\/.*\n?)+|(?:\/\*(?m:.*?)\*\/)))+/>
#<Sprockets::Preprocessors::DefaultSourceMap:0x007f85b12f33a0>

First up is the ERBProcessor, it takes the input[:data] which is the contents of the file and runs it through an ERB processor. There’s a little magic in that file to detect if someone is using an ENV variable in their erb, in which case we auto add that as a dependency.
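Stripped of the dependency tracking, the ERB step is just Ruby’s standard ERB library evaluating the template. This sketch uses plain ERB from the standard library, not Sprockets::ERBProcessor, so treat it as an approximation of what happens to input[:data].

```ruby
require "erb"

# The contents of users.js.erb, as FileReader would hand them over.
source = <<~JS
  var Users = {
    find: function(id) {
      var t = '<%= Time.now %>';
    }
  };
JS

# Evaluate the embedded Ruby and get plain JavaScript back.
compiled = ERB.new(source).result

puts compiled
```

The output is the same file with the ERB tag replaced by the current time.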

Next the DirectiveProcessor looks for any directives such as //= require foo.js of which there are none in this file. Finally we call DefaultSourceMap. This processor adds a 1-to-1 source map if one isn’t already generated. If you’re not familiar with source maps check out guides/source_maps.md which has some of my notes.

Now all of our processors for pipeline=self have been run, the load call completes and now we go back to where we were in our Bundle processor for pipeline=default.

Return to Bundle for (pipeline=default)

You may remember that we were in the middle of a depth first search.

def dfs(initial)
  nodes, seen = Set.new, Set.new
  stack = Array(initial).reverse

  while node = stack.pop
    if seen.include?(node)
      nodes.add(node)
    else
      seen.add(node)
      stack.push(node)
      stack.concat(Array(yield node).reverse)
    end
  end

  nodes
end

ALL that last section happened during the yield node section of this code. The return value was the list of “required” files, which is reversed and added onto the stack. In our case there are no “required” files for file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self so that yield call returns an empty set.

The only node on the stack has already been “seen” so it is added to our nodes set. This was the last thing on the stack so we return that set. Our required list looks like this:

#<Set: {"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"}>

If we were using a require directive such as //= require foo.js then we would have more things in this set. Another concept that Sprockets has is a “stubbed” list. Gonna be totally honest, I have no idea why you would need it but it is there. From the method docs: “Allows dependency to be excluded from the asset bundle”. So there you go. To get this list we call into load AGAIN:

stubbed  = Utils.dfs(env.load(processed_uri).metadata[:stubbed], &find_required)

Though there is one thing I never mentioned, not all calls to load are created equal:

Cached Environment

Something I’ve failed to mention is that all calls to an env are not created equal. There is a Sprockets::Environment and a Sprockets::CachedEnvironment. The cached environment wraps the Sprockets::Environment and caches certain calls such as load so in the above example env.load(processed_uri) is returning a cached value and not actually calling into load, that’s a relief.

It turns out that this whole time I was somewhat misleading you. We weren’t using the version of find_asset from Sprockets::Base, but rather the one from Sprockets::Environment:

def find_asset(*args)
  cached.find_asset(*args)
end

This call to cached creates a CachedEnvironment object:

def cached
  CachedEnvironment.new(self)
end

Now any duplicate calls to load (with the EXACT same uri) will return a cached copy. The rest of the implementation of find_asset is from Sprockets::Base.
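The idea is easier to see in miniature. The two classes below are hypothetical toys that I named ToyEnvironment and ToyCachedEnvironment; they are not how Sprockets implements its cached environment, they only show the “wrap the env and memoize load by uri” behavior described above.

```ruby
# Toy stand-in for the environment: counts how often load really runs.
class ToyEnvironment
  attr_reader :load_count

  def initialize
    @load_count = 0
  end

  def load(uri)
    @load_count += 1
    "asset for #{uri}"
  end

  # Each call hands back a fresh cache wrapped around this environment.
  def cached
    ToyCachedEnvironment.new(self)
  end
end

# Toy cache: the EXACT same uri is only loaded once.
class ToyCachedEnvironment
  def initialize(environment)
    @environment = environment
    @loaded = {}
  end

  def load(uri)
    @loaded[uri] ||= @environment.load(uri)
  end
end

toy_env = ToyEnvironment.new
cached = toy_env.cached
uri = "file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"

cached.load(uri)
cached.load(uri)
puts toy_env.load_count # => 1
```

Change anything in the uri, such as the pipeline param, and it counts as a different entry and gets loaded again.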

The first time we hit the cache in this example was with

file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript+ruby&pipeline=source

It is first put in the cache at:

/projects/sprockets/lib/sprockets/loader.rb:149:in `load_from_unloaded'

Note some of my line numbers might not match perfectly due to changes in source, also I’m adding in debug statements etc.

Or this line:

source_uri, _ = resolve!(unloaded.filename, pipeline: :source)
source_asset = load(source_uri) # <========== THIS LINE ===========

source_path = source_asset.digest_path

When we pull it from cache we do so in the bundle processor:

/projects/sprockets/lib/sprockets/bundle.rb:35:in `block in call'

Which corresponds to this code:

(required + stubbed).each do |uri|
  dependencies.merge(env.load(uri).metadata[:dependencies]) #< === Called from cache here
end

Which brings us back to the bundle processor we were looking at before:

Finish the bundle processor (pipeline=default)

We loop through our required set (which is #<Set: {"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"}>) minus our stubbed set (which is empty).

For each of these we merge in dependencies. Our final dependencies set looks like this:

#<Set: {
  "environment-version",
  "environment-paths",
  "processors:type=application/javascript&file_type=application/javascript+ruby&pipeline=self",
  "file-digest:///projects/sprockets/test/fixtures/asset/users.js.erb"}>

We then look up “reducers” and get back a hash of keys and callable objects:

{:data=>
  [
    #<Proc:0x007ffef7b74460@/Users/richardschneeman/Documents/projects/sprockets/lib/sprockets.rb:129>,
    #<Proc:0x007ffef7b74398 (lambda)>
  ],
:links=>
  [
    nil,
    #<Proc:0x007ffef7b74118(&:+)>
  ],
:sources=>
  [
    #<Proc:0x007ffef8027c50@/Users/richardschneeman/Documents/projects/sprockets/lib/sprockets.rb:131>,
    #<Proc:0x007ffef7b74118(&:+)>
  ],
:map=>
  [
    #<Proc:0x007ffef8027278@/Users/richardschneeman/Documents/projects/sprockets/lib/sprockets.rb:132>,
    #<Proc:0x007ffef8027070 (lambda)>
  ]
}

A reducer can be registered like so:

register_bundle_metadata_reducer '*/*', :data, proc { String.new("") }, :concat
register_bundle_metadata_reducer 'application/javascript', :data, proc { String.new("") }, Utils.method(:concat_javascript_sources)
register_bundle_metadata_reducer '*/*', :links, :+
register_bundle_metadata_reducer '*/*', :sources, proc { [] }, :+
register_bundle_metadata_reducer '*/*', :map, proc { |input| { "version" => 3, "file" => PathUtils.split_subpath(input[:load_path], input[:filename]), "sections" => [] } }, SourceMapUtils.method(:concat_source_maps)

It acts on a key such as :data to transform or “reduce” individual keys.
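A reducer is really just a pair: something that builds the starting value and something that folds each asset’s value into it. Here is a minimal sketch of that idea. The TOY_REDUCERS table and reduce_bundle method are simplified inventions for this post and skip cases the real process_bundle_reducers handles, such as a reducer with no initial value.

```ruby
# Each key maps to [initial_value_builder, reducer].
TOY_REDUCERS = {
  data:    [proc { String.new("") }, proc { |memo, value| memo + value }],
  sources: [proc { [] }, :+.to_proc]
}.freeze

# Fold the given key of every asset into a single value per key.
def reduce_bundle(input, assets, reducers)
  reducers.each_with_object({}) do |(key, (initial, reducer)), result|
    values = assets.map { |asset| asset[key] }.compact
    result[key] = values.reduce(initial.call(input)) do |memo, value|
      reducer.call(memo, value)
    end
  end
end

# Two made-up assets standing in for loaded "required" files.
assets = [
  { data: "var a;\n", sources: ["a.js"] },
  { data: "var b;\n", sources: ["b.js"] }
]

bundle = reduce_bundle({}, assets, TOY_REDUCERS)
puts bundle[:data]
p bundle[:sources] # => ["a.js", "b.js"]
```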

If we had some “required” files due to the directive processor, they would be loaded here:

assets = required.map { |uri| env.load(uri) }

Then this last line is where they would be concatenated via our reducers:

process_bundle_reducers(input, assets, reducers).merge(dependencies: dependencies, included: assets.map(&:uri))

In this case our only “required” asset is from file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self which is important because you’ll remember that the pipeline=self is when the FileReader and ERBProcessor were run.

Finally we can return from our original pipeline=nil/:default case in our original call to load, since all of our pipelines have been executed.

The rest of the code is just doing things like taking digests and building hashes, we’ve already covered it in a previous section.

Finally a Sprockets::Asset is generated and returned from our original @env.find_asset invocation.

Yay!

2020 Hindsight

There are a few confusing things going on here. It isn’t always clear that calls to an env are going to CachedEnvironment, and it’s even less clear if we’re calling something that has already been cached or loading something new.

The pattern of loading files that Sprockets uses is a reactor. It stores state via pipeline=<whatever> and essentially loops with different pipeline variations until it gets its desired output. While this is very powerful, it’s also really hard to wrap your brain around. Most of the code, especially in the Bundle processor, is indecipherable if you don’t know minute details about how things work inside of all of Sprockets. These two designs, the recursive-ish load reactor pattern and the CachedEnvironment, are sometimes difficult to wrap your mind around. In particular, this pattern of loading files creates a forking backtrace, so if you’re trying to debug it’s not always immediately clear what’s going on. Debug statements are usually output several times for each method call.

The other thing that makes Sprockets hard to understand is the plugin ecosystem. Sprockets is less a library and more a framework that uses itself to build an asset processing framework. Things like transformers, preprocessors, compressors, bundle_processors, etc. make it confusing exactly where work gets done. Some of the processors are highly coupled, such as the Bundle processor and the DirectiveProcessor. Again it’s extremely powerful and makes the library very flexible but difficult to reason about.

Much of Sprockets resists refactoring. Many of the design decisions are very coupled to the implementation. I’ve spent hours trying to tease out CachedEnvironment into something else, but eventually gave up. One thing to consider if you’re prone to judging code like I am: this project is 70%+ written by one person. These design decisions are all very powerful and many times very beautiful in their simplicity. If you’re the only one that works on a project, sometimes it pays to pick a powerful abstraction over one that is easier to read and understand.

I’ve got some ideas on how we could tease out some abstractions, but it’s a hard thing to do. We have to be backwards compatible, and bake in room for future features & growth. We also need to be performance conscious.

There are other features that I haven’t covered in this example, such as how files get written to disk and how manifest files are generated, but how an asset gets loaded is complicated enough for now. How is your life better now that you know how the “F” Sprockets loads an asset? I have no idea, but I’m sure there’s something good about it. If you enjoyed this technical deep dive check out my post where I live-blog writing a non-trivial Rails feature. Thanks for reading!

