Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Interface Mocking

UPDATE: This is a gem now: rspec-fire. The code in the gem is better than that presented here.

Here is a screencast I put together in response to a recent Destroy All Software screencast on test isolation and refactoring. It shows off an idea I’ve been tinkering with: automatically validating the implicit interfaces that you stub in tests.

Interface Mocking screencast.

Here is the code for InterfaceMocking:

module InterfaceMocking

  # Returns a new interface double. This is equivalent to an RSpec double,
  # stub, or mock, except that if the class passed as the first parameter
  # is loaded it will raise if you try to set an expectation or stub on
  # a method that the class has not implemented.
  def interface_double(stubbed_class, methods = {})
    InterfaceDouble.new(stubbed_class, methods)
  end

  module InterfaceDoubleMethods

    include RSpec::Matchers

    def should_receive(method_name)
      ensure_implemented(method_name)
      super
    end

    def should_not_receive(method_name)
      ensure_implemented(method_name)
      super
    end

    def stub!(method_name)
      ensure_implemented(method_name)
      super
    end

    def ensure_implemented(*method_names)
      if recursive_const_defined?(Object, @__stubbed_class__)
        recursive_const_get(Object, @__stubbed_class__).
          should implement(method_names, @__checked_methods__)
      end
    end

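    # Walk a namespaced constant name ("A::B::C") down from the given object.
    def recursive_const_get object, name
      name.split('::').inject(object) {|klass, part| klass.const_get part }
    end

    # True only if every segment of the namespaced name is already defined.
    def recursive_const_defined? object, name
      !!name.split('::').inject(object) {|klass, part|
        if klass && klass.const_defined?(part)
          klass.const_get part
        end
      }
    end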

  end

  class InterfaceDouble < RSpec::Mocks::Mock

    include InterfaceDoubleMethods

    def initialize(stubbed_class, *args)
      args << {} unless Hash === args.last

      @__stubbed_class__ = stubbed_class
      @__checked_methods__ = :public_instance_methods
      ensure_implemented *args.last.keys

      # __declared_as copied from rspec/mocks definition of `double`
      args.last[:__declared_as] = 'InterfaceDouble'
      super(stubbed_class, *args)
    end

  end
end

RSpec::Matchers.define :implement do |expected_methods, checked_methods|
  match do |stubbed_class|
    unimplemented_methods(
      stubbed_class,
      expected_methods,
      checked_methods
    ).empty?
  end

  def unimplemented_methods(stubbed_class, expected_methods, checked_methods)
    implemented_methods = stubbed_class.send(checked_methods)
    unimplemented_methods = expected_methods - implemented_methods
  end

  failure_message_for_should do |stubbed_class|
    "%s does not publicly implement:\n%s" % [
      stubbed_class,
      unimplemented_methods(
        stubbed_class,
        expected_methods,
        checked_methods
      ).sort.map {|x|
        "  #{x}"
      }.join("\n")
    ]
  end
end

RSpec.configure do |config|

  config.include InterfaceMocking

end
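
To make the core check easier to see without RSpec in the way, here is a minimal plain-Ruby sketch of the same idea. It is not the code from the screencast or from rspec-fire: VerifiedDouble and Mailer are hypothetical names, and the sketch only handles canned return values. The one thing it shares with the module above is the rule that a stub is rejected when the named class is loaded and does not publicly implement the method.

```ruby
# Plain-Ruby sketch of an interface-checked double.
# VerifiedDouble and Mailer are hypothetical names used for illustration.
class Mailer
  def deliver(message); end
end

class VerifiedDouble
  def initialize(class_name, stubs = {})
    @class_name = class_name
    @stubs = {}
    stubs.each { |name, value| stub(name, value) }
  end

  # Register a canned return value, but only for methods that the real
  # class publicly implements (the check is skipped if it is not loaded).
  def stub(method_name, value = nil)
    ensure_implemented(method_name)
    @stubs[method_name.to_sym] = value
    self
  end

  # Answer stubbed methods with their canned value; anything else is an error.
  def method_missing(name, *args)
    return @stubs[name] if @stubs.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @stubs.key?(name) || super
  end

  private

  def ensure_implemented(method_name)
    return unless Object.const_defined?(@class_name)
    klass = Object.const_get(@class_name)
    return if klass.public_instance_methods.include?(method_name.to_sym)
    raise ArgumentError, "#{klass} does not publicly implement: #{method_name}"
  end
end

mailer = VerifiedDouble.new('Mailer', deliver: :sent)
mailer.deliver('hello')                      # answers with the canned value

begin
  VerifiedDouble.new('Mailer', delivr: true) # typo in the method name
rescue ArgumentError => e
  puts e.message
end
```

Because the check is skipped when the class is not loaded, a test using such a double can still run in isolation; the validation only kicks in when the real class happens to be present.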

Static Asset Caching on Heroku Cedar Stack

I recently moved this blog over to Heroku, and in the process added some proper HTTP caching headers. The dynamic pages use the built-in fresh_when and stale? Rails helpers, combined with Rack::Cache and the free memcached plugin available on Heroku. That was all pretty straightforward. What was more difficult was configuring Heroku to serve all static assets (such as images and stylesheets) with a far-future max-age header so that they will be cached for eternity. What I’ve documented here is somewhat of a hack, and hopefully Heroku will provide a better way of doing this in the future.

By default Heroku serves everything in public directly via nginx. This is a problem for us since we don’t get a chance to configure the caching headers. Instead, use the Rack::StaticCache middleware (provided in the rack-contrib gem) to serve static files, which by default adds far-future max-age cache control headers. The files need to be served from a different directory to public, since there is no way to disable the nginx serving. I renamed my public folder to public_cached.

# config/application.rb
config.middleware.use Rack::StaticCache, 
  urls: %w(
    /stylesheets
    /images
    /javascripts
    /robots.txt
    /favicon.ico
  ),
  root: "public_cached"
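
For readers unfamiliar with what such a middleware does to a response, here is a minimal sketch of the header half of the job. It is not Rack::StaticCache: FarFutureCacheHeaders is a hypothetical class, it does not serve files, and the one-year max-age is an assumed value chosen for illustration. It only shows how a Rack-style middleware can add a far-future Cache-Control header to responses under certain URL prefixes.

```ruby
# Hypothetical middleware for illustration; Rack::StaticCache does more
# (it also serves the files themselves).
class FarFutureCacheHeaders
  ONE_YEAR = 365 * 24 * 60 * 60 # seconds; an assumed lifetime

  def initialize(app, prefixes)
    @app = app
    @prefixes = prefixes
  end

  # Pass the request down the stack, then decorate matching responses.
  def call(env)
    status, headers, body = @app.call(env)
    if @prefixes.any? { |prefix| env['PATH_INFO'].to_s.start_with?(prefix) }
      headers = headers.merge('Cache-Control' => "public, max-age=#{ONE_YEAR}")
    end
    [status, headers, body]
  end
end

# A stand-in for the real application: any object responding to call(env).
DEMO_APP = lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
CACHED_APP = FarFutureCacheHeaders.new(DEMO_APP, %w(/stylesheets /images))

_status, headers, _body = CACHED_APP.call('PATH_INFO' => '/images/logo.png')
puts headers['Cache-Control']
```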

I also disabled the built-in Rails serving of static assets in development mode, so that it didn’t interfere:

# config/environments/development.rb
config.serve_static_assets = false

In the production config, I set the x_sendfile_header option to “X-Accel-Redirect”. It had been “X-Sendfile”, which is an Apache directive, and was causing nginx to hang (Heroku would never actually serve the assets to the browser).

# config/environments/production.rb
config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'

A downside of this approach is that if you have a lot of static assets, they all have to hit the Rails stack in order to be served. If you only have one dyno (the free plan) then the initial load can be slower than it otherwise would be if nginx was serving them directly. As I mentioned in the introduction, hopefully Heroku will provide a nicer way to do this in the future.

Speeding up Rails startup time

In which I provide easy instructions to try a new patch that drastically improves the start up time of Ruby applications, in the hope that with wide support it will be merged into the upcoming 1.9.3 release. Skip to the bottom for instructions, or keep reading for the narrative.

UPDATE: If you have trouble installing, grab a recent copy of rvm: rvm get head.

Background

Recent releases of MRI Ruby have introduced some fairly major performance regressions when requiring files:

For reference, our medium-sized Rails application requires around 2200 files, which is off the right-hand side of this graph. This is problematic. On 1.9.2 it takes 20s to start up, on 1.9.3 it takes 46s. Both are far too long.

There are a few reasons for this, but the core of the problem is the basic algorithm which looks something like this:

def require(file)
  $loaded.each do |x|
    return false if x == file
  end
  load(file)
  $loaded << file
end

That loop is no good, and gets worse the more files you have required. I have written a patch for 1.9.3 which changes this algorithm to:

def require(file)
  return false if $loaded[file] 
  load(file)
  $loaded[file] = true
end

That gives you a performance curve that looks like this:

Much nicer.

That’s just a synthetic benchmark, but it works in the real world too. My main Rails application now loads in a mite over 10s, down from the 20s it was taking on 1.9.2. A blank Rails app loads in 1.1s, which is even faster than 1.8.7.
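
The difference between the two lookups can be reproduced in miniature in plain Ruby. The sketch below is not the patch (which is C code inside MRI) and its timings say nothing about real start-up times. The simulate_requires helper and the file names are made up, and a Set stands in for the hash because it offers the same include? and << calls as an Array while being hash-backed.

```ruby
require 'benchmark'
require 'set'

# Simulate requiring `count` distinct files, tracking them in `loaded`.
# With an Array, include? is a linear scan; with a Set it is a hash lookup.
def simulate_requires(loaded, count)
  count.times do |i|
    file = "lib/file_#{i}.rb"
    next if loaded.include?(file)
    loaded << file
  end
  loaded.size
end

[500, 1000, 2000].each do |count|
  array_time = Benchmark.realtime { simulate_requires([], count) }
  set_time   = Benchmark.realtime { simulate_requires(Set.new, count) }
  puts format('%5d files: array %.4fs, set %.4fs', count, array_time, set_time)
end
```

The array version does more work per require as the list grows, so its total time grows much faster than the number of files, while the set version stays roughly proportional to it.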

Getting the fix

Here is how you can try out my patch right now in just ten minutes using RVM.

# First get a baseline measurement
cd /your/rails/app
time script/rails runner "puts 1"

# Install a patched ruby
curl https://gist.github.com/raw/996418/e2b346fbadeed458506fc69ca213ad96d1d08c3e/require-performance-fix-r31758.patch > /tmp/require-performance-fix.patch
rvm install ruby-head --patch /tmp/require-performance-fix.patch -n patched
# ... get a cup of tea, this took about 8 minutes on my MBP

# Get a new measurement
cd /your/rails/app
rvm use ruby-head-patched
gem install bundler --no-rdoc --no-ri
bundle
time script/rails runner "puts 1"

How you can help

I need a lot more eyeballs on this patch before it can be considered for merging into trunk. I would really appreciate any of the following:

Next steps

I imagine there will be a bit more work to get this into Ruby 1.9.3, but after that this is just the first step of many to try and speed up the time Rails takes to start up. Bundler and RubyGems still spend a lot of time doing … something, which I want to investigate. I also want to port these changes over to JRuby which has similar issues (Rubinius isn’t quite as fast out of the gate, but does not degrade exponentially so would not benefit from this patch).

Thank you for your time.

PostgreSQL 9 and ruby full text search tricks

I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:

  • Setup full text search
  • Optimize search with triggers and indexes
  • Use Postgres with Ruby on Rails 3
  • Optimize indexes by including only the rows that you need
  • Use database standards for more reliable queries
  • Write powerful reports in only a few lines of code
  • Convert an existing MySQL application to use Postgres

It’s a steal at only $12. You can buy it over at PeepCode.

In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts 
FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(search_vector, 'pg_catalog.english',  body, title);

That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves and came up with this:

CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$
DECLARE
  search TEXT;
  child_search TEXT;
begin
  SELECT string_agg(author_name, ' ') INTO child_search
  FROM comments
  WHERE post_id = new.id;

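  -- coalesce needs a fallback value, otherwise a single NULL nulls the whole string
  search := '';
  search := search || ' ' || coalesce(new.title, '');
  search := search || ' ' || coalesce(new.body, '');
  search := search || ' ' || coalesce(child_search, '');

  new.search_vector := to_tsvector(search);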
  return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE
  search_trigger();

Getting a bit ugly, eh? It might be nice to move that logic back into ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data type. In this case, a quick workaround is to store a search_document in a text field on the model, then use a trigger to index only that field into our search_vector field. The search_document field can then easily be set from your ORM.
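
As a rough illustration of the ruby side of that workaround, the sketch below builds the plain-text search document for a post and its comment authors. BlogPost and BlogComment are hypothetical stand-ins (plain Structs, not ORM models), and the database trigger would still be responsible for turning the stored text into a search vector.

```ruby
# Hypothetical stand-ins for ORM models, used only to show the idea.
BlogComment = Struct.new(:author_name)

BlogPost = Struct.new(:title, :body, :comments) do
  # Plain-text document that a database trigger could index into a
  # search vector. Missing (nil) fields are skipped.
  def search_document
    parts = [title, body] + comments.map(&:author_name)
    parts.compact.join(' ')
  end
end

post = BlogPost.new(
  'Full text search',
  'Triggers keep the index fresh.',
  [BlogComment.new('Ada'), BlogComment.new('Grace')]
)
puts post.search_document
```

In a real model this value would be assigned to the search_document field before saving, for example from a before-save hook.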

Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:

class Post
  include DataMapper::Resource
  include Searchable

  property :id, Serial
  property :title, String
  property :body, Text

  searchable :title, :body # Provides Post.search('keyword')
end

You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.

Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Tutorial

This took me a while to figure out, especially since I’m not so great with either Windows or SQL Server, but in the end the process isn’t so difficult.

Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Screencast

The steps covered in this screencast are:

  1. Create user
  2. Create database
  3. Give user permissions
  4. Create DSN
  5. Install ruby
  6. Install devkit (needed to compile native extensions for ODBC)
  7. Create a new rails app
  8. Add activerecord-sqlserver-adapter and ruby-odbc to Gemfile
  9. Customize config/database.yml
# config/database.yml
development:
  adapter: sqlserver
  dsn: testdsn_user
  mode: odbc
  database: test
  username: xavier
  password:

Some errors you may encounter:

“The specified module could not be found – odbc.so”: You have likely copied odbc.so from i386-msvcrt-ruby-odbc.zip. This is for 1.8.7, and does not work for 1.9. Remove the .so file, and install ruby-odbc as above.

“The specified DSN contains an architecture mismatch between the Driver and the Application”: Perhaps you have created a system DSN. Try creating a user DSN instead. I also found some suggestions that you need to use a different version of the ODBC configuration panel, but this wasn’t relevant for me.

Why I Rewrote Chronic

It seems like a pretty epic yak shave. If you want to parse natural language dates in ruby, you use Chronic. That’s just how it is. (There’s also Tickle for recurring dates, which is similar, but based on Chronic anyways.) It’s the standard, everyone uses it, so why oh why did I write my own version from scratch?

Three reasons I can see.

Chronic is unmaintained. Check the network graph for Chronic. A more avid historian could turn this into an epic teledrama, but for now here’s the summary: The main repository hasn’t had a commit since late 2008. Evaryont made a valiant attempt to take the reins, but his stamina only lasted an extra year to August 2009. Since then numerous people have forked his efforts, mostly to add 1.9 support. These efforts are fragmented though. The inertia of such a large project with no clear leadership sees every man running for himself.

Further, the new maintainers aren’t providing a rock solid base. From Evaryont’s README:
I decided on my own volition that the 40-some (as reported by Github) network should be merged together. I got it to run, but quite haphazardly. There are a lot of new features (mostly undocumented except the git logs) so be a little flexible in your language passed to Chronic. [emphasis mine]

This does not fill me with confidence.

Chronic has a large barrier to entry. Natural date parsing is a big challenge. In the original README, there are ~50 examples of formats it supports, and that is excluding all of the features added in forks in the last two years. The result is a large code base which is intimidating for a newcomer, especially with no high-level guidance as to how everything fits together. On a project of this size, “the documentation is in the specs” is insufficient. I know what it does, I need to know how it does it.

Chronic solves the wrong problem. I want an alternative to date pickers. As such, I don’t need time support, and I only need very simple day parsing. Chronic seems geared towards a calendar type application (“tomorrow at 6:45pm”), but also parses many expressions which simply are not useful in a real application either because they are obtuse - “7 hours before tomorrow at noon” - or just not how users think about dates - “3 months ago saturday at 5:00 pm”. (Note the last assertion is a totally unsubstantiated claim with no user research to support it.)

Further, it is not hard to find simple examples that Chronic doesn’t support. Omitting a year is an easy one: 14 Sep, April 9.

So what to do?

Chronic needs a leader. Chronic needs a hero. One man to reunite the forks, document the code, and deliver it to the promised land.

I am not that man.

I sketched out the formats I actually needed to support for my application, looked at it and thought “really it can’t be that hard”. Natural date parsing is hard; parsing only the dates your application requires is easy. One hour later I had a gem that not only had 100% support for all of the Chronic features I had been using, but also covered some extra formats I wanted (“14 Sep”), and could also convert a date back into a human readable description. That’s less time than I had already sunk into trying to get Chronic working.

Introducing Kronic.

Less than 100 lines of code, totally specced, totally solved my problem. Ultimately, I don’t want to deal with this problem, so I wanted the easiest solution. While patching Chronic would intuitively appear to be pragmatic, a quick spike in the other direction turned out to be worthwhile. Sometimes 80% just isn’t that hard.
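
To give a feel for how small “only the dates your application requires” can be, here is a minimal sketch in the same spirit. It is not Kronic’s code: TinyDateParser is a hypothetical module, it supports only a handful of formats, and it assumes that a date without a year belongs to the current year.

```ruby
require 'date'

# Hypothetical minimal parser, not Kronic's actual implementation.
module TinyDateParser
  MONTHS = Date::MONTHNAMES.compact.map(&:downcase) # "january".."december"

  # Returns a Date, or nil when the text is not in a supported format.
  def self.parse(text, today = Date.today)
    input = text.strip.downcase
    case input
    when 'today'     then today
    when 'yesterday' then today - 1
    when 'tomorrow'  then today + 1
    when /\A(\d{1,2}) ([a-z]+)\z/ then day_and_month($1, $2, today)
    when /\A([a-z]+) (\d{1,2})\z/ then day_and_month($2, $1, today)
    end
  end

  # Accepts full month names or prefixes of three or more letters.
  def self.month_number(word)
    return nil if word.length < 3
    index = MONTHS.index { |name| name.start_with?(word) }
    index && index + 1
  end

  # Assumption: a missing year means the year of `today`.
  def self.day_and_month(day, month_word, today)
    month = month_number(month_word)
    return nil unless month
    return nil unless Date.valid_date?(today.year, month, day.to_i)
    Date.new(today.year, month, day.to_i)
  end
end

puts TinyDateParser.parse('14 Sep').inspect
puts TinyDateParser.parse('April 9').inspect
puts TinyDateParser.parse('7 hours before tomorrow at noon').inspect
```

Anything outside the supported formats returns nil, which is the point: the unsupported cases are exactly the ones a date-picker replacement does not need.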

Build time graph with buildhawk

How long your build took to run, in a graph, on a webpage. That’s pretty fantastic. You need to be storing your build time in git notes, as I wrote about a few weeks back. Then simply:

gem install buildhawk
buildhawk > report.html
open report.html

This is a simple gem I hacked together today that parses git log and stuffs the output into an ERB template that uses TufteGraph and some subtle jQuery animation to make it look nice. For extra prettiness, I use the Monofur font, but maybe you are content with your default monospace. If you want to poke around the internals (there’s not much!) have a look on Github.
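
The “stuff the output into an ERB template” step can be sketched in a few lines. This is not buildhawk’s code: BuildRecord, the sample data, and the template are all made up for illustration, and the sketch skips the git log parsing and the graphing entirely.

```ruby
require 'erb'

# Hypothetical build record; the real tool reads its data from git.
BuildRecord = Struct.new(:sha, :subject, :seconds)

REPORT_TEMPLATE = <<-HTML
<ul>
<% builds.each do |build| %>
  <li><%= build.sha %> <%= ERB::Util.html_escape(build.subject) %>: <%= build.seconds %>s</li>
<% end %>
</ul>
HTML

# Render the template with `builds` visible as a local variable.
def render_report(builds)
  ERB.new(REPORT_TEMPLATE).result(binding)
end

puts render_report([
  BuildRecord.new('abc123', 'Add search', 41.2),
  BuildRecord.new('def456', 'Fix <title> escaping', 39.8)
])
```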

Six best talks from LSRC 2010

I wrote this last fortnight, but was waiting for videos. Still missing a few, but it’s a start. Enjoy!

I am just finishing up a week in Austin, Texas. I was here for Lone Star Ruby Conference, at which I ran both my Database Is Your Friend Training, and also a full day introduction to MongoDB course. I was then free to enjoy the talks for the remaining two days. Here are my top picks.

Debugging Ruby

Aman Gupta gave a fantastic overview of the low level tools available for debugging ruby applications, including perf-tools, strace, gdb, bleak-house, and some nice ruby wrappers he has written around them. I had heard of these tools before, but was never sure when to use them or where to start if I wanted to use them. Aman’s presentation was the hook I needed to get into these tools, giving plenty of real examples of where they had been useful and how he used them.

Slides

Seven Languages in Seven Weeks

Bruce Tate gave an entertaining talk in which he compared seven languages to movie characters. It was a great narrative, and his energy and excitement about the languages was infectious. He has written a book on the same topic, which I plan on purchasing when I make some time to work through it. There are some sample chapters available at the pragprog site.

Book

Greasing Your Suite

I had seen the content of Nick’s talk “Greasing Your Suite” before in slide format, and it was just as excellent live. Nick takes the run time of a rails test suite from 13 minutes down to eighteen seconds. An incredible effort. While watching his talk I installed and set up his hydra gem, and it was dead simple to get my tests running in parallel. I only added a rake task and a tiny yml file (no other setup required) and I got a significant speed up even on trivial test suites. I was impressed at how easy it was to get going, and I’ll be using it on all my apps from now on.

Video (From Goruco, but he gave the same talk)

Deciphering Yehuda

Gregg Pollack’s talk on how some of the techniques used in the internals of rails and bundler work was excellent. While the content wasn’t new to me, I was impressed at Gregg’s ability to explain code on slides, a task difficult to do well. If you ever plan to present you should watch this to pick up some of Gregg’s techniques. I am going to be checking out his Introduction to Rails 3 screencasts for the same reason.

Video

Real Software Engineering

Glenn Vanderburg opened the conference with a fantastic talk on the history of software engineering. This answered a lot of questions that have been floating around my mind, especially to do with the misleading comparisons often made to other engineering disciplines. Give a civil engineer the ability to quickly prototype bridges for little cost, and they are going to do a lot less modelling. A mathematical model is simply a way to reduce costs. And cost is always an object. Watch the talk, it’s brilliant.

Video

Keynote

The best overall talk was Tom Preston-Werner’s keynote Friday evening. His mix of story, humour, and inspiration was perfect for a keynote, and his delivery was excellent. He pitched his content expertly and though there was no specific item I hadn’t heard before, it has had a significant impact on my thoughts the past few days. Hopefully a video is up soon.

Speeding Up Rails Rake

On a brand new rails project (this article is about rails 3, but the same principle applies to rails 2), rake --tasks takes about a second to run. This is just the time it takes to load all the tasks, so any task you define will take at least this long to run, even if it has nothing to do with rails. Tab completion is slow. That makes me sad.

The issue is that since rails and gems can provide rake tasks for your project, the entire rails environment has to be loaded just to figure out which tasks are available. If you are familiar with the tasks available, you can hack around things to wring some extra speed out of your rake.

WARNING: Hacks abound beyond this point. Proceed at own risk.

Below is my edited Rakefile. Narrative continues in the comments below.

# Rakefile
def load_rails_environment
  require File.expand_path('../config/application', __FILE__)
  require 'rake'
  Speedtest::Application.load_tasks
end

# By default, do not load the Rails environment. This allows for faster
# loading of all the rake files, so that getting the task list, or kicking
# off a spec run (which loads the environment by itself anyways) is much
# quicker.
if ENV['LOAD_RAILS'] == '1'
  # Bypass these hacks that prevent the Rails environment loading, so that the
  # original descriptions and tasks can be seen, or to see other rake tasks provided
  # by gems.
  load_rails_environment
else
  # Create a stub task for all Rails provided tasks that will load the Rails
  # environment, which in turn will append the real definition of the task to
  # the end of the stub task, so it will be run directly afterwards.
  #
  # Refresh this list with:
  # LOAD_RAILS=1 rake -T | ruby -ne 'puts $_.split(/\s+/)[1]' | tail -n+2 | xargs
  %w(
    about db:create db:drop db:fixtures:load db:migrate db:migrate:status 
    db:rollback db:schema:dump db:schema:load db:seed db:setup 
    db:structure:dump db:version doc:app log:clear middleware notes 
    notes:custom rails:template rails:update routes secret stats test 
    test:recent test:uncommitted time:zones:all tmp:clear tmp:create
  ).each do |task_name|
    task task_name do
      load_rails_environment
      # Explicitly invoke the rails environment task so that all configuration
      # gets loaded before the actual task (appended on to this one) runs.
      Rake::Task['environment'].invoke
    end
  end

  # Create an empty task that will show up in rake -T, instructing how to
  # get a list of all the actual tasks. This isn't necessary but is a courtesy
  # to your future self.
  desc "!!! Default rails tasks are hidden, run with LOAD_RAILS=1 to reveal."
  task :rails
end

# Load all tasks defined in lib/tasks/*.rake
Dir[File.expand_path("../lib/tasks/", __FILE__) + '/*.rake'].each do |file|
  load file
end

Now rake --tasks executes near instantaneously, and tasks will generally kick off faster (including rake spec). Much nicer!
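
The Rakefile leans on one Rake behaviour that is easy to miss: defining a task that already exists appends a new action to it rather than replacing it. The sketch below isolates that behaviour outside of Rails. The :demo task and RAKE_DEMO_LOG are made-up names, and the inner definition stands in for what loading the Rails tasks does in the real Rakefile. It assumes the rake gem that ships with Ruby is available.

```ruby
require 'rake'

RAKE_DEMO_LOG = []

# The stub task. While it runs, it defines the task again, which appends
# a second action to the same task instead of replacing the first.
Rake::Task.define_task(:demo) do
  RAKE_DEMO_LOG << 'stub: loading environment'
  Rake::Task.define_task(:demo) do
    RAKE_DEMO_LOG << 'real task body'
  end
end

Rake::Task[:demo].invoke
puts RAKE_DEMO_LOG
```

Because the appended action lands at the end of the action list while the stub is still running, it is executed straight after the stub, which is the behaviour the comments in the Rakefile describe.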

This technique has the added benefit of hiding all the built in tasks. Depending on your experience this may not be a win, but since I already know the rails ones by heart, I’m usually only interested in the tasks specific to the project.

I don’t pretend this is a pretty or permanent solution, but I share it here because it has made my life better in recent times.

Duplicate Data

UPDATE: If you are on PostgreSQL, check this updated query, it’s more useful.

Forgotten to back validates_uniqueness_of with a unique constraint in your database? Oh no! Here is some SQL that will pull out all the duplicate records for you.

User.find_by_sql <<-EOS
  SELECT * 
  FROM users 
  WHERE name IN (
    SELECT name 
    FROM users 
    GROUP BY name 
    HAVING count(name) > 1);
EOS

You will need your own strategy for resolving the duplicates, since it is totally dependent on your data. Some ideas:

  • Arbitrarily delete one of the records, perhaps based on latest update time. Don’t forget about child records! If you have forgotten a uniqueness constraint it is likely you have also forgotten a foreign key, so you will have to delete child records manually.
  • Merge the records, including child records.
  • Manually resolve the conflicts on a case-by-case basis. This is possible if there are not too many duplicates.
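
As a sketch of the first strategy, the snippet below picks which duplicates to delete by keeping the most recently updated row for each name. It works on plain in-memory rows rather than ActiveRecord models: UserRow and duplicates_to_delete are hypothetical names, and child records are not handled at all.

```ruby
# Hypothetical in-memory row; in a real app these would be model instances.
UserRow = Struct.new(:id, :name, :updated_at)

# For each duplicated name, keep the most recently updated row and
# return the remaining rows as candidates for deletion.
def duplicates_to_delete(rows)
  rows.group_by(&:name).flat_map do |_name, group|
    next [] if group.size < 2
    keeper = group.max_by(&:updated_at)
    group.reject { |row| row.equal?(keeper) }
  end
end

rows = [
  UserRow.new(1, 'ann', 10),
  UserRow.new(2, 'bob', 20),
  UserRow.new(3, 'ann', 30)
]
puts duplicates_to_delete(rows).map(&:id).inspect
```

Listing the candidates first, rather than deleting straight away, leaves room to review them and to clean up any child records by hand.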

STI is the global variable of data modelling

A Single Table Inheritance table is really easy to both update and query. This makes it ideal for rapid prototyping: just throw some extra columns on it and you are good to go! This is why STI is so popular, and it fits perfectly into the Rails philosophy of getting things up and running fast.

Fast coding techniques do not always transfer into solid, maintainable code however. It is really easy to hack something together with global variables, but we eschew them when writing industry code. STI falls into the same category. I have written about the downsides of STI before: it clutters your data model, weakens your data integrity, and can be difficult to index. STI is a fast technique to get started with, but is not necessarily a great option for maintainable applications, especially when there are other modelling techniques such as class table inheritance available.

Updating Class Table Inheritance Tables

My last post covered querying class table inheritance tables; this one presents a method for updating them. Having set up our ActiveRecord models using composition, we can use a standard rails method accepts_nested_attributes_for to allow easy one-form updating of the relationship.

class Item < ActiveRecord::Base
  validates_numericality_of :quantity

  SUBCLASSES = [:dvd, :car]
  SUBCLASSES.each do |class_name|
    has_one class_name
  end

  accepts_nested_attributes_for *SUBCLASSES
end

# Keep a reference to the parent Item, since the update below goes through it.
@item = Dvd.create!(
  :title => 'The Matix',
  :item  => Item.create!(:quantity => 1)).item

@item.update_attributes(
  :quantity => 2,
  :dvd_attributes => {
    :id    => @item.dvd.id,
    :title => 'The Matrix'})

This issues the following SQL to the database:

UPDATE "items" SET "quantity" = 2 WHERE ("items"."id" = 12)
UPDATE "dvds" SET "title" = 'The Matrix' WHERE ("dvds"."id" = 12)

Note that depending on your application, you may need some extra locking to ensure this method is safe under concurrent updates, for example if you allow items to change type. Be sure to read the accepts_nested_attributes_for documentation for the full API.

I talk about this sort of thing in my “Your Database Is Your Friend” training sessions. They are happening throughout the US and UK in the coming months. One is likely coming to a city near you. Head on over to www.dbisyourfriend.com for more information and free screencasts

Class Table Inheritance and Eager Loading

Consider a typical class table inheritance table structure with items as the base class and dvds and cars as two subclasses. In addition to what is strictly required, items also has an item_type column. This denormalization is usually a good idea; I will save the justification for another post, so please take it for granted for now.

The easiest way to map this relationship with Rails and ActiveRecord is to use composition, rather than trying to hook into the class loading code. Something akin to:

class Item < ActiveRecord::Base
  SUBCLASSES = [:dvd, :car]
  SUBCLASSES.each do |class_name|
    has_one class_name
  end

  def description
    send(item_type).description
  end
end

class Dvd < ActiveRecord::Base
  belongs_to :item

  validates_presence_of :title, :running_time
  validates_numericality_of :running_time

  def description
    title
  end
end

class Car < ActiveRecord::Base
  belongs_to :item

  validates_presence_of :make, :registration

  def description
    make
  end
end

A naive way to fetch all the items might look like this:

Item.all(:include => Item::SUBCLASSES)

This will issue one initial query, then one for each subclass. (Since Rails 2.1, eager loading is done like this rather than joining.) This is inefficient, since at the point we preload the associations we already know which subclass tables we should be querying. There is no need to query all of them. A better way is to hook into the Rails eager loading ourselves to ensure that only the tables required are loaded:

Item.all(opts).tap do |items|
  preload_associations(items, items.map(&:item_type).uniq)
end

Wrapping that up in a class method on items is neat because we can then use it as a kicker at the end of named scopes or associations – person.items.preloaded, for instance.

Here are some tests demonstrating this:

require 'test/test_helper'

class PersonTest < ActiveRecord::TestCase
  setup do
    item = Item.create!(:item_type => 'dvd')
    dvd  = Dvd.create!(:item => item, :title => 'Food Inc.')
  end

  test 'naive eager load' do
    items = []
    assert_queries(3) { items = Item.all(:include => Item::SUBCLASSES) }
    assert_equal 1, items.size
    assert_queries(0) { items.map(&:description) }
  end

  test 'smart eager load' do
    items = []
    assert_queries(2) { items = Item.preloaded }
    assert_equal 1, items.size
    assert_queries(0) { items.map(&:description) }
  end
end

# Monkey patch stolen from activerecord/test/cases/helper.rb
ActiveRecord::Base.connection.class.class_eval do
  IGNORED_SQL = [/^PRAGMA/, /^SELECT currval/, /^SELECT CAST/, /^SELECT @@IDENTITY/, /^SELECT @@ROWCOUNT/, /^SAVEPOINT/, /^ROLLBACK TO SAVEPOINT/, /^RELEASE SAVEPOINT/, /SHOW FIELDS/]

  def execute_with_query_record(sql, name = nil, &block)
    $queries_executed ||= []
    $queries_executed << sql unless IGNORED_SQL.any? { |r| sql =~ r }
    execute_without_query_record(sql, name, &block)
  end

  alias_method_chain :execute, :query_record
end
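
The query counts asserted in these tests (three for the naive load, two for the smart one) can be reproduced with a toy model that has nothing to do with ActiveRecord. Everything in the sketch is made up for illustration: QueryCounter pretends to be the database, ItemRow stands in for a loaded item, and SUBCLASS_TABLES maps each item_type to its table.

```ruby
# A toy stand-in for the database that only records queries; not ActiveRecord.
class QueryCounter
  attr_reader :queries

  def initialize
    @queries = []
  end

  def select(table)
    @queries << "SELECT * FROM #{table}"
  end
end

ItemRow = Struct.new(:id, :item_type)
SUBCLASS_TABLES = { 'dvd' => 'dvds', 'car' => 'cars' }

# Naive: one query for items, then one per subclass table, used or not.
def naive_load(db)
  db.select('items')
  SUBCLASS_TABLES.each_value { |table| db.select(table) }
  db.queries.size
end

# Smart: one query for items, then one per item_type actually present.
def smart_load(db, items)
  db.select('items')
  items.map(&:item_type).uniq.each do |type|
    db.select(SUBCLASS_TABLES.fetch(type))
  end
  db.queries.size
end

items = [ItemRow.new(1, 'dvd')]
puts "naive: #{naive_load(QueryCounter.new)} queries"
puts "smart: #{smart_load(QueryCounter.new, items)} queries"
```

The saving grows with the number of subclasses: the naive version always pays for every subclass table, while the smart version pays only for the types present in the result set.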


Last minute training in Seattle

If you or someone you know missed out on Saturday, I’ve scheduled a last minute database training for Seattle tomorrow. Register here. Last chance before I head to Chicago for a training on Friday.

Constraints assist understanding

The hardest thing for a new developer on a project to wrap his head around is not the code. For the most part, ruby code stays the same across projects. My controllers look like your controllers, my models look like your models. What defines an application is not the code, but the domain. The business concepts, and how they are translated into code, can take weeks or months to understand cleanly. Modelling your domain in a way that it is easily understood is an important principle to speed up this learning process.

In an application I am looking at there is an email field in the user model. It is defined as a string that allows null values. This is confusing. I need to figure out in what circumstances a null value makes sense (can they choose to withhold that piece of information? Is there a case where a new column I am adding should be null?), which is extra information I need to locate and process before I can understand the code. There is a validates_presence_of declaration on the attribute, but production data has some null values. Two parts of the application are telling me two contradicting stories about the domain.

Further, when I am tracking down a bug in the application, eliminating the possibility that a column could be null is an extra step I need to take. The data model is harder to reason about because there are more possible states than strictly necessary.

Allowing a null value in a column creates another piece of information that a developer has to process. It creates an extra question that needs to be answered when reading the code: in what circumstances is a null value appropriate? Multiply this problem out to multiple columns (and factor in other sub-optimal modeling techniques not covered here), and the time to understanding quickly grows out of hand.

Adding not-null constraints on your database is a quick and cheap way to bring your data model in line with the code that sits on top of it. In addition to cutting lines of code, cut out extraneous information from your data model. For little cost, constraints simplify your application conceptually and allow your data to be reasoned about more efficiently.

