Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

DataMapper Retrospective

I introduced DataMapper on my last two major projects. As those projects matured after I had left, they both migrated to a different ORM. That deserves a retrospective, I think. As I’ve left both projects, I don’t have the insider level of detail on the decision to abandon DataMapper, but developers from both projects kindly provided background for this blog post.

Project A

Web application and a batch processing component built on top of a legacy Oracle database.

Good

  • Field mappings, nice ruby names and able to ignore fields we didn’t care about.

Bad

  • Had to roll our own locking and time zone integration.
  • Not great for batch processing (trying to write SQL through DM abstraction.)

It turned out this project required a lot more batch processing than we anticipated, which DataMapper does not shine at. It was migrated to Sequel which provides a far better abstraction for working closer to SQL.

Project B

A fairly typical Rails 3 application. A couple of tens of thousands of lines of code.

Good

  • No migrations (pre-release).
  • Foreign keys, composite primary keys.
  • Auto-validations.

Bad

  • Auto-validations with nested attributes was uncharted territory (needed bug fixes).
  • Performance on large object graphs was unusable for page rendering (close to two seconds for our home page, which admittedly had a stupid amount of stuff on it).
  • Performance was suboptimal (though passable) on smaller pages.
  • Tracing through what his happening across multiple gems (particularly around transactions) was tricky.
  • The maintenance/interactions of all the various gems was problematic (e.g. gems X,Y work with 1.9.3 but Z doesn’t yet).
  • Inability to easily “break the abstraction” when SQL was required.

The performance issues were clear in our code base, but eluded much effort to reduce them down to smaller reproducible problems. The best quick win I found was ~15% by disabling assertions, but I suspect that given the large scope of the problem DataMapper is trying to solve there may not be any approachable way of tackling the issue (would love to be proven wrong!)

We ran into obvious integration bugs (apologies for not having kept a concrete list), a symptom of a library not widely used. As a commiter on the project this wasn’t an issue, since they were easily fixed and moved past (the DataMapper code base is really nice to work on), but having a commiter on your team isn’t a tenable strategy.

DataMapper takes an all-ruby-all-the-time approach, which means things get tricky when the abstraction leaks. Much of the SQL generation is hidden in private methods. Compare some code to create a composable full text search query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def self.search(keywords, options = {})
  options = {
    conditions: ["true"]
  }.merge(options)

  current_query = query.merge(options)

  a           = repository.adapter
  columns_sql = a.send(:columns_statement,    current_query.fields,     false)
  conditions  = a.send(:conditions_statement, current_query.conditions, false)
  order_sql   = a.send(:order_statement,      current_query.order,      false)
  limit_sql   = current_query.limit || 50
  conditions_sql, conditions_values = *conditions

  bind_values = [keywords] + conditions_values

  find_by_sql([<<-SQL, *bind_values])
    SELECT #{columns_sql}, ts_rank_cd(search_vector, query) AS rank
    FROM things
    CROSS JOIN plainto_tsquery(?) query
    WHERE #{conditions_sql} AND (query @@ search_vector)
    ORDER BY rank DESC, #{order_sql}
    LIMIT #{limit_sql}
  SQL
end

To the ActiveRecord equivalent (Sequel is similar):

1
2
3
4
5
6
def self.search(keywords)
  select("things.*, ts_rank_cd(search_vector, query) AS rank")
    .joins(sanitize_sql_array(["CROSS JOIN plainto_tsquery(?) query", keywords]))
    .where("query @@ search_vector")
    .order("rank DESC")
end

Switching to ActiveRecord took a week of all hands (~4) on deck, plus another week alongside other feature work to get it stable. From beginning to in production was two weeks. The end result was a drop in response time (the deploy is pretty blatant in the graph below), start up time, plus 3K less lines of code (a lot of custom code for dropping down to SQL was able to be removed).

Do differently

Ultimately, DataMapper provides an abstraction that I just don’t need, and even if I did it hasn’t had its tires kicked sufficiently that a team can use it without having to delve down to the internals. The applications I find myself writing are about data, and the store in which that data lives is vitally important to the application. Abstracting away those details seems to be heading in the wrong direction for writing simple applications. As an intellectual achievement in its own right I really dig DataMapper, but it is too complicated a component to justify using inside other applications.

Rich Hickey’s talk Simple Made Easy has been rattling around my head a lot.

Nowadays I’m back to ActiveRecord for team conformance. It’s more work to keep on top of foreign keys and the like, but overall it does the job. It’s still too complicated, but has the non-trivial benefit of being used by lots of people. This is my responsible choice at the moment.

On my own projects I first reach for Sequel. It supports all the nice database features I want to use, while providing a thin layer over SQL. In other words, I don’t have to worry about the abstraction leaking because the abstraction is still SQL, just expressed in ruby (which is a huge win for composeability that you don’t get with raw SQL). While it does have “ORM” features, it feels more like the most convenient way of accessing my database rather than an abstraction layer. It’s actively maintained and the only bug I have found was something that Rails broke, and a patch was already available. There are no open issues in the bug tracker. My experiences have been overwhelmingly positive. I haven’t built anything big enough with it yet to have confidence using it on a team project though.

I still have a soft spot in my heart for DataMapper, I just don’t see anywhere for me to use it anymore.

Exercises in style

Let us make a stack machine! It can add numbers! This may be a winding journey. Have some time and an irb up your sleeve. Maybe it is more of a meditation than a blog post? Onwards!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def push_op(value)
  lambda {|x| [value, x + [value]] }
end

def add_op
  lambda {|x| [x[-1] + x[-2], x[0..-3]] }
end

[
  push_op(1),
  push_op(2),
  add_op
].inject([nil, []]) {|(result, state), op|
  op[state]
}

Get it? Pushes 1, pushes 2, then the add_op pops them off the stack and makes 3. Not a lot of metadata in those lambdas though, and we can’t combine them in interesting way.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
class Operation < Struct.new(:block)
  def +(other)
    CompositeOperation.new(self, other)
  end

  def run(state)
    @block.call(state)
  end
end

class CompositeOperation < Operation
  def initialize(a, b)
    @a = a
    @b = b
    super(lambda {|x| @b.block[@a.block[x][1]] })
  end

  def desc
    @a.desc + "\n" + @b.desc
  end
end

class PushOperation < Operation
  def initialize(value)
    @value = value
    super(lambda {|x| [value, x + [value]] })
  end

  def desc
    "push #{@value}"
  end
end

class AddOperation < Operation
  def initialize
    super(lambda {|x| [x[-1] + x[-2], x[0..-3]] })
  end

  def desc
    "add top two digits on stack"
  end
end

A lot more setup, but now we also get a description of operations!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def tagged_push_op(value)
  PushOperation.new(value)
end

def tagged_add_op
  AddOperation.new
end

ops =
  tagged_push_op(1) +
  tagged_push_op(2) +
  tagged_add_op

puts ops.desc
puts ops.run(start_state).inspect

Ok you get that. What else can we do?

“every monad [.] embodies a particular computational strategy. A ‘motto of computation,’ if you will.”Mental Guy

hmmm. What does it mean?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
class VerboseStackEvaluator < Struct.new(:stack)
  attr_accessor :result, :stack

  def pass(op)
    puts op.desc
    results = op.call(stack)
    self.class.new(results[1]).tap do |x|
      x.result = results[0]
    end
  end

  def self.identity
    new([])
  end
end

e = evaluator.identity.
  pass(tagged_push_op(1)).
  pass(tagged_push_op(2)).
  pass(tagged_add_op)

p [e.result, e.stack]

Oh so now we have one structure (the pass stuff) that we can run through different evaluators. Let us make a recursive one!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class RecursiveLazyStackEvaluator < Struct.new(:stack)
  def pass(op)
    self.class.new(lambda {
      op.call(stack)
    })
  end

  def self.identity
    new(lambda { [nil, []] })
  end

  def result; evaled[0]; end
  def stack;  evaled[1]; end

  private

  def evaled
    @evaled ||= @stack.call
  end
end

Do you see it is now lazy. Rather than evaluate each operation when pass is called, it saves them up until a result is requested. Look out! Haskell in your Ruby! Recursion might blow out our stack though. Let us isomorphically (I just learned this word) translate it to use iteration!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
class LazyStackEvaluator
  attr_accessor :steps

  def initialize(stack, steps = [])
    @stack  = stack
    @steps  = steps
  end

  def pass(op)
    self.class.new(@stack, steps + [op])
  end

  def self.identity
    new([])
  end

  def result; evaled[0]; end
  def stack;  evaled[1]; end

  protected

  def evaled
    @evaled ||= steps.inject([nil, @stack]) {|(r, s), op|
      op.call(s)
    }
  end
end

Not too shabby. Let’s try something more useful. Given we only have one operation that pops the stack (add), and it only pops two numbers, if we have more than two numbers in a row they start becoming redundant. Let us optimize!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
class OptimizingEvaluator < LazyStackEvaluator
  def evaled
    @evaled ||= begin
      accumulator = []
      new_steps   = []
      steps.each do |step|
        accumulator << step
        if !step.is_a?(PushOperation)
          new_steps += accumulator
          accumulator = []
        elsif accumulator.length > 2
          accumulator = accumulator[1..-1]
        end
      end
      new_steps += accumulator
      new_steps.inject([nil, @stack]) {|(r, s), op|
        op.call(s)
      }
    end
  end
end

e = evaluator.identity.
  pass(tagged_push_op(1)). # This won't get run!
  pass(tagged_push_op(1)).
  pass(tagged_push_op(2)).
  pass(tagged_add_op)

p [e.result, e.stack]

Ok one more. This one is pretty useless for this problem, but perhaps it will inspire thought. Let us multithread!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
class ThreadingEvaluator < LazyStackEvaluator
  def evaled
    @evaled ||= begin
      accumulator = []
      workers     = []
      steps.each do |step|
        accumulator << step
        if step.is_a?(AddOperation)
          workers << spawn_thread(accumulator)
          accumulator = []
        end
      end
      workers << spawn_thread(accumulator) unless accumulator.empty?
      workers.each(&:join)

      workers.last[:result]
    end
  end

  def spawn_thread(accumulator)
    Thread.new do
      sleep rand / 3
      Thread.current[:result] = begin
        e = accumulator.inject(VerboseStackEvaluator.identity) {|e, s| e.pass(s) }
        [e.result, e.stack]
      end
    end
  end
end

e = evaluator.identity.
  pass(tagged_push_op(1)).
  pass(tagged_push_op(1)).
  pass(tagged_push_op(2)).
  pass(tagged_add_op).
  pass(tagged_push_op(3)).
  pass(tagged_push_op(4)).
  pass(tagged_add_op)

p [e.result, e.stack]

Ok that is all. Here is an exercise for you: how would you allow the threading and optimizing evaluators to be combined?

Interface Mocking

UPDATE: This is a gem now: rspec-fire The code in the gem is better than that presented here.

Here is a screencast I put together in response to a recent Destroy All Software screencast on test isolation and refactoring, showing off an idea I’ve been tinkering around with for automatic validation of your implicit interfaces that you stub in tests.

Interface Mocking screencast.

Here is the code for InterfaceMocking:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
module InterfaceMocking

  # Returns a new interface double. This is equivalent to an RSpec double,
  # stub or, mock, except that if the class passed as the first parameter
  # is loaded it will raise if you try to set an expectation or stub on
  # a method that the class has not implemented.
  def interface_double(stubbed_class, methods = {})
    InterfaceDouble.new(stubbed_class, methods)
  end

  module InterfaceDoubleMethods

    include RSpec::Matchers

    def should_receive(method_name)
      ensure_implemented(method_name)
      super
    end

    def should_not_receive(method_name)
      ensure_implemented(method_name)
      super
    end

    def stub!(method_name)
      ensure_implemented(method_name)
      super
    end

    def ensure_implemented(*method_names)
      if recursive_const_defined?(Object, @__stubbed_class__)
        recursive_const_get(Object, @__stubbed_class__).
          should implement(method_names, @__checked_methods__)
      end
    end

    def recursive_const_get object, name
      name.split('::').inject(Object) {|klass,name| klass.const_get name }
    end

    def recursive_const_defined? object, name
      !!name.split('::').inject(Object) {|klass,name|
        if klass && klass.const_defined?(name)
          klass.const_get name
        end
      }
    end

  end

  class InterfaceDouble < RSpec::Mocks::Mock

    include InterfaceDoubleMethods

    def initialize(stubbed_class, *args)
      args << {} unless Hash === args.last

      @__stubbed_class__ = stubbed_class
      @__checked_methods__ = :public_instance_methods
      ensure_implemented *args.last.keys

      # __declared_as copied from rspec/mocks definition of `double`
      args.last[:__declared_as] = 'InterfaceDouble'
      super(stubbed_class, *args)
    end

  end
end

RSpec::Matchers.define :implement do |expected_methods, checked_methods|
  match do |stubbed_class|
    unimplemented_methods(
      stubbed_class,
      expected_methods,
      checked_methods
    ).empty?
  end

  def unimplemented_methods(stubbed_class, expected_methods, checked_methods)
    implemented_methods = stubbed_class.send(checked_methods)
    unimplemented_methods = expected_methods - implemented_methods
  end

  failure_message_for_should do |stubbed_class|
    "%s does not publicly implement:\n%s" % [
      stubbed_class,
      unimplemented_methods(
        stubbed_class,
        expected_methods,
        checked_methods
      ).sort.map {|x|
        "  #{x}"
      }.join("\n")
    ]
  end
end

RSpec.configure do |config|

  config.include InterfaceMocking

end

Static Asset Caching on Heroku Cedar Stack

I recently moved this blog over to Heroku, and in the process added in some proper HTTP caching headers. The dynamic pages use the build in fresh_when and stale? Rails helpers, combined with Rack::Cache and the free memcached plugin available on Heroku. That was all pretty straight forward, what was more difficult was configuring Heroku to serve all static assets (such as images and stylesheets) with a far-future max-age header so that they will be cached for eternity. What I’ve documented here is somewhat of a hack, and hopefully Heroku will provide a better way of doing this in the future.

By default Heroku serves everything in public directly via nginx. This is a problem for us since we don’t get a chance to configure the caching headers. Instead, use the Rack::StaticCache middleware (provided in the rack-contrib gem) to serve static files, which by default adds far future max age cache control headers. This needs to be out of different directory to public since there is no way to disable the nginx serving. I renamed by public folder to public_cached.

1
2
3
4
5
6
7
8
9
10
# config/application.rb
config.middleware.use Rack::StaticCache, 
  urls: %w(
    /stylesheets
    /images
    /javascripts
    /robots.txt
    /favicon.ico
  ),
  root: "public_cached"

I also disabled the built in Rails serving of static assets in development mode, so that it didn’t interfere:

1
2
# config/environments/development.rb
config.serve_static_assets = false

In the production config, I configured the x_sendfile_header option to be “X-Accel-Redirect”. It was “X-Sendfile” which is an apache directive, and was causing nginx to hang (Heroku would never actually serve the assets to the browser).

1
2
# config/environments/production.rb
config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'

A downside of this approach is that if you have a lot of static assets, they all have to hit the Rails stack in order to be served. If you only have one dyno (the free plan) then the initial load can be slower than it otherwise would be if nginx was serving them directly. As I mentioned in the introduction, hopefully Heroku will provide a nicer way to do this in the future.

Speeding up Rails startup time

In which I provide easy instructions to try a new patch that drastically improves the start up time of Ruby applications, in the hope that with wide support it will be merged into the upcoming 1.9.3 release. Skip to the bottom for instructions, or keep reading for the narrative.

UPDATE: If you have trouble installing, grab a recent copy of rvm: rvm get head.

Background

Recent releases of MRI Ruby have introduced some fairly major performance regressions when requiring files:

For reference, our medium-sized Rails application requires around 2200 files &emdash; off the right-hand side of this graph. This is problematic. On 1.9.2 it takes 20s to start up, on 1.9.3 it takes 46s. Both are far too long.

There are a few reasons for this, but the core of the problem is the basic algorithm which looks something like this:

1
2
3
4
5
6
7
def require(file)
  $loaded.each do |x|
    return false if x == file
  end
  load(file)
  $loaded << file
end

That loop is no good, and gets worse the more files you have required. I have written a patch for 1.9.3 which changes this algorithm to:

1
2
3
4
5
def require(file)
  return false if $loaded[file] 
  load(file)
  $loaded[file] = true
end

That gives you a performance curve that looks like this:

Much nicer.

That’s just a synthetic benchmark, but it works in the real world too. My main Rails application now loads in a mite over 10s, down from 20s it was taking on 1.9.2. A blank Rails app loads in 1.1s, which is even faster than 1.8.7.

Getting the fix

Here is how you can try out my patch right now in just ten minutes using RVM.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# First get a baseline measurement
cd /your/rails/app
time script/rails runner "puts 1"

# Install a patched ruby
curl https://gist.github.com/raw/996418/e2b346fbadeed458506fc69ca213ad96d1d08c3e/require-performance-fix-r31758.patch > /tmp/require-performance-fix.patch
rvm install ruby-head --patch /tmp/require-performance-fix.patch -n patched
# ... get a cup of tea, this took about 8 minutes on my MBP

# Get a new measurement
cd /your/rails/app
rvm use ruby-head-patched
gem install bundler --no-rdoc --no-ri
bundle
time script/rails runner "puts 1"

How you can help

I need a lot more eyeballs on this patch before it can be considered for merging into trunk. I would really appreciate any of the following:

Next steps

I imagine there will be a bit more work to get this into Ruby 1.9.3, but after that this is just the first step of many to try and speed up the time Rails takes to start up. Bundler and RubyGems still spend a lot of time doing … something, which I want to investigate. I also want to port these changes over to JRuby which has similar issues (Rubinius isn’t quite as fast out of the gate, but does not degrade exponentially so would not benefit from this patch).

Thank you for your time.

New Column: Code Safari

I am writing a regular weekly column at the newly launched Sitepoint project RubySource. The column is named “Code Safari”, where I explore the jungle of ruby libraries and gems and figure out how they work. It’s an introductory series designed to not just explain how things operate, but show you the tools and techniques so that you can figure it out yourself.

Three posts have already been published:

The format is a bit different but I’m really happy with how it is working so far. Let me know what you think.

PostgreSQL 9 and ruby full text search tricks

I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:

  • Setup full text search
  • Optimize search with triggers and indexes
  • Use Postgres with Ruby on Rails 3
  • Optimize indexes by including only the rows that you need
  • Use database standards for more reliable queries
  • Write powerful reports in only a few lines of code
  • Convert an existing MySQL application to use Postgres

It’s a steal at only $12. You can buy it over at PeepCode.

In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:

1
2
3
4
CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts 
FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(search_vector, 'pg_catalog.english',  body, title);

That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves an came up with this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$
DECLARE
  search TEXT;
  child_search TEXT;
begin
  SELECT string_agg(author_name, ' ') INTO child_search
  FROM comments
  WHERE post_id = new.id;

  search := '';
  search := search || ' ' || coalesce(new.title);
  search := search || ' ' || coalesce(new.body);
  search := search || ' ' child_search;

  new.search_index := to_tsvector(search); 
  return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE
  search_trigger();

Getting a bit ugly eh. It might be nice to move that logic back into ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data-type. In this case, a quick work around is to store a search_document in a text field on the model, then use a trigger to only index that field into our search_vector field. The search_document field can then easily be set from your ORM.

Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:

1
2
3
4
5
6
7
8
9
10
class Post
  include DataMapper::Resource
  include Searchable

  property :id, Serial
  property :title, String
  property :body, Text

  searchable :title, :body # Provides Post.search('keyword')
end

You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.

YAML Tutorial

Many years ago I wrote a tutorial on using YAML in ruby. It still sees the most google traffic of any post, by far. So people want to know about YAML? I’ll help them out.

What is YAML?

YAML is a flexible, human readable file format that is ideal for storing object trees. YAML stands for “YAML Ain’t Markup Language”. It is easier to read (by humans) than JSON, and can contain richer meta data. It is far nicer than XML. There are libraries available for all mainstream languages including Ruby, Python, C++, Java, Perl, C#/.NET, Javascript, PHP and Haskell. It looks like this:

1
2
3
4
5
6
--- 
- name: Xavier
  country: Australia
  age: 24
- name: Don
  country: US

That is a simple array of hashes. You can nest any combination of these simple data structures however you like. Most parsers will also detect the 24 as an integer too. Quoting strings is optional, and was omitted in this example.

YAML allows you to add tags to your objects, which is extra meta-data that your application can use to deserialize portions into complex data structures. For instance, in ruby if you serialize a set object it looks like this:

1
2
3
4
5
# Set.new([1,2]).to_yaml
--- !ruby/object:Set 
hash: 
  1: true
  2: true

Notice that ruby has added the ruby/object:Set tag so that the correct object can be instantiated on deserialization, while maintaining a human readable rendition of a set. These tags can be anything you like, ruby just happens to use that particular format.

You can remove duplication from YAML files by using anchors (&) and aliases (*). You typically see this in configuration files, such as:

1
2
3
4
5
6
7
8
9
10
11
defaults: &defaults
  adapter:  postgres
  host:     localhost

development:
  database: myapp_development
  <<: *defaults

test:
  database: myapp_test
  <<: *defaults

& sets up the name of the anchor (“defaults”), << means “merge the given hash into the current one”, and * includes the named anchor (“defaults” again). The expanded version looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
defaults:
  adapter:  postgres
  host:     localhost

development:
  database: myapp_development
  adapter:  postgres
  host:     localhost

test:
  database: myapp_test
  adapter:  postgres
  host:     localhost

Note that the defaults hash hangs around, even though it isn’t really required anymore.

YAML generators use this technique to correctly serialize repeated references to the same object, and even cyclic references. That’s pretty clever.

Flow style

YAML has an alternate synax called “flow style”, that allows arrays and hashes to be written inline without having to rely on indentation, using square brackets and curly brackets respectively.

1
2
3
4
5
6
7
8
9
10
11
12
13
--- 
# Arrays
colors:
  - red
  - blue
# in flow style...
colors: [red, blue]

# Hashes
- name: Xavier
  age: 24
# in flow style...
- {name: Xavier, age: 24}

This has the curious effect of making YAML a superset of JSON. A valid JSON document is also a valid YAML document.

Performance

Given YAML’s richness and human readability, you would expect it to be slower than native serialization or JSON. This would be correct. My brief testing shows it is about an order of magnitude slower. For the typical configuration use-case, this is irrelevant, but worth keeping in mind if you are doing something crazy. Remember to run your own benchmarks that represent your specific need.

1
2
3
4
5
6
7
8
9
                     user       system     total    real
Marshal serialize    0.090000   0.000000   0.090000 (  0.091822)
Marshal deserialize  0.090000   0.000000   0.090000 (  0.092186)
JSON serialize       0.480000   0.010000   0.490000 (  0.480291)
JSON deserialize     0.130000   0.010000   0.140000 (  0.134860)
YAML serialize       2.040000   0.020000   2.060000 (  2.065693)
YAML deserialize     0.520000   0.010000   0.530000 (  0.526048)
Psych serialize      2.530000   0.030000   2.560000 (  2.565116)
Psych deserialize    1.510000   0.120000   1.630000 (  1.622601)

Curiously, the new YAML parser Psych included in ruby 1.9.2 appears significantly slower than the old one. Not sure what is going on there.

Reading YAML from a file with ruby

1
2
3
4
5
6
7
require 'yaml'

parsed = begin
  YAML.load(File.open("/tmp/test.yml"))
rescue ArgumentError => e
  puts "Could not parse YAML: #{e.message}"
end

Writing YAML to a file with ruby

1
2
3
4
require 'yaml'

data = {"name" => "Xavier"}
File.open("path/to/output.yml", "w") {|f| f.write(data.to_yaml) }

Anything else you’d like to know? Leave a comment.

Psych YAML in ruby 1.9.2 with RVM and Snow Leopard OSX

Note that you must have libyaml installed before you compile ruby, so this probably means you’ll need to recompile your current version.

1
2
3
sudo brew install libyaml
rvm install ruby-1.9.2 --with-libyaml-dir=/usr/local
ruby -rpsych -e 'puts Psych.load("win: true")'

Ordering by a field in a join model with DataMapper

The public interface for datamapper 1.0.3 does not support ordering by a column in a joined model on a query. The core of datamapper does support this though, so we can use some hacks to make it work, as the following code demonstrates.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
require 'rubygems'
require 'dm-core'
require 'dm-migrations'

DataMapper::Logger.new($stdout, :debug)
DataMapper.setup(:default, 'postgres://localhost/test') # createdb test

class User
  include DataMapper::Resource

  property :id, Serial

  has 1, :user_profile

  def self.ranked
    order = DataMapper::Query::Direction.new(user_profile.ranking, :desc) 
    query = all.query # Access a blank query object for us to manipulate
    query.instance_variable_set("@order", [order])

    # Force the user_profile model to be joined into the query
    query.instance_variable_set("@links", [relationships['user_profile'].inverse])

    all(query) # Create a new collection with the modified query
  end
end

class UserProfile
  include DataMapper::Resource

  property :user_id, Integer, :key => true
  property :ranking, Integer, :default => 0

  belongs_to :user
end

DataMapper.finalize
DataMapper.auto_migrate!

User.create(:user_profile => UserProfile.new(:ranking => 2))
User.create(:user_profile => UserProfile.new(:ranking => 5))
User.create(:user_profile => UserProfile.new(:ranking => 3))

puts User.ranked.map {|x| x.user_profile.ranking }.inspect

Padrino, MongoHQ and Heroku

Next time I google for this I’ll find the answer waiting:

1
2
3
4
5
6
7
8
9
# config/database.rb
if ENV['MONGOHQ_URL']
  uri = URI.parse(ENV['MONGOHQ_URL'])
  MongoMapper.connection = Mongo::Connection.from_uri(ENV['MONGOHQ_URL'], :logger => logger)
  MongoMapper.database = uri.path.gsub(/^\//, '')
else
  MongoMapper.connection = Mongo::Connection.new('localhost', nil, :logger => logger)
  MongoMapper.database = "myapp_#{Padrino.env}"
end

Also I’ll write MongoDB here for google. Nicked from Fikus.

Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Tutorial

This took me a while to figure out, especially since I’m not so great with either windows or SQL server, but in the end the process isn’t so difficult.

Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Screencast

The steps covered in this screencast are:

  1. Create user
  2. Create database
  3. Give user permissions
  4. Create DSN
  5. Install ruby
  6. Install devkit (Needed to complie native extensions for ODBC)
  7. Create a new rails app
  8. Add activerecord-sqlserver-adapter and ruby-odbc to Gemfile
  9. Customize config/database.yml
1
2
3
4
5
6
7
8
# config/database.yml
development:
  adapter: sqlserver
  dsn: testdsn_user
  mode: odbc
  database: test
  username: xavier
  password:

Some errors you may encounter:

The specified module could not be found – odbc.so You have likely copied odbc.so from i386-msvcrt-ruby-odbc.zip. This is for 1.8.7, and does not work for 1.9. Remove the .so file, and install ruby-odbc as above.

The specified DSN contains an architecture mismatch between the Driver and the Application. Perhaps you have created a system DSN. Try creating a user DSN instead. I also found some suggestions that you need to use a different version of the ODBC configuration panel, but this wasn’t relevant for me.

Transactional before all with RSpec and DataMapper

By default, before(:all) in rspec executes outside of any transaction, meaning that you can’t really use it for creating objects. Normally this should go in a before(:each), but for a spec with simple creation and a large number of assertions this is terribly inefficient.

Let’s fix it!

This code assumes you are using DataMapper, and that your database supports some form of nested transactions (at the very least faking them with savepoints – see nested transactions in postgres with datamapper). It wraps each before/after :all and :each in it’s own transaction.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
RSpec.configure do |config|
  [:all, :each].each do |x|
    config.before(x) do
      repository(:default) do |repository|
        transaction = DataMapper::Transaction.new(repository)
        transaction.begin
        repository.adapter.push_transaction(transaction)
      end
    end

    config.after(x) do
      repository(:default).adapter.pop_transaction.rollback
    end
  end

  config.include(RSpecExtensions::Set)
end

See that RSpecExtensions::Set include? That’s a version of the lovely let helpers that works with before(:all) setup. Props to pcreux for this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
module RSpecExtensions
  module Set

    module ClassMethods
      # Generates a method whose return value is memoized
      # in before(:all). Great for DB setup when combined with
      # transactional before alls.
      def set(name, &block)
        define_method(name) do
          __memoized[name] ||= instance_eval(&block)
        end
        before(:all) { __send__(name) }
        before(:each) do
          __send__(name).tap do |obj|
            obj.reload if obj.respond_to?(:reload)
          end
        end
      end
    end

    module InstanceMethods
      def __memoized # :nodoc:
        @__memoized ||= {}
      end
    end

    def self.included(mod) # :nodoc:
      mod.extend ClassMethods
      mod.__send__ :include, InstanceMethods
    end

  end
end

Fast specs make me a happy man.

Nested Transactions in Postgres with DataMapper

Hacks to get nested transactions support for Postgres in DataMapper. Not extensively tested, more a proof of concept. It re-opens the existing Transaction class to add a check for whether we need a nested transaction or not, and adds a new NestedTransaction transaction primitive that issues savepoint commands rather than begin/commit.

I put this code in a Rails initializer.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
# Hacks to get nested transactions in Postgres
# Not extensively tested, more a proof of concept
#
# It re-opens the existing Transaction class to add a check for whether
# we need a nested transaction or not, and adds a new NestedTransaction
# transaction primitive that issues savepoint commands rather than begin/commit.

module DataMapper
  module Resource
    def transaction(&block)
      self.class.transaction(&block)
    end
  end

  class Transaction
    # Overridden to allow nested transactions
    def connect_adapter(adapter)
      if @transaction_primitives.key?(adapter)
        raise "Already a primitive for adapter #{adapter}"
      end

      primitive = if adapter.current_transaction
        adapter.nested_transaction_primitive
      else
        adapter.transaction_primitive
      end

      @transaction_primitives[adapter] = validate_primitive(primitive)
    end
  end

  module NestedTransactions
    def nested_transaction_primitive
      DataObjects::NestedTransaction.create_for_uri(normalized_uri, current_connection)
    end
  end

  class NestedTransactionConfig < Rails::Railtie
    config.after_initialize do
      repository.adapter.extend(DataMapper::NestedTransactions)
    end
  end
end

module DataObjects
  class NestedTransaction < Transaction

    # The host name. Note, this relies on the host name being configured
    # and resolvable using DNS
    HOST = "#{Socket::gethostbyname(Socket::gethostname)[0]}" rescue "localhost"
    @@counter = 0

    # The connection object for this transaction - must have already had
    # a transaction begun on it
    attr_reader :connection
    # A unique ID for this transaction
    attr_reader :id

    def self.create_for_uri(uri, connection)
      uri = uri.is_a?(String) ? URI::parse(uri) : uri
      DataObjects::NestedTransaction.new(uri, connection)
    end

    #
    # Creates a NestedTransaction bound to an existing connection
    #
    def initialize(uri, connection)
      @connection = connection
      @id = Digest::SHA256.hexdigest(
        "#{HOST}:#{$$}:#{Time.now.to_f}:nested:#{@@counter += 1}")
    end

    def close
    end

    def begin
      run %{SAVEPOINT "#{@id}"}
    end

    def commit
      run %{RELEASE SAVEPOINT "#{@id}"}
    end

    def rollback
      run %{ROLLBACK TO SAVEPOINT "#{@id}"}
    end

    private
    def run(cmd)
      connection.create_command(cmd).execute_non_query
    end
  end
end

I wrote code similar to this with hassox while at NZX, big ups to those guys. I’m working on a proper patch, but haven’t quite figured out the internals enough. If you know how DataMapper works, please check out and comment on this sample patch for three dm gems.

Why I Rewrote Chronic

It seems like a pretty epic yak shave. If you want to parse natural language dates in ruby, you use Chronic. That’s just how it is. (There’s also Tickle for recurring dates, which is similar, but based on Chronic anyways.) It’s the standard, everyone uses it, so why oh why did I write my own version from scratch?

Three reasons I can see.

Chronic is unmaintained. Check the network graph for Chronic. A more avid historian could turn this into an epic teledrama, but for now here’s the summary: The main repository hasn’t had a commit since late 2008. Evaryont made a valiant attempt to take the reins, but his stamina only lasted an extra year to August 2009. Since then numerous people have forked his efforts, mostly to add 1.9 support. These efforts are fragmented though. The inertia of such a large project with no clear leadership sees every man running for himself.

Further, the new maintainers aren’t providing a rock solid base. From Evaryont’s README:
I decided on my own volition that the 40-some (as reported by Github) network should be merged together. I got it to run, but quite haphazardly. There are a lot of new features (mostly undocumented except the git logs) so be a little flexible in your language passed to Chronic. [emphasis mine]

This does not fill me with confidence.

Chronic has a large barrier to entry. Natural date parsing is a big challenge. In the original README, there are ~50 examples of formats it supports, and that is excluding all of the features added in forks in the last two years. The result is a large code base which is intimidating for a new comer, especially with no high level guidance as to how everything fits together. On a project of this size, “the documentation is in the specs” is insufficient. I know what it does, I need to know how it does it.

Chronic solves the wrong problem. I want an alternative to date pickers. As such, I don’t need time support, and I only need very simple day parsing. Chronic seems geared towards a calendar type application (“tomorrow at 6:45pm”), but also parses many expressions which simply are not useful in a real application either because they are obtuse - “7 hours before tomorrow at noon” - or just not how users think about dates - “3 months ago saturday at 5:00 pm”. (Note the last assertion is a totally unsubstantiated claim with no user research to support it.)

Further, it is not hard to find simple examples that Chronic doesn’t support. Omitting a year is an easy one: 14 Sep, April 9.

So what to do?

Chronic needs a leader. Chronic neads a hero. One man to reunite the forks, document the code, and deliver it to the promised land.

I am not that man.

I sketched out the formats I actually needed to support for my application, looked at it and thought “really it can’t be that hard”. Natural date parsing is hard; parsing only the dates your application requires is easy. One hour later I had a gem that not only had 100% support for all of the Chronic features I had been using, but also covered some extra formats I wanted (“14 Sep”), and could also convert a date back into a human readable description. That’s less time than I had already sunk into trying to get Chronic working.

Introducing Kronic.

Less than 100 lines of code, totally specced, totally solved my problem. Ultimately, I don’t want to deal with this problem, so I wanted the easiest solution. While patching Chronic would intuitively appear to be pragmatic, a quick spike in the other direction turned out to be worthwhile. Sometimes 80% just isn’t that hard.

A pretty flower Another pretty flower