DataMapper Retrospective
I introduced DataMapper on my last two major projects. After I left, both projects matured and eventually migrated to a different ORM. That deserves a retrospective, I think. I don’t have insider detail on the decisions to abandon DataMapper, but developers from both projects kindly provided background for this blog post.
Project A
Web application and a batch processing component built on top of a legacy Oracle database.
Good
- Field mappings, nice ruby names and able to ignore fields we didn’t care about.
Bad
- Had to roll our own locking and time zone integration.
- Not great for batch processing (trying to write SQL through DM abstraction.)
It turned out this project required a lot more batch processing than we anticipated, which DataMapper does not shine at. It was migrated to Sequel which provides a far better abstraction for working closer to SQL.
Project B
A fairly typical Rails 3 application. A couple of tens of thousands of lines of code.
Good
- No migrations (pre-release).
- Foreign keys, composite primary keys.
- Auto-validations.
Bad
- Auto-validations with nested attributes was uncharted territory (needed bug fixes).
- Performance on large object graphs was unusable for page rendering (close to two seconds for our home page, which admittedly had a stupid amount of stuff on it).
- Performance was suboptimal (though passable) on smaller pages.
- Tracing through what is happening across multiple gems (particularly around transactions) was tricky.
- The maintenance/interactions of all the various gems was problematic (e.g. gems X,Y work with 1.9.3 but Z doesn’t yet).
- Inability to easily “break the abstraction” when SQL was required.
The performance issues were clear in our code base, but eluded much effort to reduce them down to smaller reproducible problems. The best quick win I found was ~15% by disabling assertions, but I suspect that given the large scope of the problem DataMapper is trying to solve there may not be any approachable way of tackling the issue (would love to be proven wrong!)
We ran into obvious integration bugs (apologies for not having kept a concrete list), a symptom of a library not widely used. As a committer on the project this wasn’t an issue for me, since they were easily fixed and moved past (the DataMapper code base is really nice to work on), but having a committer on your team isn’t a tenable strategy.
DataMapper takes an all-ruby-all-the-time approach, which means things get tricky when the abstraction leaks. Much of the SQL generation is hidden in private methods. Compare some code to create a composable full text search query:
    def self.search(keywords, options = {})
      options = {
        conditions: ["true"]
      }.merge(options)

      current_query = query.merge(options)

      a = repository.adapter

      columns_sql = a.send(:columns_statement, current_query.fields, false)
      conditions = a.send(:conditions_statement, current_query.conditions, false)
      order_sql = a.send(:order_statement, current_query.order, false)
      limit_sql = current_query.limit || 50

      conditions_sql, conditions_values = *conditions
      bind_values = [keywords] + conditions_values

      find_by_sql([<<-SQL, *bind_values])
        SELECT #{columns_sql}, ts_rank_cd(search_vector, query) AS rank
        FROM things
        CROSS JOIN plainto_tsquery(?) query
        WHERE #{conditions_sql} AND (query @@ search_vector)
        ORDER BY rank DESC, #{order_sql}
        LIMIT #{limit_sql}
      SQL
    end
To the ActiveRecord equivalent (Sequel is similar):
    def self.search(keywords)
      select("things.*, ts_rank_cd(search_vector, query) AS rank")
        .joins(sanitize_sql_array(["CROSS JOIN plainto_tsquery(?) query", keywords]))
        .where("query @@ search_vector")
        .order("rank DESC")
    end
Switching to ActiveRecord took a week of all hands (~4) on deck, plus another week alongside other feature work to get it stable. From beginning to in production was two weeks. The end result was a drop in response time (the deploy is pretty blatant in the graph below), a drop in start up time, plus 3K fewer lines of code (a lot of custom code for dropping down to SQL was able to be removed).

Do differently
Ultimately, DataMapper provides an abstraction that I just don’t need, and even if I did it hasn’t had its tires kicked sufficiently that a team can use it without having to delve down to the internals. The applications I find myself writing are about data, and the store in which that data lives is vitally important to the application. Abstracting away those details seems to be heading in the wrong direction for writing simple applications. As an intellectual achievement in its own right I really dig DataMapper, but it is too complicated a component to justify using inside other applications.
Rich Hickey’s talk Simple Made Easy has been rattling around my head a lot.
Nowadays I’m back to ActiveRecord for team conformance. It’s more work to keep on top of foreign keys and the like, but overall it does the job. It’s still too complicated, but has the non-trivial benefit of being used by lots of people. This is my responsible choice at the moment.
On my own projects I first reach for Sequel. It supports all the nice database features I want to use, while providing a thin layer over SQL. In other words, I don’t have to worry about the abstraction leaking because the abstraction is still SQL, just expressed in ruby (which is a huge win for composability that you don’t get with raw SQL). While it does have “ORM” features, it feels more like the most convenient way of accessing my database rather than an abstraction layer. It’s actively maintained and the only bug I have found was something that Rails broke, and a patch was already available. There are no open issues in the bug tracker. My experiences have been overwhelmingly positive. I haven’t built anything big enough with it yet to have confidence using it on a team project though.
I still have a soft spot in my heart for DataMapper, I just don’t see anywhere for me to use it anymore.
Exercises in style
Let us make a stack machine! It can add numbers! This may be a winding journey. Have some time and an irb up your sleeve. Maybe it is more of a meditation than a blog post? Onwards!
    def push_op(value)
      lambda {|x| [value, x + [value]] }
    end

    def add_op
      lambda {|x| [x[-1] + x[-2], x[0..-3]] }
    end

    [
      push_op(1),
      push_op(2),
      add_op
    ].inject([nil, []]) {|(result, state), op|
      op[state]
    }
Get it? Pushes 1, pushes 2, then the add_op pops them off the stack and makes 3. Not a lot of metadata in those lambdas though, and we can’t combine them in interesting ways.
    class Operation < Struct.new(:block)
      def +(other)
        CompositeOperation.new(self, other)
      end

      # Struct members are not instance variables, so go through the accessor.
      def run(state)
        block.call(state)
      end

      # The evaluators further down invoke operations with #call.
      alias_method :call, :run
    end

    class CompositeOperation < Operation
      def initialize(a, b)
        @a = a
        @b = b
        super(lambda {|x| @b.block[@a.block[x][1]] })
      end

      def desc
        @a.desc + "\n" + @b.desc
      end
    end

    class PushOperation < Operation
      def initialize(value)
        @value = value
        super(lambda {|x| [value, x + [value]] })
      end

      def desc
        "push #{@value}"
      end
    end

    class AddOperation < Operation
      def initialize
        super(lambda {|x| [x[-1] + x[-2], x[0..-3]] })
      end

      def desc
        "add top two digits on stack"
      end
    end
A lot more setup, but now we also get a description of operations!
    def tagged_push_op(value)
      PushOperation.new(value)
    end

    def tagged_add_op
      AddOperation.new
    end

    ops =
      tagged_push_op(1) +
      tagged_push_op(2) +
      tagged_add_op

    start_state = [] # an empty stack

    puts ops.desc
    puts ops.run(start_state).inspect
Ok you get that. What else can we do?
“every monad [.] embodies a particular computational strategy. A ‘motto of computation,’ if you will.” — Mental Guy
hmmm. What does it mean?
    class VerboseStackEvaluator < Struct.new(:stack)
      # :stack already comes from the Struct; redefining it with attr_accessor
      # would shadow it with an unset instance variable.
      attr_accessor :result

      def pass(op)
        puts op.desc
        results = op.call(stack)
        self.class.new(results[1]).tap do |x|
          x.result = results[0]
        end
      end

      def self.identity
        new([])
      end
    end

    evaluator = VerboseStackEvaluator

    e = evaluator.identity.
      pass(tagged_push_op(1)).
      pass(tagged_push_op(2)).
      pass(tagged_add_op)

    p [e.result, e.stack]
Oh so now we have one structure (the pass stuff) that we can run through different evaluators. Let us make a recursive one!
    class RecursiveLazyStackEvaluator < Struct.new(:stack)
      def pass(op)
        self.class.new(lambda { op.call(stack) })
      end

      def self.identity
        new(lambda { [nil, []] })
      end

      def result; evaled[0]; end
      def stack;  evaled[1]; end

      private

      def evaled
        # The Struct member holds the deferred computation. #stack is
        # overridden above, so read the raw member with self[:stack].
        @evaled ||= self[:stack].call
      end
    end
Do you see? It is now lazy. Rather than evaluate each operation when pass is called, it saves them up until a result is requested. Look out! Haskell in your Ruby! Recursion might blow out our stack though. Let us isomorphically (I just learned this word) translate it to use iteration!
    class LazyStackEvaluator
      attr_accessor :steps

      def initialize(stack, steps = [])
        @stack = stack
        @steps = steps
      end

      def pass(op)
        self.class.new(@stack, steps + [op])
      end

      def self.identity
        new([])
      end

      def result; evaled[0]; end
      def stack;  evaled[1]; end

      protected

      def evaled
        @evaled ||= steps.inject([nil, @stack]) {|(r, s), op|
          op.call(s)
        }
      end
    end
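To convince yourself that nothing runs until a result is requested, here is a minimal self-contained sketch. TinyLazyEvaluator, push, add and log are made-up names for this illustration (they are not the classes above), and the operations are plain lambdas that record when they are called.

```ruby
# Hypothetical, stripped-down evaluator: it only remembers steps.
class TinyLazyEvaluator
  def initialize(steps = [])
    @steps = steps
  end

  # Record the step and return a new evaluator; nothing is executed here.
  def pass(op)
    self.class.new(@steps + [op])
  end

  # Run every recorded step, in order, the first time a result is requested.
  def result
    @result ||= @steps.inject([nil, []]) { |(_, stack), op| op.call(stack) }
  end
end

log = []
push = lambda { |value|
  lambda { |stack| log << "push #{value}"; [value, stack + [value]] }
}
add = lambda { |stack| log << "add"; [stack[-1] + stack[-2], stack[0..-3]] }

e = TinyLazyEvaluator.new.pass(push[1]).pass(push[2]).pass(add)
log_before = log.dup   # snapshot taken before anything is forced
value, stack = e.result
log_after = log.dup    # now every operation has logged itself
```

Comparing log_before with log_after shows that the three calls to pass did no work at all; everything happened inside result.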
Not too shabby. Let’s try something more useful. Given we only have one operation that pops the stack (add), and it only pops two numbers, if we have more than two numbers in a row they start becoming redundant. Let us optimize!
    class OptimizingEvaluator < LazyStackEvaluator
      def evaled
        @evaled ||= begin
          accumulator = []
          new_steps = []
          steps.each do |step|
            accumulator << step
            if !step.is_a?(PushOperation)
              new_steps += accumulator
              accumulator = []
            elsif accumulator.length > 2
              accumulator = accumulator[1..-1]
            end
          end
          new_steps += accumulator
          new_steps.inject([nil, @stack]) {|(r, s), op|
            op.call(s)
          }
        end
      end
    end

    evaluator = OptimizingEvaluator

    e = evaluator.identity.
      pass(tagged_push_op(1)). # This won't get run!
      pass(tagged_push_op(1)).
      pass(tagged_push_op(2)).
      pass(tagged_add_op)

    p [e.result, e.stack]
Ok one more. This one is pretty useless for this problem, but perhaps it will inspire thought. Let us multithread!
    class ThreadingEvaluator < LazyStackEvaluator
      def evaled
        @evaled ||= begin
          accumulator = []
          workers = []
          steps.each do |step|
            accumulator << step
            if step.is_a?(AddOperation)
              workers << spawn_thread(accumulator)
              accumulator = []
            end
          end
          workers << spawn_thread(accumulator) unless accumulator.empty?
          workers.each(&:join)
          workers.last[:result]
        end
      end

      def spawn_thread(accumulator)
        Thread.new do
          sleep rand / 3
          Thread.current[:result] = begin
            e = accumulator.inject(VerboseStackEvaluator.identity) {|e, s| e.pass(s) }
            [e.result, e.stack]
          end
        end
      end
    end

    evaluator = ThreadingEvaluator

    e = evaluator.identity.
      pass(tagged_push_op(1)).
      pass(tagged_push_op(1)).
      pass(tagged_push_op(2)).
      pass(tagged_add_op).
      pass(tagged_push_op(3)).
      pass(tagged_push_op(4)).
      pass(tagged_add_op)

    p [e.result, e.stack]
Ok that is all. Here is an exercise for you: how would you allow the threading and optimizing evaluators to be combined?
Interface Mocking
UPDATE: This is a gem now: rspec-fire. The code in the gem is better than that presented here.
Here is a screencast I put together in response to a recent Destroy All Software screencast on test isolation and refactoring, showing off an idea I’ve been tinkering around with for automatic validation of your implicit interfaces that you stub in tests.
Here is the code for InterfaceMocking:
    module InterfaceMocking

      # Returns a new interface double. This is equivalent to an RSpec double,
      # stub, or mock, except that if the class passed as the first parameter
      # is loaded it will raise if you try to set an expectation or stub on
      # a method that the class has not implemented.
      def interface_double(stubbed_class, methods = {})
        InterfaceDouble.new(stubbed_class, methods)
      end

      module InterfaceDoubleMethods

        include RSpec::Matchers

        def should_receive(method_name)
          ensure_implemented(method_name)
          super
        end

        def should_not_receive(method_name)
          ensure_implemented(method_name)
          super
        end

        def stub!(method_name)
          ensure_implemented(method_name)
          super
        end

        def ensure_implemented(*method_names)
          if recursive_const_defined?(Object, @__stubbed_class__)
            recursive_const_get(Object, @__stubbed_class__).
              should implement(method_names, @__checked_methods__)
          end
        end

        def recursive_const_get object, name
          name.split('::').inject(Object) {|klass,name| klass.const_get name }
        end

        def recursive_const_defined? object, name
          !!name.split('::').inject(Object) {|klass,name|
            if klass && klass.const_defined?(name)
              klass.const_get name
            end
          }
        end

      end

      class InterfaceDouble < RSpec::Mocks::Mock

        include InterfaceDoubleMethods

        def initialize(stubbed_class, *args)
          args << {} unless Hash === args.last

          @__stubbed_class__ = stubbed_class
          @__checked_methods__ = :public_instance_methods
          ensure_implemented *args.last.keys

          # __declared_as copied from rspec/mocks definition of `double`
          args.last[:__declared_as] = 'InterfaceDouble'
          super(stubbed_class, *args)
        end

      end
    end

    RSpec::Matchers.define :implement do |expected_methods, checked_methods|
      match do |stubbed_class|
        unimplemented_methods(
          stubbed_class,
          expected_methods,
          checked_methods
        ).empty?
      end

      def unimplemented_methods(stubbed_class, expected_methods, checked_methods)
        implemented_methods = stubbed_class.send(checked_methods)
        unimplemented_methods = expected_methods - implemented_methods
      end

      failure_message_for_should do |stubbed_class|
        "%s does not publicly implement:\n%s" % [
          stubbed_class,
          unimplemented_methods(
            stubbed_class,
            expected_methods,
            checked_methods
          ).sort.map {|x|
            "  #{x}"
          }.join("\n")
        ]
      end
    end

    RSpec.configure do |config|
      config.include InterfaceMocking
    end
Static Asset Caching on Heroku Cedar Stack
I recently moved this blog over to Heroku, and in the process added some proper HTTP caching headers. The dynamic pages use the built-in fresh_when and stale? Rails helpers, combined with Rack::Cache and the free memcached plugin available on Heroku. That was all pretty straightforward. What was more difficult was configuring Heroku to serve all static assets (such as images and stylesheets) with a far-future max-age header so that they will be cached for eternity. What I’ve documented here is somewhat of a hack, and hopefully Heroku will provide a better way of doing this in the future.
By default Heroku serves everything in public directly via nginx. This is a problem for us since we don’t get a chance to configure the caching headers. Instead, use the Rack::StaticCache middleware (provided in the rack-contrib gem) to serve static files, which by default adds far-future max-age cache control headers. The files need to be served out of a different directory from public, since there is no way to disable the nginx serving. I renamed my public folder to public_cached.
    # config/application.rb
    config.middleware.use Rack::StaticCache,
      urls: %w(
        /stylesheets
        /images
        /javascripts
        /robots.txt
        /favicon.ico
      ),
      root: "public_cached"
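If you are wondering what "adding far-future headers in a middleware" amounts to, here is a rough sketch. FarFutureCacheHeaders is a made-up class for illustration only and is not how Rack::StaticCache is implemented (the real middleware also locates and serves the files). It relies on nothing but the Rack convention that an app responds to call(env) and returns a status, a headers hash and a body.

```ruby
# Hypothetical middleware: decorates responses for matching paths with a
# far-future Cache-Control header and leaves everything else untouched.
class FarFutureCacheHeaders
  ONE_YEAR = 365 * 24 * 60 * 60 # in seconds

  def initialize(app, prefixes)
    @app = app
    @prefixes = prefixes
  end

  def call(env)
    status, headers, body = @app.call(env)
    if @prefixes.any? { |prefix| env["PATH_INFO"].start_with?(prefix) }
      headers = headers.merge("Cache-Control" => "public, max-age=#{ONE_YEAR}")
    end
    [status, headers, body]
  end
end

# A stand-in for the rest of the stack, so the sketch runs on its own.
inner = lambda { |env| [200, { "Content-Type" => "text/plain" }, ["ok"]] }
app = FarFutureCacheHeaders.new(inner, %w(/stylesheets /images))

asset_headers = app.call("PATH_INFO" => "/images/logo.png")[1]
page_headers  = app.call("PATH_INFO" => "/posts/1")[1]
```

Only the asset request picks up the Cache-Control header; the dynamic page is passed through as-is, which is the split you want between static files and pages that rely on fresh_when and stale?.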
I also disabled the built in Rails serving of static assets in development mode, so that it didn’t interfere:
    # config/environments/development.rb
    config.serve_static_assets = false
In the production config, I configured the x_sendfile_header option to be “X-Accel-Redirect”. It was “X-Sendfile” which is an apache directive, and was causing nginx to hang (Heroku would never actually serve the assets to the browser).
    # config/environments/production.rb
    config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'
A downside of this approach is that if you have a lot of static assets, they all have to hit the Rails stack in order to be served. If you only have one dyno (the free plan) then the initial load can be slower than it otherwise would be if nginx was serving them directly. As I mentioned in the introduction, hopefully Heroku will provide a nicer way to do this in the future.
Speeding up Rails startup time
In which I provide easy instructions to try a new patch that drastically improves the start up time of Ruby applications, in the hope that with wide support it will be merged into the upcoming 1.9.3 release. Skip to the bottom for instructions, or keep reading for the narrative.
UPDATE: If you have trouble installing, grab a recent copy of rvm: rvm get head.
Background
Recent releases of MRI Ruby have introduced some fairly major performance regressions when requiring files:

For reference, our medium-sized Rails application requires around 2200 files, which is off the right-hand side of this graph. This is problematic. On 1.9.2 it takes 20s to start up, on 1.9.3 it takes 46s. Both are far too long.
There are a few reasons for this, but the core of the problem is the basic algorithm which looks something like this:
    def require(file)
      $loaded.each do |x|
        return false if x == file
      end
      load(file)
      $loaded << file
    end
That loop is no good, and gets worse the more files you have required. I have written a patch for 1.9.3 which changes this algorithm to:
    def require(file)
      return false if $loaded[file]
      load(file)
      $loaded[file] = true
    end
That gives you a performance curve that looks like this:

Much nicer.
That’s just a synthetic benchmark, but it works in the real world too. My main Rails application now loads in a mite over 10s, down from 20s it was taking on 1.9.2. A blank Rails app loads in 1.1s, which is even faster than 1.8.7.
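If you want to feel the difference between the two lookups without compiling anything, here is a small pure-Ruby sketch. It only mimics the shape of the two algorithms: the real patch is C code inside MRI and deals with details this ignores. The method names and file names are made up for the benchmark.

```ruby
require 'benchmark'

# Simulate requiring `count` distinct files, checking a "loaded" registry
# before each one.
def simulate_with_array(count)
  loaded = []
  count.times do |i|
    file = "lib/file_#{i}.rb"
    loaded << file unless loaded.include?(file) # linear scan, like the loop
  end
  loaded.size
end

def simulate_with_hash(count)
  loaded = {}
  count.times do |i|
    file = "lib/file_#{i}.rb"
    loaded[file] = true unless loaded[file] # single hash lookup
  end
  loaded.size
end

array_count = simulate_with_array(2200)
hash_count  = simulate_with_hash(2200)

Benchmark.bm(8) do |x|
  x.report("array") { simulate_with_array(2200) }
  x.report("hash")  { simulate_with_hash(2200) }
end
```

Both versions register the same number of files; try raising the count and watch the array version fall behind as the registry grows.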

Getting the fix
Here is how you can try out my patch right now in just ten minutes using RVM.
    # First get a baseline measurement
    cd /your/rails/app
    time script/rails runner "puts 1"

    # Install a patched ruby
    curl https://gist.github.com/raw/996418/e2b346fbadeed458506fc69ca213ad96d1d08c3e/require-performance-fix-r31758.patch > /tmp/require-performance-fix.patch
    rvm install ruby-head --patch /tmp/require-performance-fix.patch -n patched
    # ... get a cup of tea, this took about 8 minutes on my MBP

    # Get a new measurement
    cd /your/rails/app
    rvm use ruby-head-patched
    gem install bundler --no-rdoc --no-ri
    bundle
    time script/rails runner "puts 1"
How you can help
I need a lot more eyeballs on this patch before it can be considered for merging into trunk. I would really appreciate any of the following:
- Try it out on your app and report timings in the comments.
- Code review the patch on this GitHub pull request (it’s C code, but don’t let that scare you off).
- Try it on Windows.
- Report any bugs you find.
Next steps
I imagine there will be a bit more work to get this into Ruby 1.9.3, but after that this is just the first step of many to try and speed up the time Rails takes to start up. Bundler and RubyGems still spend a lot of time doing … something, which I want to investigate. I also want to port these changes over to JRuby which has similar issues (Rubinius isn’t quite as fast out of the gate, but does not degrade exponentially so would not benefit from this patch).
Thank you for your time.
New Column: Code Safari
I am writing a regular weekly column at the newly launched Sitepoint project RubySource. The column is named “Code Safari”, where I explore the jungle of ruby libraries and gems and figure out how they work. It’s an introductory series designed to not just explain how things operate, but show you the tools and techniques so that you can figure it out yourself.
Three posts have already been published:
- Understanding Concurrent Programming With Ruby’s Goliath, in which I dig into the new Goliath web server to figure out how it uses the new 1.9 Fibers to work some magic.
- Configuring Capybara, in which I investigate how Capybara implemented its configuration DSL, and then make one for myself.
- TWSS and Bayesian Classification of Twitter Searches, in which I inspect the pipes of a beautiful piece of plumbing.
The format is a bit different but I’m really happy with how it is working so far. Let me know what you think.
PostgreSQL 9 and ruby full text search tricks
I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:
- Setup full text search
- Optimize search with triggers and indexes
- Use Postgres with Ruby on Rails 3
- Optimize indexes by including only the rows that you need
- Use database standards for more reliable queries
- Write powerful reports in only a few lines of code
- Convert an existing MySQL application to use Postgres
It’s a steal at only $12. You can buy it over at PeepCode.
In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:
    CREATE TRIGGER posts_search_vector_refresh
      BEFORE INSERT OR UPDATE ON posts
    FOR EACH ROW EXECUTE PROCEDURE
      tsvector_update_trigger(search_vector, 'pg_catalog.english', body, title);
That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves and came up with this:
    CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$
    DECLARE
      search TEXT;
      child_search TEXT;
    begin
      SELECT string_agg(author_name, ' ') INTO child_search
      FROM comments
      WHERE post_id = new.id;

      -- coalesce to '' so a NULL column does not turn the whole document NULL
      search := '';
      search := search || ' ' || coalesce(new.title, '');
      search := search || ' ' || coalesce(new.body, '');
      search := search || ' ' || coalesce(child_search, '');

      new.search_index := to_tsvector(search);
      return new;
    end
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER posts_search_vector_refresh
      BEFORE INSERT OR UPDATE ON posts
    FOR EACH ROW EXECUTE PROCEDURE
      search_trigger();
Getting a bit ugly eh. It might be nice to move that logic back into ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data-type. In this case, a quick work around is to store a search_document in a text field on the model, then use a trigger to only index that field into our search_vector field. The search_document field can then easily be set from your ORM.
Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:
    class Post
      include DataMapper::Resource
      include Searchable

      property :id, Serial
      property :title, String
      property :body, Text

      searchable :title, :body # Provides Post.search('keyword')
    end
You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.
YAML Tutorial
Many years ago I wrote a tutorial on using YAML in ruby. It still sees the most google traffic of any post, by far. So people want to know about YAML? I’ll help them out.
What is YAML?
YAML is a flexible, human readable file format that is ideal for storing object trees. YAML stands for “YAML Ain’t Markup Language”. It is easier to read (by humans) than JSON, and can contain richer meta data. It is far nicer than XML. There are libraries available for all mainstream languages including Ruby, Python, C++, Java, Perl, C#/.NET, Javascript, PHP and Haskell. It looks like this:
    ---
    - name: Xavier
      country: Australia
      age: 24
    - name: Don
      country: US
That is a simple array of hashes. You can nest any combination of these simple data structures however you like. Most parsers will also detect the 24 as an integer too. Quoting strings is optional, and was omitted in this example.
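To see what that document turns into on the ruby side, here is a small sketch that parses it from a string. The variable names are just for illustration.

```ruby
require 'yaml'

source = <<-EOS
---
- name: Xavier
  country: Australia
  age: 24
- name: Don
  country: US
EOS

people = YAML.load(source)

names = people.map { |person| person["name"] }
age   = people.first["age"] # parsed as an Integer, not the string "24"
```

The result is an ordinary Array of Hashes with String keys, so people.last["country"] gives you "US" with no further conversion.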
YAML allows you to add tags to your objects, which is extra meta-data that your application can use to deserialize portions into complex data structures. For instance, in ruby if you serialize a set object it looks like this:
    # Set.new([1,2]).to_yaml
    --- !ruby/object:Set
    hash:
      1: true
      2: true
Notice that ruby has added the ruby/object:Set tag so that the correct object can be instantiated on deserialization, while maintaining a human readable rendition of a set. These tags can be anything you like, ruby just happens to use that particular format.
You can remove duplication from YAML files by using anchors (&) and aliases (*). You typically see this in configuration files, such as:
    defaults: &defaults
      adapter: postgres
      host: localhost

    development:
      database: myapp_development
      <<: *defaults

    test:
      database: myapp_test
      <<: *defaults
& sets up the name of the anchor (“defaults”), << means “merge the given hash into the current one”, and * includes the named anchor (“defaults” again). The expanded version looks like this:
    defaults:
      adapter: postgres
      host: localhost

    development:
      database: myapp_development
      adapter: postgres
      host: localhost

    test:
      database: myapp_test
      adapter: postgres
      host: localhost
Note that the defaults hash hangs around, even though it isn’t really required anymore.
YAML generators use this technique to correctly serialize repeated references to the same object, and even cyclic references. That’s pretty clever.
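Here is a quick sketch of loading an anchored configuration from ruby. One assumption to flag: on recent versions of Psych, aliases are rejected by default, so the sketch opts in with safe_load and aliases: true. On older rubies a plain YAML.load may be all you need.

```ruby
require 'yaml'

source = <<-EOS
defaults: &defaults
  adapter: postgres
  host: localhost

development:
  database: myapp_development
  <<: *defaults
EOS

# aliases: true permits the *defaults reference on newer Psych versions.
config = YAML.safe_load(source, aliases: true)

development = config["development"]
```

After loading, development contains the database, adapter and host keys merged together, and config still has its "defaults" entry.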
Flow style
YAML has an alternate syntax called “flow style”, that allows arrays and hashes to be written inline without having to rely on indentation, using square brackets and curly brackets respectively.
    ---
    # Arrays
    colors:
      - red
      - blue
    # in flow style...
    colors: [red, blue]

    # Hashes
    - name: Xavier
      age: 24
    # in flow style...
    - {name: Xavier, age: 24}
This has the curious effect of making YAML a superset of JSON. A valid JSON document is also a valid YAML document.
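A quick way to check this for an everyday document is to hand the same JSON text to both parsers. This is only a sketch with a made-up sample; it does not prove the claim for every corner of the two specifications.

```ruby
require 'json'
require 'yaml'

json_text = '{"name": "Xavier", "age": 24, "colors": ["red", "blue"]}'

from_json = JSON.parse(json_text)
from_yaml = YAML.load(json_text) # the YAML parser reads it as flow style
```

Both calls return the same Hash, with the nested array and the integer intact.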
Performance
Given YAML’s richness and human readability, you would expect it to be slower than native serialization or JSON. This would be correct. My brief testing shows it is about an order of magnitude slower. For the typical configuration use-case, this is irrelevant, but worth keeping in mind if you are doing something crazy. Remember to run your own benchmarks that represent your specific need.
                             user     system      total        real
    Marshal serialize    0.090000   0.000000   0.090000 (  0.091822)
    Marshal deserialize  0.090000   0.000000   0.090000 (  0.092186)
    JSON serialize       0.480000   0.010000   0.490000 (  0.480291)
    JSON deserialize     0.130000   0.010000   0.140000 (  0.134860)
    YAML serialize       2.040000   0.020000   2.060000 (  2.065693)
    YAML deserialize     0.520000   0.010000   0.530000 (  0.526048)
    Psych serialize      2.530000   0.030000   2.560000 (  2.565116)
    Psych deserialize    1.510000   0.120000   1.630000 (  1.622601)
Curiously, the new YAML parser Psych included in ruby 1.9.2 appears significantly slower than the old one. Not sure what is going on there.
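If you want numbers for your own machine and ruby version, a benchmark along these lines is a reasonable starting point. It is not the script that produced the table above; the sample data is made up, and on a modern ruby YAML means Psych, so there is no separate Syck row.

```ruby
require 'benchmark'
require 'json'
require 'yaml'

# Made-up sample data: an array of small hashes.
data = Array.new(1_000) do |i|
  { "name" => "user#{i}", "age" => i % 90, "tags" => %w(a b c) }
end

marshaled = Marshal.dump(data)
json      = data.to_json
yaml      = data.to_yaml

Benchmark.bm(20) do |x|
  x.report("Marshal serialize")   { Marshal.dump(data) }
  x.report("Marshal deserialize") { Marshal.load(marshaled) }
  x.report("JSON serialize")      { data.to_json }
  x.report("JSON deserialize")    { JSON.parse(json) }
  x.report("YAML serialize")      { data.to_yaml }
  # aliases: true tolerates any anchors the dumper may have emitted
  x.report("YAML deserialize")    { YAML.safe_load(yaml, aliases: true) }
end
```

Swap in data shaped like your real workload before drawing conclusions; the relative ordering matters more than the absolute times.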
Reading YAML from a file with ruby
    require 'yaml'

    parsed = begin
      YAML.load(File.open("/tmp/test.yml"))
    rescue ArgumentError => e
      puts "Could not parse YAML: #{e.message}"
    end
Writing YAML to a file with ruby
    require 'yaml'

    data = {"name" => "Xavier"}
    File.open("path/to/output.yml", "w") {|f| f.write(data.to_yaml) }
Anything else you’d like to know? Leave a comment.
Psych YAML in ruby 1.9.2 with RVM and Snow Leopard OSX
Note that you must have libyaml installed before you compile ruby, so this probably means you’ll need to recompile your current version.
    sudo brew install libyaml
    rvm install ruby-1.9.2 --with-libyaml-dir=/usr/local
    ruby -rpsych -e 'puts Psych.load("win: true")'
Ordering by a field in a join model with DataMapper
The public interface for datamapper 1.0.3 does not support ordering by a column in a joined model on a query. The core of datamapper does support this though, so we can use some hacks to make it work, as the following code demonstrates.
    require 'rubygems'
    require 'dm-core'
    require 'dm-migrations'

    DataMapper::Logger.new($stdout, :debug)
    DataMapper.setup(:default, 'postgres://localhost/test') # createdb test

    class User
      include DataMapper::Resource

      property :id, Serial

      has 1, :user_profile

      def self.ranked
        order = DataMapper::Query::Direction.new(user_profile.ranking, :desc)
        query = all.query # Access a blank query object for us to manipulate
        query.instance_variable_set("@order", [order])

        # Force the user_profile model to be joined into the query
        query.instance_variable_set("@links", [relationships['user_profile'].inverse])

        all(query) # Create a new collection with the modified query
      end
    end

    class UserProfile
      include DataMapper::Resource

      property :user_id, Integer, :key => true
      property :ranking, Integer, :default => 0

      belongs_to :user
    end

    DataMapper.finalize
    DataMapper.auto_migrate!

    User.create(:user_profile => UserProfile.new(:ranking => 2))
    User.create(:user_profile => UserProfile.new(:ranking => 5))
    User.create(:user_profile => UserProfile.new(:ranking => 3))

    puts User.ranked.map {|x| x.user_profile.ranking }.inspect
Padrino, MongoHQ and Heroku
Next time I google for this I’ll find the answer waiting:
    # config/database.rb
    if ENV['MONGOHQ_URL']
      uri = URI.parse(ENV['MONGOHQ_URL'])
      MongoMapper.connection = Mongo::Connection.from_uri(ENV['MONGOHQ_URL'], :logger => logger)
      MongoMapper.database = uri.path.gsub(/^\//, '')
    else
      MongoMapper.connection = Mongo::Connection.new('localhost', nil, :logger => logger)
      MongoMapper.database = "myapp_#{Padrino.env}"
    end
Also I’ll write MongoDB here for google. Nicked from Fikus.
Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Tutorial
This took me a while to figure out, especially since I’m not so great with either windows or SQL server, but in the end the process isn’t so difficult.
Rails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Screencast
The steps covered in this screencast are:
- Create user
- Create database
- Give user permissions
- Create DSN
- Install ruby
- Install devkit (Needed to compile native extensions for ODBC)
- Create a new rails app
- Add activerecord-sqlserver-adapter and ruby-odbc to Gemfile
- Customize config/database.yml
    # config/database.yml
    development:
      adapter: sqlserver
      dsn: testdsn_user
      mode: odbc
      database: test
      username: xavier
      password:
Some errors you may encounter:
- “The specified module could not be found – odbc.so”: You have likely copied odbc.so from i386-msvcrt-ruby-odbc.zip. This is for 1.8.7, and does not work for 1.9. Remove the .so file, and install ruby-odbc as above.
- “The specified DSN contains an architecture mismatch between the Driver and the Application.”: Perhaps you have created a system DSN. Try creating a user DSN instead. I also found some suggestions that you need to use a different version of the ODBC configuration panel, but this wasn’t relevant for me.
Transactional before all with RSpec and DataMapper
By default, before(:all) in rspec executes outside of any transaction, meaning that you can’t really use it for creating objects. Normally this should go in a before(:each), but for a spec with simple creation and a large number of assertions this is terribly inefficient.
Let’s fix it!
This code assumes you are using DataMapper, and that your database supports some form of nested transactions (at the very least faking them with savepoints – see nested transactions in postgres with datamapper). It wraps each before/after :all and :each in its own transaction.
RSpec.configure do |config|
  [:all, :each].each do |x|
    config.before(x) do
      repository(:default) do |repository|
        transaction = DataMapper::Transaction.new(repository)
        transaction.begin
        repository.adapter.push_transaction(transaction)
      end
    end

    config.after(x) do
      repository(:default).adapter.pop_transaction.rollback
    end
  end

  config.include(RSpecExtensions::Set)
end
See that RSpecExtensions::Set include? That’s a version of the lovely let helpers that works with before(:all) setup. Props to pcreux for this:
module RSpecExtensions
  module Set

    module ClassMethods
      # Generates a method whose return value is memoized
      # in before(:all). Great for DB setup when combined with
      # transactional before alls.
      def set(name, &block)
        define_method(name) do
          __memoized[name] ||= instance_eval(&block)
        end
        before(:all) { __send__(name) }
        before(:each) do
          __send__(name).tap do |obj|
            obj.reload if obj.respond_to?(:reload)
          end
        end
      end
    end

    module InstanceMethods
      def __memoized # :nodoc:
        @__memoized ||= {}
      end
    end

    def self.included(mod) # :nodoc:
      mod.extend ClassMethods
      mod.__send__ :include, InstanceMethods
    end

  end
end
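To make the memoize-once, reload-per-example idea concrete, here is a rough plain-Ruby sketch of what set boils down to. It does not use RSpec or DataMapper at all: Account is a hypothetical stand-in for a resource, and the loop stands in for RSpec running a before(:each) hook ahead of three examples.

```ruby
# Hypothetical stand-in for a DataMapper resource. It counts how many
# times it was created and how many times it was reloaded.
class Account
  @@creates = 0

  def self.creates
    @@creates
  end

  attr_reader :reloads

  def initialize
    @@creates += 1
    @reloads = 0
  end

  def reload
    @reloads += 1
    self
  end
end

# Same shape as the helper above: memoize the block's result by name.
memoized = {}
account = lambda { memoized[:account] ||= Account.new }

# Plays the role of before(:all): the object is created exactly once.
account.call

# Plays the role of before(:each) for three examples: no new object is
# created, the existing one is only reloaded.
3.times do
  account.call.tap { |obj| obj.reload if obj.respond_to?(:reload) }
end

puts Account.creates       # 1
puts account.call.reloads  # 3
```

The expensive part (creation) happens once per group, while each example still starts from a freshly reloaded object.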
Fast specs make me a happy man.
Nested Transactions in Postgres with DataMapper
Hacks to get nested transactions support for Postgres in DataMapper. Not extensively tested, more a proof of concept. It re-opens the existing Transaction class to add a check for whether we need a nested transaction or not, and adds a new NestedTransaction transaction primitive that issues savepoint commands rather than begin/commit.
I put this code in a Rails initializer.
# Hacks to get nested transactions in Postgres
# Not extensively tested, more a proof of concept
#
# It re-opens the existing Transaction class to add a check for whether
# we need a nested transaction or not, and adds a new NestedTransaction
# transaction primitive that issues savepoint commands rather than begin/commit.

# Standard library pieces used below (harmless if already loaded by Rails)
require 'socket'
require 'digest/sha2'
require 'uri'

module DataMapper
  module Resource
    def transaction(&block)
      self.class.transaction(&block)
    end
  end

  class Transaction
    # Overridden to allow nested transactions
    def connect_adapter(adapter)
      if @transaction_primitives.key?(adapter)
        raise "Already a primitive for adapter #{adapter}"
      end

      primitive = if adapter.current_transaction
        adapter.nested_transaction_primitive
      else
        adapter.transaction_primitive
      end

      @transaction_primitives[adapter] = validate_primitive(primitive)
    end
  end

  module NestedTransactions
    def nested_transaction_primitive
      DataObjects::NestedTransaction.create_for_uri(normalized_uri, current_connection)
    end
  end

  class NestedTransactionConfig < Rails::Railtie
    config.after_initialize do
      repository.adapter.extend(DataMapper::NestedTransactions)
    end
  end
end

module DataObjects
  class NestedTransaction < Transaction

    # The host name. Note, this relies on the host name being configured
    # and resolvable using DNS
    HOST = "#{Socket::gethostbyname(Socket::gethostname)[0]}" rescue "localhost"
    @@counter = 0

    # The connection object for this transaction - must have already had
    # a transaction begun on it
    attr_reader :connection
    # A unique ID for this transaction
    attr_reader :id

    def self.create_for_uri(uri, connection)
      uri = uri.is_a?(String) ? URI::parse(uri) : uri
      DataObjects::NestedTransaction.new(uri, connection)
    end

    #
    # Creates a NestedTransaction bound to an existing connection
    #
    def initialize(uri, connection)
      @connection = connection
      @id = Digest::SHA256.hexdigest(
        "#{HOST}:#{$$}:#{Time.now.to_f}:nested:#{@@counter += 1}")
    end

    def close
    end

    def begin
      run %{SAVEPOINT "#{@id}"}
    end

    def commit
      run %{RELEASE SAVEPOINT "#{@id}"}
    end

    def rollback
      run %{ROLLBACK TO SAVEPOINT "#{@id}"}
    end

    private
    def run(cmd)
      connection.create_command(cmd).execute_non_query
    end
  end
end
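If the savepoint dance is hard to picture, this standalone sketch shows the SQL that ends up being issued when transactions nest. It has nothing to do with DataMapper internals: FakeConnection and Rollback are hypothetical names invented for the illustration, and the "connection" only records statements instead of talking to Postgres.

```ruby
# Hypothetical error used to ask for a rollback of the current level only.
class Rollback < StandardError; end

# Hypothetical stand-in for a database connection. It logs SQL rather than
# executing it, so the nesting behaviour can be inspected without Postgres.
class FakeConnection
  attr_reader :log

  def initialize
    @log = []
    @depth = 0
  end

  # Top level: BEGIN/COMMIT. Inside an open transaction: a savepoint,
  # which can be released or rolled back without ending the outer one.
  def transaction
    nested = @depth > 0
    name = "sp_#{@depth}"
    @log << (nested ? %{SAVEPOINT "#{name}"} : 'BEGIN')
    @depth += 1
    begin
      yield
      @log << (nested ? %{RELEASE SAVEPOINT "#{name}"} : 'COMMIT')
    rescue Rollback
      @log << (nested ? %{ROLLBACK TO SAVEPOINT "#{name}"} : 'ROLLBACK')
    ensure
      @depth -= 1
    end
  end
end

conn = FakeConnection.new
conn.transaction do
  conn.transaction { }                 # inner work is kept
  conn.transaction { raise Rollback }  # inner work is discarded, outer carries on
end

puts conn.log
```

The outer transaction still commits even though one of the inner blocks rolled back, which is exactly what the transactional before(:all) setup relies on.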
I wrote code similar to this with hassox while at NZX, big ups to those guys. I’m working on a proper patch, but haven’t quite figured out the internals enough. If you know how DataMapper works, please check out and comment on this sample patch for three dm gems.
Why I Rewrote Chronic
It seems like a pretty epic yak shave. If you want to parse natural language dates in ruby, you use Chronic. That’s just how it is. (There’s also Tickle for recurring dates, which is similar, but based on Chronic anyways.) It’s the standard, everyone uses it, so why oh why did I write my own version from scratch?
Three reasons I can see.
Chronic is unmaintained. Check the network graph for Chronic. A more avid historian could turn this into an epic teledrama, but for now here’s the summary: The main repository hasn’t had a commit since late 2008. Evaryont made a valiant attempt to take the reins, but his stamina only lasted an extra year to August 2009. Since then numerous people have forked his efforts, mostly to add 1.9 support. These efforts are fragmented though. The inertia of such a large project with no clear leadership sees every man running for himself.
Further, the new maintainers aren’t providing a rock solid base. From Evaryont’s README:
I decided on my own volition that the 40-some (as reported by Github) network should be merged together. I got it to run, but quite haphazardly. There are a lot of new features (mostly undocumented except the git logs) so be a little flexible in your language passed to Chronic. [emphasis mine]
This does not fill me with confidence.
Chronic has a large barrier to entry. Natural date parsing is a big challenge. In the original README, there are ~50 examples of formats it supports, and that is excluding all of the features added in forks in the last two years. The result is a large code base which is intimidating for a newcomer, especially with no high-level guidance as to how everything fits together. On a project of this size, “the documentation is in the specs” is insufficient. I know what it does, I need to know how it does it.
Chronic solves the wrong problem. I want an alternative to date pickers. As such, I don’t need time support, and I only need very simple day parsing. Chronic seems geared towards a calendar type application (“tomorrow at 6:45pm”), but also parses many expressions which simply are not useful in a real application either because they are obtuse - “7 hours before tomorrow at noon” - or just not how users think about dates - “3 months ago saturday at 5:00 pm”. (Note the last assertion is a totally unsubstantiated claim with no user research to support it.)
Further, it is not hard to find simple examples that Chronic doesn’t support. Omitting a year is an easy one: 14 Sep, April 9.
So what to do?
Chronic needs a leader. Chronic needs a hero. One man to reunite the forks, document the code, and deliver it to the promised land.
I am not that man.
I sketched out the formats I actually needed to support for my application, looked at it and thought “really it can’t be that hard”. Natural date parsing is hard; parsing only the dates your application requires is easy. One hour later I had a gem that not only had 100% support for all of the Chronic features I had been using, but also covered some extra formats I wanted (“14 Sep”), and could also convert a date back into a human-readable description. That’s less time than I had already sunk into trying to get Chronic working.
Less than 100 lines of code, totally specced, totally solved my problem. Ultimately, I don’t want to deal with this problem, so I wanted the easiest solution. While patching Chronic would intuitively appear to be pragmatic, a quick spike in the other direction turned out to be worthwhile. Sometimes 80% just isn’t that hard.
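To give a feel for how small "only the dates you need" can be, here is a minimal sketch of that kind of parser. It is not the code from my gem: TinyDates and its methods are hypothetical names for this illustration, the set of formats is just an example, and it assumes an omitted year means the current year.

```ruby
require 'date'

# Hypothetical module for this sketch only.
module TinyDates
  # Returns a Date, or nil when the text is not one of the handful of
  # supported formats. `today` is injectable so behaviour is repeatable.
  def self.parse(text, today = Date.today)
    case text.strip.downcase
    when 'today'     then today
    when 'tomorrow'  then today + 1
    when 'yesterday' then today - 1
    when /\A(\d{1,2}) ([a-z]+)\z/ then build($1, $2, today)  # "14 Sep"
    when /\A([a-z]+) (\d{1,2})\z/ then build($2, $1, today)  # "April 9"
    end
  end

  # Looks the month up by full or abbreviated name and assumes the
  # current year (an assumption of this sketch).
  def self.build(day, month_name, today)
    month = (1..12).find do |m|
      [Date::MONTHNAMES[m], Date::ABBR_MONTHNAMES[m]].map(&:downcase).include?(month_name)
    end
    return nil unless month && Date.valid_date?(today.year, month, day.to_i)
    Date.new(today.year, month, day.to_i)
  end
end

reference = Date.new(2011, 1, 1)
puts TinyDates.parse('14 Sep', reference)    # 2011-09-14
puts TinyDates.parse('April 9', reference)   # 2011-04-09
puts TinyDates.parse('tomorrow', reference)  # 2011-01-02
```

Anything outside those formats simply returns nil, which is an acceptable trade when the goal is replacing a date picker rather than understanding English.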