tag:www.rhnh.net,2008:/railsRails - Xavier Shay's Blog2011-09-03T22:46:34ZEnkiXavier Shaynotreal@rhnh.nettag:www.rhnh.net,2008:Post/8502011-07-30T05:45:00Z2011-09-03T22:46:34ZInterface Mocking<p><strong><span class="caps">UPDATE</span>:</strong> This is a gem now: <a href="https://github.com/xaviershay/rspec-fire">rspec-fire</a>. The code in the gem is better than the code presented here.</p>
<p>Here is a screencast I put together in response to a recent Destroy All Software screencast on <a href="https://www.destroyallsoftware.com/screencasts/catalog/test-isolation-and-refactoring">test isolation and refactoring</a>. It shows off an idea I’ve been tinkering with: automatically validating the implicit interfaces that you stub in tests.</p>
<p><iframe src="http://player.vimeo.com/video/27079042?title=0&byline=0&portrait=0&color=FFFACD" width="600" height="338" frameborder="0"></iframe></p>
<p><a href="http://vimeo.com/27079042">Interface Mocking screencast</a>.</p>
<p>Here is the code for <code>InterfaceMocking</code>:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="r">module</span> <span class="cl">InterfaceMocking</span><tt>
</tt><tt>
</tt> <span class="c"># Returns a new interface double. This is equivalent to an RSpec double,</span><tt>
</tt> <span class="c"># stub, or mock, except that if the class passed as the first parameter</span><tt>
</tt> <span class="c"># is loaded it will raise if you try to set an expectation or stub on</span><tt>
</tt> <span class="c"># a method that the class has not implemented.</span><tt>
</tt> <span class="r">def</span> <span class="fu">interface_double</span>(stubbed_class, methods = {})<tt>
</tt> <span class="co">InterfaceDouble</span>.new(stubbed_class, methods)<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">module</span> <span class="cl">InterfaceDoubleMethods</span><tt>
</tt><tt>
</tt> include <span class="co">RSpec</span>::<span class="co">Matchers</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">should_receive</span>(method_name)<tt>
</tt> ensure_implemented(method_name)<tt>
</tt> <span class="r">super</span><tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">should_not_receive</span>(method_name)<tt>
</tt> ensure_implemented(method_name)<tt>
</tt> <span class="r">super</span><tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">stub!</span>(method_name)<tt>
</tt> ensure_implemented(method_name)<tt>
</tt> <span class="r">super</span><tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">ensure_implemented</span>(*method_names)<tt>
</tt> <span class="r">if</span> recursive_const_defined?(<span class="co">Object</span>, <span class="iv">@__stubbed_class__</span>)<tt>
</tt> recursive_const_get(<span class="co">Object</span>, <span class="iv">@__stubbed_class__</span>).<tt>
</tt> should implement(method_names, <span class="iv">@__checked_methods__</span>)<tt>
</tt> <span class="r">end</span><tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">recursive_const_get</span> object, name<tt>
</tt> name.split(<span class="s"><span class="dl">'</span><span class="k">::</span><span class="dl">'</span></span>).inject(<span class="co">Object</span>) {|klass,name| klass.const_get name }<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">recursive_const_defined?</span> object, name<tt>
</tt> !!name.split(<span class="s"><span class="dl">'</span><span class="k">::</span><span class="dl">'</span></span>).inject(<span class="co">Object</span>) {|klass,name|<tt>
</tt> <span class="r">if</span> klass && klass.const_defined?(name)<tt>
</tt> klass.const_get name<tt>
</tt> <span class="r">end</span><tt>
</tt> }<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">class</span> <span class="cl">InterfaceDouble</span> < <span class="co">RSpec</span>::<span class="co">Mocks</span>::<span class="co">Mock</span><tt>
</tt><tt>
</tt> include <span class="co">InterfaceDoubleMethods</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">initialize</span>(stubbed_class, *args)<tt>
</tt> args << {} <span class="r">unless</span> <span class="co">Hash</span> === args.last<tt>
</tt><tt>
</tt> <span class="iv">@__stubbed_class__</span> = stubbed_class<tt>
</tt> <span class="iv">@__checked_methods__</span> = <span class="sy">:public_instance_methods</span><tt>
</tt> ensure_implemented *args.last.keys<tt>
</tt><tt>
</tt> <span class="c"># __declared_as copied from rspec/mocks definition of `double`</span><tt>
</tt> args.last[<span class="sy">:__declared_as</span>] = <span class="s"><span class="dl">'</span><span class="k">InterfaceDouble</span><span class="dl">'</span></span><tt>
</tt> <span class="r">super</span>(stubbed_class, *args)<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">end</span><tt>
</tt><span class="r">end</span><tt>
</tt><tt>
</tt><span class="co">RSpec</span>::<span class="co">Matchers</span>.define <span class="sy">:implement</span> <span class="r">do</span> |expected_methods, checked_methods|<tt>
</tt> match <span class="r">do</span> |stubbed_class|<tt>
</tt> unimplemented_methods(<tt>
</tt> stubbed_class,<tt>
</tt> expected_methods,<tt>
</tt> checked_methods<tt>
</tt> ).empty?<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">unimplemented_methods</span>(stubbed_class, expected_methods, checked_methods)<tt>
</tt> implemented_methods = stubbed_class.send(checked_methods)<tt>
</tt> unimplemented_methods = expected_methods - implemented_methods<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> failure_message_for_should <span class="r">do</span> |stubbed_class|<tt>
</tt> <span class="s"><span class="dl">"</span><span class="k">%s does not publicly implement:</span><span class="ch">\n</span><span class="k">%s</span><span class="dl">"</span></span> % [<tt>
</tt> stubbed_class,<tt>
</tt> unimplemented_methods(<tt>
</tt> stubbed_class,<tt>
</tt> expected_methods,<tt>
</tt> checked_methods<tt>
</tt> ).sort.map {|x|<tt>
</tt> <span class="s"><span class="dl">"</span><span class="k"> </span><span class="il"><span class="idl">#{</span>x<span class="idl">}</span></span><span class="dl">"</span></span><tt>
</tt> }.join(<span class="s"><span class="dl">"</span><span class="ch">\n</span><span class="dl">"</span></span>)<tt>
</tt> ]<tt>
</tt> <span class="r">end</span><tt>
</tt><span class="r">end</span><tt>
</tt><tt>
</tt><span class="co">RSpec</span>.configure <span class="r">do</span> |config|<tt>
</tt><tt>
</tt> config.include <span class="co">InterfaceMocking</span><tt>
</tt><tt>
</tt><span class="r">end</span><tt>
</tt></pre></td>
</tr></table>
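<p>Stripped of the RSpec plumbing, the check above boils down to two steps: resolve a constant name only if it is already loaded, then subtract the class’s public instance methods from the names being stubbed. Here is a minimal plain-Ruby sketch of that idea. <code>InterfaceCheck</code>, <code>lookup</code>, <code>unimplemented</code> and <code>Notifier</code> are names made up for this illustration; they are not part of <code>InterfaceMocking</code> or rspec-fire.</p>

```ruby
# Hypothetical helper for illustration only; not part of InterfaceMocking.
module InterfaceCheck
  # Resolve "A::B" to a constant, or return nil if any part is not loaded.
  def self.lookup(name)
    name.split('::').inject(Object) do |scope, part|
      return nil unless scope.const_defined?(part, false)
      scope.const_get(part, false)
    end
  end

  # Names in `wanted` that the class does not publicly implement.
  # An unloaded class yields [], mirroring the "only check if loaded" rule.
  def self.unimplemented(class_name, wanted)
    klass = lookup(class_name)
    return [] if klass.nil?
    wanted.map(&:to_sym) - klass.public_instance_methods
  end
end

# Example class standing in for real application code.
class Notifier
  def deliver(message); end
end

p InterfaceCheck.unimplemented('Notifier', [:deliver, :retract]) # => [:retract]
p InterfaceCheck.unimplemented('NotLoaded::Thing', [:anything])  # => []
```

<p>Returning an empty list for an unloaded class is what lets the same spec run both in isolation and alongside the real implementation.</p>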
tag:www.rhnh.net,2008:Post/8492011-07-29T05:55:00Z2011-07-29T05:55:02ZStatic Asset Caching on Heroku Cedar Stack<p>I recently moved this blog over to <a href="http://heroku.com">Heroku</a>, and in the process added some proper <span class="caps">HTTP</span> caching headers. The dynamic pages use the built-in <code>fresh_when</code> and <code>stale?</code> Rails helpers, combined with <code>Rack::Cache</code> and the free memcached plugin available on Heroku. That was all pretty straightforward. What was more difficult was configuring Heroku to serve all static assets (such as images and stylesheets) with a far-future <code>max-age</code> header so that they will be cached for eternity. What I’ve documented here is somewhat of a hack, and hopefully Heroku will provide a better way of doing this in the future.</p>
<p>By default Heroku serves everything in <code>public</code> directly via nginx. This is a problem for us since we don’t get a chance to configure the caching headers. Instead, use the <code>Rack::StaticCache</code> middleware (provided in the <code>rack-contrib</code> gem) to serve static files, which by default adds far-future <code>max-age</code> cache control headers. The files need to be served from a different directory to <code>public</code> since there is no way to disable the nginx serving. I renamed my <code>public</code> folder to <code>public_cached</code>.</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="c"># config/application.rb</span><tt>
</tt>config.middleware.use <span class="co">Rack</span>::<span class="co">StaticCache</span>, <tt>
</tt> <span class="ke">urls</span>: <span class="s"><span class="dl">%w(</span><span class="k"><tt>
</tt> /stylesheets<tt>
</tt> /images<tt>
</tt> /javascripts<tt>
</tt> /robots.txt<tt>
</tt> /favicon.ico<tt>
</tt> </span><span class="dl">)</span></span>,<tt>
</tt> <span class="ke">root</span>: <span class="s"><span class="dl">"</span><span class="k">public_cached</span><span class="dl">"</span></span><tt>
</tt></pre></td>
</tr></table>
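<p>To make “far-future <code>max-age</code>” concrete, here is a rough sketch of the header side of the idea as a tiny Rack-style middleware. It is not how <code>Rack::StaticCache</code> is implemented (that middleware also serves the files themselves); <code>FarFutureHeaders</code>, the one-year lifetime and the prefixes are assumptions made for this example. It relies only on the Rack convention that an app responds to <code>call(env)</code> and returns a status, a headers hash and a body.</p>

```ruby
# Hypothetical middleware for illustration; not Rack::StaticCache itself.
class FarFutureHeaders
  ONE_YEAR = 365 * 24 * 60 * 60 # seconds; an assumed lifetime

  def initialize(app, prefixes)
    @app = app
    @prefixes = prefixes
  end

  # Pass the request through, then add Cache-Control for matching paths.
  def call(env)
    status, headers, body = @app.call(env)
    if @prefixes.any? { |prefix| env['PATH_INFO'].start_with?(prefix) }
      headers = headers.merge('Cache-Control' => "max-age=#{ONE_YEAR}, public")
    end
    [status, headers, body]
  end
end

# A stand-in application: any object with call(env) works.
inner = lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
app = FarFutureHeaders.new(inner, %w(/stylesheets /images))

_, asset_headers, _ = app.call('PATH_INFO' => '/images/logo.png')
_, page_headers, _  = app.call('PATH_INFO' => '/posts/1')
puts asset_headers['Cache-Control'] # => max-age=31536000, public
puts page_headers.key?('Cache-Control') # => false
```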
<p>I also disabled the built in Rails serving of static assets in development mode, so that it didn’t interfere:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="c"># config/environments/development.rb</span><tt>
</tt>config.serve_static_assets = <span class="pc">false</span><tt>
</tt></pre></td>
</tr></table>
<p>In the production config, I set the <code>x_sendfile_header</code> option to “X-Accel-Redirect”. It had been set to “X-Sendfile”, which is an Apache directive, and that was causing nginx to hang (Heroku would never actually serve the assets to the browser).</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="c"># config/environments/production.rb</span><tt>
</tt>config.action_dispatch.x_sendfile_header = <span class="s"><span class="dl">'</span><span class="k">X-Accel-Redirect</span><span class="dl">'</span></span><tt>
</tt></pre></td>
</tr></table>
<p>A downside of this approach is that if you have a lot of static assets, they all have to hit the Rails stack in order to be served. If you only have one dyno (the free plan) then the initial load can be slower than it otherwise would be if nginx was serving them directly. As I mentioned in the introduction, hopefully Heroku will provide a nicer way to do this in the future.</p>tag:www.rhnh.net,2008:Post/8472011-05-28T01:08:00Z2011-05-28T01:08:12ZSpeeding up Rails startup time<p>In which I provide easy instructions to try a new patch that drastically improves the start up time of Ruby applications, in the hope that with wide support it will be merged into the upcoming 1.9.3 release. Skip to the bottom for instructions, or keep reading for the narrative.</p>
<p><strong><span class="caps">UPDATE</span>:</strong> If you have trouble installing, grab a recent copy of rvm: <code>rvm get head</code>.</p>
<h2>Background</h2>
<p>Recent releases of <span class="caps">MRI</span> Ruby have introduced some fairly major performance regressions when requiring files:</p>
<p><img src="https://img.skitch.com/20110528-xigici83u5texbpnwnwntfrkuq.jpg" alt="" /></p>
<p>For reference, our medium-sized Rails application requires around 2200 files, which is off the right-hand side of this graph. This is problematic. On 1.9.2 it takes 20s to start up, on 1.9.3 it takes 46s. Both are far too long.</p>
<p>There are a few reasons for this, but the core of the problem is the basic algorithm which looks something like this:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="r">def</span> <span class="fu">require</span>(file)<tt>
</tt> <span class="gv">$loaded</span>.each <span class="r">do</span> |x|<tt>
</tt> <span class="r">return</span> <span class="pc">false</span> <span class="r">if</span> x == file<tt>
</tt> <span class="r">end</span><tt>
</tt> load(file)<tt>
</tt> <span class="gv">$loaded</span> << file<tt>
</tt><span class="r">end</span><tt>
</tt></pre></td>
</tr></table>
<p>That loop is no good, and gets worse the more files you have required. I have written a patch for 1.9.3 which changes this algorithm to:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="r">def</span> <span class="fu">require</span>(file)<tt>
</tt> <span class="r">return</span> <span class="pc">false</span> <span class="r">if</span> <span class="gv">$loaded</span>[file] <tt>
</tt> load(file)<tt>
</tt> <span class="gv">$loaded</span>[file] = <span class="pc">true</span><tt>
</tt><span class="r">end</span><tt>
</tt></pre></td>
</tr></table>
<p>That gives you a performance curve that looks like this:</p>
<p><img src="https://img.skitch.com/20110528-gtsgba1twaiwkd3frewen54ts.jpg" alt="" /></p>
<p>Much nicer.</p>
<p>That’s just a synthetic benchmark, but it works in the real world too. My <a href="http://theconversation.edu.au">main Rails application</a> now loads in a mite over 10s, down from the 20s it was taking on 1.9.2. A blank Rails app loads in 1.1s, which is even faster than 1.8.7.</p>
<p><img src="https://img.skitch.com/20110528-cu9nux6619fxruh5rq6ppywp7p.jpg" alt="" /></p>
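<p>If you want to see the difference between the two lookups for yourself, here is a toy model of them. It is only a sketch: <code>ArrayLoader</code>, <code>HashLoader</code> and <code>require_file</code> are invented for this example, nothing is actually loaded from disk, and the real work in <span class="caps">MRI</span> happens in C. The array version scans everything loaded so far on each call, so the total work grows roughly with the square of the number of files, while the hash version does a single lookup per call.</p>

```ruby
require 'benchmark'

# Toy model of the original algorithm: scan a list of loaded files.
class ArrayLoader
  def initialize
    @loaded = []
  end

  def require_file(file)
    @loaded.each { |x| return false if x == file }
    @loaded << file
    true
  end
end

# Toy model of the patched algorithm: one hash lookup per call.
class HashLoader
  def initialize
    @loaded = {}
  end

  def require_file(file)
    return false if @loaded[file]
    @loaded[file] = true
  end
end

# File names are made up; 2000 is close to the app size mentioned above.
files = (1..2000).map { |i| "lib/file_#{i}.rb" }

[ArrayLoader, HashLoader].each do |klass|
  loader = klass.new
  seconds = Benchmark.realtime { files.each { |f| loader.require_file(f) } }
  puts "#{klass}: #{(seconds * 1000).round(2)}ms for #{files.size} files"
end
```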
<h2>Getting the fix</h2>
<p>Here is how you can try out my patch right now in just ten minutes using <span class="caps">RVM</span>.</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"># First get a baseline measurement<tt>
</tt>cd /your/rails/app<tt>
</tt>time script/rails runner "puts 1"<tt>
</tt><tt>
</tt># Install a patched ruby<tt>
</tt>curl https://gist.github.com/raw/996418/e2b346fbadeed458506fc69ca213ad96d1d08c3e/require-performance-fix-r31758.patch > /tmp/require-performance-fix.patch<tt>
</tt>rvm install ruby-head --patch /tmp/require-performance-fix.patch -n patched<tt>
</tt># ... get a cup of tea, this took about 8 minutes on my MBP<tt>
</tt><tt>
</tt># Get a new measurement<tt>
</tt>cd /your/rails/app<tt>
</tt>rvm use ruby-head-patched<tt>
</tt>gem install bundler --no-rdoc --no-ri<tt>
</tt>bundle<tt>
</tt>time script/rails runner "puts 1"<tt>
</tt></pre></td>
</tr></table>
<h2>How you can help</h2>
<p>I need a lot more eyeballs on this patch before it can be considered for merging into trunk. I would really appreciate any of the following:</p>
<ul>
<li>Try it out on your app and report timings in the comments.</li>
<li><a href="https://github.com/ruby/ruby/pull/25">Code review the patch on this GitHub pull request</a> (it’s C code, but don’t let that scare you off).</li>
<li>Try it on Windows.</li>
<li>Report any bugs you find.</li>
</ul>
<h2>Next steps</h2>
<p>I imagine there will be a bit more work to get this into Ruby 1.9.3, but after that this is just the first step of many to try and speed up the time Rails takes to start up. Bundler and RubyGems still spend a lot of time doing … something, which I want to investigate. I also want to port these changes over to JRuby which has similar issues (Rubinius isn’t quite as fast out of the gate, but its load time does not blow up as the number of required files grows, so it would not benefit from this patch).</p>
<p>Thank you for your time.</p>tag:www.rhnh.net,2008:Post/8442011-02-25T04:30:52Z2011-02-25T04:30:52ZPostgreSQL 9 and ruby full text search tricks<p>I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:</p>
<ul>
<li>Setup full text search</li>
<li>Optimize search with triggers and indexes</li>
<li>Use Postgres with Ruby on Rails 3</li>
<li>Optimize indexes by including only the rows that you need</li>
<li>Use database standards for more reliable queries</li>
<li>Write powerful reports in only a few lines of code</li>
<li>Convert an existing MySQL application to use Postgres</li>
</ul>
<p>It’s a steal at only $12. You can <a href="http://peepcode.com/products/postgresql">buy it over at PeepCode</a>.</p>
<p>In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="r">CREATE</span> <span class="r">TRIGGER</span> posts_search_vector_refresh <tt>
</tt> <span class="r">BEFORE</span> <span class="r">INSERT</span> <span class="r">OR</span> <span class="r">UPDATE</span> <span class="r">ON</span> posts <tt>
</tt>FOR EACH ROW EXECUTE PROCEDURE<tt>
</tt> tsvector_update_trigger(search_vector, <span class="s"><span class="dl">'</span><span class="k">pg_catalog.english</span><span class="dl">'</span></span>, body, title);<tt>
</tt></pre></td>
</tr></table>
<p>That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves and came up with this:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="r">CREATE</span> <span class="r">OR</span> <span class="r">REPLACE</span> FUNCTION search_trigger() RETURNS <span class="r">trigger</span> <span class="r">AS</span> <span class="er">$</span><span class="er">$</span><tt>
</tt>DECLARE<tt>
</tt> search <span class="pt">TEXT</span>;<tt>
</tt> child_search <span class="pt">TEXT</span>;<tt>
</tt><span class="r">begin</span><tt>
</tt> <span class="r">SELECT</span> string_agg(author_name, <span class="s"><span class="dl">'</span><span class="k"> </span><span class="dl">'</span></span>) <span class="r">INTO</span> child_search<tt>
</tt> <span class="r">FROM</span> comments<tt>
</tt> <span class="r">WHERE</span> post_id = new.id;<tt>
</tt><tt>
</tt> search <span class="er">:</span>= <span class="s"><span class="dl">'</span><span class="dl">'</span></span>;<tt>
</tt> search <span class="er">:</span>= search || <span class="s"><span class="dl">'</span><span class="k"> </span><span class="dl">'</span></span> || coalesce(new.title, <span class="s"><span class="dl">'</span><span class="dl">'</span></span>);<tt>
</tt> search <span class="er">:</span>= search || <span class="s"><span class="dl">'</span><span class="k"> </span><span class="dl">'</span></span> || coalesce(new.body, <span class="s"><span class="dl">'</span><span class="dl">'</span></span>);<tt>
</tt> search <span class="er">:</span>= search || <span class="s"><span class="dl">'</span><span class="k"> </span><span class="dl">'</span></span> || coalesce(child_search, <span class="s"><span class="dl">'</span><span class="dl">'</span></span>);<tt>
</tt><tt>
</tt> new.search_index <span class="er">:</span>= to_tsvector(search); <tt>
</tt> return new;<tt>
</tt><span class="r">end</span><tt>
</tt><span class="er">$</span><span class="er">$</span> LANGUAGE plpgsql;<tt>
</tt><tt>
</tt><span class="r">CREATE</span> <span class="r">TRIGGER</span> posts_search_vector_refresh <tt>
</tt> <span class="r">BEFORE</span> <span class="r">INSERT</span> <span class="r">OR</span> <span class="r">UPDATE</span> <span class="r">ON</span> posts<tt>
</tt>FOR EACH ROW EXECUTE PROCEDURE<tt>
</tt> search_trigger();<tt>
</tt></pre></td>
</tr></table>
<p>Getting a bit ugly, eh? It might be nice to move that logic back into ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data type. In this case, a quick workaround is to store a <code>search_document</code> in a text field on the model, then use a trigger that indexes only that field into our <code>search_vector</code> field. The <code>search_document</code> field can then easily be set from your <span class="caps">ORM</span>.</p>
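<p>Building the <code>search_document</code> string in ruby might look something like the sketch below. Plain <code>Struct</code>s stand in for real <span class="caps">ORM</span> models here, and the <code>search_document</code> method and its fields are assumptions for the sake of the example rather than code from the screencast. The point is that nil handling and joining child records are easy on this side of the fence.</p>

```ruby
# Plain Structs standing in for ORM models (hypothetical, for illustration).
Comment = Struct.new(:author_name)

Post = Struct.new(:title, :body, :comments) do
  # Concatenate everything that should be searchable, skipping nils.
  def search_document
    parts = [title, body] + comments.map(&:author_name)
    parts.compact.join(' ')
  end
end

post = Post.new('Hello', nil, [Comment.new('Ann'), Comment.new('Bob')])
puts post.search_document # => Hello Ann Bob
```

<p>In a real model you would assign this string to the text column before saving and leave the conversion to a tsvector to the trigger.</p>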
<p>Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="r">class</span> <span class="cl">Post</span><tt>
</tt> include <span class="co">DataMapper</span>::<span class="co">Resource</span><tt>
</tt> include <span class="co">Searchable</span><tt>
</tt><tt>
</tt> property <span class="sy">:id</span>, <span class="co">Serial</span><tt>
</tt> property <span class="sy">:title</span>, <span class="co">String</span><tt>
</tt> property <span class="sy">:body</span>, <span class="co">Text</span><tt>
</tt><tt>
</tt> searchable <span class="sy">:title</span>, <span class="sy">:body</span> <span class="c"># Provides Post.search('keyword')</span><tt>
</tt><span class="r">end</span><tt>
</tt></pre></td>
</tr></table>
<p>You can find the <a href="https://github.com/xaviershay/sandbox/blob/master/misc/searchable.rb">Searchable module code over on github</a>. In it you can also find a fugly proof-of-concept for a <span class="caps">DSL</span> that generates the above <span class="caps">SQL</span> for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.</p>tag:www.rhnh.net,2008:Post/8382010-10-10T00:00:00Z2010-10-10T04:25:57ZRails 3, Ruby 1.9.2, Windows 2008, and SQL Server 2008 Tutorial<p>This took me a while to figure out, especially since I’m not so great with either windows or <span class="caps">SQL</span> server, but in the end the process isn’t so difficult.</p>
<p><iframe src="http://player.vimeo.com/video/15701033?title=0&byline=0&portrait=0&color=FFFACD" width="600" height="510" frameborder="0"></iframe></p>
<p><a href="http://vimeo.com/15701033">Rails 3, Ruby 1.9.2, Windows 2008, and <span class="caps">SQL</span> Server 2008 Screencast</a></p>
<p>The steps covered in this screencast are:</p>
<ol>
<li>Create user</li>
<li>Create database</li>
<li>Give user permissions</li>
<li>Create <span class="caps">DSN</span></li>
<li>Install <a href="http://rubyinstaller.org/">ruby</a></li>
<li>Install <a href="http://github.com/oneclick/rubyinstaller/wiki/Development-Kit">devkit</a> (needed to compile native extensions for <span class="caps">ODBC</span>)</li>
<li>Create a new rails app</li>
<li>Add <code>activerecord-sqlserver-adapter</code> and <code>ruby-odbc</code> to Gemfile</li>
<li>Customize <code>config/database.yml</code></li>
</ol><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="c"># config/database.yml</span><tt>
</tt><span class="ke">development</span>:<tt>
</tt> <span class="ke">adapter</span>: <span class="s">sqlserver</span><tt>
</tt> <span class="ke">dsn</span>: <span class="s">testdsn_user</span><tt>
</tt> <span class="ke">mode</span>: <span class="s">odbc</span><tt>
</tt> <span class="ke">database</span>: <span class="s">test</span><tt>
</tt> <span class="ke">username</span>: <span class="s">xavier</span><tt>
</tt> <span class="ke">password</span>:<tt>
</tt></pre></td>
</tr></table>
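<p>A typo in <code>config/database.yml</code> can be hard to spot from the error messages, so a quick sanity check before starting Rails may save some time. The sketch below parses the same settings with Ruby’s standard <span class="caps">YAML</span> library and reports any key without a value. The <code>REQUIRED_KEYS</code> list and the <code>missing_keys</code> helper are assumptions for this example: they simply mirror the keys used above (minus the blank password) and are not something the adapter itself defines.</p>

```ruby
require 'yaml'

# The same settings as above, inlined so the example is self-contained.
config = YAML.load(<<-YML)
development:
  adapter: sqlserver
  dsn: testdsn_user
  mode: odbc
  database: test
  username: xavier
  password:
YML

# Assumed list of keys, taken from the example config rather than the adapter.
REQUIRED_KEYS = %w(adapter dsn mode database username)

# Returns the required keys that are absent or blank in the given settings.
def missing_keys(settings)
  REQUIRED_KEYS.reject { |key| settings[key] }
end

p missing_keys(config['development']) # => []
```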
<p>Some errors you may encounter:</p>
<p><strong>The specified module could not be found – odbc.so</strong> You have likely copied odbc.so from i386-msvcrt-ruby-odbc.zip. This is for 1.8.7, and does not work for 1.9. Remove the .so file, and install ruby-odbc as above.</p>
<p><strong>The specified <span class="caps">DSN</span> contains an architecture mismatch between the Driver and the Application.</strong> Perhaps you have created a system <span class="caps">DSN</span>. Try creating a user <span class="caps">DSN</span> instead. I also found some suggestions that you need to <a href="http://social.answers.microsoft.com/Forums/en-US/addbuz/thread/a2c16c27-194f-452f-8447-5a70a4178a42">use a different version of the <span class="caps">ODBC</span> configuration panel</a>, but this wasn’t relevant for me.</p>tag:www.rhnh.net,2008:Post/8352010-09-23T09:28:41Z2010-09-23T09:28:41ZWhy I Rewrote Chronic<p>It seems like a pretty epic yak shave. If you want to parse natural language dates in ruby, you use Chronic. That’s just how it is. (There’s also Tickle for recurring dates, which is similar, but based on Chronic anyways.) It’s the standard, everyone uses it, so why oh why did I write my own version from scratch?</p>
<p>Three reasons I can see.</p>
<p><strong>Chronic is unmaintained.</strong> Check the network graph for Chronic. A more avid historian could turn this into an epic teledrama, but for now here’s the summary: The main repository hasn’t had a commit since late 2008. Evaryont made a valiant attempt to take the reins, but his stamina only lasted an extra year to August 2009. Since then numerous people have forked his efforts, mostly to add 1.9 support. These efforts are fragmented though. The inertia of such a large project with no clear leadership sees every man running for himself.</p>
<p>Further, the new maintainers aren’t providing a rock solid base. From Evaryont’s <span class="caps">README</span>:<br />
<em>I decided on my own volition that the 40-some (as reported by Github) network should be merged together. I got it to run, but <strong>quite haphazardly</strong>. There are a lot of new features (<strong>mostly undocumented</strong> except the git logs) so be a little flexible in your language passed to Chronic.</em> [emphasis mine]</p>
<p>This does not fill me with confidence.</p>
<p><strong>Chronic has a large barrier to entry.</strong> Natural date parsing is a big challenge. In the original <span class="caps">README</span>, there are ~50 examples of formats it supports, and that is excluding all of the features added in forks in the last two years. The result is a large code base which is intimidating for a newcomer, especially with no high level guidance as to how everything fits together. On a project of this size, “the documentation is in the specs” is insufficient. I know what it <strong>does</strong>, I need to know <strong>how</strong> it does it.</p>
<p><strong>Chronic solves the wrong problem.</strong> I want an alternative to date pickers. As such, I don’t need time support, and I only need very simple day parsing. Chronic seems geared towards a calendar type application (“tomorrow at 6:45pm”), but also parses many expressions which simply are not useful in a real application, either because they are obtuse (“7 hours before tomorrow at noon”) or because they are just not how users think about dates (“3 months ago saturday at 5:00 pm”). (Note the last assertion is a totally unsubstantiated claim with no user research to support it.)</p>
<p>Further, it is not hard to find simple examples that Chronic doesn’t support. Omitting a year is an easy one: 14 Sep, April 9.</p>
<h2>So what to do?</h2>
<p>Chronic needs a leader. Chronic needs a hero. One man to reunite the forks, document the code, and deliver it to the promised land.</p>
<p>I am not that man.</p>
<p>I sketched out the formats I actually needed to support for my application, looked at it and thought “really it can’t be that hard”. Natural date parsing is hard; parsing only the dates your application requires is easy. One hour later I had a gem that not only had 100% support for all of the Chronic features I had been using, but also covered some extra formats I wanted (“14 Sep”), and could also convert a date <em>back</em> into a human readable description. That’s less time than I had already sunk into trying to get Chronic working.</p>
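<p>To give a feel for how small the narrow problem is, here is a minimal sketch of a parser for a handful of day-only formats, plus the reverse conversion. This is not the gem’s actual code: <code>TinyDates</code>, <code>parse</code> and <code>describe</code> are made-up names, and resolving a missing year to the current year is an assumption made for the example.</p>

```ruby
require 'date'

# Hypothetical mini-parser for illustration; not the gem's implementation.
module TinyDates
  MONTHS = Date::ABBR_MONTHNAMES.compact.map(&:downcase) # ["jan", "feb", ...]

  # Returns a Date, or nil when the string is not understood.
  def self.parse(string, today = Date.today)
    text = string.strip.downcase
    case text
    when 'today'     then today
    when 'yesterday' then today - 1
    when 'tomorrow'  then today + 1
    when /\A(\d{1,2}) ([a-z]{3})[a-z]*\z/ then day_month($1.to_i, $2, today)
    when /\A([a-z]{3})[a-z]* (\d{1,2})\z/ then day_month($2.to_i, $1, today)
    end
  end

  # Builds a date in the current year (an assumption), or nil if invalid.
  def self.day_month(day, month_abbr, today)
    month = MONTHS.index(month_abbr)
    return nil unless month
    return nil unless Date.valid_date?(today.year, month + 1, day)
    Date.new(today.year, month + 1, day)
  end

  # Converts a date back into a human readable description.
  def self.describe(date, today = Date.today)
    case (date - today).to_i
    when 0  then 'Today'
    when -1 then 'Yesterday'
    when 1  then 'Tomorrow'
    else date.strftime('%-d %B %Y')
    end
  end
end

today = Date.new(2010, 9, 23)
p TinyDates.parse('14 Sep', today)                # => 2010-09-14 as a Date
puts TinyDates.describe(Date.new(2010, 9, 24), today) # => Tomorrow
```

<p>Only the first three letters of the month are checked, so this is deliberately sloppy about what it accepts. That is the trade-off of solving only the problem in front of you.</p>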
<p><a href="http://github.com/xaviershay/kronic">Introducing Kronic.</a></p>
<p>Less than 100 lines of code, totally specced, totally solved my problem. Ultimately, I don’t want to deal with this problem, so I wanted the easiest solution. While patching Chronic would intuitively appear to be pragmatic, a quick spike in the other direction turned out to be worthwhile. Sometimes 80% just isn’t that hard.</p>tag:www.rhnh.net,2008:Post/8342010-09-18T13:59:53Z2010-09-18T13:59:53ZBuild time graph with buildhawk<p><img src="http://farm5.static.flickr.com/4147/5000723561_6c5a18eef0_z.jpg" alt="" /></p>
<p>How long your build took to run, in a graph, on a webpage. That’s pretty fantastic. You need to be <a href="http://rhnh.net/2010/09/06/storing-build-time-in-git-notes-with-zsh">storing your build time in git notes</a>, as I wrote about a few weeks back. Then simply:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }">gem install buildhawk<tt>
</tt>buildhawk > report.html<tt>
</tt>open report.html<tt>
</tt></pre></td>
</tr></table>
<p>This is a simple gem I hacked together today that parses <code>git log</code> and stuffs the output into an <span class="caps">ERB</span> template that uses <a href="http://xaviershay.github.com/tufte-graph">TufteGraph</a> and some subtle jQuery animation to make it look nice. For extra prettiness, I use the <a href="http://www.dafont.com/monofur.font">Monofur</a> font, but maybe you are content with your default <code>monospace</code>. If you want to poke around the internals (there’s not much!) have a look on <a href="http://github.com/xaviershay/buildhawk">Github</a>.</p>tag:www.rhnh.net,2008:Post/8332010-09-09T01:55:00Z2010-09-09T10:55:04ZSix best talks from LSRC 2010<p><strong>I wrote this last fortnight, but was waiting for videos. Still missing a few, but it’s a start. Enjoy!</strong></p>
<p>I am just finishing up a week in Austin, Texas. I was here for Lone Star Ruby Conference, at which I ran both my <a href="http://www.dbisyourfriend.com">Database Is Your Friend Training</a>, and also a full day introduction to MongoDB course. I was then free to enjoy the talks for the remaining two days. Here are my top picks.</p>
<h3>Debugging Ruby</h3>
<p>Aman Gupta gave a fantastic overview of the low level tools available for debugging ruby applications, including perf-tools, strace, gdb, bleak-house, and some nice ruby wrappers he has written around them. I had heard of these tools before, but was never sure when to use them or where to start if I wanted to use them. Aman’s presentation was the hook I needed to get into these tools, giving plenty of real examples of where they had been useful and how he used them.</p>
<p><a href="http://www.slideshare.net/mongosf/debugging-ruby-aman-gupta">Slides</a></p>
<h3>Seven Languages in Seven Weeks</h3>
<p>Bruce Tate gave an entertaining talk in which he compared seven languages to movie characters. It was a great narrative, and his energy and excitement about the languages was infectious. He has written a book on the same topic, which I plan on purchasing when I make some time to work through it. There are some sample chapters available at the pragprog site.</p>
<p><a href="http://pragprog.com/titles/btlang/seven-languages-in-seven-weeks">Book</a></p>
<h3>Greasing Your Suite</h3>
<p>I had seen the content of Nick’s talk “Greasing Your Suite” before in slide format, and it was just as excellent live. Nick takes the run time of a rails test suite from 13 minutes down to <em>eighteen seconds</em>. An incredible effort. While watching his talk I installed and set up his <a href="http://github.com/ngauthier/hydra">hydra gem</a>, and it was dead simple to get my tests running in parallel. I only added a rake task and a tiny yml file (no other setup required) and I got a significant speed up even on trivial test suites. I was impressed at how easy it was to get going, and I’ll be using it on all my apps from now on.</p>
<p><a href="http://vimeo.com/12705404">Video</a> (From Goruco, but he gave the same talk)</p>
<h3>Deciphering Yehuda</h3>
<p>Gregg Pollack’s talk on how some of the techniques used in the internals of rails and bundler work was excellent. While the content wasn’t new to me, I was impressed at Gregg’s ability to explain code on slides, a task difficult to do well. If you ever plan to present you should watch this to pick up some of Gregg’s techniques. I am going to be checking out his <a href="http://rubyonrails.org/screencasts/rails3">Introduction to Rails 3 screencasts</a> for the same reason.</p>
<p><a href="http://confreaks.net/videos/285-lsrc2010-decyphering-yehuda">Video</a></p>
<h3>Real Software Engineering</h3>
<p>Glenn Vanderburg opened the conference with a fantastic talk on the history of software engineering. This answered a lot of questions that have been floating around my mind, especially to do with the misleading comparisons often made to other engineering disciplines. Give a civil engineer the ability to quickly prototype bridges for little cost, and they are going to do a lot less modelling. A mathematical model is simply a way to reduce costs. And cost is always an object. Watch the talk, it’s brilliant.</p>
<p><a href="http://confreaks.net/videos/282-lsrc2010-real-software-engineering">Video</a></p>
<h3>Keynote</h3>
<p>The best overall talk was Tom Preston-Werner’s keynote Friday evening. His mix of story, humour, and inspiration was perfect for a keynote, and his delivery was excellent. He pitched his content expertly and though there was no specific item I hadn’t heard before, it has had a significant impact on my thoughts the past few days. Hopefully a video is up soon.</p>tag:www.rhnh.net,2008:Post/8322010-09-07T03:30:00Z2010-09-06T16:22:43ZSpeeding Up Rails Rake<p>On a brand new rails project (this article is rails 3, but the same principle applies to rails 2), <code>rake --tasks</code> takes about a second to run. This is just the time it takes to load all the tasks, so any task you define will take at least this amount of time to run, even if it has nothing to do with rails. Tab completion is slow. That makes me sad.</p>
<p>The issue is that since rails and gems can provide rake tasks for your project, the entire rails environment has to be loaded just to figure out which tasks are available. If you are familiar with the tasks available, you can hack around things to wring some extra speed out of your rake.</p>
<p><strong><span class="caps">WARNING</span>: Hacks abound beyond this point. Proceed at own risk.</strong></p>
<p>Below is my edited Rakefile. Narrative continues in the comments below.</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="c"># Rakefile</span><tt>
</tt><span class="r">def</span> <span class="fu">load_rails_environment</span><tt>
</tt> require <span class="co">File</span>.expand_path(<span class="s"><span class="dl">'</span><span class="k">../config/application</span><span class="dl">'</span></span>, <span class="pc">__FILE__</span>)<tt>
</tt> require <span class="s"><span class="dl">'</span><span class="k">rake</span><span class="dl">'</span></span><tt>
</tt> <span class="co">Speedtest</span>::<span class="co">Application</span>.load_tasks<tt>
</tt><span class="r">end</span><tt>
</tt><tt>
</tt><span class="c"># By default, do not load the Rails environment. This allows for faster</span><tt>
</tt><span class="c"># loading of all the rake files, so that getting the task list, or kicking</span><tt>
</tt><span class="c"># off a spec run (which loads the environment by itself anyways) is much</span><tt>
</tt><span class="c"># quicker.</span><tt>
</tt><span class="r">if</span> <span class="co">ENV</span>[<span class="s"><span class="dl">'</span><span class="k">LOAD_RAILS</span><span class="dl">'</span></span>] == <span class="s"><span class="dl">'</span><span class="k">1</span><span class="dl">'</span></span><tt>
</tt> <span class="c"># Bypass these hacks that prevent the Rails environment loading, so that the</span><tt>
</tt> <span class="c"># original descriptions and tasks can be seen, or to see other rake tasks provided</span><tt>
</tt> <span class="c"># by gems.</span><tt>
</tt> load_rails_environment<tt>
</tt><span class="r">else</span><tt>
</tt> <span class="c"># Create a stub task for all Rails provided tasks that will load the Rails</span><tt>
</tt> <span class="c"># environment, which in will append the real definition of the task to</span><tt>
</tt> <span class="c"># the end of the stub task, so it will be run directly afterwards.</span><tt>
</tt> <span class="c">#</span><tt>
</tt> <span class="c"># Refresh this list with:</span><tt>
</tt> <span class="c"># LOAD_RAILS=1 rake -T | ruby -ne 'puts $_.split(/\s+/)[1]' | tail -n+2 | xargs</span><tt>
</tt> <span class="s"><span class="dl">%w(</span><span class="k"><tt>
</tt> about db:create db:drop db:fixtures:load db:migrate db:migrate:status <tt>
</tt> db:rollback db:schema:dump db:schema:load db:seed db:setup <tt>
</tt> db:structure:dump db:version doc:app log:clear middleware notes <tt>
</tt> notes:custom rails:template rails:update routes secret stats test <tt>
</tt> test:recent test:uncommitted time:zones:all tmp:clear tmp:create<tt>
</tt> </span><span class="dl">)</span></span>.each <span class="r">do</span> |task_name|<tt>
</tt> task task_name <span class="r">do</span><tt>
</tt> load_rails_environment<tt>
</tt> <span class="c"># Explicitly invoke the rails environment task so that all configuration</span><tt>
</tt> <span class="c"># gets loaded before the actual task (appended on to this one) runs.</span><tt>
</tt> <span class="co">Rake</span>::<span class="co">Task</span>[<span class="s"><span class="dl">'</span><span class="k">environment</span><span class="dl">'</span></span>].invoke<tt>
</tt> <span class="r">end</span><tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="c"># Create an empty task that will show up in rake -T, instructing how to</span><tt>
</tt> <span class="c"># get a list of all the actual tasks. This isn't necessary but is a courtesy</span><tt>
</tt> <span class="c"># to your future self.</span><tt>
</tt> desc <span class="s"><span class="dl">"</span><span class="k">!!! Default rails tasks are hidden, run with LOAD_RAILS=1 to reveal.</span><span class="dl">"</span></span><tt>
</tt> task <span class="sy">:rails</span><tt>
</tt><span class="r">end</span><tt>
</tt><tt>
</tt><span class="c"># Load all tasks defined in lib/tasks/*.rake</span><tt>
</tt><span class="co">Dir</span>[<span class="co">File</span>.expand_path(<span class="s"><span class="dl">"</span><span class="k">../lib/tasks/</span><span class="dl">"</span></span>, <span class="pc">__FILE__</span>) + <span class="s"><span class="dl">'</span><span class="k">/*.rake</span><span class="dl">'</span></span>].each <span class="r">do</span> |file|<tt>
</tt> load file<tt>
</tt><span class="r">end</span><tt>
</tt></pre></td>
</tr></table>
<p>Now <code>rake --tasks</code> executes near instantaneously, and tasks will generally kick off faster (including <code>rake spec</code>). Much nicer!</p>
<p>This technique has the added benefit of hiding all the built in tasks. Depending on your experience this may not be a win, but since I already know the rails ones by heart, I’m usually only interested in the tasks specific to the project.</p>
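<p>The stub tasks in the Rakefile above rely on a piece of Rake behaviour: defining a task a second time appends a new action to the existing task rather than replacing it, and an action added while the task is already running still gets executed. The sketch below shows that in isolation. The <code>report</code> task and the <code>load_real_tasks</code> lambda are stand-ins invented for the example; the lambda plays the role of <code>load_rails_environment</code>.</p>

```ruby
require 'rake'

log = []

# Stand-in for the slow environment load (hypothetical): loading it is what
# defines the "real" body of the task.
load_real_tasks = lambda do
  Rake::Task.define_task(:report) { log << 'real report task' }
end

# The stub: cheap to define up front, and it pulls in the real definition
# only when the task is actually invoked.
Rake::Task.define_task(:report) do
  log << 'stub: loading environment'
  load_real_tasks.call
end

Rake::Task[:report].invoke
puts log
```

<p>The stub action runs first, and the action appended during its execution runs directly afterwards, which is the ordering the Rakefile depends on.</p>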
<p>I don’t pretend this is a pretty or permanent solution, but I share it here because it has made my life better in recent times.</p>tag:www.rhnh.net,2008:Post/8302010-08-22T19:18:00Z2010-08-22T19:18:56ZDuplicate Data<p><strong><span class="caps">UPDATE</span>:</strong> If you are on PostgreSQL, check this <a href="http://rhnh.net/2011/04/30/deleting-duplicate-data-with-postgresql">updated query</a>, it’s more useful.</p>
<p>Forgotten to back <code>validates_uniqueness_of</code> with a unique constraint in your database? Oh no! Here is some <span class="caps">SQL</span> that will pull out all the duplicate records for you.</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="co">User</span>.find_by_sql <span class="s"><span class="dl"><<-EOS</span></span><span class="s"><span class="k"><tt>
</tt> SELECT * <tt>
</tt> FROM users <tt>
</tt> WHERE name IN (<tt>
</tt> SELECT name <tt>
</tt> FROM users <tt>
</tt> GROUP BY name <tt>
</tt> HAVING count(name) > 1);</span><span class="dl"><tt>
</tt>EOS</span></span><tt>
</tt></pre></td>
</tr></table>
<p>You will need your own strategy for resolving the duplicates, since it is totally dependent on your data. Some ideas:</p>
<ul>
<li>Arbitrarily deleting one of the records. Perhaps based on latest update time? Don’t forget about child records! If you have forgotten a uniqueness constraint it is likely you have also forgotten a foreign key, so you will have to delete child records manually.</li>
<li>Merging the records, including child records.</li>
<li>Manually resolving the conflicts on a case by case basis. Possible if there are not too many duplicates.</li>
</ul>tag:www.rhnh.net,2008:Post/8292010-08-19T18:30:38Z2010-08-19T18:30:38ZSTI is the global variable of data modelling<p>A Single Table Inheritance table is really easy to both update and query. This makes it ideal for rapid prototyping: just throw some extra columns on it and you are good to go! This is why <span class="caps">STI</span> is so popular, and it fits perfectly into the Rails philosophy of getting things up and running fast.</p>
<p>Fast coding techniques do not always transfer into solid, maintainable code however. It is really easy to hack something together with global variables, but we eschew them when writing industry code. <span class="caps">STI</span> falls into the same category. I have written about <a href="http://rhnh.net/2010/07/02/3-reasons-why-you-should-not-use-single-table-inheritance">the downsides of <span class="caps">STI</span></a> before: it clutters your data model, weakens your data integrity, and can be difficult to index. <span class="caps">STI</span> is a fast technique to get started with, but is not necessarily a great option for maintainable applications, especially when there are other modelling techniques such as class table inheritance available.</p>tag:www.rhnh.net,2008:Post/8282010-08-17T16:41:42Z2010-08-17T16:41:42ZUpdating Class Table Inheritance Tables<p>My last post covered <a href="http://rhnh.net/2010/08/15/class-table-inheritance-and-eager-loading">querying class table inheritance</a> tables; this one presents a method for updating them. Having set up our ActiveRecord models using composition, we can use a standard rails method <code>accepts_nested_attributes_for</code> to allow easy one-form updating of the relationship.</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="r">class</span> <span class="cl">Item</span> < <span class="co">ActiveRecord</span>::<span class="co">Base</span><tt>
</tt> validates_numericality_of <span class="sy">:quantity</span><tt>
</tt><tt>
</tt> <span class="co">SUBCLASSES</span> = [<span class="sy">:dvd</span>, <span class="sy">:car</span>]<tt>
</tt> <span class="co">SUBCLASSES</span>.each <span class="r">do</span> |class_name|<tt>
</tt> has_one class_name<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> accepts_nested_attributes_for *<span class="co">SUBCLASSES</span><tt>
</tt><span class="r">end</span><tt>
</tt><tt>
</tt><span class="iv">@item</span> = <span class="co">Dvd</span>.create!(<tt>
</tt> <span class="sy">:title</span> => <span class="s"><span class="dl">'</span><span class="k">The Matix</span><span class="dl">'</span></span>,<tt>
</tt> <span class="sy">:item</span> => <span class="co">Item</span>.create!(<span class="sy">:quantity</span> => <span class="i">1</span>))<tt>
</tt><tt>
</tt><span class="iv">@item</span>.update_attributes(<tt>
</tt> <span class="sy">:quantity</span> => <span class="i">2</span>,<tt>
</tt> <span class="sy">:dvd_attributes</span> => {<tt>
</tt> <span class="sy">:id</span> => <span class="iv">@item</span>.dvd.id,<tt>
</tt> <span class="sy">:title</span> => <span class="s"><span class="dl">'</span><span class="k">The Matrix</span><span class="dl">'</span></span>})<tt>
</tt></pre></td>
</tr></table>
<p>This issues the following <span class="caps">SQL</span> to the database:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }">UPDATE "items" SET "quantity" = 10 WHERE ("items"."id" = 12)<tt>
</tt>UPDATE "dvds" SET "title" = 'The Matrix' WHERE ("dvds"."id" = 12)<tt>
</tt></pre></td>
</tr></table>
<p>Note that depending on your application, you may need some extra locking to ensure this method is safe under concurrent updates, for example if you allow items to change type. Be sure to read the <a href="http://apidock.com/rails/ActiveRecord/NestedAttributes/ClassMethods/accepts_nested_attributes_for">accepts_nested_attributes_for documentation</a> for the full <span class="caps">API</span>.</p>
<p><em>I talk about this sort of thing in my “Your Database Is Your Friend” training sessions. They are happening throughout the US and UK in the coming months. One is likely coming to a city near you. Head on over to <a href="http://www.dbisyourfriend.com">www.dbisyourfriend.com</a> for more information and free screencasts <img src="http://www.dbisyourfriend.com/favicon.ico" alt="" /></em></p>tag:www.rhnh.net,2008:Post/8272010-08-15T05:30:00Z2010-08-17T16:34:23ZClass Table Inheritance and Eager Loading <p>Consider a typical class table inheritance table structure with <code>items</code> as the base class and <code>dvds</code> and <code>cars</code> as two subclasses. In addition to what is strictly required, <code>items</code> also has an <code>item_type</code> column. This denormalization is usually a good idea; I will save the justification for another post, so please take it for granted for now.</p>
<p>The easiest way to map this relationship with Rails and ActiveRecord is to use composition, rather than trying to hook into the class loading code. Something akin to:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="r">class</span> <span class="cl">Item</span> < <span class="co">ActiveRecord</span>::<span class="co">Base</span><tt>
</tt> <span class="co">SUBCLASSES</span> = [<span class="sy">:dvd</span>, <span class="sy">:car</span>]<tt>
</tt> <span class="co">SUBCLASSES</span>.each <span class="r">do</span> |class_name|<tt>
</tt> has_one class_name<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">description</span><tt>
</tt> send(item_type).description<tt>
</tt> <span class="r">end</span><tt>
</tt><span class="r">end</span><tt>
</tt><tt>
</tt><span class="r">class</span> <span class="cl">Dvd</span> < <span class="co">ActiveRecord</span>::<span class="co">Base</span><tt>
</tt> belongs_to <span class="sy">:item</span><tt>
</tt><tt>
</tt> validates_presence_of <span class="sy">:title</span>, <span class="sy">:running_time</span><tt>
</tt> validates_numericality_of <span class="sy">:running_time</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">description</span><tt>
</tt> title<tt>
</tt> <span class="r">end</span><tt>
</tt><span class="r">end</span><tt>
</tt><tt>
</tt><span class="r">class</span> <span class="cl">Car</span> < <span class="co">ActiveRecord</span>::<span class="co">Base</span><tt>
</tt> belongs_to <span class="sy">:item</span><tt>
</tt><tt>
</tt> validates_presence_of <span class="sy">:make</span>, <span class="sy">:registration</span><tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">description</span><tt>
</tt> make<tt>
</tt> <span class="r">end</span><tt>
</tt><span class="r">end</span><tt>
</tt></pre></td>
</tr></table>
<p>A naive way to fetch all the items might look like this:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="co">Item</span>.all(<span class="sy">:include</span> => <span class="co">Item</span>::<span class="co">SUBCLASSES</span>)<tt>
</tt></pre></td>
</tr></table>
<p>This will issue one initial query, then one for each subclass. (Since Rails 2.1, eager loading is done like this rather than joining.) This is inefficient, since at the point we preload the associations we already know which subclass tables we should be querying. There is no need to query all of them. A better way is to hook into the Rails eager loading ourselves to ensure that only the tables required are loaded:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }"><span class="co">Item</span>.all(opts).tap <span class="r">do</span> |items|<tt>
</tt> preload_associations(items, items.map(&<span class="sy">:item_type</span>).uniq)<tt>
</tt><span class="r">end</span><tt>
</tt></pre></td>
</tr></table>
<p>Wrapping that up in a class method on items is neat because we can then use it as a kicker at the end of named scopes or associations – <code>person.items.preloaded</code>, for instance.</p>
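<p>The saving comes purely from choosing which subclass tables to touch. The sketch below models that choice in plain Ruby, with no ActiveRecord involved; the <code>Item</code> struct and the two helper methods are illustrative stand-ins, and each “query” is simply counted as one for the base table plus one per subclass table.</p>

```ruby
# Plain-Ruby model of the two preloading strategies (illustrative only).
Item = Struct.new(:id, :item_type)

SUBCLASSES = %w(dvd car)

# Naive: look in every known subclass table, whether or not any loaded
# item is of that type.
def naive_preload_tables(_items)
  SUBCLASSES
end

# Smart: look only in the tables named by the item_type values that
# actually appear in the loaded items.
def smart_preload_tables(items)
  items.map(&:item_type).uniq
end

items = [Item.new(1, 'dvd'), Item.new(2, 'dvd')]

puts "naive: #{1 + naive_preload_tables(items).size} queries"
puts "smart: #{1 + smart_preload_tables(items).size} queries"
```

<p>With only DVDs loaded, the naive approach counts three queries and the smart one counts two, and the gap widens with every extra subclass added to the hierarchy.</p>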
<p>Here are some tests demonstrating this:</p><table class="CodeRay"><tr>
<td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 'visible' : 'auto' }">require <span class="s"><span class="dl">'</span><span class="k">test/test_helper</span><span class="dl">'</span></span><tt>
</tt><tt>
</tt><span class="r">class</span> <span class="cl">PersonTest</span> < <span class="co">ActiveRecord</span>::<span class="co">TestCase</span><tt>
</tt> setup <span class="r">do</span><tt>
</tt> item = <span class="co">Item</span>.create!(<span class="sy">:item_type</span> => <span class="s"><span class="dl">'</span><span class="k">dvd</span><span class="dl">'</span></span>)<tt>
</tt> dvd = <span class="co">Dvd</span>.create!(<span class="sy">:item</span> => item, <span class="sy">:title</span> => <span class="s"><span class="dl">'</span><span class="k">Food Inc.</span><span class="dl">'</span></span>)<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> test <span class="s"><span class="dl">'</span><span class="k">naive eager load</span><span class="dl">'</span></span> <span class="r">do</span><tt>
</tt> items = []<tt>
</tt> assert_queries(<span class="i">3</span>) { items = <span class="co">Item</span>.all(<span class="sy">:include</span> => <span class="co">Item</span>::<span class="co">SUBCLASSES</span>) }<tt>
</tt> assert_equal <span class="i">1</span>, items.size<tt>
</tt> assert_queries(<span class="i">0</span>) { items.map(&<span class="sy">:description</span>) }<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> test <span class="s"><span class="dl">'</span><span class="k">smart eager load</span><span class="dl">'</span></span> <span class="r">do</span><tt>
</tt> items = []<tt>
</tt> assert_queries(<span class="i">2</span>) { items = <span class="co">Item</span>.preloaded }<tt>
</tt> assert_equal <span class="i">1</span>, items.size<tt>
</tt> assert_queries(<span class="i">0</span>) { items.map(&<span class="sy">:description</span>) }<tt>
</tt> <span class="r">end</span><tt>
</tt><span class="r">end</span><tt>
</tt><tt>
</tt><span class="c"># Monkey patch stolen from activerecord/test/cases/helper.rb</span><tt>
</tt><span class="co">ActiveRecord</span>::<span class="co">Base</span>.connection.class.class_eval <span class="r">do</span><tt>
</tt> <span class="co">IGNORED_SQL</span> = [<span class="rx"><span class="dl">/</span><span class="k">^PRAGMA</span><span class="dl">/</span></span>, <span class="rx"><span class="dl">/</span><span class="k">^SELECT currval</span><span class="dl">/</span></span>, <span class="rx"><span class="dl">/</span><span class="k">^SELECT CAST</span><span class="dl">/</span></span>, <span class="rx"><span class="dl">/</span><span class="k">^SELECT @@IDENTITY</span><span class="dl">/</span></span>, <span class="rx"><span class="dl">/</span><span class="k">^SELECT @@ROWCOUNT</span><span class="dl">/</span></span>, <span class="rx"><span class="dl">/</span><span class="k">^SAVEPOINT</span><span class="dl">/</span></span>, <span class="rx"><span class="dl">/</span><span class="k">^ROLLBACK TO SAVEPOINT</span><span class="dl">/</span></span>, <span class="rx"><span class="dl">/</span><span class="k">^RELEASE SAVEPOINT</span><span class="dl">/</span></span>, <span class="rx"><span class="dl">/</span><span class="k">SHOW FIELDS</span><span class="dl">/</span></span>]<tt>
</tt><tt>
</tt> <span class="r">def</span> <span class="fu">execute_with_query_record</span>(sql, name = <span class="pc">nil</span>, &block)<tt>
</tt> <span class="gv">$queries_executed</span> ||= []<tt>
</tt> <span class="gv">$queries_executed</span> << sql <span class="r">unless</span> <span class="co">IGNORED_SQL</span>.any? { |r| sql =~ r }<tt>
</tt> execute_without_query_record(sql, name, &block)<tt>
</tt> <span class="r">end</span><tt>
</tt><tt>
</tt> alias_method_chain <span class="sy">:execute</span>, <span class="sy">:query_record</span><tt>
</tt><span class="r">end</span><tt>
</tt></pre></td>
</tr></table>
<p><em>I talk about this sort of thing in my “Your Database Is Your Friend” training sessions. They are happening throughout the US and UK in the coming months. One is likely coming to a city near you. Head on over to <a href="http://www.dbisyourfriend.com">www.dbisyourfriend.com</a> for more information and free screencasts <img src="http://www.dbisyourfriend.com/favicon.ico" alt="" /></em></p>tag:www.rhnh.net,2008:Post/8262010-08-10T02:00:00Z2010-08-10T17:46:35ZLast minute training in Seattle<p>If you or someone you know missed out on Saturday, I’ve scheduled a last minute database training for Seattle <strong>tomorrow</strong>. <a href='http://www.dbisyourfriend.com/events/20100811/registrations/new'>Register here</a>. Last chance before I head to Chicago for a <a href="http://www.dbisyourfriend.com/events/20100813/registrations/new">training on Friday</a>.</p>tag:www.rhnh.net,2008:Post/8252010-08-06T07:30:00Z2010-08-06T00:47:41ZConstraints assist understanding<p>The hardest thing for a new developer on a project to wrap his head around is not the code. For the most part, ruby code stays the same across projects. My controllers look like your controllers, my models look like your models. What defines an application is not the code, but the domain. The business concepts, and how they are translated into code, can take weeks or months to understand cleanly. Modelling your domain in a way that it is easily understood is an important principle to speed up this learning process.</p>
<p>In an application I am looking at there is an email field in the user model. It is defined as a string that allows null values. This is confusing. I need to figure out in what circumstances a null value makes sense (can they choose to withhold that piece of information? Is there a case where a new column I am adding should be null?), which is extra information I need to locate and process before I can understand the code. There is a <code>validates_presence_of</code> declaration on the attribute, but production data has some null values. Two parts of the application are telling me two contradicting stories about the domain.</p>
<p>Further, when I am tracking down a bug in the application, eliminating the possibility that a column could be null is an extra step I need to take. The data model is harder to reason about because there are more possible states than strictly necessary.</p>
<p>Allowing a null value in a column creates another piece of information that a developer has to process. It creates an extra question that needs to be answered when reading the code: in what circumstances is a null value appropriate? Multiply this problem out to multiple columns (and factor in other sub-optimal modeling techniques not covered here), and the time to understanding quickly grows out of hand.</p>
<p>Adding not-null constraints on your database is a quick and cheap way to bring your data model in line with the code that sits on top of it. In addition to cutting lines of code, cut out extraneous information from your data model. For little cost, constraints simplify your application conceptually and allow your data to be reasoned about more efficiently.</p>
<p><em>I talk about this sort of thing in my “Your Database Is Your Friend” training sessions. They are happening throughout the US and UK in the coming months. One is likely coming to a city near you. Head on over to <a href="http://www.dbisyourfriend.com">www.dbisyourfriend.com</a> for more information and free screencasts <img src="http://www.dbisyourfriend.com/favicon.ico" alt="" /></em></p>