Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

PostgreSQL 9 and ruby full text search tricks

I have just released an introduction to PostgreSQL screencast, published through PeepCode. It is over an hour long and covers a large number of juicy topics:

  • Setup full text search
  • Optimize search with triggers and indexes
  • Use Postgres with Ruby on Rails 3
  • Optimize indexes by including only the rows that you need
  • Use database standards for more reliable queries
  • Write powerful reports in only a few lines of code
  • Convert an existing MySQL application to use Postgres

It’s a steal at only $12. You can buy it over at PeepCode.

In it, I introduce full text search in postgres, and use a trigger to keep a search vector up to date. I’m not going to cover that here, but the point I get to is:

1
2
3
4
CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts 
FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(search_vector, 'pg_catalog.english',  body, title);

That is good for simple models, but what if you want to index child models as well? For instance, we want to include comment authors in the search index. I rolled up my sleeves an came up with this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
CREATE OR REPLACE FUNCTION search_trigger() RETURNS trigger AS $$
DECLARE
  search TEXT;
  child_search TEXT;
begin
  SELECT string_agg(author_name, ' ') INTO child_search
  FROM comments
  WHERE post_id = new.id;

  search := '';
  search := search || ' ' || coalesce(new.title);
  search := search || ' ' || coalesce(new.body);
  search := search || ' ' child_search;

  new.search_index := to_tsvector(search); 
  return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_vector_refresh 
  BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE
  search_trigger();

Getting a bit ugly eh. It might be nice to move that logic back into ruby land, but we have the problem that we need to call a database function to convert our search document into the correct data-type. In this case, a quick work around is to store a search_document in a text field on the model, then use a trigger to only index that field into our search_vector field. The search_document field can then easily be set from your ORM.

Of course, any self-respecting rubyist should hide all this complexity behind a neat interface. I have come up with one using DataMapper that automatically adds the required triggers and indexes via auto-migrations. You use it thusly:

1
2
3
4
5
6
7
8
9
10
class Post
  include DataMapper::Resource
  include Searchable

  property :id, Serial
  property :title, String
  property :body, Text

  searchable :title, :body # Provides Post.search('keyword')
end

You can find the Searchable module code over on github. In it you can also find a fugly proof-of-concept for a DSL that generates the above SQL for indexing child models using DataMapper’s rich property model. It worked, but I’m not using it in any production code so I can hardly recommend it. Maybe you want to have a play though.

  1. Ben Hamill says:

    I have it on my list of things to do to get this screencast. I've been monkeying with Postgres a lot recently at work and so far, it's been a pretty good time. Seems worth while to be good at it. We started using tenderlove's texticle gem for FTS. Wondered if you've looked at it and what you thought if so: http://github.com/tenderlove/texticle

  2. Xavier Shay says:

    Hadn't seen texticle, but it looks the goods. You'd want to understand what it's doing in the background though (easily done from scanning the source if you already grok FTS).

Post a comment


(lesstile enabled - surround code blocks with ---)

A pretty flower Another pretty flower