tag:www.rhnh.net,2008:/lsi Lsi - Xavier Shay's Blog 2008-04-17T18:51:11Z Enki Xavier Shay notreal@rhnh.net tag:www.rhnh.net,2008:Post/757 2008-04-16T19:16:00Z 2008-04-17T18:51:11Z Classifier gem rubbish for recommending posts <p>While I was chatting with <a href="http://toolmantim.com">Tim</a> today, he suggested that Classifier::LSI might be a cool way to offer &#8216;related posts&#8217; suggestions for a blog.</p> <p>Not really knowing anything about it, I whipped up a prototype rake task. It builds the index, then marshals it to disk, because the index takes ages to create and it&#8217;s not much fun to play with when you have to wait minutes each time. It then prints 3 related suggestions for each post.</p><table class="CodeRay"><tr> <td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 
'visible' : 'auto' }">require <span class="s"><span class="dl">'</span><span class="k">classifier</span><span class="dl">'</span></span><tt> </tt><tt> </tt>namespace <span class="sy">:lsi</span> <span class="r">do</span><tt> </tt> task <span class="sy">:test</span> =&gt; <span class="sy">:environment</span> <span class="r">do</span><tt> </tt> <span class="r">if</span> <span class="co">File</span>.exist?(<span class="s"><span class="dl">&quot;</span><span class="k">lsidata.dump</span><span class="dl">&quot;</span></span>)<tt> </tt> lsi = <span class="co">File</span>.open(<span class="s"><span class="dl">&quot;</span><span class="k">lsidata.dump</span><span class="dl">&quot;</span></span>, <span class="s"><span class="dl">&quot;</span><span class="k">rb</span><span class="dl">&quot;</span></span>) {|f| <span class="co">Marshal</span>.load(f) }<tt> </tt> <span class="r">else</span> <tt> </tt> lsi = <span class="co">Classifier</span>::<span class="co">LSI</span>.new<tt> </tt> <span class="co">Post</span>.find(<span class="sy">:all</span>, <span class="sy">:order</span> =&gt; <span class="s"><span class="dl">'</span><span class="k">published_at DESC</span><span class="dl">'</span></span>).each <span class="r">do</span> |post|<tt> </tt> text = post.body<tt> </tt> categories = post.tags.collect(&amp;<span class="sy">:name</span>)<tt> </tt> puts <span class="s"><span class="dl">&quot;</span><span class="k">Indexing </span><span class="dl">&quot;</span></span> + post.title<tt> </tt> lsi.add_item(text, *categories)<tt> </tt> <span class="r">end</span><tt> </tt> <span class="co">File</span>.open(<span class="s"><span class="dl">&quot;</span><span class="k">lsidata.dump</span><span class="dl">&quot;</span></span>, <span class="s"><span class="dl">&quot;</span><span class="k">wb</span><span class="dl">&quot;</span></span>) {|f| <span class="co">Marshal</span>.dump(lsi, f) }<tt> </tt> <span class="r">end</span><tt> </tt><tt> </tt> <span class="co">Post</span>.find(<span class="sy">:all</span>).each <span class="r">do</span> |post|<tt> </tt> puts post.title<tt> </tt> puts 
lsi.find_related(post.body, <span class="i">3</span>).collect {|i| <span class="co">Post</span>.find_by_body(i).title }.inspect<tt> </tt> <span class="r">end</span><tt> </tt> <span class="r">end</span><tt> </tt><span class="r">end</span><tt> </tt></pre></td> </tr></table> <p>Here&#8217;s the output for my last 5 articles. I don&#8217;t know what I was expecting, but this just doesn&#8217;t seem very helpful. I don&#8217;t have a very rich set of tags on my posts, so that probably has something to do with it. I was kind of hoping it would just look at the text and all just work *waves hands*.</p><table class="CodeRay"><tr> <td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 
'visible' : 'auto' }">Seagate 500Gb FreeAgent Pro external drive - first impressions<tt> </tt> - Building Firefox Extensions<tt> </tt> - The Colemak Diaries<tt> </tt> - Counting ActiveRecord associations: count, size or length?<tt> </tt>Coconut Oats<tt> </tt> - The Colemak Diaries<tt> </tt> - Summertime Tagliarini<tt> </tt> - Mary Iron Chef - Chocolate Jaffa Boxes<tt> </tt>Mary Iron Chef - Chocolate Jaffa Boxes<tt> </tt> - The Colemak Diaries<tt> </tt> - Building Firefox Extensions<tt> </tt> - Summertime Tagliarini<tt> </tt>Paypal IPN fails date standards<tt> </tt> - Building Firefox Extensions<tt> </tt> - Straight Sailing with Magellan<tt> </tt> - The Colemak Diaries<tt> </tt>I'm number 8!<tt> </tt> - Extending Rails<tt> </tt> - Practical Hpricot: SVG<tt> </tt> - Day of days<tt> </tt></pre></td> </tr></table> <p>The next step is to try tagging my stuff better and see if that helps.</p> <h3>Getting classifier working</h3> <p>Quick side note &#8211; the pure-Ruby classifier doesn&#8217;t work out of the box with Rails because it also redefines <code>Array#sum</code>. If you install the <span class="caps">GSL</span> lib and its Ruby bindings (see the classifier docs), you&#8217;ll still need this one-line patch to classifier to get it to work:</p><table class="CodeRay"><tr> <td class="code"><pre ondblclick="with (this.style) { overflow = (overflow == 'auto' || overflow == '') ? 
'visible' : 'auto' }">Index: lib/classifier/lsi.rb<tt> </tt>===================================================================<tt> </tt>--- lib/classifier/lsi.rb (revision 31)<tt> </tt>+++ lib/classifier/lsi.rb (working copy)<tt> </tt>@@ -25,6 +25,8 @@<tt> </tt> # please consult Wikipedia[http://en.wikipedia.org/wiki/Latent_Semantic_Indexing].<tt> </tt> class LSI<tt> </tt> <tt> </tt>+ include GSL if $GSL<tt> </tt>+ <tt> </tt> attr_reader :word_list<tt> </tt> attr_accessor :auto_rebuild<tt> </tt></pre></td> </tr></table> <p><strong><span class="caps">UPDATE</span>:</strong> I&#8217;ve forked <a href="http://github.com/xaviershay/classifier/tree/master">classifier on GitHub</a>, so you can just grab that version if you like.</p>
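<p>The build-once-then-cache step from the rake task at the top of this post can be sketched on its own. This is only a minimal sketch, not anything from classifier: <code>load_or_build</code> and the <code>index.dump</code> file name are made up for illustration, and a plain Hash stands in for the slow-to-build <code>Classifier::LSI</code> index.</p>

```ruby
require 'tmpdir'

# Hypothetical helper (not part of classifier): return the object cached at
# +path+, or build it with the block and cache it for next time.
def load_or_build(path)
  if File.exist?(path)
    # Marshal output is binary, so read it back in binary mode.
    File.open(path, 'rb') { |f| Marshal.load(f) }
  else
    object = yield
    File.open(path, 'wb') { |f| Marshal.dump(object, f) }
    object
  end
end

# A plain Hash stands in for the expensive index (an assumption for the demo).
Dir.mktmpdir do |dir|
  path = File.join(dir, 'index.dump')
  builds = 0
  2.times do
    index = load_or_build(path) do
      builds += 1
      { 'post' => %w[ruby rails] }
    end
    puts "#{index.inspect} (built #{builds} time(s) so far)"
  end
end
```

<p>The second pass loads from disk instead of rebuilding. If the posts change, delete the dump file to force a rebuild, and only <code>Marshal.load</code> files you wrote yourself, since loading untrusted data can run arbitrary code.</p>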