Kevin Bowling wrote on 02/04/2009 11:57 PM:
>> Ranking should be somewhat better.
> I look forward to the better rankings!
have you tried the different RankScheme options in Swish-e 2.4?
> What really kills performance is PDF/PS to HTML conversions on my box. It
> would be really nice to thread the indexing and converting so it doesn't block
> on this case.
you can do that yourself. Just filter your PDF/PS separately and cache
the output, then index your cache. That's a common approach.
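
A rough sketch of that filter-and-cache step, in case it helps. This is only
an illustration, not anything that ships with Swish-e: the function names and
the cache layout are made up here, and it assumes the `pdftotext` and
`ps2ascii` tools are installed and on your PATH. It converts only files whose
cached copy is missing or older than the source, and runs the conversions in
a small thread pool so one slow PDF does not hold up the rest.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def pdf_to_text(src, dst):
    # Assumption: the `pdftotext` command-line tool is installed.
    subprocess.run(["pdftotext", str(src), str(dst)], check=True)


def ps_to_text(src, dst):
    # Assumption: Ghostscript's `ps2ascii` is installed.
    subprocess.run(["ps2ascii", str(src), str(dst)], check=True)


# Hypothetical default mapping from file extension to converter.
DEFAULT_CONVERTERS = {".pdf": pdf_to_text, ".ps": ps_to_text}


def stale(src, dst):
    # A cached file needs rebuilding if it is missing or older than its source.
    return not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime


def refresh_cache(src_root, cache_root, converters=None, workers=4):
    """Convert new or changed files under src_root into cache_root.

    Returns the list of cache files that were (re)built on this run.
    """
    converters = DEFAULT_CONVERTERS if converters is None else converters
    src_root, cache_root = Path(src_root), Path(cache_root)
    jobs = []
    for src in sorted(src_root.rglob("*")):
        convert = converters.get(src.suffix.lower())
        if convert is None or not src.is_file():
            continue  # not a type we filter; index it directly instead
        rel = src.relative_to(src_root)
        dst = cache_root / rel.parent / (rel.name + ".txt")
        if stale(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            jobs.append((convert, src, dst))
    # The converters are external processes, so threads are enough here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(c, s, d) for c, s, d in jobs]
        for f in futures:
            f.result()  # re-raise any conversion error
    return [d for _, _, d in jobs]
```

Run something like `refresh_cache("/var/www/docs", "/var/cache/docs-text")`
(both paths are placeholders) from cron or before each indexing run, then
point the indexer at the cache directory along with your HTML and TXT files.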
>>> What I am confused about is that it now uses Xapian. I haven't tried
>>> Xapian but I know they have their own system called Omega. How does
>>> Swish3 differ from it? I just need a local filesystem indexer for a
>>> website with 200k+ HTML, PDF, TXT and PS files. Are Swish-e and Omega
>>> the only two FOSS contenders?
>> Oh no. There are many. Lucene and its clones. KinoSearch. HyperEstraier
>> (though it seems to have fallen out of support). There are many others.
> I tried several solutions at one time, but I'm really not interested in
> writing an indexer, interface, or anything else as many of these are just
> libraries. To me, web search for a case like this (static documents) should
> be pretty turn-key. Lucene seems to be really nice but it suffers from this.
> I couldn't find a simple, direct file system indexer.
yes. That's what Swish3 will try to do: make IR libraries like Lucene,
Xapian, et al, into turnkey apps. Omega is like that, but it only works
> That is what I like about Swish-e. It was somewhat easy to set up, and at
> least straightforward.
Exactly. See http://blog.peknet.com/projects/swish/whySwish3
>> If all you need is a local filesystem indexer for a website with 200k+ docs
>> (which I would call medium-sized -- these days folks deal with
>> multi-million doc collections), and you don't need UTF-8 or incremental
>> indexing, Swish-e 2.4.5 is about as good as it gets. Don't let its age fool
>> you. :)
> Yes, improvements and a nicer interface are really all I would like to see.
> It's 2009 and the interface looks like it is 10+ years old (not that that is a
> bad thing, but a 'modern' interface would be nice as well).
by 'interface' do you mean the swish.cgi script? or the options to the
swish-e cli? or ...?
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Users mailing list
Received on Thu Feb 5 09:32:46 2009