Kevin Bowling wrote on 02/05/2009 09:08 AM:
> On Thursday 05 February 2009 07:32:46 Peter Karman wrote:
>> Kevin Bowling wrote on 02/04/2009 11:57 PM:
>>>> Ranking should be somewhat better.
>>> I look forward to the better rankings!
>> have you tried the different RankScheme options in Swish-e 2.4?
>>> What really kills performance is PDF/PS to HTML conversions on my box.
>>> It would be really nice to thread the indexing and converting so it
>>> doesn't block on this case.
>> you can do that yourself. Just filter your PDF/PS separately and cache
>> the output, then index your cache. That's a common approach.
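A rough sketch of that filter-and-cache step, in Python for brevity. This is not something that ships with Swish-e. The function name refresh_cache, the "{src}" placeholder and the command table are made up for illustration; pstotext is the tool mentioned in this thread, and pdftotext is assumed to be installed and to print to stdout when given "-" as the output file.

```python
import subprocess
from pathlib import Path

# Hypothetical table: file extension -> command that prints plain text on stdout.
# "{src}" is an illustrative placeholder replaced with the source file path.
DEFAULT_COMMANDS = {
    ".ps": ["pstotext", "{src}"],
    ".pdf": ["pdftotext", "{src}", "-"],
}


def refresh_cache(src_root, cache_root, commands=DEFAULT_COMMANDS):
    """Convert new or changed documents to text files under cache_root.

    Returns the list of cache files that were (re)written.
    """
    src_root, cache_root = Path(src_root), Path(cache_root)
    converted = []
    for src in sorted(src_root.rglob("*")):
        template = commands.get(src.suffix.lower())
        if template is None or not src.is_file():
            continue  # no converter configured for this file
        rel = src.relative_to(src_root)
        dest = cache_root / rel.parent / (rel.name + ".txt")
        # Skip the slow conversion when the cached text is at least as new.
        if dest.exists() and dest.stat().st_mtime >= src.stat().st_mtime:
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        cmd = [arg.replace("{src}", str(src)) for arg in template]
        result = subprocess.run(cmd, capture_output=True, check=True)
        dest.write_bytes(result.stdout)
        converted.append(dest)
    return converted
```

Run something like this from cron ahead of the indexer and point the indexer at the cache directory, so the slow conversions never block an indexing run and unchanged files are not converted twice.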
> I assume I am not the only person in the world indexing PDF and PS.
> Everything in FOSS search seems to be DIY, but really everybody has similar
> requirements. I'm actually only indexing the PS as text (with pstotext) since
> Swish-e doesn't have a suitable script included. Again, I'm sure plenty of
> people do this. Why not include robust scripts to deal with PDF, PS, DOC, et
SWISH::Filter should handle all of those save .ps, I think. It would be
easy to add one -- why not give it a try based on what you do already
and submit a patch?
The latest SWISH::Filter is on CPAN:
> Also, using 'file' or MIME information to index rather than file extensions,
> even when run on the local file system, would be pretty nice. I know there is
> a lot of data on my box that isn't indexed because of this.
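For what it's worth, content-based detection can be sketched in a few lines. The following is only an illustration in Python, not how Swish-e or SWISH::Filter decide types; the sniff_type name and the MAGIC table are hypothetical. It relies on PDF files starting with "%PDF-" and PostScript files starting with "%!PS", and falls back to the extension otherwise.

```python
import mimetypes

# Leading byte signatures checked before falling back to the file extension.
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"%!PS", "application/postscript"),
]


def sniff_type(path):
    """Guess a MIME type from the first bytes of a file, then from its name."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for prefix, mime in MAGIC:
        if head.startswith(prefix):
            return mime
    # No known signature: fall back to the extension-based guess (may be None).
    guessed, _encoding = mimetypes.guess_type(str(path))
    return guessed
```

A crawler could call this on each file and hand the result to whatever picks the filter, so a PDF saved without a .pdf extension still gets converted.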
>>>> If all you need is a local filesystem indexer for a website with 200k+
>>>> docs (which I would call medium-sized -- these days folks deal with
>>>> multi-million doc collections), and you don't need UTF-8 or incremental
>>>> indexing, Swish-e 2.4.5 is about as good as it gets. Don't let its age
>>>> fool you. :)
>>> Yes, improvements and a nicer interface are really all I would like to
>>> see. It's 2009 and the interface looks like it is 10+ years old (not that
>>> that is a bad thing, but a 'modern' interface would be nice as well).
>> by 'interface' do you mean the swish.cgi script? or the options to the
>> swish-e cli? or ...?
> Yes, swish.cgi. A nice, full-featured and modern interface that could be
> themed or integrated into other pages would really round out the turn-key
> search solution.
> This all seems like low-hanging fruit. We both agree that Swish-e is fast and
> has good results. I think improving the included interface and filters would
> go a long way. Hopefully you can start pushing releases as well. Long
> release cycles are no good for FOSS.
agreed, in theory. low-hanging fruit and pushing releases still require
tuits, however. have any to share?
> I hope I don't come off sounding selfish or sound like 'do this work for me
> please'. I just use this on a non-commercial site
> (http://ps-2.kev009.com:8081/). Hopefully I am at least providing useful
> critique as an end user.
become more than an end user and send along a PS filter. :)
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Received on Thu Feb 5 10:14:38 2009