Re: problems with highlighting and output

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Aug 26 2005 - 13:07:22 GMT

Around line 1875 in swish.cgi, you might try putting this debugging line in:

       # Returns hash of the properties that were highlighted
       my $highlighted = $self->highlight_props( $props ) || {};
       warn "HIGHLIGHTED: $_\n" for keys %$highlighted; # debugging

That should show up in your httpd error log and tell you which properties
were actually highlighted, which would help pinpoint where it fails.

I just tried it on my copy:

   http://peknet.com/cgi-bin/swish.cgi

and I get lines like this in my httpd error log:

HIGHLIGHTED: swishdescription
HIGHLIGHTED: swishtitle
HIGHLIGHTED: swishdescription
HIGHLIGHTED: swishtitle
[...]


and the highlighting seems to work.
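
If the log gets long, a rough sketch like the following can tally the
HIGHLIGHTED lines per property, so a property that never gets highlighted
stands out. This assumes the log lines look like the ones above; the sub
name count_highlighted and the sample array are made up for illustration
and are not part of swish.cgi.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of swish.cgi): count how often each
# property name appears in "HIGHLIGHTED: <name>" lines.
# Takes a list of log lines; returns a hash ref of property => count.
sub count_highlighted {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # Match anywhere in the line, since the web server may
        # prepend a timestamp or other prefix to each entry.
        $count{$1}++ if $line =~ /HIGHLIGHTED:\s*(\S+)/;
    }
    return \%count;
}

# Sample lines copied from the log output above; in practice you
# would read these from your own httpd error log instead.
my @sample = (
    "HIGHLIGHTED: swishdescription",
    "HIGHLIGHTED: swishtitle",
    "HIGHLIGHTED: swishdescription",
    "HIGHLIGHTED: swishtitle",
);

my $count = count_highlighted(@sample);
printf "%-20s %d\n", $_, $count->{$_} for sort keys %$count;
```

If a property you expect (say, one listed in display_props) never shows up
in the tally, that is probably where to look next.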


psydok@sulb.uni-saarland.de scribbled on 8/26/05 1:59 AM:

> Hi Bill, hi Peter,
> 
> thanks for your support.
> 
> @peter: I tried all of the highlighting modules, not one worked.
> 
> 
>>>If the terms were found within the first lines and they are displayed, they
>>>are not highlighted. Nevertheless I increased the StoreDescription
>>>parameter to 999999999, but it did not fix the problem.
>>
>>Are most of your files larger than that?
> 
> 
> some of them are larger, some are not.
> 
> 
>>You can assume that others are using the same code and would have
>>reported this as a problem by now.
> 
> 
> That's absolutely true. So here is some code:
> 
> First: the configuration for indexing:
> ### start indexing config
> IndexOnly .htm .html .pdf .ps .txt .xml .ppt .pps .doc .rtf .gif. .jpg 
> jpeg .xbm .au .mov .mpg .avi
> 
> IndexFile /u/swish-e/index.swish-e
> IndexDir /u/test/htdocs/test/
> IndexContents HTML .htm .html .pdf .ppt .txt .ps .xml .pps .doc .rtf
> NoContents .gif .jpg .jpeg .xbm .au .mov .mpg .avi
> FollowSymLinks yes
> obeyRobotsNoIndex no
> ConvertHTMLEntities YES
> MetaNames swishtitle swishdescription swishdocpath DC.Title 
> DC.title.translated Title Titel Author DC.Creator DC.Creator.PersonalName 
> DC.Creator.CorporateName DC.Description description contributor 
> DC.Contributor.CorporateName DC.Contributor.PersonalName DC.Subject Subject 
> Keywords DC.Language publisher dc.publisher dc.publisher.corporatename 
> dc.publisher.personalname
> MetaNamesRank 10 body
> MetaNamesRank 9 swishdescription
> MetaNamesRank 8 Title
> MetaNamesRank 7 swishtitle
> MetaNamesRank 6 DC.Title
> MetaNamesRank 5 author
> MetaNamesRank 4 subject
> 
> PropertyNames DC.title swishdescription author contributor subject 
> DC.Language publisher
> PropertyNameAlias DC.Title title Titel
> PropertyNameAlias swishtitle Dc.Title.translated
> PropertyNameAlias author DC.Creator DC.Creator.PersonalName 
> DC.Creator.CorporateName
> PropertyNameAlias swishdescription body DC.Description description
> PropertyNameAlias contributor DC.Contributor.CorporateName 
> DC.Contributor.PersonalName
> PropertyNameAlias subject DC.Subject Keywords
> PropertyNameAlias publisher dc.publisher dc.publisher.corporatename 
> dc.publisher.personalname
> 
> UndefinedMetaTags INDEX
> WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-
> IgnoreFirstChar .-
> IgnoreLastChar  .-
> BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
> EndCharacters   abcdefghijklmnopqrstuvwxyz0123456789
> IndexReport 3
> IgnoreWords File: /home/swish-e/stopwords.txt
> IgnoreLimit 90 4
> IgnoreNumberChars 0123456789.,;/&%§$
> IndexComments no
> TranslateCharacters :ascii7:
> BumpPositionCounterCharacters |.
> FileRules dirname contains incoming original
> FileRules filename contains robots. incoming new.txt
> FileFilter .pdf /usr/local/share/doc/swish-e/examples/filter-bin/_pdf2html.pl
> FileFilter .doc /usr/local/bin/catdoc
> FileFilter .rtf /usr/local/bin/catdoc
> FileFilter .ppt /usr/local/bin/catppt
> FileFilter .pps /usr/local/bin/catppt
> 
> StoreDescription HTML <body> 999999999
> 
> ### end indexing config
> 
> And here comes the configuration for the swish.cgi:
> ### start swish.cgi config
> return {
>      swish_binary    => '/usr/local/bin/swish-e',
>      swish_index     => '/u/swish-e/index.swish-e',
>      title => 'Volltextsuche in den Dokumenten',
>      title_property => 'Dc.Title',
>      description_property => swishdescription,
> 
> 
> 
>    display_props   => [qw/Author DC.Title swishtitle DC.Language Subject 
> swishlastmodified swishdocsize swishdocpath /],
>          name_labels => {
>              swishdefault        => 'Alle Elemente durchsuchen',
>              Author              => 'Autor, beteiligte Person/Einrichtung',
>              'DC.Title'          => 'Titel',
>              swishtitle        => 'Alternativer Titel',
>              swishrank           => 'Rank',
>              'DC.Language'         => 'Sprache',
>              swishlastmodified   => 'Datum der letzten Änderung',
>              swishdocsize        => 'Größe des Dokuments',
>              swishdocpath        => 'URL',
>          },
>      date_ranges => 0,
>      page_size       => 10,
>      sorts => [qw/swishrank Title swishlastmodified/],
>      secondary_sort  => [qw/swishlastmodified desc/],
>      template => {
>      package     => 'SWISH::TemplateDefault',
>                  },
>      timeout         => 10,
>      max_query_length => 400,
>      max_chars       => 500,
> 
>        highlight       => {
>               package         => 'SWISH::DefaultHighlight',
>               show_words      => 10,
>               max_words       => 100,
>               occurrences     => 6,
>               highlight_on   => '<b>',
>               highlight_off  => '</b>',
>                  },
> }
> ### end swish.cgi config
> 
> thanks again for your support
> 
> Herb
> 

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Fri Aug 26 06:07:26 2005