if document name is too long it does not show up in result

From: Sascha Frinken <swish-e(at)not-real.safri-net.de>
Date: Thu Jul 20 2006 - 12:14:36 GMT

Hi,

If I index a directory as follows

-------------------- indexing start --------------------

swishlin:~/swishtest# swish-e -v 3 -c swish.conf
Parsing config file 'swish.conf'
Indexing Data Source: "File-System"
Indexing "files"

Checking dir "files"...
  SDB-BS-01-Acidiol-480203.pdf - Using DEFAULT (HTML2) parser -  (1307 words)
  SDB-BS-02-Detergent.pdf - Using DEFAULT (HTML2) parser -  (1011 words)
  SDB-BS-03-Buffer Powder.pdf - Using DEFAULT (HTML2) parser -  (1028 words)
  SDB-BS-04-Enzym 150.pdf - Using DEFAULT (HTML2) parser -  (1127 words)
  SDB-BS-05-Ethidiumbromid-Farbstoffloesung.pdf - Using DEFAULT (HTML2) parser -  (992 words)
  SDB-BS-07-Kaliumsorbat-Granulat-105119.pdf - Using DEFAULT (HTML2) parser -  (903 words)
  SDB-BS-08-Ringertabletten-115525.pdf - Using DEFAULT (HTML2) parser -  (774 words)
  SDB-BS-09-Glycerin-2289.pdf - Using DEFAULT (HTML2) parser -  (916 words)
  SDB-BS-10-BCS.pdf - Using DEFAULT (HTML2) parser -  (959 words)
  SDB-BS-11-PCS.pdf - Using DEFAULT (HTML2) parser -  (990 words)
  SDB-BS-12-Rinse Concentrate.pdf - Using DEFAULT (HTML2) parser -  (1299 words)
  SDB-BS-13-Ammoniak-105428.pdf - Using DEFAULT (HTML2) parser -  (1198 words)
  SDB-BS-05-Ethidiumbromid.pdf - Using DEFAULT (HTML2) parser -  (993 words)

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2,054 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
2,054 unique words indexed.
4 properties sorted.
13 files indexed.  593,511 total bytes.  13,524 total words.
Elapsed time: 00:00:02 CPU time: 00:00:00
Indexing done!

-------------------- indexing end --------------------

and then search for files containing SDB-BS in the file name, it does not list
these two files:
SDB-BS-05-Ethidiumbromid-Farbstoffloesung.pdf
SDB-BS-07-Kaliumsorbat-Granulat-105119.pdf

-------------------- searching start --------------------

swishlin:~/swishtest# swish-e -f index.swish-e -H 1 -w swishdocpath=sdb-bs*
# SWISH format: 2.4.3
# Search words: swishdocpath=sdb-bs*
# Removed stopwords:
# Number of hits: 11
# Search time: 0.002 seconds
# Run time: 0.053 seconds
1000 files/SDB-BS-01-Acidiol-480203.pdf "SDB-BS-01-Acidiol-480203.pdf" 16930
1000 files/SDB-BS-13-Ammoniak-105428.pdf "SDB-BS-13-Ammoniak-105428.pdf" 20078
1000 files/SDB-BS-12-Rinse Concentrate.pdf "SDB-BS-12-Rinse Concentrate.pdf" 49950
1000 files/SDB-BS-11-PCS.pdf "SDB-BS-11-PCS.pdf" 46071
1000 files/SDB-BS-10-BCS.pdf "SDB-BS-10-BCS.pdf" 42995
1000 files/SDB-BS-09-Glycerin-2289.pdf "SDB-BS-09-Glycerin-2289.pdf" 150080
1000 files/SDB-BS-08-Ringertabletten-115525.pdf "SDB-BS-08-Ringertabletten-115525.pdf" 15251
1000 files/SDB-BS-04-Enzym 150.pdf "SDB-BS-04-Enzym 150.pdf" 49771
1000 files/SDB-BS-03-Buffer Powder.pdf "SDB-BS-03-Buffer Powder.pdf" 47372
1000 files/SDB-BS-02-Detergent.pdf "SDB-BS-02-Detergent.pdf" 44943
1000 files/SDB-BS-05-Ethidiumbromid.pdf "SDB-BS-05-Ethidiumbromid.pdf" 47066

-------------------- searching end --------------------

If I shorten the file name to fewer than 41 characters, the file shows up in
the search result:

-------------------- another run start --------------------

swishlin:~/swishtest# mv files/SDB-BS-07-Kaliumsorbat-Granulat-105119.pdf files/SDB-BS-07-Kaliumsorbat-Granulat-1051.pdf
swishlin:~/swishtest# swish-e -v 3 -c swish.conf
[....]
swishlin:~/swishtest# swish-e -f index.swish-e -H 1 -w swishdocpath=sdb-bs*
[....]
1000 files/SDB-BS-07-Kaliumsorbat-Granulat-1051.pdf "SDB-BS-07-Kaliumsorbat-Granulat-1051.pdf" 15938

-------------------- another run end --------------------
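
To double-check the length threshold, here is a small Python sketch that measures the file names from the runs above. It is not part of swish-e; the lists `found` and `missing` are just my own labels for which names the `swishdocpath=sdb-bs*` search returned. The lengths are those of the bare file names without the `files/` prefix, and it is only an assumption on my part that swish-e looks at the name the same way.

```python
# File names copied from the runs above, grouped by whether the
# swishdocpath=sdb-bs* search returned them.
found = [
    "SDB-BS-01-Acidiol-480203.pdf",
    "SDB-BS-02-Detergent.pdf",
    "SDB-BS-03-Buffer Powder.pdf",
    "SDB-BS-04-Enzym 150.pdf",
    "SDB-BS-05-Ethidiumbromid.pdf",
    "SDB-BS-08-Ringertabletten-115525.pdf",
    "SDB-BS-09-Glycerin-2289.pdf",
    "SDB-BS-10-BCS.pdf",
    "SDB-BS-11-PCS.pdf",
    "SDB-BS-12-Rinse Concentrate.pdf",
    "SDB-BS-13-Ammoniak-105428.pdf",
    "SDB-BS-07-Kaliumsorbat-Granulat-1051.pdf",  # found only after the rename
]
missing = [
    "SDB-BS-05-Ethidiumbromid-Farbstoffloesung.pdf",
    "SDB-BS-07-Kaliumsorbat-Granulat-105119.pdf",
]

# Boundary between the names that were found and those that were not.
longest_found = max(len(name) for name in found)
shortest_missing = min(len(name) for name in missing)

# Print every name with its length, shortest first.
for name in sorted(found + missing, key=len):
    status = "found" if name in found else "missing"
    print(f"{len(name):2d}  {status:7s}  {name}")

print("longest name that was found:   ", longest_found)
print("shortest name that was missing:", shortest_missing)
```

If the counts are right, every name that was found is shorter than every name that was missing, which fits the observation that only the length of the name makes the difference.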

Is this a bug or am I missing some settings / parameters?

Thanks in advance

Sascha


My swish.conf:

IndexDir "files"
IndexOnly .doc .xls .pdf .txt .ppt .chm
FileRules filename contains ^~\$
MetaNames swishdocpath
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-
FileFilter .pdf /root/swish/pdf2html.php
PropertyNamesMaxLength 1000 swishdocpath
Received on Thu Jul 20 05:14:41 2006