Re: Search by path

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri May 27 2005 - 13:21:00 GMT
Joe Lopez scribbled on 5/27/05 6:51 AM:

> I need the possibility to search by _path_. So when I search for
> "foo", all files
> that contain the word "foo" in their path (eg. /bar/foo/txt or
> /foo/bar/t.txt but not
> /bar/foo.txt) would be found.
> 
> Is there some straightforward way to achieve that in swish-e?

Yes. Make swishdocpath a metaname with the MetaNames config directive. By default it is only a property, so the words in the path are not indexed for searching.

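Caveats: you must search for 'foo' within the swishdocpath metaname (swishdocpath=foo); a plain search for foo still looks only in the default metaname. Also, your results will be thrown off if you include / (slash) in WordCharacters, because the path components will then no longer be split into separate words.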
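
To illustrate the slash caveat, here is a minimal Python sketch of how a path might be split into words. It is a simplified model, not swish-e's actual parser: the function name tokenize and the assumption that word characters are just lowercase ASCII letters and digits are made up for the example.

```python
import string

# Assumed, simplified word-character set (not swish-e's real default).
DEFAULT_WORD_CHARS = string.ascii_lowercase + string.digits


def tokenize(path, word_chars=DEFAULT_WORD_CHARS):
    """Split a path into lowercase words.

    Any character not in word_chars acts as a separator.
    This is a hypothetical model for illustration only.
    """
    words, current = [], []
    for ch in path.lower():
        if ch in word_chars:
            current.append(ch)
        elif current:
            # Hit a separator: close the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# Slash is a separator: each path component becomes its own word.
default_words = tokenize("/tmp/foo.html")
print(default_words)  # ['tmp', 'foo', 'html']

# Slash is a word character: the components stay glued together.
slash_words = tokenize("/tmp/foo.html", DEFAULT_WORD_CHARS + "/")
print(slash_words)    # ['/tmp/foo', 'html']
```

In the first case the words match the tmp, foo and html entries that appear under the swishdocpath metaname in the index dump in the session that follows. In the second case there is no separate word foo left to match. Note that under this simplified model a path such as /bar/foo.txt also yields the word foo.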

Here's a long example. Notice how I do NOT include the c (config) file the first time I index, so the first search for foo finds nothing.

karpet@cartermac 117% cat c
MetaNames swishdocpath
karpet@cartermac 118% cat foo.html
<html>
<body>
some text
</body>
</html>

karpet@cartermac 109% swish-e -i $PWD/foo.html -v 9
Indexing Data Source: "File-System"
Indexing "/tmp/foo.html"

Checking file "/tmp/foo.html"...
   foo.html - Using DEFAULT (HTML2) parser -  (2 words)

..
Indexing done!
karpet@cartermac 110% swish-e -T index_all
..


-----> METANAMES for index.swish-e <-----
         swishdefault : id= 1 type= 1  META_INDEX  Rank Bias=  0
        swishreccount : id= 2 type=42  META_INTERNAL META_PROP:NUMBER
            swishrank : id= 3 type=42  META_INTERNAL META_PROP:NUMBER
         swishfilenum : id= 4 type=42  META_INTERNAL META_PROP:NUMBER
          swishdbfile : id= 5 type=38  META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
         swishdocpath : id= 6 type= 6  META_PROP:STRING(case:compare) SortKeyLen: 100  *presorted*
           swishtitle : id= 7 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
         swishdocsize : id= 8 type=10  META_PROP:NUMBER *presorted*
    swishlastmodified : id= 9 type=18  META_PROP:DATE *presorted*


-----> WORD INFO in index index.swish-e <-----

some
  Meta:1 /tmp/foo.html Freq:1 Pos/Struct:5/9

text
  Meta:1 /tmp/foo.html Freq:1 Pos/Struct:6/9


-----> FILES in index index.swish-e <-----
Dumping File Properties for File Number: 1
  (No Properties)

ReadAllDocProperties:
           swishdocpath: 6 ( 13) S: "/tmp/foo.html"
           swishdocsize: 8 (  4) N: "40"
      swishlastmodified: 9 (  4) D: "2005-05-27 08:12:43 CDT"

ReadSingleDocPropertiesFromDisk:
           swishdocpath: 6 ( 13) S: "/tmp/foo.html"
           swishdocsize: 8 (  4) N: "40"
      swishlastmodified: 9 (  4) D: "2005-05-27 08:12:43 CDT"


karpet@cartermac 111% swish-e -w foo
# SWISH format: 2.5.4
# Search words: foo
# Removed stopwords:
err: no results
.
karpet@cartermac 112% vi c
karpet@cartermac 113% swish-e -i $PWD/foo.html -v 9 -c c
Parsing config file 'c'
Indexing Data Source: "File-System"
Indexing "/tmp/foo.html"

Checking file "/tmp/foo.html"...
   foo.html - Using DEFAULT (HTML2) parser -  (2 words)

...
karpet@cartermac 114% swish-e -T index_all
..


-----> METANAMES for index.swish-e <-----
         swishdefault : id= 1 type= 1  META_INDEX  Rank Bias=  0
        swishreccount : id= 2 type=42  META_INTERNAL META_PROP:NUMBER
            swishrank : id= 3 type=42  META_INTERNAL META_PROP:NUMBER
         swishfilenum : id= 4 type=42  META_INTERNAL META_PROP:NUMBER
          swishdbfile : id= 5 type=38  META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
         swishdocpath : id= 6 type= 6  META_PROP:STRING(case:compare) SortKeyLen: 100  *presorted*
           swishtitle : id= 7 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
         swishdocsize : id= 8 type=10  META_PROP:NUMBER *presorted*
    swishlastmodified : id= 9 type=18  META_PROP:DATE *presorted*
         swishdocpath : id=10 type= 1  META_INDEX  Rank Bias=  0


-----> WORD INFO in index index.swish-e <-----

foo
  Meta:10 /tmp/foo.html Freq:1 Pos/Struct:2/1

html
  Meta:10 /tmp/foo.html Freq:1 Pos/Struct:3/1

some
  Meta:1 /tmp/foo.html Freq:1 Pos/Struct:5/9

text
  Meta:1 /tmp/foo.html Freq:1 Pos/Struct:6/9

tmp
  Meta:10 /tmp/foo.html Freq:1 Pos/Struct:1/1


-----> FILES in index index.swish-e <-----
Dumping File Properties for File Number: 1
  (No Properties)

ReadAllDocProperties:
           swishdocpath: 6 ( 13) S: "/tmp/foo.html"
           swishdocsize: 8 (  4) N: "40"
      swishlastmodified: 9 (  4) D: "2005-05-27 08:12:43 CDT"

ReadSingleDocPropertiesFromDisk:
           swishdocpath: 6 ( 13) S: "/tmp/foo.html"
           swishdocsize: 8 (  4) N: "40"
      swishlastmodified: 9 (  4) D: "2005-05-27 08:12:43 CDT"


karpet@cartermac 115% swish-e -w foo
# SWISH format: 2.5.4
# Search words: foo
# Removed stopwords:
err: no results
.
karpet@cartermac 116% swish-e -w swishdocpath=foo
# SWISH format: 2.5.4
# Search words: swishdocpath=foo
# Removed stopwords:
# Number of hits: 1
# Search time: 0.004 seconds
# Run time: 0.031 seconds
1000 /tmp/foo.html "foo.html" 40
.


-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Fri May 27 06:21:01 2005