Joe Lopez scribbled on 5/27/05 6:51 AM:
> I need the possibility to search by _path_. So when I search for "foo",
> all files that contain the word "foo" in their path (eg. /bar/foo/txt or
> /foo/bar/t.txt but not /bar/foo.txt) would be found.
>
> Is there some straightforward way to achieve that in swish-e?
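Yes. Make swishdocpath a metaname; by default it is only a property.
Caveats: you must search for 'foo' within the swishdocpath metaname
(swishdocpath=foo), since a plain search for 'foo' only looks at the default
metaname. Also, your results will be thrown off if you include / (slash) in
WordCharacters, because the path is then no longer split into words at the
slashes.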
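To see why the slash caveat matters, here is a rough shell sketch of how a path breaks into searchable words. It is not swish-e code: split_path is a hypothetical helper, and it assumes words are runs of ASCII letters and digits, which only approximates the default WordCharacters setting.

```shell
# Rough approximation of how a path is split into words.
# ASSUMPTION: words are runs of ASCII letters and digits, lowercased.
# Real swish-e behaviour depends on WordCharacters and related settings.
split_path() {
    # $1 = path
    # $2 = extra characters to treat as word characters (may be empty)
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cs "[:alnum:]$2" '\n' \
        | sed '/^$/d'
}

# Slash is a separator: prints tmp, foo and html on separate lines.
split_path /tmp/foo.html

# Slash is a word character: prints /tmp/foo and html.
split_path /tmp/foo.html /
```

With the first split, foo is a word of its own, so swishdocpath=foo can match it. Once / counts as a word character, the indexed word becomes /tmp/foo, and a search for foo alone would be expected to miss it.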
Here's a long example. Notice that I do NOT include the c (config) file the
first time I index. The two cat commands were run last (see the prompt
numbers) but are shown first for reference.
karpet@cartermac 117% cat c
MetaNames swishdocpath
karpet@cartermac 118% cat foo.html
<html>
<body>
some text
</body>
</html>
karpet@cartermac 109% swish-e -i $PWD/foo.html -v 9
Indexing Data Source: "File-System"
Indexing "/tmp/foo.html"
Checking file "/tmp/foo.html"...
foo.html - Using DEFAULT (HTML2) parser - (2 words)
..
Indexing done!
karpet@cartermac 110% swish-e -T index_all
..
-----> METANAMES for index.swish-e <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
swishdbfile : id= 5 type=38 META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare) SortKeyLen: 100 *presorted*
swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
-----> WORD INFO in index index.swish-e <-----
some
Meta:1 /tmp/foo.html Freq:1 Pos/Struct:5/9
text
Meta:1 /tmp/foo.html Freq:1 Pos/Struct:6/9
-----> FILES in index index.swish-e <-----
Dumping File Properties for File Number: 1
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 13) S: "/tmp/foo.html"
swishdocsize: 8 ( 4) N: "40"
swishlastmodified: 9 ( 4) D: "2005-05-27 08:12:43 CDT"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 13) S: "/tmp/foo.html"
swishdocsize: 8 ( 4) N: "40"
swishlastmodified: 9 ( 4) D: "2005-05-27 08:12:43 CDT"
karpet@cartermac 111% swish-e -w foo
# SWISH format: 2.5.4
# Search words: foo
# Removed stopwords:
err: no results
.
karpet@cartermac 112% vi c
karpet@cartermac 113% swish-e -i $PWD/foo.html -v 9 -c c
Parsing config file 'c'
Indexing Data Source: "File-System"
Indexing "/tmp/foo.html"
Checking file "/tmp/foo.html"...
foo.html - Using DEFAULT (HTML2) parser - (2 words)
...
karpet@cartermac 114% swish-e -T index_all
..
-----> METANAMES for index.swish-e <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
swishdbfile : id= 5 type=38 META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare) SortKeyLen: 100 *presorted*
swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
swishdocpath : id=10 type= 1 META_INDEX Rank Bias= 0
-----> WORD INFO in index index.swish-e <-----
foo
Meta:10 /tmp/foo.html Freq:1 Pos/Struct:2/1
html
Meta:10 /tmp/foo.html Freq:1 Pos/Struct:3/1
some
Meta:1 /tmp/foo.html Freq:1 Pos/Struct:5/9
text
Meta:1 /tmp/foo.html Freq:1 Pos/Struct:6/9
tmp
Meta:10 /tmp/foo.html Freq:1 Pos/Struct:1/1
-----> FILES in index index.swish-e <-----
Dumping File Properties for File Number: 1
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 13) S: "/tmp/foo.html"
swishdocsize: 8 ( 4) N: "40"
swishlastmodified: 9 ( 4) D: "2005-05-27 08:12:43 CDT"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 13) S: "/tmp/foo.html"
swishdocsize: 8 ( 4) N: "40"
swishlastmodified: 9 ( 4) D: "2005-05-27 08:12:43 CDT"
karpet@cartermac 115% swish-e -w foo
# SWISH format: 2.5.4
# Search words: foo
# Removed stopwords:
err: no results
.
karpet@cartermac 116% swish-e -w swishdocpath=foo
# SWISH format: 2.5.4
# Search words: swishdocpath=foo
# Removed stopwords:
# Number of hits: 1
# Search time: 0.004 seconds
# Run time: 0.031 seconds
1000 /tmp/foo.html "foo.html" 40
.
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Fri May 27 06:21:01 2005