>On 10/25/2007 10:07 AM, josh@relativelysane.com wrote:
>
>>
>> The weird thing is that it's grabbing and populating flavor, and I know that's
>> from the PropertyNames string because when I remove it from there, flavor isn't
>> in the dump like the one above.
>>
>
>can you copy/paste what your config and example docs look like so we can try
>and duplicate what you are seeing?
>
>--
>Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
>
Sure. They are identical to what you used in your example, except for the IndexDir line in my config. Full dumps of the files, the indexing output, the -T INDEX_ALL output, and the search query are below.
[josh@josh]# cat index.cfg
IndexDir test
ExtractPath flavor regex !test/doc-(normal|strong|href)/.*$!$1!
PropertyNames flavor strong a
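As a sanity check on the ExtractPath line, here is a rough sketch of the same substitution done in Python. It assumes Python's `re.sub` behaves like swish-e's regex replacement for this simple pattern (with `$1` written as `\1`). The helper name `extract_flavor` is made up for illustration, and the paths are the three test files from this example.

```python
import re

# Pattern and replacement taken from the ExtractPath line in index.cfg.
# swish-e writes the backreference as $1; Python writes it as \1.
PATTERN = re.compile(r"test/doc-(normal|strong|href)/.*$")


def extract_flavor(path):
    """Hypothetical helper: apply the ExtractPath-style substitution to a path."""
    return PATTERN.sub(r"\1", path)


paths = [
    "test/doc-href/docswith-ahref.html",
    "test/doc-normal/docsthatarenormal.html",
    "test/doc-strong/docswith-strong.html",
]

for p in paths:
    print(p, "->", extract_flavor(p))
```

If the assumption holds, each path should reduce to the captured directory suffix (href, normal, strong), which is what one would expect to see stored as flavor.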
[josh@josh test]# ls
doc-href doc-normal doc-strong
[josh@josh doc-href]# cat docswith-ahref.html
<html>
<head><title>real title</title></head>
<body><a href="bar">title i want</a></body>
</html>
[josh@josh doc-normal]# cat docsthatarenormal.html
<html>
<head><title>real title is the title i want</title></head>
<body><a href="bar">link text</a><strong>strong text</strong> blah </body>
</html>
[josh@josh doc-strong]# cat docswith-strong.html
<html>
<head><title>real title</title></head>
<body><strong>title I want</strong></body>
</html>
[josh@josh]# swish-e -c index.cfg
Indexing Data Source: "File-System"
Indexing "test"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 12 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
12 unique words indexed.
7 properties sorted.
3 files indexed. 344 total bytes. 25 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
[josh@josh]# swish-e -T INDEX_ALL
# Name:
# Saved as: index.swish-e
# Total Words: 12
# Total Files: 3
# Removed Files: 0
# Total Word Pos: 25
# Removed Word Pos: 0
# Indexed on: 2007-10-25 11:24:19 EDT
# Description:
# Pointer:
# Maintained by:
# MinWordLimit: 1
# MaxWordLimit: 40
# WordCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# IgnoreFirstChar:
# IgnoreLastChar:
# StopWords:
# BuzzWords:
# Stemming Applied: 0
# Soundex Applied: 0
# Fuzzy Mode: None
# IgnoreTotalWordCountWhenRanking: 1
-----> METANAMES for index.swish-e <-----
swishdefault : id= 1 type= 1 META_INDEX Rank Bias= 0
swishreccount : id= 2 type=42 META_INTERNAL META_PROP:NUMBER
swishrank : id= 3 type=42 META_INTERNAL META_PROP:NUMBER
swishfilenum : id= 4 type=42 META_INTERNAL META_PROP:NUMBER
swishdbfile : id= 5 type=38 META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
swishdocpath : id= 6 type= 6 META_PROP:STRING(case:compare) SortKeyLen: 100 *presorted*
swishtitle : id= 7 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
swishdocsize : id= 8 type=10 META_PROP:NUMBER *presorted*
swishlastmodified : id= 9 type=18 META_PROP:DATE *presorted*
flavor : id=10 type= 1 META_INDEX Rank Bias= 0
flavor : id=11 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
strong : id=12 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
a : id=13 type=70 META_PROP:STRING(case:ignore) SortKeyLen: 100 *presorted*
-----> WORD INFO in index index.swish-e <-----
blah
Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:12/9
href
Meta:10 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/1
i
Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:4/9
Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:6/7
Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:4/49
is
Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:3/7
link
Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:8/9
normal
Meta:10 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/1
real
Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/7
Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/7
Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/7
strong
Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:10/49
Meta:10 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/1
text
Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:9/9,11/49
the
Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:4/7
title
Meta:1 test/doc-href/docswith-ahref.html Freq:2 Pos/Struct:2/7,3/9
Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:2/7,5/7
Meta:1 test/doc-strong/docswith-strong.html Freq:2 Pos/Struct:2/7,3/49
want
Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:5/9
Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:7/7
Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:5/49
-----> FILES in index index.swish-e <-----
Dumping File Properties for File Number: 1
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
swishtitle: 7 ( 10) S: "real title"
swishdocsize: 8 ( 4) N: "98"
swishlastmodified: 9 ( 4) D: "2007-10-25 11:21:41 EDT"
flavor:11 ( 4) S: "href"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
swishtitle: 7 ( 10) S: "real title"
swishdocsize: 8 ( 4) N: "98"
swishlastmodified: 9 ( 4) D: "2007-10-25 11:21:41 EDT"
flavor:11 ( 4) S: "href"
Dumping File Properties for File Number: 2
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
swishtitle: 7 ( 30) S: "real title is the title i want"
swishdocsize: 8 ( 4) N: "149"
swishlastmodified: 9 ( 4) D: "2007-10-25 11:22:43 EDT"
flavor:11 ( 6) S: "normal"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
swishtitle: 7 ( 30) S: "real title is the title i want"
swishdocsize: 8 ( 4) N: "149"
swishlastmodified: 9 ( 4) D: "2007-10-25 11:22:43 EDT"
flavor:11 ( 6) S: "normal"
Dumping File Properties for File Number: 3
(No Properties)
ReadAllDocProperties:
swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
swishtitle: 7 ( 10) S: "real title"
swishdocsize: 8 ( 4) N: "97"
swishlastmodified: 9 ( 4) D: "2007-10-25 11:23:35 EDT"
flavor:11 ( 6) S: "strong"
ReadSingleDocPropertiesFromDisk:
swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
swishtitle: 7 ( 10) S: "real title"
swishdocsize: 8 ( 4) N: "97"
swishlastmodified: 9 ( 4) D: "2007-10-25 11:23:35 EDT"
flavor:11 ( 6) S: "strong"
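To make the dump above easier to scan, the sketch below collects the property names stored for each file and reports which of the configured PropertyNames are absent. The sample lines are copied from the ReadAllDocProperties sections for files 1 and 3. The parsing regex and the name `stored_properties` are assumptions about the dump layout, not part of swish-e.

```python
import re

# A few lines copied from the -T INDEX_ALL dump (ReadAllDocProperties only).
DUMP = """\
Dumping File Properties for File Number: 1
swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
swishtitle: 7 ( 10) S: "real title"
flavor:11 ( 4) S: "href"
Dumping File Properties for File Number: 3
swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
swishtitle: 7 ( 10) S: "real title"
flavor:11 ( 6) S: "strong"
"""

# Assumed layout: name, colon, property id, (length), type letter, quoted value.
PROP_LINE = re.compile(r'^(\w+):\s*\d+\s*\(\s*\d+\)\s*[SND]:\s*"(.*)"$')
FILE_HEADER = re.compile(r"Dumping File Properties for File Number: (\d+)")


def stored_properties(dump):
    """Return {file number: {property name: value}} for the given dump text."""
    files = {}
    current = None
    for line in dump.splitlines():
        header = FILE_HEADER.match(line)
        if header:
            current = int(header.group(1))
            files[current] = {}
            continue
        prop = PROP_LINE.match(line.strip())
        if prop and current is not None:
            files[current][prop.group(1)] = prop.group(2)
    return files


props = stored_properties(DUMP)
for number, values in props.items():
    # "strong" and "a" are the other two names on the PropertyNames line.
    missing = [name for name in ("strong", "a") if name not in values]
    print(number, sorted(values), "missing:", missing)
```

In the sample lines, flavor is the only one of the three configured properties that has a stored value for either file.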
[josh@josh]# swish-e -w title AND flavor=strong -x '"<strong>" "<swishtitle>" "<flavor>"\n'
# SWISH format: 2.4.5
# Search words: title AND flavor=strong
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.009 seconds
"" "real title" "strong"
josh
Received on Thu Oct 25 11:29:01 2007