We use swish to index a Linux knowledgebase that supports 30 engineers across Australia. Some of the documentation stores phone numbers for each customer. Because these are engineering notes, a phone number may be captured in several different formats: full IDD format +61396260437, local internal format 00396260437, without the dial prefix 0396260437, or without the state code 96260437. Searching for the local extension 60437 therefore finds no hits, because swish matches whole indexed words and cannot search for *60437: leading wildcards are not permitted.

I would like to table a simple way to add searching with leading wildcards. The suggested method is to search only through the keys (the indexed words) using some code; a sample is attached below. Once we have a key that contains a match, we can use that found key to execute a normal search, without a major rewrite.
Please note the attached code came from the C Users Journal originally; I have just tuned it to handle start-of-word and end-of-word anchors, as grep does with the ^ and $ characters.
The attached code can be used freely without attribution.
We have tried all the major search engines, and none can search with a leading wildcard.
Please consider this code and the suggestion: adding this functionality would give swish a unique feature and make the other search suppliers play catch-up.
Rgds Ron Soutter....
From: users-bounces(at)not-real.lists.swish-e.org [mailto:users-bounces(at)not-real.lists.swish-e.org] On Behalf Of Peter Karman
Sent: Sunday, 8 July 2012 11:34 AM
Subject: Re: [swish-e] Ok - I do not understand what the problem is
Joseph Norris wrote on 6/29/12 6:12 PM:
> I have the following:
> swish-e -w "China Confirms Gold Price Suppression" -s names desc title desc -f /home/cs/sw/swishe/gold/i
> No hits
> however when I do this:
> grep -l "China Confirms Gold Price Suppression" /home/cs/sw/swishe/gold/*
> I get a hit
Try reducing your sample size to one doc, with the -T options as David suggests.
Here's a small self-contained example. I suggest you put together a similar
example of your own, which IME will often reveal the problem.
[karpet(at)not-real.pekmac:~/tmp/s]$ swish-e -c conf
Indexing Data Source: "File-System"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 8 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
8 unique words indexed.
6 properties sorted.
1 file indexed. 96 total bytes. 8 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
[karpet(at)not-real.pekmac:~/tmp/s]$ swish-e -w "China Confirms Gold Price Suppression" -s names desc title
# SWISH format: 2.5.8
# Search words: China Confirms Gold Price Suppression
# Removed stopwords:
# Number of hits: 1
# Search time: 0.002 seconds
# Run time: 0.009 seconds
1000 docs/china.xml "china.xml" 96
[karpet(at)not-real.pekmac:~/tmp/s]$ cat docs/china.xml
<title>China Confirms Gold Price Suppression</title>
<names>foo bar baz</names>
[karpet(at)not-real.pekmac:~/tmp/s]$ cat conf
IndexContents XML* .xml
PropertyNames names title
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Users mailing list
Received on Mon Jul 09 2012 - 01:11:10 GMT