Re: [swish-e] Parsing plain text emails to use the subject line as the title

From: Troy Wical <troy(at)not-real.wical.com>
Date: Fri Mar 26 2010 - 16:37:19 GMT
On Fri, 26 Mar 2010 09:02:00 -0500, Peter Karman <peter@peknet.com> wrote:
> 
> another way to test for $VERSION is to do this:
> 
>  % perl -e 'use SWISH::Prog 999'
> SWISH::Prog version 999 required--this is only version 0.44 at -e line 1.

[root@purple /home/mail-archive/search/index.swish3]# perl -e 'use SWISH::Prog 999'
SWISH::Prog version 999 required--this is only version 0.44_01 at -e line 1.

> Did you run 'make install' after the 'make test' for SWISH::Prog?
> Do you have multiple copies of perl installed, and make install into a
> different version's tree than what swish3 is using?

*sigh* No, I forgot to install. Guess I can't argue with an easy fix,
because we have success now!

############################################
############################################
[root@purple /home/mail-archive/search]# swish3 -d -v 3 -S mailfs -c test.conf -i /home/mail-archive/test/
{
Debug           => 0,
Format          => "native",
Headers         => 1,
Limit           => [],
Merge           => undef,
Source          => "fs",
Version         => 0,
Warnings        => 2,
aggregator      => "mailfs",
begin           => 0,
config          => "test.conf",
debug           => 1,
extended_output => undef,
folder          => "index.swish3",
help            => 0,
indexer         => "native",
input           => 1,
invindex        => "index.swish3",
links           => 0,
max             => undef,
newer_than      => undef,
query           => "",
sort_order      => "",
test_mode       => 0,
verbose         => 3,
}
creating indexer: SWISH::Prog::Native::Indexer at /usr/local/lib/perl5/site_perl/5.8.8/SWISH/Prog.pm line 114.
creating aggregator: SWISH::Prog::Aggregator::MailFS at /usr/local/lib/perl5/site_perl/5.8.8/SWISH/Prog.pm line 139.
do {
my $a = bless({
_start     => 1269621376,
aggregator => bless({
_ext_re          =>
qr/\.(html|htm|xml|txt|pdf|ps|doc|ppt|xls|mp3|css|ico|js|php)(\.gz)?/i,
_mailer          => bless({
_start => 1269621377,
debug => 1,
doc_class => "SWISH::Prog::Doc",
indexer => bless({
_start    => 1269621376,
config    => bless({
DefaultContents                   => ["TXT*"],
"IgnoreTotalWordCountWhenRanking" => [0],
IndexDir                          => ["/home/mail-archive/test/"],
IndexFile                         =>
["/home/mail-archive/search/test.index"],
IndexReport                       => [1],
MetaNameAlias                     => ["swishdefault mail"],
MetaNames                         => { url => 1 },
PropertyNames                     => { url => 1 },
ReplaceRules                      => [
"replace \"/home/mail-archive/\" \"http://type2.com/mail-archives/\"",
],
StoreDescription                  => ["XML* <body>"],
_start                            => 1269621376,
debug                             => 0,
verbose                           => 0,
}, "SWISH::Prog::Config"),
debug     => 1,
exe       => "swish-e",
invindex  => bless({
_start  => 1269621376,
clobber => 0,
debug   => 0,
file    => bless({
dir => bless({ dirs => ["index.swish3"], file_spec_class => undef, volume
=> "" }, "Path::Class::Dir"),
file => "index.swish-e",
file_spec_class => undef,
}, "Path::Class::File"),
path    => bless({ dirs => ["index.swish3"], file_spec_class => undef,
volume => "" }, "Path::Class::Dir"),
verbose => 0,
}, "SWISH::Prog::Native::InvIndex"),
test_mode => 0,
verbose   => 3,
}, "SWISH::Prog::Native::Indexer"),
progress_size => 1000,
swish_filter_obj => bless({
doc_class    => "SWISH::Filter::Document",
filters      => [
bless({
_mimetypes => bless({}, "SWISH::Filter::MIMETypes"),
gz => { perl => 1 },
mimetypes => [qr|application/x-gzip|],
type => 1,
}, "SWISH::Filters::Decompress"),
],
mimetypes    => bless({}, "SWISH::Filter::MIMETypes"),
skip_filters => {},
}, "SWISH::Filter"),
verbose => 3,
}, "SWISH::Prog::Aggregator::Mail"),
_start           => 1269621377,
_swish3          => bless(do{\(my $o = 674251808)}, "SWISH::3"),
debug            => 1,
doc_class        => "SWISH::Prog::Doc",
indexer          => 'fix',
progress_size    => 1000,
swish_filter_obj => bless({
doc_class    => "SWISH::Filter::Document",
filters      => [
bless({
_mimetypes => bless({}, "SWISH::Filter::MIMETypes"),
gz => { perl => 1 },
mimetypes => [qr|application/x-gzip|],
type => 1,
}, "SWISH::Filters::Decompress"),
],
mimetypes    => bless({}, "SWISH::Filter::MIMETypes"),
skip_filters => {},
}, "SWISH::Filter"),
test_mode        => 0,
verbose          => 3,
}, "SWISH::Prog::Aggregator::MailFS"),
config     => "test.conf",
debug      => 1,
indexer    => 'fix',
invindex   => "index.swish3",
test_mode  => 0,
verbose    => 3,
}, "SWISH::Prog");
$a->{aggregator}{indexer} = $a->{aggregator}{_mailer}{indexer};
$a->{indexer} = $a->{aggregator}{_mailer}{indexer};
$a;
} at /usr/local/bin/swish3 line 186
opening: swish-e  -f index.swish3/index.swish-e -v3 -W0 -S prog -i stdin -c /tmp/jnBtN1qeon at /usr/local/lib/perl5/site_perl/5.8.8/SWISH/Prog.pm line 197
checking dir /home/mail-archive/test
/home/mail-archive/test -> ok
crawling /home/mail-archive/test
checking file /home/mail-archive/test/00
/home/mail-archive/test/00 -> ok
Parsing config file '/tmp/jnBtN1qeon'
Indexing Data Source: "External-Program"
Indexing "stdin"
checking file /home/mail-archive/test/01
/home/mail-archive/test/01 -> ok
checking file /home/mail-archive/test/02
/home/mail-archive/test/02 -> ok
/home/mail-archive/test.167.8a5fbc8.2999a7b9(-at-)aol.com - Using XML2 parser -  (485 words)
checking file /home/mail-archive/test/03
/home/mail-archive/test/03 -> ok

############################################
<lots of beautiful success snipped here>
############################################

checking file /home/mail-archive/test/97
/home/mail-archive/test.22155-3C6A94CB-324(-at-)storefull-136.iap.bryant.webtv.net - Using XML2 parser -  (158 words)
/home/mail-archive/test/97 -> ok
/home/mail-archive/test.00c801c1b4b3$49eb7d40$5311080a(-at-)arlut.utexas.edu - Using XML2 parser -  (130 words)
/home/mail-archive/test.OF40B6C651.86127B4F-ON85256B5F.0060590B(-at-)cambridgeassociates.com - Using XML2 parser -  (168 words)
checking file /home/mail-archive/test/98
/home/mail-archive/test/98 -> ok
checking file /home/mail-archive/test/99
/home/mail-archive/test/99 -> ok
/home/mail-archive/test.619F54FD5EA8D311873600805F4C3CAFCBEF6C(-at-)pplant.ucdavis.edu - Using XML2 parser -  (103 words)
/home/mail-archive/test.20020213174137.89629.qmail(-at-)web9010.mail.yahoo.com - Using XML2 parser -  (70 words)
/home/mail-archive/test.OFE13930C9.E0C92B2F-ON85256B5F.00609791(-at-)cambridgeassociates.com - Using XML2 parser -  (333 words)
/home/mail-archive/test.AICFIFGMPIHEBBAA(-at-)mailcity.com - Using XML2 parser -  (112 words)

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2,782 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
2,782 unique words indexed.
6 properties sorted.                                              
100 files indexed.  153,686 total bytes.  21,786 total words.
Elapsed time: 00:00:01 CPU time: 00:00:00
Indexing done!
100 documents in 00:00:01
############################################
############################################
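
A run like this is easier to check if the log is captured and summarized instead of read line by line. The sketch below assumes the output was saved to a file (index.log is a made-up name) and that each parsed message yields one "Using XML2 parser" line, as in the run above. `summarize_index_log` is a hypothetical helper, not part of swish3 or swish-e.

```shell
#!/bin/sh
# summarize_index_log is a hypothetical helper, not part of swish3.
# Reads an indexing log on stdin and prints:
#   - how many documents were reported as handed to the XML2 parser
#   - the totals line printed at the end of the run, if present
summarize_index_log() {
    awk '
        /Using XML2/      { parsed++ }       # one line per parsed document
        /files indexed\./ { totals = $0 }    # e.g. "100 files indexed. ..."
        END {
            printf "parsed documents: %d\n", parsed
            if (totals != "") print totals
        }
    '
}

# Example: summarize_index_log < index.log
```

Comparing the parsed-document count with the "files indexed" total is a rough sanity check that nothing was silently skipped.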

There's still a way to go here, I'm sure, but this has to be a good step.

Thanks,
Troy
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Fri Mar 26 12:40:35 2010