Re: [swish-e] Parsing plain text emails to use the subject line as the title

From: Troy Wical <troy(at)not-real.wical.com>
Date: Fri Mar 26 2010 - 04:31:10 GMT
On Mar 25, 2010, at 9:19 PM, Peter Karman wrote:

> Just to be sure, I just bumped SWISH::Prog svn trunk version to 0.44_01 so if
> you would, make sure you are running that. If you have r2926 installed, that's
> the same code, but it will report version 0.44.

A couple of errors showed up. Output below.

#############################################################
#############################################################
[~/swish-prog-2927]# perl Makefile.PL && make test
include /usr/home/swish-prog-2927/inc/Module/Install.pm
include inc/Module/Install/Metadata.pm
include inc/Module/Install/Base.pm
include inc/Module/Install/Scripts.pm
include inc/Module/Install/Makefile.pm
include inc/Module/Install/MakeMaker.pm
Checking if your kit is complete...
Warning: the following files are missing in your kit:
         META.yml
Please inform the author.
Warning: prerequisite ExtUtils::MakeMaker 6.42 not found. We have unknown version.
Writing Makefile for SWISH::Prog
Writing META.yml
cp lib/SWISH/Prog/Doc.pm blib/lib/SWISH/Prog/Doc.pm
cp lib/SWISH/Prog.pm blib/lib/SWISH/Prog.pm
cp lib/SWISH/Prog/Results.pm blib/lib/SWISH/Prog/Results.pm
cp lib/SWISH/Prog/Queue.pm blib/lib/SWISH/Prog/Queue.pm
cp lib/SWISH/Prog/Aggregator/DBI.pm blib/lib/SWISH/Prog/Aggregator/DBI.pm
cp lib/SWISH/Prog/Native/Indexer.pm blib/lib/SWISH/Prog/Native/Indexer.pm
cp lib/SWISH/Prog/Aggregator/MailFS.pm blib/lib/SWISH/Prog/Aggregator/MailFS.pm
cp lib/SWISH/Prog/InvIndex.pm blib/lib/SWISH/Prog/InvIndex.pm
cp lib/SWISH/Prog/Aggregator/FS.pm blib/lib/SWISH/Prog/Aggregator/FS.pm
cp lib/SWISH/Prog/Aggregator/Object.pm blib/lib/SWISH/Prog/Aggregator/Object.pm
cp lib/SWISH/Prog/Cache.pm blib/lib/SWISH/Prog/Cache.pm
cp lib/SWISH/Prog/Aggregator.pm blib/lib/SWISH/Prog/Aggregator.pm
cp lib/SWISH/Prog/InvIndex/Meta.pm blib/lib/SWISH/Prog/InvIndex/Meta.pm
cp lib/SWISH/Prog/Indexer.pm blib/lib/SWISH/Prog/Indexer.pm
cp lib/SWISH/Prog/Config.pm blib/lib/SWISH/Prog/Config.pm
cp lib/SWISH/Prog/Aggregator/Spider/UA.pm blib/lib/SWISH/Prog/Aggregator/Spider/UA.pm
cp lib/SWISH/Prog/Utils.pm blib/lib/SWISH/Prog/Utils.pm
cp lib/SWISH/Prog/Headers.pm blib/lib/SWISH/Prog/Headers.pm
cp lib/SWISH/Prog/Class.pm blib/lib/SWISH/Prog/Class.pm
cp lib/SWISH/Prog/Native/Result.pm blib/lib/SWISH/Prog/Native/Result.pm
cp lib/SWISH/Prog/Result.pm blib/lib/SWISH/Prog/Result.pm
cp lib/SWISH/Prog/Aggregator/Spider.pm blib/lib/SWISH/Prog/Aggregator/Spider.pm
cp lib/SWISH/Prog/Native/Searcher.pm blib/lib/SWISH/Prog/Native/Searcher.pm
cp lib/SWISH/Prog/Searcher.pm blib/lib/SWISH/Prog/Searcher.pm
cp lib/SWISH/Prog/Aggregator/Mail.pm blib/lib/SWISH/Prog/Aggregator/Mail.pm
cp lib/SWISH/Prog/Native/InvIndex.pm blib/lib/SWISH/Prog/Native/InvIndex.pm
cp examples/swish3 blib/script/swish3
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/swish3
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'inc', 'blib/lib', 'blib/arch')" t/*.t
t/00_synopsis.t ... 1/5 # testing SWISH::Prog version 0.44_01
# 2.4.5 installed
# doc filter on t/test.pdf
# doc filter on t/test.html
# doc filter on t/test.pdf.gz
skipping t/test.pdf.gz - filtering error
# doc filter on t/test.xml
# doc filter on t/test2.html
t/00_synopsis.t ... ok
t/01_fs.t ......... 1/11 skipping t/test.pdf.gz - filtering error
t/01_fs.t ......... ok
t/02_dbi.t ........ ok
t/03_object.t ..... 1/15 # missing installed module: Can't locate YAML/Syck.pm in @INC (@INC contains: /usr/home/swish-prog-2927/inc /usr/home/swish-prog-2927/blib/lib /usr/home/swish-prog-2927/blib/arch /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/lib/perl5/site_perl/5.8.8/mach /usr/local/lib/perl5/site_perl/5.8.8 /usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.8.8/mach /usr/local/lib/perl5/5.8.8 .) at /usr/home/swish-prog-2927/blib/lib/SWISH/Prog/Aggregator/Object.pm line 8.
# BEGIN failed--compilation aborted at /usr/home/swish-prog-2927/blib/lib/SWISH/Prog/Aggregator/Object.pm line 8.
# Compilation failed in require at (eval 75) line 2.
# BEGIN failed--compilation aborted at (eval 75) line 2.
t/03_object.t ..... ok
t/04_mail.t ....... ok
t/05_spider.t ..... # set TEST_SPIDER env var to test the spider
t/05_spider.t ..... ok
t/06_config.t ..... ok
t/08_meta.t ....... ok
t/09_xml.t ........ ok
t/10_config.t ..... ok
t/11-leak-test.t .. skipped: require Test::LeakTrace
t/12-merge.t ...... ok
t/13_mail_fs.t .... ok
t/pod-coverage.t .. skipped: Test::Pod::Coverage 1.04 required for testing POD coverage
t/pod.t ........... ok
All tests successful.
Files=15, Tests=122, 10 wallclock secs ( 0.29 usr  0.10 sys +  8.04 cusr  1.34 csys =  9.77 CPU)
Result: PASS
#############################################################
#############################################################

I'm not certain that I am looking in the right spot, but I'm not seeing a
reference to 0.44_01 in the installed modules:

#############################################################
#############################################################
[/usr/local/lib/perl5/site_perl/5.8.8/SWISH/Prog]# grep VERSION *
Aggregator.pm:our $VERSION = '0.44';
Cache.pm:our $VERSION = '0.44';
Class.pm:our $VERSION = '0.44';
Config.pm:our $VERSION = '0.44';
Doc.pm:our $VERSION = '0.44';
Headers.pm:our $VERSION = '0.44';
Indexer.pm:our $VERSION = '0.44';
InvIndex.pm:our $VERSION = '0.44';
Queue.pm:our $VERSION = '0.44';
Result.pm:our $VERSION = '0.44';
Results.pm:our $VERSION = '0.44';
Searcher.pm:our $VERSION = '0.44';
Utils.pm:our $VERSION = '0.44';
#############################################################
#############################################################
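
The test run above loads its modules from blib/lib (see the test_harness line), while the grep looks at the installed copy under site_perl, so the two can report different versions until the new build is actually installed. A minimal sketch for comparing the two trees side by side follows; list_versions is a hypothetical helper, not part of SWISH::Prog, and it assumes each module declares its version on a line of the form our $VERSION = '...'.

```shell
#!/bin/sh
# Hypothetical helper (not part of SWISH::Prog): print each distinct
# $VERSION string declared by the .pm files under a directory tree.
# Assumes declarations look like:  our $VERSION = '0.44';
list_versions() {
    find "$1" -name '*.pm' -exec grep -h 'our \$VERSION' {} + 2>/dev/null |
        sed -E "s/.*VERSION *= *['\"]([^'\"]+)['\"].*/\1/" |
        sort -u
}

# Example usage (paths taken from the output above; adjust to your system):
#   list_versions blib/lib
#   list_versions /usr/local/lib/perl5/site_perl/5.8.8/SWISH
```

To see which copy perl itself picks up, `perl -MSWISH::Prog -e 'print SWISH::Prog->VERSION, "\n"'` prints the version of whichever SWISH::Prog comes first in @INC.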

Then I ran the following command, where "01" refers to this exact
file: http://type2.com/mail-archives/test/01

swish3 -d -v 3 -S mailfs -c test.conf -i /home/mail-archive/test/01

I still got a core file. Since I've already bloated this email with output,
I'll include the debug info here as well.

############################################################
############################################################
[/home/mail-archive/search]# swish3 -d -v 3 -S mailfs -c test.conf -i /home/mail-archive/test/01
{
   Debug           => 0,
   Format          => "native",
   Headers         => 1,
   Limit           => [],
   Merge           => undef,
   Source          => "fs",
   Version         => 0,
   Warnings        => 2,
   aggregator      => "mailfs",
   begin           => 0,
   config          => "test.conf",
   debug           => 1,
   extended_output => undef,
   folder          => "index.swish3",
   help            => 0,
   indexer         => "native",
   input           => 1,
   invindex        => "index.swish3",
   links           => 0,
   max             => undef,
   newer_than      => undef,
   query           => "",
   sort_order      => "",
   test_mode       => 0,
   verbose         => 3,
}
creating indexer: SWISH::Prog::Native::Indexer at /usr/local/lib/perl5/site_perl/5.8.8/SWISH/Prog.pm line 114.
creating aggregator: SWISH::Prog::Aggregator::MailFS at /usr/local/lib/perl5/site_perl/5.8.8/SWISH/Prog.pm line 139.
do {
  my $a = bless({
    _start     => 1269577712,
    aggregator => bless({
      _ext_re          => qr/\.(html|htm|xml|txt|pdf|ps|doc|ppt|xls|mp3|css|ico|js|php)(\.gz)?/i,
      _mailer          => bless({
        _start => 1269577712,
        debug => 1,
        doc_class => "SWISH::Prog::Doc",
        indexer => bless({
          _start    => 1269577712,
          config    => bless({
            DefaultContents                   => ["TXT*"],
            "IgnoreTotalWordCountWhenRanking" => [0],
            IndexDir                          => ["/home/mail-archive/test/"],
            IndexFile                         => ["/home/mail-archive/search/test.index"],
            IndexReport                       => [1],
            MetaNameAlias                     => ["swishdefault mail"],
            MetaNames                         => { url => 1 },
            PropertyNames                     => { url => 1 },
            ReplaceRules                      => [
              "replace \"/home/mail-archive/\" \"http://type2.com/mail-archives/\"",
            ],
            StoreDescription                  => ["XML* <body>"],
            _start                            => 1269577712,
            debug                             => 0,
            verbose                           => 0,
          }, "SWISH::Prog::Config"),
          debug     => 1,
          exe       => "swish-e",
          invindex  => bless({
            _start  => 1269577712,
            clobber => 0,
            debug   => 0,
            file    => bless({
              dir             => bless({ dirs => ["index.swish3"], file_spec_class => undef, volume => "" }, "Path::Class::Dir"),
              file            => "index.swish-e",
              file_spec_class => undef,
            }, "Path::Class::File"),
            path    => bless({ dirs => ["index.swish3"], file_spec_class => undef, volume => "" }, "Path::Class::Dir"),
            verbose => 0,
          }, "SWISH::Prog::Native::InvIndex"),
          test_mode => 0,
          verbose   => 3,
        }, "SWISH::Prog::Native::Indexer"),
        progress_size => 1000,
        swish_filter_obj => bless({
          doc_class    => "SWISH::Filter::Document",
          filters      => [
            bless({
              _mimetypes => bless({}, "SWISH::Filter::MIMETypes"),
              gz => { perl => 1 },
              mimetypes => [qr|application/x-gzip|],
              type => 1,
            }, "SWISH::Filters::Decompress"),
          ],
          mimetypes    => bless({}, "SWISH::Filter::MIMETypes"),
          skip_filters => {},
        }, "SWISH::Filter"),
        verbose => 3,
      }, "SWISH::Prog::Aggregator::Mail"),
      _start           => 1269577712,
      _swish3          => bless(do{\(my $o = 674251808)}, "SWISH::3"),
      debug            => 1,
      doc_class        => "SWISH::Prog::Doc",
      indexer          => 'fix',
      progress_size    => 1000,
      swish_filter_obj => bless({
        doc_class    => "SWISH::Filter::Document",
        filters      => [
          bless({
            _mimetypes => bless({}, "SWISH::Filter::MIMETypes"),
            gz => { perl => 1 },
            mimetypes => [qr|application/x-gzip|],
            type => 1,
          }, "SWISH::Filters::Decompress"),
        ],
        mimetypes    => bless({}, "SWISH::Filter::MIMETypes"),
        skip_filters => {},
      }, "SWISH::Filter"),
      test_mode        => 0,
      verbose          => 3,
    }, "SWISH::Prog::Aggregator::MailFS"),
    config     => "test.conf",
    debug      => 1,
    indexer    => 'fix',
    invindex   => "index.swish3",
    test_mode  => 0,
    verbose    => 3,
  }, "SWISH::Prog");
  $a->{aggregator}{indexer} = $a->{aggregator}{_mailer}{indexer};
  $a->{indexer} = $a->{aggregator}{_mailer}{indexer};
  $a;
} at /usr/local/bin/swish3 line 186
opening: swish-e  -f index.swish3/index.swish-e -v3 -W0 -S prog -i stdin -c /tmp/utF5I2S82j at /usr/local/lib/perl5/site_perl/5.8.8/SWISH/Prog.pm line 197
checking file /home/mail-archive/test/01
   /home/mail-archive/test/01 -> ok
Bad realloc() ignored at /usr/local/lib/perl5/site_perl/5.8.8/SWISH/Prog/Aggregator/FS.pm line 273.
Parsing config file '/tmp/utF5I2S82j'
Indexing Data Source: "External-Program"
Indexing "stdin"

Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.
Segmentation fault: 11 (core dumped)
############################################################
############################################################

Thanks again. Your patience is quite astounding.

Troy
Received on Fri Mar 26 00:31:17 2010