FW: Memory Problems while indexing

From: Klingensmith, Rick <klingensmith(at)>
Date: Thu Sep 11 2003 - 15:58:30 GMT
There is a missing statement in this post. It must be one of those Microsoft
Outlook issues. Plus, we use rich text and HTML, and I forgot to change it to
plain text.

This is the root problem.

We've received an error on our last two attempts to index our website.
The error reads: The instruction at "0x0041cca7" referenced memory at
"0x00000000". The memory could not be "read".


I experienced this problem when installing SWISH-E and resolved it by using
the -e option and pointing TmpDir at a drive that has lots of space
(almost 100GB). Now it appears to be an issue again. I've also tried
indexing in a command window using my admin account and receive the same
error. We use the following command to run the index:

D:\ProgramFiles\Swish-E\swish-e -S http -e -c D:\ProgramFiles\Swish-E\conf\siteindex.config

The config file looks like this:


# ----- SiteIndex.config - Spider using "http" method -------

#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at

#  This example demonstrates how to use
#  the "http" method of spidering.

#  Indexing (spidering) is started with the following
#  command issued from the "d:\Program Files\Swish-e" directory:

#     swish-e -S http -c Siteindex.config

#  Note: You should have the current Bundle::LWP bundle
#  of perl modules installed.  This was tested with:
#     libwww-perl-5.53

#  ** Do not spider a web server without permission **


# Include our site-wide configuration settings:

IncludeConfigFile D:/ProgramFiles/Swish-E/conf/Settings.config

# Specify the URL (or URLs) to index:


# If a server goes by more than one name you can use this directive:

# EquivalentServer

# This defines how many links the spider should
# follow before stopping.  A value of 0 configures the spider to
# traverse all links. The default is 5
# The idea is to limit spidering, but seems of questionable use
# since depth may not be related to anything useful.

MaxDepth 10

# The number of seconds to wait between issuing
# requests to a server.  The default is 60 seconds.

Delay 1

# Skip pages with Meta tag "noindex"

obeyRobotsNoIndex yes

# (default /var/tmp)  The location of a writeable temp directory
# on your system.  The HTTP access method tells the Perl helper to place
# its files there.  The default is defined in src/config.h and depends on
# the current OS.

TmpDir D:/Inetpub/Indexes/Temp

# The "http" method uses a perl helper program to fetch each document
# from the web called "swishspider" and is included in the src directory of
# the swish-e distribution.

SpiderDirectory D:/ProgramFiles/Swish-E

# Put the index files in the Inetpub/Indexes directory

IndexFile D:/Inetpub/Indexes/SiteIndex.New.index

# end of SiteIndex Config file


I am receiving the following warning in my log files from the indexing job:

Warning: Configuration setting for TmpDir 'D:/Inetpub/Indexes/Temp' will be
overridden by environment setting 'C:\DOCUME~1\rek\LOCALS~1\Temp' which does
not exist.

When I look in the temp directory specified in the config, I find SWISH-E
work files, so I'm not sure whether this is a problem or not.


The summaries of the last good index on 9/8 look like:

1468 files indexed.  39839610 total bytes.  810188 total words.
Elapsed time: 00:32:05 CPU time: 00:32:05
Indexing done!
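
To compare a good run against a failing one, the summary line can be pulled out of the log and reduced to a few numbers. This is a rough Python sketch that assumes the summary always has the exact wording shown above; parse_summary is a hypothetical helper, not something swish-e provides.

```python
import re

# Matches the summary line as it appears in the log above.
SUMMARY_RE = re.compile(
    r"(\d+) files indexed\.\s+(\d+) total bytes\.\s+(\d+) total words\."
)


def parse_summary(line):
    """Return the file, byte, and word counts from a summary line, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    files, total_bytes, words = (int(group) for group in match.groups())
    return {"files": files, "bytes": total_bytes, "words": words}


summary = parse_summary(
    "1468 files indexed.  39839610 total bytes.  810188 total words."
)
if summary and summary["files"]:
    # Per-file averages give a quick feel for how large the crawl is.
    print("average bytes per file:", summary["bytes"] / summary["files"])
    print("average words per file:", summary["words"] / summary["files"])
```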


We are using the latest Windows version of Swish-e on a Windows 2000 server.


The archives and FAQ point to the -e option to fix memory issues. What have
I missed?




Richard Klingensmith
MSU Human Resources Information Systems
1407 S. Harrison Road Ste. 40
East Lansing, MI 48823
(517) 432-4636 ext. 155


