Re: [swish-e] Using ExtractPath to Exclude Some Subdirectory from Search Result

From: Ronny Rahardjo <rrahardjo(at)not-real.gmail.com>
Date: Fri Sep 18 2009 - 22:48:01 GMT
Hi Peter,

Please ignore my question no. 1. I was able to figure out which spider.pl is
being called. However, could you please let me know how I can check whether my
spider.pl is using spiderconfig.pl? I found spiderconfig.pl in the same
folder as swish.config, but I don't see any reference to it in spider.pl.

Secondly, how can I exclude "a href=#tab" links in spider.pl?

Thanks.
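
A test_url callback of the kind described in the spider.pl documentation might look like the sketch below. It is a config fragment, not a tested setup: the base_url and email values are placeholders, and the file name is only an example. It assumes the callback receives a URI object and should return true for URLs that are to be spidered. Note that spider.pl may already drop the "#..." fragment when it resolves links, in which case these tab anchors simply point back at the same page and the fragment check never fires.

```perl
# Hypothetical spider config file (for example, SwishSpiderConfig.pl).
# base_url and email are placeholders, not values from this thread.
@servers = (
    {
        base_url => 'http://www.example.com/',
        email    => 'webmaster@example.com',

        # Called for each URL found; a false return value skips the URL.
        test_url => sub {
            my ( $uri, $server ) = @_;

            # Skip in-page tab anchors such as "#tab1_1".
            my $fragment = $uri->fragment;
            return 0 if defined $fragment && $fragment =~ /^tab/;

            return 1;
        },
    },
);

1;
```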

On Fri, Sep 18, 2009 at 11:37 AM, Ronny Rahardjo <rrahardjo@gmail.com> wrote:

> Hi Peter,
>
> Thanks for your help on this. Now I can narrow down the issue, but I have a
> few questions:
>
> 1. How can I find out which config my runindex.bat is calling? The
> problem is that when I run the command swish-e -c swish.config, it complains
> about my spider.pl (because of an incorrect path). However, my scheduled task
> for runindex.bat runs just fine. So I need to know whether it really executes
> spider.pl or something else.
>
> 2. Here is some of the content at my base URL that may be causing the issue,
> which is why I want to exclude it from indexing using test_url:
>     <div class="tabset">
>        <ul>
>         <li><a href="#tab1_1" class="tab active"><span>Content</span></a></li>
>         <li><a href="#tab1_2" class="tab"><span>Content</span></a></li>
>         <li><a href="#tab1_3" class="tab"><span>Content</span></a></li>
>        </ul>
>       </div>
>       <!-- tab 1 tabset-one -->
>       <div class="tab innovaton tab-box" id="tab1_1">
>        <div class="tab-holder">
>         <strong class="replace">Title</strong>
>         <p><a href="news/1250.html"><u>Content</u></a
>       </div>
>        <a href="/news/11223.html" class="read-more-btn">Read More</a>
>       </div>
>
> I think the issue is with the tabset (JavaScript), so I want to exclude it
> from my indexing. Could you please let me know how to exclude any <a
> href="#tab"> links using test_url? Or do you have any other method that can
> exclude them?
>
> Thanks.
>   On Fri, Sep 18, 2009 at 7:41 AM, Peter Karman <peter@peknet.com> wrote:
>
>> Ronny Rahardjo wrote on 09/17/2009 07:07 PM:
>> > I have swishe.config under my installation folder C:\SWISH-E\bin, which
>> > has IncludeConfigFile common.config.
>> > Does that mean my configuration file is common.config?
>>
>> per your other thread, I now know you are using spider.pl. If you want
>> to exclude certain URLs from being indexed, look at
>>
>> http://swish-e.org/docs/spider.html#test_url
>>
>> --
>>  Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
>> _______________________________________________
>> Users mailing list
>> Users@lists.swish-e.org
>> http://lists.swish-e.org/listinfo/users
>>
>
>


Received on Fri Sep 18 18:48:03 2009