[swish-e] mod_deflate & Warning: document' has no content

From: Sheridan Small <SSmall(at)not-real.cityplym.ac.uk>
Date: Thu Apr 16 2009 - 10:36:55 GMT
Hi,

I don't know if this is an Apache issue or something to do with our Swish-e config.

We have had Swish-e working fine for some months.
However, we recently added Apache mod_deflate to our web server and Swish-e broke.

Without the following Location directive, Swish-e indexes our website fine.

<Location />
SetOutputFilter DEFLATE

# The next line is a workaround; it works if it is uncommented.
# BrowserMatch swish no-gzip

# Don't compress images, pdf, doc, exe, gz, gz2, sit, rar
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.doc$ no-gzip dont-vary
SetEnvIfNoCase Request_URI  \
\.(?:exe|t?gz|zip|gz2|sit|rar)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
</Location>

Other Apache settings that may be relevant
(although I have also tested with them commented out):

AddEncoding x-compress .Z
AddEncoding x-gzip .gz .tgz
AddType application/x-tar .tgz

We have a workaround: the BrowserMatch line that is commented out in the directive.

With the workaround, or without the directive, Swish-e indexes 3,411 documents (the whole website).

Summary for: http://www.cityplym.ac.uk 
             Connection: Close:         49  (0.1/sec)
        Connection: Keep-Alive:      3,393  (7.5/sec)
                    Duplicates:    115,967  (255.4/sec)
            Location Redirects:         21  (0.0/sec)
                MD5 Duplicates:         13  (0.0/sec)
                Off-site links:     14,588  (32.1/sec)
                       Skipped:         28  (0.1/sec)
                   Total Bytes: 39,183,136  (86306.5/sec)
                    Total Docs:      3,411  (7.5/sec)
                   Unique URLs:      3,463  (7.6/sec)
application/msword->text/plain:        132  (0.3/sec)
    application/pdf->text/html:         47  (0.1/sec)
                     text/html:      3,232  (7.1/sec)


With the Location directive and without the workaround, it indexes 27 documents and the debug output looks like this:


vvvvvvvvvvvvvvvv HEADERS for http://www.cityplym.ac.uk/index.php?h=equality vvvvvvvvvvvvvvvvvvvvv

---- Request ------
GET http://www.cityplym.ac.uk/index.php?h=equality 
Accept-Encoding: gzip, x-gzip, deflate
From: removed to prevent spam 
Referer: http://www.cityplym.ac.uk 
User-Agent: swish-e http://swish-e.org/ 
Cookie: style_cookie=text_only; PHPSESSID=b2af247e7c54e72343667062f2f83fb4
Cookie2: $Version="1"


---- Response ---
Status: 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Date: Wed, 15 Apr 2009 14:20:14 GMT
Pragma: no-cache
Server: Apache/2.2.3 (Linux/SUSE) mod_ssl/2.2.3 OpenSSL/0.9.8a
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 3152
Content-Type: text/html
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Client-Date: Wed, 15 Apr 2009 14:33:58 GMT
Client-Peer: 172.16.1.67:80
Client-Response-Num: 27

^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^

>> +Fetched 1 Cnt: 26 GET  http://www.cityplym.ac.uk/index.php?h=equality  200 OK text/html 3152 parent:http://www.cityplym.ac.uk depth:1
Warning: document 'http://www.cityplym.ac.uk/index.php?h=equality' has no content

vvvvvvvvvvvvvvvv HEADERS for http://www.cityplym.ac.uk/index.php?page_id=0138 vvvvvvvvvvvvvvvvvvvvv

---- Request ------
GET http://www.cityplym.ac.uk/index.php?page_id=0138 
Accept-Encoding: gzip, x-gzip, deflate
From: removed to prevent spam 
Referer: http://www.cityplym.ac.uk 
User-Agent: swish-e http://swish-e.org/ 
Cookie: style_cookie=text_only; PHPSESSID=b2af247e7c54e72343667062f2f83fb4
Cookie2: $Version="1"


---- Response ---
Status: 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Date: Wed, 15 Apr 2009 14:20:14 GMT
Pragma: no-cache
Server: Apache/2.2.3 (Linux/SUSE) mod_ssl/2.2.3 OpenSSL/0.9.8a
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 2689
Content-Type: text/html
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Client-Date: Wed, 15 Apr 2009 14:33:58 GMT
Client-Peer: 172.16.1.67:80
Client-Response-Num: 28

^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^

>> +Fetched 1 Cnt: 27 GET  http://www.cityplym.ac.uk/index.php?page_id=0138  200 OK text/html 2689 parent:http://www.cityplym.ac.uk depth:1
Warning: document 'http://www.cityplym.ac.uk/index.php?page_id=0138' has no content

vvvvvvvvvvvvvvvv HEADERS for http://www.cityplym.ac.uk/euro/ vvvvvvvvvvvvvvvvvvvvv

---- Request ------
GET http://www.cityplym.ac.uk/euro/ 
Accept-Encoding: gzip, x-gzip, deflate
From: removed to prevent spam 
Referer: http://www.cityplym.ac.uk 
User-Agent: swish-e http://swish-e.org/ 
Cookie: style_cookie=text_only; PHPSESSID=b2af247e7c54e72343667062f2f83fb4
Cookie2: $Version="1"


---- Response ---
Status: 200 OK
Date: Wed, 15 Apr 2009 14:20:14 GMT
Server: Apache/2.2.3 (Linux/SUSE) mod_ssl/2.2.3 OpenSSL/0.9.8a
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 1245
Content-Type: text/html
Client-Date: Wed, 15 Apr 2009 14:33:58 GMT
Client-Peer: 172.16.1.67:80
Client-Response-Num: 31

^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^

>> +Fetched 1 Cnt: 30 GET  http://www.cityplym.ac.uk/euro/  200 OK text/html 1245 parent:http://www.cityplym.ac.uk depth:1
Warning: document 'http://www.cityplym.ac.uk/euro/' has no content

Summary for: http://www.cityplym.ac.uk 
     Connection: Close:      1  (1.0/sec)
Connection: Keep-Alive:     28  (28.0/sec)
            Duplicates:      6  (6.0/sec)
    Location Redirects:      1  (1.0/sec)
        Off-site links:      6  (6.0/sec)
           Total Bytes: 14,122  (14122.0/sec)
            Total Docs:     27  (27.0/sec)
           Unique URLs:     30  (30.0/sec)
             text/html:      1  (1.0/sec)

Configuration details:

SWISH-E 2.4.5

IO::Compress::Gzip 2.015 (also tested, at a different time, with 2.017)

swishspider.conf

my ($filter_sub, $response_sub) = swish_filter();
@servers = (
    {
        base_url       => 'http://www.cityplym.ac.uk',
        email          => 'removed to prevent spam',
        use_md5        => 1,
        use_cookies    => 1,
        keep_alive     => 1,
        delay_sec      => 1,
        filter_content => $filter_sub,
        test_response  => $response_sub,
        debug          => DEBUG_URL | DEBUG_SKIPPED | DEBUG_HEADERS,
    },
);
1;

I have also tried use_md5 => 0.


Does anybody know what the problem is, or even whether it is an Apache or a Swish-e issue?

Is the server sending the wrong Content-Encoding?

Are there any steps I can take to investigate this further?
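
One check that might help narrow this down is to take the raw, still-compressed body of one affected response and see whether it really is valid gzip, as the Content-Encoding header claims. The following is only a rough sketch in Python, not anything from Swish-e or Apache: the function name inspect_body and the report layout are made up for illustration, and capturing the raw bytes from the server is left out. It handles gzip and unencoded bodies only.

```python
import gzip
import zlib

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def inspect_body(content_encoding, body):
    """Compare a raw response body with its declared Content-Encoding.

    content_encoding: value of the Content-Encoding header, or None.
    body: the undecoded body bytes exactly as received.
    (Hypothetical helper, for illustration only.)
    """
    encoding = (content_encoding or "").strip().lower()
    report = {
        "declared_encoding": encoding or "identity",
        "looks_gzipped": body[:2] == GZIP_MAGIC,
        "raw_bytes": len(body),
        "decoded_bytes": None,
        "error": None,
    }
    if encoding in ("gzip", "x-gzip"):
        try:
            report["decoded_bytes"] = len(gzip.decompress(body))
        except (OSError, EOFError, zlib.error) as exc:
            # Not gzip at all, truncated, or corrupt data.
            report["error"] = str(exc)
    elif not encoding:
        # No encoding declared: the body is used as-is.
        report["decoded_bytes"] = len(body)
    else:
        report["error"] = "encoding not handled by this sketch"
    return report


if __name__ == "__main__":
    page = b"<html><body>example page</body></html>"
    # A body that really is gzip, matching its header.
    print(inspect_body("gzip", gzip.compress(page)))
    # A body labelled gzip that is actually plain HTML.
    print(inspect_body("gzip", page))
```

If the body decodes cleanly and decoded_bytes is non-zero, the server's Content-Encoding is probably correct and the problem is more likely on the client side (the spider not decoding what it asked for). If decoding fails, the header and the body disagree, which would point at the server.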


Kind regards,

Dan

 


Received on Thu Apr 16 06:37:31 2009