Do we really need "Accept-Encoding = none"? #90

Closed
novill opened this issue May 27, 2016 · 3 comments

novill commented May 27, 2016

Hi!
I wonder if you could help me with LinkThumbnailer.
I'm running into a problem with the gem when trying to extract data from slashfilm.com. For example, the result is empty when I execute this:

2.2.3 :002 > lt = LinkThumbnailer.generate 'http://www.slashfilm.com/'
ETHON: started MULTI
ETHON: performed MULTI
 => #<LinkThumbnailer::Models::Website:0x0000000bdf6208 @images=[], @videos=[], @url=#<URI::HTTP http://www.slashfilm.com/>, @title="", @description="", @favicon=""> 

But if I comment out line 43 in processor.rb:

http.override_headers['Accept-Encoding'] = 'none'

Everything works OK.

So, my question is: do we really need this line?

@gottfrois
Owner

If I remember correctly, this line prevents issues when a web page is encoded using gzip, for example (see #41).

Can you look at the HTTP headers slashfilm.com returns?

@novill
Author

novill commented May 31, 2016

Without "Accept-Encoding = none":

2.2.3 :008 > y ::Net::HTTP::Persistent.new.request(URI.parse('http://www.slashfilm.com/'))
--- !ruby/object:Net::HTTPOK
http_version: '1.1'
code: '200'
message: OK
header:
  server:
  - nginx
  date:
  - Tue, 31 May 2016 13:19:26 GMT
  content-type:
  - text/html; charset=UTF-8
  content-length:
  - '12751'
  connection:
  - keep-alive
  vary:
  - Accept-Encoding,Cookie
  last-modified:
  - Tue, 31 May 2016 13:02:23 GMT
  cache-control:
  - max-age=0, public
  expires:
  - Tue, 31 May 2016 13:18:18 GMT
  x-powered-by:
  - PleskLin
  ms-author-via:
  - DAV
  x-pingback:
  - http://www.slashfilm.com/wp/xmlrpc.php
  pragma:
  - public
  x-cache-status:
  - HIT
  accept-ranges:
  - bytes
body: !binary |-
  PCFET0NUWVBFIGh0bWwgUFVCTElDICItLy9XM0MvL0RURCBYSFRNTCAxLjAg
...
  PC9zY3JpcHQ+PC9ib2R5PjwvaHRtbD4=
read: true
uri: 
decode_content: true
socket: 
body_exist: true

With this line, the response comes back gzip-encoded:

http = ::Net::HTTP::Persistent.new
http.override_headers['Accept-Encoding'] = 'none'
response = http.request(::URI.parse('http://www.slashfilm.com/'))
y response
--- !ruby/object:Net::HTTPOK
http_version: '1.1'
code: '200'
message: OK
header:
  server:
  - nginx
  date:
  - Tue, 31 May 2016 13:23:01 GMT
  content-type:
  - text/html; charset=UTF-8
  content-length:
  - '12751'
  connection:
  - keep-alive
  vary:
  - Accept-Encoding,Cookie
  last-modified:
  - Tue, 31 May 2016 13:02:23 GMT
  cache-control:
  - max-age=0, public
  expires:
  - Tue, 31 May 2016 13:20:17 GMT
  x-powered-by:
  - PleskLin
  ms-author-via:
  - DAV
  x-pingback:
  - http://www.slashfilm.com/wp/xmlrpc.php
  pragma:
  - public
  content-encoding:
  - gzip
  x-cache-status:
  - HIT
  accept-ranges:
  - bytes
body: !binary |-
  H4sIAAAAAAAAA+19+3rbOJLv//MUaPVsy+4WdXV8kSJnfImT9Oa2sXsys0mO
...
  APiD2wm7/x9s4uAdINgAAA==
read: true
uri: 
decode_content: false
socket: 
body_exist: true
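
A note on the two dumps above: the second response carries `content-encoding: gzip` together with `decode_content: false`, so the body appears to reach the parser still compressed, which would explain the empty title and description. Also, `none` is not a standard content-coding; the standard token for "no encoding" is `identity`, and this server seems to answer the unknown value with gzip. A minimal sketch of decoding the body by hand follows, using only Ruby's standard `zlib`. The helper name `decode_body` is hypothetical and is not part of LinkThumbnailer or Net::HTTP.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper (not part of LinkThumbnailer): returns the body
# decoded according to the value of the Content-Encoding response header.
def decode_body(body, content_encoding)
  case content_encoding.to_s.strip.downcase
  when 'gzip', 'x-gzip'
    # The body is a gzip stream; read it fully through GzipReader.
    Zlib::GzipReader.new(StringIO.new(body)).read
  when 'deflate'
    # Assumes a zlib-wrapped deflate stream, as the HTTP spec describes.
    Zlib::Inflate.inflate(body)
  else
    # No header, 'identity', or an unknown value: return the body untouched.
    body
  end
end
```

With a Net::HTTP response this could be called as `decode_body(response.body, response['content-encoding'])`.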

@gottfrois
Owner

We should probably allow anyone to override those: provide the best possible defaults, but allow them to be overridden.
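
One way the "defaults that can be overridden" idea could look is sketched below. This is only an illustration: `DEFAULT_HEADERS` and `merged_headers` are hypothetical names, not the gem's actual configuration API. The default mirrors the header discussed in this issue.

```ruby
# Hypothetical defaults, mirroring the header discussed in this issue.
DEFAULT_HEADERS = { 'Accept-Encoding' => 'none' }.freeze

# Merge caller-supplied headers over the defaults.
# Passing nil as a value removes that default header entirely.
def merged_headers(overrides = {})
  DEFAULT_HEADERS.merge(overrides).reject { |_name, value| value.nil? }
end
```

The result could then be copied into `http.override_headers` one entry at a time, so a caller hitting a site like slashfilm.com could pass `'Accept-Encoding' => nil` to drop the header without patching processor.rb.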
