I have a Ruby script that saves web pages from various sites. How do I make sure it checks whether the server can send gzipped files, and saves them if available?
Any help would be great!
Ruby – How to request gzipped pages from web servers through Ruby scripts
compression gzip http ruby
Related Solutions
This explanation is based on a commented Ruby script from a friend of mine. If you want to improve the script, feel free to update it at the link.
First, note that when Ruby calls out to a shell, it typically calls /bin/sh, not Bash. Some Bash syntax is not supported by /bin/sh on all systems.
Here are ways to execute a shell script:
cmd = "echo 'hi'" # Sample string that can be used
Kernel#`, commonly called backticks: `cmd`
This is like many other languages, including Bash, PHP, and Perl.
Returns the result (i.e. standard output) of the shell command.
Docs: http://ruby-doc.org/core/Kernel.html#method-i-60
value = `echo 'hi'`
value = `#{cmd}`
Built-in syntax, %x( cmd )
Following the x character is a delimiter, which can be any character. If the delimiter is one of the characters (, [, {, or <, the literal consists of the characters up to the matching closing delimiter, taking account of nested delimiter pairs. For all other delimiters, the literal comprises the characters up to the next occurrence of the delimiter character. String interpolation #{ ... } is allowed.
Returns the result (i.e. standard output) of the shell command, just like the backticks.
Docs: https://docs.ruby-lang.org/en/master/syntax/literals_rdoc.html#label-Percent+Strings
value = %x( echo 'hi' )
value = %x[ #{cmd} ]
Kernel#system
Executes the given command in a subshell.
Returns true if the command was found and ran successfully, false if it exited with a non-zero status, and nil if the command could not be executed at all.
Docs: http://ruby-doc.org/core/Kernel.html#method-i-system
wasGood = system( "echo 'hi'" )
wasGood = system( cmd )
Kernel#exec
Replaces the current process by running the given external command.
Returns nothing: the current process is replaced and never continues.
Docs: http://ruby-doc.org/core/Kernel.html#method-i-exec
exec( "echo 'hi'" )
exec( cmd ) # Note: this will never be reached because of the line above
Here's some extra advice:
$?, which is the same as $CHILD_STATUS (available after require 'English'), accesses the status of the last command executed through the backticks, system() or %x{}.
You can then access the exitstatus and pid properties:
$?.exitstatus
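As a small sketch of how these return values and $? fit together (assuming a Unix-like system where sh and echo are available; the variable names and the nonexistent command name are made up for illustration):

```ruby
# Backticks return the command's standard output as a String.
output = `echo hi`
echo_status = $?.exitstatus # exit status of the echo command

# system returns true when the command exits with status 0 ...
succeeded = system('sh', '-c', 'exit 0')

# ... false when it exits with a non-zero status ...
failed = system('sh', '-c', 'exit 3')
failed_code = $?.exitstatus

# ... and nil when the command cannot be executed at all.
# The command name below is hypothetical and assumed not to exist.
missing = system('definitely-not-a-real-command-xyz')

puts "output=#{output.inspect} status=#{echo_status}"
puts "succeeded=#{succeeded.inspect} failed=#{failed.inspect} (code #{failed_code})"
puts "missing=#{missing.inspect}"
```

Read $? immediately after the call you care about, since the next backtick or system call overwrites it.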
It is up to the browser, but they behave in similar ways.
F5 usually updates the page only if it has been modified. Modern browsers send Cache-Control: max-age=0 to tell any cache the maximum amount of time a resource is considered fresh, relative to the time of the request; with a value of 0, a cached copy has to be revalidated before it is reused.
CTRL-F5 is used to force an update, disregarding any cache. Modern browsers send Cache-Control: no-cache and Pragma: No-cache.
If I remember correctly, Netscape was the first browser to add support for cache control, by sending Pragma: No-cache when you pressed CTRL-F5.
┌───────────┬──────────────┬─────┬─────────────────┬──────────────────────────────┐
│ Version 4 │      F5      │  R  │      CLICK      │ Legend:                      │
│2021 MAY 19├──┬──┬──┬──┬──┼──┬──┼──┬──┬──┬──┬──┬──┤ C = Cache-Control: no-cache  │
│           │  │S │C │A │A │C │C │  │S │C │A │A │C │ I = If-Modified-Since        │
│           │  │H │T │L │L │T │T │  │H │T │L │L │T │ M = Cache-Control: max-age=0 │
│           │  │I │R │T │T │R │R │  │I │R │T │T │R │ N = Not tested               │
│           │  │F │L │  │G │L │L │  │F │L │  │G │L │ P = Pragma: No-cache         │
│           │  │T │  │  │R │  │+ │  │T │  │  │R │+ │ - = ignored                  │
│           │  │  │  │  │  │  │S │  │  │  │  │  │S │                              │
│           │  │  │  │  │  │  │H │  │  │  │  │  │H │ With 'CLICK' I refer to a    │
│           │  │  │  │  │  │  │I │  │  │  │  │  │I │ mouse click on the browsers  │
│           │  │  │  │  │  │  │F │  │  │  │  │  │F │ refresh-icon.                │
│           │  │  │  │  │  │  │T │  │  │  │  │  │T │                              │
│           │  │  │  │  │  │  │  │  │  │  │  │  │  │ 1: Version 3.0.6 sends I     │
├───────────┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤    and C, but 3.1.6 opens    │
│Brave 1.24 │M │CP│CP│- │- │M │CP│M │CP│CP│M │CP│CP│    the page in a new tab,    │
├───────────┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤    making a normal request   │
│Chrome 1   │MI│MI│MI│- │- │MI│- │MI│MI│MI│MI│MI│N │    with only I.              │
│Chrome 6   │MI│CP│CP│- │- │MI│CP│MI│CP│CP│MI│- │N │ 2: Version 10.62 does        │
│Chrome 90  │M │CP│CP│- │- │M │CP│M │CP│CP│M │CP│CP│    nothing. 9.61 might do C  │
├───────────┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤    unless it was a typo in   │
│Edge 90    │M │CP│CP│- │- │M │CP│M │CP│CP│M │CP│CP│    my old table.             │
├───────────┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤ 3: Opens the current tab in  │
│Firefox 3.x│MI│- │CP│- │- │MI│CP│MI│CP│1 │M │MI│N │    a new tab, but does not   │
│Firefox 89 │M │- │CP│- │M │M │CP│M │CP│3 │M │M │3 │    refresh the page if it is │
├───────────┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤    cached in the browser.    │
│MSIE 8, 7  │I │- │C │- │I │I │  │I │I │C │I │I │N │                              │
├───────────┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤                              │
│Opera 10, 9│C │- │- │2 │- │C │- │C │C │C │C │- │N │                              │
│Opera 76   │M │CP│CP│- │- │M │- │M │CP│CP│M │CP│CP│                              │
├───────────┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──────────────────────────────┤
│ https://stackoverflow.com/a/385491/36866                                        │
└─────────────────────────────────────────────────────────────────────────────────┘
Note about Chrome 6.0.472: If you do a forced reload (like CTRL-F5), it behaves as if the URL is internally marked to always do a forced reload. The flag is cleared if you go to the address bar and press Enter.
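The two reload styles described in this answer can be imitated from a script by sending the same request headers. The following is only a sketch using Ruby's standard net/http library; RELOAD_HEADERS and reload_request are hypothetical names chosen for illustration, and the example URL is a placeholder. It builds the request without sending it.

```ruby
require 'net/http'
require 'uri'

# Headers modern browsers send, per the answer above (names are illustrative).
RELOAD_HEADERS = {
  soft: { 'Cache-Control' => 'max-age=0' },                         # like F5
  hard: { 'Cache-Control' => 'no-cache', 'Pragma' => 'no-cache' }   # like CTRL-F5
}.freeze

# Build (but do not send) a GET request carrying the chosen reload headers.
def reload_request(url, kind)
  Net::HTTP::Get.new(URI.parse(url), RELOAD_HEADERS.fetch(kind))
end

request = reload_request('http://example.com/', :hard)
puts request['Cache-Control']
puts request['Pragma']
```

To actually send it, pass the request to Net::HTTP#request inside a Net::HTTP.start block.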
Best Answer
One can send custom headers as a hash ...
You can then check the response by defining a response object as:
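As a minimal sketch of that approach (not the answerer's original code), assuming Ruby's standard net/http, uri and zlib libraries: the request advertises gzip support through an Accept-Encoding header passed as a hash, and the Content-Encoding response header tells you whether the server actually compressed the body. The method names fetch_with_gzip, gzipped? and gunzip are hypothetical helpers. On recent Ruby versions, setting Accept-Encoding yourself should leave the body compressed rather than letting Net::HTTP decode it transparently; treat that as an assumption to verify on your Ruby version.

```ruby
require 'net/http'
require 'uri'
require 'zlib'
require 'stringio'

# Request a page and tell the server we accept gzip (custom header as a hash).
def fetch_with_gzip(url)
  uri = URI.parse(url)
  headers = { 'Accept-Encoding' => 'gzip' }
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, headers) # returns a Net::HTTPResponse
  end
end

# True when a Content-Encoding header value indicates a gzip-compressed body.
def gzipped?(content_encoding)
  content_encoding.to_s.downcase.include?('gzip')
end

# Decompress a gzip-compressed String.
def gunzip(data)
  Zlib::GzipReader.new(StringIO.new(data)).read
end
```

A possible usage: call response = fetch_with_gzip(url), then check gzipped?(response['Content-Encoding']). If it is true, File.binwrite('page.html.gz', response.body) saves the compressed bytes as received, and gunzip(response.body) gives the plain page. If it is false, the server sent the page uncompressed and the body can be saved directly.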
Thanks to those who responded...