ruby: batch link checking

When testing a web application, it is sometimes useful to check a whole batch of URLs in one go. Moreover, some applications are not friendly towards free browsing (for instance, pages that only work within an active session), so when testing them it is a good idea to keep some state on the client side, such as cookies.

One of the best tools for link checking is simply wget. It can do almost anything, but because it is so generic, it is not especially convenient for this particular job. This is why I came up with the following script. It reads the links (one per line) from the specified file and "visits" (i.e. retrieves) each of them, while also storing any cookies the server sends back. As a result, all visits appear to come from the same browser with the same cookies set, and the session is preserved.
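To make the cookie idea concrete, here is a deliberately simplified sketch of what "keeping cookies on the client" means: remember the name/value pairs the server sends in `Set-Cookie` headers and replay them in a `Cookie` header on later requests. The `TinyCookieJar` class is hypothetical and is not how Mechanize implements its cookie jar; a real jar also honours the Domain, Path, Expires and Secure attributes, which this sketch ignores.

```ruby
# Hypothetical, simplified cookie jar (illustration only, not Mechanize's).
# It keeps name/value pairs from Set-Cookie header values and replays them
# in a Cookie header. Domain, Path, Expires and Secure are ignored here.
class TinyCookieJar
  def initialize
    @cookies = {}
  end

  # Stores the leading "name=value" pair of a Set-Cookie header value;
  # any attributes after the first ";" are dropped in this sketch.
  def store(set_cookie_header)
    pair = set_cookie_header.split(";", 2).first.to_s.strip
    name, value = pair.split("=", 2)
    return if name.nil? || name.empty?
    @cookies[name] = value.to_s
  end

  # Builds the Cookie header value to send with the next request.
  def header
    @cookies.map { |name, value| "#{name}=#{value}" }.join("; ")
  end
end

jar = TinyCookieJar.new
jar.store("JSESSIONID=abc123; Path=/; HttpOnly")
jar.store("lang=en")
puts jar.header   # => JSESSIONID=abc123; lang=en

# A later response that renews the session overwrites the old value.
jar.store("JSESSIONID=def456; Path=/")
puts jar.header   # => JSESSIONID=def456; lang=en
```

Because every request draws on the same jar, the server keeps seeing the same session identifier, which is exactly the property the script relies on.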

For each error encountered, an entry is written to the report file. Apart from two counters, no other data is kept in memory.
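The "only failures go to the report" behaviour can be tried offline with a small sketch. It assumes a stand-in fetcher (a lambda with a hypothetical rule that any URL containing "broken" fails) in place of a real Mechanize agent, and uses `StringIO` objects instead of files, so nothing touches the network or the disk. The `check_links` name here is illustrative and separate from the script's own function.

```ruby
require "stringio"

# Sketch of the report-only-failures loop. `fetcher` is any object that
# responds to call(url); a lambda stands in for a Mechanize agent here.
# Only failures are written to `report`; besides two counters, nothing
# is accumulated in memory.
def check_links(fetcher, input, report)
  links = 0
  failures = 0
  input.each_line do |line|
    url = line.strip
    next if url.empty?            # ignore blank lines
    links += 1
    begin
      fetcher.call(url)
    rescue => e
      report.puts("failed to read from #{url} (#{e.message})")
      failures += 1
    end
  end
  [links, failures]
end

# Hypothetical stand-in fetcher: any URL containing "broken" fails.
fake_fetch = lambda do |url|
  raise "simulated 404" if url.include?("broken")
  "ok"
end

input  = StringIO.new("http://example.com/\n\nhttp://example.com/broken\n")
report = StringIO.new
links, failures = check_links(fake_fetch, input, report)
puts "links=#{links} failures=#{failures}"   # => links=2 failures=1
print report.string   # => failed to read from http://example.com/broken (simulated 404)
```

Swapping the lambda for a Mechanize agent (for example `->(url) { agent.get(url) }`) would turn the sketch back into a real checker.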

require "mechanize"
 
#
# Retrieves the links read from the given input file and reports
# any errors in the report file.
#
def checkLinks(agent, inputFile, reportFile)
  failures = 0
  links = 0
  while (line = inputFile.gets)
    # gets keeps the trailing newline, so strip it; skip blank lines
    url = line.strip
    next if url.empty?
    puts timestamp() + " - getting: " + url

    begin
      # retrieves the content from the specified address; HTTP error
      # statuses (e.g. 404) are raised by Mechanize as exceptions too
      agent.get(url)
    rescue => e
      reportFile.puts(timestamp() + " - failed to read from #{url} (#{e.message})")
      failures += 1
    end

    links += 1
  end

  puts "\nTotal links: #{links}"
  puts "   Failures: #{failures}"
end
 
#
# Returns the current timestamp
#
def timestamp()
  t = Time.now
  return t.strftime("%Y/%m/%d %H:%M:%S")
end
 
# Validates the number of arguments
if ARGV.length == 1

  # Consult the Mechanize documentation for optional parameters that can
  # be passed to the agent. Current Mechanize releases expose the
  # top-level Mechanize class; the old WWW::Mechanize name is gone.
  agent = Mechanize.new

  # The block form of File.open closes both files even if an
  # exception escapes from checkLinks
  File.open(ARGV[0], "r") do |inputFile|
    File.open(ARGV[0] + "-report.log", "w") do |reportFile|
      # Main body
      checkLinks(agent, inputFile, reportFile)
    end
  end
else
  puts "Wrong number of arguments."
  puts "Syntax: linktester.rb inputfile"
end

To run this script, you will need the Mechanize gem installed (gem install mechanize). It provides a simple browser engine that keeps cookies (and therefore sessions), and it can be used for programmatic access to web sites.