Last.fm API - gathering artist tags with Ruby

Ever wondered what kind of music you listen to? Last.fm (and Audioscrobbler underneath) allows users to track their listening habits and find other bands and songs that they might like. When I first discovered Last.fm, I was very enthusiastic about it - not only does it track your own listening habits, but it also allows you to find people with similar music preferences. More importantly, you can also find other bands by looking at what others are listening to - and this is not a machine-generated list, but a human one.

Of course there is a price - privacy. It's definitely not the kind of service for the paranoid types - but really, does it matter that much what you listen to? Especially when your identity is hidden behind a nickname, with not many personal details attached to it.

One of the beauties of Last.fm is the ability to access the statistics through a web service API. Accessing the service is as simple as building an HTTP request and parsing the resulting XML document.
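To make the request-and-parse idea concrete, here is a minimal sketch that runs without touching the network. The base URL is the one used by the script in this post; the helper names (toptags_url, parse_tags) and the sample XML are assumptions made up for illustration, with the element names (tag, name, count) inferred from what the script reads. The real response may well contain more fields.

```ruby
require 'erb'
require 'rexml/document'

# Base URL as used by the script in this post.
BASE_URL = "http://ws.audioscrobbler.com/1.0"

# Hypothetical helper: build the request URL for an artist's top tags,
# percent-encoding the artist name so it is safe inside the URL path.
def toptags_url(artist_name)
  "#{BASE_URL}/artist/#{ERB::Util.url_encode(artist_name)}/toptags.xml"
end

# Made-up sample response; only the tag/name/count structure is assumed.
SAMPLE_XML = <<~XML
  <toptags artist="Some Band">
    <tag><name>Rock</name><count>100</count></tag>
    <tag><name>seen live</name><count>42</count></tag>
  </toptags>
XML

# Hypothetical helper: turn the XML into a { tag name => count } hash.
def parse_tags(xml)
  doc = REXML::Document.new(xml)
  tags = {}
  doc.elements.each('//tag') do |tag|
    name = tag.elements['name'].text.downcase
    tags[name] = tag.elements['count'].text.to_i
  end
  tags
end

puts toptags_url("Pink Floyd")
p parse_tags(SAMPLE_XML)
```

In a real run the XML string would come from the HTTP response body instead of a constant; everything else stays the same.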

I really wanted to see what kind of music I like - of course I should know this better than a machine, but sometimes it's interesting to get an 'outside opinion'. Last.fm doesn't seem to provide this ability, at least not in terms of music style (it provides it on an artist level though) - but building a list of styles is quite simple when you consider that the majority of tags associated with artists are actually musical styles. Therefore, all we need to do is retrieve the list of a user's most listened-to artists (our own, for instance) and then, for each artist, retrieve the associated tags; compile all these lists into a single one and you have the list of musical styles (OK, after ignoring tags such as "seen live", etc.).
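The compile step can be sketched on its own, independent of any web request. This is only an illustration under a few assumptions: the function name count_styles, the contents of the ignore list, and the sample data are all made up, and the result is sorted with the most common style first.

```ruby
# Tags that describe something other than a musical style.
# The contents of this list are an assumption, not taken from Last.fm.
IGNORED_TAGS = ["seen live", "favorites"].freeze

# Count, for every tag, how many of the given artists carry it.
# tags_by_artist maps an artist name to an array of tag names.
def count_styles(tags_by_artist, ignored = IGNORED_TAGS)
  counts = Hash.new(0)
  tags_by_artist.each_value do |tags|
    # Normalise case and count each tag at most once per artist.
    tags.map(&:downcase).uniq.each do |tag|
      counts[tag] += 1 unless ignored.include?(tag)
    end
  end
  # Most common styles first; ties are broken alphabetically.
  counts.sort_by { |tag, count| [-count, tag] }
end

# Made-up sample data, only to show the shape of the input.
sample = {
  "Artist A" => ["Rock", "seen live", "indie"],
  "Artist B" => ["rock", "electronic"],
  "Artist C" => ["indie", "rock"]
}

count_styles(sample).each { |tag, count| puts "#{tag} (#{count})" }
```

Counting each tag once per artist, rather than adding up popularity counters, keeps a single heavily tagged artist from dominating the result.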

Here's a simple Ruby script that does all of this. To parse the resulting XML, it uses the HTree library, and REXML to query the XML tree.

require 'open-uri'
require 'htree'
require 'rexml/document'
require 'uri'
 
# Let's pretend we're a Gecko browser; it's not really needed though.
@user_agent = "Mozilla/5.001 (windows; U; NT4.0; en-us) Gecko/25250101"
 
#
# Obtain the list of tags for a given artist.
#  @param artist_name the name of the artist
#  @return a hash with the tag as the key and the 
#          tag popularity as value
#
def get_artist_tags( artist_name )
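  # Percent-encode the artist name so that spaces and other special
  # characters are safe inside the URL path.
  link = "http://ws.audioscrobbler.com/1.0/artist/" + URI.encode_www_form_component( artist_name ).gsub( "+", "%20" ) + "/toptags.xml"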
  tags = Hash.new
 
  # Open the REST link
  URI.open( link, "User-Agent" => @user_agent ) do |page|
    page_content = page.read()
    doc = HTree( page_content ).to_rexml
 
    # Parse the returned XML result.
    doc.root.each_element('//tag') do |tag|
      tag_name = tag.elements.to_a('name')[0].text.to_s.downcase
      tag_count = tag.elements.to_a('count')[0].text.to_i
 
      if tags[tag_name].nil?
        tags[tag_name] = tag_count
      else
        tags[tag_name] += tag_count
      end
    end
  end
 
  return tags
end
 
#
# Obtain a list of the artists a user listens to the most.
#  @param username the name of the user to look up
#  @param max_artists the maximum number of artists returned
#  @return an array containing the desired artists
def get_user_artists( username, max_artists = 10 )
  link = "http://ws.audioscrobbler.com/1.0/user/#{username}/topartists.xml?type=overall"
  artists = []
 
  # Open the REST link
  URI.open( link, "User-Agent" => @user_agent ) do |page|
    page_content = page.read()
    doc = HTree( page_content ).to_rexml
 
    # Parse the returned XML result.
    doc.root.each_element('//artist/name') do |elem|
      artists << elem.text.to_s
      if artists.length >= max_artists
        break
      end
    end
  end
 
  return artists
end
 
#
# The main function; it retrieves the list of artists for a given
# username and builds the list of tags associated with these artists.
#
def artist_tags( username, max_artists = 10 )
  tags = Hash.new
  artists = get_user_artists( username, max_artists )
 
  puts "Total artists: " + artists.length.to_s 
  artists.each do |artist|
    puts "Receiving tags for #{artist}"
    tags_for_artist = get_artist_tags( artist )
 
    # Count each tag once per artist; the tag popularity counter
    # returned by the service is deliberately not used.
    tags_for_artist.each_key do |tag|
      tags[tag] = ( tags[tag] || 0 ) + 1
    end
 
    # This will slow down everything, but it's always a good practice not to 
    # overload web APIs with requests.
    sleep(2)
  end
 
  # Sort the hash by using the values - and not the keys. The result will  
  # obviously be an array
  tags_sorted = tags.sort { |a,b| a[1]<=>b[1] }
  tags_sorted.each do |tag_data|
    puts tag_data[0] + " (" + tag_data[1].to_s + ")"
  end
end
 
if ARGV.length < 1
  puts "Not enough arguments."
  puts "Syntax: artist_tags.rb <username> [max_artists]"
  exit
end
 
if ARGV.length == 1
  artist_tags( ARGV[0] )
else 
  artist_tags( ARGV[0], ARGV[1].to_i)
end