Skip to content
This repository was archived by the owner on Nov 27, 2019. It is now read-only.

CSV reading guide (Ruby)

Liz Conlan edited this page Jun 12, 2013 · 1 revision

Comma-separated value files are a quite commonly used text form of spreadsheet. To process them often requires special cases, such as parsing dates or removing bad rows.

For more details, read the full Ruby CSV documentation.

Reading rows

Download the CSV file first. (As we happen to know that the file we are downloading is supplied in utf-8 format, the first line of the code tells Ruby what to expect. If there are quirks in the input file, you might at this point want to preprocess the data using, for example, the .gsub function)

# encoding: utf-8

require "scraperwiki"
data = ScraperWiki.scrape("http://s3-eu-west-1.amazonaws.com/ukhmgdata-cabinetoffice/Spend-data-2010-11-01/Spend-Transactions-with-descriptions-HMT-09-Sep-2010.csv")

p data[0..499]

Load it into the standard Ruby CSV reader, which should automatically convert it into a list of lines.

require "csv"
data = ScraperWiki.scrape("http://s3-eu-west-1.amazonaws.com/ukhmgdata-cabinetoffice/Spend-data-2010-11-01/Spend-Transactions-with-descriptions-HMT-09-Sep-2010.csv")

reader = CSV::parse(data)

You can then loop through the rows as if they were a list.

reader.each do |row|
    p "#{row[7]} GBP spent on #{row[3]}"
end

Saving to the datastore

Conventionally the first line gives the names for the columns. You can get the standard reader to load in each row as a dictionary, where the keys are those names.

reader = CSV::parse(data, :headers => {:first_row => true})

This makes it easy to save the data - all we need to do is convert each row to a Hash. By default everything comes out as strings. We convert the 'Amount' row to a number type, so that it can then be added and sorted.

reader.each do |row|
  if row['Transaction Number']
    row = row.to_hash
    row['Amount'] = (row['Amount']).to_f
    ScraperWiki.save_sqlite(['Transaction Number', 'Expense Type', 'Expense Area'], row)
  end
end
Clone this wiki locally