# Lab 1
*Assigned: September 5, 2013*
*Due: September 10, 2013, 12:59 PM (just before class)*
The goal of this lab is for you to set up Amazon Web Services ("Amazon Cloud").
Many of the labs in this class will use Amazon's cloud computing
infrastructure. Using a cloud service like Amazon makes it easy to
share data sets, and quickly run any number of virtual machines that are
identical for all students in the class. We have credits from Amazon,
which we will use for later labs (in this lab, we will use a free
"micro" instance).
### Sign up and set up the OS
**Signup**: [register for an account](https://aws-portal.amazon.com/gp/aws/developer/registration/index.html)
You will need to provide a credit card. However, the class has Amazon credits, so
you should not expect to _use_ the credit card even if you exhaust the free usage tier.
Once the class registration has settled down, we can provide you with an account that uses the class's Amazon credits instead.
[[ old: Once the class registration has settled down, we will add you to the class's Amazon groups. ]]
If you're worried about being billed unexpectedly because you left a server or service running for too long, sign up for [billing alerts](https://portal.aws.amazon.com/gp/aws/developer/account?ie=UTF8&action=billing-alerts&sc_icampaign=welcome_email_2&sc_icontent=billing_alerts_link&sc_iplace=welcome_email_2&sc_idetail=aws_resources).
**Launch an instance**
1. Go to [http://aws.amazon.com](http://aws.amazon.com) and click 'AWS Management Console' under 'My Account/Console'
in the upper right.
1. Click EC2
1. Click Launch Instance. (You can optionally change the region you're in from 'Oregon' to 'Virginia' in the top right, which might get you lower latency to your server.)
1. Use the "Classic Wizard". As of this writing the "Quick Launch Wizard" would not successfully launch.
1. Select the 64-bit version of "Ubuntu Server 13.04."
1. Specify 1 instance of type "t1.micro". During the first year, Amazon does not charge for the first 750 hours of micro-instance usage (per month), so this won't cost you anything to launch.
1. You don't care about the subnet, and can simply click "Continue" on the "Advanced Instance Options", "Storage Device Configuration", and "Add Tags" pages.
1. You will need to specify a key pair, or create a new one. If you choose to create a new one, make sure you download it and save it (your file extension should be `.pem`).
1. The default security group is fine (you will need to enable ssh access to it below)
1. Click "Launch". It will take a few minutes for the instance to launch. Close the dialog, and wait on the instance listing table.
1. After the instance launches, click on it to obtain its "Public DNS" name. It should look something like ec2-xx-xxx-xx-xxx.us-REGION-2.compute.amazonaws.com
**Enable SSH Access**:
Before you can ssh to your instance, you need to enable ssh access for it. To do this:
1. Go to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home?region=us-west-2)
1. Click on "Security Groups" under "Network & Security" in the leftmost panel.
1. Click the checkbox next to the "default" security group
1. Click on the "inbound" tab
1. Choose "SSH" from the "Create a new rule:" menu
1. Enter 0.0.0.0/0 to enable access from all IPs, or enter the IP address of your machine followed by "/32" (e.g., 192.168.1.10/32) to enable access from just your IP.
(This will enable ssh access to all VMs in the default security group, so you shouldn't need to do this in the future.)
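
The difference between the two source ranges in the last step can be illustrated with Python's standard `ipaddress` module. This is only a sketch, assuming Python 3; the helper name `allowed` and the sample address 203.0.113.7 are made up for illustration and are not part of the AWS console.

```python
import ipaddress

# The two kinds of source range mentioned in the rule above.
anywhere = ipaddress.ip_network("0.0.0.0/0")       # every IPv4 address
just_me = ipaddress.ip_network("192.168.1.10/32")  # exactly one address

def allowed(source_ip, rule):
    """Return True if source_ip falls inside the CIDR range of rule (hypothetical helper)."""
    return ipaddress.ip_address(source_ip) in rule

print(just_me.num_addresses)             # 1
print(allowed("192.168.1.10", just_me))  # True
print(allowed("192.168.1.11", just_me))  # False
print(allowed("203.0.113.7", anywhere))  # True
```

A "/32" range admits a single address, while "/0" admits all of them, which is why the second form is the more restrictive choice.
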
**SSH to Your Instance**:
Using a terminal program (e.g., Mac OS Terminal, an xterm on Athena, or a Cygwin terminal under Windows), type:

    ssh -i <PEM FILE> ubuntu@<public address>

where `<PEM FILE>` is the key file you downloaded when launching the instance, and `<public address>` is the instance's "Public DNS" name.
You may get an error about the permissions of your PEM file. If so, type:

    chmod 400 <PEM FILE>

and then try to ssh again.
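
The permissions error typically appears because ssh refuses a private key that other users on the machine could read. The sketch below shows the same check and fix from Python. It assumes Python 3 on a POSIX system; `key_is_private` and `lock_down` are hypothetical helper names, and a temporary file stands in for the real key file.

```python
import os
import stat
import tempfile

def key_is_private(pem_path):
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(pem_path).st_mode)
    return (mode & 0o077) == 0

def lock_down(pem_path):
    """Same effect as `chmod 400`: owner may read, nobody else may do anything."""
    os.chmod(pem_path, 0o400)

# Demonstrate on a throwaway file rather than a real key.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)            # readable by everyone, like a fresh download
before = key_is_private(demo_path)
lock_down(demo_path)
after = key_is_private(demo_path)
os.remove(demo_path)
print(before, after)                  # False True
```
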
**Set up the OS**: ensure the following packages are available using the Ubuntu package management tool _apt-get_.
To install a package, type:

    sudo apt-get install <packagename>

Make sure you have the following packages:
* python2.7
* python-psycopg2
* python-sqlalchemy
* python-protobuf
* postgresql-9.1
* postgresql-client-9.1
* sqlite3
* git
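
One rough way to see which of these are still missing is to check whether the command each package provides is on your `PATH`. This is a sketch only, assuming Python 3; the mapping from package name to command name below is an assumption, and the Python library packages (such as python-psycopg2) are not covered because they do not install a command.

```python
import shutil

# Assumed mapping from a few of the packages above to a command each one
# is expected to install; adjust if your system names them differently.
EXPECTED_COMMANDS = {
    "python2.7": "python2.7",
    "postgresql-client-9.1": "psql",
    "sqlite3": "sqlite3",
    "git": "git",
}

def missing_packages(expected=EXPECTED_COMMANDS):
    """Return the package names whose command is not found on the PATH."""
    return sorted(pkg for pkg, cmd in expected.items() if shutil.which(cmd) is None)

for pkg in missing_packages():
    print("missing:", pkg, "-> try: sudo apt-get install", pkg)
```
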
**Checkout the class repository**
The class repository is publicly accessible, and contains a `hws`
directory with all of your homeworks (and a `projects` directory with your projects).
Clone it using git:

    git clone https://github.com/cudbg/4111-fall2015.git

**Test that things worked**
Let's make sure you have access to Python, sqlite3, PostgreSQL, and the git repository.
**Python**: Type `python` and ensure that you see the following:

    Python 2.7.4 (default, Apr 19 2013, 18:28:01)
    [GCC 4.7.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

If you do, push `ctrl+d` to exit the prompt.
**sqlite3**:
SQLite is an "embedded" SQL database (it doesn't depend on a dedicated server process; instead the client just manipulates a stored
database file directly).
To ensure it is installed, type `sqlite3` and verify that you see the following:

    SQLite version 3.7.15.2 2013-01-09 11:53:05
    Enter ".help" for instructions
    Enter SQL statements terminated with a ";"
    sqlite>

If you do, push `ctrl+d` to exit the prompt.
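
To see what "embedded" means in practice, here is a small sketch using the `sqlite3` module from Python's standard library (assuming Python 3). No server is started; the whole database lives inside the Python process. The `animals` table and its rows are invented for illustration.

```python
import sqlite3

# ":memory:" keeps the database in RAM; passing a file path instead would
# create or open a database file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, legs INTEGER)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [("cat", 4), ("bird", 2), ("snake", 0)],
)

# Query the table directly, with no client/server round trip.
four_legged = conn.execute(
    "SELECT COUNT(*) FROM animals WHERE legs = 4"
).fetchone()[0]
print(four_legged)  # 1
conn.close()
```
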
**PostgreSQL using Amazon RDS**:
PostgreSQL is an open source standalone database server (a DBMS!)
Amazon provides a **cloud database service** that makes it easy to
start and access a PostgreSQL (or MySQL, etc) database without the hassle
of installation.
Set it up by logging into your [aws.amazon.com](http://aws.amazon.com) account and following
[**the instructions**](rdssetup.pdf).
If you can run `psql` and access your RDS database, then push `ctrl+d` to exit the `psql` prompt.
**git repository**: Type `cat 4111-fall2015/hws/hw0/README.md`
You should see the instructions for this lab fly by.
**Stop your virtual machine**:
While the AWS free tier should get you through the class, you only get
750 hours per month of compute time. That's enough time for one micro
instance machine to run for a month, but it's very easy to accidentally spend hundreds or thousands of dollars.
To conserve your hours (and avoid wasting energy), make sure to turn off your machine
**whenever you are not using it**.
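
The 750-hour figure can be checked with a little arithmetic. The sketch below assumes Python 3 and that every hour of every running instance counts against the allowance; `hours_used` is a hypothetical helper, not an AWS API.

```python
FREE_HOURS_PER_MONTH = 750  # figure quoted in the text above

def hours_used(instances, days, hours_per_day=24):
    """Total instance-hours consumed by always-on instances."""
    return instances * days * hours_per_day

one_always_on = hours_used(1, 31)  # 744 hours in a 31-day month
two_always_on = hours_used(2, 31)  # 1488 hours

print(one_always_on <= FREE_HOURS_PER_MONTH)  # True
print(two_always_on - FREE_HOURS_PER_MONTH)   # 738
```

One always-on instance just fits within the allowance, but a second forgotten instance would exceed it by 738 hours in the same month.
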
1. Go to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home?region=us-west-2).
* note that AWS shows your use _per region_ based on the data center you are running in.
Click **Global** in the upper right of the main console to pick your region. You may need to
try a few if you forget which region you are in.
1. In the dashboard on the left, click on "Instances"
1. Click on the checkbox on the left of your running micro instance.
1. In the "Actions" menu at the top, choose "Stop". This will shut down the instance. Before doing future labs, you'll have to follow these instructions and choose "Start" to restart your instance. Note that restarting the instance will potentially change the "Public DNS" value for that instance.
**Handing in your work**:
To complete this lab, download the "iowa-liquor-sample.csv" file from the class repository onto your "micro" instance by typing:

    wget https://raw.githubusercontent.com/cudbg/4111-fall2015/master/hws/hw0/iowa-liquor-sample.csv

Write a Python script that reads the file and counts the records
(in this dataset, each line is a record) that contain the exact phrase "single malt scotch".
Matching is case-insensitive, so "Single Malt Scotch" and "SINGLE Malt Scotch" both match, whereas
"Single's Malty Scootch" does not.
You should create a text (.txt) file with the following format:

    <YOURNAME>
    <Student ID>
    <# of single malt scotch records>
    <your python program>

For example:

    Eugene Wu
    1234568
    9
    import sys
    num_whiskies = 0
    num_whiskies += 9
    print num_whiskies

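
The matching rule for the count (exact phrase, ignoring case) can be sketched on the three sample strings from the instructions. This is only an illustration of the rule, assuming Python 3 syntax; `contains_phrase` is a hypothetical helper, and the sketch works on in-memory strings rather than the downloaded file.

```python
PHRASE = "single malt scotch"

def contains_phrase(record, phrase=PHRASE):
    """Case-insensitive check for the exact phrase inside one record."""
    return phrase in record.lower()

examples = [
    "Single Malt Scotch",
    "SINGLE Malt Scotch",
    "Single's Malty Scootch",
]
matches = [contains_phrase(text) for text in examples]
print(matches)       # [True, True, False]
print(sum(matches))  # 2
```
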
Upload it to the [CourseWorks website](http://courseworks.columbia.edu/) as the "hw0" assignment.
Now you're almost done! Go read the assigned paper(s) for today.
You can always feel free to send us questions on Piazza!