...
No Format |
---|
CREATE TABLE u_data ( userid INT, movieid INT, rating INT, unixtime STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; |
Then, download the data files from MovieLens 100k on the GroupLens datasets page (which also has a README.txt file and an index of unzipped files):
No Format |
---|
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip |
or:
No Format |
---|
curl --remote-name http://files.grouplens.org/datasets/movielens/ml-100k.zip |
Note: If the link to GroupLens datasets does not work, please report it on HIVE-5341 or send a message to the user@hive.apache.org mailing list.
Unzip the data files:
No Format |
---|
unzip ml-100k.zip |
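Before loading, it can help to confirm that the unzipped file matches the four-column, tab-delimited layout declared for u_data. The following is a minimal sketch, not part of Hive or the MovieLens distribution: check_u_data is a hypothetical helper name, and the path passed to it is whatever location the archive was unzipped into.

```shell
# Hypothetical helper: verifies that every line of the given file has
# exactly four tab-separated fields (userid, movieid, rating, unixtime)
# and prints the number of lines on success.
check_u_data() {
  file="$1"
  # awk stops at the first line whose field count is not 4 and makes
  # the END block exit with a non-zero status.
  awk -F '\t' 'NF != 4 { bad = 1; exit } END { exit bad + 0 }' "$file" || {
    echo "unexpected field count in $file" >&2
    return 1
  }
  # Line count of the local file, with any padding from wc removed.
  wc -l < "$file" | tr -d ' '
}

# Usage (the path is an assumption; substitute the real location):
# check_u_data /path/to/u.data
```

The printed line count can later be compared with the row count that Hive reports once the file is loaded.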
And load u.data into the table that was just created:
No Format |
---|
LOAD DATA LOCAL INPATH '<path>/u.data' OVERWRITE INTO TABLE u_data; |
...
No Format |
---|
SELECT COUNT(*) FROM u_data; |
Note that for older versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).
...