Versions Compared

Comment: fix link to MovieLens dataset and revise text (continued)

...

No Format
CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

Then, download the data files from MovieLens 100k on the GroupLens datasets page (which also has a README.txt file and an index of unzipped files):

No Format
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip

or:

No Format
curl --remote-name http://files.grouplens.org/datasets/movielens/ml-100k.zip

Note:  If the link to GroupLens datasets does not work, please report it on HIVE-5341 or send a message to the user@hive.apache.org mailing list.

Unzip the data files:

No Format
unzip ml-100k.zip

And load u.data into the table that was just created:

No Format
LOAD DATA LOCAL INPATH '<path>/u.data'
OVERWRITE INTO TABLE u_data;
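
As a sanity check on the load, the file can be compared against the table layout: u_data declares four tab-separated columns, so each line of u.data should contain exactly four tab-separated fields. The following is a minimal shell sketch and not part of the original instructions. The function name count_bad_rows is hypothetical, and <path> stands for the directory into which the archive was unzipped.

```shell
# Hypothetical helper: print the number of lines in a file that do not
# have exactly four tab-separated fields (the column count of u_data).
count_bad_rows() {
  awk -F'\t' 'NF != 4 { bad++ } END { print bad + 0 }' "$1"
}

# Example (replace <path> with the unzip directory):
#   count_bad_rows <path>/u.data
```

A result of 0 means every line has the field count that the table definition expects.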

...

No Format
SELECT COUNT(*) FROM u_data;

Note that for older versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).
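
Because u_data is a TEXTFILE table loaded with OVERWRITE from a single file, the count reported by Hive should match the number of lines in u.data. The following is a minimal shell sketch for obtaining that local number to compare against. The name local_row_count is hypothetical, and <path> is the same unzip directory used in the LOAD DATA statement.

```shell
# Hypothetical helper: print the number of newline-terminated lines in a file.
# Note: wc -l counts newline characters, so a final line without a trailing
# newline is not counted.
local_row_count() {
  wc -l < "$1" | tr -d '[:space:]'
}

# Example (replace <path> with the unzip directory):
#   local_row_count <path>/u.data
```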

...