Comment: Reverted from v. 3

...

  1. Querying files in S3 using EC2, Hive and Hadoop

    Appendix

    <<Anchor(S3n00b)>>

    S3 for n00bs

    It helps to understand how S3 is normally used as a file system. Each S3 bucket can be considered the root of a file system. Files within this file system become objects stored in S3: the path name of the file (path components joined with '/') becomes the S3 key within the bucket, and the file contents become the value. Tools like [S3Fox|https://addons.mozilla.org/en-US/firefox/addon/3247] and the native S3 !FileSystem in Hadoop (s3n) show a directory structure that is implied by the common prefixes found in the keys. Not all tools are able to create an empty directory. S3Fox can (by creating an empty key representing the directory). Other popular tools like aws, s3cmd and s3curl provide convenient ways of accessing S3 from the command line, but cannot create empty directories.
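
To illustrate how a directory view can be derived from flat keys, here is a minimal sketch in plain Python. It does not call S3. The function name `list_directory`, the sample keys, and the use of a trailing-slash key (`tables/`) as the empty-directory marker are all assumptions made for illustration; real tools use their own marker naming conventions.

```python
def list_directory(keys, prefix=""):
    """Return (subdirectories, files) directly under `prefix`.

    `keys` is a flat list of S3-style keys. Directories are not stored
    anywhere; they are inferred from the '/'-separated prefixes of the keys.
    """
    subdirs = set()
    files = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            # The key equals the prefix itself: treated here as an
            # empty-directory marker (hypothetical convention).
            continue
        head, sep, _tail = rest.partition("/")
        if sep:
            # More path components follow, so `head` is an implied directory.
            subdirs.add(head + "/")
        else:
            files.add(head)
    return sorted(subdirs), sorted(files)


# Hypothetical bucket contents: three data objects and one marker key.
keys = [
    "logs/2009/01/part-00000",
    "logs/2009/02/part-00000",
    "logs/README",
    "tables/",
]

root_listing = list_directory(keys)
logs_listing = list_directory(keys, "logs/")
tables_listing = list_directory(keys, "tables/")
```

In this sketch `logs/` appears as a directory only because some keys share that prefix, while `tables/` appears only because of the marker key. A tool that cannot write such a marker has no way to represent an empty directory.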