Apache Hive

The Apache Hive™ (http://hive.apache.org) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop™ (http://hadoop.apache.org), it provides:

  • Tools to enable easy data extract/transform/load (ETL)
  • A mechanism to impose structure on a variety of data formats
  • Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
  • Query execution via MapReduce
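
To make "query execution via MapReduce" more concrete, the following sketch shows, conceptually, how a grouped count (the kind of result a SQL-like GROUP BY produces) can be split into a map phase, a shuffle, and a reduce phase. It is a hand-written illustration in plain Python, not Hive's planner or the code Hive generates; the row layout and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(rows):
    """Emit one (key, 1) pair per input row, keyed by the grouping column."""
    for row in rows:
        yield row["status"], 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def count_by_status(rows):
    """Conceptual equivalent of: count rows, grouped by the status column."""
    return reduce_phase(shuffle(map_phase(rows)))

# Hypothetical web-log rows.
sample_rows = [{"status": 200}, {"status": 404}, {"status": 200}]
counts = count_by_status(sample_rows)
```

On a real cluster the map and reduce phases run in parallel across many machines, which is what lets this pattern scale out.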

Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language allows programmers who are familiar with the MapReduce framework to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. QL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs), and table functions (UDTFs).
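
One common way to plug in a custom mapper is a streaming script: with the TRANSFORM clause, Hive pipes rows to an external program as tab-separated lines and reads tab-separated lines back. The sketch below assumes a hypothetical two-column input (user_id, url) and emits (user_id, host). The column names and the "unknown" fallback are illustrative assumptions, not part of Hive.

```python
import io
from urllib.parse import urlparse

def map_row(line):
    """Turn one tab-separated input row (user_id, url) into (user_id, host)."""
    user_id, url = line.rstrip("\n").split("\t")
    host = urlparse(url).netloc or "unknown"
    return f"{user_id}\t{host}"

def run(stdin, stdout):
    """Stream rows from stdin to stdout, skipping blank lines."""
    for line in stdin:
        if line.strip():
            stdout.write(map_row(line) + "\n")

# Local dry run with in-memory streams standing in for the pipes.
demo_in = io.StringIO("42\thttp://example.com/a/b\n7\tnot-a-url\n")
demo_out = io.StringIO()
run(demo_in, demo_out)
```

When used as an actual streaming script, the same function would be called as run(sys.stdin, sys.stdout), and the query would reference the script along the lines of SELECT TRANSFORM(user_id, url) USING 'python3 host_mapper.py' AS (user_id, host) FROM page_views, where the script and table names are hypothetical.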

Hive does not require that data be read or written in a "Hive format"; there is no such thing. Hive works equally well on Thrift, control-delimited, or your own specialized data formats. Please see File Format and SerDe in the Developer Guide for details.
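
As an illustration of the control-delimited case, the sketch below parses a line that uses the delimiters Hive's default text layout is known for: Ctrl-A (\x01) between fields, Ctrl-B (\x02) between collection items, and \N for NULL. The schema and the function name are hypothetical; this only mimics what a SerDe does when it imposes structure on raw bytes at read time, and it is not Hive's implementation.

```python
FIELD_SEP = "\x01"   # Ctrl-A: separates fields
ITEM_SEP = "\x02"    # Ctrl-B: separates items inside a collection
NULL_TOKEN = "\\N"   # the two characters backslash and N mark a NULL

# Hypothetical table schema: column name and Python type.
SCHEMA = [("ip", str), ("status", int), ("tags", list)]

def parse_line(line, schema=SCHEMA):
    """Apply the schema to one raw line and return a dict of typed values."""
    raw_values = line.rstrip("\n").split(FIELD_SEP)
    row = {}
    for (name, col_type), value in zip(schema, raw_values):
        if value == NULL_TOKEN:
            row[name] = None
        elif col_type is list:
            row[name] = value.split(ITEM_SEP) if value else []
        else:
            row[name] = col_type(value)
    return row

parsed = parse_line("10.0.0.1\x01200\x01a\x02b\n")
```

The file itself carries no schema; the same bytes could be read with a different column list, which is the loose coupling between Hive and its input formats.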

Hive is not designed for OLTP workloads and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of append-only data (like web logs). What Hive values most are scalability (scaling out as more machines are added dynamically to the Hadoop cluster), extensibility (through the MapReduce framework and UDFs/UDAFs/UDTFs), fault tolerance, and loose coupling with its input formats.

Components of Hive include HCatalog and WebHCat.

HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools, including Pig and MapReduce, to more easily read and write data on the grid.

WebHCat provides a service that you can use to run Hadoop MapReduce (or YARN), Pig, and Hive jobs, or to perform Hive metadata operations, through an HTTP (REST-style) interface.
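
A minimal sketch of talking to that HTTP interface from Python follows. It assumes a WebHCat server reachable on the commonly used default port 50111 with the templeton/v1 base path, and it passes the caller's identity in the user.name query parameter; the host, the user, and the helper names are hypothetical, so check them against your own installation.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

def webhcat_url(host, resource, user=None, port=50111, **params):
    """Build a WebHCat URL for a resource such as 'status' or 'ddl/database'."""
    query = dict(params)
    if user is not None:
        query["user.name"] = user
    # Percent-encode each path segment separately so '/' stays a separator.
    path = "/".join(quote(part, safe="") for part in resource.strip("/").split("/"))
    url = f"http://{host}:{port}/templeton/v1/{path}"
    return f"{url}?{urlencode(query)}" if query else url

def webhcat_get(host, resource, user=None, **params):
    """Issue a GET request and decode the JSON body (needs a live server)."""
    with urlopen(webhcat_url(host, resource, user=user, **params)) as response:
        return json.loads(response.read().decode("utf-8"))

status_url = webhcat_url("localhost", "status")
tables_url = webhcat_url("localhost", "ddl/database/default/table", user="hive")
```

Calling webhcat_get("localhost", "status") against a running server would be a simple connectivity check before submitting jobs or metadata operations.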

General Information about Hive

User Documentation

Administrator Documentation

HCatalog and WebHCat Documentation

Resources for Contributors

For more information, please see the official Hive website.

Apache Hive, Apache Hadoop, Apache HBase, Apache HDFS, Apache, the Apache feather logo, and the Apache Hive project logo are trademarks of The Apache Software Foundation.