Apache Kylin : Analytical Data Warehouse for Big Data

...

Please note that segment ranges use the GMT+0 (UTC) timezone. If you see a segment named "20200923160000_20200924160000", it means that this segment starts at "2020-09-23 16:00:00 GMT+00:00", which is "2020-09-24 00:00:00 GMT+08:00" for people living in China.
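To double-check how a segment name maps to your local clock, you can parse both ends of the range as UTC and shift them to a local offset. This is a minimal Python 3 sketch. It assumes the name always has the form "yyyyMMddHHmmss_yyyyMMddHHmmss". The helper name segment_range_local and the fixed GMT+8 default are illustrative only, not part of Kylin.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: interpret a segment name as a UTC range and
# render both ends in a fixed local offset (GMT+8 by default).
def segment_range_local(segment_name, offset_hours=8):
    local_tz = timezone(timedelta(hours=offset_hours))
    result = []
    for part in segment_name.split("_"):
        # Segment boundaries are expressed in GMT+0/UTC.
        utc_time = datetime.strptime(part, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        result.append(utc_time.astimezone(local_tz).strftime("%Y-%m-%d %H:%M:%S"))
    return tuple(result)

print(segment_range_local("20200923160000_20200924160000"))
# ('2020-09-24 00:00:00', '2020-09-25 00:00:00')
```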


LambdaTable DDL (SQL):
CREATE EXTERNAL TABLE IF NOT EXISTS lambda_flat_table
(
-- event timestamp and debug-purpose column
EVENT_TIME timestamp,
str_minute_second string COMMENT "For debugging purposes, e.g. checking timezone handling",

-- dimension columns
act_type string COMMENT "What the user interacted with in our mobile app in this event",
user_devide_type string COMMENT "Which kind of device the user used in this event",
location_city string COMMENT "Which city the user was located in during this event",
video_id bigint COMMENT "Which video the user watched in this event",
device_brand string,
page_id string,

-- measure columns
play_times bigint,
play_duration decimal(23, 10),
pageview_id string COMMENT "Identifier of a pageview",

-- used by Kylin (time dimensions)
MINUTE_START timestamp,
HOUR_START timestamp,
MONTH_START date
)
COMMENT 'Fact table. Store raw user action log.'
PARTITIONED BY (DAY_START date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///LACUS/lambda_data/lambda_flat_table';
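The time columns in the Kylin group (MINUTE_START, HOUR_START, MONTH_START) and the DAY_START partition are all derived from the event time. The Python 3 sketch below shows one way a producer could fill them. It assumes that MINUTE_START and HOUR_START truncate the timestamp, that MONTH_START is the first day of the month, and that DAY_START is the event date. It also assumes the input timestamp has already been adjusted for timezone. The function name kylin_time_columns is hypothetical.

```python
from datetime import datetime

# Hypothetical helper: derive the Kylin time dimension columns of
# lambda_flat_table from one (already timezone-adjusted) event timestamp.
def kylin_time_columns(event_time):
    return {
        "MINUTE_START": event_time.replace(second=0, microsecond=0),
        "HOUR_START": event_time.replace(minute=0, second=0, microsecond=0),
        "MONTH_START": event_time.date().replace(day=1),
        "DAY_START": event_time.date(),  # Hive partition column
    }

cols = kylin_time_columns(datetime(2020, 9, 22, 16, 7, 35))
for name, value in cols.items():
    print(name, value)
```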

Prepare sample event script

Say that we want to monitor users' actions in our mobile video application. The following script (Python 2) sends events in JSON format to STDOUT.

...

Please configure "kylin.stream.event.timezone" with your local timezone. Here is what I use: kylin.stream.event.timezone=GMT+8.

...

For each derived time column (event_time, minute_start and hour_start in our case), make sure you subtract your local timezone offset. For example, for the local timestamp "2020-09-23 00:07:35 GMT+08:00", subtract the timezone offset (8 hours) and drop the timezone suffix. The result is "2020-09-22 16:07:35".
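The adjustment described above can be sketched in Python 3 as follows. This is a minimal illustration that assumes a fixed offset (GMT+8 by default) and the "yyyy-MM-dd HH:mm:ss" string format used in the example. The helper name to_kylin_time is hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical helper: turn a local wall-clock time into the value written
# to a derived time column by subtracting the local UTC offset and
# dropping the timezone suffix.
def to_kylin_time(local_time_str, offset_hours=8):
    local_time = datetime.strptime(local_time_str, "%Y-%m-%d %H:%M:%S")
    return (local_time - timedelta(hours=offset_hours)).strftime("%Y-%m-%d %H:%M:%S")

print(to_kylin_time("2020-09-23 00:07:35"))  # 2020-09-22 16:07:35
```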

...

Send a request to refresh the segment. For startTime and endTime, make sure you use the timestamp of the local time "2020-09-23 00:00:00 GMT+08:00".
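To get those values, you can compute epoch milliseconds from the local wall-clock time. This is a Python 3 sketch. It assumes the refresh request takes startTime and endTime as epoch milliseconds and covers a one-day segment in GMT+8. The helper name local_epoch_millis is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: epoch milliseconds of a local wall-clock time.
# Assumes the refresh request expects startTime/endTime in epoch milliseconds.
def local_epoch_millis(local_time_str, offset_hours=8):
    tz = timezone(timedelta(hours=offset_hours))
    local_time = datetime.strptime(local_time_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(local_time.timestamp() * 1000)

start_time = local_epoch_millis("2020-09-23 00:00:00")
end_time = local_epoch_millis("2020-09-24 00:00:00")
print(start_time, end_time)  # 1600790400000 1600876800000
```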

...