一、數據準備
事實表:dwd_payment_info
維度表:dwd_order_info和dwd_user_info
1.1建表
- dwd_payment_info
hive (gmall)>
drop table if exists dwd_payment_info;
create external table dwd_payment_info(
`id` bigint COMMENT '',
`out_trade_no` string COMMENT '',
`order_id` string COMMENT '',
`user_id` string COMMENT '',
`alipay_trade_no` string COMMENT '',
`total_amount` decimal(16,2) COMMENT '',
`subject` string COMMENT '',
`payment_type` string COMMENT '',
`payment_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_payment_info/'
tblproperties ("parquet.compression"="snappy")
;
- dwd_order_info
hive (gmall)>
drop table if exists dwd_order_info;
create external table dwd_order_info (
`id` string COMMENT '',
`total_amount` decimal(10,2) COMMENT '',
`order_status` string COMMENT ' 1 2 3 4 5',
`user_id` string COMMENT 'id',
`payment_way` string COMMENT '',
`out_trade_no` string COMMENT '',
`create_time` string COMMENT '',
`operate_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy")
;
- dwd_user_info
hive (gmall)>
drop table if exists dwd_user_info;
create external table dwd_user_info(
`id` string COMMENT 'id',
`name` string COMMENT '',
`birthday` string COMMENT '',
`gender` string COMMENT '',
`email` string COMMENT '',
`user_level` string COMMENT '',
`create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_user_info/'
tblproperties ("parquet.compression"="snappy")
;
此三張表組成簡單的星型模型
因為dwd_order_info和dwd_user_info表為每日分區,kylin不支持維度表分區,導致外鍵重復,解決辦法是使用臨時表或者視圖。
對維度表創建視圖:
- dwd_order_view
hive (gmall)>
create view dwd_order_view as select * from dwd_order_info where dt=current_date;
- dwd_user_view
hive (gmall)>
create view dwd_user_view as select * from dwd_user_info where dt=current_date;
二、kylin操作
1.創建project(類比database)
點Add Project->gmall->test
2.導數據
data soucre->load table from tree-選擇準備的三張表
選完后表名會變粗
下面可以看到有表元數據了:
3.創建model
3.1 點擊new model->Model Name:module_payment
3.2 選擇事實表
3.3 添加維度表
3.3.1 DWD_PAYMENT_INFO -> INNER JOIN -> DWD_ORDERE_INFO -> New Join Condition: ORDER_ID=ID
3.3.2 DWD_PAYMENT_INFO -> INNER JOIN -> DWD_USER_INFO -> New Join Condition:USER_ID=ID
3.4.Dimensions(維)
1.DWD_PAYMENT_INFO : PAYMENTN_TYPE
2.DWD_ORDER-INFO : PARMENT_WAY
3.DWD_USER_INFO : GENDER, USER_LEVEL
3.5. Messures(度量)
1.DWD_PAYMENT_INFO : TOTAL_AMOUNT
3.6.Settings
3.6.1Partiton
select Partition Table -> DWD_PAYMENT_INFO -> DT -> yyyy-MM-dd
3.6.2 Filter(過濾)
根據自己業務需要
4 創建cube
4.1 Cube info -> module_payment -> Cube_payment
4.2 Dimensions(維度)
Add Dimensions -> DWD_PAYMENT_INFO[FactTable]:選PAYMENT_TYPE -> DWD_ORDER_INFO:選PAYMENT_WAY -> DWD_USER_INFO: 選 GENDER和USER_LEVEL
另外,我們要選Normal,不選Derived(衍生,優化)
4.3 Measures(度量)
4.4Defresh Setting
直接默認值
每天做一個構建,數據存hbase,每天在hbase中新生成一個表,導致hbase去查數據時如果查詢一個月的數據就要查詢30個表,會很慢,所以就根據這個setting合并,7天一小并(將每天的合并),28天一大并(將每7天的數據合并)