題目1
需求:
每個用戶截止到每月為止的最大單月訪問次數和累計到該月的總訪問次數
三個字段的意思:
用戶名,月份,訪問次數
數據:
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,11
最后結果展示:
用戶 月份 最大訪問次數 總訪問次數 當月訪問次數
A 2015-01 33 33 33
A 2015-02 33 43 10
A 2015-03 38 81 38
B 2015-01 30 30 30
B 2015-02 30 45 15
B 2015-03 44 89 44
解題思路
以username分和month分組,統計出每月訪問次數,得到如下結果
CREATE TABLE t01_s1 AS
SELECT table01.username, table01. MONTH, sum(table01.count) sum FROM myhive.table01 GROUP BY table01.username, table01. MONTH;進行自連接,選出tl.month>tr.month的字段,并用username和month分組就可以得到結果
SELECT tl.username, tl. MONTH, max(tr.sum) maxvisit, sum(tr.sum) sumvisit, max(tl.sum) currentmonth
FROM t01_s1 tl JOIN t01_s1 tr ON tl.username = tr.username
WHERE tl. MONTH >= tr. MONTH
GROUP BY tl.username, tl. MONTH;
進行自連接后的結果如下:
tl.username tl.month tl.sum tr.username tr.month tr.sum
A 2015-01 33 A 2015-01 33
A 2015-02 10 A 2015-01 33
A 2015-03 38 A 2015-01 33
A 2015-01 33 A 2015-02 10
A 2015-02 10 A 2015-02 10
A 2015-03 38 A 2015-02 10
A 2015-01 33 A 2015-03 38
A 2015-02 10 A 2015-03 38
A 2015-03 38 A 2015-03 38
B 2015-01 30 B 2015-01 30
B 2015-02 15 B 2015-01 30
B 2015-03 44 B 2015-01 30
B 2015-01 30 B 2015-02 15
B 2015-02 15 B 2015-02 15
B 2015-03 44 B 2015-02 15
B 2015-01 30 B 2015-03 44
B 2015-02 15 B 2015-03 44
B 2015-03 44 B 2015-03 44
題目2
// 建表語句:
CREATE TABLE course
(
id
int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
sid
int(11) DEFAULT NULL,
course
varchar(255) DEFAULT NULL,
score
int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
// 插入數據
// 字段解釋:id, 學號, 課程, 成績
INSERT INTO course
VALUES (1, 1, 'yuwen', 43);
INSERT INTO course
VALUES (2, 1, 'shuxue', 55);
INSERT INTO course
VALUES (3, 2, 'yuwen', 77);
INSERT INTO course
VALUES (4, 2, 'shuxue', 88);
INSERT INTO course
VALUES (5, 3, 'yuwen', 98);
INSERT INTO course
VALUES (6, 3, 'shuxue', 65);
求:所有數學課程成績 大于 語文課程成績的學生的學號
解答:
自連接的方式:
select c1.* from course c1 join course c2 on c1.sid=c2.sid where c1.score>c2.score and c1.course='shuxue';行列轉換的方式
select a.sid from (select sid,
max(case when course='yuwen'then score else 0 end) yuwen,
max(case when course='shuxue'then score else 0 end) shuxue
from course group by sid having shuxue>yuwen) a;
題目3
數據:
2014010114
2014010216
2014010317
2014010410
2014010506
2012010609
2012010732
2012010812
2012010919
2012011023
2001010116
2001010212
2001010310
2001010411
2001010529
2013010619
2013010722
2013010812
2013010929
2013011023
2008010105
2008010216
2008010337
2008010414
2008010516
2007010619
2007010712
2007010812
2007010999
2007011023
2010010114
2010010216
2010010317
2010010410
2010010506
2015010649
2015010722
2015010812
2015010999
2015011023
要求: 求出一年中出現最高溫度的那一天
輸出以下數據:
20010105 29
20070109 99
20080103 37
20100103 17
20120107 32
20130109 29
20140103 17
20150109 99
解題
解法一:
SELECT * FROM exercise3
WHERE concat( substr(DATA, 1, 4), substr(DATA, 9, 2))
IN ( SELECT concat( substr(DATA, 1, 4), max(substr(DATA, 9, 2))) FROM exercise3 GROUP BY substr(DATA, 1, 4));
思路:通過年份分組求出最高溫度的那一年和最高溫度,把這些數據看成一個集合。再查出原始表中出現這些數據的那一行。
解法二:
select substring(b.line, 1, 8) as max_temp_date, a.max_temp
from exercise3 b join
(select substring(c.line, 1, 4) as year, max(substring(c.line, -2)) as max_temp
from exercise3 c group by substring(c.line, 1, 4)) a
on a.year = substring(b.line, 1, 4) and
a.max_temp = substring(b.line, -2);
思路:1. 求出以你那為分組,求出最最高溫度和年份
- 用原始表和這個表進行連接,連接條件為年份相同且最高溫度相同的條目
題目4
現有一份以下格式的數據:
表示有id為1,2,3的學生選修了課程a,b,c,d,e,f中其中幾門:
數據:
id course
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e
編寫Hive的HQL語句來實現以下結果:
表中的1表示選修,表中的0表示未選修
id a b c d e f
1 1 1 1 0 1 0
2 1 0 1 1 0 1
3 1 1 1 0 1 0
解題要點行列轉換。
解法1:
select id,
sum(case course when "a" then 1 else 0 end) as a,
sum(case course when "b" then 1 else 0 end) as b,
sum(case course when "c" then 1 else 0 end) as c,
sum(case course when "d" then 1 else 0 end) as d,
sum(case course when "e" then 1 else 0 end) as e,
sum(case course when "f" then 1 else 0 end) as f
from id_course group by id;
解法2:
先構造以下表
id id_courses courses
1 ["a","b","c","e"] ["a","b","c","d","e","f"]
2 ["a","c","d","f"] ["a","b","c","d","e","f"]
3 ["a","b","c","e"] ["a","b","c","d","e","f"]
1.左邊:
select d.id as id, collect_set(d.course) as id_courses from id_course d group by d.id;
右邊:
select sort_array(collect_set(course)) as tt from id_course;左右連接得到需要的表:
create id_courses table as
select a.id, a.id_courses, b.tt
from
(select d.id as id, collect_set(d.course) as id_courses from id_course d group by d.id) a
join
(select sort_array(collect_set(c.course)) as tt from id_course c) b ;查詢出最終結果
使用if判斷
select
id,
if(array_contains(a.id_courses, courses[0]),1,0) as a,
if(array_contains(a.id_courses, courses[1]),1,0) as b,
if(array_contains(a.id_courses, courses[2]),1,0) as c,
if(array_contains(a.id_courses, courses[3]),1,0) as d,
if(array_contains(a.id_courses, courses[4]),1,0) as e,
if(array_contains(a.id_courses, courses[5]),1,0) as f
from id_courses a;