查詢全體學生的學號與姓名
hive> select Sno,Sname from student;
查詢選修了課程的學生姓名
hive> select distinct Sname from student inner join sc on student.Sno=Sc.Sno;
----hive的group by 和集合函數
查詢學生的總人數
hive> select count(distinct Sno)count from student;
計算1號課程的學生平均成績
hive> select avg(distinct Grade) from sc where Cno=1;
查詢各科成績平均分
hive> select Cno,avg(Grade) from sc group by Cno;
查詢選修1號課程的學生最高分數
select Grade from sc where Cno=1 sort by Grade desc limit 1;
(注意比較:select * from sc where Cno=1 sort by Grade
select Grade from sc where Cno=1 order by Grade)
求各個課程號及相應的選課人數
hive> select Cno,count(1) from sc group by Cno;
查詢選修了3門以上的課程的學生學號
hive> select Sno from (select Sno,count(Cno) CountCno from sc group by Sno)a where a.CountCno>3;
或 hive> select Sno from sc group by Sno having count(Cno)>3;
hive的Order By/Sort By/Distribute By
Order By ,在strict 模式下(hive.mapred.mode=strict),order by 語句必須跟著limit語句,但是在nonstrict下就不是必須的,這樣做的理由是必須有一個reduce對最終的結果進行排序,如果最后輸出的行數過多,一個reduce需要花費很長的時間。
查詢學生信息,結果按學號全局有序
hive> set hive.mapred.mode=strict; <默認nonstrict>
hive> select Sno from student order by Sno;//必須先設置為nonstrict或者跟著limit語句
FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'Sno'
Sort By,它通常發生在每一個redcue里,“order by” 和“sort by"的區別在于,前者能給保證輸出都是有順序的,而后者如果有多個reduce的時候只是保證了輸出的部分有序。set mapred.reduce.tasks=<number>在sort by可以指定,在用sort by的時候,如果沒有指定列,它會隨機的分配到不同的reduce里去。distribute by 按照指定的字段對數據進行劃分到不同的輸出reduce中
此方法會根據性別劃分到不同的reduce中 ,然后按年齡排序并輸出到不同的文件中。
查詢學生信息,按性別分區,在分區內按年齡有序
hive> set mapred.reduce.tasks=2;//設置兩個reduce 和性別對齊
hive> insert overwrite local directory '/home/hadoop/out' //導出排序結果 也可以直接導出到某個目錄的文件
select * from student distribute by Sex sort by Sage;//sort區內排序
Join查詢,join只支持等值連接
查詢每個學生及其選修課程的情況
hive> select student.*,sc.* from student join sc on (student.Sno =sc.Sno);
查詢學生的得分情況。
hive>select student.Sname,course.Cname,sc.Grade from student
join sc on student.Sno=sc.Sno
join course on sc.cno=course.cno;
查詢選修2號課程且成績在90分以上的所有學生。---//學生表和成績表做join 過了才是限制條件
hive> select student.Sname,sc.Grade from student
join sc on student.Sno=sc.Sno
where sc.Cno=2 and sc.Grade>90;
LEFT,RIGHT 和 FULL OUTER JOIN ,inner join, left semi join
查詢所有學生的信息,如果在成績表中有成績,則輸出成績表中的課程號
hive> select student.Sname,sc.Cno from student left outer join sc on student.Sno=sc.Sno;
如果student的sno值對應的sc在中沒有值,則會輸出student.Sname null.如果用right out join會保留右邊的值,左邊的為null。
Join 發生在WHERE 子句之前。如果你想限制 join 的輸出,應該在 WHERE 子句中寫過濾條件——或是在join 子句中寫。
LEFT SEMI JOIN
Hive 當前沒有實現 IN/EXISTS 子查詢,可以用 LEFT SEMI JOIN 重寫子查詢語句
重寫以下子查詢為LEFT SEMI JOIN
SELECT a.key, a.value
FROM a
WHERE a.key exist in
(SELECT b.key
FROM B);//不支持exist子查詢
可以被重寫為:
SELECT a.key, a.val
FROM a LEFT SEMI JOIN b on (a.key = b.key) //只返回innor join左邊部分
查詢與“劉晨”在同一個系學習的學生---兩個表自己join自己
hive> select s1.Sname from student s1 left semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
注意比較:
select * from student s1 left join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
select * from student s1 right join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
select * from student s1 inner join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
select * from student s1 left semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
select s1.Sname from student s1 right semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
保存數據的幾種方式
保存select查詢結果的幾種方式:
1、將查詢結果保存到一張新的hive表中
create table t_tmp
as
select * from t_p;
2、將查詢結果保存到一張已經存在的hive表中
insert into table t_tmp
select * from t_p;
3、將查詢結果保存到指定的文件目錄(可以是本地,也可以是hdfs)
insert overwrite local directory '/home/hadoop/test'
select * from t_p;
insert overwrite directory '/aaa/test'
select * from t_p;
數據準備
stu
95001,李勇,男,20,CS
95002,劉晨,女,19,IS
95003,王敏,女,22,MA
95004,張立,男,19,IS
95005,劉剛,男,18,MA
95006,孫慶,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95009,夢圓圓,女,18,MA
95010,孔小濤,男,19,CS
95011,包小柏,男,18,MA
95012,孫花,女,20,CS
95013,馮偉,男,21,CS
95014,王小麗,女,19,CS
95015,王君,男,18,MA
95016,錢國,男,21,MA
95017,王風娟,女,18,IS
95018,王一,女,19,IS
95019,邢小麗,女,19,IS
95020,趙錢,男,21,IS
95021,周二,男,17,MA
95022,鄭明,男,20,MA
sc
95001,1,81
95001,2,85
95001,3,88
95001,4,70
95002,2,90
95002,3,80
95002,4,71
95002,5,60
95003,1,82
95003,3,90
95003,5,100
95004,1,80
95004,2,92
95004,4,91
95004,5,70
95005,1,70
95005,2,92
95005,3,99
95005,6,87
95006,1,72
95006,2,62
95006,3,100
95006,4,59
95006,5,60
95006,6,98
95007,3,68
95007,4,91
95007,5,94
95007,6,78
95008,1,98
95008,3,89
95008,6,91
95009,2,81
95009,4,89
95009,6,100
95010,2,98
95010,5,90
95010,6,80
95011,1,81
95011,2,91
95011,3,81
95011,4,86
95012,1,81
95012,3,78
95012,4,85
95012,6,98
95013,1,98
95013,2,58
95013,4,88
95013,5,93
95014,1,91
95014,2,100
95014,4,98
95015,1,91
95015,3,59
95015,4,100
95015,6,95
95016,1,92
95016,2,99
95016,4,82
95017,4,82
95017,5,100
95017,6,58
95018,1,95
95018,2,100
95018,3,67
95018,4,78
95019,1,77
95019,2,90
95019,3,91
95019,4,67
95019,5,87
95020,1,66
95020,2,99
95020,5,93
95021,2,93
95021,5,91
95021,6,99
95022,3,69
95022,4,93
95022,5,82
95022,6,100
course
1,數據庫
2,數學
3,信息系統
4,操作系統
5,數據結構
6,數據處理
stu-sc-course-ddl
create table student(Sno int,Sname string,Sex string,Sage int,Sdept string)row format delimited fields terminated by ','stored as textfile;
create table course(Cno int,Cname string) row format delimited fields terminated by ',' stored as textfile;
create table sc(Sno int,Cno int,Grade int)row format delimited fields terminated by ',' stored as textfile;
load data local inpath '/home/hadoop/hivedata/students.txt' overwrite into table student;
load data local inpath '/home/hadoop/hivedata/sc.txt' overwrite into table sc;
load data local inpath '/home/hadoop/hivedata/course.txt' overwrite into table course;