Using Common Hive Functions

Author: foochane
Original link: https://foochane.cn/article/2019062501.html

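1 Introduction

1.1 A brief introduction to Hive

Hive is a tool that translates SQL into MapReduce (MR) programs. It lets users map files on HDFS to table structures and then query and analyze those tables (that is, the files on HDFS) with SQL. Hive stores user-defined databases, table schemas, and related information in its metastore database, which can be a local Derby instance or a remote MySQL server.

1.2 What Hive is used for

Data analysis: instead of writing many MR programs by hand, you only write SQL scripts.
Building the data warehouse in a big-data stack.

Since Hive 2, Hive-on-MR is deprecated and Hive recommends a different execution engine such as Spark or Tez. MapReduce still works, which is why the deprecation warning appears in the sessions below.

Start HDFS and YARN before starting Hive.

2 Ways to use Hive

2.1 Method 1: use the hive command directly on the server

Just run the command $ hive: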

hadoop@Master:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>show databases;
OK
dbtest
default
Time taken: 3.539 seconds, Fetched: 2 row(s)
hive>

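Tips:
Show the current database in the prompt:

hive>set hive.cli.print.current.db=true;

Show column names in query results:

hive>set hive.cli.print.header=true;

These settings only apply to the current session. To make them permanent, create a .hiverc file in the current user's home directory
with the following content: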

set hive.cli.print.current.db=true;
set hive.cli.print.header=true;

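2.2 Method 2: use the beeline client

Start Hive as a service, then connect to it with the beeline client from any machine and run interactive queries.

The Hive server is a single-machine program that can be installed on any node; the data it accesses lives in the HDFS cluster.

Start the Hive service:

$ nohup hiveserver2 1>/dev/null 2>&1 &

Once it is running, connect with beeline. beeline is a client that can be started on any machine, as long as it can reach the Hive server.

Start beeline on the machine that runs the Hive service:

$ beeline -u jdbc:hive2://localhost:10000 -n hadoop -p hadoop

Start beeline from another machine, using the server's host name (Master here):

$ beeline -u jdbc:hive2://Master:10000 -n hadoop -p hadoop

Example: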

hadoop@Master:~$ beeline -u jdbc:hive2://Master:10000 -n hadoop -p hadoop
Connecting to jdbc:hive2://Master:10000
19/06/25 01:50:12 INFO jdbc.Utils: Supplied authorities: Master:10000
19/06/25 01:50:12 INFO jdbc.Utils: Resolved authority: Master:10000
19/06/25 01:50:13 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://Master:10000
Connected to: Apache Hive (version 2.3.5)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://Master:10000>

Parameters

-u: the JDBC connection URL
-n: the login user (a system user)
-p: the user's password

Error

 errorMessage:Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate hadoop), serverProtocolVersion:null)

Fix

Add the following to core-site.xml in the Hadoop configuration directory, then restart the Hadoop cluster:

<property>
      <name>hadoop.proxyuser.hadoop.groups</name>
      <value>hadoop</value>
      <description>Allow the user hadoop to impersonate any member of the group hadoop</description>
</property>

<property>
      <name>hadoop.proxyuser.hadoop.hosts</name>
      <value>Master,127.0.0.1,localhost</value>
      <description>The user hadoop can impersonate other users only when connecting from these hosts</description>
</property>

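2.3 Method 3: run SQL with the hive command

Use hive -e to run SQL directly from the command line. Several statements can be run in one call, separated by ;.

hive -e "sql1;sql2;sql3;sql4"

You can also use hive -f.

Write the SQL statements into a file beforehand, for example q.hql, then run it with hive -f:

bin/hive -f q.hql

2.4 Method 4: write a script

Put the commands from method 3 into a shell script such as xxx.sh, then run that script.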

3 Basic table operations

3.1 Create a database

create database db1;

Example:

0: jdbc:hive2://Master:10000> create database db1;
No rows affected (1.123 seconds)
0: jdbc:hive2://Master:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| db1            |
| dbtest         |
| default        |
+----------------+--+

On success, Hive creates a directory named db1.db under /user/hive/warehouse/.

3.2 Drop a database

drop database db1;

Example:

0: jdbc:hive2://Master:10000> drop database db1;
No rows affected (0.969 seconds)
0: jdbc:hive2://Master:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| dbtest         |
| default        |
+----------------+--+

3.3 Create an internal (managed) table

use db1;
create table t_test(id int,name string,age int)
row format delimited
fields terminated by ',';

Example:

0: jdbc:hive2://Master:10000> use db1;
No rows affected (0.293 seconds)
0: jdbc:hive2://Master:10000> create table t_test(id int,name string,age int)
0: jdbc:hive2://Master:10000> row format delimited
0: jdbc:hive2://Master:10000> fields terminated by ',';
No rows affected (1.894 seconds)
0: jdbc:hive2://Master:10000> desc db1.t_test;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| id        | int        |          |
| name      | string     |          |
| age       | int        |          |
+-----------+------------+----------+--+
3 rows selected (0.697 seconds)

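After the table is created, Hive creates a table directory in the warehouse: /user/hive/warehouse/db1.db/t_test

3.4 Create an external table

create external table t_test1(id int,name string,age int)
row format delimited
fields terminated by ','
location '/user/hive/external/t_test1';

Here location is a directory on HDFS. Files in the matching format that are placed directly in this directory become visible through the Hive table.

Example: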

0: jdbc:hive2://Master:10000> create external table t_test1(id int,name string,age int)
0: jdbc:hive2://Master:10000> row format delimited
0: jdbc:hive2://Master:10000> fields terminated by ','
0: jdbc:hive2://Master:10000> location '/user/hive/external/t_test1';
No rows affected (0.7 seconds)
0: jdbc:hive2://Master:10000> desc db1.t_test1;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| id        | int        |          |
| name      | string     |          |
| age       | int        |          |
+-----------+------------+----------+--+
3 rows selected (0.395 seconds)

Create a test file user.data locally:

1,xiaowang,28
2,xiaoli,18
3,xiaohong,23

Put it into HDFS:

$ hdfs dfs -mkdir -p /user/hive/external/t_test1
$ hdfs dfs -put ./user.data /user/hive/external/t_test1

The data is now visible in the Hive table:

0: jdbc:hive2://Master:10000> select * from db1.t_test1;
+-------------+---------------+--------------+--+
| t_test1.id  | t_test1.name  | t_test1.age  |
+-------------+---------------+--------------+--+
| 1           | xiaowang      | 28           |
| 2           | xiaoli        | 18           |
| 3           | xiaohong      | 23           |
+-------------+---------------+--------------+--+
3 rows selected (8 seconds)

Note: dropping an external table does not delete its files on HDFS.

That is, if db1.t_test1 is dropped, the file /user/hive/external/t_test1/user.data on HDFS is not deleted.

3.5 Load data

Loading data essentially means placing a data file in the table directory.

This can be done with a Hive command:

load data [local] inpath '/data/path' [overwrite] into table t_test;

Adding local means the file is read from the local file system.

Load local data

load data local inpath '/home/hadoop/user.data' into table t_test;

Example:

0: jdbc:hive2://Master:10000> load data local inpath '/home/hadoop/user.data' into table t_test;
No rows affected (2.06 seconds)
0: jdbc:hive2://Master:10000> select * from db1.t_test;
+------------+--------------+-------------+--+
| t_test.id  | t_test.name  | t_test.age  |
+------------+--------------+-------------+--+
| 1          | xiaowang     | 28          |
| 2          | xiaoli       | 18          |
| 3          | xiaohong     | 23          |
+------------+--------------+-------------+--+

Load data that is already in HDFS

load data inpath '/user/hive/external/t_test1/user.data' into table t_test;

Example:

0: jdbc:hive2://Master:10000> load data inpath '/user/hive/external/t_test1/user.data' into table t_test;
No rows affected (1.399 seconds)
0: jdbc:hive2://Master:10000> select * from db1.t_test;
+------------+--------------+-------------+--+
| t_test.id  | t_test.name  | t_test.age  |
+------------+--------------+-------------+--+
| 1          | xiaowang     | 28          |
| 2          | xiaoli       | 18          |
| 3          | xiaohong     | 23          |
| 1          | xiaowang     | 28          |
| 2          | xiaoli       | 18          |
| 3          | xiaohong     | 23          |
+------------+--------------+-------------+--+
6 rows selected (0.554 seconds)

Note: when loading from the local file system, the local file is left unchanged (it is copied). When loading from HDFS, the source file is moved into the corresponding directory of the data warehouse.

3.6 Create a partitioned table

Partitioning stores a table's data in separate subdirectories, so that a query can read only the partitions it needs.

create table t_test1(id int,name string,age int,create_time bigint)
partitioned by (day string,country string)
row format delimited
fields terminated by ',';

Load data into specific partitions:

load data [local] inpath '/data/path1' [overwrite] into table t_test1 partition(day='2019-06-04',country='China');
load data [local] inpath '/data/path2' [overwrite] into table t_test1 partition(day='2019-06-05',country='China');
load data [local] inpath '/data/path3' [overwrite] into table t_test1 partition(day='2019-06-04',country='England');

After loading, the directory structure looks like this:

/user/hive/warehouse/db1.db/t_test1/day=2019-06-04/country=China/...
/user/hive/warehouse/db1.db/t_test1/day=2019-06-04/country=England/...
/user/hive/warehouse/db1.db/t_test1/day=2019-06-05/country=China/...
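
The naming scheme of these partition directories can be sketched in a few lines of Python. This only illustrates the layout shown above and is not Hive's implementation: the helper name partition_dir is hypothetical, the warehouse root /user/hive/warehouse is assumed to be the one used in this article, and any escaping Hive may apply to special characters in partition values is ignored.

```python
# Sketch: how partition column values map to a table's subdirectories.
# partition_dir is a hypothetical helper, not part of Hive.
from collections import OrderedDict


def partition_dir(database, table, partition_spec,
                  warehouse="/user/hive/warehouse"):
    """Build the directory of one partition of a table stored in the warehouse."""
    # One "column=value" level per partition column, in declaration order.
    parts = ["{}={}".format(col, val) for col, val in partition_spec.items()]
    return "/".join([warehouse, database + ".db", table] + parts)


spec = OrderedDict([("day", "2019-06-04"), ("country", "China")])
print(partition_dir("db1", "t_test1", spec))
```

Because every partition column adds one directory level, a filter such as day='2019-06-04' lets a query skip the other day=... directories entirely.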

4 Query syntax

4.1 Conditional queries

select * from t_table where a<1000 and b>0;

4.2 Join queries

Hive supports several kinds of join.

Test data:
a.txt:

a,1
b,2
c,3
d,4

b.txt:

b,16
c,17
d,18
e,19

Create the tables and load the data:

create table t_a(name string,num int)
row format delimited
fields terminated by ',';

create table t_b(name string,age int)
row format delimited
fields terminated by ',';

load data local inpath '/home/hadoop/a.txt' into table t_a;
load data local inpath '/home/hadoop/b.txt' into table t_b;

The tables now contain:

0: jdbc:hive2://Master:10000> select * from t_a;
+-----------+----------+--+
| t_a.name  | t_a.num  |
+-----------+----------+--+
| a         | 1        |
| b         | 2        |
| c         | 3        |
| d         | 4        |
+-----------+----------+--+
4 rows selected (0.523 seconds)
0: jdbc:hive2://Master:10000> select * from t_b;
+-----------+----------+--+
| t_b.name  | t_b.age  |
+-----------+----------+--+
| b         | 16       |
| c         | 17       |
| d         | 18       |
| e         | 19       |
+-----------+----------+--+

4 rows selected (0.482 seconds)

4.3 Inner join

Specify the join condition:

select a.*,b.*
from
t_a a join t_b b on a.name=b.name;

Example:

0: jdbc:hive2://Master:10000> select a.*,b.*
0: jdbc:hive2://Master:10000> from
0: jdbc:hive2://Master:10000> t_a a join t_b b on a.name=b.name;
....
+---------+--------+---------+--------+--+
| a.name  | a.num  | b.name  | b.age  |
+---------+--------+---------+--------+--+
| b       | 2      | b       | 16     |
| c       | 3      | c       | 17     |
| d       | 4      | d       | 18     |
+---------+--------+---------+--------+--+

4.4 Left outer join (left join)

select a.*,b.*
from
t_a a left outer join t_b b on a.name=b.name;

Example:

0: jdbc:hive2://Master:10000> select a.*,b.*
0: jdbc:hive2://Master:10000> from
0: jdbc:hive2://Master:10000> t_a a left outer join t_b b on a.name=b.name;
...
+---------+--------+---------+--------+--+
| a.name  | a.num  | b.name  | b.age  |
+---------+--------+---------+--------+--+
| a       | 1      | NULL    | NULL   |
| b       | 2      | b       | 16     |
| c       | 3      | c       | 17     |
| d       | 4      | d       | 18     |
+---------+--------+---------+--------+--+

4.5 Right outer join (right join)

select a.*,b.*
from
t_a a right outer join t_b b on a.name=b.name;

Example:

0: jdbc:hive2://Master:10000> select a.*,b.*
0: jdbc:hive2://Master:10000> from
0: jdbc:hive2://Master:10000> t_a a right outer join t_b b on a.name=b.name;
....
+---------+--------+---------+--------+--+
| a.name  | a.num  | b.name  | b.age  |
+---------+--------+---------+--------+--+
| b       | 2      | b       | 16     |
| c       | 3      | c       | 17     |
| d       | 4      | d       | 18     |
| NULL    | NULL   | e       | 19     |
+---------+--------+---------+--------+--+

4.6 Full outer join

select a.*,b.*
from
t_a a full outer join t_b b on a.name=b.name;

Example:

0: jdbc:hive2://Master:10000> select a.*,b.*
0: jdbc:hive2://Master:10000> from
0: jdbc:hive2://Master:10000> t_a a full outer join t_b b on a.name=b.name;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
+---------+--------+---------+--------+--+
| a.name  | a.num  | b.name  | b.age  |
+---------+--------+---------+--------+--+
| a       | 1      | NULL    | NULL   |
| b       | 2      | b       | 16     |
| c       | 3      | c       | 17     |
| d       | 4      | d       | 18     |
| NULL    | NULL   | e       | 19     |
+---------+--------+---------+--------+--+

4.7 Left semi join

Returns the rows of table a that also have a match in table b. Only columns of a appear in the result.

select a.*
from
t_a a left semi join t_b b on a.name=b.name;

Example:

0: jdbc:hive2://Master:10000> select a.*
0: jdbc:hive2://Master:10000> from
0: jdbc:hive2://Master:10000> t_a a left semi join t_b b on a.name=b.name;
.....
+---------+--------+--+
| a.name  | a.num  |
+---------+--------+--+
| b       | 2      |
| c       | 3      |
| d       | 4      |
+---------+--------+--+
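
To compare the join types side by side, here is a minimal Python sketch that applies each of them to the t_a and t_b test data. It only models the result sets; it is not how Hive executes joins, and the function names are made up for this illustration.

```python
# Sketch: the join types of sections 4.3-4.7 on the t_a / t_b test data,
# written as plain Python over lists of tuples (name, value).
t_a = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
t_b = [("b", 16), ("c", 17), ("d", 18), ("e", 19)]
NULL = (None, None)  # stands in for the NULL-padded side of an outer join


def inner_join(left, right):
    return [l + r for l in left for r in right if l[0] == r[0]]


def left_outer_join(left, right):
    rows = []
    for l in left:
        matches = [l + r for r in right if l[0] == r[0]]
        rows.extend(matches or [l + NULL])  # keep unmatched left rows
    return rows


def right_outer_join(left, right):
    rows = []
    for r in right:
        matches = [l + r for l in left if l[0] == r[0]]
        rows.extend(matches or [NULL + r])  # keep unmatched right rows
    return rows


def full_outer_join(left, right):
    left_keys = {l[0] for l in left}
    unmatched_right = [NULL + r for r in right if r[0] not in left_keys]
    return left_outer_join(left, right) + unmatched_right


def left_semi_join(left, right):
    right_keys = {r[0] for r in right}
    # Only left columns are returned, and each left row at most once.
    return [l for l in left if l[0] in right_keys]


print(inner_join(t_a, t_b))
print(left_semi_join(t_a, t_b))
```

On this data the inner join and the left semi join select the same keys (b, c, d); the difference is that the semi join returns only the columns of t_a and never duplicates a row of t_a when t_b has several matches.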

4.8 Grouping and aggregation with group by

Prepare the test data:

192.168.33.3,http://www.xxx.cn/stu,2019-08-04 15:30:20
192.168.33.3,http://www.xxx.cn/teach,2019-08-04 15:35:20
192.168.33.4,http://www.xxx.cn/stu,2019-08-04 15:30:20
192.168.33.4,http://www.xxx.cn/job,2019-08-04 16:30:20

192.168.33.5,http://www.xxx.cn/job,2019-08-04 15:40:20
192.168.33.3,http://www.xxx.cn/stu,2019-08-05 15:30:20
192.168.44.3,http://www.xxx.cn/teach,2019-08-05 15:35:20
192.168.33.44,http://www.xxx.cn/stu,2019-08-05 15:30:20
192.168.33.46,http://www.xxx.cn/job,2019-08-05 16:30:20

192.168.33.55,http://www.xxx.cn/job,2019-08-05 15:40:20
192.168.133.3,http://www.xxx.cn/register,2019-08-06 15:30:20
192.168.111.3,http://www.xxx.cn/register,2019-08-06 15:35:20
192.168.34.44,http://www.xxx.cn/pay,2019-08-06 15:30:20
192.168.33.46,http://www.xxx.cn/excersize,2019-08-06 16:30:20
192.168.33.55,http://www.xxx.cn/job,2019-08-06 15:40:20
192.168.33.46,http://www.xxx.cn/excersize,2019-08-06 16:30:20
192.168.33.25,http://www.xxx.cn/job,2019-08-06 15:40:20
192.168.33.36,http://www.xxx.cn/excersize,2019-08-06 16:30:20
192.168.33.55,http://www.xxx.cn/job,2019-08-06 15:40:20

Create a partitioned table and load the data:

create table t_pv(ip string,url string,time string)
partitioned by (dt string)
row format delimited 
fields terminated by ',';

load data local inpath '/home/hadoop/pv.log.0804' into table t_pv partition(dt='2019-08-04');
load data local inpath '/home/hadoop/pv.log.0805' into table t_pv partition(dt='2019-08-05');
load data local inpath '/home/hadoop/pv.log.0806' into table t_pv partition(dt='2019-08-06');

View the data:

0: jdbc:hive2://Master:10000> select * from t_pv;
+----------------+------------------------------+----------------------+-------------+--+
|    t_pv.ip     |           t_pv.url           |      t_pv.time       |   t_pv.dt   |
+----------------+------------------------------+----------------------+-------------+--+
| 192.168.33.3   | http://www.xxx.cn/stu        | 2019-08-04 15:30:20  | 2019-08-04  |
| 192.168.33.3   | http://www.xxx.cn/teach      | 2019-08-04 15:35:20  | 2019-08-04  |
| 192.168.33.4   | http://www.xxx.cn/stu        | 2019-08-04 15:30:20  | 2019-08-04  |
| 192.168.33.4   | http://www.xxx.cn/job        | 2019-08-04 16:30:20  | 2019-08-04  |
| 192.168.33.5   | http://www.xxx.cn/job        | 2019-08-04 15:40:20  | 2019-08-05  |
| 192.168.33.3   | http://www.xxx.cn/stu        | 2019-08-05 15:30:20  | 2019-08-05  |
| 192.168.44.3   | http://www.xxx.cn/teach      | 2019-08-05 15:35:20  | 2019-08-05  |
| 192.168.33.44  | http://www.xxx.cn/stu        | 2019-08-05 15:30:20  | 2019-08-05  |
| 192.168.33.46  | http://www.xxx.cn/job        | 2019-08-05 16:30:20  | 2019-08-05  |
| 192.168.33.55  | http://www.xxx.cn/job        | 2019-08-05 15:40:20  | 2019-08-06  |
| 192.168.133.3  | http://www.xxx.cn/register   | 2019-08-06 15:30:20  | 2019-08-06  |
| 192.168.111.3  | http://www.xxx.cn/register   | 2019-08-06 15:35:20  | 2019-08-06  |
| 192.168.34.44  | http://www.xxx.cn/pay        | 2019-08-06 15:30:20  | 2019-08-06  |
| 192.168.33.46  | http://www.xxx.cn/excersize  | 2019-08-06 16:30:20  | 2019-08-06  |
| 192.168.33.55  | http://www.xxx.cn/job        | 2019-08-06 15:40:20  | 2019-08-06  |
| 192.168.33.46  | http://www.xxx.cn/excersize  | 2019-08-06 16:30:20  | 2019-08-06  |
| 192.168.33.25  | http://www.xxx.cn/job        | 2019-08-06 15:40:20  | 2019-08-06  |
| 192.168.33.36  | http://www.xxx.cn/excersize  | 2019-08-06 16:30:20  | 2019-08-06  |
| 192.168.33.55  | http://www.xxx.cn/job        | 2019-08-06 15:40:20  | 2019-08-06  |
+----------------+------------------------------+----------------------+-------------+--+

View the table's partitions:

show partitions t_pv;

0: jdbc:hive2://Master:10000> show partitions t_pv;
+----------------+--+
|   partition    |
+----------------+--+
| dt=2019-08-04  |
| dt=2019-08-05  |
| dt=2019-08-06  |
+----------------+--+
3 rows selected (0.575 seconds)

Convert the url of every row to upper case

The expression is evaluated once for each row:

select ip,upper(url),time
from t_pv;

0: jdbc:hive2://Master:10000> select ip,upper(url),time
0: jdbc:hive2://Master:10000> from t_pv
+----------------+------------------------------+----------------------+--+
|       ip       |             _c1              |         time         |
+----------------+------------------------------+----------------------+--+
| 192.168.33.3   | HTTP://WWW.XXX.CN/STU        | 2019-08-04 15:30:20  |
| 192.168.33.3   | HTTP://WWW.XXX.CN/TEACH      | 2019-08-04 15:35:20  |
| 192.168.33.4   | HTTP://WWW.XXX.CN/STU        | 2019-08-04 15:30:20  |
| 192.168.33.4   | HTTP://WWW.XXX.CN/JOB        | 2019-08-04 16:30:20  |
| 192.168.33.5   | HTTP://WWW.XXX.CN/JOB        | 2019-08-04 15:40:20  |
| 192.168.33.3   | HTTP://WWW.XXX.CN/STU        | 2019-08-05 15:30:20  |
| 192.168.44.3   | HTTP://WWW.XXX.CN/TEACH      | 2019-08-05 15:35:20  |
| 192.168.33.44  | HTTP://WWW.XXX.CN/STU        | 2019-08-05 15:30:20  |
| 192.168.33.46  | HTTP://WWW.XXX.CN/JOB        | 2019-08-05 16:30:20  |
| 192.168.33.55  | HTTP://WWW.XXX.CN/JOB        | 2019-08-05 15:40:20  |
| 192.168.133.3  | HTTP://WWW.XXX.CN/REGISTER   | 2019-08-06 15:30:20  |
| 192.168.111.3  | HTTP://WWW.XXX.CN/REGISTER   | 2019-08-06 15:35:20  |
| 192.168.34.44  | HTTP://WWW.XXX.CN/PAY        | 2019-08-06 15:30:20  |
| 192.168.33.46  | HTTP://WWW.XXX.CN/EXCERSIZE  | 2019-08-06 16:30:20  |
| 192.168.33.55  | HTTP://WWW.XXX.CN/JOB        | 2019-08-06 15:40:20  |
| 192.168.33.46  | HTTP://WWW.XXX.CN/EXCERSIZE  | 2019-08-06 16:30:20  |
| 192.168.33.25  | HTTP://WWW.XXX.CN/JOB        | 2019-08-06 15:40:20  |
| 192.168.33.36  | HTTP://WWW.XXX.CN/EXCERSIZE  | 2019-08-06 16:30:20  |
| 192.168.33.55  | HTTP://WWW.XXX.CN/JOB        | 2019-08-06 15:40:20  |
+----------------+------------------------------+----------------------+--+

Count the visits to each url

select url ,count(1) -- the aggregate is evaluated once per group
from t_pv
group by url;

0: jdbc:hive2://Master:10000> select url ,count(1)
0: jdbc:hive2://Master:10000> from t_pv
0: jdbc:hive2://Master:10000> group by url;
·····
+------------------------------+------+--+
|             url              | _c1  |
+------------------------------+------+--+
| http://www.xxx.cn/excersize  | 3    |
| http://www.xxx.cn/job        | 7    |
| http://www.xxx.cn/pay        | 1    |
| http://www.xxx.cn/register   | 2    |
| http://www.xxx.cn/stu        | 4    |
| http://www.xxx.cn/teach      | 2    |
+------------------------------+------+--+

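You can give the _c1 column a name with an alias:

select url ,count(1) as count
from t_pv
group by url;

Find the largest visitor ip for each page (ip is a string, so max compares it lexicographically)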

select url,max(ip)
from t_pv
group by url;

0: jdbc:hive2://Master:10000> select url,max(ip)
0: jdbc:hive2://Master:10000> from t_pv
0: jdbc:hive2://Master:10000> group by url;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
+------------------------------+----------------+--+
|             url              |      _c1       |
+------------------------------+----------------+--+
| http://www.xxx.cn/excersize  | 192.168.33.46  |
| http://www.xxx.cn/job        | 192.168.33.55  |
| http://www.xxx.cn/pay        | 192.168.34.44  |
| http://www.xxx.cn/register   | 192.168.133.3  |
| http://www.xxx.cn/stu        | 192.168.33.44  |
| http://www.xxx.cn/teach      | 192.168.44.3   |
+------------------------------+----------------+--+

For each user and page, find the time of the latest visit

select ip,url,max(time)
from t_pv
group by ip,url;

0: jdbc:hive2://Master:10000> select ip,url,max(time)
0: jdbc:hive2://Master:10000> from t_pv
0: jdbc:hive2://Master:10000> group by ip,url;
.....
+----------------+------------------------------+----------------------+--+
|       ip       |             url              |         _c2          |
+----------------+------------------------------+----------------------+--+
| 192.168.111.3  | http://www.xxx.cn/register   | 2019-08-06 15:35:20  |
| 192.168.133.3  | http://www.xxx.cn/register   | 2019-08-06 15:30:20  |
| 192.168.33.25  | http://www.xxx.cn/job        | 2019-08-06 15:40:20  |
| 192.168.33.3   | http://www.xxx.cn/stu        | 2019-08-05 15:30:20  |
| 192.168.33.3   | http://www.xxx.cn/teach      | 2019-08-04 15:35:20  |
| 192.168.33.36  | http://www.xxx.cn/excersize  | 2019-08-06 16:30:20  |
| 192.168.33.4   | http://www.xxx.cn/job        | 2019-08-04 16:30:20  |
| 192.168.33.4   | http://www.xxx.cn/stu        | 2019-08-04 15:30:20  |
| 192.168.33.44  | http://www.xxx.cn/stu        | 2019-08-05 15:30:20  |
| 192.168.33.46  | http://www.xxx.cn/excersize  | 2019-08-06 16:30:20  |
| 192.168.33.46  | http://www.xxx.cn/job        | 2019-08-05 16:30:20  |
| 192.168.33.5   | http://www.xxx.cn/job        | 2019-08-04 15:40:20  |
| 192.168.33.55  | http://www.xxx.cn/job        | 2019-08-06 15:40:20  |
| 192.168.34.44  | http://www.xxx.cn/pay        | 2019-08-06 15:30:20  |
| 192.168.44.3   | http://www.xxx.cn/teach      | 2019-08-05 15:35:20  |
+----------------+------------------------------+----------------------+--+

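For each day after August 4, find the total number of visits to http://www.xxx.cn/job and the largest visitor ip. The four queries below return the same counts and ips; the last one filters dt in where, before grouping, instead of in having.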

select dt,'http://www.xxx.cn/job',count(1),max(ip)
from t_pv
where url='http://www.xxx.cn/job'
group by dt having dt>'2019-08-04';


select dt,max(url),count(1),max(ip)
from t_pv
where url='http://www.xxx.cn/job'
group by dt having dt>'2019-08-04';


select dt,url,count(1),max(ip)
from t_pv
where url='http://www.xxx.cn/job'
group by dt,url having dt>'2019-08-04';



select dt,url,count(1),max(ip)
from t_pv
where url='http://www.xxx.cn/job' and dt>'2019-08-04'
group by dt,url;

For each day after August 4, find the total number of visits to each page and the largest visitor ip

select dt,url,count(1),max(ip)
from t_pv
where dt>'2019-08-04'
group by dt,url;

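For each day after August 4, find the total number of visits to each page and the largest visitor ip, keeping only the records whose total visit count is greater than 2

Option 1: filter the groups with having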

select dt,url,count(1) as cnts,max(ip)
from t_pv
where dt>'2019-08-04'
group by dt,url having cnts>2;

Option 2: use a subquery

select dt,url,cnts,max_ip
from
(select dt,url,count(1) as cnts,max(ip) as max_ip
from t_pv
where dt>'2019-08-04'
group by dt,url) tmp
where cnts>2;
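
To check that filtering groups with having and filtering a subquery give the same answer, the sketch below runs both forms against an in-memory SQLite database from Python. SQLite only stands in for Hive here, which is an assumption of this sketch: dt is an ordinary column rather than a partition column, the time column is left out because the queries do not use it, and having repeats count(1) instead of the alias. The rows follow the select * from t_pv output shown earlier.

```python
# Sketch: where filters rows before grouping; having filters groups after
# aggregation. SQLite is a stand-in for Hive in this illustration.
import sqlite3

BASE = "http://www.xxx.cn/"
data = {  # dt -> list of (ip, page), as in the t_pv table above
    "2019-08-04": [("192.168.33.3", "stu"), ("192.168.33.3", "teach"),
                   ("192.168.33.4", "stu"), ("192.168.33.4", "job")],
    "2019-08-05": [("192.168.33.5", "job"), ("192.168.33.3", "stu"),
                   ("192.168.44.3", "teach"), ("192.168.33.44", "stu"),
                   ("192.168.33.46", "job")],
    "2019-08-06": [("192.168.33.55", "job"), ("192.168.133.3", "register"),
                   ("192.168.111.3", "register"), ("192.168.34.44", "pay"),
                   ("192.168.33.46", "excersize"), ("192.168.33.55", "job"),
                   ("192.168.33.46", "excersize"), ("192.168.33.25", "job"),
                   ("192.168.33.36", "excersize"), ("192.168.33.55", "job")],
}

conn = sqlite3.connect(":memory:")
conn.execute("create table t_pv(ip text, url text, dt text)")
for dt, visits in data.items():
    conn.executemany("insert into t_pv values (?, ?, ?)",
                     [(ip, BASE + page, dt) for ip, page in visits])

HAVING_SQL = """
select dt, url, count(1) as cnts, max(ip) as max_ip
from t_pv
where dt > '2019-08-04'
group by dt, url
having count(1) > 2
order by dt, url
"""
SUBQUERY_SQL = """
select dt, url, cnts, max_ip
from (select dt, url, count(1) as cnts, max(ip) as max_ip
      from t_pv
      where dt > '2019-08-04'
      group by dt, url) tmp
where cnts > 2
order by dt, url
"""
having_rows = conn.execute(HAVING_SQL).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
for row in having_rows:
    print(row)
```

Only two groups survive the filter: the excersize and job pages on 2019-08-06, with 3 and 4 visits.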

5 Basic data types

5.1 Numeric types

TINYINT (1-byte signed integer, from -128 to 127)
SMALLINT (2-byte signed integer, from -32,768 to 32,767)
INT/INTEGER (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
BIGINT (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
FLOAT (4-byte single precision floating point number)
DOUBLE (8-byte double precision floating point number)

Example:

create table t_test(a string ,b int,c bigint,d float,e double,f tinyint,g smallint);

5.2 Date and time types

TIMESTAMP (Note: Only available starting with Hive 0.8.0)
DATE (Note: Only available starting with Hive 0.12.0)

Example: suppose we have the following data file:

1,zhangsan,1985-06-30
2,lisi,1986-07-10
3,wangwu,1985-08-09

We can then create a table that maps onto the data:

create table t_customer(id int,name string,birthday date)
row format delimited fields terminated by ',';

Then load the data:

load data local inpath '/root/customer.dat' into table t_customer;

After that, the table can be queried and the birthday column is read as a date.

5.3 String types

STRING
VARCHAR (Note: Only available starting with Hive 0.12.0)
CHAR (Note: Only available starting with Hive 0.13.0)

5.4 Miscellaneous types

BOOLEAN
BINARY (Note: Only available starting with Hive 0.8.0)

5.5 Complex types

5.5.1 Array type

Given the following data:

玩具總動員4,湯姆·漢克斯:蒂姆·艾倫:安妮·波茨,2019-06-21
流浪地球,屈楚蕭:吳京:李光潔:吳孟達,2019-02-05
千與千尋,柊瑠美:入野自由:夏木真理:菅原文太,2019-06-21
戰狼2,吳京:弗蘭克·格里羅:吳剛:張翰:盧靖姍,2017-08-16

Create the table and load the data:

-- create a table that maps onto the data
create table t_movie(movie_name string,actors array<string>,first_show date)
row format delimited fields terminated by ','
collection items terminated by ':';

-- load the data
load data local inpath '/home/hadoop/actor.dat' into table t_movie;

0: jdbc:hive2://Master:10000> select * from t_movie;
+---------------------+-----------------------------------+---------------------+--+
| t_movie.movie_name  |          t_movie.actors           | t_movie.first_show  |
+---------------------+-----------------------------------+---------------------+--+
| 玩具總動員4              | ["湯姆·漢克斯","蒂姆·艾倫","安妮·波茨"]        | 2019-06-21          |
| 流浪地球                | ["屈楚蕭","吳京","李光潔","吳孟達"]          | 2019-02-05          |
| 千與千尋                | ["柊瑠美","入野自由","夏木真理","菅原文太"]      | 2019-06-21          |
| 戰狼2                 | ["吳京","弗蘭克·格里羅","吳剛","張翰","盧靖姍"]  | 2017-08-16          |
+---------------------+-----------------------------------+---------------------+--+

array[]

Query the lead actor of each movie (the first element of the array):

select movie_name,actors[0],first_show from t_movie;

0: jdbc:hive2://Master:10000> select movie_name,actors[0],first_show from t_movie;
+-------------+---------+-------------+--+
| movie_name  |   _c1   | first_show  |
+-------------+---------+-------------+--+
| 玩具總動員4      | 湯姆·漢克斯  | 2019-06-21  |
| 流浪地球        | 屈楚蕭     | 2019-02-05  |
| 千與千尋        | 柊瑠美     | 2019-06-21  |
| 戰狼2         | 吳京      | 2017-08-16  |
+-------------+---------+-------------+--+

array_contains

Query the movies whose cast includes '吳京':

select movie_name,actors,first_show
from t_movie where array_contains(actors,'吳京');

0: jdbc:hive2://Master:10000> select movie_name,actors,first_show
0: jdbc:hive2://Master:10000> from t_movie where array_contains(actors,'吳京');
+-------------+-----------------------------------+-------------+--+
| movie_name  |              actors               | first_show  |
+-------------+-----------------------------------+-------------+--+
| 流浪地球        | ["屈楚蕭","吳京","李光潔","吳孟達"]          | 2019-02-05  |
| 戰狼2         | ["吳京","弗蘭克·格里羅","吳剛","張翰","盧靖姍"]  | 2017-08-16  |
+-------------+-----------------------------------+-------------+--+

size

Query the number of actors listed for each movie:

select movie_name
,size(actors) as actor_number
,first_show
from t_movie;

0: jdbc:hive2://Master:10000> from t_movie;
+-------------+---------------+-------------+--+
| movie_name  | actor_number  | first_show  |
+-------------+---------------+-------------+--+
| 玩具總動員4      | 3             | 2019-06-21  |
| 流浪地球        | 4             | 2019-02-05  |
| 千與千尋        | 4             | 2019-06-21  |
| 戰狼2         | 5             | 2017-08-16  |
+-------------+---------------+-------------+--+
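
As a rough picture of what the two delimiters in the table definition do, the following Python sketch splits one line of actor.dat the same way and then mirrors actors[0], array_contains and size. It is an analogy only; parse_movie is a hypothetical helper, and Hive's own parsing handles cases (such as missing fields) that this sketch does not.

```python
# Sketch: "fields terminated by ','" splits a line into columns and
# "collection items terminated by ':'" splits the array column.
def parse_movie(line):
    movie_name, actors, first_show = line.split(",")
    return {"movie_name": movie_name,
            "actors": actors.split(":"),  # array<string>
            "first_show": first_show}


row = parse_movie("流浪地球,屈楚蕭:吳京:李光潔:吳孟達,2019-02-05")
lead_actor = row["actors"][0]           # actors[0]
has_wu_jing = "吳京" in row["actors"]    # array_contains(actors,'吳京')
actor_count = len(row["actors"])        # size(actors)
print(lead_actor, has_wu_jing, actor_count)
```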

5.5.2 Map type

Data:

1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28
2,lisi,father:mayun#mother:huangyi#brother:guanyu,22
3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29
4,mayun,father:mayongzhen#mother:angelababy,26

Create the table and load the data:

-- create a table that maps onto the data above
create table t_family(id int,name string,family_members map<string,string>,age int)
row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';

-- load the data
load data local inpath '/root/hivetest/fm.dat' into table t_family;

0: jdbc:hive2://Master:10000> select * from t_family;
+--------------+----------------+----------------------------------------------------------------+---------------+--+
| t_family.id  | t_family.name  |                    t_family.family_members                     | t_family.age  |
+--------------+----------------+----------------------------------------------------------------+---------------+--+
| 1            | zhangsan       | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"}  | 28            |
| 2            | lisi           | {"father":"mayun","mother":"huangyi","brother":"guanyu"}       | 22            |
| 3            | wangwu         | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"}  | 29            |
| 4            | mayun          | {"father":"mayongzhen","mother":"angelababy"}                  | 26            |
+--------------+----------------+----------------------------------------------------------------+---------------+--+

Get each person's father and sister:

select id,name,family_members["father"] as father,family_members["sister"] as sister,age
from t_family;

Get the kinds of relatives each person has:

select id,name,map_keys(family_members) as relations,age
from  t_family;

Get the names of each person's relatives:

select id,name,map_values(family_members) as relations,age
from  t_family;

Get the number of relatives each person has:

select id,name,size(family_members) as relations,age
from  t_family;

Get everyone who has a brother, and who that brother is:

-- option 1: a single statement
select id,name,age,family_members['brother']
from t_family  where array_contains(map_keys(family_members),'brother');


-- option 2: a subquery
select id,name,age,family_members['brother']
from
(select id,name,age,map_keys(family_members) as relations,family_members
from t_family) tmp
where array_contains(relations,'brother');
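
The map functions used in these queries have close Python counterparts, which can help when reasoning about what a query returns. The sketch below parses the family_members text of the first data row by hand; parse_map is a hypothetical helper and the correspondence is an analogy, not Hive's implementation.

```python
# Sketch: '#' separates map entries and ':' separates key from value,
# as declared in the t_family table definition.
def parse_map(text):
    return dict(entry.split(":", 1) for entry in text.split("#"))


members = parse_map("father:xiaoming#mother:xiaohuang#brother:xiaoxu")
father = members.get("father")        # family_members['father']
sister = members.get("sister")        # missing key: None here, NULL in Hive
relations = list(members.keys())      # map_keys(family_members)
names = list(members.values())        # map_values(family_members)
relative_count = len(members)         # size(family_members)
has_brother = "brother" in relations  # array_contains(map_keys(...),'brother')
print(father, sister, relations, names, relative_count, has_brother)
```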

5.5.3 Struct type

Data:

1,zhangsan,18:male:深圳
2,lisi,28:female:北京
3,wangwu,38:male:廣州
4,laowang,26:female:上海
5,yangyang,35:male:杭州

Create the table and load the data:

-- create a table that maps onto the data above
drop table if exists t_user;
create table t_user(id int,name string,info struct<age:int,sex:string,addr:string>)
row format delimited fields terminated by ','
collection items terminated by ':';

-- load the data
load data local inpath '/home/hadoop/user.dat' into table t_user;

0: jdbc:hive2://Master:10000> select * from t_user;
+------------+--------------+----------------------------------------+--+
| t_user.id  | t_user.name  |              t_user.info               |
+------------+--------------+----------------------------------------+--+
| 1          | zhangsan     | {"age":18,"sex":"male","addr":"深圳"}    |
| 2          | lisi         | {"age":28,"sex":"female","addr":"北京"}  |
| 3          | wangwu       | {"age":38,"sex":"male","addr":"廣州"}    |
| 4          | laowang      | {"age":26,"sex":"female","addr":"上海"}  |
| 5          | yangyang     | {"age":35,"sex":"male","addr":"杭州"}    |
+------------+--------------+----------------------------------------+--+

Query each person's id, name, and address:

select id,name,info.addr
from t_user;

0: jdbc:hive2://Master:10000> select id,name,info.addr
0: jdbc:hive2://Master:10000> from t_user;
+-----+-----------+-------+--+
| id  |   name    | addr  |
+-----+-----------+-------+--+
| 1   | zhangsan  | 深圳    |
| 2   | lisi      | 北京    |
| 3   | wangwu    | 廣州    |
| 4   | laowang   | 上海    |
| 5   | yangyang  | 杭州    |
+-----+-----------+-------+--+

6 Common built-in functions

To try a function, select it directly:

select substr("abcdef",1,3);

0: jdbc:hive2://Master:10000> select substr("abcdef",1,3);
+------+--+
| _c0  |
+------+--+
| abc  |
+------+--+

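6.1 Date and time functions

from_unixtime(1496483430,'yyyy-MM-dd HH:mm:ss')

Returns: '2017-06-03 17:50:30' (with a UTC+8 time zone in effect; the first argument is the number of seconds since 1970-01-01 00:00:00 UTC)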
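
As a cross-check of the seconds-to-string conversion, the same calculation can be done in Python. The sketch assumes a UTC+8 time zone and the epoch value 1496483430, both chosen for this illustration; Hive formats the result in its own configured time zone and uses Java-style patterns such as yyyy-MM-dd HH:mm:ss rather than the strftime codes below.

```python
# Sketch: a Python analogue of from_unixtime with an explicit UTC+8 zone.
from datetime import datetime, timedelta, timezone

utc_plus_8 = timezone(timedelta(hours=8))
formatted = datetime.fromtimestamp(1496483430, tz=utc_plus_8).strftime(
    "%Y-%m-%d %H:%M:%S")
print(formatted)  # 2017-06-03 17:50:30
```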

6.2 Type conversion functions

select cast("8" as int);
select cast("2019-2-3" as date);

6.3 Substring and concatenation

substr("abcde",1,3)  -->   'abc'
concat('abc','def')  -->  'abcdef'

0: jdbc:hive2://Master:10000> select substr("abcde",1,3);
+------+--+
| _c0  |
+------+--+
| abc  |
+------+--+
1 row selected (0.152 seconds)
0: jdbc:hive2://Master:10000> select concat('abc','def');
+---------+--+
|   _c0   |
+---------+--+
| abcdef  |
+---------+--+
1 row selected (0.165 seconds)

6.4 JSON parsing functions

get_json_object('{"key1":3333,"key2":4444}' , '$.key1')

Returns: 3333

json_tuple('{"key1":3333,"key2":4444}','key1','key2') as (key1,key2)

Returns: 3333, 4444

6.5 URL parsing function

parse_url_tuple('http://www.xxxx.cn/bigdata?userid=8888','HOST','PATH','QUERY','QUERY:userid')

Returns: www.xxxx.cn /bigdata userid=8888 8888
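
For comparison, the same four parts can be extracted with Python's standard urllib.parse module. This is only an analogy that shows which piece of the URL each part name refers to; it says nothing about how Hive implements the function.

```python
# Sketch: the parts requested from parse_url_tuple, extracted with urllib.
from urllib.parse import urlparse, parse_qs

url = "http://www.xxxx.cn/bigdata?userid=8888"
parsed = urlparse(url)
host = parsed.hostname                        # 'HOST'
path = parsed.path                            # 'PATH'
query = parsed.query                          # 'QUERY'
userid = parse_qs(parsed.query)["userid"][0]  # 'QUERY:userid'
print(host, path, query, userid)
```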

7 User-defined functions

7.1 The problem

The test data looks like this:

1,zhangsan:18-1999063117:30:00-beijing
2,lisi:28-1989063117:30:00-shanghai
3,wangwu:20-1997063117:30:00-tieling

Create the table:

create table t_user_info(info string)
row format delimited;

Load the data:

load data local inpath '/root/udftest.data' into table t_user_info;

Requirement: use the table above to produce the following new table:

t_user: uid,uname,age,birthday_date,birthday_time,address

Approach: define a custom function parse_user_info() that takes one line of the data above plus a field index, splits the line, and returns the requested field.

The requirement can then be met with the following SQL:

create table t_user
as
select
parse_user_info(info,0) as uid,
parse_user_info(info,1) as uname,
parse_user_info(info,2) as age,
parse_user_info(info,3) as birthday_date,
parse_user_info(info,4) as birthday_time,
parse_user_info(info,5) as address
from t_user_info;

The key is implementing the custom parse_user_info() function.

7.2 Implementation steps

1. Write a Java class that implements the function's logic:

package com.doit.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Splits a line such as "1,zhangsan:18-1999063117:30:00-beijing"
// and returns the field selected by index.
public class UserInfoParser extends UDF {
    // index: 0=uid, 1=uname, 2=age, 3=birthday date, 4=birthday time, 5=address
    public String evaluate(String line, int index) {
        // Split on ',', ':' and '-':
        // [1, zhangsan, 18, 1999063117, 30, 00, beijing]
        String[] split = line.split("[,:-]");
        String[] fields = {
            split[0],                                         // uid
            split[1],                                         // uname
            split[2],                                         // age
            split[3].substring(0, 8),                         // date, e.g. 19990631
            split[3].substring(8, 10) + split[4] + split[5],  // time, e.g. 173000
            split[6]                                          // address
        };
        return fields[index];
    }
}

2. Package the Java class into a jar: d:/up.jar

3. Upload the jar to the machine that runs Hive: /root/up.jar

4. Add the jar at the hive prompt:

hive>  add jar /root/up.jar;

5. Create a Hive function name that maps to the Java class in the jar:

hive>  create temporary function parse_user_info as 'com.doit.hive.udf.UserInfoParser';
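
Before packaging the jar, the splitting logic of step 1 can be checked outside Hive. The following Python sketch mirrors the evaluate method under the same assumptions about the line format; it is a test aid written for this article, not part of the UDF, and the Python function name is simply borrowed from the Hive function name.

```python
# Sketch: the field-splitting logic of the UDF, mirrored in Python.
import re


def parse_user_info(line, index):
    """Return field `index` (0..5) of a line like the test data above."""
    # Split on ',', ':' and '-':
    # ['1', 'zhangsan', '18', '1999063117', '30', '00', 'beijing']
    parts = re.split(r"[,:-]", line)
    fields = [
        parts[0],                              # uid
        parts[1],                              # uname
        parts[2],                              # age
        parts[3][:8],                          # birthday date
        parts[3][8:10] + parts[4] + parts[5],  # birthday time
        parts[6],                              # address
    ]
    return fields[index]


line = "1,zhangsan:18-1999063117:30:00-beijing"
print([parse_user_info(line, i) for i in range(6)])
```

Note that the function re-splits the line on every call, so the create table statement in 7.1 parses each row six times. That is simple and fine for small data.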
