Hadoop: Operations Guide

【Basics】

Connecting to AWS

In the EC2 console: Actions -> Start, then Connect.

Open a terminal, cd to the directory containing the key-pair .pem file, and run: ssh -i "xxxxx.pem" ubuntu@xxxxxxxx.compute-1.amazonaws.com (note: lowercase -i, which selects the identity file)

This logs you into the server's Linux system.
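A minimal first-connection sequence, using the same placeholder names as above (ssh requires the key file to be readable only by you):

chmod 400 xxxxx.pem    # ssh rejects keys that other users can read
ssh -i "xxxxx.pem" ubuntu@xxxxxxxx.compute-1.amazonaws.com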

vi file.txt    # edit a text file

Starting Hadoop

In the Linux root directory, run: sh runstart.sh

Once Hadoop is started, you can run Hadoop operations.
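One quick way to confirm the daemons came up is the JDK's jps tool; the exact process list depends on how the cluster is configured:

jps    # on a typical single-node setup, expect NameNode, DataNode, ResourceManager, etc.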

hadoop fs -ls /    # list the files in the HDFS root directory

hadoop fs -cat /user/myfilm/part-m-00000 | head -5    # view the first five lines of the file

hadoop fs -cat    # print a file's contents

hadoop fs -get file1 file2    # copy file1 from HDFS to file2 on the local Linux filesystem

hadoop fs -put product.txt /userdata    # copy the local product.txt into /userdata on HDFS

hadoop fs -rm -r    # delete a directory along with all of its subdirectories and files (plain -rm removes a single file)
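Taken together, a short round-trip session might look like this (local.txt is a hypothetical local file name):

hadoop fs -put product.txt /userdata             # upload from Linux into HDFS
hadoop fs -ls /userdata                          # confirm the file arrived
hadoop fs -cat /userdata/product.txt | head -5   # peek at the first five lines
hadoop fs -get /userdata/product.txt local.txt   # copy back to Linux as local.txt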

Entering MySQL

From any Linux directory: mysql -u ubuntu -p, then enter the password when prompted

This opens the MySQL shell.

List the databases: show databases;

Enter a database: use (database);

List the tables in the current database: show tables;
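For example, to look around the sakila database used in the Sqoop section below:

show databases;
use sakila;
show tables;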


【Sqoop】

What Sqoop does: transfers data in both directions between MySQL and HDFS

Importing from MySQL into HDFS

More parameters: https://blog.csdn.net/w1992wishes/article/details/92027765

Import the actor table from MySQL's sakila database into HDFS, with /userdata as the parent (warehouse) directory:

sqoop import \
  --connect jdbc:mysql://localhost/sakila \
  --username ubuntu --password training \
  --warehouse-dir /userdata \
  --table actor
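Because --warehouse-dir is a parent directory, Sqoop writes the rows into a subdirectory named after the table, which can be verified with:

hadoop fs -ls /userdata/actor    # should contain part-m-00000 and similar part files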

Import the film table from MySQL's sakila database into the exact HDFS directory /user/myfilms:

sqoop import \
  --connect jdbc:mysql://localhost/sakila \
  --username ubuntu --password training \
  --target-dir /user/myfilms \
  --table film
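By contrast with --warehouse-dir, --target-dir is the exact output directory, so the part files land directly under it:

hadoop fs -ls /user/myfilms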

Import only the two columns 'city_id, city' of the city table from MySQL's sakila database, with /userdata as the parent directory:

sqoop import \
  --connect jdbc:mysql://localhost/sakila \
  --username ubuntu --password training \
  --warehouse-dir /userdata \
  --table city \
  --columns 'city_id, city'

Import the rows of the rental table that satisfy 'inventory_id <= 10' from MySQL's sakila database, with /userdata as the parent directory:

sqoop import \
  --connect jdbc:mysql://localhost/sakila \
  --username ubuntu --password training \
  --warehouse-dir /userdata \
  --table rental \
  --where 'inventory_id <= 10'

Incrementally update the imported table, keyed on rental_id:

sqoop import \
  --connect jdbc:mysql://localhost/sakila \
  --username ubuntu --password training \
  --warehouse-dir /userdata \
  --table rental \
  --where 'inventory_id > 10 and inventory_id < 20' \
  --incremental append \
  --check-column rental_id
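On a later run you would normally also pass --last-value so that only rows with a larger rental_id are appended; a sketch, assuming 100 was the highest rental_id previously imported:

sqoop import \
  --connect jdbc:mysql://localhost/sakila \
  --username ubuntu --password training \
  --warehouse-dir /userdata \
  --table rental \
  --incremental append \
  --check-column rental_id \
  --last-value 100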

Exporting from HDFS into MySQL

First create an empty copy of the rental table (LIMIT 0 copies the structure but no rows):

mysql> CREATE TABLE new_rental SELECT * FROM rental LIMIT 0;

$ sqoop export \
  --connect jdbc:mysql://localhost/sakila \
  --username ubuntu --password training \
  --export-dir /userdata/rental \
  --table new_rental
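A quick sanity check after the export is to count the rows that arrived, back in the MySQL shell:

mysql> SELECT COUNT(*) FROM new_rental;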


【Pig】

What Pig does: processes data stored in HDFS

Using Pig interactively

Typing pig at the Linux prompt brings up "grunt", the Pig shell.
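Inside grunt, HDFS commands are available through the fs keyword, and quit exits the shell, for example:

grunt> fs -ls /user
grunt> quit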

Example: film.pig

Line #1: Load (read) data from the HDFS path /user/myfilm into the 'data' relation

data = LOAD '/user/myfilm' USING PigStorage(',')
    AS (film_id:int, title:chararray, rental_rate:float);

Line #4: Filter data by rental_rate greater than or equal to $3.99

data = FILTER data BY rental_rate >= 3.99;

Line #6: Return the data to the screen (dump)

DUMP data;

Line #8: Also, store the data into a new HDFS folder called “top_films”

STORE data INTO '/user/top_films' USING PigStorage('|');
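The whole script can also be run non-interactively from the Linux shell, and the stored result inspected afterwards:

pig film.pig                                       # run the script in batch mode
hadoop fs -cat /user/top_films/part-* | head -5    # view the pipe-delimited output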

Example: realestate.pig

Load the "realestate.txt" data into the "listings" relation (note the file path):

listings = LOAD '/mydata/class2/realestate.txt' USING PigStorage(',')
    AS (listing_id:int, date_listed:chararray, list_price:float,
        sq_feet:int, address:chararray);

Convert date (string) to datetime format:

listings = FOREACH listings GENERATE listing_id,
    ToDate(date_listed, 'yyyy-MM-dd') AS date_listed,
    list_price, sq_feet, address;

--DUMP listings;

Filter data:

bighomes = FILTER listings BY sq_feet >= 2000;

Select columns (same as before):

bighomes_dateprice = FOREACH bighomes GENERATE
    listing_id, date_listed, list_price;

DUMP bighomes_dateprice;

Store data in HDFS:

STORE bighomes_dateprice INTO '/mydata/class2/homedata';
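Since this STORE has no USING clause, PigStorage's default tab delimiter is used; the saved rows can be checked with:

hadoop fs -cat /mydata/class2/homedata/part-* | head -5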
