These are study notes for the course《Flink大數據項目實戰》. If you want to learn Flink, one of the most popular big data compute frameworks, systematically through video, the recommended course is:
Flink大數據項目實戰: http://t.cn/EJtKhaz
1. Quickly Generating a Flink Project
1. Recommended tools
IDEA + Maven + Git
2. Recommended languages
Java or Scala
https://ci.apache.org/projects/flink/flink-docs-release-1.6/quickstart/java_api_quickstart.html
3. Steps to build a Flink project
1) Build the Flink project with Maven
Here we build a project against Flink 1.6.2. Open a terminal and run the following command:
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.6.2
During generation Maven prompts you for a groupId, artifactId, version and package.
Enter y to confirm.
Maven then reports that the project was built successfully.
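To skip the interactive prompts, the same archetype can be invoked in batch mode with all coordinates passed on the command line. The groupId, artifactId and version below are illustrative values (chosen to match the package used in the code later in these notes), not required names:

```shell
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.6.2 \
  -DgroupId=com.dsj.flink \
  -DartifactId=flink-demo \
  -Dversion=1.0-SNAPSHOT \
  -Dpackage=com.dsj.flink \
  -DinteractiveMode=false
```

With `-DinteractiveMode=false` Maven generates the project without asking for confirmation.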
2) Open IDEA and import the generated Maven project
Open IDEA and click the Open option.
Select the Flink project you just created.
IDEA opens the Flink project.
2. Flink Batch WordCount
Create a new package named batch.
Open the Flink source on GitHub and copy the batch WordCount code into the batch package.
The batch WordCount code:
package com.dsj.flink.batch;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.examples.java.wordcount.util.WordCountData;
import org.apache.flink.util.Collector;

/**
 * Counts word frequencies.
 */
public class WordCount {

    public static void main(String[] args) throws Exception {
        // Parse the command-line arguments
        final ParameterTool params = ParameterTool.fromArgs(args);

        // Obtain an execution environment; local vs. cluster is detected automatically
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Make the parameters available in the web interface
        env.getConfig().setGlobalJobParameters(params);

        // Read the input data
        DataSet<String> text;
        if (params.has("input")) {
            // Read a text file
            text = env.readTextFile(params.get("input"));
        } else {
            // Fall back to the bundled test data set
            System.out.println("Executing WordCount example with default input data set.");
            System.out.println("Use --input to specify file input.");
            text = WordCountData.getDefaultTextLineDataSet(env);
        }

        DataSet<Tuple2<String, Integer>> counts =
                // Split each line into words
                text.flatMap(new Tokenizer())
                // Group by word and sum the frequencies
                .groupBy(0)
                .sum(1);

        // Emit the result
        if (params.has("output")) {
            // Write the result as CSV
            counts.writeAsCsv(params.get("output"), "\n", " ");
            // Submit and execute the Flink job
            env.execute("WordCount Example");
        } else {
            System.out.println("Printing result to stdout. Use --output to specify output path.");
            // print() internally calls execute() to submit the Flink job
            counts.print();
        }
    }

    // *************************************************************************
    // USER FUNCTIONS
    // *************************************************************************

    public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // Normalize and split the line
            String[] tokens = value.toLowerCase().split("\\W+");
            // Emit the pairs
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}
Right-click the class and choose Run to execute the Flink batch WordCount; the result is shown below:
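To make the pipeline's semantics concrete without a Flink runtime, here is a plain-Java sketch of what flatMap (the Tokenizer) followed by groupBy(0).sum(1) computes. The class and method names are made up for illustration; only the tokenization rule (lowercase, split on `\W+`, drop empty tokens) and the tallying mirror the job above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {

    // Mirrors the Tokenizer: normalize to lowercase, split on non-word
    // characters, and drop empty tokens
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase().split("\\W+")) {
            if (t.length() > 0) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Mirrors groupBy(0).sum(1): tally the frequency of each token
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : tokenize(line)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("To be, or not to be")));
    }
}
```

In the real job the grouping and summing are distributed across the cluster, but the per-word totals are the same as this single-machine tally.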
3. Flink Streaming WordCount
Likewise, create a separate package named stream for the streaming job.
Open the Flink source on GitHub and copy the streaming WordCount code into the stream package.
The streaming WordCount code:
package com.dsj.flink.stream;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.examples.wordcount.util.WordCountData;
import org.apache.flink.util.Collector;

/**
 * Counts word frequencies.
 */
public class WordCount {

    public static void main(String[] args) throws Exception {
        // Parse the command-line arguments
        final ParameterTool params = ParameterTool.fromArgs(args);

        // Obtain an execution environment; local vs. cluster is detected automatically
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Make the parameters available in the web interface
        env.getConfig().setGlobalJobParameters(params);

        // Read the input data
        DataStream<String> text;
        if (params.has("input")) {
            // Read a text file
            text = env.readTextFile(params.get("input"));
        } else {
            System.out.println("Executing WordCount example with default input data set.");
            System.out.println("Use --input to specify file input.");
            // Fall back to the bundled test data set
            text = env.fromElements(WordCountData.WORDS);
        }

        DataStream<Tuple2<String, Integer>> counts =
                // Split each line into words
                text.flatMap(new Tokenizer())
                // Key by word and keep a running sum of the frequencies
                .keyBy(0).sum(1);

        // Emit the result
        if (params.has("output")) {
            // Write the result to the given path
            counts.writeAsText(params.get("output"));
        } else {
            System.out.println("Printing result to stdout. Use --output to specify output path.");
            // Print the result to the console
            counts.print();
        }

        // Execute the Flink job
        env.execute("Streaming WordCount");
    }

    public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // Normalize and split the line
            String[] tokens = value.toLowerCase().split("\\W+");
            // Emit the pairs
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}
Right-click the class and choose Run to execute the Flink streaming WordCount; the result is shown below:
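A key difference from the batch job is that keyBy(0).sum(1) maintains per-key state and emits an updated count for every incoming record, rather than a single final total. The following plain-Java sketch (class and method names invented for illustration) shows that per-record behavior:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamingCountSketch {

    // For each incoming token, update the keyed state and emit the new
    // running count, mimicking keyBy(0).sum(1) in the DataStream job above
    static List<String> runningCounts(List<String> tokens) {
        Map<String, Integer> state = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String token : tokens) {
            int c = state.merge(token, 1, Integer::sum);
            emitted.add("(" + token + "," + c + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        // "to" arrives twice, so its count is emitted first as 1, then as 2
        System.out.println(runningCounts(Arrays.asList("to", "be", "to")));
    }
}
```

This is why the streaming job's console output shows the same word multiple times with increasing counts, while the batch job prints each word exactly once.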