Flink can be deployed in several ways:
Local, Standalone (low resource utilization), Yarn, Mesos, Docker, Kubernetes, and AWS.
This section focuses on deploying a Flink cluster in Standalone mode and in Yarn mode.
3.1 Standalone Mode Installation
1. Software requirements
· Java 1.8.x or higher,
· ssh (sshd must be running, because the Flink scripts that manage remote components use it)
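A quick way to check both requirements on each node (bigdata12 and bigdata13 are the worker hostnames used in this guide):
java -version          # should report 1.8.x or higher
ssh bigdata12 hostname # passwordless login must work for the Flink scripts
ssh bigdata13 hostname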
Cluster deployment plan
2. Extract the archive
tar -zxvf flink-1.6.1-bin-hadoop28-scala_2.11.tgz -C /opt/module/
3. Edit the configuration files
Edit flink/conf/masters, slaves and flink-conf.yaml.
[root@bigdata11 conf]$ sudo vi masters
bigdata11:8081
[root@bigdata11 conf]$ sudo vi slaves
bigdata12
bigdata13
[root@bigdata11 conf]$ sudo vi flink-conf.yaml
taskmanager.numberOfTaskSlots: 2    # line 52
jobmanager.rpc.address: bigdata11   # line 33
Optional configuration:
· the amount of memory available to each JobManager (jobmanager.heap.mb),
· the amount of memory available to each TaskManager (taskmanager.heap.mb),
· the number of available CPUs per machine (taskmanager.numberOfTaskSlots),
· the total number of CPUs in the cluster (parallelism.default), and
· the temporary directories (taskmanager.tmp.dirs)
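As a reference, a minimal sketch of these optional entries in flink/conf/flink-conf.yaml; the values below are only illustrative and should be tuned to the actual hardware:
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 2
parallelism.default: 4
taskmanager.tmp.dirs: /opt/module/flink-1.6.1/tmp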
4. Copy the installation to the other nodes
[root@bigdata11 module]$ scp -r flink-1.6.1/ itstar@bigdata12:`pwd`
[root@bigdata11 module]$ scp -r flink-1.6.1/ itstar@bigdata13:`pwd`
5. Configure environment variables
Configure the Flink environment variables on every node.
[root@bigdata11 flink-1.6.1]$ vi /etc/profile
export FLINK_HOME=/opt/module/flink-1.6.1
export PATH=$PATH:$FLINK_HOME/bin
[root@bigdata11 flink-1.6.1]$ source /etc/profile
6. Start Flink
[itstar@bigdata11 flink-1.6.1]$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host bigdata11.
Starting taskexecutor daemon on host bigdata12.
Starting taskexecutor daemon on host bigdata13.
Check the processes with jps.
7. Check the WebUI
8. Run a test job
[itstar@bigdata11 flink-1.6.1]$ bin/flink run -m bigdata11:8081 ./examples/batch/WordCount.jar --input /opt/module/datas/word.txt
[itstar@bigdata11 flink-1.6.1]$ bin/flink run -m bigdata11:8081 ./examples/batch/WordCount.jar --input hdfs:///LICENSE.txt --output hdfs:///out
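If --output is omitted, as in the first command, the bundled WordCount example prints its result to the client console instead of writing it to a file.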
9. Flink HA
First of all, Flink has two deployment modes here: Standalone and Yarn Cluster. In Standalone mode, Flink must rely on ZooKeeper to provide HA for the JobManager (ZooKeeper has become an almost indispensable building block for HA in most open-source frameworks). With ZooKeeper's help, a Standalone Flink cluster keeps several JobManagers alive at the same time, only one of which is active while the others stand by. When the active JobManager loses its connection (for example the machine goes down or the process crashes), ZooKeeper elects a new JobManager from the standbys to take over the cluster.
In Yarn Cluster mode, Flink relies on Yarn itself to provide HA for the JobManager; this is entirely a Yarn mechanism. In this mode both the JobManager and the TaskManagers are started by Yarn inside Yarn containers, and the JobManager should really be called the Flink Application Master, so its failure recovery depends entirely on Yarn's ResourceManager (just as with the MapReduce ApplicationMaster). Because this relies completely on Yarn, different Yarn versions may behave slightly differently; we will not go deeper into that here.
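For reference only (an assumption based on the Flink 1.6 documentation, not one of the original steps): JobManager recovery on Yarn is usually tuned through the maximum number of ApplicationMaster attempts allowed by Yarn and the number of attempts Flink requests, for example:
<!-- yarn-site.xml: the maximum number of ApplicationMaster attempts Yarn allows -->
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>
# flink-conf.yaml: the number of attempts Flink requests (must not exceed the Yarn limit)
yarn.application-attempts: 4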
1) Edit the configuration files
Edit flink-conf.yaml. In HA mode jobmanager.rpc.address does not need to be set; the JobManagers are listed in the masters file, and ZooKeeper elects the leader and the standbys.
#jobmanager.rpc.address: bigdata11
high-availability: zookeeper    # line 73
# selects the high-availability mode (required)
high-availability.zookeeper.quorum: bigdata11:2181,bigdata12:2181,bigdata13:2181    # line 88
# the ZooKeeper quorum is the replicated group of ZooKeeper servers that provides the distributed coordination service (required)
high-availability.storageDir: hdfs:///flink/ha/    # line 82
# JobManager metadata is persisted in the file system under storageDir; only a pointer to this state is stored in ZooKeeper (required)
high-availability.zookeeper.path.root: /flink    # not in the default file
# the root ZooKeeper node under which all cluster nodes are placed (recommended)
high-availability.cluster-id: /flinkCluster    # not in the default file
# a custom cluster id (recommended)
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/savepoints
Edit conf/zoo.cfg
server.1=bigdata11:2888:3888
server.2=bigdata12:2888:3888
server.3=bigdata13:2888:3888
Edit conf/masters
bigdata11:8081
bigdata12:8081
Edit conf/slaves
bigdata12
bigdata13
Sync the conf directory to every node.
2) Start the HA cluster
Start ZooKeeper on every node first (in a test environment Flink's bundled start-zookeeper-quorum.sh can also be used), then start HDFS, and finally start Flink.
[itstar@bigdata11 flink-1.6.1]$ bin/start-cluster.sh
Check the WebUI; at this point a leading Master has been elected automatically, as shown below.
3) Verify HA
Manually kill the master on bigdata12; the standby master on bigdata11 then takes over as the leading master.
4) Manually add JobManager / TaskManager instances to the cluster
You can add JobManager and TaskManager instances to a running cluster with the bin/jobmanager.sh and bin/taskmanager.sh scripts.
Add a JobManager:
bin/jobmanager.sh ((start|start-foreground) [host] [webui-port])|stop|stop-all
Add a TaskManager:
bin/taskmanager.sh start|start-foreground|stop|stop-all
[itstar@bigdata12 flink-1.6.1]$ jobmanager.sh start bigdata12
The newly added JobManager joins as a standby master.
3.2 Yarn Mode Installation
Download Flink 1.6.1 from the official site (https://archive.apache.org/dist/flink/flink-1.6.1/).
Upload the package to the node on which the JobManager is to be installed (bigdata11).
Log in to the Linux system and extract the archive (same as above).
Edit the flink-conf.yaml configuration file in the conf folder of the installation directory to specify the JobManager (same as above).
Edit the slaves configuration file in the conf folder of the installation directory to specify the TaskManagers (same as above).
Distribute the configured Flink directory to the other two nodes (same as above).
Make sure the HADOOP_HOME environment variable has already been set on the virtual machines.
Start the Hadoop cluster (HDFS and Yarn).
Submit a Yarn session from the bigdata11 node, using the yarn-session.sh script in the bin directory of the installation:
Add the following configuration to yarn-site.xml:
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>5</value>
</property>
/opt/module/flink-1.6.1/bin/yarn-session.sh -n 2 -s 4 -jm 1024 -tm 1024 -nm test -d
Where:
-n (--container): the number of TaskManagers.
-s (--slots): the number of slots per TaskManager. By default one slot corresponds to one core and each TaskManager gets one slot; sometimes it is useful to run a few extra TaskManagers for redundancy.
-jm: the JobManager memory (in MB).
-tm: the memory of each TaskManager (in MB).
-nm: the Yarn application name (the name shown in the Yarn UI).
-d: run in detached (background) mode.
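Once the detached session is running, a plain flink run issued from the same node attaches to that session automatically (Flink records the session's Yarn application id in a local properties file), so a test job can also be submitted without -m yarn-cluster, for example:
/opt/module/flink-1.6.1/bin/flink run ./examples/batch/WordCount.jar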
After starting it, open the Yarn web UI and you can see the session that was just submitted:
Check the processes on the node from which the session was submitted.
Submit a jar to the cluster to run:
/opt/module/flink-1.6.1/bin/flink run -m yarn-cluster examples/batch/WordCount.jar
After submitting, check the job status on the Yarn web UI.
When the job finishes, output like the following is printed on the console.
3.3 Flink WordCount
3.3.1 Sending Data over a Socket
[root@bigdata13 flink-1.6.1]# bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9999
#open another Xshell session
[root@bigdata13 flink-1.6.1]# nc -l 9999
#check the log output
[root@bigdata13 flink-1.6.1]# vi log/flink-root-taskexecutor-1-bigdata13.out
3.3.2 Running WordCount from Java Code
#open port 9999 on bigdata13
nc -l 9999
#run the code below, then type data into the port opened above
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
public class WordCount {
    public static void main(String[] args) throws Exception {
        // port of the socket to read from
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.err.println("No port parameter given, using default value 9000");
            port = 9000;
        }
        // get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // connect to the socket and read the input data
        DataStreamSource<String> text = env.socketTextStream("192.168.1.53", port, "\n");
        // process the data
        DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            @Override
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split("\\s");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        })                                              // flatten: turn each line into <word, count> records
        .keyBy("word")                                  // group records with the same word
        .timeWindow(Time.seconds(2), Time.seconds(1))   // window size and slide interval
        .sum("count");
        // print the result to the console
        windowCount.print()
                .setParallelism(1);                     // use a single parallel instance
        // note: Flink builds the job lazily, so execute() must be called for the code above to run
        env.execute("streaming word count");
    }

    /**
     * Holds a word together with its count.
     */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
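When running this class from the IDE, pass the port as a program argument, e.g. --port 9999, so that it matches the nc -l 9999 listener opened above; without the argument the code falls back to port 9000.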
3.3.3 Running WordCount from Scala Code
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
object ScalaWordCount {
  def main(args: Array[String]): Unit = {
    // get the execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // get input data by connecting to the socket
    val text = env.socketTextStream("bigdata13", 9999, '\n')
    // parse the data, group it, window it, and aggregate the counts
    val windowCounts = text
      .flatMap { w => w.split("\\s") }
      .map { w => WordWithCount(w, 1) }
      .keyBy("word")
      .timeWindow(Time.seconds(5), Time.seconds(1))
      .sum("count")
    // print the results with a single thread, rather than in parallel
    windowCounts.print().setParallelism(1)
    env.execute("Socket Window WordCount")
  }

  // Data type for words with count
  case class WordWithCount(word: String, count: Long)
}
Note: the import must be org.apache.flink.streaming.api.scala._; otherwise the implicit conversions it provides are missing and the program will not compile.
3.3.4 Monitoring Wikipedia with Flink
pom.xml
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.11.12</scala.version>
<scala.binary.version>2.11</scala.binary.version>
<hadoop.version>2.8.4</hadoop.version>
<flink.version>1.6.1</flink.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.38</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-wikiedits_2.11</artifactId>
<version>1.6.1</version>
</dependency>
</dependencies>
Code
import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;
public class WikipediaAnalysis {
    public static void main(String[] args) throws Exception {
        // create the streaming execution environment
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        // source: the stream of Wikipedia edit events
        DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
        // key the stream by the author of each edit
        KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
                .keyBy(new KeySelector<WikipediaEditEvent, String>() {
                    @Override
                    public String getKey(WikipediaEditEvent event) {
                        return event.getUser();
                    }
                });
        // aggregate the byte diff per user over 5-second windows
        DataStream<Tuple2<String, Long>> result = keyedEdits
                .timeWindow(Time.seconds(5))
                .fold(new Tuple2<>("", 0L), new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> fold(Tuple2<String, Long> acc, WikipediaEditEvent event) {
                        acc.f0 = event.getUser();
                        acc.f1 += event.getByteDiff();
                        return acc;
                    }
                });
        result.print();
        see.execute();
    }
}
Then run it directly in IDEA and wait about 20 seconds for results to appear.
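Once edit events start arriving, each printed line is a (user, accumulated byte diff) tuple, roughly of the form 2> (SomeUser,1024); the leading number is the index of the printing subtask, and the actual users and numbers will of course differ.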
3.3.5 Wiki To Kafka
Create the Kafka topic
#create the topic wiki-result on bigdata11
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wiki-result
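To confirm that the topic exists, you can list the topics with the same CLI (assuming the same ZooKeeper address):
bin/kafka-topics.sh --list --zookeeper localhost:2181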
Create a child module in the Flink project; its pom.xml is as follows:
<parent>
<artifactId>Flink</artifactId>
<groupId>com.itstar</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>wiki</artifactId>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-wikiedits_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.11_2.11</artifactId>
<version>1.6.1</version>
</dependency>
</dependencies>
The code is as follows:
package wikiedits;
import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;
public class WikipediaAnalysis {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
        KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
                .keyBy(new KeySelector<WikipediaEditEvent, String>() {
                    @Override
                    public String getKey(WikipediaEditEvent event) {
                        return event.getUser();
                    }
                });
        DataStream<Tuple2<String, Long>> result = keyedEdits
                .timeWindow(Time.seconds(5))
                .fold(new Tuple2<>("", 0L), new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> fold(Tuple2<String, Long> acc, WikipediaEditEvent event) {
                        acc.f0 = event.getUser();
                        acc.f1 += event.getByteDiff();
                        return acc;
                    }
                });
        result.print();
        result
                .map(new MapFunction<Tuple2<String, Long>, String>() {
                    @Override
                    public String map(Tuple2<String, Long> tuple) {
                        return tuple.toString();
                    }
                })
                .addSink(new FlinkKafkaProducer011<>("bigdata11:9092", "wiki-result", new SimpleStringSchema()));
        see.execute();
    }
}
Tip: note the following imports:
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.functions.MapFunction;
Start a Kafka consumer:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wiki-result
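On newer Kafka releases the console consumer no longer accepts --zookeeper; the equivalent call goes through a broker instead (shown here only as an alternative):
bin/kafka-console-consumer.sh --bootstrap-server bigdata11:9092 --topic wiki-result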
3.3.6 Flink Source in Practice: Kafka + Flink Stream + MySQL
Create the student table
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(25) COLLATE utf8_bin DEFAULT NULL,
`password` varchar(25) COLLATE utf8_bin DEFAULT NULL,
`age` int(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Insert data
INSERT INTO `student` VALUES ('1', 'Andy', '123456', '18'), ('2', 'Bndy', '000000', '17'), ('3', 'Cndy', '012345', '18'), ('4', 'Dndy', '123456', '16');
COMMIT;
pom.xml
<dependencies>
<!--flink java-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
<!--flink kafka connector-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.11_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<!--alibaba fastjson-->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.51</version>
</dependency>
<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.27</version>
</dependency>
</dependencies>
Student Bean
package FlinkToMySQL;
public class Student {
public int id;
public String name;
public String password;
public int age;
public Student() {
}
public Student(int id, String name, String password, int age) {
this.id = id;
this.name = name;
this.password = password;
this.age = age;
}
@Override
public String toString() {
return "Student{" +
"id=" + id +
", name='" + name + '\'' +
", password='" + password + '\'' +
", age=" + age +
'}';
}
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getPassword() {
return password;
}
public void setPassword(String password) {
this.password = password;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
}
Note: using lombok here may lead to other errors.
SourceFromMySQL
package FlinkToMySQL;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
public class SourceFromMySQL extends RichSourceFunction<Student> {
    PreparedStatement ps;
    private Connection connection;

    /**
     * Open the connection in open(), so that a connection does not have to be
     * created and released on every invocation.
     *
     * @param parameters
     * @throws Exception
     */
    @Override
    public void open(Configuration parameters) throws Exception {
        connection = getConnection();
        String sql = "select * from student;";
        ps = this.connection.prepareStatement(sql);
    }

    /**
     * When the job finishes, close the statement and the connection and release the resources.
     *
     * @throws Exception
     */
    @Override
    public void close() throws Exception {
        if (ps != null) {
            ps.close();
        }
        if (connection != null) {
            connection.close();
        }
    }

    /**
     * The DataStream calls run() once to produce the data.
     *
     * @param ctx
     * @throws Exception
     */
    @Override
    public void run(SourceContext<Student> ctx) throws Exception {
        ResultSet resultSet = ps.executeQuery();
        while (resultSet.next()) {
            Student student = new Student(
                    resultSet.getInt("id"),
                    resultSet.getString("name").trim(),
                    resultSet.getString("password").trim(),
                    resultSet.getInt("age"));
            ctx.collect(student);
        }
    }

    @Override
    public void cancel() {
    }

    private static Connection getConnection() {
        Connection con = null;
        try {
            Class.forName("com.mysql.jdbc.Driver");
            con = DriverManager.getConnection("jdbc:mysql://bigdata11:3306/Andy?useUnicode=true&characterEncoding=UTF-8", "root", "000000");
        } catch (Exception e) {
            e.printStackTrace();
        }
        return con;
    }
}
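Because run() issues a single SELECT and then returns, this source is bounded: the job emits the rows currently in the student table and then finishes.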
The main method for the custom source
package FlinkToMySQL;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class customSource {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(new SourceFromMySQL()).print();
        env.execute("Flink add data source");
    }
}
Flink Stream + Kafka
pom.xml
<dependencies>
<!--flink java-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
<!--flink kafka connector-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.11_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<!--alibaba fastjson-->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.51</version>
</dependency>
<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.27</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
Bean
package KafkaToFlink;
import java.util.Map;
public class Metric {
private String name;
private long timestamp;
private Map<String, Object> fields;
private Map<String, String> tags;
public Metric() {
}
public Metric(String name, long timestamp, Map<String, Object> fields, Map<String, String> tags) {
this.name = name;
this.timestamp = timestamp;
this.fields = fields;
this.tags = tags;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public long getTimestamp() {
return timestamp;
}
public void setTimestamp(long timestamp) {
this.timestamp = timestamp;
}
public Map<String, Object> getFields() {
return fields;
}
public void setFields(Map<String, Object> fields) {
this.fields = fields;
}
public Map<String, String> getTags() {
return tags;
}
public void setTags(Map<String, String> tags) {
this.tags = tags;
}
}
KafkaUtils
package KafkaToFlink;
import com.alibaba.fastjson.JSON;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
public class KafkaUtils {
    public static final String broker_list = "bigdata11:9092";
    // kafka topic
    public static final String topic = "metric";
    // key serializer
    public static final String KEY = "org.apache.kafka.common.serialization.StringSerializer";
    // value serializer
    public static final String VALUE = "org.apache.kafka.common.serialization.StringSerializer";

    public static void writeToKafka() throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", broker_list);
        props.put("key.serializer", KEY);
        props.put("value.serializer", VALUE);
        KafkaProducer producer = new KafkaProducer<String, String>(props);

        Metric metric = new Metric();
        metric.setName("mem");
        long timestamp = System.currentTimeMillis();
        metric.setTimestamp(timestamp);
        Map<String, Object> fields = new HashMap<>();
        fields.put("used_percent", 90d);
        fields.put("max", 27244873d);
        fields.put("used", 17244873d);
        fields.put("init", 27244873d);
        Map<String, String> tags = new HashMap<>();
        tags.put("cluster", "Andy");
        tags.put("host_ip", "192.168.1.51");
        metric.setFields(fields);
        metric.setTags(tags);

        ProducerRecord record = new ProducerRecord<String, String>(topic, null, null, JSON.toJSONString(metric));
        producer.send(record);
        System.out.println("sent: " + JSON.toJSONString(metric));
        producer.flush();
    }

    public static void main(String[] args) throws InterruptedException {
        // produce one metric record roughly every 300 ms
        while (true) {
            Thread.sleep(300);
            writeToKafka();
        }
    }
}
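Run this producer first (or in parallel with the consumer below) so that the metric topic keeps receiving a record roughly every 300 ms.
Main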
package KafkaToFlink;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import java.util.Properties;
public class Main {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties props = new Properties();
        props.put("bootstrap.servers", "bigdata11:9092");
        props.put("zookeeper.connect", "bigdata11:2181");
        props.put("group.id", "metric-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");   // key deserializer
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); // value deserializer
        props.put("auto.offset.reset", "earliest"); // start from the earliest offset
        DataStreamSource<String> dataStreamSource = env.addSource(new FlinkKafkaConsumer011<>(
                "metric",                  // kafka topic
                new SimpleStringSchema(),  // String deserialization schema
                props)).setParallelism(1);
        dataStreamSource.print(); // print the data read from kafka to the console
        env.execute("Flink add data source");
    }
}
Note: the metric Kafka topic does not need to be created by hand here; it is created automatically when the producer first writes to it.
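This relies on the broker setting auto.create.topics.enable, which is true by default; if it has been disabled in server.properties, create the metric topic manually with kafka-topics.sh as shown in 3.3.5.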