背景
某應用1.0性能測試
服務強依賴于mysql, 許多接口都會請求mysql
-
對mysql的請求用GenericObjectPool的連接池來進行管理, 設置如下:
79 connectionPool.setMaxActive(maxActive); 80 connectionPool.setTestOnBorrow(false); 81 connectionPool.setTestOnReturn(true); 82 connectionPool.setTestWhileIdle(true); 83 connectionPool 84 .setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis); 85 86 // Start the Evictor 87 connectionPool 88 .setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis);
問題
-
用100并發打壓服務,發現拿到連接并在干活的線程數只有10+, 而其余80+的線程wait在borrowObject的邏輯, 相應的stack如下:
有許多線程在blocked在makeObject相關的邏輯,均在等一把鎖 1462 "resin-tcp-connection-*:3231-321" daemon prio=10 tid=0x000000004dc43800 nid=0x65f5 waiting for monitor entry [0x00000000507ff000] 1463 java.lang.Thread.State: BLOCKED (on object monitor) 1464 at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290) 1465 - waiting to lock <0x00000000b26ee8a8> (a org.apache.commons.dbcp.PoolableConnectionFactory) 1466 at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:771) 1467 at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:95) 1468 at outfox.cps.dao.source.CpsDataSource.getConnection(CpsDataSource.java:61) 1469 at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) 1470 at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) 1471 at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:572) 1472 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:636) 1473 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:665) 1474 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:673) 1475 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:713) 1476 at outfox.cps.dao.impl.FriendshipDaoImpl.getFansWithTime(FriendshipDaoImpl.java:93) ...... 有一個線程拿到了makeObject相關的鎖,并在執行相關的邏輯 1208 "resin-tcp-connection-*:3231-149" daemon prio=10 tid=0x000000004d67e800 nid=0x66d7 runnable [0x000000005180f000] 1209 java.lang.Thread.State: RUNNABLE 1210 at java.net.SocketInputStream.socketRead0(Native Method) 1211 at java.net.SocketInputStream.read(SocketInputStream.java:129) 1212 at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113) 1213 at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160) 1214 at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188) 1215 - locked <0x000000009f5a4cc8> (a com.mysql.jdbc.util.ReadAheadInputStream) 1216 at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2329) 1217 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2774) 1218 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2763) 1219 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3299) 1220 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1837) 1221 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1961) 1222 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2537) 1223 - locked <0x000000009f5a1600> (a java.lang.Object) 1224 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2466) 1225 at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1383) 1226 - locked <0x000000009f5a1600> (a java.lang.Object) 1227 at com.mysql.jdbc.ConnectionImpl.loadServerVariables(ConnectionImpl.java:3752) 1228 at com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3345) 1229 at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2046) 1230 at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:729) 1231 at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:46) 1232 at sun.reflect.GeneratedConstructorAccessor26.newInstance(Unknown Source) 1233 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) 1234 at java.lang.reflect.Constructor.newInstance(Constructor.java:513) 1235 at com.mysql.jdbc.Util.handleNewInstance(Util.java:406) 1236 at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:302) 1237 at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:283) 1238 at java.sql.DriverManager.getConnection(DriverManager.java:582) 1239 at java.sql.DriverManager.getConnection(DriverManager.java:207) 1240 at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:46) 1241 at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:290) 1242 - locked <0x00000000b26ee8a8> (a org.apache.commons.dbcp.PoolableConnectionFactory) 1243 at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:771) 1244 at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:95) 1245 at outfox.cps.dao.source.CpsDataSource.getConnection(CpsDataSource.java:61) 1246 at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) 1247 at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) 1248 at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:572) 1249 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:636) 1250 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:665) 1251 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:673) 1252 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:713) 1253 at outfox.cps.dao.impl.FriendshipDaoImpl.getAttentionsWithTime(FriendshipDaoImpl.java:69) 1254 at outfox.cps.api.impl.FriendshipApi$RelationListOperation.copyRealtionListFromDbToRedis(FriendshipApi.java:118) 1255 at outfox.cps.api.impl.FriendshipApi$RelationListOperation.getRelationList(FriendshipApi.java:144) ......
分析
為什么maxActive設置為100, 還會有如此多的線程在makeObject?這是我們的setMaxActive沒有生效, 還是GenericObjectPool有其他destroy對象的機制被我們忽略了?
查看GenericObjectPool的java doc, 了解對象池關于對象創建和銷毀相關的方式以及配置方式
-
打壓開始后, 動態debug, 發現pool對象有許多我們未進行配置的成員變量,對照doc逐一check各變量
發現一個可懷疑點: numIdle, 其值一直在2-8之間波動, 且doc中如是描述:maxActive controls the maximum number of objects that can be borrowed from the pool at one time. When non-positive, there is no limit to the number of objects that may be active at one time. When maxActive is exceeded, the pool is said to be exhausted. The default setting for this parameter is 8. maxIdle controls the maximum number of objects that can sit idle in the pool at any time. When negative, there is no limit to the number of objects that may be idle at one time. The default setting for this parameter is 8.
可知默認的maxIdle值為8,若未進行配置,當pool中idle的object超過默認值8時,多余的對象就會被destroy
至此,問題基本定位到
驗證
-
增加對maxIdle的配置,即:
79 connectionPool.setMaxActive(maxActive); 80 connectionPool.setMaxIdle(maxActive); //實際中可以設小一些, 如30 81 connectionPool.setTestOnBorrow(false); 82 connectionPool.setTestOnReturn(true); 83 connectionPool.setTestWhileIdle(true); 84 connectionPool 85 .setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis); 86 87 // Start the Evictor 88 connectionPool 89 .setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis);
重新打壓, 發現吞吐增長近一倍,響應速度也顯著加快,stack中無線程blocked在borrowObject或makeObject相關的邏輯
另一個重要的問題
-
未設置maxWait, doc中對maxWait說明如下:
setMaxWait public void setMaxWait(long maxWait) Sets the maximum amount of time (in milliseconds) the borrowObject() method should block before throwing an exception when the pool is exhausted and the "when exhausted" action is WHEN_EXHAUSTED_BLOCK. When less than or equal to 0, the borrowObject() method may block indefinitely. Parameters: maxWait - maximum number of milliseconds to block when borrowing an object. See Also: getMaxWait(), setWhenExhaustedAction(byte), WHEN_EXHAUSTED_BLOCK
可知若不進行設置,當被依賴的服務由于某些故障(如機器資源被某進程大量占用)而響應極慢時,會有大量線程blocked在borrowObject的邏輯,最終導致resin線程耗盡,服務卡死,用戶請求被拒絕
可參考之前的一次線上故障:某服務變慢導致依賴于它的另一web卡死,其原因為改web請求改服務時的連接池未對borrowObject設置timeout,導致大量線程blocked在borrowObject的邏輯
-
因此需要對maxWait進行設置,且設置的時間不宜過大(高吞吐時仍有可能導致web卡死),也不宜過小(可能導致較多的請求失敗),具體的配置可參考如下:
基本原則: qt<N, 其中q為服務最大吞吐(請求/s), t為設置的maxWait時間(s), N為resin的最大線程數 resin的線程數當前為512, 預估最大吞吐為100, 因此有t<N/q=512/100=5.12 我們將其配置為3s(即3000ms), 從而當該對象池中的對象出現異常時,仍可扛住512/3約為170qps的壓力
-
具體的配置如下:
79 connectionPool.setMaxActive(maxActive); 80 connectionPool.setMaxIdle(maxActive); 81 connectionPool.setMaxWait(maxWait); // maxWait=3000 82 connectionPool.setTestOnBorrow(false); 83 connectionPool.setTestOnReturn(true); 84 connectionPool.setTestWhileIdle(true); 85 connectionPool 86 .setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis); 87 88 // Start the Evictor 89 connectionPool 90 .setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis);