【async/await】 asyncio -- asynchronous I/O; async -- asynchronous
Frees asynchronous code from the old yield-based (generator coroutine) syntax.
async is placed before a function definition (or an async with / async for statement) to indicate that the body contains asynchronous calls.
await is placed before a concrete operation to mark it as asynchronous; the coroutine suspends there until the operation completes.
#!/usr/local/bin/python3.5
import asyncio
from aiohttp import ClientSession

async def hello():
    async with ClientSession() as session:
        async with session.get("http://httpbin.org/headers") as response:
            response = await response.read()
            print(response)

loop = asyncio.get_event_loop()
loop.run_until_complete(hello())
async and await turn the function into a coroutine. hello() above actually contains two asynchronous operations: first the response is fetched asynchronously, then the response body is read asynchronously.
aiohttp recommends ClientSession as the primary interface for issuing requests. A ClientSession keeps cookies and related state across multiple requests.
A session must be closed after use, and closing it is itself an asynchronous operation, which is why async with is used each time.
To actually run the coroutines they must be added to an event loop: create the asyncio loop instance and submit the task to it.
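The same event loop can also drive several coroutines at once. Below is a minimal sketch (not from the original notes), assuming a hypothetical fetch() helper and a couple of sample httpbin URLs, written in the Python 3.5-era asyncio.ensure_future / asyncio.wait style:
import asyncio
from aiohttp import ClientSession

async def fetch(url):
    # hypothetical helper: fetch one URL and return the raw body
    async with ClientSession() as session:
        async with session.get(url) as response:
            return await response.read()

urls = ["http://httpbin.org/headers", "http://httpbin.org/ip"]    # sample URLs, for illustration only
loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(fetch(u)) for u in urls]    # wrap each coroutine in a Task
loop.run_until_complete(asyncio.wait(tasks))               # the loop runs all tasks to completion
for t in tasks:
    print(t.result())
loop.close()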
【aiohttp】
Basic usage:
async with aiohttp.get('https://github.com') as r:    --- issue the request asynchronously
    await r.text()                                     --- read the response body asynchronously
Setting a timeout:
with aiohttp.Timeout(0.001):
    async with aiohttp.get('https://github.com') as r:
        await r.text()
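Note that aiohttp.Timeout is the older helper; newer aiohttp releases (3.x) configure timeouts through ClientTimeout on the session instead. A minimal sketch, assuming aiohttp 3.x, meant to run inside a coroutine like the snippets above:
timeout = aiohttp.ClientTimeout(total=0.001)    # total time budget for the whole request, in seconds
async with aiohttp.ClientSession(timeout=timeout) as session:
    async with session.get('https://github.com') as r:
        await r.text()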
Building a session:
async with aiohttp.ClientSession() as session:
    async with session.get('https://api.github.com/events') as resp:
        print(resp.status)
        print(await resp.text())
Setting custom headers:
url = 'https://api.github.com/some/endpoint'
headers = {'content-type': 'application/json'}
await session.get(url, headers=headers)
Using a proxy:
EG_1:
conn = aiohttp.ProxyConnector(proxy="http://some.proxy.com")    ---- create the proxy connector
session = aiohttp.ClientSession(connector=conn)
async with session.get('http://python.org') as resp:
    print(resp.status)
EG_2:
conn = aiohttp.ProxyConnector(
    proxy="http://some.proxy.com",
    proxy_auth=aiohttp.BasicAuth('user', 'pass')
)
session = aiohttp.ClientSession(connector=conn)
async with session.get('http://python.org') as r:
    assert r.status == 200
Custom cookies:
url = 'http://httpbin.org/cookies'
async with ClientSession(cookies={'cookies_are': 'working'}) as session:
    async with session.get(url) as resp:
        assert await resp.json() == {"cookies": {"cookies_are": "working"}}
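As noted earlier, a single ClientSession keeps cookies across requests. A minimal sketch (not from the original notes), assuming httpbin.org's /cookies/set endpoint, showing a cookie set by the first request being sent back on the second request in the same session:
async with ClientSession() as session:
    # the first request stores a cookie in the session's cookie jar
    async with session.get('http://httpbin.org/cookies/set?my_cookie=hello') as resp:
        assert resp.status == 200
    # the second request in the same session sends that cookie back automatically
    async with session.get('http://httpbin.org/cookies') as resp:
        print(await resp.json())    # expected: {"cookies": {"my_cookie": "hello"}}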
Crawler example:
import asyncio
import aiohttp
from bs4 import BeautifulSoup as bs

async def getPage(url, res_list):
    print(url)
    headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    # conn = aiohttp.ProxyConnector(proxy="http://127.0.0.1:8087")
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            assert resp.status == 200
            res_list.append(await resp.text())

class parseListPage():
    def __init__(self, page_str):
        self.page_str = page_str
    def __enter__(self):
        page_str = self.page_str
        page = bs(page_str, 'lxml')
        # collect the article links from the list page
        articles = page.find_all('div', attrs={'class': 'article_title'})
        art_urls = []
        for a in articles:
            x = a.find('a')['href']
            art_urls.append('http://blog.csdn.net' + x)
        return art_urls
    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

page_num = 5
page_url_base = 'http://blog.csdn.net/u014595019/article/list/'
page_urls = [page_url_base + str(i + 1) for i in range(page_num)]
loop = asyncio.get_event_loop()
# first pass: fetch the list pages concurrently
ret_list = []
tasks = [getPage(host, ret_list) for host in page_urls]
loop.run_until_complete(asyncio.wait(tasks))
# parse the list pages to collect the article URLs
articles_url = []
for ret in ret_list:
    with parseListPage(ret) as tmp:
        articles_url += tmp
# second pass: fetch every article concurrently
ret_list = []
tasks = [getPage(url, ret_list) for url in articles_url]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()