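A coroutine is a lightweight thread that runs in user space. Each coroutine has its own register context and stack. When the scheduler switches away from a coroutine, it saves that register context and stack elsewhere; when it switches back, it restores them. A coroutine therefore keeps the state of its previous invocation: each time it is re-entered, execution resumes at the point in the logic flow where it last left off. Advantages of coroutines: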
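1. No thread context-switch overhead.
2. No overhead from atomic operations, locking, or synchronization.
3. Control flow is easy to switch, which simplifies the programming model.
4. High concurrency, high scalability, low cost: a single CPU can handle tens of thousands of coroutines without trouble, which makes them a good fit for highly concurrent workloads.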
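Drawbacks:
1. Coroutines cannot use multiple cores. A coroutine runs inside a single thread, so on its own it cannot spread work across the cores of a CPU; to run on multiple CPUs, coroutines have to be combined with multiple processes.
2. A blocking operation (such as I/O) blocks the entire program.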
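The state-retention behavior described above can be sketched with a plain Python generator, which is the simplest coroutine-like construct in the standard library. This is only an illustration, not how greenlet or gevent are implemented; the names `running_total` and `demo` are made up for this example.

```python
def running_total():
    """Generator-based coroutine: `total` survives between resumptions."""
    total = 0
    while True:
        # Execution pauses at this yield. A later send() resumes right here,
        # with every local variable exactly as it was left.
        value = yield total
        total += value


def demo():
    co = running_total()
    next(co)  # run the coroutine up to its first yield
    # Each send() re-enters the coroutine where it last stopped.
    return [co.send(n) for n in (10, 5, 7)]


print(demo())  # prints [10, 15, 22]
```

Each call to `send()` picks up inside the `while` loop rather than restarting the function, which is why the total keeps accumulating.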
The greenlet module:
from greenlet import greenlet

def test1():
    print('test1-1')
    gr2.switch()  # hand control to test2
    print('test1-2')
    gr2.switch()

def test2():
    print('test2-1')
    gr1.switch()  # hand control back to test1
    print('test2-2')

gr1 = greenlet(test1)  # create a coroutine; it does not run until switched to
gr2 = greenlet(test2)
gr1.switch()
Output:
test1-1
test2-1
test1-2
test2-2
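greenlet switches coroutine contexts manually. greenlet(test1) creates a coroutine, and gr1.switch() transfers control to test1. test1 prints test1-1 and switches to test2; test2 prints test2-1 and switches back to test1, which resumes from where it left off and prints test1-2. test1 then switches to test2 once more, which prints test2-2.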
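As a small extension of the manual switching described above, `switch()` can also carry a value across the switch. The following is a minimal sketch that assumes the greenlet package is installed; `doubler` and `demo` are hypothetical names introduced only for this example.

```python
from greenlet import greenlet, getcurrent


def doubler(first):
    """Doubles each value it is handed and passes the result back."""
    parent = getcurrent().parent  # the greenlet that created this one
    value = first
    while value is not None:
        # Switch back to the parent with a result; the argument of the
        # parent's next switch() call becomes the return value here.
        value = parent.switch(value * 2)
    return "done"  # a plain return also hands control back to the parent


def demo():
    gr = greenlet(doubler)
    # The first switch() starts doubler(1); later calls resume it mid-loop.
    results = [gr.switch(1), gr.switch(5), gr.switch(None)]
    return results, gr.dead
```

Calling `demo()` should return `([2, 10, 'done'], True)`: the coroutine resumes inside its loop on every switch and is marked dead once it returns.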
The gevent module: gevent wraps greenlet and switches automatically:
import gevent

def foo():
    print('in foo')
    gevent.sleep(2)  # triggers a switch
    print('in foo 2')

def bar():
    print('in bar 1')
    gevent.sleep(1)
    print('in bar 2')

def func3():
    print('in func3 1')
    gevent.sleep(0)
    print('in func3 2')

gevent.joinall(
    [
        gevent.spawn(foo),  # start a coroutine
        gevent.spawn(bar),
        gevent.spawn(func3),
    ]
)
Output:
in foo
in bar 1
in func3 1
in func3 2
in bar 2
in foo 2
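Three coroutines are started. foo prints in foo and calls gevent.sleep(2), which switches to bar. bar prints in bar 1 and also hits a switch point at gevent.sleep(1), so func3 runs and prints in func3 1; gevent.sleep(0) yields only momentarily, so func3 then prints in func3 2. At that point neither of the other sleeps has expired. After 1 second bar prints in bar 2, and after 2 seconds foo prints in foo 2. The total run time is about 2 seconds.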
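The timing argument above (total time is roughly the longest sleep, not the sum) can be checked with a short sketch. It assumes gevent is installed and uses shorter delays than the example so that it runs quickly; `worker` and `run` are hypothetical names.

```python
import time

import gevent


def worker(name, delay, finished):
    gevent.sleep(delay)    # yields to the gevent hub so other coroutines run
    finished.append(name)  # record the order in which coroutines complete
    return delay


def run():
    finished = []
    start = time.time()
    jobs = [
        gevent.spawn(worker, "foo", 0.4, finished),
        gevent.spawn(worker, "bar", 0.2, finished),
        gevent.spawn(worker, "func3", 0, finished),
    ]
    gevent.joinall(jobs)
    elapsed = time.time() - start
    # Greenlet.value holds each function's return value after it finishes.
    return finished, [job.value for job in jobs], elapsed
```

`run()` should report the completion order `['func3', 'bar', 'foo']` and an elapsed time close to 0.4 seconds, whereas running the three sleeps back to back would take about 0.6 seconds.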
A simple coroutine-based crawler example:
from urllib import request
import gevent
from gevent import monkey
import time

# monkey.patch_all()  # Left disabled for this run: without it, gevent cannot detect urllib's I/O and never switches.

def fun(url):
    res = request.urlopen(url)
    data = res.read()
    # Every URL is written to the same file, so the last write wins.
    with open('url.html', 'wb') as f:
        f.write(data)
    print("%d bytes received from %s" % (len(data), url))

urls = [
    'https://github.com/',
    'https://zz.253.com/v5.html#/yun/index',
]

# Synchronous version: fetch the URLs one after another.
sync_start_time = time.time()
for url in urls:
    fun(url)
print('Total synchronous time:', time.time() - sync_start_time)

# gevent version: one coroutine per URL.
async_start_time = time.time()
gevent.joinall(
    [
        gevent.spawn(fun, 'https://github.com/'),
        gevent.spawn(fun, 'https://zz.253.com/v5.html#/yun/index'),
        # gevent.spawn(fun, 'http://www.lxweimin.com/'),
    ]
)
print('Total asynchronous time:', time.time() - async_start_time)
Output:
59864 bytes received from https://github.com/
1175 bytes received from https://zz.253.com/v5.html#/yun/index
Total synchronous time: 2.9010000228881836
59854 bytes received from https://github.com/
1175 bytes received from https://zz.253.com/v5.html#/yun/index
Total asynchronous time: 7.056999921798706
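With monkey.patch_all() commented out, gevent cannot detect urllib's I/O operations and never switches, so the requests run serially. monkey.patch_all() replaces the blocking standard-library calls the program uses (sockets, for example) with cooperative gevent versions; only then do the requests overlap. It should be called as early as possible, ideally before the other imports.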
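The effect of monkey patching can be sketched without any network access. The example below assumes gevent is installed and uses `time.sleep` as a stand-in for a blocking urllib call; it applies `monkey.patch_time()`, a narrower relative of `monkey.patch_all()` that only replaces `time.sleep`. The names `blocking_task` and `run` are hypothetical.

```python
import time

import gevent
from gevent import monkey

# Narrower than patch_all(): only time.sleep is swapped for gevent.sleep.
monkey.patch_time()


def blocking_task(delay):
    # Looks like an ordinary blocking call, but after patching it yields
    # to the gevent hub instead of stalling the whole thread.
    time.sleep(delay)
    return delay


def run(delay=0.3, count=3):
    start = time.time()
    jobs = [gevent.spawn(blocking_task, delay) for _ in range(count)]
    gevent.joinall(jobs)
    return [job.value for job in jobs], time.time() - start
```

With the patch applied, `run()` should finish in roughly 0.3 seconds. Without the `monkey.patch_time()` line, each `time.sleep` would block the thread and the three tasks would take about 0.9 seconds in total, which mirrors the serial behavior seen in the crawler.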