Downloading a File with Requests

I wanted to download a file with the requests library.

Environment: Python 3.5.2 + requests + Windows 10

import requests

# Download a file with the requests library
url = 'https://www.gipsa.usda.gov/fgis/exportgrain/CY2016.csv'
r = requests.get(url)
print(r.content)
with open("myCY2016.csv", "wb") as code:
    code.write(r.content)

However, it fails with an error.

  File "C:\Users\admin\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\adapters.py", line 497, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

I found this answer on Stack Overflow:

The problem you are having is caused by an untrusted SSL certificate.
Like @dirk mentioned in a previous comment, the *quickest* fix is setting verify=False.
Please note that this will cause the certificate not to be verified. **This will expose your application to security risks, such as man-in-the-middle attacks.**
Of course, apply judgment. As mentioned in the comments, this *may* be acceptable for quick/throwaway applications/scripts, *but really should not go to production software*.
If just skipping the certificate check is not acceptable in your particular context, consider the following options. Your best option is to set the verify parameter to a string that is the path of the .pem file of the certificate (which you should obtain by some sort of secure means).
So, as of version 2.0, the verify parameter accepts the following values, with their respective semantics:
True: causes the certificate to be validated against the library's own trusted certificate authorities (Note: you can see which Root Certificates Requests uses via the Certifi library, a trust database of RCs extracted from Requests: [Certifi - Trust Database for Humans](http://certifiio.readthedocs.org/en/latest/)).
False: bypasses certificate validation *completely*.
Path to a CA_BUNDLE file for Requests to use to validate the certificates.

Source: [Requests - SSL Cert Verification](http://docs.python-requests.org/en/master/user/advanced/?highlight=ssl#ssl-cert-verification)
Also take a look at the cert parameter on the same link.
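
The option the quoted answer recommends, pointing verify at a .pem file, might look like the sketch below. The helper name download_with_ca_bundle and its parameters are made up for illustration, and the sketch assumes you have already obtained a trustworthy PEM bundle for the server by some secure means; only requests.get and its verify and timeout arguments come from the library.

```python
import os

import requests


def download_with_ca_bundle(url, dest_path, ca_bundle_path, timeout=30):
    """Download url to dest_path, validating the server certificate
    against the PEM file at ca_bundle_path (a hypothetical, user-supplied file).
    """
    # Fail early with a clear error if the bundle is missing.
    if not os.path.isfile(ca_bundle_path):
        raise FileNotFoundError(ca_bundle_path)

    # verify=<path> keeps certificate validation on, but trusts the given
    # bundle instead of the default certifi trust store.
    response = requests.get(url, verify=ca_bundle_path, timeout=timeout)
    response.raise_for_status()

    with open(dest_path, "wb") as out_file:
        out_file.write(response.content)
    return dest_path
```

A call would pass the CSV URL from this post, an output name such as myCY2016.csv, and the path of the bundle. Unlike verify=False, a tampered connection still fails with an SSLError.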

OK, I don't fully understand it, but I set the parameter verify=False and carried on.

C:\Users\admin\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connectionpool.py:843: 
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. 
See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)

Then I found this on Stack Overflow as well:

The reason doing urllib3.disable_warnings() didn't work for you is because it looks like you're using a separate instance of urllib3 vendored inside of requests.
I gather this based on the path here: /usr/lib/python2.6/site-packages/requests/packages/urllib3/connectionpool.py

To disable warnings in requests' vendored urllib3, you'll need to import that specific instance of the module:
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

OK, it runs successfully.

Here is the final code:

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

# Download a file with the requests library
url = 'https://www.gipsa.usda.gov/fgis/exportgrain/CY2016.csv'
r = requests.get(url, verify=False)
print(r.content)
with open("myCY2016.csv", "wb") as code:
    code.write(r.content)

However, the download seems extremely slow.
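
I have not measured where the time goes, so this is only a guess: printing the whole CSV to the console and holding the entire body in memory may add to the wait. A streamed download avoids both, though it will not make the network itself any faster. The function names below are hypothetical. stream=True and iter_content are part of Requests, and verify is passed straight through, so it can stay True, be a bundle path, or be False as in the code above.

```python
import requests


def write_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path; return bytes written."""
    written = 0
    with open(dest_path, "wb") as out_file:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out_file.write(chunk)
                written += len(chunk)
    return written


def download_streamed(url, dest_path, verify=True, chunk_size=64 * 1024, timeout=30):
    """Stream url to dest_path without loading the whole body into memory."""
    with requests.get(url, stream=True, verify=verify, timeout=timeout) as response:
        response.raise_for_status()
        return write_chunks(response.iter_content(chunk_size=chunk_size), dest_path)
```

The return value is the number of bytes written, which gives a quick sanity check on the saved file without printing its contents.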

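So here come the questions, and there are many. What exactly is SSL? Why does the same GET request trigger a download prompt in the browser, while the simulated request just puts the file in the response body? Perhaps some header field tells the browser to save the body to disk? And why is HTTP content sometimes described as being transferred as bytes and sometimes as characters, when everything becomes a byte stream once it reaches the network layer? Perhaps the conversion happens at some layer in between?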
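
On two of these questions, here is a small sketch using only the standard library. The header values are hypothetical examples, not what the USDA server actually sends. A Content-Disposition header with the value attachment is one common way a server asks the browser to save the body instead of displaying it (browsers also tend to save content types they cannot render). The body itself always travels as bytes; "text" is just those bytes decoded with the charset named in Content-Type, which is roughly the difference between r.content and r.text in Requests.

```python
from email.message import Message

# Hypothetical response headers for a CSV download (assumed values).
headers = Message()
headers["Content-Disposition"] = 'attachment; filename="CY2016.csv"'
headers["Content-Type"] = "text/csv; charset=utf-8"

# "attachment" asks the browser to save the body instead of rendering it.
is_download = headers.get_content_disposition() == "attachment"
filename = headers.get_filename()
charset = headers.get_content_charset()

# On the wire the body is always bytes; decoding with the charset gives text.
body_bytes = "grain,tons\n小麥,100\n".encode(charset)
body_text = body_bytes.decode(charset)

print(is_download, filename, charset)
print(len(body_bytes), "bytes decode to", len(body_text), "characters")
```

A script that calls requests.get simply receives the headers and the body and does nothing further with them, so saving the body to disk is left to your own code.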

Also, Stack Overflow really is great, isn't it!?
