
BeautifulSoup

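The official documentation introduces it as follows:

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.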

1. Installation

Everything below was tested with Python 2.7.

You can install it directly with pip:

$ pip install beautifulsoup4

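Besides the HTML parser in Python's standard library, Beautiful Soup also supports several third-party parsers, such as lxml (which also provides an XML parser) and html5lib. To use them, install the corresponding library: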

$ pip install lxml

$ pip install html5lib
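Because lxml and html5lib are optional installs, code that prefers a third-party parser may want to fall back gracefully when one is missing. A minimal sketch, assuming the preference order lxml, html5lib, html.parser; that order and the helper name make_soup are illustrative, not part of Beautiful Soup. It relies on BeautifulSoup raising FeatureNotFound when the requested parser is not installed:

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical preference order: third-party parsers first,
# Python's built-in html.parser last (it is always available)
PREFERRED_PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(markup):
    """Parse markup with the first installed parser; return (soup, parser_name)."""
    for parser in PREFERRED_PARSERS:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # This parser's library is not installed; try the next one
            continue
    raise RuntimeError("no usable parser found")


soup, used = make_soup("<p>data</p>")
print(used, soup.p.get_text())
```

Naming the parser explicitly, even through a helper like this, also keeps results reproducible, since different parsers can build different trees from the same markup.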

2. Getting started

Beautiful Soup is quite powerful, but we only cover the most commonly used features here.

Basic usage

Pass a document to the BeautifulSoup constructor and you get back a document object. You can pass in either a string or an open file handle.


```
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<html><body><p>data</p></body></html>")
>>> soup
<html><body><p>data</p></body></html>
>>> soup('p')
[<p>data</p>]
```
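The constructor also accepts an open file handle. A small self-contained sketch that first writes a throwaway HTML file (the name page.html is arbitrary) and then parses it from the handle; the parser is named explicitly to avoid the warning Beautiful Soup emits when it has to guess one:

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Write a tiny HTML file so the example does not depend on any existing file
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w") as f:
    f.write("<html><body><p>data</p></body></html>")

# Pass the open file handle instead of a string
with open(path) as fp:
    soup = BeautifulSoup(fp, "html.parser")

print(soup.p.string)  # data
```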

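Here an HTML document is passed in, and soup is the resulting document object. First, the document is converted to Unicode, and HTML entities are converted to Unicode characters. Then Beautiful Soup picks the most suitable parser available to parse it; if you name a parser explicitly, Beautiful Soup uses that parser instead. In practice it is best to always name the parser explicitly. Beautiful Soup is also often used together with requests, a library for fetching the source of web pages. requests is not covered here; for more on how to use it, see the Requests 2.10.0 documentation.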
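To see the conversion to Unicode described above, the sketch below parses a UTF-8 byte string with an explicit from_encoding, and then a string containing HTML entities. In both cases the resulting text is an ordinary Unicode string; the markup is made up for illustration:

```python
from bs4 import BeautifulSoup

# Raw bytes (like the .content of an HTTP response) are decoded to Unicode
raw = u"<p>caf\u00e9</p>".encode("utf-8")
soup = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")
decoded = soup.p.string
print(soup.original_encoding)  # the encoding used to decode the bytes

# HTML entities are replaced by the Unicode characters they stand for
soup2 = BeautifulSoup("<p>caf&eacute; &lt;tag&gt;</p>", "html.parser")
entities = soup2.p.string
print(entities)  # café <tag>
```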

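The second argument to BeautifulSoup() can name either of the following:
The type of document to parse: currently html, xml, or html5.
A specific parser: currently lxml, html5lib, or html.parser.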
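Parsers differ mainly in how they repair invalid markup. The sketch below feeds the same broken fragment, <a></p>, to each parser listed above and records None for any that is not installed. html.parser simply ignores the stray </p>; lxml and html5lib also add document wrappers such as <html> and <body> and may repair the tree differently, so their output is printed rather than assumed:

```python
from bs4 import BeautifulSoup, FeatureNotFound

BROKEN = "<a></p>"  # an end tag with no matching start tag

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(BROKEN, parser))
    except FeatureNotFound:
        results[parser] = None  # this parser's library is not installed

for parser, output in results.items():
    print(parser, output)
```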


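```
from bs4 import BeautifulSoup
import requests

# .content is the raw response body (bytes), so from_encoding tells
# Beautiful Soup how to decode it
html = requests.get('http://www.lxweimin.com/').content
soup = BeautifulSoup(html, 'html.parser', from_encoding='utf-8')
result = soup('div')  # shorthand for soup.find_all('div')
```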

Kinds of objects

Beautiful Soup transforms a complex HTML document into a complex tree in which every node is a Python object. All of these objects fall into four kinds: Tag, NavigableString, BeautifulSoup, and Comment.

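Tag: put simply, a tag in the HTML, like the div and p above. Every Tag has two important attributes, name and attrs: name is the tag's name, and attrs is a dictionary of the tag's attributes, such as its class.
NavigableString: the text inside a tag, for example soup.p.string.
BeautifulSoup: the document as a whole.
Comment: a special type of NavigableString; its output does not include the comment markers (<!-- and -->).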
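The four kinds of objects can be seen side by side in one small made-up document. Note that attrs is a plain dict and that class comes back as a list, because HTML allows several class names on one tag; the commented outputs assume Python 3:

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

html = '<p class="intro lead" id="first">Hello</p><b><!--a comment--></b>'
soup = BeautifulSoup(html, "html.parser")  # BeautifulSoup: the whole document

p = soup.p                     # Tag
print(p.name)                  # p
print(p.attrs)                 # {'class': ['intro', 'lead'], 'id': 'first'}

text = p.string                # NavigableString
print(type(text).__name__)     # NavigableString

comment = soup.b.string        # Comment, a subclass of NavigableString
print(type(comment).__name__)  # Comment
print(comment)                 # a comment
```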

Example

The following example walks through common Beautiful Soup usage:


```
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup


# Sample markup: one entry from the Jianshu home-page article list
html_doc = """
<head>
      <meta charset="utf-8">
      <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <title>首頁 - 簡書</title>
</head>

<body class="output fluid zh cn win reader-day-mode reader-font2 " data-js-module="recommendation" data-locale="zh-CN">

<ul class="article-list thumbnails">

  <li class=have-img>
      <a class="wrap-img" href="/p/49c4728c3ab2"><img src="http://upload-images.jianshu.io/upload_images/2442470-745c6471c6f8258c.jpg?imageMogr2/auto-orient/strip%7CimageView2/1/w/300/h/300" alt="300" /></a>
    <div>
      <p class="list-top">
        <a class="author-name blue-link" target="_blank" href="/users/0af6b163b687">阿隨向前沖</a>
        <em>·</em>
        <span class="time" data-shared-at="2016-07-27T07:03:54+08:00"></span>
      </p>
      <h4 class="title"><a target="_blank" href="/p/49c4728c3ab2"> 只裝了這六款軟件，工作就高效到有時間逛某寶刷某圈</a></h4>
      <div class="list-footer">
        <a target="_blank" href="/p/49c4728c3ab2">
          閱讀 1830
</a>        <a target="_blank" href="/p/49c4728c3ab2#comments">
           · 評論 35
</a>        <span> · 喜歡 95</span>
          <span> · 打賞 1</span>
        
      </div>
    </div>
  </li>
</ul>

</body>
"""

# The <meta charset="utf-8"> tag lets Beautiful Soup detect the encoding
soup = BeautifulSoup(html_doc, 'html.parser')

# Find all matching nodes: <li> tags whose class includes "have-img"
tags = soup.find_all('li', class_="have-img")

for tag in tags:
    image = tag.img['src']                  # cover image URL
    article_user = tag.p.a.get_text()       # author name
    article_user_url = tag.p.a['href']      # author profile link
    created = tag.p.span['data-shared-at']  # publication timestamp
    article_url = tag.h4.a['href']          # article link

    # find_all() can also be called on a tag found earlier;
    # tag.div.div is the "list-footer" <div> inside the entry
    tag_span = tag.div.div.find_all('span')

    # The first <span> in the footer holds the like count
    likes = tag_span[0].get_text(strip=True)
```

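Beautiful Soup is mainly used to navigate through child nodes and their attributes. Accessing a tag name as an attribute (for example, soup.li) returns only the first matching tag in the document. When you want all of the <li> tags, or anything more than the first tag with a given name, you need find_all(). The find_all() method searches all the descendants of the current tag and keeps those that match the filters you pass in. find_all() accepts the following parameters: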

find_all( name , attrs , recursive , string , **kwargs )

Search by name: the name argument finds all tags with the given name; string objects are automatically ignored:
```
 soup.find_all("li")
```
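Search by id: a keyword argument that is not one of find_all()'s own parameters is treated as a tag attribute to filter on. For example, passing id searches each tag's id attribute: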
```
 soup.find_all(id='link2')
```
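Search by attrs: some attributes cannot be used as keyword arguments, such as HTML5 data-* attributes, because a name like data-foo is not a valid Python identifier. You can still search for tags with such attributes by passing a dictionary to the attrs parameter of find_all():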
```
 soup.find_all(attrs={"data-foo": "value"})
```
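Search by CSS class: searching for tags by CSS class name is very useful, but class is a reserved word in Python, so using it as a keyword argument is a syntax error. Since Beautiful Soup 4.1.1, you can use the class_ argument to search for tags with a given CSS class: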
```
 soup.find_all('li', class_="have-img")
```
The string argument: string lets you search the string contents of the document. Like name, it accepts a string, a regular expression, a list, or True. For example:
```
 soup.find_all("a", string="Elsie")
```
The recursive argument: when you call find_all() on a tag, Beautiful Soup examines all of that tag's descendants. To consider only the tag's direct children, pass recursive=False:
```
 soup.find_all("title", recursive=False)
```
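The snippets above assume an existing soup. Here is a self-contained sketch that runs each kind of filter against a small document; the markup, the ids link1 to link3, and the data-foo attribute are invented for illustration:

```python
import re

from bs4 import BeautifulSoup

html = """<html><head><title>Demo</title></head>
<body>
  <ul>
    <li class="have-img"><a id="link1" href="/p/1">Elsie</a></li>
    <li><a id="link2" href="/p/2">Lacie</a></li>
    <li class="have-img big"><a id="link3" href="/p/3">Tillie</a></li>
  </ul>
  <div data-foo="value">custom attribute</div>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find_all("li")                          # all three <li> tags
by_id = soup.find_all(id="link2")                      # keyword -> attribute filter
by_attrs = soup.find_all(attrs={"data-foo": "value"})  # data-* needs attrs={...}
by_class = soup.find_all("li", class_="have-img")      # matches any one of the classes
by_string = soup.find_all("a", string="Elsie")         # exact string match
by_regex = soup.find_all("a", string=re.compile("^L")) # string also accepts a regex

# recursive=False: only direct children of <html> are examined, so
# <title> (a grandchild, inside <head>) is not found
top_level = soup.html.find_all("title", recursive=False)
everywhere = soup.html.find_all("title")

print(len(by_name), by_id[0]["href"], by_attrs[0].get_text())
print([li.a.get_text() for li in by_class])
print(by_string[0]["id"], [a.get_text() for a in by_regex])
print(top_level, everywhere)
```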

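find_all() is probably the most frequently used search method in Beautiful Soup, so it also has a shorthand: calling a BeautifulSoup object or a tag directly is equivalent to calling its find_all(). The following two lines are equivalent: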

    soup.find_all("a")
    soup("a")
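A short sketch on made-up markup contrasting dot access, which returns only the first match, with find_all() and its call shorthand, which return every match; the shorthand also works on any tag, not just on the soup. The commented outputs assume Python 3:

```python
from bs4 import BeautifulSoup

html = "<ul><li><a href='/1'>one</a></li><li><a href='/2'>two</a></li></ul>"
soup = BeautifulSoup(html, "html.parser")

first = soup.li                     # dot access: only the first <li>
via_find_all = soup.find_all("a")   # every <a> in the document
via_call = soup("a")                # calling the soup is shorthand for find_all()
via_tag_call = soup.ul("a")         # the shorthand works on a Tag, too

print(first.get_text())               # one
print([a["href"] for a in via_call])  # ['/1', '/2']
```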

get_text()

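If you only want the text a tag contains, use the get_text() method. It collects all of the text inside the tag, including the text inside its descendant tags, and returns it as a single Unicode string: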

    tag.p.a.get_text()
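get_text() also accepts a separator string and strip=True (the option used in the likes line of the example). A minimal sketch on a simplified, made-up footer; the commented outputs assume Python 3:

```python
from bs4 import BeautifulSoup

html = '<div class="list-footer"><a href="#">Read 1830</a> <span> Likes 95 </span></div>'
soup = BeautifulSoup(html, "html.parser")
footer = soup.div

# All text from the tag and its descendants, concatenated as-is
print(repr(footer.get_text()))           # 'Read 1830  Likes 95 '
# The separator joins the pieces; strip=True trims each piece and drops empty ones
print(footer.get_text("|", strip=True))  # Read 1830|Likes 95
```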

For more, see the Beautiful Soup 4.4.0 documentation (Chinese translation).
