title: Westworld 1080P HD download with automatic alerts for new episodes [Python]
date: 2016-10-13 20:59:28
tags:
Westworld 1080P HD download with automatic alerts for new episodes
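1. The main idea: scrape the resources offered by an HD source site, then have Xunlei (Thunder) download them automatically.
When later episodes come out, for example after next Monday's update, the script detects them, sends an email notification, and downloads them automatically.
2. Code: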
# -*- coding: utf-8 -*-
# python 3.5.2
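# Tested on Windows 10
# Author: Van
# Automatically downloads Westworld when new episodes appear and sends an email notification
# V1.01
# Changed how @href is extracted, because the source site changed
# Replace the account names and passwords with your own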
# from selenium import webdriver
import requests
from lxml import etree
import time
import os
from win32com.client import Dispatch
import smtplib
from email.mime.text import MIMEText
from email.header import Header
import copy
# hints
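print('Please make sure Xunlei (Thunder) is installed on this computer')
print('If you use a cracked version of Xunlei, start it before running this script')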
print()
# requests
url = 'http://www.btbtdy.com/btdy/dy7280.html'
html = requests.get(url).content.decode('utf-8')
# lxml
selector = etree.HTML(html)
real_link = []
# 'starts-with' keeps the match simple and is very useful in this case :)
HDTV = selector.xpath('//a[starts-with(@title, "HDTV-1080P")]/text()')
for each in HDTV:
    print(each)
# the site moved the magnet link into a <span> next to the title link,
# so use the following-sibling axis to reach it :)
href = selector.xpath('//a[starts-with(@title, "HDTV-1080P")]/following-sibling::span/a/@href')
print()
print('There are currently %d Westworld episodes' % len(href))
print()
for each in href:
    # split to get the right magnet link
    each = 'magnet' + each.split('magnet')[-1]
    # print(each)
    real_link.append(each)
print('Their magnet links are:\n', real_link)
# keep a deep copy in temp_link to compare against when checking for new episodes
temp_link = copy.deepcopy(real_link)
print('temp_link is :', temp_link)
def addTasktoXunlei(down_url, course_infos):
    flag = False
    o = Dispatch("ThunderAgent.Agent.1")
    if down_url:
        course_path = os.getcwd()
        try:
            # AddTask("download URL", "save-as file name", "save directory", "task comment", "referrer URL", "start mode", "download from original URL only", "number of threads for the original URL")
            o.AddTask(down_url, '', course_path, "", "", -1, 0, 5)
            o.CommitTasks()
            flag = True
        except Exception as e:
            # print the actual error; the Exception class itself has no .message
            print(e)
            print(" AddTask failed!")
    return flag
def new_href():
    # check whether a new Westworld episode has appeared
    time.sleep(2)
    if len(real_link) > len(temp_link):
        print('Westworld 1080P has been updated!')
        print('There are now %d episodes in total.' % len(real_link))
        return True
    else:
        return False
def send_email(htm):
    # send an email to announce that a new Westworld episode is available
    sender = 'xxxxxxxx@163.com'
    # a list of recipients, so that ','.join(receiver) and sendmail() both work
    receiver = ['xxxxxxxx@qq.com', 'xxxxxxxx@163.com']
    subject = 'Westworld 1080P has been updated!'
    smtpserver = 'smtp.163.com'
    username = 'xxxxxxxx@163.com'
    password = 'xxxxxxxx'
    msg = MIMEText(htm, 'html', 'utf-8')
    msg['Subject'] = Header(subject, 'UTF-8')
    msg['From'] = sender
    msg['To'] = ','.join(receiver)
    smtp = smtplib.SMTP()
    smtp.connect(smtpserver)
    smtp.login(username, password)
    smtp.sendmail(sender, receiver, msg.as_string())
    smtp.quit()
def new_download():
    # only download the new Westworld episodes
    if len(real_link) > len(temp_link):
        # set difference of the two lists of addresses
        new_link = list(set(real_link).difference(set(temp_link)))
        for i in new_link:
            addTasktoXunlei(i, course_infos=None)
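if __name__ == '__main__':
    # download the existing Westworld episodes
    # send_email('Latest magnet links: ' + str(real_link))
    for i in real_link:
        addTasktoXunlei(i, course_infos=None)
    # check for later Westworld episodes once an hour
    while True:
        # re-fetch the page so that real_link reflects what the site lists now
        html = requests.get(url).content.decode('utf-8')
        href = etree.HTML(html).xpath('//a[starts-with(@title, "HDTV-1080P")]/following-sibling::span/a/@href')
        real_link = ['magnet' + each.split('magnet')[-1] for each in href]
        if new_href():
            send_email('All download addresses (magnet links): ' + str(real_link))
            new_download()
        # remember what has been seen as an independent copy, not an alias
        temp_link = copy.deepcopy(real_link)
        print(temp_link)
        print('Great show, right? Please be patient until the next episode!')
        # wait for an hour
        time.sleep(3600)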
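To make the two XPath expressions in the script easier to follow, here is a minimal sketch that runs them against a small inline HTML snippet. The markup, the `/down/` prefix on the links, and the `AAA`/`BBB`/`ZZZ` hashes are made up for illustration and only approximate what the real page might look like.

```python
from lxml import etree

# Hypothetical markup: a title link followed by a sibling <span> holding the magnet link.
sample_html = '''
<ul>
  <li><a title="HDTV-1080P Episode 1">Episode 1</a><span><a href="/down/magnet:?xt=urn:btih:AAA">download</a></span></li>
  <li><a title="HDTV-720P Episode 1">Episode 1 (720P)</a><span><a href="/down/magnet:?xt=urn:btih:ZZZ">download</a></span></li>
  <li><a title="HDTV-1080P Episode 2">Episode 2</a><span><a href="/down/magnet:?xt=urn:btih:BBB">download</a></span></li>
</ul>
'''

selector = etree.HTML(sample_html)

# starts-with() keeps only the links whose title begins with "HDTV-1080P"
titles = selector.xpath('//a[starts-with(@title, "HDTV-1080P")]/text()')

# following-sibling::span steps from the title link to the <span> next to it
hrefs = selector.xpath('//a[starts-with(@title, "HDTV-1080P")]/following-sibling::span/a/@href')

# drop whatever precedes "magnet" in each href, as the script does
links = ['magnet' + h.split('magnet')[-1] for h in hrefs]

print(titles)
print(links)
```

The 720P entry is skipped because its title does not start with `HDTV-1080P`.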
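3. Code analysis: the script uses deepcopy, which is very handy here. Combined with the set difference of the two lists, it lets the script detect a site update by comparing the current links against temp_link directly, instead of relying on a scheduled timer. As for recognizing the addresses, my first .xpath attempt returned nothing, which was a bit odd; I then matched the links by a feature their titles share, using starts-with. Finally, the original email function had a single recipient; to send to several, make receiver a list and set msg['To'] = ','.join(receiver)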
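The snapshot-and-compare idea can be sketched in isolation as follows. The helper `find_new_links` and the sample magnet links are hypothetical; unlike the set difference in the script, this variant keeps the original order of the links.

```python
import copy


def find_new_links(current, snapshot):
    """Return the links in current that are not in snapshot, keeping their order."""
    seen = set(snapshot)
    return [link for link in current if link not in seen]


real_link = ['magnet:?xt=urn:btih:AAA', 'magnet:?xt=urn:btih:BBB']

# deepcopy gives an independent snapshot; plain assignment would only add a second name
temp_link = copy.deepcopy(real_link)
alias = real_link

# simulate the site publishing one more episode
real_link.append('magnet:?xt=urn:btih:CCC')

new_links = find_new_links(real_link, temp_link)
print(len(alias), len(temp_link))
print(new_links)
```

Because `alias` is the same list object as `real_link`, it grows along with it, while the deep-copied `temp_link` keeps the old state and can be compared against. For a flat list of strings a shallow copy such as `list(real_link)` would behave the same way.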
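4. The point of using email is that the mailbox can be bound to WeChat for push notifications, which I find more convenient than SMS.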
5. Thanks:
@陌 for providing the code for sending email through 163
@何方 for providing the HD source site
@Everyone else who discussed the details
6. Possible improvement:
The email body currently shows the addresses as a raw Python list; this could be presented better.
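One possible way to address this, sketched under assumptions: render each link as an item of an HTML ordered list and build the message for a list of recipients. The function `build_update_email`, the example addresses, and the wording of the body are hypothetical; the block only builds the message and does not send it.

```python
from email.header import Header
from email.mime.text import MIMEText
from html import escape


def build_update_email(links, sender, receivers):
    """Build an HTML email that lists one magnet link per line."""
    items = ''.join('<li>{}</li>'.format(escape(link)) for link in links)
    body = '<p>New Westworld 1080P episodes:</p><ol>{}</ol>'.format(items)
    msg = MIMEText(body, 'html', 'utf-8')
    msg['Subject'] = Header('Westworld 1080P has been updated!', 'utf-8')
    msg['From'] = sender
    # the To header is a single comma-separated string built from the list
    msg['To'] = ','.join(receivers)
    return msg


msg = build_update_email(
    ['magnet:?xt=urn:btih:AAA', 'magnet:?xt=urn:btih:BBB'],
    'sender@example.com',
    ['a@example.com', 'b@example.com'],
)
body_text = msg.get_payload(decode=True).decode('utf-8')
print(msg['To'])
print(body_text)
```

The returned message could then be passed to `smtp.sendmail(sender, receivers, msg.as_string())`, with `receivers` kept as a list as in `send_email`.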
7. Corresponding GitHub repository:
https://github.com/vansnowpea/WestWorld-auto-download-email-xunlei-
8. Recommended XPath tutorial: