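Because the Siemens technical support center website is rendered with JavaScript, the usual requests + Beautiful Soup combination cannot retrieve the resources we need. Capturing and inspecting the page's network requests shows that every search result is served by an API, that the API returns JSON, and that the JSON contains the download links of the relevant manuals. The approach is therefore to fetch the API response with a requests-style get() call, extract the download links, and download the files.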
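1. Obtaining the API information
All requests made by the page can be captured with Fiddler Classic or directly with the browser's developer tools (press F12 in Edge). Fiddler Classic is the more intuitive of the two; the figure shows the request issued when searching for S7-200 SMART manuals. Once the API information is known, the Python program can be written.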
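2. Python program design
The program uses requests_html to fetch the API data and then processes the returned JSON, so it needs the requests_html library and the json module.
2.1 Install the requests_html library: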
pip install requests-html
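2.2 The json module:
json is part of the Python standard library, so there is nothing to install with pip; "import json" is enough.
2.3 With the libraries in place, the program can request the API, read the result, and convert the JSON into a Python dictionary. The code is as follows: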
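import json
import urllib.parse

import requests_html

session = requests_html.HTMLSession()
host = "https://support.industry.siemens.com"
API = "/webbackend/api/ProductSupport/ProductSupportSearch"
url = host + API
# Paging offset used for '$skip'; it is not defined in the original listing and is assumed to start at 0.
skpitem = 0
payload = {
    'language': 'zh',
    'region': 'cn',
    'networks': 'Internet',
    'documentType': 'Manual',
    'suppressedResource': 'productNodePath',
    '$search': "'s7-1200'",
    '$orderby': 'DefaultRankingDesc',
    '$top': '100',
    '$skip': f'{skpitem}',
    '$inlinecount': 'allpages',
}
try:
    # Keep '$' unescaped in the parameter names and encode spaces as %20.
    query = urllib.parse.urlencode(payload, quote_via=urllib.parse.quote, safe='$')
    content = session.get(url, params=query)
    print("API request completed")
    data = json.loads(content.text)
except Exception as exc:
    # Report the failure and stop, since the following steps need 'data'.
    print(f"API request failed: {exc}")
    raise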
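The query string is the one subtle part of this request: the parameter names start with `$`, which the default encoder would turn into `%24`. The sketch below isolates that step so it can be checked without touching the network. `build_query` and its arguments are hypothetical names that do not appear in the original script, and the sketch assumes the API keeps accepting the same parameters as the payload above.

```python
import urllib.parse


def build_query(search, skip=0, top=100):
    """Return the encoded query string for one page of search results.

    Hypothetical helper: the keys simply mirror the payload shown above.
    """
    payload = {
        'language': 'zh',
        'region': 'cn',
        'networks': 'Internet',
        'documentType': 'Manual',
        'suppressedResource': 'productNodePath',
        '$search': f"'{search}'",
        '$orderby': 'DefaultRankingDesc',
        '$top': str(top),
        '$skip': str(skip),
        '$inlinecount': 'allpages',
    }
    # quote_via=quote encodes spaces as %20 instead of '+';
    # safe='$' leaves the leading '$' of the parameter names untouched.
    return urllib.parse.urlencode(payload, quote_via=urllib.parse.quote, safe='$')


if __name__ == "__main__":
    # Second page of 100 results for an S7-200 SMART search.
    print(build_query('s7-200 smart', skip=100))
```

Raising `skip` in steps of `top` would presumably walk through further result pages, which appears to be the purpose of the `skpitem` variable in the listing.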
Returned result (example):
{
  "AlternateLanguageTitle": "en",
  "AlternateLanguageCount": 13,
  "Documents": [
    {
      "ForProductsText": "6ES7288-2DR32-0AA0, 6ES7288-2DT16-0AA0,...",
      "ShowMoreProductsLink": true,
      "HasReleaseVersions": false,
      "Level1Id": "gen_1318291",
      "Output": "",
      "HasAttachment": true,
      "HasAttachmentsHits": true,
      "HasHint": false,
      "MlfbDruckForm": null,
      "PdfLink": "/cs/attachments/109745610/s7-200_SMART_system_manual_zh-CHS.pdf",
      "AvailableLanguages": [
        { "LanguageTitle": "en", "DocumentTitle": "S7-200 SMART System manual " },
        { "LanguageTitle": "zh", "DocumentTitle": "S7-200 SMART 系統手冊 " }
      ],
      "SlkNavigationNodeId": null,
      "BusinessUnitId": 4224,
      "Url": null,
      "Id": 1318292,
      "Title": "S7-200 SMART 系統手冊",
      "Description": "系統手冊",
      "Type": "Manual",
      "Network": "Intranet, Internet",
      "DocumentDate": "2021-07-15T00:00:00",
      "DocumentActuality": "None",
      "Rating": 4.540984,
      "RatingCount": 122,
      "LocaleGroupId": 109745610,
      "LanguageId": 6,
      "IsSipsManual": false,
      "SipsSummary": ""
    }
  ]
}
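Only three fields of each document matter for downloading: `PdfLink`, `Title` and `DocumentDate`. As a minimal sketch, the snippet below parses a trimmed copy of the sample response and collects the title together with the absolute download URL. `extract_pdf_links` is a hypothetical helper, and the second entry in `SAMPLE` is invented purely to illustrate a document whose `PdfLink` is null.

```python
import json

HOST = "https://support.industry.siemens.com"

# Trimmed copy of the sample response; the second document is made up
# for illustration and does not come from the API.
SAMPLE = '''
{
  "Documents": [
    {
      "PdfLink": "/cs/attachments/109745610/s7-200_SMART_system_manual_zh-CHS.pdf",
      "Title": "S7-200 SMART 系統手冊",
      "DocumentDate": "2021-07-15T00:00:00"
    },
    {
      "PdfLink": null,
      "Title": "Entry without a PDF",
      "DocumentDate": "2021-01-01T00:00:00"
    }
  ]
}
'''


def extract_pdf_links(data, host=HOST):
    """Return (title, absolute_url) pairs for documents that carry a PdfLink."""
    links = []
    for doc in data.get("Documents", []):
        pdf_link = doc.get("PdfLink")
        if pdf_link:  # skips both a missing key and a null value
            links.append((doc["Title"], host + pdf_link))
    return links


if __name__ == "__main__":
    for title, url in extract_pdf_links(json.loads(SAMPLE)):
        print(title, url)
```

Checking the value rather than only the presence of the key guards against entries where the key exists but holds null, which would otherwise break the string concatenation with the host.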
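2.4 After the conversion, the download links can be extracted and the files saved. The code is as follows:
import os

# FileName() (turns a title into a valid file name) and path (the target
# directory) are not shown in this article; they are assumed to be defined
# elsewhere in the script.
for doc in data['Documents']:
    pdf_link = doc.get('PdfLink')
    # Only handle documents that have a PDF and are dated 2020 or later.
    if pdf_link and doc['DocumentDate'].startswith('202'):
        title = FileName(doc['Title'])
        download_link = host + pdf_link
        pdf_path = os.path.join(path, f"{title}.pdf")
        if os.path.exists(pdf_path):
            print(f"{title}.pdf already exists")
            continue
        res = session.get(download_link)
        # Text file that records every downloaded manual and its link.
        log_path = os.path.join(path, FileName(f"{payload['$search']} {payload['documentType']} link.txt"))
        with open(pdf_path, 'wb') as f1, open(log_path, 'a', encoding='utf-8') as f2:
            f1.write(res.content)
            f2.write(f"File name: {title}; link: {download_link}\n")
        print(f"Title: {title}, link: {pdf_link}")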
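The download loop calls a `FileName` helper whose definition is not shown. One plausible reading is that it strips characters that are not allowed in file names, since document titles can contain characters such as `/` or `:`. The function below is only a guess at that behaviour; `safe_file_name` and its `replacement` parameter are hypothetical and are not the author's implementation.

```python
import re


def safe_file_name(name, replacement="_"):
    """Replace characters that Windows and most file systems reject in file names.

    Hypothetical stand-in for the FileName helper used in the download loop.
    """
    # Characters forbidden on Windows, plus line breaks and tabs.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, name)
    # Windows also rejects names that end with a space or a dot.
    return cleaned.strip(" .")


if __name__ == "__main__":
    print(safe_file_name("S7-1200: CPU 1214C/1215C manual"))
```

With a helper of this kind, the title can be used directly when building the PDF path and the name of the link log file.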
That completes the program. It is intended for testing only, and it normally takes several runs to find and fix the remaining bugs.