Python crawler: replace the links in an online HTML page with local links, then save the HTML file.
import os, re

def check_flag(flag):
    # Flag src paths that start with "images/" (i.e. relative local image paths)
    regex = re.compile(r'images\/')
    result = True if regex.match(flag) else False
    return result
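# Quick illustration (assumption: only relative paths that begin with "images/"
# should count as local image links; re.match anchors at the start of the string):
assert check_flag('images/logo.png') is True
assert check_flag('http://example.com/images/logo.png') is False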
# soup = BeautifulSoup(open('index.html'))
from bs4 import BeautifulSoup
html_content = '''
<a href="">测试01</a>
<a href="/123">测试02</a>
<a href="">测试01</a>
<a href="">测试01</a>
'''
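# html_content above is only a test snippet and is never fed to the parser below.
# As a hedged illustration (demo_soup is just an example name), the same snippet
# could be parsed to list its <a> links:
demo_soup = BeautifulSoup(html_content, 'html.parser')
for a in demo_soup.find_all('a'):
    print(a.get('href'), a.get_text())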
file = open(r'favor-en.html', 'r', encoding="UTF-8")
soup = BeautifulSoup(file, "html.parser")
for element in soup.find_all('img'):
    if 'src' in element.attrs:
        print(element.attrs['src'])
        if check_flag(element.attrs['src']):
            # if element.attrs['src'].find("png"):
            # Prepend the local directory prefix ("michenxxxxxxxxxxxx" is the placeholder from the original)
            element.attrs['src'] = "michenxxxxxxxxxxxx" + '/' + element.attrs['src']
            print("########################")
with open('index.html', 'w', encoding="UTF-8") as fp:
    fp.write(soup.prettify())  # prettify() returns a nicely indented, readable version of the soup
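A minor design note: prettify() re-indents the whole document, which changes its whitespace. If the original layout should be preserved, a common alternative is to write str(soup) instead, for example:

with open('index.html', 'w', encoding="UTF-8") as fp:
    fp.write(str(soup))  # keeps the parsed markup without prettify()'s re-indentation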