A Python crawler that replaces the links in an online HTML page with local links and saves the HTML file.

import os, re

def check_flag(flag):
    # True only if the path starts with "images/", i.e. it is a relative image path
    regex = re.compile(r'images\/')
    result = True if regex.match(flag) else False
    return result
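A quick sanity check on check_flag (the file names here are made up for illustration): re.match is anchored at the beginning of the string, so only paths that start with images/ are flagged.

print(check_flag('images/logo.png'))            # True  - relative path under images/
print(check_flag('https://example.com/a.png'))  # False - match is anchored at the start
print(check_flag('static/images/a.png'))        # False - "images/" is not at the start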

# soup = BeautifulSoup(open('index.html'))

from bs4 import BeautifulSoup

html_content = '''
<a href="">Test 01
<a href="/123">Test 02
<a href="">Test 01
<a href="">Test 01
'''

file = open(r'favor-en.html', 'r', encoding="UTF-8")
soup = BeautifulSoup(file, "html.parser")
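Note that html_content is defined but never used; the script actually parses favor-en.html. As a minimal sketch, the test snippet could be fed to the same parser like this (demo_soup is just an illustrative name):

demo_soup = BeautifulSoup(html_content, "html.parser")
for a in demo_soup.find_all('a'):
    print(a.get('href'))  # prints the href of every <a> tag, e.g. "" and "/123"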

for element in soup.find_all('img'):
    if 'src' in element.attrs:
        print(element.attrs['src'])
        if check_flag(element.attrs['src']):
            # if element.attrs['src'].find("png"):
            # prepend the (redacted) prefix to relative image paths
            element.attrs['src'] = "michenxxxxxxxxxxxx" + '/' + element.attrs['src']
print("########################")
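The loop above only rewrites the src prefix. To match the stated goal of pointing the page at local copies, a minimal sketch would also download each image before rewriting the attribute; BASE_URL, the local_images folder and the error handling below are assumptions, not part of the original script.

from urllib.parse import urljoin
from urllib.request import urlretrieve

BASE_URL = "https://example.com/"   # assumed origin of the online page
os.makedirs("local_images", exist_ok=True)

for img in soup.find_all('img'):
    src = img.attrs.get('src')
    if not src:
        continue
    local_path = os.path.join("local_images", os.path.basename(src))
    try:
        urlretrieve(urljoin(BASE_URL, src), local_path)  # save a local copy of the image
        img.attrs['src'] = local_path                    # point the tag at the local file
    except OSError as e:
        print("failed to fetch", src, e)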

with open('index.html', 'w', encoding="UTF-8") as fp:
    fp.write(soup.prettify())  # prettify() just pretty-prints the soup so the output is readable
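prettify() re-indents every tag, which makes the saved file easier to read but changes the original formatting. If the markup should be written out as-is, the write line can be swapped for:

    fp.write(str(soup))  # same modified HTML, without re-indenting it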