Writing a Python Web Crawler, Part 2: Parsing Web Pages with BeautifulSoup

# coding=utf-8
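'''
1. Download BeautifulSoup from http://www.crummy.com/software/BeautifulSoup/; the latest version is 4.x.y.
2. Extract the downloaded archive, for example to D:/python.
3. Open cmd and change to the D:/python/beautifulsoup4-4.x.y/ directory (adjust the path to match where you extracted the archive and which version you downloaded):
cd /d D:/python/beautifulsoup4-4.x.y
4. Run the following commands:
setup.py build
setup.py install
5. In your IDE, run "from bs4 import BeautifulSoup"; if no error is raised, the installation succeeded.
'''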

from bs4 import BeautifulSoup  # import BeautifulSoup
import urllib2

url = 'http://www.zengyuetian.com/'  # address of the page to crawl
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()

#print the_page
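# Use BeautifulSoup to extract content from the page.
# Naming the parser explicitly avoids the "no parser was explicitly specified" warning.
soup = BeautifulSoup(the_page, 'html.parser')
print soup.title.string  # page title
print soup.find_all('a')  # all <a> tags (links) on the page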
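
The script above prints whole <a> tags. If only the link text and target are needed, something like the following may help. It is a minimal sketch, not part of the original script: SAMPLE_HTML and extract_links are hypothetical names introduced for illustration, and the markup is inline so the example runs without network access. It uses print() through a __future__ import so that it should behave the same on Python 2 and Python 3.

```python
from __future__ import print_function  # lets print() work on Python 2 as well

from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the sketch runs without fetching a page.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/about">About</a>
    <a href="http://example.com/post">A post</a>
    <a name="top">Anchor without href</a>
  </body>
</html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True makes find_all skip <a> tags that lack an href attribute.
    for tag in soup.find_all("a", href=True):
        links.append((tag.get_text(strip=True), tag["href"]))
    return links


page_title = BeautifulSoup(SAMPLE_HTML, "html.parser").title.string
links = extract_links(SAMPLE_HTML)

print(page_title)
for text, href in links:
    print(text, href)
```

Passing the_page from the script above to extract_links in place of SAMPLE_HTML should give the same kind of list for a live page. Relative targets such as /about are returned as written; resolving them against the page URL is left out of this sketch.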

About the author

曾月天
