Project layout
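scrapy.cfg: the project's configuration file.
spiders/: the project's Python module; this is where you will add your code.
spiders/items.py: the project's item definitions.
spiders/pipelines.py: the project's item pipelines.
spiders/settings.py: the project's settings file.
spiders/spiders/: the directory that holds the spider code.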
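Item pipelines in spiders/pipelines.py are plain classes: Scrapy calls each enabled pipeline's process_item(item, spider) method for every scraped item, and whatever it returns is passed to the next pipeline. A minimal sketch, assuming items are plain dicts (which Scrapy accepts); the class name StripWhitespacePipeline is hypothetical:

```python
# spiders/pipelines.py (hypothetical example pipeline)


class StripWhitespacePipeline:
    """Strip leading/trailing whitespace from every string field of an item."""

    def process_item(self, item, spider):
        # Called once per scraped item; returning the item hands it on
        # to the next enabled pipeline.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        return item


# To enable it, add to spiders/settings.py (lower numbers run earlier;
# values are customarily in the 0-1000 range):
# ITEM_PIPELINES = {"spiders.pipelines.StripWhitespacePipeline": 300}
```

For example, StripWhitespacePipeline().process_item({"title": "  hello  "}, None) returns {"title": "hello"}; non-string fields are left untouched.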
Installing dependencies
yum -y install epel-release python-pip
yum clean all
wget https://bootstrap.pypa.io/ez_setup.py -O - | python
#
yum -y install python-setuptools
#
easy_install pip
#
yum -y install libxslt-devel libffi libffi-devel python-devel gcc openssl openssl-devel
easy_install scrapy
pip3 install scrapy requests redis pymongo
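To confirm the packages are importable by the interpreter you will actually run, a small check script can help. A sketch assuming Python 3 (run it with python3, since pip3 installs into Python 3); the helper name missing_packages is hypothetical, and note that beautifulsoup4 from the parsing section imports as bs4:

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages installed above.
    required = ["scrapy", "requests", "redis", "pymongo"]
    missing = missing_packages(required)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all packages found")
```

find_spec only locates the module without importing it, so the check is cheap and does not run any package code.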
Using scrapy
scrapy startproject project_name
scrapy genspider -l
scrapy genspider -t basic intbee_basic intbee.com
scrapy list
scrapy edit <spider>
scrapy fetch <url>
scrapy view <url>
scrapy crawl spider_name
scrapy runspider <spider_file.py>
Scrapy settings (spiders/settings.py)
ROBOTSTXT_OBEY=False
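The settings.py generated by scrapy startproject sets ROBOTSTXT_OBEY = True, in which case Scrapy's RobotsTxtMiddleware fetches each site's robots.txt and skips disallowed URLs; setting it to False turns that check off. Scrapy uses its own parser backend (selectable via ROBOTSTXT_PARSER in recent versions), so the stdlib sketch below only illustrates the kind of decision being disabled. The robots.txt rules and example.com URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("https://example.com/index.html"))  # True
    print(allowed("https://example.com/private/a"))   # False
```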
Parsing
pip install beautifulsoup4
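beautifulsoup4 provides the BeautifulSoup class for navigating parsed HTML. To keep this sketch free of third-party dependencies, it swaps in the standard library's html.parser.HTMLParser for a comparable task, collecting the href of every link. With beautifulsoup4 installed, the same list comes from [a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]. The names LinkCollector and extract_links are hypothetical:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in html, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, extract_links('<p><a href="/a">A</a> <a>no link</a> <a href="/b">B</a></p>') returns ['/a', '/b']; anchors without an href are skipped.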
Author: jnan77. Published 2017-05-27 10:04:25, tagged "Crawler", last modified 2017-05-27 10:07:09.