news 2025/8/2 14:48:41

Scrapy Framework: A Detailed Guide to the Global Settings File settings.py

Preface

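settings.py is the global configuration file of a Scrapy project. It controls things such as the User-Agent, the request headers and the maximum concurrency. This article walks through the most commonly used settings in settings.py.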

Main Content

1. Project name and spider modules

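BOT_NAME: the name of the bot implemented by this Scrapy project, i.e. the project name. It is used for logging (and, in older Scrapy versions, to build the default User-Agent). It is set automatically when the project is created with the startproject command.
SPIDER_MODULES: the list of modules in which Scrapy looks for spiders.
NEWSPIDER_MODULE: the module in which the genspider command creates new spiders.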

# Scrapy settings for Baidu project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html
# Project name (also the name of the project directory)
BOT_NAME = "Baidu"
SPIDER_MODULES = ["Baidu.spiders"]
NEWSPIDER_MODULE = "Baidu.spiders"
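To make SPIDER_MODULES less abstract, here is a simplified, stdlib-only model of the lookup: Scrapy imports each listed module and keeps the classes that subclass Spider, are defined in that module and have a name. The Spider base class and the in-memory module below are hypothetical stand-ins; the real loader also walks submodules recursively, which this sketch skips.

```python
import types


class Spider:
    """Hypothetical stand-in for scrapy.Spider."""
    name = None


def find_spiders(modules):
    """Collect {spider name: class} from already-imported spider modules.

    A class counts as a spider if it subclasses Spider, is defined in that
    module (not merely imported into it) and has a non-empty name.
    """
    found = {}
    for module in modules:
        for obj in vars(module).values():
            if (
                isinstance(obj, type)
                and issubclass(obj, Spider)
                and obj.__module__ == module.__name__
                and getattr(obj, "name", None)
            ):
                found[obj.name] = obj
    return found


# In-memory module standing in for Baidu/spiders/baidu.py
baidu_module = types.ModuleType("Baidu.spiders.baidu")
baidu_module.BaiduSpider = type(
    "BaiduSpider", (Spider,), {"name": "baidu", "__module__": "Baidu.spiders.baidu"}
)
baidu_module.Spider = Spider  # imported base class: skipped (other module, no name)

print(find_spiders([baidu_module]))
```

This is why a spider file placed outside the packages listed in SPIDER_MODULES is not found by the crawl command.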

2. Setting USER_AGENT

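USER_AGENT: the default User-Agent sent with every request. If it is not set, Scrapy identifies itself as "Scrapy/VERSION (+https://scrapy.org)".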

# Crawl responsibly by identifying yourself (and your website) on the user-agent
# Set USER_AGENT
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko)"

3. Whether to obey robots.txt (required!)

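ROBOTSTXT_OBEY: whether to obey the site's robots.txt rules. Scrapy's built-in default is False, but the settings.py generated by startproject sets it to True, so for this project it must be changed to False!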

# Obey robots.txt rules
# Whether to obey robots.txt; the startproject template sets True, which must be changed to False here!
ROBOTSTXT_OBEY = False
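To see what the setting decides, the sketch below runs Python's standard urllib.robotparser over a made-up robots.txt (the rules, URLs and the agent name "Baidu" are hypothetical). Scrapy's own RobotsTxtMiddleware uses a different parser backend (Protego by default), so edge cases may differ, but the basic allow/disallow logic is the same.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/index.html", "https://example.com/private/data"):
    allowed = parser.can_fetch("Baidu", url)
    # With ROBOTSTXT_OBEY = True, a disallowed URL would be dropped before download;
    # with ROBOTSTXT_OBEY = False, no such check is made.
    print(url, "allowed" if allowed else "disallowed")
```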

4. Maximum concurrency

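CONCURRENT_REQUESTS: the maximum number of requests Scrapy performs at the same time (default 16). It is loosely comparable to a number of threads, although Scrapy is asynchronous and runs all requests on a single-threaded event loop rather than in separate threads. A per-site limit, CONCURRENT_REQUESTS_PER_DOMAIN (default 8), applies as well.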

# Configure maximum concurrent requests performed by Scrapy (default: 16)
# Maximum number of concurrent requests (default 16)
CONCURRENT_REQUESTS = 16
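The idea of a concurrency cap without threads can be modeled with asyncio: a semaphore of size CONCURRENT_REQUESTS lets at most 16 coroutines "download" at once. This is only an analogy, since Scrapy itself is built on Twisted, and fake_fetch is a hypothetical stand-in for a real download.

```python
import asyncio

CONCURRENT_REQUESTS = 16  # same value as in settings.py


async def fake_fetch(url, sem, stats):
    """Hypothetical download: hold a semaphore slot while 'downloading'."""
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulate network latency
        stats["in_flight"] -= 1
        return url


async def crawl(urls):
    sem = asyncio.Semaphore(CONCURRENT_REQUESTS)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(fake_fetch(u, sem, stats) for u in urls))
    return results, stats["peak"]


urls = [f"https://example.com/page/{i}" for i in range(100)]
results, peak = asyncio.run(crawl(urls))
print(len(results), "pages fetched, peak concurrency:", peak)
```

All 100 tasks are scheduled immediately, yet never more than 16 are in flight, all on one thread.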

5. Download delay

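DOWNLOAD_DELAY: how many seconds to wait between consecutive requests to the same website, which lowers the crawl rate. With RANDOMIZE_DOWNLOAD_DELAY left at its default of True, the actual wait is a random value between 0.5 and 1.5 times this delay.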

# See also autothrottle settings and docs
# Download delay: seconds to wait between requests to the same site (lowers the crawl rate)
DOWNLOAD_DELAY = 1
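The randomized delay described above can be sketched as a small calculation. next_delay is a hypothetical helper that mirrors the documented 0.5x to 1.5x range; it is not Scrapy's internal code.

```python
import random

DOWNLOAD_DELAY = 1               # seconds, as in settings.py
RANDOMIZE_DOWNLOAD_DELAY = True  # Scrapy's default


def next_delay(base=DOWNLOAD_DELAY, randomize=RANDOMIZE_DOWNLOAD_DELAY, rng=random):
    """Hypothetical helper: seconds to wait before the next request to the same site."""
    if randomize:
        return rng.uniform(0.5 * base, 1.5 * base)
    return base


rng = random.Random(42)  # seeded so the demo is repeatable
delays = [next_delay(rng=rng) for _ in range(5)]
print([round(d, 2) for d in delays])
```

The jitter makes the request pattern look less mechanical; for load-adaptive delays, the AutoThrottle settings mentioned in the template comment are the alternative.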

6. Enabling or disabling cookies

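COOKIES_ENABLED: whether Scrapy's cookie middleware is enabled. Cookies are enabled by default; the template line COOKIES_ENABLED = False is commented out, and uncommenting it disables cookie handling.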

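# Whether cookies are enabled (enabled by default)
# Left commented out: the default applies, cookies are enabled
# Uncommented and set to False: the cookie middleware is off; a Cookie header in DEFAULT_REQUEST_HEADERS is sent as-is
# Set to True: cookies come from the cookies argument of Request() in the spider, or from a middleware
# COOKIES_ENABLED = False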
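Cookies copied from a browser arrive as one header string. With cookies enabled (the default), they are passed as a dict, e.g. Request(url, cookies=...); with COOKIES_ENABLED = False, the raw string goes into a Cookie header instead. The two helpers below (hypothetical names, made-up cookie values) convert between the two forms.

```python
def cookie_str_to_dict(raw):
    """Turn a browser 'Cookie' header string into a dict for Request(cookies=...)."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


def cookie_dict_to_str(cookies):
    """Inverse: build a 'Cookie' header value, e.g. for DEFAULT_REQUEST_HEADERS."""
    return "; ".join(f"{k}={v}" for k, v in cookies.items())


raw = "sessionid=abc123; theme=dark; lang=zh"  # hypothetical cookie string
cookies = cookie_str_to_dict(raw)
print(cookies)
print(cookie_dict_to_str(cookies))
```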

7. Request headers

DEFAULT_REQUEST_HEADERS: default headers added to every request, similar to the headers argument of requests.get()

# Override the default request headers:
# Default request headers, similar to the headers argument of requests.get()
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
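Scrapy's DefaultHeadersMiddleware fills these in with setdefault, so a header passed explicitly to a single Request(headers=...) wins over the global default. The plain-dict sketch below models that precedence; apply_default_headers is a hypothetical name and the override value is illustrative.

```python
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}


def apply_default_headers(request_headers, defaults=DEFAULT_REQUEST_HEADERS):
    """Model of the default-headers step: fill in defaults without overwriting.

    Note: Scrapy's real Headers object is case-insensitive; a plain dict is not.
    """
    merged = dict(request_headers)
    for name, value in defaults.items():
        merged.setdefault(name, value)
    return merged


# A request that overrides Accept-Language but keeps the default Accept
headers = apply_default_headers({"Accept-Language": "zh-CN,zh;q=0.9"})
print(headers)
```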

8. Enabling downloader middlewares

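DOWNLOADER_MIDDLEWARES: enables downloader middlewares. Each key is project_directory.module.ClassName and each value is an order number, conventionally between 0 and 1000; lower numbers sit closer to the engine, so their process_request() runs earlier.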

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# Enable middlewares
# project_directory.module.ClassName: order (conventionally 0-1000)
# DOWNLOADER_MIDDLEWARES = {
#    "Baidu.middlewares.BaiduDownloaderMiddleware": 543,
# }
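What the order number does can be modeled in a few lines: Scrapy merges DOWNLOADER_MIDDLEWARES with its built-in middlewares, sorts them by value, calls process_request() from low to high and process_response() from high to low; setting a value to None disables that middleware. BaiduDownloaderMiddleware's 543 comes from the template above; the built-in values are assumed from Scrapy's DOWNLOADER_MIDDLEWARES_BASE and may differ between versions.

```python
# Order values: 543 is from the template; the built-in ones are assumed
MIDDLEWARES = {
    "RobotsTxtMiddleware": 100,
    "UserAgentMiddleware": 500,
    "BaiduDownloaderMiddleware": 543,
    "CookiesMiddleware": 700,
}


def middleware_order(middlewares):
    """Return (request_order, response_order); None means 'disabled'."""
    enabled = {name: order for name, order in middlewares.items() if order is not None}
    request_order = sorted(enabled, key=enabled.get)  # engine -> downloader
    response_order = list(reversed(request_order))    # downloader -> engine
    return request_order, response_order


req, resp = middleware_order(MIDDLEWARES)
print("process_request :", " -> ".join(req))
print("process_response:", " -> ".join(resp))
```

So a custom middleware at 543 sees each request after the User-Agent has been set and before cookies are attached.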

9. Enabling item pipelines

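ITEM_PIPELINES: enables item pipelines, using the same project_directory.module.ClassName: order format (conventionally 0-1000); items pass through the pipelines from the lowest number to the highest.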

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# Enable pipelines
# project_directory.module.ClassName: order (conventionally 0-1000)
# ITEM_PIPELINES = {
#    "Baidu.pipelines.BaiduPipeline": 300,
# }
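A class listed in ITEM_PIPELINES needs a process_item(self, item, spider) method that returns the item (or raises DropItem); open_spider and close_spider are optional hooks. Below is a minimal sketch of what BaiduPipeline might contain, assuming the goal is to write each item to a JSON Lines file; the file name baidu.jl is hypothetical.

```python
import json


class BaiduPipeline:
    """Hypothetical pipeline: append each item to a JSON Lines file."""

    def __init__(self, path="baidu.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese text readable in the output file
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # hand the item on to the next pipeline (higher order number)

    def close_spider(self, spider):
        # Called once when the spider finishes
        if self.file is not None:
            self.file.close()
```

Returning the item matters: a pipeline that returns nothing would pass None to every pipeline after it.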

10. Log file and log level

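LOG_LEVEL: the minimum level of messages to log (default DEBUG), ordered DEBUG < INFO < WARNING < ERROR < CRITICAL
LOG_FILE: the name of the file to save logs to; when it is set, log output goes to this file instead of the console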

# Log level: DEBUG < INFO < WARNING < ERROR < CRITICAL
LOG_LEVEL = 'INFO'
# Save logs to a file
LOG_FILE = 'KFC.log'
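Scrapy logs through Python's standard logging module, so LOG_LEVEL = 'INFO' drops DEBUG messages and keeps INFO and everything above it. The stdlib-only sketch below shows that filtering; the logger name is hypothetical and nothing is written to KFC.log.

```python
import logging

# Standard numeric values: DEBUG=10 < INFO=20 < WARNING=30 < ERROR=40 < CRITICAL=50
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
LOG_LEVEL = "INFO"  # as in settings.py


def emitted_levels(threshold):
    """Return the level names a logger set to `threshold` would let through."""
    logger = logging.getLogger("demo.loglevel")  # hypothetical logger name
    logger.setLevel(threshold)
    return [name for name in LEVELS if logger.isEnabledFor(getattr(logging, name))]


print(emitted_levels(LOG_LEVEL))
```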

11. Encoding for exported data

FEED_EXPORT_ENCODING: the encoding used for exported data, e.g. "utf-8" or "gb18030"

FEED_EXPORT_ENCODING = "utf-8"  # encoding for exported data, e.g. "utf-8" or "gb18030"
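The setting matters mainly for non-ASCII text such as Chinese. If FEED_EXPORT_ENCODING is left unset, Scrapy writes JSON output with \uXXXX escapes; once an encoding is set, the characters are written as-is. "gb18030" is commonly chosen so that CSV files open without garbled characters in Excel on Chinese-locale Windows. The stdlib sketch below compares the representations; the sample string is arbitrary.

```python
import json

text = "百度"

# FEED_EXPORT_ENCODING unset: JSON output uses ASCII-safe \uXXXX escapes
print(json.dumps({"title": text}))
# FEED_EXPORT_ENCODING = "utf-8": characters are written as-is
print(json.dumps({"title": text}, ensure_ascii=False))

# Byte size of the same text in each encoding
for encoding in ("utf-8", "gb18030"):
    print(encoding, len(text.encode(encoding)), "bytes")
```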

12. Defining MySQL connection variables

MYSQL_HOST: database server host
MYSQL_USER: user name
MYSQL_PWD: password
MYSQL_DB: database name
CHARSET: character set

# MySQL connection variables (custom names, not built-in Scrapy settings)
MYSQL_HOST = 'xxxxxxxxx'
MYSQL_USER = 'xxxx'
MYSQL_PWD = 'xxxxxx'
MYSQL_DB = 'xxxxx'
CHARSET = 'utf8'
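These names are not built into Scrapy; Scrapy simply loads every uppercase variable in settings.py, and a pipeline can read them through crawler.settings in from_crawler. The sketch below assumes the third-party PyMySQL driver (imported only when the spider opens) and a hypothetical table baidu_items with title and url columns; MySQLPipeline is likewise a hypothetical name.

```python
class MySQLPipeline:
    """Hypothetical pipeline reading the custom MYSQL_* settings."""

    def __init__(self, host, user, password, db, charset):
        self.conn_args = dict(host=host, user=user, password=password,
                              database=db, charset=charset)
        self.conn = None

    @classmethod
    def from_crawler(cls, crawler):
        # Values from settings.py are available on crawler.settings
        s = crawler.settings
        return cls(
            host=s.get("MYSQL_HOST"),
            user=s.get("MYSQL_USER"),
            password=s.get("MYSQL_PWD"),
            db=s.get("MYSQL_DB"),
            charset=s.get("CHARSET", "utf8"),
        )

    def open_spider(self, spider):
        import pymysql  # third-party driver, assumed to be installed
        self.conn = pymysql.connect(**self.conn_args)

    def process_item(self, item, spider):
        # baidu_items(title, url) is a hypothetical table
        with self.conn.cursor() as cursor:
            cursor.execute(
                "INSERT INTO baidu_items (title, url) VALUES (%s, %s)",
                (item["title"], item["url"]),
            )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        if self.conn is not None:
            self.conn.close()
```

To use it, it would be registered in ITEM_PIPELINES like any other pipeline, e.g. "Baidu.pipelines.MySQLPipeline": 300.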

13. Defining MongoDB connection variables

MANGO_HOST: database server host
MANGO_PORT: port number
MANGO_DB: database name
MANGO_SET: collection name

# MongoDB connection variables
MANGO_HOST = 'xxxxxxxx'
MANGO_PORT = 'xxxxx'
MANGO_DB = 'xxxxx'
MANGO_SET = 'carset'
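The MongoDB values can be consumed the same way. One detail to watch: MANGO_PORT is written as a string placeholder above, while pymongo.MongoClient expects an integer port, so the sketch reads it with settings.getint(). It assumes the third-party pymongo package, a hypothetical MongoPipeline class, and treats MANGO_SET as the collection name.

```python
class MongoPipeline:
    """Hypothetical pipeline reading the custom MANGO_* settings."""

    def __init__(self, host, port, db, collection):
        self.host, self.port = host, port
        self.db_name, self.collection_name = db, collection
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        s = crawler.settings
        return cls(
            host=s.get("MANGO_HOST"),
            port=s.getint("MANGO_PORT", 27017),  # MongoClient needs an int port
            db=s.get("MANGO_DB"),
            collection=s.get("MANGO_SET"),  # assumed to be the collection name
        )

    def open_spider(self, spider):
        import pymongo  # third-party driver, assumed to be installed
        self.client = pymongo.MongoClient(self.host, self.port)
        self.collection = self.client[self.db_name][self.collection_name]

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))  # copy, since insert_one adds _id
        return item

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()
```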
