Web Scraping Issues | chenzuoli's blog

Web Scraping Issues

Notes on some problems I ran into while scraping websites.

  1. scrapy shell fails with: twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost

    scrapy shell https://www.xxx.com/

    Fix 1: try removing www from the URL; that alone can be enough.
    Fix 2: try imitating a browser by changing the request's User-Agent. Edit settings.py:

    # Default headers sent with every request; the User-Agent mimics desktop Chrome.
    DEFAULT_REQUEST_HEADERS = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
    }
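
As an alternative to putting User-Agent inside DEFAULT_REQUEST_HEADERS, Scrapy also has a dedicated USER_AGENT setting. The fragment below is a minimal sketch for settings.py that reuses the browser string from above; whether the target site accepts this particular string is an assumption.

```python
# settings.py (sketch): identify as a desktop browser instead of Scrapy's
# default User-Agent. The string is the one used above; that the target
# site accepts it is an assumption, not something verified here.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/70.0.3538.77 Safari/537.36"
)
```

For a one-off test, the same setting can likely be passed on the command line with -s, for example scrapy shell -s USER_AGENT='...' https://www.xxx.com/, without editing settings.py.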
  2. Chinese text is garbled in the exported results and shows up as Unicode escape sequences

    scrapy crawl kanping99 -O kanping99.jsonl

    Set the export encoding with -s:

    scrapy crawl kanping99 -O kanping99.jsonl -s FEED_EXPORT_ENCODING=utf-8
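
The escaped output is still valid JSON; it is the ASCII-safe spelling of the same text, which is what Scrapy's JSON exporters emit when no export encoding is set. The standard-library sketch below shows the two forms side by side. The item dict is a made-up example, not real crawl output.

```python
import json

# Hypothetical scraped item containing Chinese text.
item = {"title": "爬虫"}

# ensure_ascii=True (the json default) escapes every non-ASCII character.
escaped = json.dumps(item)

# ensure_ascii=False keeps the characters as-is, which is the readable form
# you get in the feed once the export encoding is utf-8.
readable = json.dumps(item, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode to the same data; only the on-disk spelling differs.
assert json.loads(escaped) == json.loads(readable) == item
```

To avoid passing -s on every run, FEED_EXPORT_ENCODING = 'utf-8' can be placed in settings.py instead.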
  3. Connection refused

    2024-04-28 14:07:36 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    2024-04-28 14:07:36 [scrapy.core.engine] INFO: Spider opened
    2024-04-28 14:07:36 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.xxx.com/3-1.html> (failed 1 times): Connection was refused by other side: 61: Connection refused.
    2024-04-28 14:07:37 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.xxx.com/3-1.html> (failed 2 times): Connection was refused by other side: 61: Connection refused.
    2024-04-28 14:07:37 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.xxx.com/3-1.html> (failed 3 times): Connection was refused by other side: 61: Connection refused.

    Try changing https to http, or removing www.
    In my case, switching from https to http fixed it.
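
When it is unclear which combination of scheme and host a site accepts, the candidates can be listed and tried one at a time with scrapy shell. A minimal sketch using only the standard library; url_variants is a hypothetical helper name and www.example.com is a placeholder host.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(url):
    """Return the URL plus variants with 'www.' dropped and https swapped for http."""
    parts = urlsplit(url)

    # Original host first, then the host without a leading "www." if present.
    hosts = [parts.netloc]
    if parts.netloc.startswith("www."):
        hosts.append(parts.netloc[len("www."):])

    # Original scheme first, then plain http as a fallback for https URLs.
    schemes = [parts.scheme]
    if parts.scheme == "https":
        schemes.append("http")

    return [
        urlunsplit((scheme, host, parts.path, parts.query, parts.fragment))
        for scheme in schemes
        for host in hosts
    ]


for candidate in url_variants("https://www.example.com/3-1.html"):
    print(candidate)
```

The original URL always comes first, so the list can be walked in order until one variant responds.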

