
Can I use Scrapy with BeautifulSoup?

Updated 2021-06-11 15:33:08

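Yes, you can. As mentioned in the previous FAQ entry (how Scrapy compares to BeautifulSoup or lxml), BeautifulSoup can be used to parse HTML responses in Scrapy callbacks. You only need to pass the response body to a BeautifulSoup object and extract whatever data you need from it.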

Here is an example spider that uses the BeautifulSoup API, with lxml as the HTML parser:

from bs4 import BeautifulSoup
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'http://www.example.com/',
    )

    def parse(self, response):
        # use lxml to get decent HTML parsing speed
        soup = BeautifulSoup(response.text, 'lxml')
        yield {
            "url": response.url,
            "title": soup.h1.string
        }
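
One detail worth noting about the spider above: `soup.h1` is `None` when a page has no `<h1>` element, so `soup.h1.string` would raise `AttributeError`, and `.string` itself returns `None` when the heading contains nested tags. The following is a minimal sketch of a more defensive extraction. The helper name `first_heading` is hypothetical (it is not part of Scrapy or BeautifulSoup), and it assumes Python's built-in `html.parser` backend so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup


def first_heading(html, parser="html.parser"):
    """Return the text of the first <h1>, or None if the page has none.

    Hypothetical helper for illustration; not part of Scrapy or bs4.
    """
    soup = BeautifulSoup(html, parser)
    h1 = soup.find("h1")
    if h1 is None:
        # No heading on the page: report that instead of raising.
        return None
    # get_text() also handles headings with nested tags, where
    # .string would return None. Pieces are joined with one space.
    return h1.get_text(" ", strip=True)
```

Inside a callback this could be used as `"title": first_heading(response.text, "lxml")`, keeping lxml as the parser while tolerating pages without a heading.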

Note

BeautifulSoup supports several HTML/XML parsers. See BeautifulSoup's official documentation to find out which ones are available.
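
As an illustration of the parser choice, the parser is selected by the name passed as the second argument to `BeautifulSoup`. The sketch below assumes a hypothetical helper `make_soup` that prefers lxml and falls back to Python's built-in `html.parser` when lxml is not installed. The fallback order is an assumption for illustration, not a recommendation from the Scrapy documentation.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, parsers=("lxml", "html.parser")):
    """Build a BeautifulSoup object with the first available parser.

    Hypothetical helper for illustration; the default order is an assumption.
    """
    for parser in parsers:
        try:
            return BeautifulSoup(html, parser)
        except FeatureNotFound:
            # Raised by bs4 when the requested parser is not installed.
            continue
    raise RuntimeError("none of the requested parsers is available")


soup = make_soup("<html><body><h1>Example Domain</h1></body></html>")
title = soup.h1.string
```

Different parsers can build slightly different trees from malformed HTML, so pinning a single parser, as the spider above does with lxml, keeps results consistent across machines.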
