Hello everyone,

So, besides inspecting all the XHR (XMLHttpRequest) calls to get the restaurant details, I'm going to show another technique that gets mostly the same information (name, address, city, country, phone number, ...). There is no real advantage to picking the technique I showed in the Airbnb Spider lecture over the one I explain in this article, or the other way around ("all roads lead to Rome").

Alright, let's take the Red Rooster restaurant as an example:

Now you may think that we can use a simple scrapy.Request(url='RedRooster', callback=self.parse_id) and XPath to extract all the data we want. Unfortunately you can't, because not all the restaurant detail pages share the same structure, so it's almost impossible to follow this approach.

What I found interesting instead is that each Airbnb restaurant detail page has a script tag (within the head node) which contains the restaurant details as a JSON object, as the image below shows.

Reading a JSON object like this is a pain, so just copy it, open this website (JSON FORMATTER), paste in the JSON object and voila:

This object contains nearly all the data we want (ID within the URL, address, latitude, longitude, description, phone number, ...) except the restaurant website.
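If you would rather not paste the JSON into a website, Python's json module can pretty-print it locally. This is only a small sketch: the `raw` string below is a made-up, minified sample standing in for the text of the script tag, not real Airbnb data.

```python
import json

# Made-up, minified sample standing in for the text of the script tag.
raw = '{"name":"Example Diner","address":{"addressLocality":"New York","addressCountry":"US"}}'

# Decode the string, then re-encode it with indentation so it is readable.
pretty = json.dumps(json.loads(raw), indent=2, ensure_ascii=False)
print(pretty)
```

The same idea works from a terminal with `python -m json.tool` if the JSON is saved to a file.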

You may ask: how did you know that there is a script node within the head of the HTML document which contains the JSON object? Well, honestly, I didn't know at first. I was just reading their HTML markup and I found it.
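Instead of reading the markup by eye, you can also let a few lines of Python look for that kind of script tag. Here is a minimal sketch using only the standard library. The HTML snippet and its values are made up for illustration (a real detail page is much larger), and `JsonLdExtractor` is a hypothetical helper name, not something from Scrapy.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


# Made-up sample page, not copied from Airbnb.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "Restaurant", "name": "Example Diner", "telephone": "+1 555 0100"}</script>
</head><body></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_HTML)
restaurant = json.loads(extractor.blocks[0])
print(restaurant["name"])
```

Inside a spider you don't need any of this, because an XPath on the response does the same job. The sketch is only meant to show what is being looked for.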

----------------------

Now if you want to follow this approach you must send a request like this:

yield scrapy.Request(url='https://www.airbnb.com/restaurants/{0}?adults=0&children=0&infants=0&guests=0&source=p2&currentTab=restaurant_tab&searchId=413ccc3a-c327-4154-9cda-3fa35af46cf1&sectionId=2a6256c2-4eb1-4d4a-920d-1d195d30cd86&pdpReferrer=1&searchContext%5Bsearch_id%5D=0c9207cf-5506-4dba-abff-3da6fc546ac3&searchContext%5Bfederated_search_id%5D=fe4877f9-a449-49cf-932e-5f02c47e7eb7&searchContext%5Bmobile_search_session_id%5D=&searchContext%5Bsubtab%5D=9'.format(restaurant.get('id')), callback=self.parse)
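That URL is hard to read on one line, so here is a sketch that builds the same query string from a dictionary with urllib.parse.urlencode. The parameter names and values are copied from the URL above. Whether Airbnb actually needs all of them is not something this article establishes, so treat the list as copied, not verified. `build_restaurant_url` is a hypothetical helper name and "12345" is a placeholder ID.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.airbnb.com/restaurants/{0}"

# Query parameters copied from the URL above (not verified as required).
QUERY_PARAMS = {
    "adults": "0",
    "children": "0",
    "infants": "0",
    "guests": "0",
    "source": "p2",
    "currentTab": "restaurant_tab",
    "searchId": "413ccc3a-c327-4154-9cda-3fa35af46cf1",
    "sectionId": "2a6256c2-4eb1-4d4a-920d-1d195d30cd86",
    "pdpReferrer": "1",
    "searchContext[search_id]": "0c9207cf-5506-4dba-abff-3da6fc546ac3",
    "searchContext[federated_search_id]": "fe4877f9-a449-49cf-932e-5f02c47e7eb7",
    "searchContext[mobile_search_session_id]": "",
    "searchContext[subtab]": "9",
}


def build_restaurant_url(restaurant_id):
    """Return the detail-page URL for one restaurant ID."""
    # urlencode percent-encodes the square brackets in the keys (%5B, %5D).
    return BASE_URL.format(restaurant_id) + "?" + urlencode(QUERY_PARAMS)


print(build_restaurant_url("12345"))  # "12345" is a placeholder ID
```

In the spider, the result of build_restaurant_url(restaurant.get('id')) can be passed as the url argument of scrapy.Request.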

and your parse method should look like this:

import json  # goes at the top of the spider module

def parse(self, response):
    # The restaurant details live in a JSON object inside a script tag.
    script = response.xpath("//script[@type='application/ld+json']/text()").extract_first()
    if not script:
        return  # no JSON object on this page, nothing to yield
    restaurant = json.loads(script)
    # Fall back to empty dicts so a missing key doesn't raise AttributeError.
    address = restaurant.get("address") or {}
    geo = restaurant.get("geo") or {}
    yield {
        'title': restaurant.get("name"),
        'type': restaurant.get("servesCuisine"),
        'description': restaurant.get("description"),
        'place': {
            'address': address.get("streetAddress"),
            'city': address.get("addressLocality"),
            'country': address.get("addressCountry"),
            'latitude': geo.get("latitude"),
            'longitude': geo.get("longitude"),
        },
        'phone_number': restaurant.get("telephone"),
        # website: can't be extracted
    }
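The parse method above takes the first matching script tag. If a page ever carried more than one such tag, or wrapped the object in a list, you would want to pick the right one. The sketch below shows one way to do that. It assumes the restaurant object has an "@type" of "Restaurant", which is common for this kind of JSON but is not confirmed for Airbnb here. `pick_restaurant` is a hypothetical helper and the sample strings are made up.

```python
import json


def pick_restaurant(raw_blocks):
    """Return the first decoded object whose "@type" is "Restaurant", else None."""
    for raw in raw_blocks:
        try:
            data = json.loads(raw)
        except (TypeError, ValueError):
            continue  # skip empty or malformed script contents
        # The top level may be a single object or a list of objects.
        candidates = data if isinstance(data, list) else [data]
        for candidate in candidates:
            if isinstance(candidate, dict) and candidate.get("@type") == "Restaurant":
                return candidate
    return None


# Made-up script contents for illustration.
blocks = [
    '{"@type": "WebSite", "name": "Example Site"}',
    'not json',
    '[{"@type": "Restaurant", "name": "Example Diner"}]',
]
print(pick_restaurant(blocks))
```

In a spider, the list of strings could come from calling .extract() instead of .extract_first() on the same XPath.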

So this was another technique you can use. If you have any questions, please go ahead and ask me in the Q&A.

Kind regards,
Ahmed.