This script served as the proof of concept for a similar WordPress plugin, which automatically adds up the ratings of all single products in each category and writes the correct total number of reviews into the "aggregate rating" Schema.org markup on the category pages.
We had a case where this was the optimal solution for displaying the correct "aggregate rating" count in "Recipe Rich Results" for a food blog and recipe website.
As of today, Google does not seem to pay much attention to this, but there are indications that the math is becoming more important.
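As a rough illustration of the aggregation idea described above, the following sketch sums hypothetical per-recipe review counts and places the total into a Schema.org AggregateRating object. The function name build_aggregate_rating and the sample counts are assumptions made for illustration only; the actual plugin is not shown here and may work differently.

```python
import json


def build_aggregate_rating(review_counts):
    """Return a minimal Schema.org AggregateRating dict (illustrative only)."""
    return {
        "@type": "AggregateRating",
        # Total number of reviews across all single products in the category
        "reviewCount": sum(review_counts),
    }


# Hypothetical review counts of three recipes in one category
markup = build_aggregate_rating([12, 7, 30])
print(json.dumps(markup, indent=2))
```

A complete AggregateRating would normally also carry a ratingValue, which is outside what this script collects.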
This script extracts the total number of reviews from all categories listed in a sitemap and saves the results to a file. It is specifically designed to work with webpages where review counts are displayed in a specific format (e.g., "(123)").
Run the Script: Replace the placeholder (https://example.com/category-sitemap.xml) with the actual URL of the sitemap, then execute the script in a Python environment.
Output: The total reviews per category are saved in result.txt.
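Each line of result.txt follows the pattern "category-name: N reviews". If the totals need to be processed further, they could be read back roughly like this. The helper name parse_results and the sample lines are hypothetical and not part of the original script.

```python
import re

# Matches lines such as "desserts: 123 reviews"
LINE_PATTERN = re.compile(r'^(?P<name>.+): (?P<count>\d+) reviews$')


def parse_results(lines):
    """Turn result.txt lines into a {category_name: total_reviews} dict."""
    totals = {}
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            totals[match.group('name')] = int(match.group('count'))
    return totals


# Hypothetical sample lines in the format the script writes
sample_lines = ["desserts: 123 reviews\n", "soups: 45 reviews\n"]
print(parse_results(sample_lines))
```

With a real file, the lines could be passed in via open('result.txt', encoding='utf-8').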
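Python libraries: requests, beautifulsoup4, and lxml (required by the 'xml' parser that reads the sitemap). The re module is part of the Python standard library and does not need to be installed.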
Review Format: This script is suitable for webpages where the number of reviews is enclosed in parentheses, such as "(123)". It uses a regular expression to identify and extract these numbers.
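To make the matching behaviour concrete, here is a small offline sketch that applies the same regular expression to a sample string. The helper name extract_counts and the sample text are made up for illustration.

```python
import re

# Same pattern as in the script: digits enclosed in parentheses
review_pattern = re.compile(r'\((\d+)\)')


def extract_counts(text):
    """Return every parenthesised integer found in the text."""
    return [int(number) for number in review_pattern.findall(text)]


sample_text = "Apple Pie (123) Lemon Tart (7) (new)"
print(extract_counts(sample_text))  # [123, 7]
```

Note that the pattern matches any number in parentheses, so a page that also shows something like "(2024)" would inflate the total.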
# scrape_review_count.py
# Author: Christopher Hüneke
# Date: 04.08.2024
# Description: This script extracts the total number of reviews from all categories listed in a sitemap and saves the results to a file.
# Description: It is specifically designed to work with webpages where review counts are displayed in a specific format (e.g., "(123)").
import requests
from bs4 import BeautifulSoup
import re
# Function to get the total number of reviews from a category URL
def get_total_reviews(url):
    total_reviews = 0
    page_number = 1
    review_pattern = re.compile(r'\((\d+)\)')
    base_url = url.rstrip('/')  # avoid a double slash when building paginated URLs
    while True:
        # Page 1 is the category URL itself; later pages live under /page/<n>/
        page_url = f"{base_url}/page/{page_number}/" if page_number > 1 else url
        response = requests.get(page_url, timeout=30)
        # A 404 means the last page has been passed
        if response.status_code == 404:
            break
        soup = BeautifulSoup(response.content, 'html.parser')
        page_reviews = soup.find_all(string=review_pattern)
        # A page without any "(123)"-style counts also ends the loop
        if not page_reviews:
            break
        for review_text in page_reviews:
            match = review_pattern.search(review_text)
            if match:
                total_reviews += int(match.group(1))
        page_number += 1
    return total_reviews
# Main function to process the sitemap and extract reviews for each category
def main():
    sitemap_url = 'https://example.com/category-sitemap.xml'  # Replace with the actual sitemap URL
    response = requests.get(sitemap_url, timeout=30)
    response.raise_for_status()  # fail early if the sitemap cannot be fetched
    soup = BeautifulSoup(response.content, 'xml')  # the 'xml' parser requires lxml
    categories = soup.find_all('loc')
    results = []
    for category in categories:
        category_url = category.text.strip()
        # Use the last path segment as the name, with or without a trailing slash
        category_name = category_url.rstrip('/').split('/')[-1]
        print(f"Processing category: {category_name}")
        total_reviews = get_total_reviews(category_url)
        results.append(f"{category_name}: {total_reviews} reviews\n")
    with open('result.txt', 'w', encoding='utf-8') as file:
        file.writelines(results)
    print("Results saved to result.txt")


if __name__ == '__main__':
    main()
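
The sitemap step can also be tried offline. The sketch below uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, which is a swapped-in technique and not what the script itself does. The function names and the sample sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locations(sitemap_xml):
    """Return the text of every <loc> element in a sitemap string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def category_name_from_url(url):
    """Use the last path segment of the URL as the category name."""
    return url.rstrip('/').split('/')[-1]


# Hypothetical sitemap with two category URLs
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/category/desserts/</loc></url>
  <url><loc>https://example.com/category/soups/</loc></url>
</urlset>"""

for location in extract_locations(sample_sitemap):
    print(category_name_from_url(location), location)
```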