As a small marketing team or solo founder, it can be challenging to continuously publish high-quality, optimised content while balancing all your other responsibilities. One way to streamline and scale up your content production is through a programmatic approach using web scraping and templating tools. If you’re curious about how to start SEO marketing, check out my quick-start guide to programmatic SEO below.
In this post, I’m going to share how I leveraged Apify to scrape product comparisons from ProductHunt and quickly generate template-based blog posts targeted at relevant keywords. This allowed me to massively increase my content output with minimal ongoing effort.
Finding Keywords Through Head Terms
The first step is to identify topics and keywords to target in your content. A great source is “head terms” – the types of questions people are searching for answers to. For a B2B SaaS company, some common head terms include:
- “Best alternative to [competitor name]”
- “What is the difference between [two similar tools]”
- “[Tool name] vs [similar tool]”
- “How to [common task] in [your tool]”
You can find ideas by searching Google yourself or looking at related searches and questions in the SEMRush keyword topic map. The goal here is to target searches with clear buyer intent.
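Once you have a list of competitors, tools, and common tasks, the head-term patterns above can be expanded programmatically. A minimal sketch – all the names here are made-up placeholders, not real keyword data:

```python
# Expand head-term patterns into a keyword list.
# Competitor/tool/task names are hypothetical placeholders.
competitors = ["CompetitorA", "CompetitorB"]
tools = ["YourTool"]
tasks = ["export data", "schedule posts"]

keywords = []
for c in competitors:
    keywords.append(f"Best alternative to {c}")
# Pair up adjacent competitors for "vs" and "difference" queries.
for a, b in zip(competitors, competitors[1:]):
    keywords.append(f"What is the difference between {a} and {b}")
    keywords.append(f"{a} vs {b}")
for task in tasks:
    for tool in tools:
        keywords.append(f"How to {task} in {tool}")

print(keywords)
```

Dump the resulting list into your keyword tool of choice to check intent and competition before writing anything.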
Where to find programmatic SEO datasets?
From your product…
If you have an existing product, you likely already have databases of information like price points, reviews, product features, etc. Extracting and anonymising subsets of this structured data is a great starting point for experiments.
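Anonymising can be as simple as replacing identifying fields with a one-way hash before the data leaves your database. A hedged sketch – the field names (“customer”, “plan”, “price”) are hypothetical, not from any real schema:

```python
# Pull an anonymised subset of product data for content experiments.
# Row structure and field names are illustrative placeholders.
import hashlib

rows = [
    {"customer": "alice@example.com", "plan": "Pro", "price": 49},
    {"customer": "bob@example.com", "plan": "Starter", "price": 9},
]

def anonymise(row):
    out = dict(row)
    # Replace the identifying field with a stable, truncated one-way hash
    # so the same customer maps to the same token across exports.
    out["customer"] = hashlib.sha256(row["customer"].encode()).hexdigest()[:8]
    return out

anon = [anonymise(r) for r in rows]
```

The non-identifying fields (plan, price) survive intact, which is what your templated content actually needs.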
Public Data Sources
Many government agencies and non-profits publish open datasets that can be leveraged. A few ideas:
- site:example.com “whatever you’re looking for” csv
- Amazon Web Services Public Data Sets
- Kaggle (larger dataset exchange)
- Searching GitHub
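Once you’ve tracked down a public CSV, a few lines of stdlib Python turn it into structured rows you can template against. The columns below are made up for illustration:

```python
# Parse a found CSV into dict rows ready for templating.
# The column names and values are hypothetical sample data.
import csv
import io

raw = """tool,category,pricing
ToolA,analytics,free
ToolB,analytics,paid
"""

rows = list(csv.DictReader(io.StringIO(raw)))
# Each row is now a dict keyed by the header line, e.g.
# {"tool": "ToolA", "category": "analytics", "pricing": "free"}
```

In practice you would read from the downloaded file instead of an in-memory string, but the structure is identical.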
Scraping Public Websites
You could build scrapers using Apify or similar tools to extract structured data from sites like:
- Product/service comparison sites
- News/article databases
- Reddit, X, etc.
- Public APIs/endpoints
Example Domain CSV
Searching “site:example.com csv” yielded some sample datasets already hosted on the domain that could be built upon.
Want a dataset of ProductHunt tools to use? See here
Does Keyword Volume Really Matter?
You’ll often hear that you need high search volumes for keywords to be worth targeting. For an early-stage company, I think this emphasis on volume is misguided. Stories and comparisons that answer people’s questions can still be found and shared, even for lower-volume keywords, as long as the content is high-quality. Over time, this content-building approach can help expand your brand awareness and trust with the types of buyers searching those long-tail keyword phrases.
Using Web Scraping for Content Scaling (ProductHunt to Template)
Using Coefficient’s Chrome extension, I was able to test GPT-4 prompting at scale for the variable descriptions.
With keyword topics identified, I turned to Apify to automate the content generation process. I built an Apify Actor that used their Puppeteer Crawler library to scrape product pages from ProductHunt, specifically looking for releases that compared two similar tools.
It extracted key details like the product names, quick descriptions, where they differed, and a summary. I then used Apify’s Cheerio scraper to pull any related images. This data was exported to a JSON file.
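The exported records might look something like the following. This is a hedged sketch of the shape – the field names are illustrative, not Apify’s actual output schema:

```python
# Serialise one scraped comparison record to JSON.
# Field names and values are hypothetical examples.
import json

comparison = {
    "product_a": "ToolA",
    "product_b": "ToolB",
    "descriptions": {
        "ToolA": "Analytics for small teams",
        "ToolB": "Analytics with a free tier",
    },
    "differences": ["ToolA is self-hosted", "ToolB has a free tier"],
    "summary": "Both target small teams; pricing is the main difference.",
}

payload = json.dumps(comparison, indent=2)
```

One record per scraped comparison page; the template layer then consumes the resulting JSON file.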
From there, I created a WordPress template that dynamically pulled in the comparison points from the JSON into formatted blog posts. A few minutes of tweaking the template design, and I had a scalable workflow. (YMMV, this took what felt like an age of hair pulling)
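The templating step itself is conceptually simple. In my setup this lived in a WordPress template, but plain Python stands in for the same idea – substitute the scraped fields into a post skeleton. Everything here is a placeholder sketch:

```python
# Fill a markdown post template from one scraped comparison record.
# Template structure and field names are illustrative.
from string import Template

post_template = Template(
    "## $a vs $b\n\n$summary\n\nKey differences:\n$diffs\n"
)

record = {
    "product_a": "ToolA",
    "product_b": "ToolB",
    "summary": "Both target small teams; pricing is the main difference.",
    "differences": ["ToolA is self-hosted", "ToolB has a free tier"],
}

post = post_template.substitute(
    a=record["product_a"],
    b=record["product_b"],
    summary=record["summary"],
    # Render the list of differences as markdown bullets.
    diffs="\n".join(f"- {d}" for d in record["differences"]),
)
print(post)
```

Loop that over every record in the JSON export and you have a batch of draft posts ready for review.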
Then it was just a matter of scheduling the Apify Actor to run weekly and generate new comparison blog posts targeting the identified head terms.
Levels of complexity in programmatic SEO
There are a few ways to go about it. Most commonly, you start with no-code, template-based tools to test, and move to in-house programming and scripts only once you’ve finessed your approach.
Custom Development
For companies with a dev team that has capacity, building fully custom scripts and platforms allows maximum control. Great for complex B2B use cases, but not realistic for early-stage solopreneurs.
No-Code Automation Platforms
Platforms like Zapier, Make and Apify allow founders to connect APIs and automate tasks without coding. The barrier to start experimenting and refining processes is low, but it still requires manually validating output.
Template-Based CMS Builds
Leverage CMS plugins (like Whalesync) or build template sites that pull data via API. This allows non-technical teams to scale up with guidance. Manual A/B testing of template designs and pages is still important.
Managed Content Partners
Outsource to partners who handle implementation and hosting. Removes upfront time cost but manual oversight is still needed for content quality control.
Regardless of the level of “programmatic”, the early stages should involve a lot of testing – try different approaches, analyse results, refine processes.
With some fine-tuning, you too can leverage web scraping and templating tools like Apify to massively scale up your content production.
This is just a starter guide; more content is coming soon. You may also enjoy Learnings from my dumpster fire of a pSEO site.