GSheet Script to Extract the Title and H1 to H6 Headings from a List of URLs

Just a quickie today!

I’ve been busy planning out articles for a new site in a niche I know little about…yet! (Yes, the case study site of mine and Jason’s.)

When I know very little about a topic, I sometimes use a little Gsheet script I had written that scrapes and lists the <title> tag, the H1 heading, and all H2 to H6 headings for a list of URLs I enter into a spreadsheet.

This helps me easily see what topics and sub-topics the already ranking articles are covering, so I can cover them too 😉

I thought it would be a nice little tool to give away, something that may be useful to others.

What Does the Sheet Do?

Quite simply:

  • Copy a list of URLs into column A
  • Run a script
  • The <title> tag, H1 heading, and all H2 to H6 tags will be extracted and written into one column per URL.

Example here:
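The actual code lives in the sheet itself, but for the curious, here is a rough sketch of the kind of Apps Script function that does this sort of thing. This is a simplified illustration I’ve put together, not the exact script in the sheet, and the function and variable names are just placeholders:

```
// Rough sketch only, not the exact code in the sheet.
// Reads URLs from column A, fetches each page, and writes the <title>,
// then every H1 to H6 heading, into one column per URL (B, C, D, ...).
function parsePages() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var urls = sheet.getRange('A1:A').getValues()
    .map(function(row) { return row[0]; })
    .filter(String); // drop trailing blank rows

  urls.forEach(function(url, i) {
    var html = UrlFetchApp.fetch(url).getContentText();
    var results = [url];

    // <title> tag
    var title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
    results.push(title ? title[1].trim() : '');

    // H1 to H6 headings, in the order they appear on the page
    var headingRegex = /<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi;
    var match;
    while ((match = headingRegex.exec(html)) !== null) {
      var text = match[2].replace(/<[^>]+>/g, '').trim(); // strip inner tags
      results.push('H' + match[1] + ': ' + text);
    }

    // Write this URL's results into its own column (B for URL 1, C for URL 2, ...)
    var column = results.map(function(v) { return [v]; });
    sheet.getRange(1, i + 2, column.length, 1).setValues(column);
  });
}
```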

Why Is This Useful?

Let’s say you are going to write an article. In the example I am using, it’s on ‘best garlic presses.’

Google ‘best garlic presses’, and open the top X results.

Find pages that are of the type you are aiming to create: a ‘best X’ roundup article on a content site. Exclude forums, ecommerce sites, etc.

Create a list of URLs from the results, and enter them into column A of the sheet I am giving here, then run the script.

You will now have a list of the <title> tag, H1 tag, and every H2 to H6 tag, of every URL that you entered into column A.

This shows you every topic and sub-topic that all the top-ranking results (the sites Google has decided are most relevant for the query) have included in their content.

Use this info to take a little from site A, a little from site B, a little from site C, etc., to build the structure of your own article, making sure you cover everything that the top results do. After all, Google has shown (and hence users have shown) that these topics are what they want to see, and what will rank on the first page for your chosen keyword!

So, it’s a little time saver, and a reasonably good one IMO, when you are researching and creating a brief and outline for an article.

Video Walkthrough

Here is a very quick video overview of what the tool does, and how I tend to use it:

Step by Step Instructions for Use

Download the Following Sheet

First of all, follow this link and save a copy to your own Google account; you need your own copy before you can use it.

<<Copy Gsheet with Script to Extract Title and Headings>>

Click ‘copy’ and it’s yours.

Copy a List of URLs into Column A

Exactly as the title says, copy a list of URLs into column A – THERE MUST BE NO GAPS IN THE LIST! It must be a contiguous list, with no empty cells within it.

Make sure there is no header! The list must start in cell A1.

Also, they must be full URLs, and each must return a 200 response. If a URL is broken, or does not lead to a web page when clicked, it will break the script. I couldn’t be arsed to get error catching and workarounds coded in 😉
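(For what it’s worth, the reason a dead URL kills the whole run is almost certainly that, by default, UrlFetchApp.fetch() in Apps Script throws an exception when a request fails. Purely for illustration, this is the sort of thing going on, and the sort of check the script doesn’t bother with:)

```
// Illustrative only. By default UrlFetchApp.fetch() throws an exception on a
// failed request (a 404, a dead domain, etc.), which stops the whole script.
var response = UrlFetchApp.fetch(url);

// Muting HTTP exceptions would let you check the status code yourself instead:
var checked = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
if (checked.getResponseCode() !== 200) {
  // skip or log this URL rather than letting the script die
}
```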

Run the Script

After pasting your list of URLs into column A, click in the main menu: Scripts -> Parse Pages.

Note: It can take a couple of minutes for the ‘Scripts’ menu item to appear, but it will appear. Be patient!
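(The delay is because custom menus like this are normally added by an onOpen() function, which only runs once the sheet has finished loading. The general shape is something like the below; I’m guessing at the exact function name the sheet uses:)

```
// Typical pattern for a custom menu in a Google Sheet (illustrative only).
// onOpen() runs automatically when the spreadsheet finishes loading, which is
// why the 'Scripts' menu takes a moment to show up.
function onOpen() {
  SpreadsheetApp.getUi()
    .createMenu('Scripts')
    .addItem('Parse Pages', 'parsePages') // menu label and the function it runs
    .addToUi();
}
```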

It will ask you for authorization:

Click continue.

Next, select the Google account you wish to use:

Then click the small ‘advanced‘ link:

After that, click ‘Go to Parse Pages (unsafe)’. Google like to make it as hard as possible! :p

Finally, click the ‘Allow’ button.

You will have now successfully run the script.

Sometimes It Breaks – What to Do?

Yep, unfortunately it doesn’t work sometimes.

Occasionally, when a site has a weird structure or is badly coded, the script will break. It will throw some sort of error…and I forget exactly how it’s worded.

But the script will break, you get a red warning message from Google, and you will need to deal with it.

I could have had error handling coded into it, but I didn’t bother. Instead, what I do is:

  • You have a list of URLs, and when the script runs successfully you will have a column per URL with its title and headings.
  • When the script breaks, the first URL in the list WITHOUT a column is the URL on which it broke.
  • Look at which URLs have run successfully, and delete the first URL that the script did not complete for.
  • Make sure you delete the cell itself, not just its contents, as for the script to run there must be no gaps, no empty cells in the list.
  • After this, re-run the script and it will be OK for the remainder.
  • The URL you deleted, you will have to scrape manually.

For example: If the script threw an error and I could see the following:

I can see that URL 1, thekitchn.com, was successfully scraped, and that the first URL that DID NOT get scraped before the error was thrown is URL 2, businessinsider.com.

So I need to delete cell A2, the one with businessinsider.com in it, and then run the script again:

It now works for the remainder.
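If you’re comfortable poking around in the script editor yourself, the usual way to avoid the delete-and-rerun dance would be to wrap the per-URL work in a try/catch, so a bad page gets skipped instead of killing the run. Roughly like this (my own sketch, building on the earlier one; this is not in the sheet as given away):

```
// Sketch of optional error handling. Not in the sheet as given away.
// Wrap the per-URL work in try/catch so one bad page is skipped
// instead of stopping the whole run.
urls.forEach(function(url, i) {
  try {
    var html = UrlFetchApp.fetch(url).getContentText();
    // ... extract the title and headings, write the column as normal ...
  } catch (e) {
    // Mark the column so you know which URL to check by hand.
    sheet.getRange(1, i + 2).setValue('FAILED: ' + url);
  }
});
```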

How Many URLs Does This Work For?

I’ve no idea. I’ve done up to 20, and it’s good enough for that. I usually only do 5 to 10.

Have a play and let me know if you find an upper limit 😉

That’s About It!

So that’s it: Script given, instructions on how to use it provided. I’ve nothing more to say!

If you have any comments or questions, do leave them in the comment section below. I will always try to answer as best as I can.

Cheers!
