Screaming Frog + Excel Macros = Internal Link Audit

In this post, I’m giving out an excel spreadsheet with some macros and a custom menu, that I use for ‘partial link audit’ purposes.

It is a set of Excel macros, that parse and format data exported from Screaming Frog crawls, to help with performing an internal link audit by having the data at hand in an easy to read format for making optimizations.

Example Sheet and Video Walkthrough

You can grab the sheet here, that is filled with some data after running an example (random) site through it.

<< DOWNLOAD: Screaming Frog and Excel Link Audit Tool >>

When you open this link, download it! It’s useless in the web viewer. It is a macro enabled Excel spreadsheet, that MUST be viewed in excel.

Details of what that sheet is, how it works, why it’s useful, and how to use it can all be found in the following ramblings of a madman article.

For a ‘bottom line up front’, here is a quick video demonstration of what you get with this:

Why I Made This Tool

I made this tool because I like to use excel for data manipulation, and regularly use other tools for data gathering, but excel as the primary tool to work with.

Screaming Frog is an excellent tool. I use it often for many things. But its internal linking reports leave a lot to be desired.

When you crawl your site with Screaming Frog and look at internal links, it contains all header, footer, sidebar, related post, and nav links. All of them.

This makes it very difficult to separate out ONLY the links you have placed within content, which are the ones I’m most concerned with when doing a link audit.

So, many moons ago I took it upon myself to write some macros that take the data you can export from screaming Frog, and present it in a better, more usable fashion, concentrating only on links that are placed within article content (or the main body at least.)

Use Cases

With the data this tool and process gathers, you can:

  • Find and fix 3xx, 4xx, 5xx, and more, for internal and external links.
  • Easily see which pages have zero, too few, or even too many internal or external links.
  • See at a glance the state of all rel tags (nofollow, sponsored, UGC, ‘open in new tab’) for all internal and external links, and have the URLs at hand ready to open to change them if necessary.
  • See the anchor texts of all links on your site, so you can see if they are under – or over-optimized, and then optimize them.
  • See all amazon links, their rel tags, if open in new window or not.
  • …a few more things besides

Who Will this Work For – Pre-Requisites and Required Tools

To my knowledge, this will only work for those who have:

  • Paid for Screaming Frog – because the free version does not have ‘custom extractions.’
  • Windows users – Because Excel for MAC OS has differences in VBA code I know nothing about, and hence do not program for.
  • Latest version of Excel – Because that’s also what I work with and do not take into account backward compatibility. It would be a waste of my time.

So, this is of no use to many people, I know.

But for those of you who match the above requirements, you might see some benefit in using it. 😉

Yes, There are Other and Better Tools Out There

Before any complaints that this tool is ‘not as good as what’s already out there’, or that ‘other tools do it better’…

I don’t care.

I made this for me and the way I like to work. I’m merely giving it away in case anybody else finds it useful.

I know there are many tools you can use to audit and improve your internal linking. Ahrefs has some features for this, Website Auditor, Sitebulb, Spencer Haws ‘Link Whisper’, …and many more tools besides.

They may well be better. They certainly do a lot more. Feel free to carry on with the workflow you like to use, I’m not looking to change that.

This tool is just what I use when auditing internal linking, because I like the plain format, no bells and whistles style, raw data in a spreadsheet that I can manipulate with filtering, sorting, excel formulas, VLOOKUP’s, pivot tables and more.

Setting up Screaming Frog

We need two exports from screaming Frog for this tool:

  1. Bulk Export, All Inlinks – This report is generated for every crawl regardless. Nothing to do here, just export it after the crawl.
  2. Custom Extraction – This is a report we build by taking all HTML content found within ‘a tags’ from inside the main body copy of a website only, excluding header, footer, sidebar, nav, and so on.

I will show how to get this export in the next section. But first:

If you are looking at a small site, up to just a few hundred pages, you can simply create the custom extraction and let it rip.

However, if you have a large site of 1k pages plus, and certainly 2k or 3k plus, you will want to set up Screaming Frog to ignore A LOT of stuff, to cut down the crawl time by a huge amount.

Configuring the Spider for Large Sites

There’s A TON of stuff screaming Frog pulls that we’re not interested in for this task of only looking at links. So perform the following setting changes to speed up the process:

Open: Configuration – Spider.

Screenshot of Screaming Frog siper config menu

Go to ‘extraction’ and untick everything. Then click OK.

Unticked extraction options for Screaming Frog tool

This will save a ton of time in the crawl for larger sites.

Setting Up the Custom Extraction

Screaming Frog has a feature to ‘extract’ certain pieces of information from the page.

For example, you might want to extract all names from a table, or all product ASINs from a catalog page…or in this example, all ‘a tags’ on the page, but only from within the content.

The full extent of how custom extractions work, what you can do, and how, is well beyond this tutorial and not what it’s about.

Today, we are only going to look at one type of extraction, for which I will give you the exact setup.

Go to ‘Configuration – Custom – Extraction.’

Screaming Frog custom extraction menu

Once you have the options screen open, fill it in EXACTLY as you see here – with one exception!

The font I use on this site creates curly quotes, so if you copy and paste the xpath from this article into screaming frog, delete the two ” and retype them, they need to be straight ones! I’ll fix this at a later date somehow.

If you copy-paste into screaming frog from this article, and get a red X, delete the quotation marks, and retype them. You should then get a green tick.

  • Any name
  • Xpath
  • //div[@class=”content-class”]//a
  • Extract HTML element
Screaming Frog Xpath options

Notice the Xpath I’ve said to enter is:

//div[@class=”content-class”]//a

The “content-class” needs to be swapped for an actual class from the site you are going to audit. And the class you choose will be different from site to site.

We need to find a class that wraps the entire body content – and the body content alone – so that we only pull the ‘a tags’ from within the body content, and no links from the header, footer, sidebar, etc.

Finding Custom Extraction Class to Use

Here is a quick explainer video showing how you can find the correct class to use, using the site I randomly picked to use as an example for this article – https://thegunzone.com/

Steps are:

  1. Use Chrome browser (though Firefox has similar / same functionality)
  2. Open a typical article.
  3. ‘Right-click – inspect’ the first paragraph
  4. Looking in the ‘elements – HTML’ window, look for a class that when hovered over, highlights ONLY the article content, no header, sidebar, footer, etc.

So, the class I found to use for this example is: entry-content.

Therefore, in Screaming Frog custom extraction, I need to enter the following xpath for my example! (You need to find the right class for the site you are looking at):

//div[@class=”entry-content”]//a

Why the Entire ‘a’ Tag, Not Just the URL and Anchor?

Because I want all the data inside the ‘a tag,’ including classes, href (URL), anchor text, and more. This is all useful data when doing an internal link audit…as you shall see later on.

Start the Crawl

After you’ve set up the ‘Custom Extraction’ correctly, change Screaming Frog to the ‘custom Extraction’ tab view, by clicking the drop-down and then ‘Custom Extraction’ as seen in the following screenshot:

Choosing the custom extraction view in screaming frog

You can now start the crawl.

If you do not see the ‘Extraction’ column being filled in with A href data, you have messed up the XPath for the custom extraction and need to stop the crawl and reconfigure it.

When crawl hits 100%, Make Two Exports

Once the crawl has finished, there are two reports we need to export:

One: In the ‘Custom Extraction’ view, click export to get a ‘custom_extraction_all.csv’ file

exporting the custom data

Two: Click ‘Bulk Export’ then ‘All Inlinks,’ to export the ‘all_inlinks.csv’ file

Exporting all inlinks from screaming frog

Ready to Roll – Importing and Formatting the Data

It’s now time to get all the data into Excel and make it usable.

Make a copy of the ‘SF_Link_Audit_MASTER.xlsm’ file you downloaded at the start of this tutorial.

My recommendation is to leave the original untouched, store it somewhere, make a copy each time you want to use it.

Once you have a copy, you need to move into the ‘SF Link Audit’ menu I created in Excel.

Screenshot of excel menu

In there, you will see 5 menu items, each with a different Macro attached for a different task:

Five custom buttons made in excel

We will slowly work through 1 to 3, with 4 and 5 for later use.

Step 1: Create Necessary Sheets

To get the sheet ready, click the button: ‘1. Create Necessary Sheets

It will give a warning:

an excel message box up

And if you click ‘yes,’ it will delete ALL existing sheets, create 2 new ones, and tell you what to do next:

An excel message box pop up saying set up complete

Step 2: Copy Screaming Frog Exports Into Excel

You now have two sheets in excel, roughly called the same as the file names you exported from screaming Frog.

So do the following:

  1. Copy export ‘custom_extraction_all.csv’ Into sheet ‘Cust_Extr
  2. Copy export ‘all_inlinks.csv’ Into sheet ‘all_inlinks

You MUST copy the exports 100% in their entirety, and ensure they are pasted starting in cell A1 in the SF_Link_Audit_MASTER.xlsm file.

Step 3: Run’ 2. Parse all Data’ Macro’

With the data from Screaming Frog now copied into the spreadsheet, run the second macro: 2. Parse All Data.

This macro takes the ‘Custom Extraction’ data from Screaming Frog – which, let’s face it, is a bit of a mess – and makes it more useable to create summary sheets later.

You will get a quick warning, just click ‘yes’ to proceed:

excel message box pop up

WARNING! Depending on the amount of data, this can take a few minutes. DO NOT use your laptop while it’s running for anything else. Due to my shit programming, it can break the process (copy and paste is used, when it shouldn’t be…but I’m not going to change it 😉 )

I have programmed in some ‘status bar’ updates throughout the process, so you can see it’s working and not locked up to remove any doubts.

Check the bottom left corner of excel as it’s running to see remarks on its progress.

excel status bar message example

When finished, you get a message telling you to delete some data from the ‘All_Links’ sheet that’s been newly created:

Excel message box pop up

Step 4: Delete Noisy Data

When we take every ‘a tag’ from within body content, there will still be some elements you would like to have excluded, but due to the structure of the website and the class you had to choose for the extraction, could not be excluded.

For example, the data may include:

  • Table of contents jump links (usually starting with a #)
  • Social sharing icon links
  • Related posts that are placed into the article body content
  • Prev / Next / Read more links
  • Jump to recipe / print recipe links
  • ….etc.

What you want to delete will depend on your site and what links have been scraped, but you would like to ignore.

So, delete out all rows that you do not want in the final data.

Step 5: Run ‘3. Finalise Sheets’ Macro

Next, run macro ‘3. Finalise Sheets

Again, this macro may take some time if there’s A LOT of data. And again:

DO NOT USE YOUR COMPUTER WHILE THE MACRO IS RUNNING!

If you do, it may break the process, and you’ll have to start again. So be patient, let it run.

I have again programmed in ‘status bar’ updates you can see in the bottom left corner of Excel, so you can see it has not locked up and is still working through the process.

Red URLs in Internal Link Sheets – Are Relative URLs

Many sites use ‘relative URLs’, instead of absolute URLs:

  • https://domain.com/category/article/ = Absolute URL, including the domain.
  • /category/article/ = Relative URLs, without domain name included.

These are links pointing to the same page, and should be counted equal, but are different due to the absence of the domain in relative URLs.

To work around this in my sheets and summary tables, I add the domain at the beginning of the relative URLs, so it is counted in my processing as non-unique, but lumped in with the absolute address.

I give a ‘relative URLs’ count on the summary sheet, as well as color the domain name red in such links, so you can clearly see where they are in the sheets, and can sort the sheet by text color if you wish to find them all.

Step 6: Finished!

Once the macro in step 5 has completed, the process has finished, and we have the data we need to check on the internal linking of our site.

The Generated Sheets and Data Explained

Here is a quick rundown of each sheets content, what’s in them and why it’s useful.

Sheet: Cust_Extr (Ignore)

This is the untouched, raw data that you exported from Screaming Frog, in the ‘Custom Extraction’ export.

It’s very rarely useful, but just in case I have left it in here.

Sheet: Omitted (Ignore)

In here, you will find all the ‘a tags’ that you deleted at step 4.

Again, rarely useful, but a nice to have just in case, so I created this and left the data there.

Sheet: All_Links (Ignore)

This is the ‘parsed data’ from your custom extraction, minus what you deleted in step 4.

Once more, not very useful, but I leave it in there just in case.

Sheet: Rels_and_Status

The first USEFUL sheet!

In here is a general overview of the state of the links that you have inside of your content.

You get to see:

  • Number of internal links
  • Number of external links
  • How many of the links have nofollow, sponsored, UGC, noopener, noreferrer, and target_blank added.
  • How many links have 3xx, 4xx, 5xx codes, connection errors, etc.
  • Whether you have any ‘relative URLs’ in content.
  • Anchor text of 3 characters or less – This is almost always a mistake, and is where you may have accidentally linked a space, or linked a string of consecutive words individually instead of a sentence. Worth checking out and maybe fixing!
  • How many Amazon links you have, what rel tags they have, whether open in new or tab or not, and if any amazon links do not have an affiliate tag (potentially losing revenue!)
  •  How many Amazon short links you have, what rel tags they have, whether open in new tab or not.

You can use the above data as a good overview, to see first and most foremost:

  • If you have external links opening in the same window that you’d like to open in new (and the same with internal links)
  • Whether your internal and external links have correct rel tags.
  • How many 3xx, 4xx, 5xx status codes you have and that may need attention
  • If you have any Amazon links without nofollow tag, what rel tags they do use, if opening in a new window or not.

Sheet: Link_Summary

This sheet is a summary of every page on your website, with information beside it showing the sum total of:

  • Internal incoming links
  • Internal outgoing links
  • External outgoing links

NOTE: Some of the data is wrong in this sheet! This is unavoidable!

Pages that do not have the class you added to screaming frog custom extraction, will not have their links extracted.

This typically includes pages like home, category pages, tag pages, etc.

So yes, there may be some links missing. Know that going in 😉

But this is a tool I use to check links manually placed in content. The links from home, category and tag pages are generally speaking auto-generated by the theme.

These could be included by running a second crawl, with a second extraction, then combining the sheets. But hey, I’ve never felt the need, so have not and will not be doing this.

So anyway, this sheet is useful because it shows you at a glance – by sorting columns B, C or D low to high – or high to low:

  • Pages with no incoming internal links
  • Pages with no outgoing internal links
  • Pages with no external outgoing links (lack of references for your data)
  • Pages with too many external outgoing links (reducing them leaves more power for your internal links!)

Sheet: Int_In

This sheet shows every URL on your site, alongside the incoming internal links.

The sheet is sorted by target URL in the ‘To’ column, showing the linking page in the ‘From’ column.

Alongside each link is the anchor text or image alt used, as well as any rel tags and ‘open in new window’ or not.

There is also a ‘1st link class’ column, that can be used to help you identify WHERE in the page the link is found (button, shortcode, elementor section, etc.)

You can use this sheet to see how many links a page has, and what anchors are used, to easily see if it’s being linked to enough, and if the anchor texts used are under- or over-optimized.

Sheet: Int_Out

This sheet shows every URL on your site, alongside the outgoing internal links.

This is the same data as the ‘Int_In’ sheet basically, but sorted and grouped by the ‘From’ column, so you can see at a glance what internal outgoing links each page has.

Sheet: Ext_Out

This sheet shows every URL on your site, alongside the outgoing external links they have.

Once again, anchor or image alt text can be seen, along with rel tags, target_blank, etc.

Sheet: Status_Codes

This sheet shows you every link found in the content of your site, both internal and external, that does not result in a 200 HTTP status (these have been deleted.)

So you can see at a glance all the 3xx, 4xx, 5xx, blocked by robots, connection timeout etc. and can use this to clean up your sites HTTP statuses.

Reformatting

When you filter and sort any of the sheets, they lose their grouping and separation with the blank blue lines.

To fix this, I include a macro ‘4. Reformat sheets’, that reformats all the sheets back into their grouped and separated state.

Don’t use it too often, as remember it can take a while. But it’s useful to have.

Recalculating Summaries

The final macro: ‘5. Recalc Summaries’, is there to recalculate the summary sheets, followed by formatting all the sheets.

This one is useful if you decide to update the sheet as you go about your work.

For example, if you fix 404s, internal 301s, change an anchor text, add a link in or whatever: If you update the sheets to reflect this, by clicking ‘5. Recalc Summaries’, it will recalculate and reformat everything to take into account the changes and reflect it in the summaries.

Conclusion

So there you have it: My Screaming Frog and Excel Macro internal link auditing tool.

Hopefully, some of you find use in it? It’s my go-to tool whenever I want to do an audit of my internal linking.

Combined with the filtering, sorting, and data manipulation powers of excel, this sheet gives you good data in a reasonably well-presented manner to be able to spot flaws and makes changes to the internal linking on your site (with a little on external links to boot.)

Let me know if you have any questions!

Leave a Comment