How do I extract all links from a web page with JavaScript?


What is web scraping?

Web scraping is a term for various methods used to gather information on the web.

In general, this operation is done using tools or codes that simulate human web browsing in order to gather specific pieces of information on different websites.

This information is collected and then exported in formats that are more useful for users: whether it's a spreadsheet (XLS, CSV, etc.) or an API.

READ MORE: What is web scraping? -->

Is web scraping legal?

The answer is not as simple as it seems; it is above all a question of ethics.

Depending on the case, what you do may or may not be legal. You should pay attention to:

  • the types of data you collect,
  • how that data is used,
  • and the method of collection, whether through programming languages or web scraping tools.
READ MORE: Is web scraping legal? -->

What is JavaScript?

JavaScript is a text-based programming language that is used on both the client side and the server side.

It makes web pages dynamic (unlike HTML and CSS, which mainly provide the structure and style of web pages).

To summarize briefly, JavaScript adds behavior to web pages. For our part, we are going to use it to collect data from a web page in a few seconds, thanks to its built-in functions.

What do you need for this project?

  • Literally any browser made in the last 10 years.
  • The snippet of code that I'm going to provide further down the page.

What is the principle of collecting links from a web page with JavaScript?

Extracting and cleaning data from websites and documents are daily tasks.

I like to learn how to systematically extract data from multiple web pages and even multiple websites using Python and/or Web Scraping tools.

But sometimes a project only requires a small amount of data from a single page of a website. Previously, when a case like this came up, I would always launch my development environment, then write and run a script to extract the information. That is like using a sledgehammer to crack a nut.

Good old JavaScript is powerful enough to extract information from a single web page.

The JavaScript code in question can be run in the browser's developer console in a matter of seconds.

In this example, I extract all the links from a web page, because it's a task that many people like me regularly perform.

However, with a few minor changes, this code would work just as well to extract any other type of element from HTML documents.

When this code runs, it opens a new tab in the browser and creates a table containing the text of each hyperlink and the link itself.
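
As a rough sketch of the "minor changes" needed for other element types, the same idea can be wrapped in a small helper. The function name extractElements and its parameters are hypothetical (they are not part of the snippet used in this article), and the image example is only an illustration:

```javascript
// Hypothetical helper: collect chosen properties from every element matching a selector.
// `root` is anything that offers querySelectorAll (in the browser console, pass `document`).
function extractElements(root, selector, properties) {
  return Array.from(root.querySelectorAll(selector)).map(function (element) {
    return properties.map(function (name) {
      return element[name];
    });
  });
}

// In the browser console, list the alt text and source URL of every image.
// The guard keeps the snippet harmless when it runs outside a browser.
if (typeof document !== "undefined") {
  console.table(extractElements(document, "img", ["alt", "src"]));
}
```

Calling extractElements(document, "a", ["textContent", "href"]) would return the same kind of data as the link example, before any cleaning of the text.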

JavaScript code and how it works

Open your browser and go to the page you want to extract links from

Go to the target page

In our case, we will use the Blog page of this website as an example, since it is a page that contains a significant number of links.

All that's left for us to do is open the developer console and run the code.

Open the browser console

To open the developer console, you can right-click on the page and select “Inspect” or “Inspect Element”:

Once done, the HTML code of the page should appear in your browser (at the bottom or on the side). Don't worry if you don't understand what's on the screen at this point; that's okay.

Page source code

Execute the code in the browser console

Now click on the Console tab (circled in red in the screenshot) to bring up the browser console.

Browser console

Once you are in the Console, it may not be empty: there may already be a lot of messages in it. Once again, don't worry. You can clear it to start with a nice “clean” section.

To erase the contents of the console, right-click inside it and choose the option that clears the history (“Clear History” or a similarly named entry, depending on your browser).

Erase console content

You have now almost reached the end of the method.

You're going to copy the code snippet below into your console.

Javascript code

Here it is in text format for easy copy/paste.

// Collect every link (<a> element) on the page.
var x = document.querySelectorAll("a");
var myarray = [];
for (var i = 0; i < x.length; i++) {
  var nametext = x[i].textContent;
  // Collapse each run of white space into a single space, then trim both ends.
  var cleantext = nametext.replace(/\s+/g, " ").trim();
  var cleanlink = x[i].href;
  myarray.push([cleantext, cleanlink]);
}

// Build an HTML table from the collected links and write it into a new tab.
function make_table() {
  var table = "<table><thead><tr><th>Name</th><th>Links</th></tr></thead><tbody>";
  for (var i = 0; i < myarray.length; i++) {
    table += "<tr><td>" + myarray[i][0] + "</td><td>" + myarray[i][1] + "</td></tr>";
  }
  table += "</tbody></table>";

  var w = window.open("");
  w.document.write(table);
}
make_table();

As in the example below, you only need to paste the piece of code into the console.

Paste the code into the console

Retrieve the result of the JavaScript code

This last step is the easiest: all you have to do is press the Enter key.

Press enter

This will open a new tab in your browser with a table containing all the link texts and hyperlinks for the web page you have chosen.

Result of the link collector in javascript

This table can then be copied and pasted into a spreadsheet or document to be used as you see fit.
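
If you would rather get the data as CSV text than copy the table by hand, the collected rows can be converted directly. This is only a sketch: the helper name toCsv is hypothetical, the sample rows are made up, and it assumes rows shaped like the [text, link] pairs stored in myarray:

```javascript
// Hypothetical helper: turn rows such as [["Home", "https://example.com/"]] into CSV text.
function toCsv(rows) {
  return rows
    .map(function (row) {
      return row
        .map(function (cell) {
          // Quote every cell and double any embedded quotes, as CSV expects.
          return '"' + String(cell).replace(/"/g, '""') + '"';
        })
        .join(",");
    })
    .join("\n");
}

// Example with made-up data; in the console you could pass `myarray` instead.
var sample = [
  ["Name", "Links"],
  ["Blog, latest", "https://example.com/blog"],
];
console.log(toCsv(sample));
```

In browsers whose console offers a copy() utility (Chrome's does, for example), copy(toCsv(myarray)) should place the CSV text on the clipboard, ready to paste into a file or spreadsheet.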

What does this JavaScript code do step-by-step to collect data from the web?

Here's a breakdown of the code and what each part does.

Step 1: Declaring variables

Here, we find all the “a” elements on the page (“a” elements are links) and assign them to the variable x. Then we create an array variable, myarray, which we leave empty for now.

Javascript code details (1/3)

Step 2: Loop over all the links

We then loop over all of the “a” elements in x, and for each element, we read its text content and its link.

For the text content, we replace each run of white space with a single space and trim the text, as there may be large amounts of white space that would make our table unreadable.

Javascript code details (2/3)
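
To see the cleaning step of Step 2 in isolation, here is a minimal sketch. The function name cleanText is hypothetical and the input string is made up; the regular expression and the trim() call mirror what the loop applies to each link's text:

```javascript
// Hypothetical helper mirroring the cleaning applied to each link's text.
function cleanText(text) {
  // \s+ matches one or more white-space characters (spaces, tabs, line breaks).
  return text.replace(/\s+/g, " ").trim();
}

console.log(cleanText("\n    Read   the\tblog \n")); // prints "Read the blog"
```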

Step 3: Creating the Table

  1. We then create the table using the “make_table()” function.
  2. This function creates a variable, table, with the start of an HTML table and the table headers.
  3. We then use a “for” loop to add table rows containing the link text and hyperlinks.
  4. Finally, we open a new window using “window.open()” and write the HTML table into that window using “document.write()”.

Javascript code details (3/3)
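
The table-building steps above can also be sketched as a function that returns the markup instead of writing it straight away. This is a variation, not the article's exact code: the names escapeHtml and buildTable are hypothetical, and the escaping of special characters and the check for a blocked pop-up (window.open returns null in that case) are additions:

```javascript
// Hypothetical helper: escape characters that have a special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the table markup from rows of [text, link] pairs.
function buildTable(rows) {
  var html = "<table><thead><tr><th>Name</th><th>Links</th></tr></thead><tbody>";
  rows.forEach(function (row) {
    html += "<tr><td>" + escapeHtml(row[0]) + "</td><td>" + escapeHtml(row[1]) + "</td></tr>";
  });
  return html + "</tbody></table>";
}

// In the browser, and only if `myarray` already exists, show the table in a new tab.
if (typeof window !== "undefined" && typeof myarray !== "undefined") {
  var tab = window.open("");
  if (tab) {
    tab.document.write(buildTable(myarray));
  }
}
```

Returning a string makes the function easy to check on its own, and escaping keeps link text that contains characters such as < or & from breaking the table.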

Disadvantages of this JavaScript code for collecting data on the web

There is a downside to the current code: it takes ALL the links on a page. That includes all the links in the menus and all the internal links that take you to other pages on the website.

You could be more specific and look for the “a” elements in certain areas of the web page only. To do so, simply edit the first line of code so that querySelectorAll examines a targeted area of the page.

To find that area, right-click on the part of the page whose links you want to extract and click on “Inspect”.

We can then edit the first line to find all the “a” elements inside the element whose class name is “class-name” (a placeholder for the real class name shown in the inspector):

var x = document.querySelectorAll(".class-name a");
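
Another way to narrow the results, complementary to a more precise selector, is to filter the collected rows afterwards. The sketch below is an assumption-laden example rather than part of the original snippet: the helper name externalLinksOnly is hypothetical, and it assumes rows shaped like the [text, link] pairs in myarray. It keeps only http(s) links that point to another host and drops duplicates:

```javascript
// Hypothetical helper: keep only links to other hosts, without duplicates.
// `rows` is an array of [text, link] pairs; `pageHost` is the host of the scraped page.
function externalLinksOnly(rows, pageHost) {
  var seen = new Set();
  return rows.filter(function (row) {
    var url;
    try {
      url = new URL(row[1]);
    } catch (error) {
      return false; // skip empty or malformed links
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      return false; // skip mailto:, javascript: and similar links
    }
    if (url.host === pageHost || seen.has(url.href)) {
      return false; // skip internal links and links already kept
    }
    seen.add(url.href);
    return true;
  });
}

// In the browser console, after the main snippet has filled `myarray`.
if (typeof location !== "undefined" && typeof myarray !== "undefined") {
  console.table(externalLinksOnly(myarray, location.host));
}
```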

What to use on a daily basis to do Web Scraping

For quick jobs like this one, this kind of JavaScript snippet is very useful.

If you have development skills

However, I use Python for the vast majority of my web scraping work. Still, it's useful to have a quick and easy way to extract information from a web page without needing to open any application other than the browser.

READ MORE: How do you collect data on the web with Python? -->

If you don't have development skills

1) Use web scraping tools

There are many tools that are very practical for extracting data from the web.

In addition, data can easily be exported in the form of spreadsheets (XLS, CSV, etc.) or an API.

Although web scraping can be done by hand (by copying and pasting), in most cases these tools are cheaper, less prone to human error, and able to collect significant amounts of data.

2) Learn development

When I talk about learning development here, I don't necessarily mean becoming an expert, but being able to at least understand a piece of code, adapt it, or write part of it if necessary.

To do this, training courses for all budgets are available on Udemy.

I take at least one course a month myself to stay informed and improve my skills in many areas.

Happy learning and happy web scraping!

Stephen MESNILDREY
CEO & Founder
