Stepb by Step Guide: How to master IMPORTXML in Google Sheets

SEO can be challenging for beginners. Discover the best tools to simplify SEO and improve your website’s ranking with ease using this comprehensive guide.

Your next move? FREE and INSTANT access !

Automate without limits with Phantombuster

Recover bulk web data with Scraping Bot

Key Takeaways

Pourquoi utiliser IMPORTXML ?

La fonction IMPORTXML de Google Sheets permet d’automatiser l’extraction de données structurées depuis des pages web, sans aucune connaissance en codage. Elle est idéale pour les professionnels du marketing, les analystes de données et toute personne souhaitant scraper des informations sans outils complexes.

Fonctionnement et Syntaxe d'IMPORTXML

IMPORTXML(URL, XPath) : Récupère des données depuis une URL spécifiée, grâce à une requête XPath.
Utilisation courante pour extraire des données publiques telles que :
- //h1 : Récupère tous les titres H1.
- //@href : Liste tous les liens de la page.
- //meta[@name=’description’]/@content : Récupère la méta-description.

Conseils pratiques

Utilisez l’inspecteur de votre navigateur pour copier le XPath exact d’un élément.
Utilisez les DevTools de Google Chrome pour importer des XPaths dans Google Sheets.
Vérifiez toujours les conditions d’utilisation du site pour éviter des restrictions légales.

Applications Pratiques d'IMPORTXML

Analyse concurrentielle : Extraction de données SEO (mots-clés, méta-descriptions).
E-commerce : Collecte des prix et descriptions produits depuis des sites marchands.
Suivi de données financières : Taux de change, cours de la bourse.
Gestion de projet : Importation automatique d’informations pour des rapports en temps réel.

La fonction google sheets importxml peut être utilisée pour importer des données structurées à partir de différentes sources comme XML, HTML et TSV.

Forces et faiblesses de la fonction IMPORTXML

Catégorie	Points Clés
Avantages	Automatisation des tâches répétitives. Réduction des erreurs manuelles. Intégration de données en temps réel. Compatibilité avec divers formats structurés (HTML, XML, CSV, etc.). Gratuit et intégré nativement dans Google Sheets.
Limitations et Problèmes Courants	Incapacité à accéder aux sites nécessitant une authentification. Blocage potentiel par des sites utilisant des mesures anti-scraping. Erreurs dues à des requêtes XPath incorrectes ou à des modifications de la structure du site source. Ralentissement des performances lors de l'importation de grandes quantités de données. Absence de support pour le contenu dynamique chargé via JavaScript.

Catégorie

Points Clés

Avantages

Automatisation des tâches répétitives.
Réduction des erreurs manuelles.
Intégration de données en temps réel.
Compatibilité avec divers formats structurés (HTML, XML, CSV, etc.).
Gratuit et intégré nativement dans Google Sheets.

Limitations et Problèmes Courants

Incapacité à accéder aux sites nécessitant une authentification.
Blocage potentiel par des sites utilisant des mesures anti-scraping.
Erreurs dues à des requêtes XPath incorrectes ou à des modifications de la structure du site source.
Ralentissement des performances lors de l'importation de grandes quantités de données.
Absence de support pour le contenu dynamique chargé via JavaScript.

Meilleures Pratiques pour Optimiser IMPORTXML

Simplifiez vos requêtes XPath : Ne sélectionnez que les données nécessaires.
Automatisez avec des scripts Google Apps pour actualiser les données périodiquement.
Testez les XPaths avant de les insérer dans Google Sheets.

IMPORTXML est une solution puissante et accessible pour automatiser le scraping web.

Elle réduit significativement le temps d’extraction et les erreurs, tout en facilitant le traitement des données. Que ce soit pour des projets SEO, des analyses e-commerce ou des rapports, cette fonction permet de transformer Google Sheets en un outil de collecte de données performant et polyvalent

How to use IMPORTXML in Google Sheets to collect data

You want collect data on the internet... quickly?

Maybe you want to copy a table from the site, or maybe you want to quickly retrieve SEO elements from a competitor.

IMPORT.XML is here to help you automate your operations of collecting data in Google Sheets.

What is the IMPORTXML function?

The IMPORTXML function in Google Sheets imports data from various types of structured databases (data types), including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds (atom xml feeds).

It extracts information from any XML field, that is, any field surrounded by HTML tags (and therefore web pages), like <balise>and</balise>.

You can use it to import information that is publicly available from the web.

The syntax of the IMPORTXML function is as follows:

Documentation Google Sheets : Fonction IMPORTXML — Google Sheets documentation: IMPORTXML function

You are therefore not allowed to use it with websites that you need to connect to access it, for more information, the article below should interest you.

READ MORE: Is web scraping legal?

Basics you need to know to use IMPORTXML

importXML function: Imports data from various types of structured data (xml html csv tsv ross).

The base of a URL

The argument url is the URL of the web page from which you want to import data (data types),
Indicate the protocol (https://or http://).
The URL value should be either in quotation marks or a reference to a cell that contains the appropriate text.

The basis of an XPath Request

The argument xpath_query is the XPath query to be executed on the data in the given URL.
Each result of the XPath query is placed in its own row in the spreadsheet.
XPath is a query language used to retrieve information elements from sites.
It has its own syntax that you can learn in detail from online resources.

There are a few basic rules for creating your own argument. xpath_query :

The double slash:

The double slash”//“means the selection of all the elements of the tag that is written after it.
For example: //h1 means select all h1 Title items, //p means all p items, and so on.

The class

‍[@class = "] means that you want to select only items that meet this given criterion.
For example, //h1 [@class ='title'] only imports h1 Title elements whose class attribute is “title”.

Elements

You can define these elements on multiple levels as they appear in the site source.
For example, //h1/span will look at the h1 elements and then the span elements that they contain.

XPath Basics

To use XPath, you need to understand the basics of HTML.

Website data is displayed with HTML and stored in XML format.
XPath is the language used to query these databases.
You can customize these XPath arguments based on what you are looking for on the site.

XPaths examples for marketers

Check out these useful XPath extraction shortcuts to gather data from the web more quickly:

- All links on a page: “//@href”
- Extract all internal links from a page: “//a [contains (@href, 'example.com')]/@href”
- Extract all external links from a page: “//a [not (contains (@href, 'example.com'))]/@href”
- Page title: “//title”
- H1 (s): “//h1"
- Meta description: “//meta [@name ='description']/@content”
- Canonical: “//link [@rel ='canonical']/@href”.
- Robots: “//meta [@name ='robots']/@content”
- Hreflang attributes: “//link [@rel ='alternate']/@hreflang”

The function IMPORT.XML allows you to import site data directly into your spreadsheets

All you need to do is consult the source code of the website from which you want to import data, find the HTML element that can be used, and retrieve it using XPath queries.

What are the steps to use IMPORTXML in Google Sheets?

1. Start by opening a new Google sheet

First, we open a new, blank Google Sheets document:

For this example, you can use the Google Sheets sample sheet:

Google Sheets example sheet: Data collection with ImportXML

2. Add the content you need to extract

Add the URL of the page (or pages) whose information we want to extract from the web.

In our case, we are going to extract the page titles of all the articles, as well as their URL and meta descriptions.

So our starting URL will be as follows: https://www.sales-hacking.com/blog

Web Scraping avec Google Sheet — Data collection with Google Sheet

3. Finding the XPath

Use the Navigator Inspector

In order to find what the XPath looks like and how to find it simply, all you have to do is use a simple tool that is present on all modern browsers: The Inspector.

This XPath will allow us to specifically retrieve each information desired on the page in such a way as:

Relative : I want all the blog titles on the web page
Specific : I only want the first 10 or 30 article titles on the web page or only 1 in particular

In our example, let's start with the titles of the last 30 articles.

Go to your preferred browser.
Once you have hovered over the title of one of the articles, right-click
And select Inspect (Inspect in English)

Utilisez l'inspecteur des moteurs de recherche pour trouver le XPath — Use the search engine inspector to find the XPath

Once clicked the Browser will show you the HTML code of the web page as below:

Doesn't that speak to you? Don't worry, we'll see in the step below that the hardest thing for you to do will once again be to make a”Right click“

Copy/Paste the Xpath

The inspector (generally available in the Developer Tools section of your browser) allows you to directly select the item page for which you want more information:

Select the part you are interested in on the page with the Inspector, i.e. in this example: the title of one of the articles
Make sure the article title is still selected and highlighted, then right-click again and choose Copy > Copy XPath.

Inspecteur du navigateur — Retrieving the XPath by the Navigator Inspector

That's it, you've just selected the XPath that you're going to use right away in Google Sheets.

4. Extracting data into Google Sheets

To try it yourself in the Google Sheets example, you can use the example sheet below (Create a copy to use it):

Google Sheets example sheet: Data collection with ImportXML

Note : Data collection for the blog posts in this article has been voluntarily limited to 10 items to avoid any restrictions by Google services.

Retrieving blog title articles

Back in your Google Sheets document, introduce the IMPORTXML function as follows:

Data can also be retrieved using specific XPaths in Google Sheets.

For example, in step 3, we selected the XPath for a specific article which is this:

=IMPORTXML (B1,” /HTML/body/div [3] /div [1] /div [2] /div [3] /div/div ["&B3&"] /div/div/a/h3")

Récupération du Titre par des XPath Spécifiques — Retrieving the Title by Specific XPaths

‍

First of all, in our formula, we replaced the page URL with the reference to the cell where the URL is stored.

secondly, when you copy the XPath, it will always be enclosed in quotation marks.

Retrieving blog post URLs

This is what it looks like on the Google Sheets document

Exemple fonction IMPORT XML pour collecter des URL — Extracting URLs using IMPORTXML in Google Sheets

As you can see, the list returns all the articles and URLs that are on the page we just extracted.

You can also apply this principle to the extraction of any other piece of information necessary to set up your projects, such as retrieving the meta descriptions of each page with Google Sheets (example can be found in the file below):

Google Sheets example sheet: Data collection with ImportXML

feuille d'exemple complète — Data formatting

How can I automate the data collection process with IMPORTXML in Google Sheets?
You can automate the data collection process using Google Apps scripts or third-party programming tools like Python.
Scripts can be used to set a data collection schedule, add extracted data to a spreadsheet, and send update notifications.

Other Examples of IMPORTXML formulas

Let's say you want to import the paragraph titles from this Wikipedia article with IMPORTXML from Google Sheets.

<h2>After examining the source code, you can see that the titles are enclosed in tags.

So that's what you need to put in your IMPORTXML formula. In XPath syntax, you should write it as “//h2".

Try the following function where cell A2 contains the article URL:

Web Scraping - ImportXML 3 — Example of the IMPORTXML function - =IMPORTXML (A2,” //h2")

‍

First, you'll see a “Loading...” sign “, while data import is in progress. Importing from the site takes some time (but it's still faster than doing it manually).

After a few seconds, the function returns what it found on the site with the tags<h2>.

This solution is not perfect because there are additional [edit] texts next to the titles that the site contains, but it does return paragraph titles.

This time, visible text has been imported from the site.

But with the IMPORTXML function, you can import all the data that the HTML source contains. For example, it is very useful to import meta information, link sources, or any other hidden HTML information that is not visible when browsing the site.

Let's get the links to YouTube videos from the Google privacy policy site.

In this case, the first argument to the IMPORTXML function is the URL of this site.

After looking at the site's source, you may realize that YouTube videos are enclosed in tags <iframe>and</iframe>.

So you need to add //iframe to the function's xPath argument.

Then, you need to go deeper into this tag <iframe>to find the required data that you want to display.

The source of the video link can be found in the src attributes.

Add a slash “/” after the first part of the XPath query followed by the src attribute.

The “@” character indicates that it is an attribute. The whole xpath_query request will be: //iframe/ @src.

And the whole function for importing YouTube links is:

Web Scraping - ImportXML 2 — Example of using the IMPORTXML function - IMPORTXML (A2,” //iframe/ @src “)

Sometimes it's not so obvious what differentiates HTML from the content you want to import, but XPath allows you to create more complex queries that extract only the data you want.

XPath has some functions that you can include in your requests, such as “contains” or “starts with.”

Let's see how to import all the references (citations) at the bottom of the Google Sheets article from Wikipedia.

<li>After looking at the site source, you can see that the references are enclosed in tags.

But that's not enough because there are a lot of tags <li>on the site.

You need to find something more that differentiates these list items from other parts of the site.

Here is the attribute Id references that all contain the term “cite_note.”

The function Contain of XPath can be used to import only items <li>whose id contains the term 'cite_note'.

That's what the following IMPORTXML function in Google Sheets does:

Web Scraping - ImportXML 1 — Example of using the Import XML - IMPORTXML function (A2,” //li [contains (@id, 'cite_note')]”)

And it returns all the references that were found on the site with that ID.

What type of data can I collect with IMPORTXML in Google Sheets?
You can collect a variety of data from the website using IMPORTXML, such as prices, product descriptions, contact information, business dates and hours, exchange rates, formatting info, and more.

When should IMPORTXML be used to collect data?

Extraction of databases from websites or XML feeds (atom xml feeds)
When you need page data but don't want to use tools
When you need page data but don't know how to code

Is the use of IMPORTXML to collect data legal and ethical?

In most cases, using IMPORTXML to collect data is legal as long as the data collected is not used for illegal purposes.

However, it is important to respect the terms of use of the sites whose data is collected. In some cases, websites may prohibit the extraction of data for commercial purposes or prohibit the use of automated extraction methods such as IMPORTXML.

Before using IMPORTXML to collect data, it is important to check the terms of use of the site.

READ MORE: List of tools for collecting data on the web

FAQs

What is IMPORTXML in Google Sheets?

IMPORTXML is a function built into Google Sheets that allows you to retrieve XML data from the website and display it in your spreadsheet.

How does IMPORTXML work in Google Sheets?

IMPORTXML works by using XPath expressions to extract data from a web page.

XPath expressions are instructions that tell IMPORTXML where to find the data on the page.

Is IMPORTXML easy for beginners to use?

Although using IMPORTXML may seem complex at first glance, with a bit of practice, the function can be used easily even by beginners.

There are also numerous online resources, such as video tutorials and community support forums, that can help you learn how to use IMPORTXML effectively.

Conclusion

As you can see, IMPORTXML from Google Sheets can be a very powerful feature in your arsenal.

This gives you a fully automated and error-free method for extracting data from (potentially) any web page, whether it's product content and descriptions or e-commerce data such as product price or shipping costs.

At a time when information and data may be the advantage needed to achieve better-than-average results, the ability to collect web page data and structured content in a simple and fast manner can be invaluable.

Additionally, as discussed above, IMPORTXML from Google Sheets can help reduce execution times and the risk of errors.

Furthermore, this Google Sheets feature is not only a great tool that can be used exclusively for PPC tasks.

But it can be really useful in a lot of different projects that require scrapping the web, including SEO and content tasks.

You’ll Also Love…

Discover other carefully selected articles to deepen your knowledge and maximize your impact.

Find Free Leads: Google Dorks Techniques Explained

Starting a business can transform your life. Discover how entrepreneurship changes the lives of individuals worldwide with this in-depth exploration.

Python Web Scraping: Comprehensive Tutorial

Starting a business doesn’t require a fortune. Discover low-cost business ideas that need minimal investment but offer great potential for growth.

Web Scraping Ethics & Legality: Best Practices Guide

Starting an online business is easier than you think. Learn how to overcome the fear of leaving a job and starting a successful online venture.