17 minutes reading time (3420 words)

Playing with the Joomla Web Services (API) - part 4

2023---JCM-JoomlaAPI-Episode4api4

[ this article is part of a series ]

8. Using the Joomla API to Import and Update Articles directly from a Google Sheet

Being able to POST or PATCH a single Article programmatically with a few lines of code thanks to the API as shown above is great… but the added value in real life is quite limited at this point.

Indeed, having to prepare 1 script to create only 1 article would seem quite cumbersome. But once you can do it for 1, one starts to think “what about having a loop to create multiple articles” :)

When I first spoke about this with Alexandre, I would already have been happy to do it with a CSV file:

  • export to CSV-format from an Spreasheet
  • and then select that file with a kind of import button somewhere

But Alexandre went further:

  • why bother exporting/importing a CSV file: if you use Google Sheet for example, you can share it with a simple link… to its CSV version
  • why importing Articles only into 1 website: let’s have a column in our Google Sheet to select for each row in which website we want to create the corresponding Article
  • why limit the import to the native Fields, let’s make it easy to also import (any number of) Custom Fields
  • why not allow to also have json in a cell since the following Fields are stored in json-format in the database
    • Intro Image & Full Image links, together with their configurations (Description, classes, …)
    • Urla, Urlb, Urlc
    • Custom Fields of type Subform
  • why having to respect some order of the columns in the Google Sheet: let’s allow the user to have any order

8.1. Prepare Google Sheets

8.1.1. Create Google Sheets

Create a Google Sheets

  • go to https://drive.google.com/
  • click on the New button
  • select Google Sheets
  • give a Title to the newly create Google Sheets

Now we need to have some sample data with the right structure for our Google Sheets. We don’t want to reinvent the wheel so let’s have a look at the following script:

https://github.com/alexandreelise/j4x-api-examples/blob/master/using-raw-php/smart-add-edit-to-multiple-sites-from-csv-url.php 

In particular I see the following line $csvUrl = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTlM7un4cv3t5oKQ6mymmBDrAnbpFcYLFh6KnHMC6iCE8qUJSNKJ4Vw54r4OjGNNU4DXxOuLWdtKvZ2/pub?output=csv';

Excellent, let’s copy/paste this URL in a browser

https://docs.google.com/spreadsheets/d/e/2PACX-1vTlM7un4cv3t5oKQ6mymmBDrAnbpFcYLFh6KnHMC6iCE8qUJSNKJ4Vw54r4OjGNNU4DXxOuLWdtKvZ2/pub?output=csv

This makes my browser download a CSV file. I could start from this file… but an easier way is to simply display the Google Sheet directly on the browser by dropping the ?output=csv at the end of the URL.

With other words, I open

https://docs.google.com/spreadsheets/d/e/2PACX-1vTlM7un4cv3t5oKQ6mymmBDrAnbpFcYLFh6KnHMC6iCE8qUJSNKJ4Vw54r4OjGNNU4DXxOuLWdtKvZ2/pub

Then

  • I copy/paste the content of this Sheet in my own empty Google Sheets
  • I delete the following columns if they are still there (they were actually created at some point only to make more tests with the script)
    • alias-seed
    • seed
    • article-subform-field (which is useful only if you want to import Custom Fields of type Subform which are stored in JSON format in the database)
  • for the demo here I only keep the first 3 rows of content

8.1.2. Share Google Sheets

In Google Sheets

  • select the Menu File > Share > Publish on web
  • on the popup window
    • select the Sheet containing your data
    • select CSV (and not webpage nor PDF nor whatever)
  • click on the Publish button
  • copy the URL now displayed in the popup (and paste it somewhere, we will need it later)
  • close the window

In our case, the url is

https://docs.google.com/spreadsheets/d/e/2PACX-1vT5Th1TKSKv3vZuEvTnpK_MPqhItjuyV8ierdbNQa9FV2nYWHV78Cd9IRGbP9BHIXPCmnCrs6Jm5791/pub?gid=0&single=true&output=csv

So if you want to have a preview of that directly on your browser, take the url without what follows /pub, namely

https://docs.google.com/spreadsheets/d/e/2PACX-1vT5Th1TKSKv3vZuEvTnpK_MPqhItjuyV8ierdbNQa9FV2nYWHV78Cd9IRGbP9BHIXPCmnCrs6Jm5791/pub

Screenshots (my computer is obviously is French but it is pretty straigtforward):

8.2. Basic example

8.2.1. Prepare Google Sheets script

Create your Google Sheet script on your website

Customize your Google Sheet script on your website. In your file adapt the following variables

  • $csvUrl with the url we just got from Google Sheets
  • $baseUrl. In my case
    • I replace
    • 'app-001' => 'https://app-001.example.org', by
    • 'app-001' => 'https://api.joomlacustomfields.org', (beware, without the trailing slash!)
    • I delete the lines with 'app-002' and 'app-003' because I want to import only one 1 website, not on 3 (or more) like the script allows
    • $token. In my case I replace
    • 'app-001' => 'yourapp001token', by
    • 'app-001' => 'c2hhMjU2OjI3NDpmOGMyNGIyMmI4ZTI0YzhlN2VhMGI1YmI2MTk5ODdiODViZWIwYjBhMTkwYzYxMmZkMDEzYWUxNzg5MmE2YzNm',
    • I delete the lines with 'app-002' and 'app-003' because I want to import only one 1 website, not on 3 (or more) like the script allows

8.2.2. Execute your script

Simply execute your script as already explained above.

Normally this script should not even display anything and just run in the background.

Still the script displays the “raw result”: this is clearly not necessary but allows to visually check that something indeed happened (and once I could see that some rows were not processed and could identify the cause in the Google Sheets for example).

Beware: I have already noted that Google Sheets sometimes take 1 minute or so before actually updating the CSV version (even if other users of the Google Sheet see your changes “live”). So if after executing your script you see no difference, simply wait 1 minute and relaunch it

8.2.3. Go back to or refresh the Articles Manager

When opening the Article Manager, we can indeed check that

  • new articles have been created. Tadam!

And when opening any of them, we can also see that everything is there

  • introtext of the article
  • link / description / class / caption for Intro Image and Full Image
  • urla, urlb, urlc

Note:

  • in this case, for the Intro Image and the Full Image we simply put an link to an external image in the Google Sheets
  • in real life, one would probably want
    • either to use Joomla’s Media Manager to select the images
    • or use something more powerful than Google Sheets, like AirTable which allows to put images in a “cell” (see the AirTable script which is even capable of physically copying the Airtable images locally to the website)

8.2.4. If you want to Update your articles later on

You have probably noted that the first column in the Google Sheet was id, being of course the ID for each article.

Actually we wanted one single script which works both to Create and to Update Articles.

But how to make the script “smart” enough for that?

  • if the id is 0 then the script will Create an article
  • if the idis any other number, it will Update the article having the corresponding ID

So at this point, if there is a chance that I execute again this script in the future, I don’t want to re-create the already existing articles.

Therefore, in my Google Sheet

  • I put manually the id of each already created article
  • and if I wish to create new articles next time I run the script, I can also do it by giving them the id 0
  • and so on each time I run the script

Illustration with my adapted Google Sheet where

  • I change the ID, the Title and the Alias for the first 3 articles
  • I add a new row to create a new article

And the result in the Articles Manager after running the script again:

Obviously in your Google Sheets you can change the value of any field. Example: language is set on * (meaning all languages since it is the way it is encoded in the database) but you could specify a language like fr-FR

8.2.5. Two rounds to process

The script runs two loops :

  • first it it creates all the article it is asked to create
    • but if the required alias is already taken it cannot create the article
    • then it remembers that
    • and once the first loop is finished it makes a second loop in order to create those articles, by adding a random number to the required alias (in the code see the section // Handle errors and retries)

8.2.6. Other smart features

Note that the script has also been made as “smart” as possible for different aspects. Examples:

  • if there is an error for a given row, the script will simply skip that row but go on with the next ones. Examples
    • there is a Custom Field of type Link but in the corresponding cell the User has types Text or types an email address This email address is being protected from spambots. You need JavaScript enabled to view it.but forgets to mention the protocol mailto:)
    • some required data are missing
  • as you might know, in a given category an alias can only be given once
    • so if a certain alias is already given, the script will add a random number to the proposed alias so that it can still be processed/created. Illustration:

8.3. Advanced example

The following steps are not necessary of course: it only applies if you intend to use the script for example

  • to import other native Fields like publish_uppublish_downfeatured_upfeatured_down (to find the right name for these fields, simply look at the database columns…)
  • to import Custom Fields

8.3.1. Create Custom Fields in Joomla

Let’s suppose we create a website for a Rock band. It would be much easier for the musicians to add their new concerts simply in a Google Sheets than to lean how to use a CMS.

Those concerts would then be easily imported and/or updated in the website.

So let’s create a new Category Concerts and assign to it the 3 following Custom Fields :

  • Date and Time: CF of Type Calendar with name date-and-time
  • Venue: CF of Type Text with name venue
  • Link to Event: CF of Type URL with name link-to-event

8.3.2. Adapt the Google Sheet

Since we create a new Category (“Concert” with ID 8), let us change the value of the catidcolumn from 2 to 8 in the Google Sheet.

Let us also create the following columns in the Google Sheet:

  • publish_uppublish_downfeatured_upfeatured_down
  • date-and-timevenuelink-to-event

Note: when you type date, they should have the following format: 2022-11-23 20:00:00 (and visibly it imports dates in the real TimeZone of your website, as if you were typing them from Joomla’s backend)

Here is an example with some values:

8.3.3. Adapt the script

In your script

  • add the custom fields by changing
    • $customFieldKeys = []; into
    • $customFieldKeys = ['date-and-time','venue','link-to-event'];

  • and add the following block inside the $defaultKeys array:
        'publish_up',
        'publish_down',
        'featured_up',
        'featured_down',

8.3.4. Go back to or refresh the Articles Manager

Tadam!

Our Categories have indeed been Updated:

Our Custom Fields have indeed been Updated:

Our other native Fields have indeed been Updated:

8.4. Special fields like Images or Custom Fields or Type Subform or URLs

Some special fields entails “multiple values”. So how to handle this?

Suppose you want to import an image or a Custom Field of Type Subform.

As you might be aware, in the database these fields are saved in json format.

So what should we type in our Google Sheet cell ? The easiest way is to save a real Article on the website and see in the database what the corresponding json is.

Note that we don’t need in the Google Sheet to “escape the / character”. With other words we simply type / and not \/

Example for Image (which includes the Intro Image and the Full Image, together with their ALT, Caption etc):

{"image_intro":"images/test.jpg","image_intro_alt":"","float_intro":"","image_intro_caption":"","image_fulltext":"images/test.jpg","image_fulltext_alt":"","float_fulltext":"", "image_fulltext_caption":""}

Example for a Custom Field of Type Subform having in this case 2 Custom Fields (field2 and field7) and 3 values (row0, row1, row2) – note: it also accepts HTML tags but I can’t paste them here in the presentation:

{"row0":{"field2":1410,"field1":"This an article summary 1 for this dessert. Short & Sweet!", “field7”: “What’s up Super Joomlers! Alex here…Proud to be a joomler. Nowadays focusing on Joomla! 4.x Web Services Apis.”}, “row1”:{"field2":1410,"field1":"This an article summary 2 for this dessert. Short & Sweet!", “field7”: “What’s up Super Joomlers! Alex here… Proud to be a joomler. Nowadays focusing on Joomla! 4.x Web Services Apis.”}, “row2”:{"field2":1410,"field1":"This an article summary 3 for this dessert. Short & Sweet! “, ”field7": “What’s up Super Joomlers! Alex here… Proud to be a joomler. Nowadays focusing on Joomla! 4.x Web Services Apis.”}}

Example for the urls Field (those URL A, URL B and URL C which are native in Joomla):

{"urla":"https://alexandree.io","urlatext":"Website","targeta":"","urlb":"https://github.com/alexandreelise", "urlbtext":"Github “,”targetb":"","urlc":"https://twitter.com/mralexandrelise","urlctext":"Twitter", "targetc":""}

Note: my Editor is tweaking my code snippets here. Of course all the " are regular double quotes (and not italic). And the Github icon is also added by the Editor

8.5. Fields with potentially multiple values like Custom Fields of Type List or Checkbox

Suppose you have a Custom Field of Type List or Checkbox. For those CF

  • there is typically an Option to allow or forbid Multiple Values
  • when you create your proposed options, for each of them you specify
    • a Text (ie what will be displayed in the front-end and in the back-end)
    • a Value (ie what is written in the database)

In that case

  • if don’t have Multiple Values
    • in the Google Sheet you should simply type the unique Value (example: chocolate)
  • if you have Mutiple Values
    • in the Google Sheet you should then type {"0":"fries", "1":"chocolate", "2":"beer"} if the 3 values that I want are fries, chocolate and beer
      Note: at the moment the script might throw the following message Cannot use object of type stdClass as array but apparently this does not prevent the code from executing and the content from being imported

8.6. Fields with potentially multiple values like Custom Field of Type Articles Field by Regular Labs

Articles Fields by Regular Labs is a powerful Type of Custom Field allowing to select other Articles: https://regularlabs.com/articlesfield

A good example : for a Library website you have typically have a Category “Books” and a Category “Authors”. You could then link each Book Article to one or several Author Article(s).

If you have multiple values, I have tested succesfully the following 2 methods (let’s suppose we want to link to Articles having ID 12 and 13):

  1. in the Google Sheet, type 12,13 then Synchronize. After that Edit the Article in the Backend (the 2 values do appear there but are not yet correctly saved in the database) and Save it
  2. in the Google Sheet, type directly the values in JSON-format, ie {"0":"12", "1":"13"}, then Synchronise

8.7. Performance

When I first started to play with this script I was importing 100 to 200 articles and it was going fine.

Then I decided to raise the bar for testing purposes and tried with some 2600 article.

I think that the script managed to create some 500 articles (having each 10 Custom Fields). Indeed after a few minutes the script was throwing the following Request Timeout :

This request takes too long to process, it is timed out by the server. If it should not be timed out, please contact administrator of this web site to increase 'Connection Timeout'.

So on my hosting I increased the following:

  • max_execution_time
  • max_input_time
  • default_socket_timeout
  • memory limit

This seems to have done the trick for me since now

  • the script still displays a Request Timeout after 5 minutes
  • but actually the process continues and I see that my articles are being further updated

Actually, it takes approximately 13 minutes to get all my 2600 articles updated (meaning an average of 200 articles per minute).

If I needed to import even more article, I guess that the following actions would help:

  • disabling Smart Search (to be more precise, the Smart Search Content and Content Smart Search. Maybe 1 of those 2 would be enough but I did not check their exact respective action)
  • disabling Versioning

Indeed, every Article creation or update triggers the Smart Search and the Versioning which probably take extra resources before passing onto the next Article.

8.8. Difference between using the API or using a direct SQL query in the database

One could think of importing / updating their Articles directly with a SQL query (based somehow on the same data in the Google Sheet).

Doing so would indeed directly modify the database… but it would not trigger all the Actions foreseen by Joomla (typically onContentAfterSave) such as

  • Smart Search
  • Versioning

So that if those Actions make sense for you then the API is better because it is really as if every Article would be manually edited in the backend.

But for pure performance, writing directly to the database can only be more performant.

9. Troubleshooting

If no Article is created or updated when you launch the script, here is what you can do

  1. check that everything is OK in the script (your API key, the link to the Google Sheet, the structure of the Google Sheet)
  2. check the log:
    • go to System > Global Configuration > tab Logging > enable Log Almost Everything + be sure that Log Priorities is set to all
    • relaunch the script
    • see /administrator/logs/everything.php

Note: if you want to make your life easier, install this little free plugin made by Yannick Gaultier, which allows to view/download/delete all logs files produced by Joomla directly from the backend: https://weeblr.com/joomla-seo/4logs-simple-free-logs-viewer

Here are two practical examples of issues that I faced myself.

9.1. Example 1 – some custom field is required

One day I encountered the following error message in /administrator/logs/everything.php

Fields: datetime priority clientip category message 2023-01-18T16:11:36+00:00 CRITICAL error Uncaught Throwable of type Tobscure\JsonApi\Exception\InvalidParameterException thrown with message "Field required: Event Category". Stack trace: #0 [ROOT]/libraries/src/MVC/Controller/ApiController.php(362): Joomla\CMS\MVC\Controller\ApiController->save()

Actually the Custom Field “Event Category” was required but that Custom Field was empty in my Google Sheet.

So just like you could not Save an Article via the Interface in that case, you can’t do it neither via the API…

9.2. Example 2 – conflict caused by 3rd party extension

I have already had the case that an extension (CFI plugin in my case) would prevent the API from working, which I also discovered thanks to the Log file.

My workaround was to restrict the plugin to Super Users so that it would not interfere.

9.3. Example 3 – special characters in the Google Sheet

The following one was quite tricky: somes Articles would be imported normally with the script… but some other ones would throw such an error :

Fatal error: Uncaught ValueError: array_combine(): Argument #1 ($keys) and argument #2 ($values) must have the same number of elements in /home/my-sync-file.php:151 Stack trace: #0 /home/my-sync-file.php(151): array_combine(Array, Array) #1 /home/my-sync-file.php(194): {closure}('https://docs.go...', Array, Object(Closure)) #2 {main} thrown in /home/my-sync-file.php on line 151

After some time, I realized that the users were using a kind of “curly single quote” instead of the “regular single quote”… and that “curly single quote” would be interpreted as the end of the field, causing a discrepancy :

l’oasis instead of l'oasis

The only solution to this is to make a Search a Replace for the whole Google Sheet

9.4. Example 4 – Break Line in the Google Sheet

When you type text in some cell of a Google Sheet, if you want to create a new line within that given cell, you can simply press CTRL+ENTER.

Apparently (I should test further to be 100% sure), having this at the end of a Field like Title which is pure text would cause the import/synchronization of the corresponding Article to fail.

For Custom Fields of Type Editor, this would be no problem – I have tested it – although you would loose the “break line” in Joomla, having then one unique long paragraph in the Article (want a solution for that? see the tip in the next section).

10. Final tip

One final tip if you want to have multiple paragraphs in the cells of your Google Sheet :

If you want to replace automatically all the “Break Lines” that users have typed in the cells of your Google Sheet by proper HTML tags like <p> and </p>, you can simply use a Regular Expression (REGEX) like the following one, the “Bread Line” being identified by \n :

=if(A1="";"";"<p>"&REGEXREPLACE(A1;"\n";"</p><p>")&"</p>")

 

 
1
Style your Joomla website: Articles - Newsflash
Meet a Joomla User Group: Maastricht, the Netherla...
 

Comments

Already Registered? Login Here
No comments made yet. Be the first to submit a comment

By accepting you will be accessing a service provided by a third-party external to https://magazine.joomla.org/