Who to blame when a Joomla release has a bug
I was privileged, and I mean privileged, to watch the release of Joomla 4.2.7.
It went wrong: there was a bug, and according to some on social media the sky was about to fall in. But what unfolded was impressive, and it makes me feel that Joomla is in good hands.
Would it not be better to have perfect releases every time? Well, yes, but that's unlikely to happen.
Although rather lampooned at the time, Donald Rumsfeld’s famous quote is relevant when writing code. He said that there are:
1. Known unknowns (expected or foreseeable conditions), which can reasonably be anticipated but not quantified based on past experience, as exemplified by case histories; and
2. Unknown unknowns (unexpected or unforeseeable conditions), which pose a potentially greater risk simply because they cannot be anticipated based on past experience or investigation.
It's often not the piece of code you change that is the issue but the knock-on effects that it can have when that code interacts with other areas of code.
So what actually happened? What was the bug?
It was a comma, a trailing comma. Trailing commas are totally fine in PHP 8 and really useful in function calls and arrays, but in function calls they were not allowed in older, no-longer-secure versions of PHP and throw an error there. Unfortunately, one slipped in: most coders are now used to seeing trailing commas, but Joomla 4 still supports PHP 7.2.5.
And there lay the issue: trailing commas in function calls are accepted from PHP 7.3 onwards, but PHP 7.2.5 throws an error.
So we had a subset of users running an out-of-date version of PHP (PHP 7.2 stopped being supported on the 30th of November 2020, over two years ago: https://en.wikipedia.org/wiki/PHP#Release_history).
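To make the failure concrete: PHP 7.3 added trailing commas in function calls, so a line such as `foo($a, $b,);` runs on 7.3 and later but is a syntax error on 7.2 (trailing commas in parameter lists, as in `function f($a, $b,)`, only arrived in PHP 8.0). The dependable check is to lint the code with the oldest PHP version you still support, for example `php -l` under PHP 7.2. As a rough extra safety net, the Python sketch below flags any `)` that directly follows a comma, skipping strings, comments and `array(...)` literals, where a trailing comma has always been legal. It is a hypothetical helper, not a Joomla tool, and a heuristic rather than a parser: heredocs are not understood, and every `#` is treated as a comment, PHP 8 attributes included.

```python
# Calls where a trailing comma has always been legal in PHP, so they are
# not flagged. (Assumption: an allow-list of just array(...) is enough here.)
ALWAYS_ALLOWED = {"array"}


def word_before(source: str, i: int) -> str:
    """Return the identifier (lower-cased) just before position i, if any."""
    j = i - 1
    while j >= 0 and source[j].isspace():
        j -= 1
    k = j
    while k >= 0 and (source[k].isalnum() or source[k] == "_"):
        k -= 1
    return source[k + 1 : j + 1].lower()


def find_trailing_call_commas(source: str) -> list[int]:
    """Return 1-based line numbers of ')' that directly follow a comma.

    A rough heuristic, not a PHP parser: it skips '...' and "..." strings
    and //, # and /* */ comments, but does not understand heredoc/nowdoc.
    """
    hits: list[int] = []
    openers: list[str] = []  # identifier before each currently open '('
    last_sig = ""            # last character outside strings, comments, whitespace
    line = 1
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "\n":
            line += 1
            i += 1
        elif ch.isspace():
            i += 1
        elif source.startswith("//", i) or ch == "#":
            # Line comment: jump to the newline, which is counted next loop.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif source.startswith("/*", i):
            # Block comment: skip past the closing */ and count its newlines.
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            line += source.count("\n", i, end)
            i = end
        elif ch in "'\"":
            # Quoted string: skip to the matching quote, stepping over escapes.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            line += source.count("\n", i, min(j, n))
            i = j + 1
            last_sig = ch
        else:
            if ch == "(":
                openers.append(word_before(source, i))
            elif ch == ")":
                opener = openers.pop() if openers else ""
                if last_sig == "," and opener not in ALWAYS_ALLOWED:
                    hits.append(line)
            last_sig = ch
            i += 1
    return hits
```

Run over `<?php foo($a, $b,);` it reports line 1, while `<?php $x = array(1, 2,);` produces no hits, because that form has always been valid PHP.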
How bad was the bug?
That was the question we needed answering and answering as quickly and accurately as possible.
In days gone by, we would release a Joomla version and I would head to my local pub to celebrate with a few halves of beer. But I have learnt to postpone any celebrations for a while: several times I had barely wet my top lip with a dark stout when my phone lit up (my local is a phones-on-silent pub, so any calls have to be taken in the garden, come rain or shine).
Making decisions and trying to get an “oops, it went wrong” message out is not really possible on the phone, so if this happens, it's an evening in with the laptop.
Finding out how bad a bug is needs boots on the ground. After every release there is now a post-release group monitoring the airwaves, looking out for early reports of issues on the forums, Twitter, Facebook and community channels.
We have members of Marketing and Production, the release managers, and people involved in teams that might be affected by the release.
This was put in place by Production lead Benjamin Trenkle and the CMS release team under the direction of Sigrid Gramlinger. The team constantly refines the process to make the outcome better for the community.
“Okay, Houston...we've had a problem here”
When news of an issue first came in, it was confusing, and the size and severity of what had happened were unclear. So here's what happens in a situation like this.
We need facts, and on the Twittersphere, there is more heat than light. So the first thing we need to do is recreate the error.
Several people are tasked with rebuilding sites using the few facts we have at that point, to simulate the issues people are reporting.
It soon transpires the issues occur when people update Joomla 4 sites which are on PHP 7.2.
That in itself makes it difficult to test on some of our servers as we don't all have such out-of-date versions.
While that's all going on, we reach out to those with systems or large numbers of client sites to try to work out how many installations are likely to be on that version of PHP, and so how many people are affected, gleaning any insights that might help us assess the numbers.
Next, we need to work out the severity. Does it leave the site broken, partially updated, or vulnerable? We now have the ability to test, and Sigrid is busy finding the answer: the issue does NOT seem to affect the update itself, but it leaves the action log showing an error when you open it in the administrator area.
So it's not a vulnerability, no data is exposed, the front end of the site is intact and complete, and the purpose of the update, a security fix, works. The only issue it introduces affects administrators viewing the action logs.
What's the cure? Can people self-medicate?
The next step was to decide on possible fixes. Should a new update be built just to fix an issue that wouldn't exist on an in-date PHP version?
For anyone in the small group affected, there were two ways to resolve it: remove the comma from a single file, or upgrade PHP. So pushing a new version of Joomla 4 to everyone was not really warranted.
While all the assessing and testing was going on, a few people on the Twittersphere and in some online posts were urging others NOT to apply the security fix, and claiming, quite incorrectly and without any evidence, that a release of Joomla 4.2.8 was imminent. It was not. Worse than being wrong, it was uninformed and bad advice, because it discouraged people from updating their Joomla 4 sites to fix two security issues that were present.
But how do you reach a decision when different people are saying different things, some urging action based on emotional and not necessarily well-thought-through reasoning?
Which Pill, Blue or Red? Enter: the Matrix
So to help make this decision we have a matrix: a system that takes in values and gives a result.
Why is this of use? It's the distillation of many earlier decisions. Hindsight is a wonderful thing, and hindsight comes when you have more information and fewer unknowns. It's that additional information which helps to make the right decision.
In building the matrix, Benjamin and Sigrid looked back at past decisions. Some seemed right at the time but history proved otherwise; others were spot on but were hard to reach at the time.
By answering a few questions, such as the rough percentage of sites affected, the severity, and whether sites were left broken or data was lost, it's possible to get a reasoned outcome. President Robert Deutz was charged with that job, and as soon as the figures were added, the matrix confirmed what we had suspected: this was most definitely not the sort of situation that called for a new release.
The matrix showed that the outcome should be the following:
- A document noting the ways to fix the issue
- Update the original post to highlight the facts and point to the document
- Push the answer out as widely as possible to the community and all Joomla site owners
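The real matrix's criteria and weights aren't spelled out here, so the following is only a hypothetical sketch of its shape: a rule-based decision table that takes the kinds of answers mentioned above (rough percentage affected, vulnerability, data loss, broken sites, whether users can fix it themselves) and returns either a new release or the three communication steps. The field names, the rules and the `WIDE_IMPACT_PERCENT` threshold are all illustrative assumptions, not Joomla's actual values.

```python
from dataclasses import dataclass

# Hypothetical placeholder threshold; not a value from the real matrix.
WIDE_IMPACT_PERCENT = 10.0


@dataclass
class Assessment:
    """Answers to the kind of questions the matrix asks (illustrative fields)."""
    percent_sites_affected: float  # rough estimate, 0-100
    vulnerable: bool               # does the bug open a security hole?
    data_lost: bool                # does it lose or expose data?
    sites_broken: bool             # is the site or the update left unusable?
    self_fix_available: bool       # e.g. edit one file or upgrade PHP


def decide(a: Assessment) -> list[str]:
    """Return the recommended actions for an assessed post-release bug."""
    # Unsafe, data-losing or broken sites justify a new release outright.
    if a.vulnerable or a.data_lost or a.sites_broken:
        return ["build a new release"]
    # Widespread impact justifies one too, unless users can fix it themselves.
    if a.percent_sites_affected >= WIDE_IMPACT_PERCENT and not a.self_fix_available:
        return ["build a new release"]
    # Otherwise, communicate a documented fix rather than ship a new version.
    return [
        "document the ways to fix the issue",
        "update the original post to point to the document",
        "push the message out to the community and site owners",
    ]


# Inputs loosely modelled on the 4.2.7 case; the percentage is a made-up placeholder.
case = Assessment(percent_sites_affected=1.0, vulnerable=False, data_lost=False,
                  sites_broken=False, self_fix_available=True)
for action in decide(case):
    print("-", action)
```

Encoding the rules this way mirrors the point of the matrix: the same questions are asked every time, so the answer rests on past experience rather than on whoever is loudest that evening.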
Louise Hawkins, marketing lead, had joined the group and was a big help in getting the wording together and setting up the message to the wider community, all under the guiding hand of Franciska Perisa, the release manager, who was checking the message for accuracy and writing the documentation for users to follow.
It felt smooth, as everyone stepped up to do the jobs they do best and just knuckled down to get the work done. I think it was around 11 pm UK time when I finally switched the laptop off, feeling that we had done the best job we could.
So who is to blame?
While this was going on, there was internal and external criticism. That's fine: a mistake happened, and it's right that the person who caused the issue, or the team that failed to spot it, is put on the spot. Those who want to throw mud get their chance to step forward and take the moral high ground; after all, it wasn't their oversight.
Well, if the last paragraph doesn't feel like the best way to do things, then you and I are on the same page. If you are one of those who was slinging the mud, read on. Something came to mind while I was watching the contrast between those who were saying the sky was falling in and a new Joomla version was about to be released, and those who were actually researching, checking, simulating, fixing and writing the documentation.
Saving Ford, social maturity, working as a team
I once watched a documentary about Ford and how the car company had been dying for some years. I'm remembering it from a distance of at least a decade but the gist is what follows.
As things got tougher for Ford, blame grew. Problems needed fixing and the best way to fix them was to hold people responsible for their actions, hold the problems up for inspection and the person responsible to account.
Alan Mulally, who had a background in the aviation industry, was brought in. His first Thursday meeting with all the heads of departments was smooth and easy: they had no problems. The system they used was a panel from each city manager, showing:
- green for no problems
- yellow for a problem with a fix in progress
- red for a problem that needs help to fix
All was green, so after a chat with each, the calls ended.
The next week was the same. Then, after a few of these Thursday morning meetings, he addressed them all and asked: “If there are no problems, why are we going bust, missing targets and not selling cars?”
The next Thursday, one of the cities was showing red in a sea of green. The poor manager who posted it was braced for criticism, even ridicule. Instead, the manager was praised, praised for finding an issue, and everyone was asked to help fix the supply issue, working as a team.
The next Thursday there was a sea of red!
That was the point at which Ford started to turn things around: it became acceptable to have issues, and the focus moved from blame to a much more mature approach, concentrating on the problems instead of the people who find, report or even cause them.
Ford then saw one of the most dramatic turnarounds in business history, going from imminent bankruptcy and huge losses to a profit in just a few years.
What's this to do with bugs?
Simple: a volunteer organisation needs to work as a team even more than an organisation like Ford does. One of Alan Mulally's four principles is that teams should have fun, and work should be fun. Having people sniping from the sidelines and pointing fingers at those who are doing the work, rather than rolling up their sleeves and getting involved, demoralises the people in the team and puts off potential new recruits.
It's taking all the fun out of it.
It wastes time with bickering and it distracts from what's needed.
If the people inside and outside the teams really want Joomla to succeed and to be the best it can be, then when an issue happens, the mature and helpful thing to do is to report it with as much information as possible, but hold back on the grandstanding and instant judicial proclamations. Focus on the problem and not the people, because focusing on the people and pointing fingers makes you part of the problem.
Negative comments and unfriendly attitudes hurt retention and demoralise people, which means fewer eyes on the code, fewer tests, and fewer bugs caught before a release goes live.
The focused and professional way the post-release team dealt with the issue, together with hindsight, the passing of time and the fact that the sky didn't fall in, shows which approach is right: the one that helps to stop bugs and to fix them quickly and fully.
I urge you to watch this interview with Alan Mulally
And listen to the quote from his father.
“It's nice to be important but it's more important to be nice”
It's these principles that helped him transform companies; they are not weak but strong, and they will have an effect.
It would be amazing if we could embrace such ideals in our fight against bugs and in our journey to make Joomla even better than it is.
Thank you for taking the time to analyse this situation in detail.
Thank you to all those who took the time to volunteer
Yes to more fun :-) and fewer disputes :-(
This does not prevent critical but constructive exchanges!
Really feel for the people who got this problem, and a big thanks to the Joomla Team for giving us the BEST CMS out there!
A few lessons here, whatever CMS you use:
1. NEVER UPDATE ANYTHING ON A LIVE WEBSITE without first doing a backup (Free Akeeba Backup)
— If something is going wrong, you can restore your website fast.
2. ALWAYS TEST ANY UPDATE (even the smallest plugin update) on a local or same-server copy of the live website with a similar or identical set-up (PHP etc.) BEFORE UPDATING LIVE.
— This gives an extra level of security.
3. You need to stay up to date. I have a Joomla 1.5 website out there; it is now a fragile brick that will soon disintegrate to nothing. When software is obsolete, you are sadly on your own.
Thank you for reading it and responding. Yes, exactly: we are definitely not saying that people should refrain from being critical or having opposing views; it's the manner in which they are delivered. Is the intention to hurt or upset? Was there a way the point could have been made that does not belittle the person on the receiving end? I would go so far as to say that, as a community, we should try to be friendly in our responses; we don't have to like someone to be courteous and friendly.
Good advice. Always a backup.
And since you mention testing: Joomla 4.3 Beta 3 came out just this afternoon, and we could really do with testers.