Don’t wash a cat, or your keyboard, either: I will never EVER forget this sysadmin lesson.

Mohammad R. Tayyebi
4 min read · Dec 7, 2023

I really admire those tidy system administrators who don’t take their executables home, and I try to be one of them. So I do obsessive stuff, such as routine checkups of directories and configurations, cleaning my desk every day, and washing my keyboard.

Keyboard pieces in a liquid cleaner

So I wash my keyboard (really, I do) every couple of months, and I did so last week. The steps are simple: pull the keys off, put them in a jar of cleaner, let them dry, and put them back in their places. It would be nice to bury all those pieces in rice to speed up the drying. But believe me, there are still risks; not for your beloved keyboard, but for the servers you administer!

What happened to me?

I was configuring a safe directory for CI/CD work so that it would be editable by the default user, ubuntu. I had to enter a command to change the ownership of the files and directories inside a project folder. Everybody knows how to do that, yeah? Simply enter sudo chown usr:grp ../dir and the machine immediately starts the process, as long as the user issuing the command is a sudoer. Linux is a suicide warrior. By mistake, I entered sudo chown usr:grp / instead of sudo chown usr:grp *! Why? I was looking at my keyboard, not the monitor, at the time… and something was wrong with it: the number pad keys were physically in the wrong order. Using the number pad was unusual for me in the first place; I normally use Shift + 8 to type an asterisk.
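A one-character slip like this can be caught before chown ever runs. Below is a minimal sketch, assuming bash and GNU coreutils; guarded_chown is a hypothetical helper name, not an existing tool. It resolves every target first and refuses to continue if any of them turns out to be the filesystem root. GNU chown also has a --preserve-root option, which makes a recursive run fail on /.

```shell
#!/usr/bin/env bash
# guarded_chown is a hypothetical wrapper (an assumption, not a standard tool).
# Usage: guarded_chown OWNER[:GROUP] PATH...
guarded_chown() {
    local owner="$1"
    shift
    local target resolved
    for target in "$@"; do
        # realpath -m resolves symlinks and ".." even if the path does not exist
        resolved="$(realpath -m -- "$target")" || return 1
        if [ "$resolved" = "/" ]; then
            echo "refusing to chown '$target': it resolves to /" >&2
            return 1
        fi
    done
    # Every target was checked, so hand over to the real chown
    chown -- "$owner" "$@"
}
```

With this in place, sudo-style usage such as guarded_chown usr:grp / stops with an error, while guarded_chown usr:grp * behaves like plain chown as long as none of the expanded names resolves to the root.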

I pressed Ctrl + C several times within a second, but that was already too late on a machine with forty-odd cores writing to NVMe disks.

I was in a mess. My user account was useless. sudo was not working due to insufficient permissions.

But there was still some hope. Business files were stored on separate RAID partitions on other SSD drives, which are mounted at system startup. So in the worst case, I would have to reinstall the OS from the images stored on the network disk that comes with dedicated servers purchased from OVH Data Centers.

KVM on OVH Bare Metal Control Panel

The first thing I did was call my boss and ask for privileges to access the cloud control panel. He took a responsible approach. I also asked my colleagues for DND (do not disturb), and the rescue mission began.

Up-time monitoring

I was forced into unplanned downtime in the middle of the day. No failover solution was available, since this machine was a single point of failure for the object storage service.

Thankfully, OVH gives you a tiny zero-config disk, which is really handy when the default rescue mode is also unavailable and fails with a never-ending error: journal corrupted or uncleanly shut down, renaming and replacing. To get it up and running, you simply change the boot order in the user dashboard, and boom!

I mounted the primary disk, restored the ownership, and rebooted the machine. It was online less than 5 minutes after boot. Sadly, that still meant about 30 minutes of downtime. I am still monitoring the machine for any potential misconfiguration.

But I will never forget my lessons:

  • Double-check everything before sending it to the terminal.
  • Test everything before production, even your keyboard.
  • Never accept production server tasks when you are tired or not mentally prepared.
  • If you are a boss, you can always find time for nagging later. An incident is not a good moment for it. What about my boss? He said “Thanks”. I appreciate that.
  • There are scripts out there that can prevent human errors. They do this simply by asking for a confirmation before running commands that contain wildcards or modify system files. Use them.
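The last lesson can be sketched in a few lines of bash. This is only an illustration, assuming bash and GNU coreutils; confirm_run, is_protected and the PROTECTED_PATHS list are hypothetical names and choices made for this example. A wrapper function never sees the wildcard itself, because the shell expands it first, so the sketch checks where each expanded argument points instead.

```shell
#!/usr/bin/env bash
# Hypothetical list of locations treated as "system" paths (an assumption).
PROTECTED_PATHS=(/ /bin /boot /etc /lib /sbin /usr /var)

# Succeed if the argument resolves to a protected path.
is_protected() {
    local resolved p
    resolved="$(realpath -m -- "$1")" || return 1
    for p in "${PROTECTED_PATHS[@]}"; do
        if [ "$resolved" = "$p" ]; then return 0; fi
        # "/" only matches itself; other entries also match anything beneath them
        if [ "$p" != "/" ] && [ "${resolved#"$p"/}" != "$resolved" ]; then return 0; fi
    done
    return 1
}

# Usage: confirm_run COMMAND [ARGS...]
# Asks for confirmation on stdin if any non-option argument is protected.
confirm_run() {
    local arg answer
    for arg in "${@:2}"; do
        case "$arg" in -*) continue ;; esac   # skip options such as -R
        if is_protected "$arg"; then
            printf 'About to run: %s\n%s is a system path. Continue? [y/N] ' "$*" "$arg" >&2
            read -r answer
            if [ "$answer" != "y" ]; then
                echo "aborted" >&2
                return 1
            fi
            break   # one confirmation is enough
        fi
    done
    "$@"
}
```

For example, confirm_run chown usr:grp / would stop and wait for a "y" before doing anything. Relative arguments are resolved against the current directory, so running it from inside /etc prompts as well, which is probably what you want.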

UPDATE 1:

These commands come in handy in this situation:

# return ownership
chown root:root /usr/bin/sudo
# fix permissions
chmod 4755 /usr/bin/sudo
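
When the broken system is mounted from a rescue disk, the same two commands need the mount point as a prefix, and it helps to check the result afterwards. Here is a rough sketch, assuming bash and GNU coreutils; the function names and the /mnt mount point in the usage note are hypothetical. The order matters: on Linux, changing the owner of an executable can clear its setuid bit, so chmod comes second.

```shell
#!/usr/bin/env bash
# Print "owner:group mode" for a file; a healthy sudo shows "root:root 4755".
owner_and_mode() {
    stat -c '%U:%G %a' -- "$1"
}

# Usage: repair_sudo [ROOT]
# ROOT is where the damaged system is mounted (hypothetical, e.g. /mnt).
# Leave it empty to work on the running system. Must be run as root.
repair_sudo() {
    local root="${1:-}"
    local sudo_bin="$root/usr/bin/sudo"
    # Ownership first, then the setuid bit, then show the result
    chown root:root -- "$sudo_bin" &&
        chmod 4755 -- "$sudo_bin" &&
        owner_and_mode "$sudo_bin"
}
```

From the rescue system, something like repair_sudo /mnt would then fix the binary on the mounted disk and print its owner and mode for a quick visual check.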
