Shortening French Words Automatically

Raphaël Léger
5 min read · Sep 27, 2021
This French woman has a long baguette, a long skirt, long hair and… she even dares to pronounce a long word?!

Wait what? Why shorten French words?

A couple of decades ago, before messaging apps were widespread, communicating on a mobile phone meant writing the shortest texts possible. Yep, back in the day, phone keyboards were a real pain to use and you were charged according to the length of your messages.

For the sake of this article, let’s pretend it’s the early 2000s and we want the shortest French words possible.

SMS language to the rescue!

You have heard about SMS language, right? Short Message Service language. It seems like a convenient way to shorten words.

In the same way that “you are” can be written “u r” in English, a French sentence such as “quoi de neuf ?” (what’s up?) can be written “koi 2 9 ?”, given that “de” and “2” (deux) sound nearly the same and that “neuf” is the word for both “new” and the number 9.

The issue is that Google Translate has no option to translate French to SMS language 😔 Hence the idea of getting creative and offering this possibility myself.

Lucien, a French man who enjoys writing “nptk” instead of “n’importe quoi” in order to look cool.

The online translator: French to SMS

Long story short, I did it! I created a small website that enables anyone to translate French to SMS language! 🥳

The website is available right there and you can test it out and play with it as much as you want.

The ultimate tool I created in 2008 😎

Now let us dive in under the hood, shall we?

How I built the Algorithm

At first it seemed to me that there were rules governing the SMS language and that the process of making words smaller but still readable could be made automatic by finding easy steps. As I did not find a place where such steps were explicitly stated, I kind of wrote my own.

Step 1: Replace specific words entirely

There are some words for which there exists a globally accepted translation.
I wrote down a list of those words and what they needed to be replaced with.

French people usually understand “lu bg” (hello handsome) as those replacements are commonly used.
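As a rough illustration of this first step, here is a minimal sketch of a whole-word lookup. The function name `replaceWholeWords` and the four sample entries are assumptions made for this example (the pairs are taken from the “koi 2 9” and “lu” examples in this article); the actual list and implementation in the package may look different.

```javascript
// Hypothetical sample entries; the real list is much longer.
const WORD_REPLACEMENTS = new Map([
  ['salut', 'lu'],
  ['quoi', 'koi'],
  ['de', '2'],
  ['neuf', '9'],
]);

function replaceWholeWords(text) {
  // \p{L}+ matches runs of letters, accented ones included (requires the "u" flag).
  // Matching whole runs of letters means "de" inside "dedans" is left alone.
  return text.replace(/\p{L}+/gu, (word) => {
    const replacement = WORD_REPLACEMENTS.get(word.toLowerCase());
    return replacement ?? word;
  });
}

console.log(replaceWholeWords('quoi de neuf ?')); // koi 2 9 ?
```

Only complete words are looked up, so a short entry such as “de” never touches longer words that merely contain it.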

Step 2: Replace specific sounds

Some French words use several characters to produce a sound that could be written with fewer characters.
I built a second list containing the series of characters that should be replaced by a shorter series of characters to get the same sound.

“dauphin” can be shortened to “dofin” and still sound the same.
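The sound step could be sketched along these lines. The list below is a hypothetical three-entry sample chosen to reproduce the “dauphin” example, and `replaceSounds` is a name invented for this sketch, not necessarily what the package uses.

```javascript
// Hypothetical sample entries, ordered so that longer sequences come first
// ("eau" must be handled before "au").
const SOUND_REPLACEMENTS = [
  ['eau', 'o'],
  ['au', 'o'],
  ['ph', 'f'],
];

function replaceSounds(word) {
  let result = word;
  for (const [from, to] of SOUND_REPLACEMENTS) {
    // split/join replaces every occurrence, anywhere in the word.
    result = result.split(from).join(to);
  }
  return result;
}

console.log(replaceSounds('dauphin')); // dofin
```

Unlike the whole-word step, these replacements apply anywhere inside a word, which is why the order of the list matters.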

Step 3: Replace specific start and specific end of words

In French, a great number of characters are not pronounced, especially at the start and at the end of words.
I grew a third and a fourth list containing series of characters that could be replaced by shorter ones specifically at the start of words or at the end of words.

Notice how “ger” does get replaced in “berger”, but not in “fromagerie” as things to replace are looked up at the start of words or at the end of them only.
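Here is a minimal sketch of how start-only and end-only replacements might be applied. Both sample entries are assumptions: this text does not say what “ger” is replaced with, so the “gé” below is purely illustrative, and so are the leading “h” rule and the name `replaceEdges`.

```javascript
// Hypothetical entries, for illustration only.
const START_REPLACEMENTS = [['h', '']];   // assumed rule: drop a leading "h"
const END_REPLACEMENTS = [['ger', 'gé']]; // assumed replacement for a final "ger"

function replaceEdges(word) {
  let result = word;
  for (const [from, to] of START_REPLACEMENTS) {
    if (result.startsWith(from)) {
      result = to + result.slice(from.length);
      break; // at most one replacement at the start
    }
  }
  for (const [from, to] of END_REPLACEMENTS) {
    if (result.endsWith(from)) {
      result = result.slice(0, result.length - from.length) + to;
      break; // at most one replacement at the end
    }
  }
  return result;
}

console.log(replaceEdges('berger'));     // bergé
console.log(replaceEdges('fromagerie')); // fromagerie
```

Because the check uses `startsWith` and `endsWith`, the “ger” in the middle of “fromagerie” is never considered.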

Minding the exceptions!

So yasss, I had defined three simple steps. Though, something to consider about the French language is that it’s riddled with spelling and pronunciation exceptions… a bunch of them.

This is why I tweaked the algorithm so that it checks for exceptions upon performing any replacement:

Replacing every “si” with “6” as they produce a similar sound, but not the “si” in “sin”, as “6n” and “sin” are pronounced differently.
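One possible way to express such an exception check is sketched below. The rule shape (`from`, `to`, `exceptions`) and the function names are assumptions made for this example; only the “si” / “sin” pair comes from the article.

```javascript
// Hypothetical rule shape: replace `from` with `to` unless the match
// is part of one of the `exceptions` sequences.
const SI_RULE = { from: 'si', to: '6', exceptions: ['sin'] };

function isInsideException(text, index, rule) {
  return rule.exceptions.some((exception) => {
    // Simplification: only the first occurrence of `from` in the exception is considered.
    const offset = exception.indexOf(rule.from);
    if (offset === -1) return false;
    const start = index - offset;
    return start >= 0 && text.startsWith(exception, start);
  });
}

function replaceWithExceptions(text, rule) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    if (text.startsWith(rule.from, i) && !isInsideException(text, i, rule)) {
      result += rule.to;
      i += rule.from.length;
    } else {
      result += text[i];
      i += 1;
    }
  }
  return result;
}

console.log(replaceWithExceptions('aussi', SI_RULE));  // aus6
console.log(replaceWithExceptions('cousin', SI_RULE)); // cousin
```

Each match is checked individually, so a word can have one “si” replaced and another one protected by an exception.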

Final algorithm

And there with only three steps and minding exceptions, I had my algorithm set up! I made a simple program that runs replacements one by one.

To see it in action and better understand it, let’s play it out over a simple sentence:

And voilà! The algorithm applied 10 replacements to the provided sentence, reducing its length from 45 characters to 29 characters. But most importantly, the output is cool and readable by a French human being!
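To give an idea of how the steps can be chained, here is a small self-contained sketch. It is not the actual package code: the lists are tiny hypothetical samples, the names `translateWord`, `translate` and `reductionPercent` are invented for this example, and the input sentence is not the one used above. It also lowercases everything, which is a simplification.

```javascript
// Tiny hypothetical sample lists.
const WORDS = new Map([['salut', 'lu'], ['quoi', 'koi'], ['de', '2'], ['neuf', '9']]);
const SOUNDS = [['au', 'o'], ['ph', 'f']];
const ENDS = [['ger', 'gé']]; // assumed replacement, for illustration only

function translateWord(word) {
  const lower = word.toLowerCase();
  // Step 1: whole-word replacement wins over everything else.
  if (WORDS.has(lower)) return WORDS.get(lower);
  // Step 2: sound replacements, anywhere in the word.
  let result = lower;
  for (const [from, to] of SOUNDS) result = result.split(from).join(to);
  // Step 3: end-of-word replacements (start-of-word ones would work the same way).
  for (const [from, to] of ENDS) {
    if (result.endsWith(from)) {
      result = result.slice(0, result.length - from.length) + to;
      break;
    }
  }
  return result;
}

function translate(text) {
  // Translate each run of letters and leave spaces and punctuation untouched.
  return text.replace(/\p{L}+/gu, (word) => translateWord(word));
}

function reductionPercent(before, after) {
  return Math.round(((before - after) / before) * 100);
}

const input = 'Salut, quoi de neuf dauphin ?';
const output = translate(input);
console.log(output); // lu, koi 2 9 dofin ?
console.log(`${reductionPercent(input.length, output.length)}% shorter`);
```

With the figures quoted above, going from 45 to 29 characters, `reductionPercent(45, 29)` gives 36, so the sentence ends up roughly a third shorter.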

Statistics

As of the time of writing this article, the algorithm performs pretty well with the following settings:

  • it knows 322 entire words it has to replace
  • it knows 58 sounds it has to replace
  • it knows 32 series of characters it has to replace at the start of words
  • it knows 131 series of characters it has to replace at the end of words
  • it minds 29 exceptions

The lists of replacements the algorithm “knows” were built slowly over time by running the algorithm over more than 100,000 user inputs again and again until the results were satisfying overall. It is still more experimental than perfect and can always be improved.

Is there an actual useful use-case for this tool?

The only use-case I found was to spice up a chat-based game I worked on for a few years: Loups-garous-en-ligne.com


Players were able to join special chat rooms where any text they sent was first converted to SMS language before being displayed to the other players, bringing an unusual bit of fun to the game.

I would be happy to know if you have any ideas of other use-cases for such a tool. Feel free to leave a comment with those ideas! 🙂

Source code

The 2008 algorithm was originally written in PHP. A few days ago I decided to rewrite it in modern, plain JavaScript / Node.js.

Are you a developer? The package and its documentation are available here:
https://github.com/raphael-leger/french-to-sms

If you found this article interesting, you can clap & subscribe! 🙂
This helps me know that taking the time to write was worth it!
