Simon Frankau's blog

On the free idempotent monoids generated by a finite set of letters

Off the back of a Mastodon conversation that started with enumerating the elements of a two element idempotent rig (https://github.com/simon-frankau/two-generator-idempotent-rigs)), I ended up trying to work out how to enumerate the elements of the idempotent monoid generated by three letters. Basically, this is the set of all words divided into equivalence classes over repeated substrings - so "abcXYZXYZdef" and "abcXYZdef" are the same element.

Surprisingly despite the fact that there are infinitely many square-free words built from three letters - that is, words witout repeating substrings in them, there are only a finite number of elements to the monoid. It turns out that for all sufficiently long square-free words you can introduce a repeat, take out other repeats, and end up with a shorter word!

I coded up a brute-force tool to find all the elements of the monoid generated from 3 elements, at `https://github.com/simon-frankau/monoid-gen, but having read Chapter 2 of Jean Berstel and Christophe Reuteneur, Square-free words and idempotent semigroups, in Combinatorics on Words, ed. M. Lothaire, Addison-Wesley, Reading, Massachusetts, 1983, courtesy of this post, I realise it's very much simpler than that. Sufficiently so that I'll give an intuitive proof sketch here.

Notation-wise, I'll use upper-case letters for strings, lower-case letters for letters. Otherwise, it's just ASCII. My blog software is really dumb.

First, given a string AB, where B contains a subset of the letters in A, we can find a string that, massively abusing notationg, I'll call B^-1 such that AB B^-1 = A. We can prove this by going a letter at a time. Say AB = AB'x (that is, the last letter of "B" is "x"). "A" contains an "x", let's say A = LxR. Then AB RB' = LxRB'x RB' = LxRB' = AB' - we have found a string to append to "remove" a letter. Do this repeatedly, and we can strip off any word, as long as the letters involved appear again sometwhere earlier in the word.

By symmetry, we can find strings to prefix to "remove" prefixes.

Given this, we can now prove that a string LMR, where L and R are the shortest left and right substrings that use all the letters in LMR, can be reduced to LR, and hence we can break all the strings down into equivalence classes based on L and R (I'm deliberately ignoring the cases where L and R overlap for simplicity - the original proof handles this fine).

Roughly, we can find "R^-1" and "(LM)^-1" (as described above) such that LMR = LR R^-1 MR = LR LR R^-1 MR = LR LMR = L (LM)^-1 LMR LMR = L (LM^-1) LMR = LR.

This means, that rather than the exhaustive enumeration I did, I could simply generate all the left and right monoids on two-letter words, using two of the three letters, and add the third letter to get the left and right ends of the word, eliminate any overlap, and have an element. For example, choose "ab" on the left, "cbc" on the right. The left and right parts are "abc" and "acbc" respectively, there are no overlaps, so "abcacbc" is an element.

Posted 2022-12-30.

Full index