# Categorical translation for mathematical language

## Part 1

###### 7th November, 2019

- Implementation before theory
- The dictionary and its structure
- The main theoretical input
- The categorical setting
- The underlying type theory
- Languages as Grothendieck constructions
- Sentences as trees
- Compositional translation
- Nonsensical sentences
- What’s next?
- Assorted references

Being unable to ever properly finish any project that I start, but loving starting new projects, has made getting around to typing up this blog post quite an effort. Not only that, but it’s also unsatisfying to me how much I’ve failed to understand the categorical framework behind my translation project, so it’s mildly intimidating (to say the least) to present this stuff to the whole internet (although, in actuality, it’s really just to the one (mabye two) reader(s) of this blog), but I’m doing so in the hopes that somebody who actually knows about this sort of applied category theory can help me get somewhat closer to a solid understanding.

The idea for this project came to me whilst working on the translation of EGA that
Ryan Keleti and I have been chipping away at for the past however-many
months. Wouldn’t it be nice, I thought to myself, if there were some
universal language for doing mathematics? It’s true that most
mathematicians can (and have to) read (and write) papers in English, but
this doesn’t mean that it’s their preferred language. I mean, I can read
and write papers in French^{1}, but trying to
*understand* maths in French does create a whole new hurdle. Even
for adept foreign-language speakers, there is some extra mental
processing, and any hurdle, no matter how small, can be the straw that
breaks the camel’s back when it comes to understanding mathematics — a
rather difficult subject at times, to say the least.

Something particularly nice about the EGA (and, I suppose,
Grothendieck’s style in general) is the use of formulaic sentences.
Often, good mathematical writing is good because it’s enjoyable to read,
maybe somewhat whimsical at times, and not simply an arid recapitulation
of a dictionary entry. But there is a (very important) place for more
repetitive, dry language, especially when it comes to works that are, in
some sense, actually trying to be like a dictionary (or maybe more of an
encyclopedia). The Stacks
project, the EGA/SGA, Kerodon,
Hartshorne^{2}, and other such endeavours aim
(amongst other things) to serve as a sort of definitive reference for a
subject: not so much something to be read from cover to cover, but more
something to refer to whenever you want to know something for sure.

So then, I continued to say to myself, wouldn’t it be nice if we only
had to translate EGA *once*, and have it available for everybody
to read in their own language? Of course, if you stick it into some
online translator then you might get something mildly cohesive, but
*(a)* most translators struggle with mathematical terminology,
and *(b)* the delicate aspects of the more complex parts of, for
example, algebraic geometry, are not really something that you want to
read in a translation that’s ‘probably mostly pretty accurate some of
the time’. Going from one language to another, it becomes very easy to
mix up the logical implications of, for example, the order of a
sentence; there is a big difference between ‘A if B’ and
‘if A, B’, but in a given language, the difference
might be easy to miss for a non-native speaker.

There is a nice thing about mathematical writing though: the
necessity for preciseness is so important that there is usually very
little ambiguity in any given sentence. This, I said to myself, in the
last soliloquy of this introduction, might mean that writing translation
software intended to translate *only* mathematics is actually
somewhat *easier* than translating arbitrary phrases. Not only
that, but when you restrict your mathematical writing even further to
somewhat-boring sentences of the form ‘If A then B.
Let x be a T. Then C
implies B.’ then you work on each
little logical sub-part and glue them all together to get the whole
sentence. Compositionality!

# Implementation before theory

↟In case I hadn’t made it clear already, I am neither an applied category theorist nor a linguist (nor a polyglot, nor a programmer, actually), so my approach to this was to write some proof-of-concept software that at least half worked before trying to figure out any theoretical details. I wanted to come up with something that could be interacted with in a browser, so I made the absolutely terrible decision of coming up with a prototype in JavaScript. Don’t get me wrong, JavaScript has its uses, but so does a compost bin, and I wouldn’t want to use one of them to write a functorial framework.

If you want to play around with the demo, then you can do so here, although
there are very few sentences that it can cope with translating, and it
has more bugs than the ICE2020.
If you want to see *something* that works, try typing in
something like ‘Let $X$ be a scheme of finite type’ into the box
labelled ‘EN’ (or the French equivalent into the box labelled ‘FR’,
etc.) and seeing what happens. Of course, if you are the kind of person
who treats having loud construction works outside your window as some
sort of free concert, or enjoys going to landfill sites for the
aesthetic pleasure, then you might be masochistic or tasteless enough to
want to look at the code making it tick (in the loosest sense of the
word): nearly everything is in the page source, but you can have a look
at the GitHub repo here.

One reason that this demo is so terrible is that it is actually doing three things, and two of them are things that have probably already been really well implemented, but I didn’t realise this until it was far too late. What it actually does is the following:

- turn a sentence into some ‘abstract syntax tree’ (is that the right use of that phrase?);
- actually translate the sentence (in this tree form); then
- turn the tree back into a sentence in the target language.

Steps 1 and 3 are much easier here than elsewhere because we’ve restricted ourselves to mathematical language, but still, they were far beyond my Javascript capabilities; Step 2, which is the actual bit that I was trying to figure out, is actually, computationally, pretty trivial.

# The dictionary and its structure

↟Even if the category theory behind all this turns out to be
uninteresting, or just plain wrong, my hope is that the multilingual
dictionary of mathematical terminology will be of at least some use, and
I’m planning on making a way of contributing to it publicly available,
so we can all have one nice resource, instead of having to go on
Wikipedia and hope that some person has been nice enough to write an
article on the object whose name you want to translate in the language
that you want it translated into. At the moment the dictionary is very
(*very*) small, but that’s mostly because I am at the time in my
life where my thesis is going to be very unhappy if it goes any longer
without being finished. You can see what little I currently have here.

Before delving into the ‘theory’ (if I can dare to call it that), I thought it might be nice to have a look at what an entry from the dictionary looks like. It’s in JSON format, which is quite a nice way of storing (and conveying) data consisting of key-value pairs and lists and such like. I’m thinking of eventually moving over to Dhall, when I get around to writing the translation code in Idris or Haskell or something, but that’s not going to be for quite a while yet. So, anyway, here’s the entry for “group”:

```
"0677022e": {
"root": {
"DE": { "atom": "Gruppe", "gend": "feminin" },
"EN": { "atom": "group" },
"ES": { "atom": "grupo" , "gend": "masculino" },
"FR": { "atom": "groupe", "gend": "masculin" },
"IT": { "atom": "gruppo", "gend": "maschile" },
"JA": { "atom": "群" },
"PT": { "atom": "grupo" , "gend": "masculino" },
"RU": { "atom": "группа", "gend": "же́нский" },
"ZH": { "atom": "群" }
},
"adjs": {
"0ba202dd": {
"EN": { "atom": "reduced", "pstn": "before" },
"FR": { "atom": "réduit" , "pstn": "after" }
},
"0b0e02cd": {
"EN": { "atom": "abelian", "pstn": "before" },
"FR": { "atom": "abélien", "pstn": "after" }
},
"08b70280": {
"EN": { "atom": "finite", "pstn": "before" }
}
}
}
```

As you can see, we have a bunch of data concerning the *root*,
which is just my way of saying ‘the noun itself’. (For the moment I’m
just going to talk about nouns. Sentences are a whole nother kettle of
fish.) For each language, we just say what the translation is (what I’ve
been calling the *atom*, thinking of sentences as compositional
things made up of smaller things), and what the grammatical gender of
the noun is.^{3} Then we have a list of
*adjectives* that can modify the noun. This is something which I
think is quite neat, if not seemingly inefficient: to avoid any
ambiguity, we have a separate entry for each time an adjective can
modify a different noun. Why? I think it’s not that hard to believe
that, although in English the word ‘normal’ can describe a group or a
scheme, in some other language there might be a distinction, i.e. a
different word used in each case. Because of this, it seems a lot nicer
to make sure that adjectives really do ‘belong’ to the noun, in some
sense. Not only that, but it also ensures that adjectives will always^{4} agree (in gender, for example) with
the noun that they modify, which means we have one less thing to worry
about.

The thing that I haven’t explained is the key for this entry: what is
‘0677022e’? Well, as we’ll soon see, this whole system is built on the
idea of some sort of underlying ‘base’ language, and most of this is
vaguely like the idea that you can write notation instead of words, such
as writing ‘∃’ instead of ‘there exists’, but there isn’t a notation for
*every* single mathematical object. So, here, to come up with
some language-independent way of writing the word ‘group’, we simply use
a hash (Adler32) to generate a nonsensical but *unique*^{5} name for the entry.

Let’s quickly look at what an entry for something that *isn’t*
a noun looks like. Here is the entry for the ‘constructor’ (or phrase,
really) ‘Let … be …’, as in, ‘Let X be
a scheme.’

```
"η": {
"fullType": "s",
"argsType": "v,n",
"EN": {
"atom": "let # be #",
"type": "s((v←)⊠(n←))",
"vari": "1,2"
},
"FR": {
"atom": "soit",
"type": "s((n←)(v←))",
"vari": "1,2"
}
}
```

This now really is something that has been cobbled together to make
some software ‘work’, but there is an underlying structure that I’ll
describe in a bit. The entry here is called ‘η’ (for some reason that I
can’t remember, but it’s possibly notation stolen from modal logic?)^{6}, it takes two arguments, of type
‘variable’ and ‘noun’, and, when fully curried (do people say that? I
mean, when it has been applied to the full number of variables), is of
type ‘sentence’. The types of the entries for English and French might
seem more familiar to those who’ve seen any pregroup grammar stuff
before, but I wouldn’t assume that they mean what you think they mean,
because I *don’t* really know my pregroup grammar stuff very
well, so have probably butchered it somewhat. Sorry. Finally, there’s
the ‘vari’ entry, which will help us to deal with the fact that
different languages like to put words in different orders.

# The main theoretical input

↟To summarise my one ‘contribution’ to this translation problem, I can say the following: ‘words have meanings, and since, in mathematics, sentences are compositional, we can do translation by thinking of words as being fibres over some fundamental base meaning, and translate them by just ensuring that the image of the fibre is a fibre over the same basepoint.’ Of course, I now need to justify and explain this…

# The categorical setting

↟The paper that I’ve been reading to try to help me understand what I want to say is the brilliant Translating and Evolving: Towards a Model of Language Change in DisCoCat, by Tai-Danae Bradley (Graduate Center, CUNY), Martha Lewis (ILLC, University of Amsterdam), Jade Master (Dept. Mathematics, UC Riverside), and Brad Theilman (Gentner Lab, UC San Diego). For the sake of not making this already-too-long post any longer, I will assume some familiarity with this paper.

Before going any further though, I would like to point out that the
flow of mathematical ideas has been entirely unidirectional: none of the
above authors are to blame for any of the ramblings that now follow. And
I really do mean ‘ramblings’, because, again, lest I never actually
write this all up, I’m making things easier for myself by assuming that
everybody reading this is imagining giant scare quotes around every
single sentence that now follows. *If you are not yet doing so, then
please now do so.*

# The underlying type theory

↟Let T be some set of types, say T=\{n,v,s,...\}, where n stands for ‘noun’, ‘v’ for variable (think X, f_{12}, \widehat{\mathcal{C}}, etc.), and s for ‘sentence’.

Already I’m avoiding the subtle issue of *subtypes*. For
example, ‘variable’ is (in this framework) a subtype of ‘noun’, in that,
if something is of type v then it is
inherently of type n too.

Let \mathcal{T} be the free pregroup on T, ‘freely promoted to a 2-rig’, by which I mean ‘throw in some notion of tensor product’, which we denote by \boxtimes. (Whether or not this makes sense is something that I’m really not sure about. There are a lot of structures now floating around that have to all interact with each other. There are some nice comments about this on this Twitter thread.) The use of this product will be to deal with constructors (like the aforementioned ‘Let … be …’) that take arguments in a more complicated manner than just concatenation.

Let \mathcal{B}, the
*base-language category*, be the category whose objects are pairs
(a,t), where t\in\mathcal{T} and a\in\mathsf{FormStr}, which is just the
*set* of ‘formal strings of characters’, and whose morphisms are
given by those in \mathcal{T}. The idea
is that an object of \mathcal{B} is
some entry in the ‘base-language dictionary’, e.g. ‘0677022e’ or ‘η’,
along with a type. The type of ‘η’, for example, would be written as
s((n←)(v←)) (which we sort of decode
into the ‘fullType: s’ and ‘argsType:
v, n’
values in our JSON version above).

What, then, should the category \mathsf{EN} of English ‘words’ be? Keeping in
mind how we want to translate things (by just making sure that we map an
English ‘word’ to a French (for example) ‘word’ *that lies over the
same base-language ‘word’*), we can come up with the idea that
elements of \mathsf{EN} should be
triples (a,b,t), where a\in\mathsf{FormStr}, b\in\mathcal{B}, and t\in\mathcal{T}. But we need some sort of
coherence property: the type of some English ‘word’ should somehow
respect the type of the base ‘word’ that it lies over! So we stipulate
that t\in\mathcal{T}|_{\operatorname{type}(b)},
where \mathcal{T}|_{\operatorname{type}(b)} is the
free pregroup generated by the elements of T that appear in the expression of the type
\operatorname{type}(b) of b, and where we limit the number of
occurrences of each type to be exactly the same as in \operatorname{type}(b). For example, if we
want to consider ‘Let … be …’ as an element of \mathsf{EN}, then its type should have
exactly one occurrence of s, one of
v, and one of n, because this is what the type of ‘η’ has.
In our setting, its type should be s((v←)⊠(n←)), which does indeed have the
right number of occurrences of each type.

So then, it almost looks like \mathsf{EN} comes from some sort of Grothendieck construction!

# Languages as Grothendieck constructions

↟As we have just said, an object of \mathsf{EN} is a triple (a,b,t), where a\in\mathsf{FormStr}, b\in\mathcal{B}, and t\in\mathcal{T}|_{\operatorname{type}(b)}, so we might be able to realise \mathsf{EN} as the Grothendieck construction of some functor \mathsf{FormStr}\times\mathcal{B}^{\mathrm{op}} \to \widehat{\mathcal{T}}, where \widehat{\mathcal{T}} is the category of all pregroups \mathcal{T}' freely generated on some T'\subset T. We know how this functor should act on objects: (a,b) \mapsto \mathcal{T}|_{\operatorname{type}(b)}, but we need to think about how it acts on morphisms too.

This is something that I’m really not too sure about, but something
that might work is something like the following: a morphism (a,b)\leqslant(a',b') consists of a
reduction^{7}, and so the corresponding functor
\mathcal{T}|_{\operatorname{type}(b')} \to
\mathcal{T}|_{\operatorname{type}(b)} is given by performing the
corresponding expansion, i.e. if your reduction is e.g. n^r n\leqslant 1, then we replace ‘some’
1 in b
by n^r n.

I really don’t know about this bit, and maybe there isn’t actually a good functorial construction that makes this a Grothendieck construction, but I really hope there is, because that gives me some hope that there might actually be some truth hidden behind all this wild gesticulation.

# Sentences as trees

↟Assuming we know how to deal with ‘words’ (which I continue to write in scare quotes because, by this I mean things of arbitrary type, not just of type n), the next step is how to deal with sentences. This is something that, as I said above, I’m pretty sure has been well understood by people for a while now: we think of a sentence as a tree. Here’s an example (which I would normally have made look nice in TikZ, but I’ve already spent far too much time on this whole side project):

But the whole point of our \boxtimes structure on \mathcal{T} is that we can write these trees as ‘linear concatenations’, i.e. as strings with arguments just being applied by concatenation on the right.

Something that is very important, and very subtle, is the
*order* of arguments. Normally this is something that comes up
when you talk about adjectives, since most adjectives in English come
*before* the noun that they modify, but most adjectives in, say,
Spanish, come *after* the noun. This specific example of order
doesn’t really bother us, because of the way that we treat adjectives as
sort of ‘second-class objects’, in that they come bound to a noun, and
can thus contain the information of order in their dictionary entry, but
we still need to worry about order more generally. I actually can’t
think of an example right now, but I’m sure that somebody out there can:
in English, French, and Spanish (the only languages that I can claim any
degree of knowledge in), the construction ‘Let … be …’, for example,
takes its two arguments in the same order in all three. True, in French
the type is s((n←)(v←)), whereas in English it’s s((v←)⊠(n←)), because
in French we say ‘Soit R un anneau’,
and in English we say ‘Let R be a
ring’, but the ordered list of arguments is still (R,044a01b1), where ‘044a01b1’ is the hash of
the word ‘ring’.

There are two ways (*that I can think of*) of dealing with
order:

- keep track of how order changes for every specific entry between every possible pair of languages; or
- come up with some ‘canonical’ order for the base-language entry, and then either attach the data of a permutation to each language entry or use labelled occurrences of elements of \mathcal{T}.

The first sounds like an administrative nightmare; the second sounds ‘evil’, but seems to work, and is thus the one for which I opted.

# Compositional translation

↟What *is* translation then? Well, on sentences, it’s given
compositionally (I’m literally trying to wave my hands as I type that),
and on sentences it’s given by the profunctor^{8}
\mathrm{Tr}_{\mathsf{L}_1,\mathsf{L}_2} \colon
\mathsf{L}_1 \nrightarrow \mathsf{L}_2 defined by

That is, translating words is just given by finding a word in the target language with the same base. “But what if there are synonyms?” I hear you cry, to which I defensively respond “well just pick any of them then, because if they really are synonyms then it doesn’t matter which one we use”.

# Nonsensical sentences

↟Of course, at the moment we could conceivably translate entirely
(mathematically) nonsensical sentences, like ‘If X is the group and let z be the complex triangle then P does not hold false.’ An interesting
question, I think, is whether or not we could introduce some sort of
logic on our base language \mathcal{B}
to *prevent such sentences from even existing*. That is, think of
our base language as some sort of proof assistant, like Lean or Coq or
whatever, instead of just arbitrary symbols. This is perhaps a more
interesting question, since we’re describing a sort of translation from
proof assistant to natural language, and this is something that’s
already been studied (see the references at the end of this post).

# What’s next?

↟Wow. This ended up being much longer than I intended it on being, but
I got a bit carried away. I think that, given some more time, I could
actually build a genuinely working prototype of this translation
software, and this makes me believe that there *has* to be some
categorical underpinnings somewhere. I’ve tried to unearth them myself,
and this post is a semi-lucid stream-of-consciousness record of my
efforts, but that’s about all I can manage right now.

I have no idea what my future holds in terms of postdocs (or lack
thereof), but I hope to also get a chance to make the dictionary itself
public and easy to contribute to.^{9}

My dream implementation of this software would be that one could write something in the language of their choice, run some command on the source file, and be told how much of it can be translated into any given target language, and detail which sentences it can’t recognise. A bonus feature would be some sort of ‘did you mean …’ suggestion, or an autocomplete, to save people from having to go consult the dictionary to see how exactly they ‘should’ have phrased what they wanted to say to fit within the confines of the limited language.

If anybody has read any of this and wants to talk to me about it, then please do! (Although if it’s just to point out all the flaws and mistakes, please at least also send a cute or funny picture of a bird or something, so that I don’t have to only read negative things. I am very sensitive.) You can talk to me on Twitter, by email, on Matrix, or any other way that you want. I love it when people actually want to talk to me.

# Assorted references

↟Here, in no particular order (really, no order at all), are some references that I’ve put on my reading list for this project.

- https://kwarc.info/people/mkohlhase/papers/icms16-smglom.pdf
- https://kwarc.info/people/mkohlhase/papers/synasc13.pdf
- http://www.logique.jussieu.fr/~alp/structure_vernacular.pdf
- https://uniformal.github.io/doc/
- http://mizar.org/JFM/
- https://docs.weblate.org/en/latest/index.html
- https://jiggerwit.wordpress.com/2019/06/20/an-argument-for-controlled-natural-languages-in-mathematics/
- https://en.wikipedia.org/wiki/Pregroup_grammar
- https://arxiv.org/pdf/1811.11041.pdf