Mastodon; Machine Learning; Mandarin
TL;DR I made a Mastodon bot that uses TensorFlow, Stable Diffusion, and text-to-speech to create flashcard-like posts for studying for the HSK Mandarin test. Here is an example:
=> https://botsin.space/@hsk_words/109408750566290159 苹果
I’ve been studying Mandarin Chinese for quite a while now. Far longer than my ability would suggest.
In the past I’ve studied in waves. I’d study for 3 or 4 months, and then I wouldn’t study for a long while - sometimes years. Then, I’d pick it up again and toddle around with it for a while. Learning Mandarin has become more of a fun hobby than any kind of life goal at this point.
To finalise this “project”, I decided to take the HSK Test (汉语水平考试 - the Mandarin Language proficiency test) to see my official level. I have thought about taking it in the past, but I just never did. While theoretically I should be somewhere in the intermediate range, I decided to try to do the tests in order. So I dusted off an old Anki deck of HSK 1 vocabulary words and started to refresh myself.
But… I’ve also been getting more into machine learning, and I’ve been looking for a reason to play with ActivityPub. So, as I often do, I decided to play with all of them at the same time.
(ActivityPub is the distributed underlying protocol that Mastodon uses - Mastodon is, depending on who you ask, the “new Twitter”)
If you haven’t been paying attention, machine learning has come an incredibly long way in the past few years.
Just a few years ago, I was trying to wrap my uneducated mind around gradient descent, backpropagation, neurons, weights, and loss functions - things I could grasp from a high level, but getting my hands dirty… that was beyond my skill to heal.
Now, using incredibly sophisticated models is as easy as calling a function. It’s amazing.
Most ML libraries do much the same things, but because I have access to data scientist geniuses at work who use TensorFlow, that’s the library I’ve been playing around with.
Make Some Images - Stable Diffusion
I find that flashcards with images help me remember words. So after I created a basic random word selector function, I decided to feed the English translation into Stable Diffusion to generate an image for the flashcard.
Stable Diffusion is usually run with the Python library PyTorch, but I found a TensorFlow port by Divam Gupta:
=> https://github.com/divamgupta/stable-diffusion-tensorflow TensorFlow Stable Diffusion
It uses the older Stable Diffusion 1.4 model, but that’s good enough for my purposes.
I don’t want to go too in depth here, but if you’re into this kind of thing have a look at this to dig in and see how it’s all working:
=> https://github.com/robrohan/stable-diffusion-tensorflow/blob/6db7e56b7f423405b4668c601409e17f0f978e55/stable_diffusion_tf/stable_diffusion.py#L222 TensorFlow Stable Diffusion Models
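For a rough idea of what calling the model looks like, here is a minimal sketch based on the usage shown in the stable-diffusion-tensorflow README. The helper names, the image size, the step count, and the guidance scale are illustrative assumptions, not the bot's actual settings.

```python
def make_generator(img_height=512, img_width=512):
    """Build the generator. Imported lazily because loading TensorFlow is slow.

    Assumes the stable_diffusion_tf package from the repository linked above
    is installed; the model weights are downloaded on first use.
    """
    from stable_diffusion_tf.stable_diffusion import StableDiffusion

    return StableDiffusion(
        img_height=img_height,
        img_width=img_width,
        jit_compile=False,
    )


def generate_flashcard_image(generator, prompt, num_steps=50):
    """Return the first generated image for the prompt as a pixel array."""
    images = generator.generate(
        prompt,
        num_steps=num_steps,
        unconditional_guidance_scale=7.5,  # assumed value, taken from the README example
        temperature=1,
        batch_size=1,
    )
    return images[0]


def save_image(image, out_path):
    """Write the pixel array to disk with Pillow (an assumed dependency)."""
    from PIL import Image

    Image.fromarray(image).save(out_path)
```

Calling `save_image(generate_flashcard_image(make_generator(), "an apple on a table"), "card.png")` would then write one image to disk; the prompt there is just a made-up example.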
Now, if you know much about Stable Diffusion, you know that the images it generates can be on point, or they can be completely bizarre. For language learning, I think both will work well, but feeding a single word into the process wasn’t giving even slightly interesting results.
I stumbled upon a small data set of Mandarin sentences, and decided to look up a random sentence that contains the word. Instead of using just a single word to generate the image, the bot now uses the found sentence as the prompt. A full sentence often creates a better scene for the word.
This process can sometimes create some absolutely crazy images - which is fantastic.
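The word and sentence selection described above can be sketched roughly like this. The sample words and sentences, the field names, and the assumption that each sentence comes with an English translation to use as the image prompt are all hypothetical; the real data set may be shaped differently.

```python
import random

# Hypothetical sample data standing in for the HSK word list and sentence set.
HSK_WORDS = [
    {"hanzi": "苹果", "pinyin": "píngguǒ", "english": "apple"},
    {"hanzi": "茶", "pinyin": "chá", "english": "tea"},
]

SENTENCES = [
    {"mandarin": "我喜欢吃苹果。", "english": "I like eating apples."},
    {"mandarin": "桌子上有一个苹果。", "english": "There is an apple on the table."},
    {"mandarin": "她在喝茶。", "english": "She is drinking tea."},
]


def pick_word(words, rng=random):
    """Pick one vocabulary entry at random."""
    return rng.choice(words)


def find_sentence(hanzi, sentences, rng=random):
    """Return a random sentence containing the word, or None if there is none."""
    matches = [s for s in sentences if hanzi in s["mandarin"]]
    return rng.choice(matches) if matches else None


def build_prompt(word, sentences, rng=random):
    """Use a matching sentence as the prompt, falling back to the bare word."""
    sentence = find_sentence(word["hanzi"], sentences, rng)
    return sentence["english"] if sentence else word["english"]
```

Falling back to the single word keeps the bot posting even when the sentence set has no match, at the cost of a less interesting image.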
Make Some Sounds - zhtts
One part of my Mandarin study I have always struggled with is being able to understand the spoken word. I can read well enough, and if I concentrate I can speak using proper tones, but I struggle to catch what people say.
So, to help with that, I used another library, zhtts (which is an implementation of TensorFlowTTS), to turn the example sentence used for Stable Diffusion into an audio file.
The library uses the Tacotron model, which is impressive.
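As a hedged sketch of the text-to-speech step: the calls below follow the usage shown in the zhtts README as best I recall it (`zhtts.TTS(text2mel_name="TACOTRON")` and `text2wav`), so treat the exact arguments as assumptions and check them against the installed version. The helper names are made up for illustration.

```python
def make_tts():
    """Create the synthesizer. Imported lazily because the models load slowly.

    Assumes the zhtts package is installed and accepts the "TACOTRON" option.
    """
    import zhtts

    return zhtts.TTS(text2mel_name="TACOTRON")


def synthesize_sentence(tts, sentence, wav_path):
    """Write the spoken Mandarin sentence to a WAV file and return its path."""
    tts.text2wav(sentence, wav_path)
    return wav_path
```

With that in place, `synthesize_sentence(make_tts(), "我喜欢吃苹果。", "card.wav")` would produce the audio for one flashcard.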
Make a Video - ffmpeg
To put the audio and image together, I just call ffmpeg and create an mp4.
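One way to make that ffmpeg call from Python is sketched below. The flags are standard ffmpeg options for turning a still image plus an audio track into a video, but the exact command the bot runs is not shown in this post, so the codec choices and function names here are assumptions.

```python
import subprocess


def build_ffmpeg_command(image_path, audio_path, video_path):
    """Build an ffmpeg command that shows one image for the length of the audio."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-loop", "1",            # repeat the single image as video frames
        "-i", image_path,
        "-i", audio_path,
        "-c:v", "libx264",
        "-tune", "stillimage",
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-c:a", "aac",
        "-shortest",             # stop when the audio ends
        video_path,
    ]


def make_video(image_path, audio_path, video_path):
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_ffmpeg_command(image_path, audio_path, video_path), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names.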
I wanted to dig into the ActivityPub protocol for this project, but as you might already expect, Python has a library that interacts with Mastodon without any real effort:
I was happy with how easy it was to get this process to post to Mastodon. Mastodon reminds me of how Twitter used to be when it first started.
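For illustration, posting a video with a Python client could look something like the sketch below, which assumes the Mastodon.py package and its `media_post` and `status_post` methods. The status layout, hashtags, and helper names are invented for the example and are not the bot's real format.

```python
def format_status(hanzi, pinyin, english, sentence):
    """Build the text of the post (hypothetical layout)."""
    return f"{hanzi} ({pinyin}) - {english}\n{sentence}\n#HSK #Mandarin"


def make_client(access_token, api_base_url):
    """Create an API client. Assumes the Mastodon.py package is installed."""
    from mastodon import Mastodon

    return Mastodon(access_token=access_token, api_base_url=api_base_url)


def post_flashcard(client, status, video_path):
    """Upload the video, then publish a status with it attached."""
    media = client.media_post(video_path, mime_type="video/mp4")
    return client.status_post(status, media_ids=[media["id"]])
```

A server may need a moment to process a video upload before it can be attached, so a real bot might have to wait or retry between the two calls.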
This post is a bit of a ramble, but I thought the project was a simple, novel use of some of the new-ish tech hanging around in 2022.
While models are still not easy to build correctly, machine learning has become incredibly easy to use. If you’re building startups or adding new features to your projects, have a look at what the preexisting models might be able to do for you.
Here is the source if you want to have a look:
And don’t look up Roko’s Basilisk.