I’m currently reading The Listening Society by Hanzi Freinacht – a book I can warmly recommend to anyone interested in complexity, psychology and how to find a better political way ahead in these digital, globalized and indeed complex times. This experiment zooms in on the part about psychological states and tries to see whether a statistical model of human language in tweets can suggest similar words for the correct sort of state, high or low.
I’ve experimented with training a classifier for a similar categorization before, around 2009. I called it mood classification instead of state, but since I had apparently read similar books as Hanzi I used the same classes, high and low, which was confusing to a lot of people who expected something simpler like ”happy” or ”upset”. I eventually renamed the classes in the public-facing classification GUI on uClassify, but the underlying classifier is still the same. I haven’t been able to evaluate the accuracy of that original classifier, mostly due to lack of time, but also, until recently, a lack of programming skills applied to machine learning and statistics.
When reading the chapter about psychological high and low states I wanted to experiment a little with the pre-trained word2vec word embedding I created some time ago, to see if Hanzi’s example words somehow get support in the data. It’s a very simplistic experiment, but the point was to do something quick’n’dirty while I’m home with a cold, to see if it’s worth pursuing any further. Currently I think it’s not.
The experiment suggests that the word embedding model is pretty good at finding similar low state words (78% of 50 manually judged neighbours), but poor at finding similar high state words (30%).
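The core operation in an experiment like this is a nearest-neighbour query over the word vectors: take a seed word for a state and ask which words have the most similar vectors. Here is a minimal sketch of that lookup using cosine similarity over a tiny made-up embedding table – the words and vectors are illustrative only; in practice the real pre-trained model would be loaded from disk (e.g. with gensim’s KeyedVectors) and the neighbours judged by hand:

```python
import numpy as np

# Toy embedding table standing in for a pre-trained word2vec model.
# All vectors here are invented for illustration.
embeddings = {
    "calm":    np.array([0.9, 0.1, 0.0]),
    "serene":  np.array([0.8, 0.2, 0.1]),
    "joyful":  np.array([0.7, 0.6, 0.0]),
    "anxious": np.array([-0.8, 0.1, 0.3]),
    "worried": np.array([-0.7, 0.2, 0.2]),
}

def most_similar(word, topn=3):
    """Return the topn words closest to `word` by cosine similarity."""
    v = embeddings[word]
    v = v / np.linalg.norm(v)  # normalize so the dot product is the cosine
    scores = []
    for other, u in embeddings.items():
        if other == word:
            continue
        scores.append((other, float(np.dot(v, u / np.linalg.norm(u)))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

print(most_similar("calm"))
```

With a real model the manual judging step is then just scanning the returned neighbours and counting how many actually fit the intended state.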
Here is the experiment notebook, with links to the pre-trained multi-lingual (mostly Swedish, English, Norwegian and Finnish) word2vec model.