
Content

In case you missed it, there's a new Computerphile about word embeddings!

https://www.youtube.com/watch?v=gQddtTdmG_8

Don't miss the "extra bits" for this one; there's some fun stuff in there.
https://www.youtube.com/watch?v=usthqKtw2LA

I'm pretty sure I credited Grace Avery at the time, but that didn't make it into the final video, so here's a link to the post that inspired a lot of this:
https://graceavery.com/word2vec-fish-music-bass/

And here's the notebook I put together to do it:
https://colab.research.google.com/drive/1sh7lhZLMRbx0Rb6Uyh-e_byRlnBrR5g6

I didn't want to post that to the general public, because I didn't feel like taking on a big unpaid tech support job helping every YouTube viewer get it to work. But for patrons it's kind of a paid tech support job, which is much better. Plus I think my patrons are a lot smarter on average, so it should be easier :p
You'll need a copy of GoogleNews-vectors-negative300.bin in your Google Drive, in a folder called 'models'. I think I got my copy from here: https://github.com/mmihaltz/word2vec-GoogleNews-vectors
If enough people want it, I'll take the time to put together a more detailed walk-through; let me know if you get stuck.
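
For anyone who just wants the gist without opening the notebook, here's a minimal sketch of the same kind of setup (not the notebook itself, and the exact paths are assumptions): it expects you're running in Colab, the .bin file is in that 'models' folder at the top level of your Drive, and gensim is installed (Colab usually has it; pip install gensim if not).

# Minimal sketch: load the GoogleNews vectors from Drive and try some word arithmetic.
from google.colab import drive
from gensim.models import KeyedVectors

drive.mount('/content/drive')

# Assumes the file sits in a 'models' folder at the top level of your Drive.
# On older Colab mounts the folder may appear as 'My Drive' instead of 'MyDrive'.
model_path = '/content/drive/MyDrive/models/GoogleNews-vectors-negative300.bin'
vectors = KeyedVectors.load_word2vec_format(model_path, binary=True)

# The classic analogy: king - man + woman is close to queen.
print(vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))

# The sort of thing the fish/music/bass post plays with:
# push 'bass' away from 'fish' and towards 'music' and see what comes back.
print(vectors.most_similar(positive=['bass', 'music'], negative=['fish'], topn=3))

The model file is a few gigabytes, so the load step takes a while and eats a fair chunk of RAM.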

Files

Vectoring Words (Word Embeddings) - Computerphile

How do you represent a word in AI? Rob Miles reveals how words can be formed from multi-dimensional vectors - with some unexpected results.

08:06 - Yes, it's a rubber egg :)

Unicorn AI: EXTRA BITS: https://youtu.be/usthqKtw2LA
AI YouTube Comments: https://youtu.be/XyMdpcAPnZc

More from Rob Miles: http://bit.ly/Rob_Miles_YouTube

Thanks to Nottingham Hackspace for providing the filming location: http://bit.ly/notthack

https://www.facebook.com/computerphile
https://twitter.com/computer_phile

This video was filmed and edited by Sean Riley.

Computer Science at the University of Nottingham: https://bit.ly/nottscomputer

Computerphile is a sister project to Brady Haran's Numberphile. More at http://www.bradyharan.com

Comments

Poker Chen

Could you then say that, in some sense, the common English language can be captured by a set of N "qualities", where N is the number of neurons in the hidden layer? As in, there exists "a space of meaning" which the language inhabits, such that you need at least a few hundred neurons to write an accurate, continuous thesaurus. If so, then I suppose the benefit of each additional neuron plateaus after ~1000 or so. I would also suppose that it would be theoretically possible to build an unsupervised translation network such that you can input "dog"+"German"-"English" = "Hund". EDIT: and the size of this network would not be significantly larger than a single-language network. It'll be the union of the two sets of meanings + 1.

robertskmiles

If my understanding of this is correct, the usefulness of this kind of system actually relies on having a relatively small number of dimensions. Clearly, too few dimensions wouldn't allow this to work, but nor would too many. So it's not just that it would plateau, it would go on to get worse after that. Like, if you gave the hidden layer the same number of neurons as the dictionary size, the network could in principle just leave things as they are and you wouldn't really have meaningful directions in the latent space, because there's so much room in the space that it can just put the word embeddings wherever it feels like and still do well. The system lays everything out in this neatly organised way because that's the only way to fit all the information through that bottleneck. There's a pretty deep analogy between meaning/understanding and compression.

I don't know where they got the number 300, but it wouldn't surprise me if they tried a range of values and found that 300 works best for that dataset and architecture, which would suggest that the 'true value' for English might be somewhere around there as well.

This is not to say that the *dimensions* are meaningful, just *directions*. The dimensions define the space within which the network lays out the embeddings, but there's no reason for meaningful directions in the embeddings to align with the axes of the space. So it's not "English has about 300 axes along which words can vary" but more like "To lay out words such that the most important ways that they vary each have a direction, you need about 300 dimensions". You could probably do principal component analysis on it though, if you wanted.

And yeah, people have tried various things like your translation suggestion, with decent results. Check out this, for example: https://github.com/facebookresearch/MUSE
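
To make the "directions, not dimensions" point a bit more concrete, here's a rough sketch (my own illustration, not anything from the video or the thread): the difference vectors of a few gendered word pairs all point in roughly the same direction, but that direction is spread across many of the 300 coordinates rather than sitting on any single axis. The word pairs and the file path are just assumptions.

# Rough illustration of "meaningful directions, not dimensions".
import numpy as np
from gensim.models import KeyedVectors

# Path is an assumption; point it at wherever your copy of the .bin lives.
kv = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

pairs = [('king', 'queen'), ('man', 'woman'), ('he', 'she'), ('uncle', 'aunt')]
diffs = np.array([kv[a] - kv[b] for a, b in pairs])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# How much do the pair differences agree with each other?
# High cosines suggest a shared 'gender-ish' direction in the space.
for (a, b), d in zip(pairs[1:], diffs[1:]):
    print(f'{a}-{b} vs king-queen:', round(cosine(diffs[0], d), 3))

# How concentrated is that direction on any one coordinate? Compare the largest
# single component of the mean difference to the vector's overall length.
mean_diff = diffs.mean(axis=0)
print('largest single component / norm:',
      round(float(np.abs(mean_diff).max() / np.linalg.norm(mean_diff)), 3))

Principal component analysis over a larger set of pair differences would make the same point with more rigour.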

loopuleasa

Cool video, this one. I really enjoy that you also have a working example for people to play with.