Wednesday, March 6, 2013

Ngram Viewer, a wonderful Google tool!

Today I wanna talk to you about Google Ngram Viewer.

A very, very valuable tool from Google!

First of all... what is an n-gram?

According to Wikipedia:

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. An n-gram could be any combination of letters. However, the items in question can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus.

So basically, a word is an n-gram of size 1 (or unigram), while "I am" is an n-gram of size 2 (or bigram), "the view is awesome" is an n-gram of size 4, and so on...
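To make the definition concrete, here is a tiny Python sketch that pulls the word n-grams out of a sentence. The function name `word_ngrams` is my own invention, and splitting on whitespace is a simplifying assumption (real tools tokenize text more carefully):

```python
def word_ngrams(text, n):
    """Return every contiguous run of n words in text, in order."""
    words = text.split()  # naive whitespace tokenization (an assumption)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "the view is awesome"
print(word_ngrams(sentence, 1))  # ['the', 'view', 'is', 'awesome']
print(word_ngrams(sentence, 2))  # ['the view', 'view is', 'is awesome']
print(word_ngrams(sentence, 4))  # ['the view is awesome']
```

Asking for a size larger than the sentence simply gives back an empty list.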

Now that you know what an n-gram is, let's see what Google Ngram Viewer can do for us.
When you enter phrases into the Google Books Ngram Viewer, it displays a graph showing how often those phrases (the n-grams) have occurred in a corpus of books in the selected language (British English, French, Italian, German, Spanish, etc.) over the selected years (usually 1800-2008). And you can make comparisons among different n-grams!

Just take a moment and think about how cool that is!

You can see the popularity of some characters like "Albert Einstein" vs "Sherlock Holmes" between 1800 and 2008. Just enter this in the CASE-SENSITIVE search field:

Albert Einstein, Sherlock Holmes
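If you would rather save or share that comparison as a link, a query like this can be turned into a URL. This is only a sketch: the address and the parameter names (`content`, `year_start`, `year_end`) are assumptions based on what the viewer shows in the browser's address bar, not a documented API, and the helper name `ngram_viewer_url` is made up for this example.

```python
from urllib.parse import urlencode


def ngram_viewer_url(phrases, year_start=1800, year_end=2008):
    """Build a Ngram Viewer graph link for a list of phrases."""
    query = urlencode({
        "content": ",".join(phrases),  # phrases are comma-separated, as in the search field
        "year_start": year_start,      # assumed parameter name
        "year_end": year_end,          # assumed parameter name
    })
    # Assumed base address of the graph page
    return "https://books.google.com/ngrams/graph?" + query


url = ngram_viewer_url(["Albert Einstein", "Sherlock Holmes"])
print(url)
# https://books.google.com/ngrams/graph?content=Albert+Einstein%2CSherlock+Holmes&year_start=1800&year_end=2008
```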

Or you can check the occurrence of awesomeness in the published works (any reference to Barney Stinson is purely fictional!)

I know, I know, these are just for fun... however, we can put the Ngram Viewer to better (= more useful) use.
An example?

Which expression do you think is more popular:

Case 1: "appealing to me"
Case 2: "appealing for me"

This is the kind of doubt I sometimes have... Before discovering Ngram Viewer, my only solution was to run a search on Google, but now...

So we learn that the correct (i.e. the most used) n-gram is "appealing to me". Nice, eh?

Another very common use I found for Ngram Viewer is checking German expressions, especially the ones involving reflexive verbs, where you never know in advance whether the dative or the accusative form of the personal pronoun is required!

Just as an example, let's compare these two 3-word n-grams:

1) Ich bedanke mich

2) Ich bedanke mir  

As you can see, "Ich bedanke mir" is not German! ;) Indeed, Ngram Viewer could not find any occurrence of "Ich bedanke mir"!

Another example (although this one can be checked in a dictionary too...):

1) Ich denke an dich
2) Ich denke an dir

No results were found for "Ich denke an dir", so better not to use this expression... in an SMS or a letter ;)

Don't be surprised, but Google Ngram Viewer is way more powerful than what I have shown you so far!
For example, you can use mathematical operators to make comparisons even clearer.

Let's compare Bigfoot (also known as Sasquatch) with the Loch Ness monster (also known as Nessie) by entering the expression:

(Sasquatch+Bigfoot), (Loch Ness monster+Nessie)

OK, it seems that Bigfoot is way more popular ... but can we be more accurate? Of course we can!

Let's graph (Sasquatch+Bigfoot)/(Loch Ness monster+Nessie)
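To see what these operators do to the curves, here is the same arithmetic done by hand in Python. The numbers are completely made up for illustration (they are NOT real Ngram Viewer data), and I am assuming the viewer applies `+` and `/` year by year to the individual curves:

```python
# Hypothetical occurrences per year, invented purely for this example
years = [1980, 1990, 2000]
counts = {
    "Sasquatch": [2, 3, 4],
    "Bigfoot": [6, 9, 12],
    "Loch Ness monster": [3, 2, 4],
    "Nessie": [1, 2, 4],
}


def add(series_a, series_b):
    """Year-by-year sum, like (A+B) in the search field."""
    return [a + b for a, b in zip(series_a, series_b)]


def divide(numerator, denominator):
    """Year-by-year ratio, like (A)/(B) in the search field."""
    return [n / d for n, d in zip(numerator, denominator)]


bigfoot_total = add(counts["Sasquatch"], counts["Bigfoot"])
nessie_total = add(counts["Loch Ness monster"], counts["Nessie"])
ratio = divide(bigfoot_total, nessie_total)

for year, value in zip(years, ratio):
    print(year, value)
# 1980 2.0
# 1990 3.0
# 2000 2.0
```

A ratio above 1 means the Bigfoot names show up more often than the Nessie names in that year, so one curve tells the whole story instead of two.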

Other possibilities offered by Ngram Viewer include tags, and with them you can make very complicated (and interesting) graphs!
If you want to learn more, just visit the Google Help Page of Ngram Viewer at this link

1 comment:

Anonymous said...

that's just amazing! this will help me a lot, for sure