How do personal assistants typically generate sentences?

algorithms, data-structures, natural-language-processing

This is sort of a follow-up to this question about NLG research directions in the linguistics field.

How do personal assistant tools such as Siri, Google Now, or Cortana perform Natural Language Generation (NLG)? Specifically, the sentence text generation part; I am not interested in the text-to-speech part, just the text generation.

I'm not looking for exactly how each one does it, as that information is probably not available.

I am wondering what setup is required to implement sentence generation of that quality.

  • What kind of data would you need in a database (at a high level)?
    • Does it require a dictionary of every possible word and its meaning, along with many annotated, statistically analyzed books/corpora added to it?
    • Does it require recording people talking in a natural way (such as on TV shows or podcasts), transcribing the speech to text, and then somehow adding that to the "system"? (to get really "human"-like sentences)
    • Or are they just using simple syntax-based sentence patterns, with no gigantic semantic "meaning" database, where someone essentially wrote a bunch of regular-expression-style rules?
  • What are the algorithms that are used for such naturally written human-like sentences?
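To make the last bullet concrete, here is a minimal sketch of what such a "regular expressions type thing" might look like. The patterns, intent names, and slot names below are all hypothetical, purely to illustrate the idea of pattern-based keyword spotting:

```python
import re

# Hypothetical utterance patterns mapping a spoken request to an
# intent label plus captured slot values. Real systems would have
# many more patterns and far more robust matching.
PATTERNS = [
    (re.compile(r"find (?:me )?(?P<cuisine>\w+) restaurants", re.I),
     "restaurant_search"),
    (re.compile(r"play (?P<song>.+) from my music", re.I),
     "play_music"),
]

def match_intent(utterance):
    """Return (intent, slots) for the first matching pattern, else (None, {})."""
    for pattern, intent in PATTERNS:
        m = pattern.search(utterance)
        if m:
            return intent, m.groupdict()
    return None, {}

print(match_intent("Find me Thai restaurants nearby"))
# ('restaurant_search', {'cuisine': 'Thai'})
```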

One reason for asking is that the NLG field seems very far from being able to do what Siri, Google Now, and the others are accomplishing. So what kind of stuff are they actually doing? (Just for the sentence text generation part.)

Best Answer

Siri typically doesn't "generate" sentences. She parses what you say and 'recognizes' certain keywords, sure, and for common responses, she will use a template, such as I found [N] restaurants fairly close to you or I couldn't find [X] in your music, [Username].
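A template of that kind is essentially string interpolation over named slots. Here is a minimal sketch, assuming hypothetical intent names and slot values (the templates mirror the bracketed examples above):

```python
# Hypothetical response templates keyed by recognized intent; the named
# slots ({n}, {query}, {username}) correspond to the bracketed [N], [X],
# and [Username] placeholders mentioned above.
TEMPLATES = {
    "restaurant_search": "I found {n} restaurants fairly close to you.",
    "music_not_found": "I couldn't find {query} in your music, {username}.",
}

def render(intent, **slots):
    """Fill the template for a recognized intent with the given slot values."""
    return TEMPLATES[intent].format(**slots)

print(render("restaurant_search", n=7))
# prints: I found 7 restaurants fairly close to you.
```

No generation in the linguistic sense happens here; all the "writing" was done ahead of time by whoever authored the templates.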

But most of her responses are canned, depending on her interpretation of your speech, in addition to a random number generator to choose a creative answer to a flippant question. Simply asking Siri "How much wood can a woodchuck chuck?" or "What is the meaning of life?" will generate any of a variety of answers. There are numerous cultural references and jokes built-in (and repeated verbatim) that prove with relative certainty that Siri is not just spontaneously generating most of her text, but pulling it from a database of some sort. It's likely that incoming questions are saved to a central server, where new responses to those questions can be created by Apple employees, allowing Siri to "learn".
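That canned-plus-random scheme can be sketched in a few lines. The table below is entirely invented for illustration; these are not Siri's actual responses:

```python
import random

# Hypothetical canned-answer table: each flippant question maps to a
# list of pre-written replies, and one is picked at random each time.
CANNED = {
    "how much wood can a woodchuck chuck": [
        "It depends on whether you mean African or European woodchucks.",
        "42 cords, give or take.",
    ],
    "what is the meaning of life": [
        "42.",
        "I can't answer that now, but give me some time to write a very long play.",
    ],
}

def canned_reply(question):
    """Look up a normalized question and return one of its canned answers."""
    answers = CANNED.get(question.lower().rstrip("?"))
    if answers is None:
        return "I don't understand."
    return random.choice(answers)
```

The random choice is what makes repeated askings feel varied even though every reply was written in advance.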

Her text-to-speech, however, is good enough that it sometimes makes it seem as though the answers are being generated on the fly...
