Triples – Deep Natural Language Processing

Problem: In text mining, extracting keywords (n-grams) alone cannot produce meaningful data or uncover “unknown” themes and trends.

Objective: The aim here is to extract dependency relations from sentences, i.e., to extract sets of the form {subject, predicate [modifiers], object} from syntactically parsed sentences, using the Stanford Parser and OpenNLP.

Steps: 

1) Get the syntactic relationship between each pair of words

2) Apply sentence segmentation to determine the sentence boundaries

3) The Stanford Parser is then applied to generate output in the form of dependency relations, which represent the syntactic relationships within each sentence
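To make the sentence segmentation step concrete, here is a minimal Python sketch. It is only an illustration and assumes a naive regular-expression rule (split after '.', '!' or '?' when the next word starts with a capital letter). The helper name split_sentences is hypothetical, and a real segmenter such as the one in OpenNLP handles abbreviations and other edge cases that this rule does not.

```python
import re

# Naive boundary rule (an assumption for illustration): a sentence ends at
# '.', '!' or '?' when followed by whitespace and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Return the list of sentences found in text, using the naive rule."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


sentences = split_sentences(
    "The flat tire was not changed by driver. The car stayed in the garage."
)
for sentence in sentences:
    print(sentence)
```

Each resulting sentence would then be passed to the parser on its own.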

How is this different from n-grams?

Dependency relations allow similarity comparison to be based on the syntactic relations between words, rather than on matching words in their exact order, as n-gram-based comparisons require.
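A small Python sketch may help illustrate the point. It compares an active and a passive version of the same statement by their bigrams (word pairs in exact order). The two example sentences, the helper names, and the hand-written triples are assumptions made for illustration; the triples are what a dependency-based extraction would be expected to produce, not actual parser output.

```python
def bigrams(sentence):
    """Return the set of adjacent word pairs in a lowercased sentence."""
    words = sentence.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


active = "The driver changed the tire"
passive = "The tire was changed by the driver"

# Word order differs, so only a few bigrams are shared.
bigram_similarity = jaccard(bigrams(active), bigrams(passive))

# Hand-written (subject, predicate, object) triples for the two sentences.
active_triple = ("driver", "changed", "tire")
passive_triple = ("driver", "changed", "tire")

print(bigram_similarity)
print(active_triple == passive_triple)
```

The bigram overlap is low even though both sentences say the same thing, while the triples match exactly.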

Example:

Sentence: “The flat tire was not changed by driver”

Stanford dependency relations: 

root(ROOT-0, changed-6)
det(tire-3, The-1)
amod(tire-3, flat-2)
nsubjpass(changed-6, tire-3)
auxpass(changed-6, was-4)
neg(changed-6, not-5)
prep(changed-6, by-7)
pobj(by-7, driver-8)

Refer to the Stanford typed dependencies manual for the full list and more information: http://nlp.stanford.edu/software/dependencies_manual.pdf
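The relations listed above are plain text of the form relation(governor-index, dependent-index), so they need to be parsed before any logic can be applied to them. The following Python sketch shows one way to do that. The Dependency tuple and the parse_dependency helper are hypothetical names, and the regular expression assumes the exact text layout shown in the example.

```python
import re
from collections import namedtuple

# Hypothetical container for one parsed relation.
Dependency = namedtuple(
    "Dependency", "relation governor gov_index dependent dep_index"
)

# Assumes the layout 'relation(governor-N, dependent-M)' shown in the example.
_PATTERN = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_dependency(line):
    """Parse one relation string into a Dependency tuple."""
    match = _PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a dependency relation: {line!r}")
    relation, governor, gov_index, dependent, dep_index = match.groups()
    return Dependency(relation, governor, int(gov_index), dependent, int(dep_index))


dep = parse_dependency("nsubjpass(changed-6, tire-3)")
print(dep.relation, dep.governor, dep.dependent)
```

Keeping the word positions is useful because the same word can occur more than once in a sentence.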

Triples output in the form (Subject : Predicate [modifier] : Object) :  

driver : changed [not] : tire

Extraction Logic: You can use the base logic below to build the functionality in your preferred language (R, Python, Java, etc.). Please note that this is only the base logic and needs enhancement.

[Image: triples extraction logic]
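As a rough illustration of such base logic, here is a minimal Python sketch. It is not the original logic from this article but an assumed set of rules: nsubj gives the subject, dobj gives the object, nsubjpass gives the object of a passive verb, a "by" prep with its pobj gives the subject of a passive verb, and neg gives a modifier. The function names are hypothetical, and the input is assumed to be (relation, governor, dependent) tuples with the position suffix kept on each word.

```python
def word(token):
    """Strip the position suffix: 'changed-6' -> 'changed'."""
    return token.rsplit("-", 1)[0]


def extract_triples(dependencies):
    """Return (subject, predicate, modifiers, object) tuples for one sentence."""
    subjects, objects, modifiers = {}, {}, {}
    by_preps, pobjs, passive_verbs = {}, {}, set()
    for rel, gov, dep in dependencies:
        if rel == "nsubj":
            subjects[gov] = dep
        elif rel == "dobj":
            objects[gov] = dep
        elif rel == "nsubjpass":
            # The grammatical subject of a passive verb is its logical object.
            objects[gov] = dep
            passive_verbs.add(gov)
        elif rel == "neg":
            modifiers.setdefault(gov, []).append(dep)
        elif rel == "prep" and word(dep).lower() == "by":
            by_preps[gov] = dep
        elif rel == "pobj":
            pobjs[gov] = dep
    # For passive verbs, the object of "by" is the logical subject.
    for verb in passive_verbs:
        prep = by_preps.get(verb)
        if prep in pobjs:
            subjects[verb] = pobjs[prep]
    return [
        (word(subj), word(verb),
         [word(m) for m in modifiers.get(verb, [])], word(objects[verb]))
        for verb, subj in subjects.items() if verb in objects
    ]


def format_triple(triple):
    """Render a triple as 'Subject : Predicate [modifier] : Object'."""
    subj, pred, mods, obj = triple
    predicate = f"{pred} [{', '.join(mods)}]" if mods else pred
    return f"{subj} : {predicate} : {obj}"


example = [
    ("root", "ROOT-0", "changed-6"), ("det", "tire-3", "The-1"),
    ("amod", "tire-3", "flat-2"), ("nsubjpass", "changed-6", "tire-3"),
    ("auxpass", "changed-6", "was-4"), ("neg", "changed-6", "not-5"),
    ("prep", "changed-6", "by-7"), ("pobj", "by-7", "driver-8"),
]
formatted = [format_triple(t) for t in extract_triples(example)]
print(formatted)  # ['driver : changed [not] : tire']
```

Rules for other relations (for example conjunctions, clausal complements, or the collapsed agent relation) would need to be added on top of this.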

Challenges:

  • Sentence-level analysis is too structured
  • Abbreviations and grammatical errors in a sentence will mislead the analysis

 

Hope this article is useful!
