Natural Language Processing with iOS


Natural Language Processing is the art of turning what humans naturally say or write, including its context, into something that a computer can understand.

We will look at NSLinguisticTagger, which identifies what kind of word is being used. We will also look at NSDataDetector, which is just as important for pulling structured data out of text, whether or not that text comes from human speech.

These techniques are not limited to Siri-like speech recognition. They also apply to ordinary text, such as a page in a web browser like Safari, where we can offer an action on a particular piece of text, for example placing a call when the user taps a phone number.

NSLinguisticTagger

NSLinguisticTagger is an API for natural language processing that lets us tokenize text, detect its language, and determine parts of speech. Best of all, this is all done locally on the machine. With Swift, we can also try the code out in a playground.

We will start with base code that tokenizes each word of the sentence we want to test.

This kind of API makes technology feel more human, because it tries to understand the context of what a person wants to say.


import Foundation

typealias TaggedToken = (String, String?)

// Swift 4 and later spell the scheme constants as NSLinguisticTagScheme values,
// e.g. .lexicalClass for NSLinguisticTagSchemeLexicalClass.
func tag(text: String, scheme: NSLinguisticTagScheme) -> [TaggedToken] {
  let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .omitOther]
  let tagger = NSLinguisticTagger(tagSchemes: NSLinguisticTagger.availableTagSchemes(forLanguage: "en"),
    options: Int(options.rawValue))
  tagger.string = text
  
  var tokens: [TaggedToken] = []
  
  // NSRange counts UTF-16 code units, so build the range from the string itself
  // rather than from its character count (the two differ for emoji, for example).
  let range = NSRange(text.startIndex..., in: text)
  
  // Using NSLinguisticTagger
  tagger.enumerateTags(in: range, scheme: scheme, options: options) { tag, tokenRange, _, _ in
    let token = (text as NSString).substring(with: tokenRange)
    tokens.append((token, tag?.rawValue))
  }
  return tokens
}

// Implementation

func partOfSpeech(text: String) -> [TaggedToken] {
  return tag(text: text, scheme: .lexicalClass)
}

func lemmatize(text: String) -> [TaggedToken] {
  return tag(text: text, scheme: .lemma)
}

func language(text: String) -> [TaggedToken] {
  return tag(text: text, scheme: .language)
}

// Try them out
partOfSpeech(text: "Tutor at work yesterday at 2pm until yesterday at 6pm.")
lemmatize(text: "I went to the store")
language(text: "My name is Hijazi")

Let's discuss what each tag scheme actually figures out.

NSLinguisticTagSchemeLexicalClass

This tag scheme classifies tokens according to class: part of speech for words, type of punctuation or whitespace, etc. The value will be one of the lexical class constants listed below.

So every token can be tagged with one of the following lexical class strings: Noun, Verb, Adjective, Adverb, Pronoun, Determiner, Particle, Preposition, Number, Conjunction, Interjection, Classifier, Idiom, OtherWord, SentenceTerminator, OpenQuote, CloseQuote, OpenParenthesis, CloseParenthesis, WordJoiner, Dash, OtherPunctuation, ParagraphBreak, and OtherWhitespace.
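As a minimal sketch of how these lexical classes can be put to use, the snippet below keeps only the words that carry a given tag. The helper name `words(in:taggedAs:)` is hypothetical and not part of the original code, and the sketch assumes the Swift 4 and later spellings (`.lexicalClass`, `NSLinguisticTag.noun`). The tags you get back depend on the system's language model, so treat the printed result as something to inspect rather than a guarantee.

```swift
import Foundation

// Hypothetical helper (not from the article): collect the words whose
// lexical class tag equals `lexicalClass`.
func words(in text: String, taggedAs lexicalClass: NSLinguisticTag) -> [String] {
  let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
  tagger.string = text

  let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .omitOther]
  // NSRange counts UTF-16 code units, so derive it from the string itself.
  let range = NSRange(text.startIndex..., in: text)

  var matches: [String] = []
  tagger.enumerateTags(in: range, scheme: .lexicalClass, options: options) { tag, tokenRange, _, _ in
    // Skip tokens with a different tag (or no tag at all).
    guard tag == lexicalClass, let swiftRange = Range(tokenRange, in: text) else { return }
    matches.append(String(text[swiftRange]))
  }
  return matches
}

let sentence = "I went to the store"
print(words(in: sentence, taggedAs: .noun))
print(words(in: sentence, taggedAs: .verb))
```

Filtering inside the enumeration block avoids building the full list of tagged tokens when only one class is of interest.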

NSLinguisticTagSchemeLemma

This tag scheme supplies the stem form of each word, if known. For example, in “I went to the store” we can find out that “went” is the past tense of “go”.
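Because a lemma is only supplied "if known", code that consumes it needs a fallback. Here is a rough sketch, assuming the Swift 4 and later API spellings; the helper name `lemmas(of:)` is hypothetical. It returns one entry per word and falls back to the word itself whenever the tagger has nothing to offer.

```swift
import Foundation

// Hypothetical helper (not from the article): one lemma per word,
// falling back to the original word when no lemma is known.
func lemmas(of text: String) -> [String] {
  let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
  tagger.string = text

  let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .omitOther]
  let range = NSRange(text.startIndex..., in: text)

  var result: [String] = []
  tagger.enumerateTags(in: range, scheme: .lemma, options: options) { tag, tokenRange, _, _ in
    guard let swiftRange = Range(tokenRange, in: text) else { return }
    let word = String(text[swiftRange])
    // A missing or empty tag means the tagger could not supply a lemma.
    if let lemma = tag?.rawValue, !lemma.isEmpty {
      result.append(lemma)
    } else {
      result.append(word)
    }
  }
  return result
}

print(lemmas(of: "I went to the store"))
```

Keeping the original word as the fallback means the output always lines up one-to-one with the input words, which is convenient for search indexing or keyword matching.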

NSLinguisticTagSchemeLanguage

This tag scheme determines the language of the text. The tag values are standard language abbreviations such as “en”, “fr”, “de”, etc.
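If all you need is a single language code for a whole string, rather than a tag per token, a shorter route is the `NSLinguisticTagger.dominantLanguage(for:)` class method (available from iOS 11 and macOS 10.13). The sketch below wraps it in a hypothetical helper, `languageCode(of:)`, that is not part of the original code. Very short strings give the detector little to work with, so the result is optional and worth checking.

```swift
import Foundation

// Hypothetical helper (not from the article): the most likely language
// of the whole string, or nil if the tagger cannot decide.
func languageCode(of text: String) -> String? {
  return NSLinguisticTagger.dominantLanguage(for: text)
}

let samples = [
  "The quick brown fox jumps over the lazy dog.",
  "こんにちは、お元気ですか。"
]

for sample in samples {
  // Print "unknown" when no language could be determined.
  print(sample, "->", languageCode(of: sample) ?? "unknown")
}
```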

Results

Here are the screenshots for all the tests.

[Screenshot: screen-shot-2016-11-02-at-3-05-05-am]

Language detection also works for languages such as Japanese, which do not put spaces between the words of a sentence.

[Screenshot: screen-shot-2016-11-02-at-3-09-32-am]


NSDataDetector

NSDataDetector is a subclass of NSRegularExpression, but instead of matching on an ICU pattern, it detects semi-structured information: dates, addresses, links, phone numbers and transit information.

It does all of this with frightening accuracy. NSDataDetector will match flight numbers, address snippets, oddly formatted digits, and even relative deictic expressions like “next Saturday at 5”.


import Foundation

let testString = "You may call my number at +6016-337-3081, or visit irekasoft.com, irekasoft.com/blog by next monday at San Jose, California on 1 pm"

// Ask the detector for the four kinds of data we care about.
let types: NSTextCheckingResult.CheckingType = [.address, .date, .phoneNumber, .link]
let dataDetector = try? NSDataDetector(types: types.rawValue)

// NSRange counts UTF-16 code units, so build it from the string itself.
let fullRange = NSRange(testString.startIndex..., in: testString)

dataDetector?.enumerateMatches(in: testString, options: [], range: fullRange) { match, _, _ in
  // Skip progress callbacks that carry no match instead of force-unwrapping.
  guard let match = match, let range = Range(match.range, in: testString) else { return }
  let matchString = String(testString[range])

  switch match.resultType {
  case .date:
    print("date: \(matchString)")
  case .phoneNumber:
    print("phoneNumber: \(matchString)")
  case .address:
    print("address: \(matchString)")
  case .link:
    print("link: \(matchString)")
  default:
    print("else \(matchString)")
  }
}

Result


phoneNumber: +6016-337-3081
link: irekasoft.com
link: irekasoft.com/blog
date: next monday
address: San Jose, California
date: 1 pm
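
Each match carries more than the matched substring: NSTextCheckingResult also exposes typed properties such as `phoneNumber`, `url` and `date`. As a rough sketch of the "tap a phone number to call" idea, the hypothetical helper `callURLs(in:)` below (not part of the original code) turns every detected phone number into a `tel:` URL. Stripping everything except digits and a plus sign is a simplification chosen for this example, not a rule from the API.

```swift
import Foundation

// Hypothetical helper (not from the article): build a tel: URL for each
// phone number that NSDataDetector finds in `text`.
func callURLs(in text: String) -> [URL] {
  let types = NSTextCheckingResult.CheckingType.phoneNumber.rawValue
  guard let detector = try? NSDataDetector(types: types) else { return [] }

  let range = NSRange(text.startIndex..., in: text)
  return detector.matches(in: text, options: [], range: range).compactMap { match in
    // `phoneNumber` is only set for phone number matches.
    guard let number = match.phoneNumber else { return nil }
    // Simplifying assumption: keep only ASCII digits and "+".
    let dialable = number.filter { "+0123456789".contains($0) }
    return URL(string: "tel:\(dialable)")
  }
}

print(callURLs(in: "You may call my number at +6016-337-3081"))
```

On iOS, a URL like this could then be handed to `UIApplication.shared.open(_:)` to start the call.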

Resources:
Ayaka Nonaka’s NLP Presentation
NSDataDetector