Word Embedding Captures Semantic Relationships

November 10, 2016

The article on O’Reilly titled Capturing Semantic Meanings Using Deep Learning explores word embedding in natural language processing. NLP systems typically encode words as plain strings, but word embedding offers a more complex approach that emphasizes relationships and similarities between words by treating them as vectors. The article posits,

For example, let’s take the words woman, man, queen, and king. We can get their vector representations and use basic algebraic operations to find semantic similarities. Measuring similarity between vectors is possible using measures such as cosine similarity. So, when we subtract the vector of the word man from the vector of the word woman, then its cosine distance would be close to the distance between the word queen minus the word king (see Figure 1).
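
The arithmetic in that passage can be sketched in a few lines of Python. This is only an illustration, not code from the O’Reilly article: the three-dimensional vectors below are invented for the example (real embeddings are learned from text and have hundreds of dimensions), and the helper names subtract and cosine_similarity are hypothetical.

```python
import math

# Toy vectors, invented for illustration only. Real word embeddings are
# learned from large text corpora and are much higher-dimensional.
vectors = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [0.9, 1.1, 0.95],
}

def subtract(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Offset from "man" to "woman", and from "king" to "queen".
woman_minus_man = subtract(vectors["woman"], vectors["man"])
queen_minus_king = subtract(vectors["queen"], vectors["king"])

# If the embedding captures the relationship, the two offsets should
# point in nearly the same direction.
similarity = cosine_similarity(woman_minus_man, queen_minus_king)
print(f"cosine similarity of the two offsets: {similarity:.4f}")
```

With these made-up numbers the two offsets point in almost the same direction, so the similarity comes out close to 1, which is the pattern the quoted passage describes.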

The article surveys neural network models that reduce the expense of working with large data sets. It covers Word2Vec and its two architectures, continuous bag-of-words (CBOW) and continuous skip-gram, and goes into great technical detail about the entire process. The final result is that the vectors capture the semantic relationship between the words in the example. Why does this approach to NLP matter? A few applications include predicting future business applications, sentiment analysis, and semantic image searches.

Chelsea Kerwin, November 10, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

ACA Application Process Still Vulnerable to Fraudulent Documents

November 20, 2015

The post on Slashdot titled Affordable Care Act Exchanges Fail to Detect Counterfeit Documentation relates the ongoing issue of document verification within the Affordable Care Act (ACA) process. The Government Accountability Office (GAO) submitted fake applications to test the controls at the state and federal level for application and enrollment in the ACA. The article states,

“Ten fictitious applicants were created to test whether verification steps including validating an applicant’s Social Security number, verifying citizenship, and verifying household income were completed properly. In order to test these controls, GAO’s test applications provided fraudulent documentation: “For each of the 10 undercover applications where we obtained qualified health-plan coverage, the respective marketplace directed that our applicants submit supplementary documentation we provided counterfeit follow-up documentation, such as fictitious Social Security cards with impossible Social Security numbers, for all 10…”

The GAO report itself mentions that eight of the ten fake applications failed at first but were later accepted. It shows that the applications were fraudulent in several ways, including not only “impossible” Social Security numbers but also duplicate enrollments and lack of employer-sponsored coverage. Ultimately, the report concludes that the ACA is still “vulnerable.” Granted, this is why the GAO conducted the audit of the system: to catch issues. The article provides no details on what new controls and fixes are being implemented.

Chelsea Kerwin, November 20, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
