Entity extraction and text analysis is the process of acquiring textual information, running it through natural language processing for topic analysis and sentiment analysis, and then applying the results to whatever use case your requirements call for.
If you were asked to come up with a new marketing concept to appeal to a particular demographic, for example, you could use a service such as AlchemyAPI to transform vast numbers of tweets into manageable, actionable data and gain insight into what people are talking about. This would include the topics people are tweeting about, as well as their sentiment toward those topics: positive, neutral or negative.
In his recent tutorial on Automating OSINT, Justin Seitz uses AlchemyAPI with Python to analyse a collection of documents recently released by the Office of the Director of National Intelligence. The release covers the contents of Osama bin Laden’s bookshelf, including books he was reading and letters he had written.
After installing the prerequisites and signing up for an API key, the author supplies a (self-admittedly ugly) Python script to download all of the documents. Using AlchemyAPI, readers are then guided through the entity extraction process to discover the most frequently mentioned person, place or thing in the letters. This involves opening each PDF, extracting the text and submitting it to Alchemy for processing. The results from each document are then combined to give an overall tally.
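The final step, combining per-document results into one overall tally, can be sketched roughly as follows. This is not the code from the tutorial: the function names and sample text are made up for illustration, and `toy_extractor` is a hypothetical stand-in for the AlchemyAPI request, which in the real workflow would return the entities found in each document's text.

```python
from collections import Counter


def combine_entity_counts(documents, extract_entities):
    """Sum per-document entity counts into one overall tally.

    documents: iterable of plain-text strings (text already pulled from each PDF).
    extract_entities: callable returning (entity_name, count) pairs for one text.
    In the tutorial's workflow this role is played by the AlchemyAPI request;
    here it is a parameter so any extractor can be plugged in.
    """
    totals = Counter()
    for text in documents:
        for name, count in extract_entities(text):
            totals[name] += count
    return totals


def toy_extractor(text, known_entities=("Alice", "Bob", "Paris")):
    """Hypothetical stand-in: counts occurrences of a fixed list of names."""
    return [(name, text.count(name)) for name in known_entities if name in text]


# Made-up sample documents, standing in for text extracted from the PDFs.
docs = [
    "Alice wrote to Bob. Alice was in Paris.",
    "Bob replied to Alice from Paris.",
]

totals = combine_entity_counts(docs, toy_extractor)

# List the most frequently mentioned entities across all documents.
for name, count in totals.most_common(3):
    print(name, count)
```

Passing the extractor in as an argument keeps the aggregation logic independent of any particular service, so the same tally works whichever entity extraction backend supplies the per-document results.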
The author assumes no prior knowledge of coding, although he does reference previous posts for instructions on how to install specific tools. Links are all included, as are useful line-by-line descriptions of the code, which make the tutorial simple to follow and make it easy to perform entity extraction on your own.