Digital Reasoning Releases Synthesys Version 4
December 9, 2016
Digital Reasoning has released the latest iteration of its Synthesys platform, we learn from Datanami’s piece, “Cognitive Platform Sharpens Focus on Unstructured Data.” Readers may recall that Digital Reasoning provides tools to the controversial US Army intelligence system known as DCGS. The write-up specifies:
Version 4 of the Digital Reasoning platform released on Tuesday (June 21) is based on proprietary analytics tools that apply deep learning neural network techniques across text, audio and images. Synthesys 4 also incorporates behavioral analytics based on anomaly detection techniques.
The upgrade also reflects the company’s push into user and ‘entity’ behavior analytics, a technique used to leverage machine learning in security applications such as tracking suspicious activity on enterprise networks and detecting ransomware attacks. ‘We are especially excited to expand into the area of entity behavior analytics, combining the analysis of structured and unstructured data into a person-centric, prioritized profile that can be used to predict employees at risk for insider threats,’ Bill DiPietro, Digital Reasoning’s vice president of product management noted in a statement.
The platform has added Spanish and Chinese to its supported languages, which come with syntactic parsing. There is also now support for Elasticsearch, included in the pursuit of leveraging unstructured data in real time. The company emphasizes the software’s ability to learn from context, as well as enhanced tools for working with reports.
Digital Reasoning was founded in 2000, and makes its primary home in Nashville, Tennessee, with offices in Washington, DC, and London. The booming company is also hiring, especially in the Nashville area.
Cynthia Murrell, December 9, 2016
Search Competition Is Fiercer Than You Expect
December 5, 2016
In the United States, Google dominates the Internet search market. Bing has gained some traction, but the results are still muddy. In Russia, Yandex chases Google around in circles. But what about the enterprise search market? It has more competition than one would think. We recently received an email from SearchBlox, a cognitive platform developed to help organizations embed information in applications using artificial intelligence and deep learning models. SearchBlox is also a player in the enterprise software market, as well as a text analytics and sentiment analysis tool.
Their email offered “3 Reasons To Choose SearchBlox Cognitive Platform,” and here they are:
1. EPISTEMOLOGY-BASED. Go beyond just question and answers. SearchBlox uses artificial intelligence (AI) and deep learning models to learn and distill knowledge that is unique to your data. These models encapsulate knowledge far more accurately than any rules based model can create.
2. SMART OPERATION. Building a model is half the challenge. Deploying a model to process big data can be even more challenging. SearchBlox is built on open source technologies like Elasticsearch and Apache Storm and is designed to use its custom models for processing high volumes of data.
3. SIMPLIFIED INTEGRATION. SearchBlox is bundled with over 75 data connectors supporting over 40 file formats. This dramatically reduces the time required to get your data into SearchBlox. The REST API and the security capabilities allow external applications to easily embed the cognitive processing.
To us, this sounds like what enterprise search has been offering since before big data and artificial intelligence became buzzwords. Not to mention, SearchBlox’s competitors have said the same thing. What makes SearchBlox different? The company claims to be less expensive, and it has won several accolades. SearchBlox is built on open source technology, which allows it to lower the price. Elasticsearch is the most popular open source search software, but what is funny is that SearchBlox is like a repackaged version of said Elasticsearch. Mind you, you are paying for a program that is already developed, but SearchBlox is trying to compete with other outfits like Yippy.
Whitney Grace, December 5, 2016
SearchBlox 8.5 Now Available
September 28, 2016
A brief write-up at DataQuest, “AI-Based Cognitive Business Reasoning with SearchBlox v8.5,” informs us about the latest release of the enterprise-search, sentiment-analysis, and text-analytics software. The press release describes this edition:
“Version 8.5 features the addition of new connectors including streaming, API and storage data sources bringing the total number of available sources to 75. This new release allows customers to use advanced entity extraction (person, organization, product, title, location, date, time, urls, identifiers, phone, email, money, distance) from 18 different languages within unstructured data streams on a real time basis. Use cases include advanced federated search, fraud or anomaly detection, content recommendations, smart business workflows, customer experience management and ecommerce optimization solutions. SearchBlox can use your existing data to build AI based cognitive learning models for your most complex use cases.”
The write-up describes the three key features of SearchBlox 8.5: The new connectors mentioned above include Magento, YouTube, ServiceNow, MS Exchange, Twilio, Office 365, Quandl, Cassandra, Google BigQuery, Couchbase, HBase, Solr, and Elasticsearch. Their entity extraction tool functions in 18 languages. And users can now leverage the AI to build learning models for specific use cases. The new release also fixes some bugs and implements performance improvements.
Cynthia Murrell, September 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Elastic Links Search and Social Through Graph Capabilities
September 13, 2016
The article on The Register titled “Confused About Relationships? Elasticsearch Gets Graphic” describes the latest offering from Elasticsearch, the open-source search server based on Apache’s Lucene. Graph capabilities are an exciting new twist on search that enables users to map out relationships through the search engine and the Kibana data visualization plug-in. The article explains,
By fusing graph with search, Elastic hopes to combine the power of social with that earlier great online revolution, the revolution that gave us Google: search. Graph in Elasticsearch establishes relevance by establishing the significance of each relationship versus the global average to return important results. That’s different to what Elastic called “traditional” relationship mapping, which is based on a count of the frequency of a given relationship.
Elasticsearch sees potential for their Graph capabilities in behavioral analysis, particularly in areas such as drug discovery, fraud detection, and customized medicine and recommendations. When it comes to identifying business opportunities, Graph databases have already proven their value. Discovering connections and trimming degrees of separation are all of vital importance in social media. Social networks like Twitter have been using them since the beginning of NoSQL. Indeed, Facebook is a customer of Elastic, the business version of Elasticsearch that was founded in 2012. Other users of Elasticsearch include Netflix, StumbleUpon, and Mozilla.
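The contrast drawn in the quoted passage, between counting how often a relationship occurs and weighing it against a global average, can be illustrated with a small sketch. The scoring function below is modeled loosely on the JLH heuristic documented for Elasticsearch's significant terms aggregation; whether Graph scores relationships in exactly this way is an assumption, and the artist names and counts are invented purely for illustration.

```python
def significance(fg_count, fg_total, bg_count, bg_total):
    """JLH-style score: how over-represented a term is in the
    foreground (matching) set compared with the background (global) set.
    Illustrative only; not taken from Elastic's source code."""
    fg_rate = fg_count / fg_total
    bg_rate = bg_count / bg_total
    if bg_rate == 0 or fg_rate <= bg_rate:
        return 0.0  # no more common than the global average
    # Absolute lift multiplied by relative lift.
    return (fg_rate - bg_rate) * (fg_rate / bg_rate)


# Hypothetical data: 100 listeners of one band (foreground) drawn from
# a catalogue of 1,000,000 listeners (background).
FG_TOTAL, BG_TOTAL = 100, 1_000_000
fg_counts = {"popular_artist": 50, "niche_artist": 20}
bg_counts = {"popular_artist": 400_000, "niche_artist": 1_000}

# "Traditional" mapping: rank by raw frequency of the relationship.
by_count = sorted(fg_counts, key=fg_counts.get, reverse=True)

# Significance mapping: rank by lift over the global average.
by_significance = sorted(
    fg_counts,
    key=lambda t: significance(fg_counts[t], FG_TOTAL, bg_counts[t], BG_TOTAL),
    reverse=True,
)

print(by_count)         # raw counts favor the globally popular artist
print(by_significance)  # significance favors the niche artist
```

In this made-up data the popular artist appears most often among the listeners, but it is almost as common everywhere else, so it says little about this group. The niche artist is rare globally and common here, which is the kind of relationship a significance-based ranking surfaces.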
Chelsea Kerwin, September 13, 2016
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/
Update from Lucene
May 10, 2016
It has been a while since we heard about our old friend Apache Lucene, but the open source search engine has something new, says Open Source Connections in the article, “BM25 The Next Generation Of Lucene Relevance.” Lucene has added BM25 to its search software, and it just might improve search results.
“BM25 improves upon TF*IDF. BM25 stands for “Best Match 25”. Released in 1994, it’s the 25th iteration of tweaking the relevance computation. BM25 has its roots in probabilistic information retrieval. Probabilistic information retrieval is a fascinating field unto itself. Basically, it casts relevance as a probability problem. A relevance score, according to probabilistic information retrieval, ought to reflect the probability a user will consider the result relevant.”
Apache Lucene formerly relied on TF*IDF to rank how relevant a text match is to users. TF*IDF relies on two factors: term frequency, or how often a term appears in a document, and inverse document frequency (IDF), which measures how many documents the term appears in and so determines how “special” it is. BM25 improves on the old TF*IDF, but its classic IDF formula can give negative scores for terms that have a very high document frequency. The IDF in Lucene’s BM25 solves this problem by adding 1 before taking the logarithm, which makes it impossible to deliver a negative value.
BM25 will have a big impact on Solr and Elasticsearch, improving the accuracy of search results through features such as term frequency saturation.
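For readers who want to see the arithmetic, here is a minimal sketch of the two ideas mentioned above: an IDF that cannot go negative, and term frequency saturation. The formulas follow the commonly published BM25 definitions; the function names are our own, and the parameter values k1 = 1.2 and b = 0.75 are the usual defaults, assumed here rather than taken from the article.

```python
import math


def classic_bm25_idf(num_docs, doc_freq):
    """Textbook BM25 IDF. Goes negative once a term appears in
    more than half of the documents."""
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def plus_one_bm25_idf(num_docs, doc_freq):
    """IDF with 1 added before the logarithm, so the result is
    always positive."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(term_freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Saturating term frequency: grows with term_freq but never
    exceeds k1 + 1. Longer-than-average documents are penalized via b."""
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return term_freq * (k1 + 1.0) / (term_freq + length_norm)


def bm25_score(term_freq, doc_freq, num_docs, doc_len, avg_doc_len):
    """Score of one query term against one document."""
    return plus_one_bm25_idf(num_docs, doc_freq) * bm25_tf(
        term_freq, doc_len, avg_doc_len
    )


if __name__ == "__main__":
    # A term found in 90 of 100 documents: negative under the classic IDF,
    # small but positive once 1 is added.
    print(classic_bm25_idf(100, 90), plus_one_bm25_idf(100, 90))

    # Saturation: each extra occurrence adds less than the one before.
    for tf in (1, 2, 5, 20, 100):
        print(tf, round(bm25_tf(tf, doc_len=100, avg_doc_len=100), 3))
```

Under plain TF*IDF, a document that repeats a term many times keeps gaining score. With saturation, the contribution levels off, so stuffing a document with one word stops paying off after the first few occurrences.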
Whitney Grace, May 10, 2016
An Open Source Search Engine to Experiment With
May 1, 2016
Apache Lucene receives the most headlines when it comes to discussion about open source search software. My RSS feed pulled up another open source search engine that shows promise in being a decent piece of software. Open Semantic Search is free software that can be used for text mining, analytics, a search engine, data explorer, and other research tools. It is based on Elasticsearch/Apache Solr’s open source enterprise search. It was designed with open standards and with a robust semantic search.
As with any open source search, it can be programmed with numerous features based on the user’s preference. These include tagging, annotation, varying file format support, multiple data sources support, data visualization, newsfeeds, automatic text recognition, faceted search, interactive filters, and more. It has the benefit that it can be programmed for mobile platforms, metadata management, and file system monitoring.
Open Semantic Search is described as
“Research tools for easier searching, analytics, data enrichment & text mining of heterogeneous and large document sets with free software on your own computer or server.”
While its base code is derived from Apache Lucene, it takes the original product and builds something better. Proprietary software is an expense dubbed a necessary evil if you work in a large company. If, however, you are a programmer and have the time to develop your own search engine and analytics software, do it. It could even turn out better than the proprietary stuff.
Whitney Grace, May 1, 2016
Elasticsearch Works for Us 24/7
February 5, 2016
Elasticsearch is one of the most popular open source search applications, and it has been deployed for personal as well as corporate use. Elasticsearch is built on another popular open source application called Apache Lucene, and it was designed for horizontal scalability, reliability, and easy usage. Elasticsearch has become such an invaluable piece of software that people do not realize just how useful it is. eWeek takes the opportunity to discuss the search application’s uses in “9 Ways Elasticsearch Helps Us, From Dawn To Dusk.”
“With more than 45 million downloads since 2012, the Elastic Stack, which includes Elasticsearch and other popular open-source tools like Logstash (data collection), Kibana (data visualization) and Beats (data shippers) makes it easy for developers to make massive amounts of structured, unstructured and time-series data available in real-time for search, logging, analytics and other use cases.”
How is Elasticsearch being used? The Guardian uses it daily to let readers interact with content, Microsoft Dynamics ERP and CRM use it to index and analyze social feeds, it powers Yelp, and, here is a big one, Wikimedia uses it to power the well-loved and well-used Wikipedia. We can already see how much of an impact Elasticsearch makes on our daily lives without our being aware of it. Other companies that use Elasticsearch for our and their benefit are HotelTonight, Dell, Groupon, Quizlet, and Netflix.
Elasticsearch will continue to gain ground as an inexpensive alternative to proprietary software, and the number of Web services and companies that use it will only continue to grow.
Whitney Grace, February 5, 2016
On the Prevalence of Open Source
November 11, 2015
Who would have thought, two decades ago, that open source code was going to dominate the software field? Vallified’s Philip O’Toole meditates on “The Strange Economics of Open-Source Software.” Though the industry gives so much away for free, it’s doing quite well for itself.
O’Toole notes that closed-source software is still in wide use, largely in banks’ embedded devices and underpinning services. Also, many organizations are still attached to their Microsoft and Oracle products. But the tide has been turning; he writes:
“The increasing dominance of open-source software seems particularly true with respect to infrastructure software. While security software has often been open-source through necessity — no-one would trust it otherwise — infrastructure is becoming the dominant category of open-source. Look at databases — MySQL, MongoDB, RethinkDB, CouchDB, InfluxDB (of which I am part of the development team), or cockroachdb. Is there anyone today that would even consider developing a new closed-source database? Or take search technology — elasticsearch, Solr, and bleve — all open-source. And Linux is so obvious, it is almost pointless to mention it. If you want to create a closed-source infrastructure solution, you better have an enormously compelling story, or be delivering it as part of a bigger package such as a software appliance.”
It has gotten to the point where developers may hesitate to work on a closed-source project because it will do nothing for their reputation. Where do the profits come from, you may ask? Why, in the sale of services, of course. It’s all part of today’s cloud-based reality.
Cynthia Murrell, November 11, 2015
RAVN Pipeline Coupled with ElasticSearch to Improve Indexing Capabilities
October 28, 2015
The article on PR Newswire titled “RAVN Systems Releases its Enterprise Search Indexing Platform, RAVN Pipeline, to Ingest Enterprise Content Into ElasticSearch” unpacks the decision to improve the ElasticSearch platform by supplying the indexing platform of the RAVN Pipeline. RAVN Systems, a UK company founded by consultants and developers, has expertise in processing unstructured data. Their stated goal is to discover new lands in the world of information technology. The article states,
“RAVN Pipeline delivers a platform approach to all your Extraction, Transformation and Load (ETL) needs. A wide variety of source repositories including, but not limited to, File systems, e-mail systems, DMS platforms, CRM systems and hosted platforms can be connected while maintaining document level security when indexing the content into Elasticsearch. Also, compressed archives and other complex data types are supported out of the box, with the ability to retain nested hierarchical structures.”
The added indexing ability is very important, especially for users trying to index from or into cloud-based repositories. Even a single instance of any type of data can be indexed with the Pipeline, which also enriches data during indexing with auto-tagging and classifications. The article also promises that non-specialists (by which I assume they mean people) will be able to use the new systems due to their being GUI driven and intuitive.
Chelsea Kerwin, October 28, 2015
Apple May Open up on Open Source
October 27, 2015
Is Apple ready to openly embrace open source? MacRumors reports, “Apple Building Unified Cloud Platform for iCloud, iTunes, Siri and More.” Writer Joe Rossignol cites a new report from The Information that indicates the famously secretive company may be opening up to keep up with the cloudy times. He writes:
“The new platform is based on Siri, which itself is powered by open source infrastructure software called Mesos on the backend, according to the report. Apple is reportedly placing more emphasis on open source software in an attempt to attract open source engineers that can help improve its web services, but it remains to be seen how far the company shifts away from its deep culture of secrecy.
“The paywalled report explains how Apple is slowly embracing the open source community and becoming more transparent about its open source projects. It also lists some of the open source technologies that Apple uses, including Hadoop, HBase, Elasticsearch, Riak, Kafka, Azkaban and Voldemort.”
Rossignol goes on to note that, according to Bloomberg, Apple is working on a high-speed content delivery network and upgrading data centers to better compete with its rivals in the cloud, like Amazon, Google, and Microsoft. Will adjusting its stance on open-source allow it to keep up?
Cynthia Murrell, October 27, 2015

