Need Semantic Search: Lucidworks Asserts It Is the Answer by Golly
July 3, 2015
If you read this blog, you know that I comment on semantic technology every month or so. In June I pointed to an article which had been tweeted as “new stuff.” Wrong. Navigate to “Semantic Search Hoohah: Hakia”; you will learn that Hakia is a quiet outfit. Quiet as in no longer on the Web. Maybe gone?
There are other write-ups in my free and for-fee columns about semantic search. The theme has been consistent. My view is that semantic technology is one component in a modern cybernized system. (To learn about my use of the term cyber, navigate to www.xenky.com/cyberosint.)
I find the promotion of search engine optimization as “semantic” amusing. I find the search service firms’ promotion of their semantic expertise amusing. I find the notion of open source outfits deep in hock to venture capitalists asserting their semantic wizardry amusing.
I don’t know if you are quite as amused as I am. Here’s an easy way to determine your semantic humor score. Navigate to this SlideShare link and cruise through the 34-slide presentation made by one of Lucidworks’ search mavens. Lucidworks is a company I have followed since it fired up its jets with Marc Krellenstein on board. Dr. Krellenstein ejected in short order, and the company has consumed many venture dollars with management shifts, repositionings, and the Big Data thing.
We now have Lucidworks in the semantic search sector.
Here’s what I learned from the deck:
- The company has a new logo. I think this is the third or fourth.
- Search is about technology and language. Without Google’s predictive and personalized routines, words are indeed necessary.
- Buzzwords and jargon do not make semantic methods simple. Consider this statement from the deck: “Tokenization plus vector mathematics (TF/IDF) or one of its cousins – ‘bag of words’ – Algorithmic tweaks – enhanced bag of words.” Got that, gentle reader? If not, check out “sausagization.”
- Lucidworks offers a “field cache.” Okay, I am not unfamiliar with caching in order to goose performance, which can be an issue with some open source search systems. (But Searchdaimon, an open source search system developed in Norway, runs circles around Lucidworks. My team did the benchmark test of major open source systems. Searchdaimon was the speed champ and had other sector-leading characteristics as well.)
- Lucidworks does the ontology thing as well. The tie up of “category nodes” and “evidence nodes” may be one reason the performance goblin noses into the story.
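For readers who want to see what the deck’s “tokenization plus vector mathematics (TF/IDF)” and “bag of words” boil down to, here is a minimal sketch in plain Python. It is the generic textbook technique, not Lucidworks’ implementation; the smoothed IDF formula is one common variant chosen for illustration, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Bag of words: lowercase, split on whitespace, strip edge punctuation.
    # Word order is thrown away; only the terms survive.
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [w for w in words if w]


def idf_table(docs_tokens: list[list[str]]) -> dict[str, float]:
    n = len(docs_tokens)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in docs_tokens for term in set(toks))
    # Smoothed IDF (one common variant): rarer terms get larger weights.
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term count times IDF; terms absent from the collection get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse term-weight vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: str, docs: list[str]) -> list[tuple[float, int]]:
    # Score every document against the query; highest cosine similarity first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = idf_table(docs_tokens)
    q = tfidf(tokenize(query), idf)
    scores = [(cosine(q, tfidf(toks, idf)), i) for i, toks in enumerate(docs_tokens)]
    return sorted(scores, reverse=True)


docs = ["Semantic search with ontologies", "Open source search benchmark", "Fusion of images and video"]
print([i for _, i in rank("open source search", docs)])  # prints [1, 0, 2]
```

The “algorithmic tweaks” the deck mentions would slot in at the tokenizer (stemming, synonyms) or the weighting step; the vector math itself stays this simple.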
The problem I encountered is that the write up for the slide deck emphasized Fusion as a key component. I have been poking around the “fusion” notion as we put our new study of the Dark Web together. Fusion is a tricky problem and the US government has made fusion a priority. Keep in mind that content is more than text. There are images, videos, geocodes, cryptic tweets in Farsi, and quite a few challenging issues with making content available to a researcher or analyst.
It seems that Lucidworks has cracked a problem which continues to trouble some reasonably sophisticated folks in the content analysis business. Here’s the “evidence” that Lucidworks can do what others cannot:
This diagram shows that after a connector is available, then “pipelines proliferate.” Well, okay.
I thought the goal was to process content objects with low latency, easily, and with semantic value adds. “Lots of stages” and “index pipelines: one way; query pipelines: round trip” do not compute for this addled goose.
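The “index pipelines: one way; query pipelines: round trip” idea is a generic staged-pipeline pattern. The sketch below is a minimal illustration under that assumption; it is not Fusion’s API, and every stage and function name is hypothetical. Note that each extra stage is another hop for a document or query, which is where latency worries come from.

```python
from typing import Callable

Record = dict
Stage = Callable[[Record], Record]


def run_index_pipeline(doc: Record, stages: list[Stage], index: list[Record]) -> None:
    # One way: each stage transforms the document; the final result is written to the index.
    for stage in stages:
        doc = stage(doc)
    index.append(doc)


def run_query_pipeline(request: Record, pre: list[Stage],
                       search: Callable[[Record], Record], post: list[Stage]) -> Record:
    # Round trip: stages rewrite the request, the engine runs it, stages rewrite the response.
    for stage in pre:
        request = stage(request)
    response = search(request)
    for stage in post:
        response = stage(response)
    return response


# Hypothetical stages, for illustration only.
def lowercase_body(doc: Record) -> Record:
    return {**doc, "body": doc["body"].lower()}


def count_words(doc: Record) -> Record:
    return {**doc, "word_count": len(doc["body"].split())}


def normalize_query(req: Record) -> Record:
    return {**req, "q": req["q"].strip().lower()}


def cap_hits(resp: Record) -> Record:
    return {**resp, "hits": resp["hits"][:2]}


def make_search(index: list[Record]) -> Callable[[Record], Record]:
    # Toy "engine": substring match over the indexed bodies.
    def search(req: Record) -> Record:
        return {"hits": [d["id"] for d in index if req["q"] in d["body"]]}
    return search


index: list[Record] = []
for i, text in enumerate(["Dark Web fusion", "Fusion pipelines", "Fusion stages", "Semantic search"]):
    run_index_pipeline({"id": i, "body": text}, [lowercase_body, count_words], index)

print(run_query_pipeline({"q": "  FUSION "}, [normalize_query], make_search(index), [cap_hits]))
# prints {'hits': [0, 1]}
```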
If the Lucidworks approach makes sense to you, go for it. My team and I will stick to here and now tools and open source technology which works without the semantic jargon, which is pretty much incidental to the matter. We need to process more than text. CyberOSINT vendors deliver and most use open source search as a utility function. Yep, utility. Not the main event. The failure of semantic search vendors suggests that the buzzword is not the solution to marketing woes. Pop. (That’s a pre-Fourth of July celebratory ladyfinger.)
Stephen E Arnold, July 3, 2015
Forget Oracle. Think about Vendors of Proprietary Enterprise Search Systems.
June 14, 2015
Database revenue doom looms for Oracle. Who did not know that, Mr. BigTable and Ms. Spark? Navigate to “Oracle Sales Erode as Startups Embrace Souped-Up Free Software.” The write up makes this point:
The impact [of startups using free, open source software] shows up in Oracle’s sales of new software licenses, which have declined for seven straight quarters compared with the period a year earlier. New licenses made up 25 percent of total revenue in fiscal 2014, down from 28 percent a year earlier – a sign the company is becoming increasingly dependent on revenue from supporting and maintaining products at existing customers and having a harder time finding new business. Oracle reports fiscal fourth-quarter earnings next week. To blunt this, the Redwood City, California-based company is expanding efforts in cloud computing, which will let it sell packaged high-margin services to customers. That may help balance the slowdown in the basic business. It also operates an open-source database called MySQL.
The unarticulated issue is the word “startup.” Research we conducted, verified by various third-party sources, revealed in 2012 that open source software was getting more attention from Fortune 1000 companies. The reason was that these outfits had the resources to deal with the excitement open source software provides in a Blue Apron-type package.
If this Bloomberg write-up is correct, the startup crowd is stepping away from Microsoft software and other well-known brands toward open source. One can raise prices in the Fortune 1000 arena for a short time. Then, as Thomson Reuters- and Reed Elsevier-type companies have learned, the big boys just go a different direction. Thus, the startup and mid-sized markets become more and more important to proprietary software vendors.
When the small folks head for the hills, where’s the growth? Price increases? Me too plays? Marketing two steps?
I don’t think so.
Ergo. Trouble ahead for Oracle, but the challenges facing the down-market and up-market proprietary enterprise search vendors are going to become more severe if Bloomie is on the beam.
Stephen E Arnold, June 14, 2015
Amazon and Elasticsearch
May 29, 2015
If you are curious about the utility of Elastic’s technology, you will find “Indexing Common Crawl Metadata on Amazon EMR Using Cascading and Elasticsearch” a useful article to review. The main idea is that Amazon made Elasticsearch do some circus tricks. The write-up explains the approach, provides code snippets, and includes a couple of nifty graphics which help those zany Zonies figure out the implications of the data crunched. In short, Elasticsearch did something useful with content in everyone’s favorite magic wand, Hadoop. Why didn’t Amazon use Lucidworks? Really? Hmm. Good question.
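Whatever the upstream plumbing (the article used Cascading on EMR, which is not reproduced here), records usually reach Elasticsearch in batches through its `_bulk` endpoint, which takes newline-delimited JSON: an action line, then the document, repeated, with a trailing newline. The sketch below only builds such a payload. The record fields and the index name `crawl-metadata` are hypothetical, and using the URL as `_id` is an illustrative choice, not something the article specifies.

```python
import json


def build_bulk_body(records: list[dict], index_name: str) -> str:
    lines = []
    for rec in records:
        # Action line: index the following document into index_name, keyed by URL.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": rec["url"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(rec))
    # _bulk expects newline-delimited JSON, including a newline after the last line.
    return "".join(line + "\n" for line in lines)


# Hypothetical crawl-metadata records; the field names are illustrative.
records = [
    {"url": "http://example.com/", "content_type": "text/html", "length": 1270},
    {"url": "http://example.org/a.pdf", "content_type": "application/pdf", "length": 52311},
]
body = build_bulk_body(records, "crawl-metadata")
print(body)
```

The resulting string would be POSTed to a node’s `/_bulk` endpoint with `Content-Type: application/x-ndjson`; batching is what lets a Hadoop job push large volumes without one HTTP request per document.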
Stephen E Arnold, May 29, 2015
Peruse Until You Are Really Happy
May 22, 2015
Have you ever needed to quickly locate a file that you just know you made, but could not find it on your computer, cloud storage, tablet, smartphone, or company pool drive? Even worse is when your search query does not pick up on any of your keywords! What are you supposed to do then? VentureBeat might have the answer to your problems, as explained in the article “Peruse Is A New Natural Language Search Tool For Your Dropbox And Box Files.” Peruse is a search tool that lets users find their files and information by asking questions the way they would naturally talk.
Natural language querying is already a big market for business intelligence software, but it is not as common in file sharing services. Peruse is a startup that can search Dropbox and Box accounts using an ordinary question. If you ask, “Where is the marketing data from last week?”, the software can find the file for you without your having to open it. Right now, Peruse can only find information in spreadsheets, but the company is working on expanding the supported file types.
“The way we index these files is we actually look at them visually – it understands them in a way a person would understand them,” said [co-founder and CEO Luke Gotszling], who is showing off Peruse…
Peruse’s goal is to change the way people use document search. Document search has changed little since 1995; twenty years later, Gotszling believes it is time for a big change. Gotszling is right: document search remains the same, while Web search changes every day.
Whitney Grace, May 22, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Make Mine Mobile Search
May 21, 2015
It was only a matter of time, but Google searches on mobile phones and tablets have finally pulled ahead of desktop searches, says The Register in “Peak PC: ‘Most’ Google Web Searches ‘Come From Mobiles’ In US.” Google AdWords product management representative Jerry Dischler said that more Google searches took place on mobile devices than on desktops in ten countries, including the US and Japan. Google owns 92.22 percent of the mobile search market and 65.73 percent of desktop searches. What do you think Google wants to do next? It wants to sell more mobile ads!
The article notes that Google has not named the ten countries beyond the US and Japan, nor disclosed the size of the search differential between platforms. Google, however, is trying to get more people to buy more ads, and the search engine giant is making the technology and tools available:
“Google has also introduced new tools for marketers to track their advertising performance to see where advertising clicks are coming from, and to try out new ways to draw people in. The end result, Google hopes, is to bring up the value of its mobile advertising business that’s now in the majority, allegedly.”
Mobile ads are apparently cheaper than desktop ads, so the shift could lower Google’s revenue. What will probably happen, however, is that as more users make purchases via phones and tablets, ad revenue from mobile platforms will increase.
Whitney Grace, May 21, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Eric Schmidt On Search Ambition and Attitude at the GOOG
May 20, 2015
The Business Insider article titled “Google’s Former CEO Reveals The Complicated Search Question He Wants Google To Be Able To Answer” reports on Eric Schmidt’s speech in Berlin, where he mentioned the hurdles Google has yet to overcome. Obviously, Google is an incredibly ambitious company and should never be satisfied. He spelled out one particular question he would like the search engine to be able to answer:
“Try a query like ‘show me flights under €300 for places where it’s hot in December and I can snorkel,'” Schmidt says. “That’s kind of complicated: Google needs to know about flights under €300; hot destinations in winter; and what places are near the water, with cool fish to see. That’s basically three separate searches that have to be cross-referenced to get to the right answer. Sadly, we can’t solve that for you today. But we’re working on it.”
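Schmidt’s example decomposes into three searches whose results must be cross-referenced. The sketch below shows only that final step, a set intersection, over hypothetical toy data (made-up destination names, prices, and temperatures; not real fares or climate figures). The hard parts he describes, understanding the question and finding trustworthy sources for each piece, are exactly what this toy skips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    destination: str
    price_eur: float


# Hypothetical toy data standing in for three separate data sources.
FLIGHTS = [Flight("Alpha", 280.0), Flight("Bravo", 120.0), Flight("Charlie", 250.0), Flight("Delta", 650.0)]
DECEMBER_TEMP_C = {"Alpha": 27, "Bravo": -3, "Charlie": 23, "Delta": 27}
SNORKEL_SPOTS = {"Alpha", "Charlie", "Delta"}


def hot_cheap_snorkel(max_price_eur: float = 300.0, min_temp_c: int = 20) -> list[str]:
    cheap = {f.destination for f in FLIGHTS if f.price_eur < max_price_eur}  # search 1: flights under the price cap
    hot = {d for d, t in DECEMBER_TEMP_C.items() if t >= min_temp_c}         # search 2: hot in December
    snorkel = set(SNORKEL_SPOTS)                                             # search 3: snorkeling nearby
    # Cross-reference: keep only destinations that satisfy all three searches.
    return sorted(cheap & hot & snorkel)


print(hot_cheap_snorkel())  # prints ['Alpha', 'Charlie']
```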
Schmidt also argued on behalf of Google regarding the EU investigation into whether Google favors its own results rather than presenting a fair spread of companies. Schmidt claimed that Google is most interested in simplifying search for users, rather than obliging users to click around. Since Google search is admittedly ad-oriented, Schmidt’s position seems to be at least semi-accurate.
Chelsea Kerwin, May 20, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Open Source Conquers Proprietary Software, Really?
May 19, 2015
Open source is an attractive option for organizations that want to design their own software and save money on proprietary licenses. ZDNet reports in “It’s An Open Source World: 78 Percent of Companies Run Open Source Software” that adoption is widespread, but the adopters do not manage their open source systems very well. Every year Black Duck Software, an open source software logistics and legal solutions provider, and North Bridge, a seed to growth venture capital firm, run the Future of Open Source Survey. Organizations love open source, but their oversight has not kept pace:
“Lou Shipley, Black Duck’s CEO, said in a statement, ‘In the results this year, it has become more evident that companies need their management and governance of open source to catch up to their usage. This is critical to reducing potential security, legal, and operational risks while allowing companies to reap the full benefits OSS provides.’”
The widespread adoption is due to people thinking that open source software is easier to scale, has fewer security problems, and is much faster to deploy. Many organizations, however, lack a plan for managing open source, an automated code approval process, or an inventory of open source components. Even worse, they are unaware of the security vulnerabilities in the components they run.
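The missing pieces the survey names, an inventory and an automated approval step, can start very small. Here is a minimal sketch, assuming a hand-maintained component list, a license allow-list, and a list of known-vulnerable versions; all names, versions, and the policy itself are hypothetical, and which licenses to approve is an organizational decision, not something this code settles. Commercial tools do far more (dependency scanning, advisory feeds), but the shape of the gate is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    license: str


# Hypothetical policy: licenses already cleared by legal review.
APPROVED_LICENSES = {"Apache-2.0", "MIT", "BSD-3-Clause"}
# Hypothetical advisory list of (name, version) pairs with known vulnerabilities.
KNOWN_VULNERABLE = {("examplelib", "1.0.1")}


def review(inventory: list[Component]) -> list[str]:
    findings = []
    for c in inventory:
        # Gate 1: the license must be on the approved list.
        if c.license not in APPROVED_LICENSES:
            findings.append(f"{c.name} {c.version}: license {c.license} needs legal review")
        # Gate 2: the exact version must not appear in the vulnerability list.
        if (c.name, c.version) in KNOWN_VULNERABLE:
            findings.append(f"{c.name} {c.version}: known vulnerability")
    return findings


inventory = [
    Component("examplelib", "1.0.1", "MIT"),
    Component("otherlib", "2.3", "GPL-3.0"),
    Component("thirdlib", "0.9", "Apache-2.0"),
]
for finding in review(inventory):
    print(finding)
```

Run as part of a build, an empty findings list lets code through; anything else stops it for a human to look at.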
It is great that open source is being recognized as a more viable enterprise solution, but few organizations know how to manage it.
Whitney Grace, May 19, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Behind The Google X Doors
May 18, 2015
Google X is Google’s top-secret laboratory, where the company develops new, innovative technology projects. The main purpose behind Google X is to make technology more adaptable and useful and to improve people’s lives. Google Glass was one of its projects, as is Project Loon, where giant, high-altitude balloons are released into the sky to bring Internet service to rural areas. Also do not forget the driverless car. eWeek has listed “10 Bold Google X Projects Aiming For Tech Breakthroughs,” exploring the new wonders that could eventually be available to you or me.
Are you interested in cleaner, renewable energy? So are the folks at Makani Power, a Google X project that builds wind turbines and then makes them airborne using kites. The airborne turbines generate electricity for people to use. While energy is important for modern human life, health is a big issue too.
Google X has four projects dedicated to learning more about the human body and disease. One is a contact lens that measures glucose levels in tears, so diabetics will not have to prick themselves with needles to check their sugar levels. The Baseline Study project analyzes medical information and uses genomics to define what a healthy human body actually is. This project’s goal is to predict major diseases before their onset. Lift Labs, acquired in 2014, invented a spoon device that counteracts the tremors of Parkinson’s disease. The most astounding is something out of a science-fiction novel:
“Google X is in the nanoparticles business. The company in October unveiled a platform that uses nanoparticles to detect disease. In January, it followed that up with the announcement of the creation of synthetic skin as a proof-of-concept to show what nanoparticle technology might achieve in human biology and health.”
Nanoparticles? Self-driving cars? Wind turbines on kites? What will Google X work on next?
Whitney Grace, May 18, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Oracle is Rocking COLLABORATE
April 15, 2015
News is already sprouting about COLLABORATE 15: Technology and Applications Forum for the Oracle Community, Oracle’s biggest conference of the year. BusinessWire reports in “Oracle Applications Users Group Announces Oracle’s Key Role at COLLABORATE 15” that Oracle CEO Mark Hurd and Chief Information Officer and Senior VP Mark Sunday will be keynote speakers.
Hurd and Sunday will be delivering key insights into Oracle and the industry at their scheduled talks:
“On Tuesday, Sunday discusses the need to keep a leadership edge in digital transformation, with a special focus on IT leadership in the cloud. Sunday will build upon his keynote from two years ago, giving attendees better insight into adopting a sound cloud strategy in order to ensure greater success. On Wednesday, Hurd shares his insights on how Oracle continues to drive innovation and protect customer investments with applications and technology. Oracle remains the leading organization in the cloud, and Hurd’s discussion focuses on how to modernize businesses in order to thrive in this space.”
Oracle is really amping up the offerings at this year’s conference. They will host the Oracle User Experience Usability Lab, Oracle Proactive Support Sessions, Oracle Product Roadmap Session, and more to give attendees the chance to have direct talks with Oracle experts to learn about strategies, functionality, products, and new resources to improve their experience and usage. Attendees will also be able to take accreditation tests for key product areas.
COLLABORATE, like many conferences, offers attendees the chance to network with Oracle experts, get professional feedback, and meet others in their field. Oracle is very involved in this conference and is dedicated to putting its staff and products at the service of its users.
Whitney Grace, April 15, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Apache Sparking Big Data
April 3, 2015
Apache Spark is an open source cluster computing framework that rivals MapReduce. VentureBeat says that people did not pay much attention to Apache Spark when it was first invented at the University of California, Berkeley’s AMPLab in 2011. The article “How An Early Bet On Apache Spark Paid Off Big” reports that big data open source supporters are adopting Apache Spark because of its superior capabilities.
People with big data plans want systems that process real-time information at a fast pace, and they want a whole lot of it done at once. MapReduce can do this, but it was not designed for it. It is all right for batch processing, but it is slow and much too complex to be a viable solution.
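To make “batch processing” concrete, here is a toy, single-machine sketch of the map, shuffle, and reduce phases of the MapReduce model, using the classic word count. It is an illustration of the programming model only, not Hadoop or Spark code, and the function names are hypothetical. In Hadoop, multi-step work is typically a chain of such jobs with results written to distributed storage between them; Spark’s pitch is keeping intermediate datasets in memory across steps.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group values by key (on a cluster, this moves data between machines).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reduce: combine each key's values into one result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


print(word_count(["Spark and MapReduce", "Spark is fast"]))
```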
“When we saw Spark in action at the AMPLab, it was architecturally everything we hoped it would be: distributed, in-memory data processing speed at scale. We recognized we’d have to fill in holes and make it commercially viable for mainstream analytics use cases that demand fast time-to-insight on hordes of data. By partnering with AMPLab, we dug in, prototyped the solution, and added the second pillar needed for next-generation data analytics, a simple to use front-end application.”
ClearStory Data was built using Apache Spark to access data quickly, deliver key insights, and make the UI user-friendly. People who use Apache Spark want immediate information from a variety of sources that they can put to profitable use. Apache Spark might ignite the fire for the next wave of data analytics for big data.
Whitney Grace, April 3, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com