A Fun Japanese Elasticsearch Promotion Video
September 10, 2015
Elasticsearch is one of the top open source search engines and is employed by many companies, including Netflix, Wikipedia, GitHub, and Facebook. Elasticsearch wants to gain a foothold in the Japanese technology market, and it is easy to see why: Japan is one of the world’s top producers of advanced technology and has a huge consumer base. Once a technology is adopted in Japan, you can bet its adoption will grow even bigger.
The company has launched a Japanese promotional campaign and uploaded a video entitled “Elasticsearch Product Video” to its YouTube channel. The video carries Japanese subtitles and features appearances by CEO Steven Schuurman, VP of Engineering Kevin Kluge, Elasticsearch creator Shay Banon, and VP of Sales Justin Hoffman. The video showcases how Elasticsearch is open source software, how it has been integrated into many companies’ frameworks, its worldwide reach, ongoing product improvement, as well as the good it can do.
Justin Hoffman said, “I think the concept of an open source company bringing a commercial product to market is very important to our company. Because the customers want to know on one hand that you have the open source community and its evolution and development at the top of your priority list. On the other hand, they appreciate that you’re innovating and bringing products to market that solve real problems.”
It is a neat video that runs down what Elasticsearch is capable of; the only complaint is the bland music in the background. The company could benefit from licensing the Jive Aces’ “Bring Me Sunshine,” which conveys the proper mood.
Whitney Grace, September 10, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google and Alta Vista: Who Remembers?
September 9, 2015
A lifetime ago, I did some work for an outfit called Persimmon IT. We fooled around with ways to take advantage of memory, which was a tricky devil in my salad days. The gizmos we used were manufactured by Digital Equipment. The processors were called “hot”, “complex”, and AXP. You may know this foot warmer as the Alpha. Persimmon operated out of an office in North Carolina. We bumped into wizards from Cambridge University (yep, that outfit again), engineers housed on the second floor of a usually warm office in Palo Alto, and individuals whom I never met but I had to slog through their email.
So what?
A person forwarded me a link to what seems to be an aged write up called “Why Did Alta Vista Search Engine Lose Ground so Quickly to Google?” The write up was penned by a UCLA professor. I don’t have too much to say about the post. I was lucky to finish grade school. I missed the entire fourth and fifth grades because my Calvert Course instructor in Brazil died of yellow jaundice after my second lesson.
I scanned the write up; note that you may need to register in order to read the article and the comments thereto. I love walled gardens. They are so special.
I did notice that one reason Alta Vista went south was not mentioned. Due to the brilliant management of the company by Hewlett Packard/Compaq, Alta Vista created some unhappy campers. Few at HP knew about Persimmon, and none of these MBAs had the motivation to learn anything about the use of Alta Vista as a demonstration of the toasty Alpha chips, the clever use of lots of memory, and the speed with which certain content operations could be completed.
Unhappy with the state of affairs, the Palo Alto Alta Vista workers began to sniff for new opportunities. One scented candle burning in the information access night was a fledgling outfit called Google, formerly Backrub. Keep in mind that the intermingling of wizards was and remains a standard operating procedure in Plastic Fantastic (my name for Sillycon Valley).
The baby Google benefited from HP’s outstanding management methods. The result was the decampment from the HP Way. If my memory serves me, the Google snagged Jeff Dean, Simon Tong, Monika Henzinger, and others. Keep in mind that I am no “real” academic, but my research revealed to me and those who read my three monographs about Google that Google’s “speed” and “scaling” benefited significantly from the work of the Alta Vista folks.
I think this is important because few people in the search business pay much attention to the turbo boost HP unwittingly provided the Google.
Among the comments to the “Why Did Alta Vista…” post were several I found stimulating.
- One commenter named Rajesh offered, “I do not remember the last time I searched for something and it did not end up in page 1.” My observation is, “Good for you.” Try this query and let me know how Google delivers on point information: scram action. I did not see any hits to nuclear safety procedures. Did you, Rajesh? I assume your queries are different from mine. By the way, “scram local events” will produce a relevant hit halfway down the Google result page.
- Phillip observed that the “time stamp is irrelevant in this modern era, since sub second search is the norm.” I understand that “time” is not one of Google’s core competencies. Also, many results are returned from caches. The larger point is that Google remains time blind. Google invested in a company that does time well, but sophisticated temporal operations are out of reach for the Google.
- A number of commenting professionals emphasized that Google delivered clutter free, simple, clear results. The last time I looked at a Google results page for the query katy perry, the presentation was far from a tidy blue list of relevant results.
- Henry pointed out that the Alta Vista results were presented without logic. I recall that relevant results did appear when a query was appropriately formed.
- One comment pointed out that it was necessary to cut and paste results for the same query processed by multiple search engines. The individual reported that it took a half hour to do this manual work. I would point out that metasearch solutions became available in the early 1990s. Information is available here and here.
Enough of the walk down memory lane. Revisionism is alive and well. Little wonder that folks at Alphabet and other searchy type outfits continue to reinvent the wheel.
Isn’t a search app for a restaurant a “stored search”? Who cares? Very few.
Stephen E Arnold, September 9, 2015
Bing Snapshots for In-App Searches
September 9, 2015
Developers have a new tool for incorporating search data directly into apps, we learn in “Bing Snapshots First to Bring Advanced In-App Search to Users” at Search Engine Watch. Apparently Google announced a similar feature, Google Now on Tap, earlier this year, but Microsoft’s Bing has beaten them to the consumer market. Of course, part of Snapshot’s goal is to keep users from wandering out of “Microsoft territory,” but many users are sure to appreciate the convenience nevertheless. Reporter Mike O’Brien writes:
“With Bing Snapshots, developers will be able to incorporate all of the search engine’s information into their apps, allowing users to perform searches in context without navigating outside. For example, a friend could mention a restaurant on Facebook Messenger. When you long-press the Home button, Bing will analyze the contents of the screen and bring up a snapshot of a restaurant, with actionable information, such as the restaurant’s official website and Yelp reviews, as well as Uber.”
Bing officials are excited about the development (and, perhaps, scoring a perceived win over Google), declaring this the start of a promising relationship with developers. The article continues:
“Beyond making sure Snapshots got a headstart over Google Now on Tap, Bing is also able to stand out by becoming the first search engine to make its knowledge graph available to developers. That will happen this fall, though some APIs are already available on the company’s online developer center. Bing is currently giving potential users sneak peeks on its Android app.”
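For a sense of how such an in-app lookup might hang together, here is a purely hypothetical sketch in Python: grab the text on screen, spot an entity, and fetch a card of actionable details. None of the function names, data, or values below come from Bing’s or Google’s APIs; they are invented for illustration.

# Hypothetical sketch of in-app contextual search; not Bing Snapshots code.
known_restaurants = {"Luigi's Trattoria", "Pho Saigon"}

def extract_entities(screen_text):
    # A real system would run entity recognition over the screen contents;
    # a simple lookup against a known list stands in for that step here.
    return [name for name in known_restaurants if name in screen_text]

def fetch_snapshot(entity):
    # Stand-in for a knowledge graph lookup that returns actionable information.
    return {
        "name": entity,
        "website": "https://example.com/restaurant",
        "review_score": 4.5,
        "ride_option": "request a car",
    }

message = "Want to try Luigi's Trattoria tonight?"
for entity in extract_entities(message):
    print(fetch_snapshot(entity))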
Hmm, that’s a tad ironic. I look forward to seeing how Google positions the launch of Google Now on Tap when the time comes.
Cynthia Murrell, September 9, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
dtSearch Chases Those Pesky PDFs
September 7, 2015
Predictive analytics and other litigation software are more important than ever for legal professionals who must sift through mounds of documents and discover patterns, and several companies have come to the rescue, especially dtSearch. Inside Counsel explains the latest developments in “New dtSearch Release Offers More Support To Lawyers.”
The latest dtSearch release is not only able to search through terabytes of information in online and offline environments, but its document filters have also broadened to cover encrypted PDFs, including those protected with a password. While PDFs are a universally accepted document format, they are a pain to deal with if they ever have to be edited or are password protected.
The dtSearch release also includes other beneficial features:
“Additionally, dtSearch products can parse, index, search, display with highlighted hits, and extract content from full-text and metadata in several data types, including: Web-ready content; other databases; MS Office formats; other “Office” formats, PDF, compression formats; emails and attachments; Recursively embedded objects; Terabyte Indexer; and Concurrent, Multithreaded Searching.”
The new ability to delve into encrypted PDF files is a huge leap ahead of dtSearch’s rivals; being able to explore PDFs without Adobe Acrobat or another PDF editor will make wading through litigation documents much simpler.
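To make the capability concrete, here is a minimal sketch of searching a password-protected PDF with the open source pypdf library; it illustrates the idea only and is not dtSearch’s API. The file name, password, and query are assumptions.

from pypdf import PdfReader

def search_encrypted_pdf(path, password, query):
    # Open the PDF and unlock it if it is password protected.
    reader = PdfReader(path)
    if reader.is_encrypted:
        reader.decrypt(password)
    # Return the page numbers whose extracted text contains the query.
    hits = []
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if query.lower() in text.lower():
            hits.append(page_number)
    return hits

print(search_encrypted_pdf("exhibit_12.pdf", "s3cret", "indemnification"))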
Whitney Grace, September 7, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Shades of CrossZ: Compress Data to Speed Search
September 3, 2015
I have mentioned in my lectures a start up called CrossZ. Before whipping out your smartphone and running a predictive query on the Alphabet GOOG thing, sit tight.
CrossZ hit my radar in 1997. The concept behind the company was to compress extracted chunks of data. The method, as I recall, made use of fractal compression, which was all the rage at that time. The queries were converted to fractal tokens. The system then quickly pulled out the needed data and displayed it in human readable form. The approach was called, as I recall, “QueryObject.” By 2002, the outfit dropped off my radar. The downside of the CrossZ approach was that the compression was asymmetric; that is, slow when preparing the fractal chunk but really fast when running a query and extracting the needed data.
Flash forward to Terbium Labs, which has a patent on a method of converting data to tokens or what the firm calls “digital fingerprints.” The system matches patterns and displays high probability matches. Terbium is a high potential outfit. The firm’s methods may be a short cut for some of the Big Data matching tasks some folks in the biology lab have.
For me, the concept of reducing the size of a content chunk and then querying it to achieve faster response time is a good idea.
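As a rough sketch of that concept, the snippet below boils each document down to a compact set of hashed word shingles and answers queries against the compact form instead of the full text. Preparation is the slow, up-front step; querying is fast. This stands in for the general idea only; it is not CrossZ’s or Terbium Labs’ actual method, and the corpus and shingle size are made up.

import hashlib

def fingerprint(text, shingle_size=3):
    # Hash overlapping word shingles into short tokens; a crude stand-in for
    # fractal tokens or "digital fingerprints."
    words = text.lower().split()
    shingles = (" ".join(words[i:i + shingle_size])
                for i in range(max(len(words) - shingle_size + 1, 0)))
    return {hashlib.md5(s.encode()).hexdigest()[:8] for s in shingles}

corpus = {
    "doc1": "fractal compression was the rage in the nineties",
    "doc2": "terbium labs matches digital fingerprints at scale",
}

# Slow step, done once: build compact fingerprints for every document.
index = {doc_id: fingerprint(text) for doc_id, text in corpus.items()}

def query(q):
    # Fast step: match the query's fingerprints against the compact index.
    q_prints = fingerprint(q)
    return [doc_id for doc_id, prints in index.items() if q_prints & prints]

print(query("matches digital fingerprints"))  # ['doc2']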
What do you think I thought when I read “Searching Big Data Faster”? Three notions flitted through my aged mind:
First, the idea is neither new nor revolutionary. Perhaps the MIT implementation is novel? Maybe not?
Second, the main point that “evolution is stingy with good designs” strikes me as a wild and crazy generalization. What about the genome of the octopus, gentle reader?
Third, MIT is darned eager to polish the MIT apple. This is okay as long as the whiz kids take a look at companies which used this method a couple of decades ago.
That is probably not important to anyone but me and to those who came up with the original idea, maybe before CrossZ popped out of Eastern Europe and closed a deal with a large financial services firm years ago.
Stephen E Arnold, September 3, 2015
Dark Web Drug Trade Unfazed by Law Enforcement Crackdowns
September 3, 2015
When Silk Road was taken down in 2013, the Dark Web took a big hit, but it was only a few months before black marketers found alternate means to sell their wares, including illegal drugs. The Dark Web provides an anonymous and often secure means to purchase everything from heroin to prescription narcotics with, apparently, few worries about the threat of prosecution. Wired explains that “Crackdowns Haven’t Stopped The Dark Web’s $100M Yearly Drug Sale,” proving that if there is a demand, the Internet will provide a means for illegal sales.
In an effort to determine whether the Dark Web had grown or declined, Carnegie Mellon researchers Nicolas Christin and Kyle Soska studied thirty-five Dark Web markets from 2013 to January 2015. They discovered that the Dark Web markets are no longer growing explosively, but that the market has remained stable, fluctuating between $100 million and $180 million a year.
The researchers concluded that the Dark Web market is able to survive any “economic” shifts, including law enforcement crackdowns:
“More surprising, perhaps, is that the Dark Web economy roughly maintains that sales volume even after major disasters like thefts, scams, takedowns, and arrests. According to the Carnegie Mellon data, the market quickly recovered after the Silk Road 2 market lost millions of dollars of users’ bitcoins in an apparent hack or theft. Even law enforcement operations that remove entire marketplaces, as in last year’s purge of half a dozen sites in the Europol/FBI investigation known as Operation Onymous, haven’t dropped the market under $100 million in sales per year.”
Christin and Soska’s study is the most comprehensive to measure the size and trajectory of the Dark Web’s drug market. The study ended prematurely because two Web sites grew so big that the researchers’ software was not able to track the content. Their data showed that most Dark Web vendors are using more encryption tools, make profits of less than $1,000, and mostly sell MDMA and marijuana.
Soska and Christin also argue that the Dark Web drug trade decreases violence in the retail drug trade; that is, it keeps transactions digital rather than adding to violence on the streets. They urge law enforcement officials to rethink shutting down Dark Web markets, because the takedowns do not seem to have any lasting effect.
Whitney Grace, September 3, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Does This Autonomous Nerf Gun Herald the Age of Killer Robots?
September 3, 2015
Well here’s something interesting that has arisen from HP’s “disastrous” $11 billion acquisition of Autonomy: check out this three-minute YouTube video: “See What You Can Create with HP IDOL OnDemand.” The fascinating footage reveals the product of developer Martin Zerbib’s “little project,” made possible with IDOL OnDemand and a Nerf gun. Watch as the system targets a specific individual, a greedy pizza grabber, a napping worker, and a thief. It seems like harmless fun, until you realize how gruesome this footage would be if this were a real gun.
It is my opinion that it is the wielders of weapons who should be held directly responsible for their misuse, not the inventors. Still, commenter “Dazed Confused” has a point when he rhetorically asks “What could possibly go wrong?” and links to an article in Bulletin of the Atomic Scientists, “Stopping Killer Robots and Other Future Threats.” That piece describes an agreement being hammered out that proposes to ban the development of fully autonomous weapons. Writer Seth Baum explains there is precedent for such an agreement: The Saint Petersburg Declaration of 1868 banned exploding bullets, and 105 countries have now ratified the 1995 Protocol on Blinding Laser Weapons. (Such laser weapons could inflict permanent blindness on soldiers, it is reasoned.) After conceding that auto-weaponry would have certain advantages, the article points out:
“But the potential downsides are significant. Militaries might kill more if no individual has to bear the emotional burden of strike decisions. Governments might wage more wars if the cost to their soldiers were lower. Oppressive tyrants could turn fully autonomous weapons on their own people when human soldiers refused to obey. And the machines could malfunction—as all machines sometimes do—killing friend and foe alike.
“Robots, moreover, could struggle to recognize unacceptable targets such as civilians and wounded combatants. The sort of advanced pattern recognition required to distinguish one person from another is relatively easy for humans, but difficult to program in a machine. Computers have outperformed humans in things like multiplication for a very long time, but despite great effort, their capacity for face and voice recognition remains crude. Technology would have to overcome this problem in order for robots to avoid killing the wrong people.”
Baum goes on to note that organizers base their call for a ban on existing international humanitarian law, which prohibits weapons that would strike civilians. Such reasoning has already been employed to achieve bans against landmines and cluster munitions, and is being leveraged in an attempt to ban nuclear weapons.
Will killer robots be banned before they’re a reality? It seems the agreement would have to move much faster than bureaucracy usually does; given the public example of Zerbib’s “little project,” I suspect it is already way too late for that.
Cynthia Murrell, September 3, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Watson Speaks Naturally
September 3, 2015
While there are many companies that offer accurate natural language comprehension software, completely understanding the complexities of human language still eludes computers. IBM reports that it is close to overcoming these natural language barriers with IBM Watson Content Analytics, as described in “Discover And Use Real-World Terminology With IBM Watson Content Analytics.”
The tutorial points out that any analytics program relying only on structured data loses about four fifths of the available information, which is a big disadvantage in the big data era, especially when insights are supposed to be hidden in the unstructured material. Watson Content Analytics is a search and analytics platform that uses rich text analysis to find and extract actionable insights from new sources, such as email, social media, Web content, and databases.
Watson Content Analytics can be used in two ways:
- “Immediately use WCA analytics views to derive quick insights from sizeable collections of contents. These views often operate on facets. Facets are significant aspects of the documents that are derived from either metadata that is already structured (for example, date, author, tags) or from concepts that are extracted from textual content.
- Extracting entities or concepts, for use by WCA analytics view or other downstream solutions. Typical examples include mining physician or lab analysis reports to populate patient records, extracting named entities and relationships to feed investigation software, or defining a typology of sentiments that are expressed on social networks to improve statistical analysis of consumer behavior.”
The tutorial walks through a domain specific terminology application for Watson Content Analytics. The application gets quite involved, but it shows how Watson Content Analytics may go beyond the typical big data application.
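As a rough, non-Watson illustration of the facet idea in the quoted passage, the sketch below counts facet values across a tiny document set, once from structured metadata and once from crude concepts pulled out of the unstructured text. The sample documents and symptom list are invented.

from collections import Counter

documents = [
    {"author": "lab_a", "date": "2015-08-01",
     "text": "Patient reports mild headache and nausea."},
    {"author": "lab_b", "date": "2015-08-02",
     "text": "Patient reports nausea after new medication."},
]

# Facet derived from metadata that is already structured.
author_facet = Counter(doc["author"] for doc in documents)

# Crude "concept" facet extracted from the unstructured text.
symptoms = ("headache", "nausea", "fever")
symptom_facet = Counter(
    term for doc in documents for term in symptoms if term in doc["text"].lower()
)

print(author_facet)   # Counter({'lab_a': 1, 'lab_b': 1})
print(symptom_facet)  # Counter({'nausea': 2, 'headache': 1})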
Whitney Grace, September 3, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Suggestions for Developers to Improve Functionality for Search
September 2, 2015
The article on SiteCrafting titled Maxxcat Pro Tips lays out some guidelines for improved functionality when it comes to deep search. Limiting Your Crawls is the first suggestion. Since not all links are created equal, it is wise to avoid runaway crawls on links where there will always be a “Next” button. The article suggests hand-selecting the links you want to use. The second tip is Specify Your Snippets. The article explains,
“When MaxxCAT returns search results, each result comes with four pieces of information: url, title, meta, and snippet (a preview of some of the text found at the link). By default, MaxxCAT formulates a snippet by parsing the document, extracting content, and assembling a snippet out of that content. This works well for binary documents… but for webpages you wanted to trim out the content that is repeated on every page (e.g. navigation…) so search results are as accurate as possible.”
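Here is a rough sketch of the snippet idea in that passage: strip repeated page chrome such as navigation before assembling a preview, so the snippet reflects the page’s real content. The tag list and 200 character limit are assumptions for illustration, not MaxxCAT’s actual behavior.

from html.parser import HTMLParser

class SnippetExtractor(HTMLParser):
    # Skip text that sits inside page chrome repeated on every page.
    SKIP = {"nav", "header", "footer", "script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def make_snippet(html, limit=200):
    parser = SnippetExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)[:limit]

print(make_snippet("<nav>Home | About</nav><p>MaxxCAT returns search results quickly.</p>"))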
The third suggestion is to Implement Meta-Tag Filtering. Each suggestion is followed up with step-by-step instructions. These handy tips come from a partnership between SiteCrafting, a web design company founded in 1995 by Brian Forth, and MaxxCAT, a company acknowledged for its achievements in high performance search since 2007.
Chelsea Kerwin, September 2, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Maverick Search and Match Platform from Exorbyte
August 31, 2015
The article titled Input Management: Exorbyte Automates the Determination of Identities, on Business On (a primarily German language website), promotes Full Page Entity Detect from Exorbyte. Exorbyte is a world leader in search and match for large volumes of data. The company boasts clients in government, insurance, input management, and ICT firms; really, any business with identity resolution needs. The article stresses the importance of pulling information from masses of data in the modern office. They explain,
“With Full Page Entity Detect, Exorbyte provides a solution for inboxes receiving several million incoming documents. The identity data in digitized correspondence (which can be used for correspondence definition) can be extracted with little effort from full-text documents such as letters and emails and efficiently compared against reference databases. The input management tool combines high fault tolerance with accuracy, speed, and flexibility. The software company from Konstanz was recently included in Gartner’s Magic Quadrant for Enterprise Search.”
The company promises that its Matchmaker technology is unrivaled in searching text without restrictions, even independent of language, allowing for more accurate search. Full Page Entity Detect is said to be particularly useful when it comes to missing information or overlooked errors, since the search is so thorough.
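For a feel of what fault-tolerant matching means in practice, here is a minimal sketch that compares an extracted name against a reference list and accepts close matches despite typos or OCR errors. It uses Python’s standard difflib; the names and the 0.85 threshold are invented, and this is not Exorbyte’s Matchmaker algorithm.

from difflib import SequenceMatcher

reference_db = ["Johann Meier", "Johanna Maier", "John Mayer"]

def best_match(extracted_name, candidates, threshold=0.85):
    # Score every candidate by string similarity and keep the best one
    # if it clears the threshold.
    scored = [
        (SequenceMatcher(None, extracted_name.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    score, candidate = max(scored)
    return candidate if score >= threshold else None

# An OCR-garbled name from a scanned letter still resolves to its reference entry.
print(best_match("Johann Meler", reference_db))  # Johann Meier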
Chelsea Kerwin, August 31, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph