Machine Learning Resolves Enterprise Search
August 26, 2015
One of the main topics of discussion on Beyond Search is enterprise search. We always try to find the juicy details behind enterprise search’s development, groundbreaking endeavors, and problems that search experts need to be aware of. One thing we can all agree on is that enterprise search is full of problems. The question is: will all of enterprise search’s problems ever be solved?
Ron Miller proposed a possible solution on TechTarget’s Search Content Management blog in “Will Machine Learning Revamp Enterprise Search Software?” Machine learning offers a bevy of solutions for many industries. What is very intriguing about the process is that we have yet to scratch the surface of its possible applications. Miller points out that machine learning should deliver more accurate and broader search results than the traditional search index.
Miller imagines this scenario:
“I think we’re going to see tools where the machine can automatically generate results, based on what the user is working on. The information could perhaps populate onto a split screen, suggesting additional information that could potentially be helpful for the user, and then apply machine learning to the user’s response.”
He suggests machine-learning-driven enterprise search will anticipate a user’s information need and even help shape their daily work routine. These are very feasible conjectures, and machine learning has already shaped such industries as the medical field and engineering. The main question is when machine learning will become inexpensive enough to implement in enterprise search.
Whitney Grace, August 26, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Insights Into SharePoint 2013 Search
August 25, 2015
It has been a while since we have discussed SharePoint 2013 and enterprise search. Upon reading “SharePoint 2013: Some Observations On Enterprise Search” from Steven Van de Craen’s Blog, we noticed some new insights into how users can locate information on the collaborative content platform.
The first item he brings to our attention is the “content source,” an out-of-the-box managed property option that creates result sources that aggregate content from different content sources, i.e., different storehouses on SharePoint. Content source can become a crawled property. What happens is that meta elements from Web pages made on SharePoint can be added to crawled properties and made searchable:
“After crawling this Web site with SharePoint 2013 Search it will create (if new) or use (if existing) a Crawled Property and store the content from the meta element. The Crawled Property can then be mapped to Managed Properties to return, filter or sort query results.”
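To make the quoted flow concrete, here is a minimal Python sketch of the idea: pull meta elements from a page into "crawled" properties, then map some of them to "managed" property names. This is an illustration only, not SharePoint's API. The function names, the sample page, and the managed property name `Department` are all hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.crawled = {}  # stand-in for crawled properties (hypothetical)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        # Keep only meta elements that carry both a name and a content value.
        if name and content is not None:
            self.crawled[name.lower()] = content


def crawl_meta(html):
    """Return a dict of meta name -> content found in the HTML text."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.crawled


def map_to_managed(crawled, mapping):
    """Map crawled property names to managed property names (assumed mapping)."""
    return {managed: crawled[name] for name, managed in mapping.items() if name in crawled}


page = (
    '<html><head>'
    '<meta name="Department" content="Finance">'
    '<meta name="region" content="EMEA" />'
    '</head><body>Quarterly report</body></html>'
)
crawled = crawl_meta(page)
managed = map_to_managed(crawled, {"department": "Department"})
print(crawled)
print(managed)
```

Once a value sits under a managed name, it is the kind of field a search front end could return, filter, or sort on, which is the point the quote makes.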
Another useful option was made possible by a user’s request: adding query string parameters to crawled properties. This allows more information to be displayed in the search index. Unfortunately, this option is not available out-of-the-box and has to be programmed using content enrichment.
Enterprise search on SharePoint 2013 still needs to be tweaked and fine-tuned, especially as users’ search demands become more complex. It makes us wonder when Microsoft will release the next SharePoint installment, and whether the next upgrade will resolve some of these issues or unleash a brand new slew of problems. We cannot wait for that can of worms.
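As a rough sketch of the parsing step such an enrichment routine would need, the Python below turns a URL's query string parameters into named properties. It does not use or imitate SharePoint's content enrichment interface. The function name, the `qs_` prefix, and the sample URL are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def query_params_as_properties(url, prefix="qs_"):
    """Turn query string parameters into a flat dict of property values.

    Repeated parameters are joined with ';'. The prefix is a hypothetical
    naming convention, not something SharePoint prescribes.
    """
    query = urlsplit(url).query
    params = parse_qs(query, keep_blank_values=True)
    return {prefix + key.lower(): ";".join(values) for key, values in params.items()}


url = "https://intranet.example.com/pages/report.aspx?Dept=Finance&year=2015&tag=a&tag=b"
props = query_params_as_properties(url)
print(props)
```

Each resulting key could then be treated like any other crawled property and mapped to a managed property for querying.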
Whitney Grace, August 25, 2015
A Call for More Friendly Enterprise Search Results
August 10, 2015
An idea from ClearBox Consulting would bring enterprise search results in line with today’s online searches. The company’s blog asserts, “Enterprise Search? We Need Some Answers on a Card.” Writer Sam Marshall likes the way Google now succinctly presents key information about a user’s query in a “card” at the top of the results page, ahead of the old-school list of relevant links. For example, he writes:
“Imagine you want to know the time of the next train between two cities. When you type this into Google, the first hit isn’t a link to a site but a card like the one below. It not only gives the times but also useful additional information: a map, trip duration, and tabs for walking, driving, and cycling. Enterprise search isn’t like this. The same query on an intranet gives the equivalent of a link to a PDF containing the timetable for the whole region. It’s like saying ‘here’s the book, look it up yourself’. This is not only a poor user experience for the employee, but a direct cost to the employer in wasted time. I’d like to see enterprise search move away from results pages of links to providing pages of answers too, and cards are a powerful way of doing this.”
Marshall emphasizes some advantages of the card approach: the most important information is right there, separated from related but irrelevant data; cards work better on mobile devices; and cards are user-friendly. Besides, he notes, since this format is now popular with sites from Facebook to Twitter, users are becoming familiar with it.
The card concept could be enhanced, Marshall continues, by personalizing results to the individual, tapping into employee profiles or even GPS data. For more information, see the article; it uses a hypothetical query about paternity leave to illustrate its point well. Though enterprise search is not exactly known for living on the cutting edge of technology, developers would be foolish not to incorporate this (or a similar) efficient format.
Cynthia Murrell, August 10, 2015
Open Source Boundaries
July 3, 2015
Now here is an interesting metaphor to explain how open source is sustainable. On OpenSource.com, Bryan Behrenshausen posted the article “Making Collaboration Sustainable,” which references the famous scene from Tom Sawyer where the title character is forced to whitewash a fence by his Aunt Polly. He does not want to do it, but he persuades his friends that whitewashing is fun and has them pay him for the privilege.
Jim Whitehurst refers to it as the “Tom Sawyer” model, where organizations treat communities as gullible chumps who will work without proper compensation. It is a type of crowdsourcing, where the organizations benefit from the communities’ resources to further their own goals. Whitehurst continues that this is not a sustainable approach to crowdsourcing. It could even backfire at some point.
He goes on to say that open source requires a different mindset, one built on commitment from its contributors, in which everyone is equal and is respected for their efforts.
“Treating internal and external communities as equals, really listening to and understanding their shared goals, and locating ways to genuinely enhance those goals—that’s the key to successfully open sourcing a project. Crowdsourcing takes what it can; it turns people and their ideas into a resource. Open sourcing reciprocates where it can; it channels people and their ideas into a productive community.”
The entire goal of open source is to work with a community that coalesces around shared beliefs and passions. Behrenshausen finishes by noting that an organization might find itself totally changed by engaging with an open source community, and that the change could be for the better. Is that a good thing or a bad thing? It is, however, concerning for enterprise search solutions.
Whitney Grace, July 3, 2015
Prepare To Update Your Cassandra
June 2, 2015
It is time for an update to Apache’s headlining, open source, enterprise search software! SD Times let us know that “DataStax Enterprise 4.7 Released,” and it has a slew of updates set to make open source search enthusiasts drool. DataStax is a company that built itself around the open source Apache Cassandra software. The company specializes in enterprise applications for search and analytics.
The newest release of DataStax Enterprise 4.7 includes several updates to improve a user’s enterprise experience:
“…includes a production-certified version of Cassandra 2.1, and it adds enhanced enterprise search, analytics, security, in-memory, and database monitoring capabilities. These include a new certified version of Apache Solr and Live Indexing, a new DSE feature that makes data immediately available for search by leveraging Cassandra’s native ability to run across multiple data centers.”
The update also includes DataStax’s OpsCenter 5.2 for enhanced security and encryption. It can be used to store encryption keys on servers and to manage admin security.
The enhanced search capabilities are the real bragging points: fault-tolerant search operations (used to customize failed search responses), intelligent search query routing (queries are routed to the fastest machines in a cluster for the quickest response times), and extended search analytics (using Solr search syntax and Apache Spark, research and analytics tasks can run simultaneously).
DataStax Enterprise 4.7 improves enterprise search applications. It will probably pull in users trying to improve their big data plans. Has DataStax considered how its enterprise platform could be used in the cloud or in mobile computing?
Whitney Grace, June 2, 2015
Welcome YottaSearch
May 26, 2015
There is another player in the world of enterprise search. Yotta Data Technologies announced its newest product in “Yotta Data Technologies Announces Enterprise Search And Big Data Analytics Platform.” Yotta Data Technologies is known for its affordable and easy-to-use information management solutions. Yotta has expanded its offerings by creating YottaSearch, a data analytics and search platform designed to be a data hub for organizations.
“YottaSearch brings together the most powerful and agile open source technologies available to enable today’s demanding users to easily collect data, search it, analyze it and create rich visualizations in real time. From social media and email for Information Governance and eDiscovery to web and network server logs for Information Technology Operations Analytics (ITOA), YottaSearch™ provides the Big Data Analytics for users to derive information intelligence that may be critical to a project, case, business unit or market.”
YottaSearch uses the popular SaaS model and offers users not only data analytics and search, but also knowledge management, information governance, eDiscovery, and IT operations analytics. Yotta decided to create YottaSearch to earn revenue from the burgeoning big data market, especially the enterprise search end.
The market is worth $1.7 billion, so Yotta has a lot of competition, but if it offers something different and better than its rivals, it stands a chance to rise to the top.
Whitney Grace, May 26, 2015
Long-term Plans for SharePoint
May 21, 2015
Through all the iterations of SharePoint, it seems that Microsoft has wised up and is finally giving customers more of what they want. The release of SharePoint Server 2016 shows a shift back toward on-premises installations, and yet there will still be functions supported through the cloud. This new hybrid emphasis provides a third pathway through which users are experiencing SharePoint. The CMS Wire article, “3 SharePoint Paths for the Next 10 Years,” covers all the details.
The article begins:
“Microsoft Office 365 has proven to be a major disruption of how companies use SharePoint to meet business requirements. Rumors, fear, uncertainty and doubt proliferate around Microsoft’s plans for SharePoint’s future releases, as well as the support of critical features and functionality companies rely on . . . So, taking into account Office 365, the question is: How will companies be using SharePoint over the next 10 years?”
Stephen E. Arnold of ArnoldIT.com is a leader in SharePoint, with a lifelong career in search. His SharePoint feed is a great resource for users and managers alike, or anyone who needs to keep on top of the latest developments. It may be that the hybrid solution is a way to keep on-premises users happy while they still benefit from the latest cloud functions like Delve and OneDrive.
Emily Rae Aldridge, May 21, 2015
Searching Bureaucracy
May 19, 2015
The rise of automatic document conversion could render vast amounts of data collected by government agencies useful. In their article, “Solving the Search Problem for Large-Scale Repositories,” GCN explains why this technology is a game-changer, and offers tips for a smooth conversion. Writer Mike Gross tells us:
“Traditional conversion methods require significant manual effort and are economically unfeasible, especially when agencies are often precluded from using offshore labor. Additionally, government conversion efforts can be restricted by document security and the number of people that require access. However, there have been recent advances in the technology that allow for fully automated, secure and scalable document conversion processes that make economically feasible what was considered impractical just a few years ago. In one particular case the cost of the automated process was less than one-tenth of the traditional process. Making content searchable, allowing for content to be reformatted and reorganized as needed, gives agencies tremendous opportunities to automate and improve processes, while at the same time improving workflow and providing previously unavailable metrics.”
The write-up describes several factors that could foil an attempt to implement such a system, and I suggest interested parties check out the whole article. Some examples include security and scalability, of course, as well as specialized format and delivery requirements, and non-textual elements. Gross also lists criteria to look for in a vendor; for instance, assess how well their products play with related software, like scanning and optical character recognition tools, and whether they will be able to keep up with the volumes of data at hand. If government agencies approach these automation advances with care and wisdom, instead of reflexively choosing the lowest bidder, our bureaucracies’ data systems may actually become efficient. (Hey, one can dream.)
Cynthia Murrell, May 19, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
Hybrid Is Essential to SharePoint 2016
May 19, 2015
It looks like Microsoft is planning to bring the cloud to its SharePoint Server 2016 users at critical points, rather than forcing them to go “all cloud.” This technique allows Microsoft to continue with the cloud-based services it has invested in, while improving the on-premises experience that users are demanding. ZDNet covers the whole story in its article, “Microsoft’s SharePoint 2016: What’s Hybrid Got to do With It?”
The article sums up the much talked about hybrid approach:
“Though it will run on top of Windows Server 2016 R2 and/or Windows Server 2016, SharePoint 2016 will include support for what Microsoft calls ‘cloud-accelerated experiences,’ meaning new hybrid scenarios . . . Instead of trying to push all SharePoint users and all SharePoint workloads to the cloud, Microsoft is acknowledging there are some reasons (compliance among them) that not all data can or should be in SharePoint Online. That said, Microsoft wants to enable its SharePoint users to get at their data wherever it’s stored.”
Stephen E. Arnold is a lifelong leader in search and a long-time expert in SharePoint. He keeps managers and users updated on the latest SharePoint news through his Web service ArnoldIT.com. All eyes should stay peeled for continuing developments, as users get closer to seeing a public release of SharePoint Server 2016.
Emily Rae Aldridge, May 19, 2015
Archive.is Preserves Online Information
May 18, 2015
Today’s information seekers use the Internet the way some of us used reference books growing up. Unlike the paper tomes on our dusty bookshelves, however, websites can change their content without so much as a by-your-leave. Suggestions for preserving online information can be found in “Create Publicly Available Web Page Archives with Archive.is” at gHacks.net.
Writer Martin Brinkmann begins by listing several local options familiar to many of us. There’s Ctrl-s, of course, and assorted screenshot-saving methods. Website archivers like Httrack perform their own crawls and save the results to the user’s local machine. Remotely, Archive.org automatically creates snapshots of prominent sites, but users cannot control the results. Enter Archive.is. Brinkmann writes:
“Archive.is is a free service that helps you out. To use it, paste a web address into the form on the services main page and hit submit url afterwards. The service takes two snapshots of that page at that point in time and makes it available publicly. The first takes a static snapshot of the site. You find images, text and other static contents included while dynamic contents and scripts are not. The second snapshot takes a screenshot of the page instead. An option to download the data is provided. Note that this downloads the textual copy of the site only and not the screenshot. A Firefox add-on has been created for the service which may be useful to some of its users. It creates automatic snapshots of every web page that you bookmark in the web browser after installation of the add-on.”
Wow, don’t set and forget that Firefox option! In fact, the article cautions, be mindful of the public availability of every Archive.is snapshot; Brinkmann reasonably suggests the tool could benefit from a password feature. Still, this could be an option to preserve important (but, for the prudent, impersonal) information found online.
Cynthia Murrell, May 18, 2015

