Internet Archive: The Bono Books

October 16, 2017

I read “Books from 1923 to 1941 Now Liberated!” The collection is based on books which libraries can scan. The write up explains the provision of the US copyright law which makes these books eligible for inclusion in the Internet Archive. Hopefully libraries will find the resources to contribute books. I did some spot checks. One gap is history books. There are others. This is an excellent effort. The interface to the Bono books retains the Internet Archive’s unique approach to interfaces; for example, clicking on a book displays the scanned pages. Clicking on a page turns the page. The outside edge of the scanned image allows one to “jump” to a particular page. Getting back to a book’s table of contents takes a bit of effort, however. Those looking for anthologies can find a collection of 20th century poetry by hunting. The search system is just good enough. Worth checking out. Libraries, scan those history books. Who doesn’t love Theodor Mommsen’s early work?

Stephen E Arnold, October 16, 2017

Enterprise Search: Still Floundering after All These Years

October 11, 2017

Enterprise search conferences once had pride of place. Enterprise search or “search” was the Big Data, artificial intelligence, and cyber intelligence solution from 1998 to 2007.

But by 2007, the fanciful claims of enterprise search vendors were perceived as “big hat, no cattle” posturing. Unable to generate sustainable revenues, the high profile enterprise search systems began looking for a buyer. Those who failed disappeared. Do you know where Convera, Delphes, Entopia, and Siderean are today? What’s the impact of Exalead on Dassault? Autonomy on Hewlett Packard Enterprise? Vivisimo on IBM?

Easy questions to ignore. Time marches on. Proprietary search cost a bundle to keep working. The “fix” to the development, enhancement, and bug fix problems was open source.

A solution emerged. Lucene. That brings us to the title of this blog post: “Enterprise Search: Still Floundering after All These Years.”

The money from license fees is insufficient to make enterprise search work in a good enough way. Open source search, which seems to be largely free of license fees, allows vendors to offer search and highly profitable services to the organizations who want or need an “enterprise search system.”

This means that a vendor who makes more money offering search services can be perceived as a problem by a venture funded company built on promises and tens of millions in venture capital.

The truth of this observation was revealed in an article written by or for Search Technologies, a unit of a Fancy Dan consulting firm. If I understand the Search Technologies’ write up, Lucidworks (né Lucid Words) told Search Technologies that it was not welcome at a conference designed to promote Solr.

Here’s what Search Technologies said in “Why Wasn’t Search Technologies at Lucene/Solr Revolution 2017?”

Lucene/Solr Revolution’s organizer, Lucidworks, informed us that we were no longer welcome to exhibit or speak at the event. Lucidworks considered us a company that:

  • Competes with their professional services group (maybe)
  • Is not likely to resell Lucidworks’ platform exclusively (we are vendor-agnostic, after all), and,
  • Has technology assets that compete with their Fusion platform (partially true)

I don’t care too much about venture funded outfits running conferences to make their “one true way” evident to the attendees. I don’t worry about a blue chip consulting firm’s ability to generate sales leads.

No.

I find that some of enterprise search’s most problematic weaknesses have not been solved after 50 years of flailing. Examples include:

  • The cost of moving beyond “good enough” information access
  • Revealing that enterprise search systems are expensive to tune and shape to the needs of an organization
  • Developing solutions which keep indexes current and searches responsive
  • Seamlessly handling different types of content, including video, engineering drawings, and data tucked inside legacy systems
  • Keeping the majority of the users happy so bootleg search systems are not installed to meet departmental or operating unit needs.

The “search” problem is an illustration of innovation running out of gas. I have zero stake in Lucidworks, Search Technologies, or enterprise search. I am content to be an observer who points out that search vendors, their marketing, the consultants, and the conference organizers are their own worst enemy.

That’s why enterprise search imploded about a decade ago. Search today is pretty much “good enough.” Antidot, Lucene, Solr, dtSearch, X1, Fabasoft, Funnelback, et al. Each does “good enough” search in my opinion.

To make any system better takes consulting and engineering services. These deliver high margins. Users? Well, users want enterprise search to answer questions and work like Google. After 50 years of effort, no company has been able to meet the users’ needs.

That says more than two consulting firms trading digital jabs. What’s at stake is consulting revenue and proprietary fixes. Users? Yes, what about the users?

Stephen E Arnold, October 10, 2017

Microsoft and Open Source Software: Cost Cutting Tactic or a RedHat Type Play

September 1, 2017

Short honk: We were delighted to read “Windows 10: New Feature Sees Microsoft Blur the Line between Windows and Linux.” The write up explains that Windows allows a person to move outputs to a Linux distribution.

Few have covered Microsoft’s dalliance with Solr and the increased interest in using open source software to reduce development costs at Microsoft.

I suppose that’s understandable. The new president is not giving talks about following in the footsteps of IBM which has based dear old Watson on Lucene, home brew code, and technology from acquisitions.

Open source is an easy way to reduce development costs, keep pace with the innovations from the “community,” and free up time for marketing and sales.

Microsoft is becoming a close cousin to IBM, complete with major league strike outs like the Windows phone adventure.

A more significant misperception appears in the write up. I noted this passage:

The Free Software Foundation Europe, has previously said Microsoft’s gradual acceptance of Linux is a compliment, and a net gain for the Free Software movement.

Microsoft’s enthusiasm for some open source technology may be a precursor of Microsoft’s getting in the open source software business, emulating or duplicating the business models of RedHat and Elastic (the Elasticsearch folks).

Worth watching.

Stephen E Arnold, September 1, 2017

Support for Open Source AI from Financial Firms

August 31, 2017

Financial tech reporter Ian Allison at the International Business Times finds it interesting that financial services firms are joining tech companies like Google and Microsoft in supporting open source AI solutions. In his piece, “Finance and Artificial Intelligence Are Going ‘Fintech’ and Open Source,” Allison points to one corporate software engineer as instrumental to the trend:

AQR Capital Management was probably patient zero when it came to opening up their code around data storage – and this move, shepherded by software engineer Wes McKinney, kickstarted the popular Pandas libraries project. Now he has returned to open source work at Two Sigma. We have also seen open source data storage offerings coming out of Man AHL in the form of Arctic. Taking part in a panel on open source infrastructure, McKinney said investment in an open source project yields dividends later: data storage underlies other verticals, and when other people use the software and build libraries on top of it, that makes in-house systems more compatible.

See this link for more about the Pandas library. In the same panel Allison cites above, participants were asked how best to sustain the open source community. McKinney gave this advice:

I feel a compulsion not to let open source projects die. But without sponsorship it can become hard to sustain. So when commercials ask me how they can help, I say sponsor an individual – to triage issues, do patches; that goes a long way.

So, what industry will be next to throw its weight behind open source projects?

Cynthia Murrell, August 31, 2017

 

A Brilliant List of Open Source Localization Tools

August 24, 2017

Open source projects offer technology developers the ability to access technology usually locked behind pay walls. One trouble with open source technology is language translation and the ability for developers to localize their projects. Language remains a barrier in our technology driven world, but there are tools to overcome it. OpenSource.com curated a list of “18 Open Source Translation Tools To Localize Your Project.”

The curator understands the pains of proprietary software:

The proprietary versions of these tools can be quite expensive. A single license for SDL Trados Studio (the leading CAT tool) can cost thousands of euros, and even then it is only useful for one individual and the customizations are limited (and psst, they cost more, too). Open source projects looking to localize into many languages and streamline their localization processes will want to look at open source tools to save money and get the flexibility they need with customization.

The list includes tools for machine translation, which is a hot commodity. Software that can generate a digestible and accurate translation from one language to another is a must have for many localization projects. The list recommends checking out Apertium and Moses. Computer-assisted translation tools are a must have for all translators and language students, because they can save hours of looking up information in dead tree lexicons. They also work in real time, saving countless more hours, so you should check out OmegaT, Subtitles Translator, and Anaphraseus. If you are working with multiple translators on your project, you will need a translation management system to organize everyone; think SharePoint. Jabylon, Zanata, GlobalSight, and Pootle are some good TMS software to check out. Also included are localization automation tools that can ease your work burden, such as Okapi Framework and Mojito.

Whitney Grace, August 24, 2017

DARPA Open Catalog

January 18, 2017

If you are interested in DARPA’s open catalog of open source software, you can find the pointers at this link. The public facing Web site does not provide the names of the companies or research organizations working on the software. The cyber-related listings available in 2015 and early 2016 no longer appear. Links do point to the program manager for specific projects; for example, the office responsible for ADAMS, which detects anomalies in Big Data sets. For generalists interested in DARPA Dark Web projects, the information is difficult to locate using open source tools. The change in the scope of the public facing Open Catalog appears to have taken place in July 2016. Some information about specific software can be located if one knows the name of a research entity involved in the Memex project; for example, a query for Stanford University’s DeepDive, which was updated in early 2016. One use of DeepDive is to identify spouses in the news.

Stephen E Arnold, January 18, 2017

Google Looks to Curb Hate Speech with Jigsaw

January 6, 2017

No matter how advanced technology becomes, certain questions continue to vex us. For example, where is the line between silencing expression and prohibiting abuse? Wired examines Google’s efforts to walk that line in its article, “Google’s Digital Justice League: How Its Jigsaw Projects are Hunting Down Online Trolls.” Reporter Merjin Hos begins by sketching the growing problem of online harassment and the real-world turmoil it creates, arguing that rampant trolling serves as a sort of censorship — silencing many voices through fear. Jigsaw, a project from Google, aims to automatically filter out online hate speech and harassment. As Jared Cohen, Jigsaw founder and president, put it, “I want to use the best technology we have at our disposal to begin to take on trolling and other nefarious tactics that give hostile voices disproportionate weight, to do everything we can to level the playing field.”

The extensive article also delves into Cohen’s history, the genesis of Jigsaw, how the team is teaching its AI to identify harassment, and problems they have encountered thus far. It is an informative read for anyone interested in the topic.

Hos describes how the Jigsaw team has gone about instructing their algorithm:

The group partnered with The New York Times (NYT), which gave Jigsaw’s engineers 17 million comments from NYT stories, along with data about which of those comments were flagged as inappropriate by moderators.

Jigsaw also worked with the Wikimedia Foundation to parse 130,000 snippets of discussion around Wikipedia pages. It showed those text strings to panels of ten people recruited randomly from the CrowdFlower crowdsourcing service and asked whether they found each snippet to represent a ‘personal attack’ or ‘harassment’. Jigsaw then fed the massive corpus of online conversation and human evaluations into Google’s open source machine learning software, TensorFlow. …

By some measures Jigsaw has now trained Conversation AI to spot toxic language with impressive accuracy. Feed a string of text into its Wikipedia harassment-detection engine and it can, with what Google describes as more than 92 per cent certainty and a ten per cent false-positive rate, come up with a judgment that matches a human test panel as to whether that line represents an attack.
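For readers unfamiliar with the two figures in the quoted passage, here is a minimal sketch of how an agreement rate and a false-positive rate are commonly computed when a classifier’s judgments are compared with a human panel’s labels. The function name and the toy data are assumptions made up for illustration. This is not Jigsaw’s code or data, and the article does not say exactly how Google defines its “92 per cent certainty.”

```python
def agreement_and_false_positive_rate(predicted, human):
    """Compare model judgments with human-panel labels.

    Both arguments are equal-length lists of booleans, where True means
    "this snippet is an attack". Returns (agreement, false_positive_rate).
    """
    if len(predicted) != len(human) or not human:
        raise ValueError("inputs must be non-empty and the same length")

    pairs = list(zip(predicted, human))
    # Agreement: share of snippets where model and panel give the same label.
    matches = sum(p == h for p, h in pairs)
    # False positive: model says "attack" but the panel says it is not.
    false_positives = sum(p and not h for p, h in pairs)
    # Denominator for the false-positive rate: snippets the panel calls benign.
    benign = sum(not h for h in human)

    agreement = matches / len(human)
    false_positive_rate = false_positives / benign if benign else 0.0
    return agreement, false_positive_rate


# Hypothetical toy data: ten snippets, two of which the panel calls attacks.
human_labels = [True, True, False, False, False, False, False, False, False, False]
model_labels = [True, False, True, False, False, False, False, False, False, False]

agreement, fpr = agreement_and_false_positive_rate(model_labels, human_labels)
print(f"agreement={agreement:.3f} false_positive_rate={fpr:.3f}")
```

The two numbers measure different things: a system can agree with the panel most of the time and still flag a meaningful share of harmless comments, which is why the false-positive figure matters for the free expression question.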

There is still much to be done, but soon Wikipedia and the New York Times will be implementing Jigsaw, at least on a limited basis. At first, the AI’s judgments will be checked by humans. This is important, partially because the software still returns some false positives—an inadvertent but highly problematic overstep. Though a perfect solution may be impossible, it is encouraging to know Jigsaw’s leader understands how tough it will be to balance protection with freedom of expression. “We don’t claim to have all the answers,” Cohen emphasizes.

Cynthia Murrell, January 6, 2017

Lucidworks Sees Watson as a Savior

December 21, 2016

Lucidworks (really?). A vision has appeared to the senior managers of Lucidworks, an open source search outfit which has ingested $53 million and sucked in another $6 million in debt financing in June 2016. Yep, that Lucidworks. The “really” which the name invokes is an association I form when someone tells me that commercializing open source search is going to knock off the pesky Elastic of Elasticsearch fame while returning a juicy payoff to the folks who coughed up the funds to keep the company founded in 2007 chugging along. Yep, Lucid works. Sort of, maybe.

I read “Lucidworks Integrates IBM Watson into Fusion Enterprise Discovery Platform.” The write up explains that Lucidworks is “tapping into” the IBM Watson developer cloud. The write up explains that Lucidworks has:

an application framework that helps developers to create enterprise discovery applications so companies can understand their data and take action on insights.

Ah, so many buzzwords. Search has become applications. “Action on insights” puts some metaphorical meat on the bones of Solr, the marrow of Lucidworks. Really?

With Watson in the company’s back pocket, Lucidworks will deliver. I learned:

Customers can rely on Fusion to develop and deploy powerful discovery apps quickly thanks to its advanced cognitive computing features and machine learning from Watson. Fusion applies Watson’s machine learning capabilities to an organization’s unique and proprietary mix of structured and unstructured data so each app gets smarter over time by learning to deliver better answers to users with each query. Fusion also integrates several Watson services such as Retrieve and Rank, Speech to Text, Natural Language Classifier, and AlchemyLanguage to bolster the platform’s performance by making it easier to interact naturally with the platform and improving the relevance of query results for enterprise users.

But wait. Doesn’t Watson perform these functions already? And if Watson comes up a bit short in one area, isn’t IBM-infused Yippy ready to take up the slack?

That question is not addressed in the write up. It seems that the differences among Watson, its current collection of partners, and affiliated entities like Yippy are vast. The write up tells me:

customers looking for hosted, pre-tuned machine learning and natural language processing capabilities can point and click their way to building sophisticated applications without the need for additional resources. By bringing Watson’s cognitive computing technology to the world of enterprise data apps, these discovery apps made with Fusion are helping professionals understand the mountain of data they work with in context to take action.

This sounds like quite a bit of integration work. Lucidworks. Really?

Stephen E Arnold, December 21, 2016

IBM Open Sourciness Goes Only So Far

December 19, 2016

I love IBM, Big Blue, creator of Watson. Watson, as you may know, is a confection consisting of goodies from IBM’s internal code wizards, acquired technologies like the instantly Big Data friendly Vivisimo, and Lucene. Yep, like Attivio and many other “search” vendors, open source Lucene is the way to reduce the costs for basic information retrieval.

I assume you know about OpenLava, which is an open source system for managing certain types of IBM systems. The Open Lava Web page here states:

With an active community of users and developers, OpenLava development is accelerating, delivering high-quality implementations of important new features including:

  • Fair-share scheduling – allocate resources between users and groups according to configurable policies
  • Job pre-emption – Ensure that critical users, jobs and groups have the resources they need – when they need them
  • Docker support – Providing application isolation, fast service deployment and cloud mobility
  • Cloud & VM friendly auto-scaling – Easily add or remove cluster nodes on the fly without cluster re-configuration

These features are in addition to the many advanced capabilities already in OpenLava including job arrays, run-windows, n-way host failover, job limits, dependencies for multi-step workflows, parallel job support and much more.
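The fair-share scheduling item in the quoted list can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions, not OpenLava’s implementation: the function names, the integer “shares,” and the rule of always serving the user who has consumed the least relative to their share are all hypothetical choices made for the example.

```python
def pick_next_user(shares, usage, pending):
    """Return the user with pending jobs who has used the least per share.

    Ties are broken alphabetically so the result is deterministic.
    Returns None when nobody has a job waiting.
    """
    candidates = [user for user, count in pending.items() if count > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda user: (usage[user] / shares[user], user))


def schedule(shares, pending, slots):
    """Hand out `slots` job slots one at a time and return the dispatch order."""
    usage = {user: 0 for user in shares}   # slots consumed so far
    pending = dict(pending)                # copy so the caller's dict is untouched
    order = []
    for _ in range(slots):
        user = pick_next_user(shares, usage, pending)
        if user is None:
            break
        order.append(user)
        pending[user] -= 1
        usage[user] += 1
    return order


# Hypothetical policy: alice is entitled to twice the resources of bob.
order = schedule(shares={"alice": 2, "bob": 1},
                 pending={"alice": 10, "bob": 10},
                 slots=6)
print(order)
```

With both users keeping the queue full, the slots split roughly in proportion to the configured shares, which is the behavior the “configurable policies” wording points at.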

I read “OpenLava under IBM Attack.” I believe everything I read on the Internet. The write up explains that Big Blue wants the OpenLava open source code removed. The write up states:

IBM claims that the versions of OpenLava starting from 3.0 infringe their copyright and that some source code have been stolen from them, copied, or otherwise taken from their code base.

Several thoughts:

  1. The folks involved with OpenLava did knowingly and intentionally rip off IBM’s software, and the marketer of the open source tinged Watson is taking a logical and appropriate action against the open source alternative to IBM’s own management software.
  2. IBM is unhappy with OpenLava’s adoption by IBM customers. IBM customers should buy only software from IBM-authorized sources. Other old school enterprise software companies have this philosophy too.
  3. There is a failure to communicate. OpenLava is not making its case understandable to the outfit poised to hire 25,000 more employees and IBM is not making itself clear to the crafty folks at OpenLava.

I don’t have a dog in the fight. But I find it interesting that IBM Watson with its Lucene tinged capabilities is finding open source distasteful in some circumstances.

Life was far simpler when open source projects were more malleable. Next stop? The legal eagles’ nests.

Stephen E Arnold, December 19, 2016

Tor Phone to Take on Google

December 13, 2016

Tor users have few or no options for surfing the Underground Web anonymously because Android-powered phones still manage to scrape user data. The Tor Project intends to beat Google at its own game with a Tor-enabled smartphone.

An article that appeared on Ars Technica, titled “Tor Phone Is Antidote to Google ‘Hostility’ Over Android, Says Developer,” says:

The prototype is meant to show a possible direction for Tor on mobile. We are trying to demonstrate that it is possible to build a phone that respects user choice and freedom, vastly reduces vulnerability surface, and sets a direction for the ecosystem with respect to how to meet the needs of high-security users.

The phone is powered by the custom-made CopperHead OS and runs only on Google Nexus or Pixel hardware. Because of the technical complexity involved, it is recommended only for Linux geeks.

For voice calls, according to the article:

To protect user privacy, the prototype runs OrWall, the Android firewall that routes traffic over Tor, and blocks all other traffic. Users can punch a hole through the firewall for voice traffic, for instance, to enable Signal.
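The default-deny policy described in the quote can be sketched in a few lines. This is a hypothetical illustration of the idea only. It is not OrWall’s code, and the function, the enum, and the app names are assumptions invented for the example.

```python
from enum import Enum


class Verdict(Enum):
    TOR = "route over Tor"
    DIRECT = "allow outside Tor"
    BLOCK = "block"


def decide(app, tor_apps, direct_exceptions):
    """Default-deny policy: traffic is blocked unless a rule says otherwise.

    `tor_apps` are apps whose traffic is routed over Tor.
    `direct_exceptions` are the explicit holes punched through the firewall,
    for example a voice app whose traffic is let through directly.
    """
    if app in direct_exceptions:
        return Verdict.DIRECT
    if app in tor_apps:
        return Verdict.TOR
    return Verdict.BLOCK


# Hypothetical configuration for illustration.
tor_apps = {"browser", "mail"}
direct_exceptions = {"signal"}

for app in ("browser", "signal", "weather-widget"):
    print(app, "->", decide(app, tor_apps, direct_exceptions).value)
```

The point of the design is the last line of `decide`: anything the user has not explicitly approved never leaves the phone.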

Google’s Android is an Open Source platform that OEMs can customize. This creates multiple security threats enabling hackers and snoopers to create backdoors. CopperHead OS, on the other hand, plugs these security holes with verified boot and also stops Google Play Store from overriding native apps. Seems the days of mobile Tor are finally here.

Vishal Ingole, December 13, 2016
