More Open Source Smart Software
January 15, 2016
The gift giver this time is Baidu. Navigate to “Baidu Open-Sources Its WARP-CTC Artificial Intelligence Software.” Baidu’s method is called connectionist temporal classification, or CTC. Is the innovation from the Middle Kingdom? Nah. Switzerland. You know, the country where Einstein whacked away with his so-so computational skills.
According to the write up:
The CTC approach involves recurrent neural networks (RNNs), an increasingly common component used for a type of AI called deep learning. Recurrent nets have been shown to work well even in noisy environments.
Have at the code, gentle reader. The link is https://github.com/baidu-research/warp-ctc
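For readers who want a feel for the CTC approach in the quoted passage before digging into the repository, here is a minimal Python sketch of best-path (greedy) CTC decoding. It was written for this post and is not taken from warp-ctc. As we understand it, warp-ctc focuses on computing the CTC loss used during training; the sketch covers only the simpler rule that turns per-frame network outputs into a label sequence: take the most likely label at each time step, merge repeated labels, then drop the special "blank" label. The function name, the use of label 0 as the blank, and the toy alphabet are illustrative assumptions.

```python
# Minimal sketch of best-path (greedy) CTC decoding.
# Assumption: label 0 is the CTC "blank"; this is illustrative, not warp-ctc code.
BLANK = 0


def ctc_greedy_decode(frame_probs, blank=BLANK):
    """Decode per-frame label probabilities into a label sequence.

    frame_probs: one list of label probabilities per time step (e.g. RNN output).
    """
    # Step 1: pick the most likely label at each time step (the "best path").
    best_path = [max(range(len(probs)), key=probs.__getitem__) for probs in frame_probs]

    # Step 2: merge consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Hypothetical toy alphabet: index 0 is blank, 1 is "a", 2 is "b".
    alphabet = ["-", "a", "b"]
    frames = [
        [0.1, 0.8, 0.1],    # a
        [0.2, 0.7, 0.1],    # a (repeat, merged)
        [0.9, 0.05, 0.05],  # blank (separates the two a's)
        [0.1, 0.6, 0.3],    # a
        [0.1, 0.2, 0.7],    # b
        [0.1, 0.1, 0.8],    # b (repeat, merged)
        [0.6, 0.2, 0.2],    # blank
    ]
    labels = ctc_greedy_decode(frames)
    print("".join(alphabet[i] for i in labels))  # prints "aab"
```

The blank is what lets CTC emit a doubled letter: without the blank frame in the middle, the two a's would collapse into one.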
Stephen E Arnold, January 14, 2016
Open Source Data Management: It Is Now Easy to Understand
January 10, 2016
I read “16 for 16: What You Must Know about Hadoop and Spark Right Now.” I like the “right now.” Urgency. I am not sure I feel too much urgency at the moment. I will leave that wonderful feeling to the executives who have sucked in venture money and have to find a way to generate revenue in the next 11 months.
The article runs down the basic generalizations associated with each of these open source data management components:
- Spark
- Hive
- Kerberos
- Ranger/Sentry
- HBase/Phoenix
- Impala
- Hadoop Distributed File System (HDFS)
- Kafka
- Storm/Apex
- Ambari/Cloudera Manager
- Pig
- YARN/Mesos
- NiFi/Kettle
- Knox
- Scala/Python
- Zeppelin/Databricks
The list tells me two things. First, open source data tools are proliferating. Second, it will take quite a few committed developers to keep these projects afloat.
The write up is not content with this shopping list. The intrepid reader will have an opportunity to learn a bit about:
- Kylin
- Atlas/Navigator
As the write up swoops to its end point, I learned about some open source projects which are a bit of a disappointment; for example, Oozie and Tez.
The key point of the article is that Google’s MapReduce, which is pretty long in the tooth, is now effectively marginalized.
The Balkanization of data management is evident. The challenge will be to use one or more of these technologies to make some substantial revenue flow.
What happens if a company jumps on the wrong bandwagon as it leaves the parade ground? I would suggest that it may be more like a Pig than an Atlas. The investors will change from Rangers looking for profits to Pythons ready to strike. A Spark can set fire to some hopes and dreams in the Hive. Poorly constructed walls of Databricks can come falling down. That will be an Oozie.
Dear old Oracle, DB2, and SQL Server will just watch.
Stephen E Arnold, January 10, 2016
Short Honk: Hadoop Ecosystem Made Clear
January 3, 2016
Love Hadoop. Love all things Hadoopy? You will want to navigate to “The Hadoop Ecosystem Table.” You get categories of Hadoopiness with examples of the Hadoop amoebae. You can see where Spark or Kudu “fits.” Need some document data model options? The table will deliver: ArangoDB and more. Useful stuff.
Stephen E Arnold, December 30, 2015
The Importance of Google AI
December 23, 2015
According to Business Insider, we’ve all been overlooking something crucial about Google. Writer Lucinda Shen reports, “Top Internet Analyst: There Is One Thing About Google that Everyone Is Missing.” Shen cites an observation by prominent equity analyst Carlos Kirjner. She writes:
“Kirjner, that thing [that everyone else is missing] is AI at Google. ‘Nobody is paying attention to that because it is not an issue that will play out in the next few quarters, but longer term it is a big, big opportunity for them,’ he said. ‘Google’s investments in artificial intelligence, above and beyond the use of machine learning to improve character, photo, video and sound classification, could be so revolutionary and transformational to the point of raising ethical questions.’
“Even if investors and analysts haven’t been closely monitoring Google’s developments in AI, the internet giant is devoted to the project. During the company’s third-quarter earnings call, CEO Sundar Pichai told investors the company planned to integrate AI more deeply within its core business.”
Google must be confident in its AI if it is deploying it across all its products, as reported. Shen recalls that the company made waves back in November, when it released the open-source AI platform TensorFlow. Is Google’s AI research about to take the world by storm?
Cynthia Murrell, December 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Open Source Survey: One Big Surprise about Code Management
November 23, 2015
I read “Awfully Pleased to Meet You: Survey Finds Open Source Needs More Formal Policies.”
The fact that eight out of 10 outfits in the sample were using open source software was no surprise. The sponsor of the survey is open source centric.
The point I highlighted was:
According to the study, less than 42% of organizations maintain an IT Asset Management (ITAM) style inventory of open source components.
When I read this, I thought, “Who keeps track of the open source components?”
The answer in more than half the companies in the sample was, “Huh? What?”
I circled this point:
Shipley [Black Duck top dog] has also added the following comment, “In the results this year, it has become more evident that companies need their management and governance of open source to catch up to their usage. This is critical to reducing potential security, legal, and operational risks while allowing companies to reap the full benefits OSS provides.”
Are companies spending money with commercial open source outfits in order to buy management? If that is the case, the successful commercial open source outfit will be the one that can deliver management, not the one pushing the technology and trends that marketers at certain commercial open source companies hype.
Stephen E Arnold, November 23, 2015
Lucidworks: Another $21 Million in Funding
November 19, 2015
Lucidworks (an eight-year-old “start up” founded in 2007) has raised an additional $21 million in funding. According to Crunchbase, the total funds injected into the open source centric company now stand at $53 million.
The news release “Lucidworks Announces $21 Million in Series D Funding” states:
Lucidworks, the chosen search solution for leading brands and organizations around the world, today announced $21 million in new financing. Allegis Capital led the round with participation from existing investors Shasta Ventures and Granite Ventures. Lucidworks will use the funds to accelerate its product-focused mission enabling companies to translate massive amounts of data into actionable business intelligence.
The statement included this observation attributed to Spencer Tall, Allegis Capital:
Lucidworks has proven itself, not only by providing the software and solutions that businesses need to benefit from Lucene/Solr search, but also by expanding its vision with new products like Fusion that give companies the ability to fully harness search technology suiting their particular customers. We fully support Lucidworks, not only for what it has achieved to date — disruptive search solutions that offer real, immediate benefits to businesses — but for the promising future of its product technology.
Lucidworks, formerly Lucid Imagination, competes with Elastic. Companies from IBM to OpenSearchServer offer solutions which compete in the same market sector. Elastic’s funding is in the $104 million range.
The horses are away from the starting gate. Will the winner be the steed with the best jockey? Stay tuned because the track is muddy.
Stephen E Arnold, November 19, 2015
On the Prevalence of Open Source
November 11, 2015
Who would have thought, two decades ago, that open source code was going to dominate the software field? Vallified’s Philip O’Toole meditates on “The Strange Economics of Open-Source Software.” Though the industry gives so much away for free, it’s doing quite well for itself.
O’Toole notes that closed-source software is still in wide use, largely in banks’ embedded devices and underpinning services. Also, many organizations are still attached to their Microsoft and Oracle products. But the tide has been turning; he writes:
“The increasing dominance of open-source software seems particularly true with respect to infrastructure software. While security software has often been open-source through necessity — no-one would trust it otherwise — infrastructure is becoming the dominant category of open-source. Look at databases — MySQL, MongoDB, RethinkDB, CouchDB, InfluxDB (of which I am part of the development team), or cockroachdb. Is there anyone today that would even consider developing a new closed-source database? Or take search technology — elasticsearch, Solr, and bleve — all open-source. And Linux is so obvious, it is almost pointless to mention it. If you want to create a closed-source infrastructure solution, you better have an enormously compelling story, or be delivering it as part of a bigger package such as a software appliance.”
It has gotten to the point where developers may hesitate to work on a closed-source project because it will do nothing for their reputation. Where do the profits come from, you may ask? Why in the sale of services, of course. It’s all part of today’s cloud-based reality.
Cynthia Murrell, November 11, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google Uses Ninja Death Strike for Smart Software
November 10, 2015
I read “Google Tries an Android for Machine Learning, Releasing Open Source AI System.” The write up draws a parallel with Google’s Android strategy. The idea is to make something available in order to attract developers and then eyeballs.
I noted this paragraph:
The best explanatory quote comes from Greg Corrado, a senior researcher, in Google’s video on the system, embedded below: “There should really be one set of tools that researchers can use to try out their crazy ideas. And if those ideas work, they can move them directly into products without having to rewrite the code.”
The article mentions that the monopolists in hope and practice are into smart software. Smart software means 24×7 analytic type activity without humans. Better. Faster. Cheaper. More lucrative if one outfit sweeps up most of the activity. The goal is advertising and a reasonable chance at the type of market dominance that warmed the cockles of Andrew Carnegie’s heart.
One idea caught my attention. Neither the article nor most of the others about this announcement mentioned the erstwhile leader of cognitive computing. IBM Watson is smart software, and its DNA is anchored in open source, acquired technology, and the scripts of IBM researchers.
IBM Watson wants and needs its smart software to become a $1 billion business and pronto. Then IBM needs Watson to generate tens or hundreds of billions for the Big Blue stakeholders.
IBM is not an outfit known for giving software away. I think that IBM will have to do a rethink and tap into Watson’s capabilities to find a tactic to get its smart software mojo back.
Did Google craft its open source play to blunt IBM? Nah. Google just wants to be Googley because being Alphabetty does not have the same cachet.
Does the Alphabet Google thing have a heart of gold and a weaponized open source strategy? Interesting question.
Stephen E Arnold, November 10, 2015
Open Source: A Bad Fit for Corporations?
November 9, 2015
I read “Corporations and OSS Do Not Mix.” The write up fooled me. I thought the approach was going to be that proprietary software vendors and open source code may find themselves at odds.
I was wrong.
The article explains that open source software and commercial organizations bump into licensing issues and some real world hurdles. The article states:
the joy and enthusiasm that I had when I started working on open source has been flattened. My attitude was naïve at best – this is fun and maybe I’m helping some other people do good and have fun too. This is also how a lot of my friends presently view their projects.
The list of challenges ranges from the selfishness of the commercial enterprise to dumb requests.
I also noted this passage:
Open source software is full of toxic people. This certainly shouldn’t be a surprise at this point. I would guess that it is safe to say that pretty much every person (including myself, I’m certainly not exempt from this) has had bad days and reacted poorly when dealing with the community, contributors, colleagues, etc. These are not excuses and these events can (and often do) shape the behaviors of the community and those observing it.
The article includes a list of positive ideas.
My hunch is that search vendors with proprietary software will become aggressive disseminators of the anti-open source possibilities of this write up.
That’s what makes search and content processing such credible business sectors.
Stephen E Arnold, November 9, 2015
Datafari Ventures into the Enterprise Search Jungle
October 29, 2015
A less-than-enthusiastic reader called our attention to Datafari, a new explorer of the enterprise search jungle. The software uses Solr and contains “the heart of a CMS.” The Datafari Web site explains:
A CMS allows for organizing collaboration within a company. But it is never monolithic, and only a federated search engine can find the data wherever they are.
Datafari Version 2.0 is explained in a video. The system permits keyword search and offers a point-and-click sidebar to facilitate exploration of the content.
A user can save a particular document to a Favorites folder. The system administrator can view log file data in a graphical format. Hit boosting is available as well.
A live demonstration is also available. When I visited the site, it appeared that I needed to load my own content into the system. I decided against taking this step.
If you are looking for an enterprise search system that can double as a content management system, Datafari may be for you. The company is located in France, so a trip for training could be an added bonus.
Stephen E Arnold, October 29, 2015