Open Source Management: A Work in Progress
March 20, 2014
I have attended a couple of “open source” events over the years. Most of the attendees were male, serious, bright, and similar to the fellows in my advanced high school math class and our math club.
The few women present were notable because there were so darned few of them. I attended the first Lucene Revolution with two exceptionally competent females, one a law librarian and one a PhD in operations management.
My recollection is that no one from Lucid, the sponsoring organizations, or the general attendance group paid either of them much, if any, attention, even when I introduced them.
As I recall, one of the then-senior executives of Lucid Imagination (now Lucid Works) blew off suggestions made by my PhD colleague. It was not what the Lucid person said. It was the facial expression that communicated, “Wow, do I have to listen to yet another idea from a PhD from Kentucky. I have better things to do with my really valuable time.”
I found the meeting amusing.
The female PhD did not share my point of view. Eighteen months later, that male Silicon Valley “superstar” was sucked into Lucid’s revolving door and spat out into the ever sunny Silicon Valley job market. My team and I moved on, concluding that at least one open source search company was pretty much like my high school classmates in the math club. Others? Who knows? Who cares?
So what?
Well, I read a fascinating East Coasty article in the April Harper’s Magazine. The story is “The Office and its Ends”, a book extract from Cubed: A Secret History of the Workplace. Harper’s is into the Trotula recycling approach to content. Nikil Saval and his publisher Doubleday are, no doubt, thrilled by the East Coasty endorsement. Book sales are the name of the game.
Here’s the passage I noted. The extract is describing the workplace at GitHub, where much search source code resides. The GitHub section begins on page 14. You will have to snag a hard copy at the library or on a newsstand, even though these are getting hard to find in rural Kentucky. Good hunting, gentle reader.
GitHub seems to be a case example of how to do the workplace.
The hook is in my opinion:
Chacon [the GitHub CIO and a founder] described this [the GitHub workplace approach] as having developed from the open source model: ‘You have all these projects that you can work on, and people choose the crossover of what they’re good at… Leadership can be ephemeral.’
No doubt about leadership ephemerality in open source companies. The whizzing of the revolving door can be discerned in Harrod’s Creek, Kentucky.
This passage struck me as one to underline:
Yet Scott Chacon, one of the company’s founders and its current CIO, kept referring to the value of employees’ being able to ‘serendipitously encounter’ one another throughout the workday. When I [Mr. Saval] asked Chacon how this was supposed to occur if most of the staff wasn’t actually required to come into the office, he explained that he wanted these encounters to be rare, once every month or two, and to [lead to] ‘deeper interactions.’…’That’s way more valuable to me than ‘I saw this person when I was going to the bathroom,’ or ‘I had to wait in line behind them when I was waiting for food.’ It seemed to me [Mr. Saval] a valid rebuke to the lazier ideas that proliferated in office-design-speak around the world.
I think Mr. Saval sees GitHub as a model for other companies to emulate. There you go. A model for alleged harassment.
By chance, I came across a CNNMoney article “GitHub Suspends Founder over Gender Harassment Claims.” I have no idea if CNN was able to put sufficient resources into researching GitHub because most of the “news” efforts are directed at a missing airplane story. Nevertheless, I will assume the write up is semi-accurate. Here’s the snippet I noted:
“I’ve been harassed by ‘leadership’ at GitHub for two years,” she wrote. “I’m incredibly happy to moving to join a more healthy work environment, with a team who doesn’t tolerate harassment of their peers.”
I circled this passage as well:
It’s hardly the first time a female entrepreneur has pointed out sexism in tech. Last year, tech developer Adria Richards posted to Twitter after taking offense to a sexual reference made by male attendees at tech conference PyCon. One of the men who made the reference was fired, and in a bizarre twist, Richards was also fired for “publicly shaming the offenders.” In another incident at annual tech conference TechCrunch Disrupt, entrepreneurs came under fire for pitching controversial apps…
Several observations:
- I wonder how Mr. Saval perceives this situation. I am not sure the GitHub workplace is where I want my daughter to work. If Mr. Saval has a daughter, a wife, a female cousin, I wonder if he would use his connections at GitHub to get one of these females a job.
- I wonder if the Lucid Imagination former executive is aware that my PhD colleague could have interpreted his treatment of her as untoward behavior. My hunch is that the disconnect between this Silicon Valley warrior and an African American PhD was so great that bridging the gap was impossible. I wonder if the fellow from Lucid Imagination even knew there was a gap.
- What does this Janus-like approach at GitHub say about open source management methods? I have a few ideas, but I will tuck them in my pocket for now.
To wrap up, the East Coasty approach to open source is intriguing. How will other open source companies manage? Will the guy-centric math club approach change? At my 50th high school reunion, the math club folks sat by themselves. Some behaviors are consistent through time, I believe.
The major challenge open source faces is management. I will clutch this assertion until someone demonstrates that whiz bang, I’m too busy, my plane is late methods really do deliver value to stakeholders and employees. With venture funding pouring into “open source plays”, how will these companies generate sufficient revenue to pay off the investors? Do Facebook, Google, IBM, and Yahoo have sufficient resources to buy every open source start up?
A decade ago even Google was smart enough to admit that it needed adult supervision. Even with an adult on the job, Google is a case study cornucopia; for example, the alleged relationship between a Google founder and a Glass marketer. Ample evidence appears to exist that high tech management has not found its sweet spot outside of the high school math club. If tech is the future of America’s industrial performance and open source software is the heir to proprietary software, when will management manage? One hopes in time to prevent the alleged unfortunate problems at GitHub from becoming more widespread.
Stephen E Arnold, March 20, 2014
Download A Free TemaTres Pack
March 11, 2014
Despite the dubious quality of the blog Home-Education. Free Download., they do make an interesting point with the post “TemaTres Pack.” Other than a link to a questionable download Web site, there is nothing in the post. What sort of knowledge can a user glean from a blog that was obviously made to house content and make a few cents on a dollar for the creator?
TemaTres is a legitimate open source vocabulary server developed to manage and exploit dictionaries, taxonomies, thesauri, and other formal representations of knowledge. It can also be downloaded at SourceForge; trust that source over the above link.
Open source is a key player in technology and software development. Proprietary and open source are ingrained with each other and it is difficult to discern where the line is drawn, except when money comes into play. This link to a TemaTres download begs the question: what do free downloads do to the business models of Smartlogic, Mondeca, and other vocabulary management firms?
Companies are built on the entire premise of developing software to manage information, control vocabulary lists, and present it in a useful form. Open source is a boon to users, but is TemaTres going to dampen these companies’ profits? It is possible, but open source lacks the organization that comes with paying its developers, and it sometimes cannot offer a robust solution without an IT professional.
Whitney Grace, March 11, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
For Big Insights Try A Big Download From IBM
March 10, 2014
IBM might not be the first name when it comes to open source, but they experiment in that area and they have offered a free, downloadable version of BigInsights. On IBM’s developerWorks page, IBM InfoSphere BigInsights Quick Start Edition can be downloaded without any strings. It was made available to anyone who wants to experience enterprise level features, play with Hadoop, and figure out what it can be used for.
IBM describes InfoSphere BigInsights Quick Start Edition as:
“IBM InfoSphere BigInsights Quick Start Edition is a free, downloadable non-production version of BigInsights that enables new solutions that cost effectively turn large, complex volumes of data into insight by combining Apache Hadoop, (including the MapReduce framework and the Hadoop Distributed File Systems), with unique, enterprise-ready technologies and capabilities from across IBM, including Big SQL, text analytics and BigSheets.”
Can IBM use the word “big” to explain its product even more? Yes, they can, because they forgot to include big data solutions. This is, of course, a sales gimmick to entice people to buy the professional edition, but it has the open source benefits, especially in customer support and the IBM name.
Whitney Grace, March 10, 2014
Splunk: The Run Up May Have Hit a Glass Ceiling
March 1, 2014
I read “Splunk’s Q4 Expenses Run Hot as It Adds Salespeople.” I think of Splunk as a search and data access system that helps make sense of log files. I know that Splunk does more, but once I get an idea in my head, it is sometimes overly persistent.
The write up presented some interesting information.
- Splunk is running up its expenses
- Some of the expenses are related to hiring sales people to make sales (obviously)
- Other costs were related to marketing a “hot” company’s wares.
Splunk is confident that the losses are anomalous.
I am not sure I agree. The simple reason is that Splunk’s success has given developers the idea that open source software can do what Splunk does better, faster, and cheaper. Usually, one has to pick two of these attributes.
But—and this is a big “but”—the thorn in Splunk’s side is Elasticsearch. The open source search system works wonders on some of the data that Splunk embraced. The Elasticsearch outfit is flush with cash from its recent round of funding. Even the azure chip “real journalist” operation at InfoWorld called Elasticsearch “hip.”
Other, probably less “hip” competitors like Lucid Works (formerly Lucid Imagination) want in on the Splunk game. Lucid wants to partner; Elasticsearch wants to let its legions of developer fanatics take the company wherever the Elasticsearch technology makes sense.
In my opinion, Splunk has a developer perception problem. I am not sure hiring sales people and pumping money into marketing is going to blunt the short and mid term impact of the Elasticsearch juggernaut.
Stephen E Arnold, March 1, 2014
Quote to Note: Open Source Is a Little Pregnant
February 27, 2014
I came across “Why Is Atom Closed Source?” The thread had a very interesting statement from mojombo. I quote:
Atom won’t be closed source, but it won’t be open source either. It will be somewhere inbetween, making it easy for us to charge for Atom while still making the source available under a restrictive license so you can see how everything works. We haven’t finalized exactly how this will work yet. We will have full details ready for the official launch.
Several years ago I gave a talk and used this diagram to illustrate the spectrum of open source search software:
Some of my information explaining the diagram turned up in an azure chip consulting firm report. Well, that’s how the semi straight consulting firms work.
The point of the diagram is that open source software is on a path to be commercial software. The open source cheerleaders deny this trend. I, on the other hand, submit that the Atom quote makes it pretty darned clear that being a little pregnant is not much different from having a commercial baby. Open source is increasingly a marketing ploy with lipstick.
Stephen E Arnold, February 27, 2014
Log Files: Search, Short Cuts, and Low Costs
February 26, 2014
I read “Splunk Feels the Heat from Stronger, Cheaper Open Source Rivals.” InfoWorld is up to its old tricks again. Log files have been around for decades. Many organizations allow more recent entries to overwrite previous log files. I know that some people believe that this practice has gone the way of the dodo. Well, would you like to buy a bridge?
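The overwriting practice mentioned above can be pictured with a small sketch. This is only an illustration, assuming a fixed-size in-memory buffer; the function name `make_log_buffer` and the sample entries are hypothetical and do not come from any particular logging product.

```python
from collections import deque


def make_log_buffer(max_entries):
    """Return a fixed-size buffer; appending past capacity drops the oldest entry."""
    return deque(maxlen=max_entries)


# Hypothetical log with room for only three entries.
log = make_log_buffer(3)
for entry in ["boot", "login ok", "disk warning", "login failed"]:
    log.append(entry)

# The oldest entry ("boot") is gone: newer entries have overwritten it.
print(list(log))
```

Once an entry has been pushed out of the buffer, no search system, expensive or free, can find it.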
For those who keep log files and want to figure out what treasures nestle therein, an outfit has marketed an expensive “search” system. Splunk is the darling of many information technology gurus. In Washington, DC, I am surprised when laborers in the Federal vineyard do not sport a Splunk tattoo.
IDC’s view is that there is change rolling down the road. The write up points out that Splunk is no longer limited. Like most information access systems, the company has expanded. In fact, the wizards at IDC parrot the jargon: Analytics. Here’s the passage I noted:
Splunk started strong and has only grown stronger as it’s branched out to become a wide-ranging analytics platform. But the free version of Splunk is quite limited, and the enterprise version’s pricing is based on the amount of data indexed, which adds up to prohibitive costs for some.
The important factoid is, in my opinion, cost. Most organizations want to reduce costs for some little understood information tasks. Making heads or tails out of the ever burgeoning and frequently overwritten log files may be at the top of the budget tightening list.
IDC, truly an expert in open source software, points out that “open source competition has been emerging in the background.” I suppose that’s why IDC is selling at $3,500 a whack analyses of open source such as this gem produced in part by IDC’s wizards. See Report 237410. Who wrote that? Worth a look I suppose.
The angle is that Graylog2 and Elasticsearch are chasing after Splunk. I am not sure if this is old news, good news, or silly news. What’s clear is that InfoWorld is covering open source and not emphasizing its deep research.
Cost control is a subtle point. I am delighted that the write up creeps up on one of the central attributes of open source software: No license fees. But what of the costs of installing, tuning, and maintaining the open source solution? Ah, not included in the write up. If you pony up $3,500 for an IDC open source report, I assume more substance is provided. Who wrote those IDC open source reports like 237410? Was it an IDC analyst, marketer, or reporter? Did the information come from another source?
Anyway, good PR for Elasticsearch. Bad PR for Splunk.
Stephen E Arnold, February 26, 2014
Elasticsearch Disrupts Open Source Search
February 17, 2014
I did a series of reports about open source search. Some of these were published under mysterious circumstances by that leader of the azure chip consultants, IDC. You can see the $3,500 per report offers on the IDC site. Hey, I am not getting the money, but that’s what some of today’s go go executives do. The list of titles appears below my signature.
Elasticsearch, a system that is based on Lucene, evolved after the still-in-use Compass system. What seems to have happened in the last six months is one of those singularities that Googlers seek.
In January 2014, GigaOM, a “real news” outfit, reported that Elasticsearch had moved from free and open source to a commercial model. You can find that report in “6 million Downloads Later, Elasticsearch Launches a Commercial Product.” The write up equates lots of downloads with commercial success. Well, I am not sure that I accept that. I do know that Elasticsearch landed an additional $24 million in series B funding if Silicon Angle’s information is correct. Elasticsearch is now armed with more money than the aging and repositioning Lucid Works (originally Lucid Imagination) has. (An interview with one of the founders of Lucid Imagination, the precursor of Lucid Works, is at http://bit.ly/1gvddt5. Mr. Krellenstein left Lucid Imagination abruptly shortly after this interview appeared.)
I noted that in February 2014, InfoWorld, owned by the publisher of the $3,500 report about Elasticsearch, called the company “ultra hip.” I don’t see many search companies, proprietary or open source, called “hip.” See “Ultra Hip Elasticsearch Hits Commercial Release.” The write up asserts (although I wonder who provided the content):
Elasticsearch was originally spun off from the Compass project, an open source Java search engine framework, back in 2004, in an effort to create a highly scalable search solution. Built on top of the well-known and popular Lucene library from the Apache Software Foundation, Elasticsearch adds such features as multitenancy, sharding, faceted search, and a JSON-based REST API. This feature set puts it in competition with the Solr project as a complete search solution built on top of Lucene.
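Two of the features named in the quoted passage, sharding and the JSON-based REST API, can be pictured with a short sketch. This is an illustration under stated assumptions, not Elasticsearch's implementation: the helper names `pick_shard` and `build_search_body` are hypothetical, and `zlib.crc32` is a stand-in for whatever hash function Elasticsearch uses internally. The general idea of routing a document by a hash of its ID modulo the number of shards, and of sending a search as a JSON body containing a `match` query, follows the project's documentation.

```python
import json
import zlib


def pick_shard(doc_id, num_shards):
    """Map a document ID to a shard number via hash modulo shard count.

    crc32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_search_body(field, text, size=10):
    """Build the JSON text of a simple full-text `match` query."""
    return json.dumps({"query": {"match": {field: text}}, "size": size})


# Hypothetical usage: five shards, one document, one query.
print(pick_shard("doc-1", 5))
print(build_search_body("title", "lucene"))
```

Because the shard is computed from the document ID, the same ID always lands on the same shard, which is what lets an index be split across machines without a central lookup table.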
The statement does not hit what I thought are the main points of the Elasticsearch initiative. Let me fill in the blanks. Perhaps an azure chip consultant can use these to whip up another $3,500 report?
Discover the Open Source Alternative to the Autonomy Crawler
February 7, 2014
Whether Autonomy’s product success is true or false, as proprietary software it comes with a large price tag. The average small business or user cannot afford to purchase HP Autonomy’s IDOL Crawler. Open source is the best alternative, but for the longest time you could not get software comparable to IDOL Crawler. Norconex says that has changed in the article, “An Open Source Crawler For Autonomy IDOL.” Norconex released an HP Autonomy IDOL Committer for its open source Web crawler Norconex HTTP Collector.
The HTTP Collector is available on GitHub. The developer encourages people to download it and contribute to the project. Its features are mostly the same as those of the HP Autonomy HTTP Connector.
The article states:
“Most key features of HP Autonomy HTTP Connector are available in Norconex HTTP Collector, including document changes detection on incremental crawls and purging documents from IDOL for deleted web pages. New ones are introduced, such as having different hit interval at different time of the day and the ability to overwrite pretty much every part of the web crawling flow with your own implementation logic. The IDOL Committer has been tested on diverse public and internal web sites with great performance.”
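The idea of “different hit interval at different time of the day” can be sketched in a few lines. This is a hypothetical illustration of the concept only; it is not Norconex's configuration format or API, and the schedule, the delay values, and the name `hit_interval` are all assumptions made for the example.

```python
from datetime import time

# Hypothetical schedule: (start, end, seconds to wait between requests).
SCHEDULE = [
    (time(8, 0), time(18, 0), 5.0),  # business hours: crawl gently
]
DEFAULT_DELAY = 1.0  # outside the scheduled windows: crawl faster


def hit_interval(now):
    """Return the delay in seconds between requests for the given time of day."""
    for start, end, delay in SCHEDULE:
        if start <= now < end:
            return delay
    return DEFAULT_DELAY


print(hit_interval(time(9, 30)))  # inside the business-hours window
print(hit_interval(time(23, 0)))  # outside any window
```

A crawler would consult a function like this before each request, so that a site is hit less often while its human visitors are most active.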
We can learn from the open source community that if there is not a piece of software you want, all you have to do is wait until a developer makes it or you can take the initiative to do it yourself.
Whitney Grace, February 07, 2014
Open Source and a New Look for Ontopia
January 22, 2014
If you are thinking about building applications based on topic maps and do not feel like shelling out money for proprietary software, then do not look any further than Ontopia! Ontopia is an open source tool suite with features such as an ontology designer, a full-featured query language, web services points, database storage, and an instance data editor. There are many more powerful tools available with Ontopia outlined here.
Ontopia has been an on-going project in the open source community for over a decade and has an interesting history:
“The product suite is highly mature. Ontopia 1.0 was released in June 2001, and we are now nearing the release of Ontopia 5.1. Ontopia has been in production use in a number of commercial projects on three continents for many years now, and the core engine has been very stable over most of that period. Ontopia is open source and released under the Apache License 2.0. The entire product is released as open source. There are no proprietary add-ons, which are necessary to run it, or to make it suitable for an enterprise setting. Commercial support, however, is available.”
A developer community that has been attached to the project for years keeps up Ontopia, and there are new participants from Europe. If you are curious about recent activity with Ontopia, they keep a page on Google Code and they also recently updated the Web site’s design.
Whitney Grace, January 22, 2014
Learn About the Open Source Alternative to ClearForest
January 22, 2014
Did you know that there was an open source version of ClearForest called Calais? Neither did we, until we read about it in the article posted on OpenCalais called, “Calais: Connect. Everything.” Along with a short instructional video, there is a text explanation of how the software works. OpenCalais Web Service automatically creates rich semantic metadata using natural language processing, machine learning, and other methods to analyze submitted content. A list of tags is generated and returned to the user for review and then the user can paste them onto other documents.
The metadata can be used in a variety of ways for improvement:
“The metadata gives you the ability to build maps (or graphs or networks) linking documents to people to companies to places to products to events to geographies to… whatever. You can use those maps to improve site navigation, provide contextual syndication, tag and organize your content, create structured folksonomies, filter and de-duplicate news feeds, or analyze content to see if it contains what you care about.”
The OpenCalais Web Service relies on a dedicated community to keep making progress and pushing the application forward. Calais takes the same approach as other open source projects, except this one is powered by Thomson Reuters.
Whitney Grace, January 22, 2014


