Scholarship Evolving with the Web

July 21, 2016

Is big data good only for the hard sciences, or does it have something to offer the humanities? Writer Marcus A Banks thinks it does, as he states in “Challenging the Print Paradigm: Web-Powered Scholarship is Set to Advance the Creation and Distribution of Research” at the Impact Blog (a project of the London School of Economics and Political Science). Banks suggests that data analysis can lead to a better understanding of, for example, how the perception of certain historical events has evolved over time. He goes on to explain what the literary community has to gain by moving forward:

“Despite my confidence in data mining I worry that our containers for scholarly works — ‘papers,’ ‘monographs’ — are anachronistic. When scholarship could only be expressed in print, on paper, these vessels made perfect sense. Today we have PDFs, which are surely a more efficient distribution mechanism than mailing print volumes to be placed onto library shelves. Nonetheless, PDFs reinforce the idea that scholarship must be portioned into discrete units, when the truth is that the best scholarship is sprawling, unbounded and mutable. The Web is flexible enough to facilitate this, in a way that print could never do. A print piece is necessarily reductive, while Web-oriented scholarship can be as capacious as required.

“To date, though, we still think in terms of print antecedents. This is not surprising, given that the Web is the merest of infants in historical terms. So we find that most advocacy surrounding open access publishing has been about increasing access to the PDFs of research articles. I am in complete support of this cause, especially when these articles report upon publicly or philanthropically funded research. Nonetheless, this feels narrow, quite modest. Text mining across a large swath of PDFs would yield useful insights, for sure. But this is not ‘data mining’ in the maximal sense of analyzing every aspect of a scholarly endeavor, even those that cannot easily be captured in print.”

Banks does note that a cautious approach to such fundamental change is warranted, citing the development of the data paper in 2011 as an example. He also mentions Scholarly HTML, a project that hopes to evolve into a formal W3C standard, and the Content Mine, a project aiming to glean 100 million facts from published research papers. The sky is the limit, Banks indicates, when it comes to Web-powered scholarship.

 

Cynthia Murrell, July 21, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

There is a Louisville, Kentucky Hidden Web/Dark Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.

 

Coveo Changes Its Positioning

July 20, 2016

Short honk: Coveo, the Canadian enterprise search outfit, has changed its positioning. I should probably say “added to” its positioning as an information retrieval vendor. “Montreal Opening for Big Data Search Firm Coveo” reports that the company has a new office in Montréal. What I noticed was the description of Coveo as a “big data search firm.” The company has been describing itself as a customer support solution and a vendor of unified search. But Big Data is a thing, so it makes sense that an information processing outfit would embrace the moniker. The write up reports that a Coveo wizard said:

We have an amazing pipeline of cloud solutions, and the integration of machine learning, artificial intelligence and data-driven personalization to our technology creates huge market opportunities. We believe Montreal is the best place for us to build on this momentum and assert our position as market leader.

The write up does not mention if any provincial or national subsidies were provided to Coveo. I am no expert on Canada, but I have heard that incentives, including salary support, have been made available to firms meeting certain criteria.

Stephen E Arnold, July 20, 2016

Recommind Follows BRS, IDI Basis, Fulcrum, and Nstein

July 19, 2016

OpenText is, by golly, one of the outfits which “owns” more search and retrieval technology than any other firm I can name. I read “OpenText Lives Up to Promise, Acquires Recommind.” The write up points out:

Just a week after it announced it was selling off $600 million worth of senior debt notes to fund future acquisitions, OpenText dropped $163 million to acquire Recommind, an e-discovery and information analytics provider.

The write up explains that Recommind “could generate between $70 and $80 million of annualized revenues.” This is a hefty sum for a system which has in my mind been dumped into the Autonomy-type search system pigeon hole. (If anyone is interested, I have a profile of Recommind technology. Write benkent2020 at yahoo dot com for details.) Frankly I was surprised at the modest size of the deal. What would Recommind have been worth if it had added Big Data, advanced analytics, and artificial intelligence to its system? On the other hand, maybe Recommind did exactly that.

Several observations:

  • Search and content processing systems incur significant technological debt. This means that the software system has to be fed regular injections of real cash to work, keep customers happy, and keep pace with the competition.
  • A vendor with multiple systems has to figure out exactly what system to pitch to a potential customer. This is often difficult if the prospect asks such questions as, “What is Nstein’s capability in terms of Recommind’s functions?” Or, “What search system is included with RedDot and what other options are available to install today and use tomorrow?”
  • Portfolio search and content processing vendors are rare birds in today’s corporate jungle. IBM is similar, and its financial performance suggests that having numerous search and content processing arrows in its quiver does not seem to hit the financial bull’s eye.

OpenText, in my view, is a company which may have to make very hard decisions about what technology debt to retire. The interest on that debt, if left unmanaged, could lead to financial headaches.

Stephen E Arnold, July 19, 2016

Elasticsearch API Calls

July 17, 2016

Short honk: Are you a fan of Elasticsearch, the Lucene-based open source system giving proprietary vendors of search systems a migraine? If you are, you will want to point your browser at “Elasticsearch-API Info.” The information is presented in a table which lists and annotates Elasticsearch’s APIs from bulk to update. Useful stuff.
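For readers who have not used the first entry in that list, here is a minimal sketch of how a request body for the bulk API is typically assembled. It only builds the newline-delimited JSON payload and sends nothing to a server. The function name `build_bulk_body`, the index name `posts`, and the sample documents are assumptions made for illustration, not part of the cited table. Depending on the Elasticsearch version, a `_type` field may also be expected in the action line.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited JSON payload for the _bulk endpoint.

    Each document contributes two lines: an action line saying what to do,
    then a source line holding the document itself. The payload has to end
    with a newline character.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document body.
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical index name and documents, for illustration only.
body = build_bulk_body(
    "posts",
    [
        ("1", {"title": "Elasticsearch API Calls"}),
        ("2", {"title": "Short Honk: Elassandra"}),
    ],
)
print(body)
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint with whatever HTTP client is at hand.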

Stephen E Arnold, July 17, 2016

Short Honk: Elassandra

July 16, 2016

Just a factoid. There is now a version of Elasticsearch which is integrated with Cassandra. You can get the code for version 2.1.1-14 via GitHub. Just another example of the diffusion of the Elasticsearch system.

Stephen E Arnold, July 16, 2016

Google and Song Lyrics

July 13, 2016

I love the results I get for pop stars, TV shows, and binge watching. To feed the curious minds of online researchers, Google has upped the ante. “Google Licenses LyricFind for Search Results” reports that Google has addressed its miserable search systems for the words in tunes. Consider this lyric:

“My wrist deserve a shout out, I’m like ‘what up, wrist?’
My stove deserve a shout out, I’m like ‘what up, stove?’”

According to the write up:

A query for the lyrics to a specific song will pull up the words to much of that song, freeing users from having to click through to another website. Google rolled out the lyrics feature in the U.S. today (June 27), though it has licenses to display the lyrics internationally as well.

I am definitely thrilled. Why worry about the indexing of PowerPoints, PDFs, and other content when I have access to the source of:

I’m that red bull, now let’s fly away.

What’s really flown away? Rag mop.

Stephen E Arnold, July 13, 2016

Get Them While They Are Hot: Microsoft Search APIs

July 11, 2016

If you want to buy some Microsoft smart APIs, now is the time. Navigate to Microsoft Azure and pick your API. On offer are some content processing APIs like text search, image search, autosuggest, etc. How much are these goodies? Well, the fee varies with the number of transactions. What’s a “transaction”? Like Amazon AWS, you will find that out as you move forward, gentle reader. Here’s the display for the search API fees:

[Image: display of the search API fees]

I know that these low contrast Web pages are just so easy to read. In a nutshell, you will owe the Microsofties by tier. The S1, S2, etc. remind me of IBM’s tiered prices. The amount depends on how many transactions, which tier, and, I assume, any other special goodies one requires. Think in terms of blocks of $30.

Enjoy the taxi meter approach. In my experience, these work out really well for those selling services. I love metered, tiered prices with “transactions” left wonderfully fluid. Does the phrase “lock in” resonate? Does the concept of “price lift” have relevance? Have fun budgeting costs over a three to five year span.
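To see why budgeting a taxi meter is tricky, here is a rough sketch of block-based billing projected over several years. Every number in it is an assumption for illustration: the block size, the starting volume, and the growth rate are made up, and only the $30 block price echoes the figure above. None of it is Microsoft’s actual rate card, and the function names `metered_cost` and `project` are hypothetical.

```python
def metered_cost(transactions, block_size, block_price):
    """Cost when usage is billed in whole blocks of transactions."""
    # Ceiling division: a partly used block is billed as a full block.
    blocks = -(-transactions // block_size)
    return blocks * block_price


def project(monthly_transactions, yearly_growth_pct, years, block_size, block_price):
    """Return the estimated yearly cost for each year of the span."""
    totals = []
    volume = monthly_transactions
    for _ in range(years):
        totals.append(12 * metered_cost(volume, block_size, block_price))
        # Assumed compound growth in monthly volume, kept in whole transactions.
        volume = volume * (100 + yearly_growth_pct) // 100
    return totals


# All values below are hypothetical, not published prices.
BLOCK_SIZE = 10_000   # transactions per block (assumed)
BLOCK_PRICE = 30      # dollars per block, echoing "blocks of $30"

for year, cost in enumerate(project(45_000, 50, 5, BLOCK_SIZE, BLOCK_PRICE), start=1):
    print(f"Year {year}: ${cost}")
```

The point of the sketch is that the yearly bill moves in steps, not smoothly, and that the estimate is only as good as the guess about what counts as a “transaction.”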

Stephen E Arnold, July 11, 2016

Six Cybercriminal Archetypes from BAE Systems

July 11, 2016

Tech-security firm BAE Systems has sketched out six cybercriminal types, we learn from “BAE Systems Unmasks Today’s Cybercriminals” at the MENA Herald. We’re told the full descriptions reveal the kinds of havoc each type can wreak, as well as targeted advice for thwarting them. The article explains:

“Threat intelligence experts at BAE Systems have revealed ‘The Unusual Suspects’, built on research that demonstrates the motivations and methods of the most common types of cybercriminal. The research, which is derived from expert analysis of thousands of cyber attacks on businesses around the world. The intention is to help enterprises understand the enemies they face so they can better defend against cyber attack.”

Apparently, such intel is especially needed in the Middle East, where a recent PwC study found that cybercrime affects about 30 percent of organizations. Despite the danger, the same study found that regional companies were not only unprepared for cyber attacks; many did not even understand the risks.

The article lists the six cybercriminal types BAE has profiled:

“The Mule – naive opportunists that may not even realise they work for criminal gangs to launder money;

The Professional – career criminals who ‘work’ 9-5 in the digital shadows;

The Nation State Actor – individuals who work directly or indirectly for their government to steal sensitive information and disrupt enemies’ capabilities;

The Activist – motivated to change the world via questionable means;

The Getaway – the youthful teenager who can escape a custodial sentence due to their age;

The Insider – disillusioned, blackmailed or even over-helpful employees operating from within the walls of their own company.”

Operating in more than 40 countries, BAE Systems is committed to its global perspective. Alongside its software division, the company also produces military equipment and vehicles. Founded in 1999, the company went public in 2013. Unsurprisingly, BAE’s headquarters are in Arlington, Virginia, just outside of Washington DC. As of this writing, the company is also hiring in several locations.

 

 

Cynthia Murrell, July 11, 2016


VirtualWorks Purchases Natural Language Processing Firm

July 8, 2016

Another day, another merger. PR Newswire released a story, “VirtualWorks and Language Tools Announce Merger,” which covers VirtualWorks’ purchase of Language Tools. With Language Tools, VirtualWorks will inherit computational linguistics and natural language processing technologies. VirtualWorks is an enterprise search firm. Erik Baklid, Chief Executive Officer of VirtualWorks, is quoted in the article:

“We are incredibly excited about what this combined merger means to the future of our business. The potential to analyze and make sense of the vast unstructured data that exists for enterprises, both internally and externally, cannot be understated. Our underlying technology offers a sophisticated solution to extract meaning from text in a systematic way without the shortcomings of machine learning. We are well positioned to bring to market applications that provide insight, never before possible, into the vast majority of data that is out there.”

This is another case of a company positioning itself as a leader in enterprise search. Is it anything special? Well, the news release mentions that several core technologies will be bolstered by the merger: text analytics, data management, and discovery techniques. We will have to wait and see what the future holds for the company in the enterprise search and business intelligence sector it seeks to lead.

Megan Feil, July 8, 2016

 

DuckDuckGo Yodels Yahooooo!

July 7, 2016

I read “Information about DuckDuckGo’s Partnership with Yahoo.” Yahoo is into search DuckDuckGo style. According to the write up:

our latest partnership with Yahoo enables DuckDuckGo to get access to features you’ve been requesting for years:

Date filters let you filter results from the last day, week and month.

Site links help you quickly get to subsections of sites.

Farewell, Inktomi, AllTheWeb, Google, Microsoft, and home brew craziness. Yahoo has a new findability future. Now about the size of the index? Will Yahoo’s new owner have a fresh idea? Worth watching.

Stephen E Arnold, July 7, 2016
