Make Metadata Useful. But What If the Tags Are Lousy?
July 28, 2011
I must be too old and too dense to understand why the noise about metadata gives me a headache. I came across a post or story on the CNBC.com Web site that was halfway between a commercial and a rough draft of an automated indexing vendor’s temp file stuffed with drafts created by a clever intern. The post hauled around this weighty title: “EMA and ASG Webinar: 7 Best Practices For Making Metadata Useful”. The first thing I did was look up EMA and ASG because I was unfamiliar with the acronyms.
I learned that EMA represents a firm called Enterprise Management Associates. The company does information technology and data management research, industry analysis, and consulting. Fair enough. I have done some of the fuzzy wuzzy work for a couple of reasonably competent outfits, including the once stellar Booz, Allen & Hamilton and a handful of large, allegedly successful companies.
ASG is an acronym for ASG Software Solutions. The parent company grows via acquisitions just like Progress Software and, more recently, Google. The focus of the company seems to be “the cloud in your hand.” I am okay with a metaphorical description.
I am confused about metadata. Source: http://www.thebusyfool.com/wp-content/uploads/2011/05/Decisions_clipart.jpg
What caught my attention is the focus on metadata, which in my little world, is the domain of people with degrees in library and information science, years of experience in building ANSI standard controlled term lists, and hands on time with automated and human centric indexing, content processing, and related systems. An ANSI standard controlled term list is not management research, industry analysis, consulting, or the “cloud in your hand.” Controlled term lists which make life bearable for a person seeking information are quite difficult work, combining the vision of an architect and the nitty gritty stamina of a Roman legionnaire building a road through Gaul.
Here’s the passage that caught my attention and earned a place in my “quotes to note” folder:
As data grows horizontally across the enterprise, businesses are faced with the urgent need to better define data and create an accurate, transparent and accessible view of their metadata. Metadata management and business glossary are foundational technologies that can help companies achieve this goal. EMA developed seven best practices that guide companies to get the most of their data management. All attendees receive the complimentary White Paper Managing Metadata for Accessibility, Transparency and Accountability authored by Shawn Rogers.
I am not sure what some of these words and phrases mean. For example, “better define data”. My question, “What data?” Next I struggled with “create an accurate, transparent, and accessible view of their metadata.” Now there are commercial systems which allow “views” of controlled term lists. One such vendor is Access Innovations, an outfit which visited me in rural Kentucky to talk about new approaches to indexing certain types of problematic content which is proliferating in organizations. Think in terms of social content without much context other than a “handle”, date, and time even within a buttoned up company.
What do users know? Image source: http://www.computersunplugged.com.au/images/angry-man.gif
Another phrase that caught my limited attention was “metadata management and business glossary are foundational”. Okay, but before one manages, one must do a modest amount of work. Even automated systems benefit from smart algorithms helped with a friendly human crafted training document set or direct intervention by a professional information scientist. Some organizations use commercial controlled term lists to seed the automatic content tagging system. I am all for management, but I don’t think I want to jump from the hard work to “management” without going to the controlled vocabulary gym and doing some push ups. “Business glossary” baffled me and I was not annoyed by what seems to be a high school grammar misstep. Nope. The “business glossary” is a good thing, but it must be constructed to match the language of the users, the corpus, and the accepted terminology. Indexing a document with the term “terminal” is not too helpful unless there is a “field code” that pegs the terminal as one where I find airplanes, trains, death, or computer stuff. A “business glossary” does not appear from thin air, although a “cloud” outfit may have that notion. I know better.
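The “terminal” problem can be made concrete with a small sketch. The Python below is an illustration only, not any vendor’s system: the term list, the qualifier labels, the cue words, and the `tag` function are all hypothetical names invented for this example. It shows a tag carrying a qualifier (the “field code”) so that an ambiguous word is pegged to one sense, and it declines to guess when the text offers no cue.

```python
# Hypothetical controlled term list: every entry, qualifier, and cue word
# here is invented for illustration, not taken from a published standard.
CONTROLLED_TERMS = {
    "terminal": {
        "transport": {"airport", "airplane", "train", "gate"},
        "medical": {"illness", "patient", "hospice"},
        "computing": {"keyboard", "login", "mainframe"},
    },
}


def tag(term, text):
    """Return (term, qualifier) pairs for a document.

    A qualifier is assigned only when cue words for that sense appear in
    the text. With no cues the qualifier is None, which flags the item
    for human review instead of guessing.
    """
    words = set(text.lower().split())
    senses = CONTROLLED_TERMS.get(term, {})
    matches = sorted(q for q, cues in senses.items() if words & cues)
    if not matches:
        return [(term, None)]
    return [(term, q) for q in matches]


print(tag("terminal", "The train leaves the terminal at noon"))
print(tag("terminal", "The terminal is closed"))
```

The cue sets stand in for the hard part: a human, or a system trained by one, has to decide which senses exist and what signals each one.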
I did a quick Google search for “Shawn Rogers,” author of the white paper. Note: I don’t know what a white paper is. The first hit is to a document which is on what I think is a pay-to-play information service called “b-eye”. The second hit points to a LinkedIn profile. I don’t know if this is “the” Shawn Rogers whom I seek. I learned that he is:
[a professional who] has more than 19 years of hands-on IT experience with a focus on Internet-enabled technology. In 2004 he cofounded the BeyeNETWORK and held the position of Executive Vice President and Editorial Director. Shawn guided the company’s international growth strategy and helped the BeyeNETWORK grow to 18 web sites around the world making it the largest and most read community covering the business intelligence, data warehousing, performance management and data integration space. The BeyeNETWORK was sold to TechTarget in April 2010.
I concluded this was “the” Mr. Rogers I sought and that he or his organization is darned good at search engine optimization type work.
What clicked in my mind was a triple tap of hypotheses:
- A couple of services firms have teamed up to cash in on the taxonomy and metadata craze. I thought metadata had come and gone, but obviously these firms are, to use Google’s metaphor, putting more wood behind the metadata thing. So, this is marketing in order to sell services. As I said, I am okay with that.
- These firms have found a way to address the core problem of indexing by people who do not have the faintest idea of what’s involved in metatagging that helps users. One hopes.
- The two companies are not sure what the outcome of the webinar and the white paper distribution will be. In short, this is a fishing trip or an exploration of the paths on an island owned by a cruise company. There’s not much at risk.
Okay, enough.
Here’s my view on metadata.
First, most organizations have zero editorial policy and zero willingness to do the hard work required to dedupe, normalize, and tag content in a way that allows a user to find a particular item without sticky notes, making phone calls, or clicking and scanning stuff for the needed items. I think vendors promise the sun and moon and deliver gravel. Don’t agree? Use the comments section, please. Don’t call me.
Second, most of the vendors who offer industrial strength indexing and content processing systems know what needs to be done to make content findable. But the licensees often want a silver bullet. So the vendors remain silent on certain key points such as the Roman legionnaire working in the snow part. The cost part is often pushed to the margin as well.
Third, the information technology professionals “know” best. Not surprisingly most content access in organizations is a pretty lukewarm activity. I received an email last week chastising me for pointing out that more than half of an organization’s search system users were dissatisfied with whatever system the company made available. Hey, I just report the facts. I know how to find information in my organization.
Fourth, no one pays real attention to the user of a system. The top brass, the IT experts, and the vendors talk about the users. The users don’t know anything and whatever input those folks provide is not germane to the smarties. Little wonder that in some organizations systems are just worked around. Tells range from a Google search appliance in marketing to sticky notes on monitors.
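The dedupe, normalize, and tag chores named in my first point can be sketched in a few lines of Python. This is a toy under stated assumptions: the sample documents, the `VOCABULARY` mapping, and the function names are hypothetical, and it catches only exact duplicates after normalization. Real collections need near-duplicate detection and an editorial policy that no snippet supplies.

```python
import hashlib
import re

# Hypothetical mapping from the users' lingo to preferred terms.
VOCABULARY = {"invoice": "Invoices", "bill": "Invoices", "audit": "Audits"}


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedupe(docs):
    """Keep the first copy of each document, comparing normalized text."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def assign_terms(doc):
    """Return the preferred terms whose entry words occur in the document."""
    words = set(re.findall(r"[a-z]+", normalize(doc)))
    return sorted({VOCABULARY[w] for w in words if w in VOCABULARY})


docs = ["Pay the  invoice today", "pay the invoice today", "Audit of each bill"]
for doc in dedupe(docs):
    print(doc, "->", assign_terms(doc))
```

Note that “bill” and “invoice” land on the same preferred term. Deciding that they should is the editorial work; the code only applies the decision.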
Will I attend the webinar? Nah. I don’t do webinars. Do I want to change the world and make every organization have a super duper controlled term list and findable content? Nah. Don’t care. Do I want outfits like CNBC to do a tiny bit of content curation before posting unusual write ups with possible grammatical errors? You bet. What if those metadata and other tags are uncontrolled, improperly applied, and mismatched to the lingo? Status quo, I assert.
Enjoy the webinar. Good luck with your metadata and the “cloud in your hand” approach. Back to the goose pond. Honk.
Stephen E Arnold, July 29, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search, which is not a white paper and it is not free. But at $20, such a deal.
Protected: SharePoint and Your Feed Need
July 28, 2011
Protected: A Tips Spreadsheet for SharePoint Columns
July 27, 2011
The Mongo Mambo: NoSQL Is Tireless
July 26, 2011
Okay, so this isn’t exactly search-related, but we think it’s worth a mention. The blurring of search and data management is starting to become a more common symptom of the big data world.
OpenMyMind.net provides helpful information with “Practical NoSQL—Solving a Real Problem with MongoDB and Redis.”
Blogger (and software developer) Karl Seguin details his process of making an improvement to the Mogade game developer site. He is eager to share his use of a new tool coupled with a new modeling approach. In his conclusion, Seguin states,
“This reinforces my opinion about NoSQL in general. MongoDB has a couple specializations that are truly awesome (geospatial, logging), but it’s largely a general purpose data store with a number of advantages over RDBMS’. Many other NoSQL solutions are more specialized. Redis, while capable of more than what I’m using it for, is more specialized, and handles/looks at/views data differently. These solutions work well together and not only make it fun to work with data again, they make it easy and efficient.”
We appreciate the effort that Mr. Seguin put into his write up. We think that the blend of technologies is one of the harbingers of a significant shift in the data management world. One can only go so far with the traditional RDBMS before money crushes one’s big data aspirations. XML has been made to perform some interesting tricks, but under the demands of price sensitive information technology shops, there is some push back for this former prom queen. And the basic NoSQL world is being asked to deliver functions and services that extend well beyond the basics of fetching a result set.
Change is upon us and it may have a significant impact on vendors who are well positioned in the big data, search based application space. We like the moves of the Mongo Mambo but we love the music of Exalead’s CloudView approach.
Cynthia Murrell, July 19, 2011
Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion
IBM Pursues Analytics into the Cloud
July 26, 2011
IBM is now chasing analytics. Computerworld reports the details in “IBM rolls out cloud-based Web analytics tool.” The new application leverages two of IBM’s recent acquisitions. Coremetrics specializes in Web analytics software, and Unica’s products analyze customer data and predict business needs. Writer Patrick Thibodeau explains,
“Features from both those companies were merged to create a product that is intended to link analytics from a variety of platforms, including the Web and social media networks, and tie them to marketing efforts ranging from automated actions to sales opportunities.”
Named IBM Coremetrics Web Analytics and Digital Marketing Optimization Suite, the product will be distributed via the cloud. It will, naturally, be accessible through multiple platforms. It also allows for the inclusion of non-Web based data, like that from emails and display ads. Users will be charged based on the volume of interactions, not on the amount of storage they use.
The amount of activity within IBM with regard to metrics and analytics is astounding. The company offers different brands such as Cognos and SPSS. The firm has multiple initiatives. Our view is that analytics means big money for IBM. We are confident that IBM will generate revenue, but its fleet of high speed boats is creating a wake of confusion as it races across the data ocean and blasts past Web Fountain. Even search has morphed into content analytics. Quite a flotilla.
Cynthia Murrell, July 26, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search.
Making Libraries Easier to View in SharePoint
July 26, 2011
Working with a single monitor and multiple windows open is a constant pain, especially with a teeny, tiny laptop. Switching between the windows is frustrating, but the smart computer user averts this problem by hooking up two or more monitors to the computer. SharePoint users face a similar problem when working with multiple libraries at once, but it’s not as simple as hooking up another cable to a computer. The wonderful Laura Rogers at SPTech Web wrote a great article called “Display Multiple Libraries in SharePoint 2010” that helps explain the many ways to handle this issue. The article said:
A frequent requirement in SharePoint projects is to display documents from multiple libraries together. There are several different methods in which this can be achieved in SharePoint 2010, ranging from simple out-of-the-box Web parts to more advanced data view Web parts. Of course, as with most tasks in SharePoint, it can be done with custom development, but I usually try to steer towards out-of-the-box functionalities before going that route.
The different methods for displaying documents are outlined in this article along with the pros and cons of each one. The Web parts she describes are the “relevant documents,” “constant web query,” “what’s new,” “data view-merge sources,” “data view-content roll-up,” and “data view-search results.” Read about each one to learn which method will work the best for you.
While you’re at it, head on over to SurfRay.com to learn how to improve your SharePoint search.
Stephen E Arnold, July 26, 2011
Sponsored by SurfRay, developers of Ontolica.
Brainware Fuses Synapses with Hewlett Packard
July 24, 2011
The Sacramento Bee has posted the PR Newswire story, “Brainware Named HP Silver Partner.” This is good news for Brainware, a leading data capture and search solution provider that focuses on financial management. They help clients keep track of the money; yeah, that’s pretty important.
We expect this means good news for back office and work flow functions. According to the press release:
“By working together with HP imaging and print devices, this technology will provide businesses with an opportunity to put vital data in users’ hands more quickly and accurately, so they can pay bills, fill orders, answer audits, address customer issues and more with efficiency and ease.”
Hardware giant Hewlett Packard established their Business Partner Program for Imaging and Printing in order to help businesses integrate their products with HP devices. The program has Silver, Gold, and Platinum levels. Will Brainware eventually go Platinum?
One point is clear: Brainware has found a way to make sales, which is something that some of the search vendors chasing customer support and the XML centric folks have not been able to do with Brainware’s alacrity.
Cynthia Murrell, July 24, 2011
Knowledge Assessment: InQuira and IBM
July 24, 2011
Banner relates an interesting case study of their work for InQuira in its piece, “The Future Is Closer Than You Think, and Qualified Leads.” The technology marketing specialists at Banner developed the online Knowledge Assessment Tool as a way for InQuira to engage potential clients. The write up explains:
“It worked by enabling visitors to determine their current level of KM [Knowledge Management] preparedness and understanding through a series of questions. They could then review this score against their specific industry (as well as against all industries). Respondents could view their results online as well as have them sent to their inbox as a custom-generated PDF file. These results were viewable as a spider diagram (to make them easier to digest and show how they related to the industry as a whole) and were accompanied by a set of InQuira best practice information.”
The hope, of course, is that many of the respondents would then turn to the knowledge management experts at InQuira for help in improving their score.
Bonus: IBM offered to co-brand the software and deliver it to its own database. As of the date of the article, the tool had been used 241 times in 52 countries; that’s success the company is happy to boast about. No word about Watson in this knowledge space, however.
Cynthia Murrell, July 24, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Going Fast and Missing a Curve: Collision or Near Miss?
July 23, 2011
Last week we heard a number of rumors about layoffs and other organizational shifts at the Microsoft Fast Search units. We are not sure whether the news reported at Enterprise Search: The Business and Technology of Corporate Search was accurate. We don’t want to speculate.
We, like you, read:
[We] just learned that most of the FAST people we work with here in California and across the country have been laid off by Microsoft, apparently effective immediately. This is the team that was responsible for selling the FAST ESP products – FSIS and FSIA – as well as working with the Microsoft sales teams on Fast Search for SharePoint (FS4SP). Funny, I was just drafting a blog post today on ‘the future of FAST’ and I’m glad I hadn’t finished; I never would have guessed this at all.
Let’s assume that the rumor is false. The Microsoft consultants don’t make any changes. SharePoint generates significant consulting opportunity just the way it is.
Let’s assume the rumor is true. There are many firms ready, willing, and able to provide the technical support you need for your current SharePoint and Fast search installation. For most licensees, Microsoft’s shifting staff or reorganizing is almost a business-as-usual management method in Redmond.
Let’s assume there is just more uncertainty about the Fast search technology. My view is that deep experience in search is more important than speculating about what a very large company is doing to manage its products and services for its clients. I explain some of the issues associated with Microsoft’s approach to search in my new monograph The New Landscape of Enterprise Search. Check it out. (Sorry. I don’t provide the juicy details in this free blog.)
So, let’s put aside the issue of a single shift in a product. The focus at most SharePoint focused service firms will be on helping clients solve their technical problems. What is likely to happen is that some SharePoint licensees will look for search solutions which have traction in the marketplace and proven staying power. For that reason, you may want to check out the Exalead approach.
Stephen E Arnold, July 23, 2011
Sponsored by Article One Partners, your source for patent research.
Protected: SharePoint Records Governance and System Recovery
July 22, 2011