Free Program Removes DRM Controls from PDFs
August 12, 2011
We’ve found a tool that is, perhaps, a bit concerning. Softpedia presents, for free, PDF Drm Removal 1.4.2.0. The developer of the software is listed as Removedrmfromepub.com. The product description reads,
PDF Drm Removal is a professional and reliable application designed to remove DRM protections from PDF files with no quality loss. Just removes the PDF files drm header, no change on the files. Read the PDF on any supported devices!
Interesting and somewhat concerning. We understand there’s controversy over Digital Rights Management controls; some say that they stifle innovation or violate private property rights. Others say the technology unnecessarily locks documents into a format that is bound to become obsolete someday.
However, we necessarily sympathize with publishers and writers, like Stephen E. Arnold, who rely on PDF security to safeguard documents. How else will they protect their work in the digital age?
Cynthia Murrell August 11, 2011
Quote to Note: Newspapers and Android Tablets
August 11, 2011
I read two different news items which informed me that Apple has stopped the sale of Samsung tablets in countries far from Kentucky. Ominous if accurate. Then I came upon “Newspaper Eyes Creating Subsidized Samsung Tablet.” The idea seemed a bit of a reach, but, hey, those newspaper managers are sharp business professionals. I am not sure how many of my neighbors would put down their jug of firewater to browse a tablet, but one never knows until one tries. What caught my attention was a quote to note, allegedly made by “one insider”, a great source for sure. Here the quote perches:
“If it turns out to be a failure, it will be a fantastically interesting failure.” Another source commented: “I would be shocked if it was successful.”
There it is.
Stephen E Arnold, August 11, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Belgium and Google: A Messy Waffle
July 18, 2011
I saw this headline: “Belgian Newspapers Claim Retaliation By Google After Copyright Victory” and I was nervous clicking on the link. SEO news services make me nervous. The idea of any “retaliation” story makes me think of long lists of words on a watch list somewhere.
I clicked on the link, and the story seemed okay, just a bit thin on substantive details. Quoting the Associated Press is in and of itself reason for concern.
Here’s the main idea:
Publishers in Belgium did not want their content indexed by Google. (That strikes me as less than informed, but forget the knowledge value angle.) So publishers get the fluid legal system to notify the Google. Shortly thereafter, some Belgian publishers note that their content is tough to find at the top of a Google results list. Bottom line: Some folks believe Google is jiggling the results to make some Belgian content familiar with the tedium of clicking through lots of pages to find the desired hit.
My view is that accusations are definitely good for “real” news outfits like the publisher of the retaliation story. I also think that considerable care must be taken before yip yapping about why a particular results list does not show what one wants, expects, believes, or hopes will appear.
Google has lots of people working on the search system. I once believed that these teams were coordinated and working like a well oiled robot arm assembling nuclear fuel rods. Now I know that the method is more like “get it working”. Good enough is going to earn a search wizard an A from the Google system.
Messy waffles. Image source: http://eatbakelove-todayistheday.blogspot.com/2011/06/weekend-warrior.html
If there is a difference between publishers’ expectations and what is in a particular result list, I suggest several things:
First, get a trained and expert online searcher to run queries in a methodical manner to verify what is and what is not “findable.” Keep in mind that 99.9 percent of the people who claim to be search experts are not. If you don’t believe me, give Ulla de Stricker a buzz. You can also try Anne Mintz, former director of the Forbes Magazine information center. You can also ping Marydee Ojala, editor of Online. Folks, trust me. These individuals are certifiable online search experts and can get the information needed to put some data behind the hot air. Data needed.
Yellow Pages Trying to Find a Future
July 7, 2011
Search Engine Watch reports, “Search Secures Recognition as Local Business Info Provider.” The article examines information from studies performed by research companies Burke and eMarketer. Each has compiled data on usage for a variety of local-business-information resources. The Yellow Pages performs well when its paper and online ventures are combined, but separately each was roundly beaten by search engines.
Not surprisingly, ad revenues for printed Yellow Pages are expected to dwindle into nothingness:
In the long haul, predicted figures decline to the point of near eclipse. By 2015, the predicted ad investment for print directories is just $5 billion. Meanwhile, search investments continue to rise, with the predicted figures for search ad spending increasing by more than 50 percent to $21.5 billion by 2015.
This must inevitably lead to the extinction of printed phone books. Good news if you’re a tree.
Still, how will the Yellow Pages fare in the long run? Depends on how nimble they are with their business model. The print sector is going to be under increasing cost pressure when print and ink are involved.
Cynthia Murrell, July 7, 2011
The addled goose is the author of The New Landscape of Enterprise Search
Google Books: Chugging Right Along
July 1, 2011
Search Engine Watch’s article “Google-British Library Partnership to Digitize 250,000 Books” reminded us that Google Books is chugging along. With Google working overtime to showcase itself as a portal and really fascinating legal issues surfacing in France, the once high profile Google goal of gathering the world’s information has drifted into the background.
But Google is beavering away with books. Its partner is the British Library. The project will add 250,000 books to the 12 million-plus that Google has already scanned. These books are copyright free, and come from the British Library’s unique works collection. As with previous projects, completed in cooperation with over 40 libraries, Google will make these works available to anyone for non-commercial use, including reproduction and modification. The Library will also keep an archive of the files.
The documents will be searchable, of course. Rumor has it that Google will also assemble language usage and other data from the texts to add to their datasets.
Google keeps looking for angles on digitizing books. The article reported:
Google still has a long way to go if it wants to meet its ambitious aim: digitizing all of the world’s known 129,864,880 books by the end of 2019. This goal, announced by Google in August of last year, involves creating a digital library of over four billion pages. With many major libraries having unique copies of historical texts, partnerships like the one with the British Library are vital to that goal.
My hunch is that in one or more of Google’s legal battles, books, reuse, copyright, and related issues will be glued to the tar ball Google has become. In the meantime, eBooks are hot and Google’s role in their diffusion is likely to be significant. I just don’t know in what way. The British Library is a big name.
Cynthia Murrell July 1, 2011
You can read more about enterprise search and retrieval in The New Landscape of Enterprise Search, published by Pandia in Oslo, Norway, in June 2011.
The Wages of SEO: Content Free Content
June 28, 2011
In the last two weeks, I have participated in a number of calls about the wrath of Panda. The idea is that sites which produce questionable content like Beyond Search suck. I agree that Beyond Search sucks. The site provides me with a running diary of what I find important in search and content processing. Some search vendors have complained that I cover Autonomy and not other engines. I find Autonomy interesting. It held an IPO, buys companies, manages reasonably well, and is close to generating an annual turnover of $1.0 billion. I don’t pay much attention to Dieselpoint and a number of other vendors because these companies do not strike me as disruptive or interesting.
I paddle away in Harrod’s Creek, oblivious to the machinations of “content farms.” I have some people helping me because I have a number of projects underway, and once I find an article I want to capture, I enlist the help of librarians and other specialists. Other folks are doing similar things, but rely on ads for revenue which I do not do. I have some Google ads, but these allow me to look at Google reports and keep tabs on various Googley functions. The money buys a tank of gas every month. Yippy.
I read “Google’s War on Nonsense.” You should too while I go out to clean the pasture spring. The main point is that a number of outfits pay people to write content that is of questionable value. No big surprise. I noted this passage in the write up:
The insultingly vacuous and frankly bizarre prose of the content farms — it seems ripped from Wikipedia and translated from the Romanian — cheapens all online information. A few months ago, tired of coming across creepy, commodified content where I expected ordinary language, I resolved to turn to mobile apps for e-books, social media, ecommerce and news, and use the open Web only sparingly. I had grown confused by the weird articles I often stumbled on. These prose-widgets are not hammered out by robots, surprisingly. But they are written by writers who work like robots. As recent accounts of life in these words-are-money mills make clear, some content-farm writers have deadlines as frequently as every 25 minutes. Others are expected to turn around reported pieces, containing interviews with several experts, in an hour. Some compose, edit, format and publish 10 articles in a single shift. Many with decades of experience in journalism work 70-hour weeks for salaries of $40,000 with no vacation time. The content farms have taken journalism hackwork to a whole new level.
My take on this approach to information—what I call content free content—is that we are in the midst of a casserole created by Google and its search engine optimization zealots. Each time Google closes a loophole for metatag stuffing or putting white text on a white background, another corner cutter cooks up some other way to confuse and dilute Google’s relevance recipe.
The content free content revolution has been with us for a long time. A Web searcher’s ability to recognize baloney is roughly in line with the Web searcher’s ability to invest the time and effort to fact check, ferret out the provenance of a source, and think critically. Google makes this flaw in its ad machine’s approach with its emphasis on “speed” and “predictive methods.” Speed means that Google is not doing much, if any, old fashioned index look up. The popular stuff is cached and updated when it suits the Google. No search required, thank you. Speed, just like original NASCAR drivers, is a trick. And that trick works. Maybe not for queries like mine, but I don’t count literally. Predictive means that Google uses inputs to create a query, generate good enough results, and have them ready or pushed to the user. Look, it’s magic. Just not to me.
With short cuts in evidence at Google and in the world of search engine optimization, with Web users who are in a hurry and unwilling or unable to check facts, with ad revenue and client billing more important than meeting user needs—we have entered the era of content free content. As lousy as Beyond Search is, at least I use the information in my for fee articles, my client reports, and my monographs.
The problem, however, is that for many people what looks authoritative is authoritative. A Google page that puts a particular company or item at the top of the results list is the equivalent of a Harvard PhD for some. Unfortunately the Math Club folks are not too good with content. Algorithms are flawless, particularly when algorithms generate big ad revenue.
Can we roll back the clock on relevance, reading skills, critical thinking, and the pursuit of knowledge for its own sake? Nope, search is knowledge. SEO is the into content free content. In my opinion, Google likes this situation just fine.
Stephen E Arnold, June 28, 2011
Gannett a Gone Goose?
June 26, 2011
Physorg.com announces that “USA Today Publisher Gannett Cuts 700 Jobs.” Just what we need now, hundreds more unemployed.
Despite the headline, USA Today itself is not involved. Gannett is the largest news chain in the U.S., and the layoffs will hit a multitude of local papers. The article states:
Like other US newspapers, Gannett has been grappling with declining print advertising revenue, falling circulation and the migration of readers to free news online. Robert Dickey, the head of Gannett’s US Community Publishing division, said in a memo to employees that the layoffs were necessary to ‘align our costs with the current revenue trends.’
Yeah, we’ve heard it before. I’d like to see what the executives make. Also why spare the national rag?
Stephen E Arnold, former vice president at the “old” Pulitzer Prize Courier Journal, hinted that the Courier Journal was once one of the top 25 newspapers in the world.
Since Gannett’s purchase of the paper, the CJ has tumbled from its lofty perch. Steve thought that Barry Senior would have gone ballistic. In his own genteel way he would have put the quality back in the paper. He would have motivated staff, officers, and others by reminding everyone that quality was what made the paper great. He would have passed the word to others, including the US president, various elected officials, and his pals at the New York Times.
When large corporations gobble up local media, the quality goes out and the buzz dies.
Now the future of Gannett becomes visible. Business wizards with systems that release stories online without the value adding that extends, complements and enhances old fashioned high value writing. Where did those databases go? Good question. The gone goose’s gone geese.
Cynthia Murrell June 25, 2011
From the leader in next-generation analysis of search and content processing, Beyond Search.
Will Google Do Real News?
June 23, 2011
Will Google do “real” news? I read “Salon CEO Gingras Resigns to Become Global Head of News Products at Google.” I think this is a fascinating action on the part of Google. In Google: The Digital Gutenberg, published by Infonortics Ltd. in 2009, I looked at Google’s content technology. My focus was not on indexing. I reviewed the parse, tag, chop, and reassemble systems and methods that Google’s wizards had invented. The monograph is available at this link. The monograph may be useful for anyone who wants to understand what happens when “real” journalists get access to the goodies in the Googleplex. In addition to Odwalla beverages, the Google open source documents suggest that snippets of text and facts can be automatically assembled into outputs that one could describe as “reports” or “new information objects.” Sure, a human is needed in some of these processes, but Google uses lots of humans. Its public relations machine and liberal mouse pad distribution policy help keep the myth alive that Google is all math all the time. Not exactly accurate.
The write up says:
The new position as the senior executive overseeing Google News, as well as other products that may be in the pipeline, comes several years after Gingras worked as a consultant at the Mountain View campus, focusing on ways the search giant could improve its news products.
What will come from a “real” journalist getting a chance to learn about some of the auto assembly technology? I offer some ideas in my Digital Gutenberg monograph. Publishers may want to ponder this idea as well. Google is more than search, and we are going to learn more about its intentions in the near future.
Five years ago, when Google was at the top of its game, I would have had little hesitation to give Google a better than 50 percent chance of success. Now with the Amazon, Apple, and Facebook environment, I am not so sure. Google has been relying more on buying stuff that works and playing a hard game of “Me Too.”
With the most recent reworking of Google News, I find myself turning to Pulse, Yahoo News, and NewsNow.co.uk. Am I alone?
Stephen E Arnold, June 23, 2011
Odd Spat: Academic Publishers vs. Universities
June 11, 2011
With the transition of formerly printed content to digital formats in academia, publishers of these academic materials are experiencing severe reductions to their revenue stream. Professors often put course materials on e-reserves, making it so students can access a single copy without having to pay for individual copies.
“Academic Publishers Attempting To Eliminate Fair Use At Universities [Updated]” from TechDirt.com delves into this issue and mentions specific entities in this ongoing battle. In one instance, “Cambridge, Oxford, & Sage publishers are filing against Georgia State University and asking the court to issue one of the all-time-detrimental-to-education injunctions in the modern era.”
This is clearly a heated debate and has been for some time. The write up said:
In 1994 publishers sought to deal with e-reserves at the Conference on Fair Use (CONFU), but the issue proved so contentious that the participants could not agree on a recommendation for the final report. Since then, the threat of litigation has loomed over a number of universities concerning their e-reserves, as publishers’ reproduction revenues dipped.
This demonstrates publishers’ inability to adapt to the current environment. Everything has been going digital for a while now, and an industry that relies so heavily on paper printing should have been prepared for this. The film and music industries have adapted to this climate, and even books are popular in digital format. Academic publishers need to board the train.
Our understanding is that academic publishers depend on university professors and those hard working graduate students to craft the content academic publishers publish.
Universities have found benefit from student loans. Perhaps academic publishers should explore some creative options as well.
Stephen E Arnold, June 11, 2011
Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion
Sheer Insanity and Search
June 7, 2011
Jann Wenner’s bon mot, “sheer insanity and insecurity and fear,” caught my attention. Addled geese are not crazy. Loons, as I recall, have that distinction. Insecurity resonates. Here in Harrod’s Creek a shotgun or an automatic weapon can reduce a goose in a nonce. Fear. Yep, fear. Got it.
The question is, “Do these characterizations apply to the iPad and other tablets?” The write up “Jann Wenner: Magazines’ Rush to iPad Is ‘Sheer Insanity and Insecurity and Fear’” accomplishes a fusion which caused me to do some thinking. First, here’s the passage that flapped my wings:
Magazines that depend on photography, and design, and long reads, and quality stuff, are going to do just fine despite the internet and cable news. Because in those areas there’s a real advantage to getting a print product and having something you can hold and that of course is portable and has a luxurious feeling and is comfortable and immersive and you can spend time with it and it’s organized for you. In the age of the 24-hour news cycle and the availability of the internet you have to focus on those qualities in your magazine even more. Really you have to deliver quality more than ever. And unless you can deliver something that’s quality and really compelling there’s just too many …media choices around now. Unless you’re really good you’re in trouble.
Three observations:
First, the notion of quality is an important one. Online delivers information which lacks a tactile component. Mr. Wenner makes an important point about a product one can hold. Digital content may be great but it looks like baloney. Stripped from a Web site, content just floats. With an iPad one holds an Apple or some other manufacturer’s gizmo. The publisher and his / her content is like a sardine in a tin. Who remembers an individual sardine?
Second, another dimension of quality for Mr. Wenner is organization. Who organizes content on the Internet? I suppose I do, but I am not interested in news. I am focused on capturing ideas and links which I used to store in my paper notebooks. I use the content of this blog as raw material for my books. The New Landscape of Enterprise Search is an example. I use the information captured in this blog in that book, which costs money and has a greater content payload than any collection of my blog posts from airports and restaurants. Mr. Wenner is spot on.
Third, the notion of “always on” and a 24 hour news cycle has changed how many people conceptualize information. I think Mr. Wenner is correct but I think that for certain demographics, there will be little appetite for a hard copy anything. I think the gamification of content is gathering momentum. I miss magazines like Life which I used to flip through on long summer afternoons at my grandparents’ house in nowheresville. The problem is that “quality” has a different freight of meaning for lots of folks younger than the goose. There is, I assert, no turning back.
Bottom line? Traditional publishing is under considerable pressure. I don’t think the executives are much different than an executive at Kentucky Fried Chicken who missed his quarterly numbers. The iPad is still fresh and for some, it is perfectly logical to assume that creating online content is pretty much the same as traditional magazine content creation. Publishing executives have to do something. Paper, ink, distribution, and design are not getting much cheaper in my experience.
But my interest is in finding information, search, if you will. Can I find content in a pay walled, iPadded, and filtered world? Not easily. So we are moving backwards as publishers try to press forward. I find this an interesting situation which seems a bit like the Dark Ages running on zippy new gizmos powered by XML.
Stephen E Arnold, June 7, 2011