Maps on Steroids
May 17, 2011
Here is an interesting link from the people behind Mapsys.info: “Public Data Visualization with Google Maps and Fusion Tables”.
“Visualizing” public data basically means mapping information that is relevant to a community. A good working example mentioned in the posting is San Francisco’s Bay Area bike accident tracker. The map’s legend decodes the various colored dots, indicating the type of accident and how it came to be recorded.
Source: http://mapsys.info/
A screenshot of the code needed to display a map with personalized details is offered in the posting. The star of the show is the integration with a Fusion Table, a tool offered by Google to house data sets to be presented on a map. Additional functionality comes from the “SQL-like query syntax” and from leveraging “the Python libraries Google provides for query generation and API calls”. This allows you to pick smaller data sets out of the Fusion Table.
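For a sense of what that query layer looks like, here is a minimal sketch. The table ID, column names, and the exact behavior of the Fusion Tables SQL query endpoint are our assumptions for illustration; the posting’s own code is not reproduced here.

# Minimal sketch: pulling a subset of rows from a hypothetical Fusion Table
# with SQL-like syntax. Table ID, column names, and endpoint details are
# illustrative assumptions, not specifics from the posting.
import urllib.parse
import urllib.request

TABLE_ID = "123456"  # hypothetical Fusion Table holding accident records

# SQL-like query: pick a smaller data set out of the full table
sql = ("SELECT Location, AccidentType FROM {0} "
       "WHERE AccidentType = 'bicycle'".format(TABLE_ID))

url = ("https://www.google.com/fusiontables/api/query?"
       + urllib.parse.urlencode({"sql": sql}))
with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))  # rows come back as CSV text

Each returned row can then be dropped onto a Google Map as a colored marker, which appears to be how the accident tracker works.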
So behind the scenes, this looks like another example of search moving beyond the token keyword. You won’t hear any complaints out of us. I remember creating maps using old-fashioned methods when I was working on my engineering degree. This method delivers accuracy and time savings.
Sarah Rogers, May 17, 2011
Freebie
Logica Taps Ontology for Large-Scale Data Management Tools
May 6, 2011
“Ontology, Logica Team on Enterprise Data Solutions,” announces Billing & OSS World. In response to client demand in Europe, the deal pairs Ontology’s semantic search product with Logica’s existing penetration in that market.
In the write up, we noted this passage:
Ontology Systems and Logica have formed a strategic relationship to provide Enterprise Data Alignment (EDA) solutions for communication service providers (CSPs) who want to search and align knowledge from the customer, service and network data distributed across their operational, business and infrastructure systems.
In 2006, Ontology saw a niche and set out to fill it. Their semantic search solutions are built to help communication service providers avoid “data misalignment.” In other words, they provide advanced tools that turn a wealth of disorganized data into actionable information.
Logica is a business consulting firm based in the U.K., serving clients around the world in everything from the automotive industry to utility providers. Among other things, they perform Enterprise Content Management, which explains their interest in cutting-edge data management tools.
What are the content processing and search tools available to licensees? The write up remains mum. Big data often means big findability problems.
Cynthia Murrell, May 6, 2011
Freebie
Fetch: Interesting View of Big Data
April 24, 2011
Our sister publication, Inteltrax, covers the world of data fusion, but we thought that Fetch’s stance on big data was appropriate for Beyond Search’s readers.
Fetch Technologies’ blog entry, “Bringing the Web to Big Data,” is worth a look. In it, Timo Kissel presents a useful point of view on the challenge of big data.
With all the talk about merely managing colossal amounts of data, ways to benefit from that data can feel like an afterthought. Fetch puts the focus back on how we humans can make the best use of Big Data:
But what’s more exciting to me is the use of this Big Data infrastructure to glean novel insights by using new approaches, algorithms, and analytics that simply weren’t feasible before. . . . This is another instance of using computers to do what they’re good at (tireless processing of large amounts of information) and using humans to do what we’re good at – pattern recognition, creativity and insight – albeit now at a scale that would be impossible for us to execute without these novel tools.
Kissel’s example involves retailers. Sure, they can continue to analyze sales from their own stores for trends. However, it would be so much better to open up the whole Web, with global information about their products as marketed in different areas by different competitors. Immediately.
Fetch, of course, has some ideas on how to do that with the firm’s own services. But whether you go to them or not, this viewpoint represents a profitable way to approach what is now almost every organization’s new hurdle.
Cynthia Murrell, April 24, 2011
Freebie
Bilingual Search at PaginasAmarillas.com
April 13, 2011
We learned via PRNewswire’s “YaSabe.com to Provide Bilingual Local Search for U.S. Hispanics at PaginasAmarillas.com” that PaginasAmarillas.com is now bilingual.
With over 50 million Hispanics now living in the U.S., it only makes sense to address that niche. YaSabe and Publicar, S.A. are teaming up to do just that with their bilingual products and services search deal.
‘Hispanics everywhere can relate to the Paginas Amarillas brand,’ said Carlos Caceres, Internet Business Director at Publicar. ‘Partnering with YaSabe, we will provide a world-class local search experience at PaginasAmarillas.com for the 50 million Hispanics that live in the United States.’
Publicar S.A. blankets South and Central America with access to multimedia content, directory assistance, internet search services, and other digital products.
Ya Sabe, Inc. connects U.S. Latinos with resources from local businesses to national brands. Their bilingual search, complete with access to a live human for recommendations, taps into the almost $1 trillion in combined disposable income wielded by Hispanics in this country.
This new service promises to be a welcome tool for Spanish speakers in this country. It is also a smart business move. Check it out at PaginasAmarillas.com.
Cynthia Murrell, April 13, 2011
Freebie
Disappearing US Government Public Information
April 9, 2011
The goose is not going to honk too much about the shuttering of US government Web sites. Most of them get few hits. I know you think that millions of mouse potatoes rush to such thrillers as the US Department of Agriculture’s numerous Web sites, or that thousands of fact-hungry MBAs explore the treasure trove of Department of Commerce content. The reality is that usage is not setting the world on fire.
An outfit called the Sunlight Foundation reported that a bunch of US government Web sites were going dark. The list appeared in “Budget Technopocalypse Deepens: Transparency Sites Will Go Dark In A Few Months.” How do you like that word “technopocalypse”? Felicitous, right?
Anyway, the alleged goners are:
- Apps.gov – Better hurry. Bring your credit card.
- Data.gov – Some interesting but often incomplete data sets
- IT Dashboard – Some spending information. Fascinating for the non-economists
- Paymentaccuracy.gov – Love the charts
- USASpending.gov – Keep in mind the $1.6 trillion deficit and you are good to go.
Low traffic is the norm for most government Web sites. One happy exception is the IRS Web site at tax time; traffic drops off after April 15th each year.
Coincident with the removal of sunshine data, the US government will notify me of changes in the terror alert level via Facebook and Twitter. Seems a fair trade, I suppose.
No further comments from the goose.
Stephen E Arnold, April 9, 2011
Freebie
A SQL Server Keeper: Data Extraction Tool
April 9, 2011
We think Microsoft SQL Server is just about perfect. Well, most of the time. When our favorite database has the hiccups, life can become pretty darned exciting.
“Server Database Extract Tool to Extract SQL Server Database Proficiently” introduces a potentially handy resource to aid in times of trouble. Should you find yourself with a damaged SQL server and thus an inaccessible database, SysTools SQL Recovery is worth a try.
What does the tool do? Our quick look revealed that it deals with such corruption issues as these (a diagnostic sketch follows the list):
- The file *.mdf is missing and needs to restore
- Server can’t find the requested database table
- Table corrupt: object ID wrong
- The DELETE statement conflicted with the REFERENCE constraint; the conflict occurred in database
- The conflict occurred in database msdb, table dbo.sysmaintplan_subplans, column ‘job_id’
- Error 3403, Severity 22, during recovery initialization
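Before buying, a sanity check with SQL Server’s own consistency checker can confirm which of these errors you are facing. Here is a minimal sketch, assuming Python with the pyodbc module; the connection string and database name are hypothetical, and this is our illustration, not part of the SysTools product.

# Minimal diagnostic sketch (ours, not SysTools SQL Recovery): run
# DBCC CHECKDB to surface corruption errors like those listed above.
# Connection string and database name are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes",
    autocommit=True,  # run the DBCC command outside an explicit transaction
)
cursor = conn.cursor()
try:
    # NO_INFOMSGS suppresses routine output; corruption surfaces as errors
    cursor.execute("DBCC CHECKDB ('YourDatabase') WITH NO_INFOMSGS")
    print("No corruption reported.")
except pyodbc.Error as err:
    print("Corruption detected:", err)  # time to reach for a recovery tool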
For compatibility information, check the SysTools SQL Recovery Web site at http://www.sqlserverdatabaserecovery.com/.
The vendor specializes in recovering critical business data from SQL Server 2000, 2005, and 2008, and claims successful recoveries across servers, operating systems, and databases, including relational database servers, Web servers (Apache and Microsoft IIS), business application servers, snap servers, NAS, SAN, and document and content management systems.
Pricing is $129 for a personal license and $229 for a business license. There is also a demo version available for download, so it’s worth a look.
Sarah Rogers, April 9, 2011
Freebie
Recorded Future in the Spotlight: An Interview with Christopher Ahlberg
April 5, 2011
It is big news when In-Q-Tel, the investment arm of the US intelligence community, funds a company. It is really big news when Google funds a company. But when both of these tech-savvy organizations fund a company, Beyond Search has to take notice.
After some floundering around, ArnoldIT was able to secure a one-on-one interview with the founder of Recorded Future. The company is one of the next-generation cloud-centric analytics firms, and what sets it apart technically is, of course, what pulled In-Q-Tel and Google to the Boston-based firm.
Mr. Ahlberg, one of the founders of Spotfire, which was acquired by the hyper-smart TIBCO organization, has turned his attention to Web content and predictions. Using sophisticated numerical recipes, Recorded Future can make observations about trends. This is not fortune telling, but mathematics talking.
In my interview with Mr. Ahlberg, he said:
We set out to organize unstructured information at very large scale by events and time. A query might return a link to a document that says something like “Hu Jintao will tomorrow land in Paris for talks with Sarkozy” or “Apple will next week hold a product launch event in San Francisco”. We wanted to take this information and make insights available through stunning user experiences and application programming interfaces. Our idea was that an API would allow others to tap into the richness and potential of Internet content in a new way.
When I probed for an example, he told me:
What we do is to tag information very, very carefully. For example, we add metatags that make explicit when we find an item of data, when it was published, when we analyzed it, and what actual time point (past, present, future) the datum refers to. The time precision is quite important. Time makes it possible for end users and modelers to deal with this important attribute. At this stage in our technology’s capabilities, we’re not trying to claim that we can beat someone like Reuters or Bloomberg at delivering a piece of news the fastest. But if you’re interested in monitoring, for example, the coincidence of an insider trade with a product recall, we can probably beat most at that.
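To make the idea concrete, here is a minimal sketch of how such time tags might be represented. The class and field names are our own invention, not Recorded Future’s actual schema.

# Illustrative sketch of the time metadata described above; the class and
# field names are ours, not Recorded Future's schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class TaggedDatum:
    text: str
    found: date       # when the item was located
    published: date   # when the source published it
    analyzed: date    # when the pipeline processed it
    refers_to: date   # the time point the statement is actually about

item = TaggedDatum(
    text="Hu Jintao will tomorrow land in Paris for talks with Sarkozy",
    found=date(2011, 4, 4),
    published=date(2011, 4, 4),
    analyzed=date(2011, 4, 5),
    refers_to=date(2011, 4, 5),  # "tomorrow" resolved to an explicit date
)

# With explicit time points, a modeler can isolate forward-looking items:
if item.refers_to > item.published:
    print("Forward-looking statement:", item.text)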
To read the full text of the interview with Mr. Ahlberg, click here. The interview is part of the Search Wizards Speak collection of first-person narratives about search and content processing. Available without charge on the ArnoldIT.com Web site, the more than 50 interviews comprise the largest repository of firsthand explanations of “findability” available.
If you want your search or content processing company featured in this interview series, write seaky2000 at yahoo dot com.
Stephen E Arnold, April 5, 2011
Freebie
Big Data, Big Hassles
April 4, 2011
InfoWorld warns, “Big Data runs afoul of big lawyers.” The article emphasizes that the increasingly popular “Big Data” can be inexpensive, until the attorneys get involved.
Big Data has come to refer to large datasets and the tools used to analyze them, a combination that can yield important information if used correctly. It can also be inexpensive.
However, you might want to bring in your counsel before going too far. The article tells the story of Pete Warden, who:
“. . . described how he spent just $100 to scrape 500 million Web pages, including 220 million Facebook public profiles, using his own Web crawler and a 100-machine cluster running on Amazon EC2. He was able to analyze the information to match Twitter, LinkedIn, and Facebook accounts with the email accounts of users of his email tool.
“Then, just for fun, he created interactive maps showing how various countries, U.S. states, and cities connect with each other over social media and what types of fan pages they frequent.”
Neat, huh? Facebook didn’t think so. Their legal department cost him over 30 times the money he spent on the adventure.
So, venture forth, but be careful as you explore this new arena.
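One cheap precaution, for example, is to honor a site’s robots.txt before crawling it. Here is a minimal sketch using Python’s standard library; the URLs and user-agent string are illustrative, and this is a courtesy check, not legal advice.

# Check robots.txt before fetching; URLs and the user-agent string are
# illustrative. A courtesy check, not legal advice.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

target = "https://www.example.com/public-profiles/12345"
if rp.can_fetch("my-research-crawler", target):
    print("Allowed to fetch", target)
else:
    print("robots.txt disallows", target)  # fetching anyway invites lawyers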
Cynthia Murrell, April 4, 2011
Freebie
Resource Links: Text Extraction From HTML Documents
March 28, 2011
We found another nifty links page to add to your software utility file. The list comes from Tomaž Kovačič’s Tech Blog. He gathered resource links about text extraction from HTML documents to aid the wayward IT worker.
He first highlights articles that cover the basics of text extraction. Reading these gives you a general understanding of text extraction and the best way to approach it for your needs. He also mentions how to eliminate content “noise” (i.e., content farms).
He’s also collected a comprehensive list of links related to software about text extraction. He says, “There is only a small amount of competition when it comes to software capable of [removing boilerplate text / extracting article text / cleaning web pages / predicting informative content blocks] or whatever terms authors are using to describe the capabilities of their product.”
Extracting text from an HTML document is relatively simple; it is the choice of software that makes the job complex. He ends with information about APIs and other miscellaneous links that will be helpful. Stash it away for future use.
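For readers who want a feel for the basic task before diving into the links, here is a bare-bones sketch using only Python’s standard library. Real article extractors add the boilerplate-removal smarts the listed tools compete on; this only strips markup.

# Bare-bones HTML text extraction with the standard library. Real tools add
# the hard part: deciding which text blocks are boilerplate versus article.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

extractor = TextExtractor()
extractor.feed("<html><body><script>var x;</script>"
               "<p>Article text.</p></body></html>")
print(" ".join(extractor.chunks))  # -> Article text.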
Whitney Grace, March 28, 2011

