Deep Web Technologies: Cracking Multilingual Search

January 30, 2012

The rapid development of Web-based technologies over the last decade has created a unique opportunity to bring together the world’s scientists by making it easy for them to share research information. As information shifts from US-centric, English-language sources to material published in other languages, researchers are finding that proficiency in one or two foreign languages is no longer enough.

The Multilingual Challenge

Multilingual search increases the value of research output by making it available to a wider audience. Seamless federation and automated translation make research from China, Japan, Russia, and other countries that publish prolifically in science available to researchers who do not read those languages. In patent research, multilingual search greatly broadens the scope of a search. For English speakers, multilingual federated search also offers exposure to the diverse perspectives of researchers in other countries.

For example, China’s research output is now growing far faster than that of the rest of the world. In 2006, China’s research and development output surpassed that of Japan, the UK, and Germany. At this pace, China will overtake the USA in a few years. But non-US innovation is not confined to Asia and Europe: Brazil’s share of research output is also growing rapidly.


Sample system output from WorldWideScience.org, powered by Deep Web Technologies’ multilingual federating system.

Deep Web Technologies (DWT) is one of the leaders in federated search. Federation means taking a user’s query and using it to obtain search results from other indexes and search-and-retrieval systems. Deep Web Technologies’ Explorit product, for example, handles this process and returns a blended set of results to the user. Federation eliminates the need for the user to frame separate queries for Google, Medline, USA.gov, and the NASA website. Instead, the user enters one query in Explorit and sees a single, relevance-ranked results list.
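The fan-out-and-merge pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DWT’s Explorit code (the post does not describe its internals): each source is a hypothetical callable that returns (title, score) pairs, and because different engines score on different scales, each source’s scores are min-max normalized before the lists are blended into one relevance-ranked list.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical source type: any callable that takes a query string and returns
# (title, raw_score) pairs. Scores from different sources use different scales.
Source = Callable[[str], List[Tuple[str, float]]]


def normalize(results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Min-max scale one source's scores into [0, 1] so sources can be compared."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        # Every result scored the same; treat them all as equally relevant.
        return [(title, 1.0) for title, _ in results]
    return [(title, (score - low) / (high - low)) for title, score in results]


def federate(query: str, sources: Dict[str, Source]) -> List[Tuple[str, str, float]]:
    """Fan one query out to every source concurrently and merge the answers.

    Returns (source_name, title, normalized_score) tuples, best first. When two
    sources return the same title, only the higher-scoring copy is kept.
    """
    if not sources:
        return []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        best: Dict[str, Tuple[str, float]] = {}
        for name, future in futures.items():
            for title, score in normalize(future.result()):
                if title not in best or score > best[title][1]:
                    best[title] = (name, score)
    merged = [(name, title, score) for title, (name, score) in best.items()]
    # Sort by score (descending), then by source name so ties are deterministic.
    merged.sort(key=lambda row: (-row[2], row[0]))
    return merged
```

Min-max normalization is only one way to put scores from unrelated engines on a common scale; it is used here for brevity, and a production federator would likely weigh source quality, recency, and duplicate detection more carefully.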

DWT has moved beyond single-language federation and has grown to become the leader in federated search of the deep web. That growth led to the launch, in June 2011, of its ground-breaking, patent-pending multilingual federated search capability.

“We now live in a much more interconnected world where information is available in a variety of languages,” noted Abe Lederman, President and CTO of Deep Web Technologies. “Major advances in machine translation have made it possible for DWT to develop a revolutionary new Explorit product that breaks down language barriers and advances scientific collaboration and business productivity.”
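Multilingual federation adds a translation step on each side of the search: the query is translated into each source’s language, and the returned titles are translated back. The sketch below assumes a hypothetical word-by-word glossary in place of a real machine-translation engine and hypothetical per-language sources; it illustrates the general idea only and does not describe DWT’s patent-pending method.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy glossary standing in for a machine-translation engine.
# Nothing here reflects how DWT's patent-pending system actually translates.
GLOSSARY: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "fr"): {"solar": "solaire", "energy": "énergie", "report": "rapport"},
    ("fr", "en"): {"solaire": "solar", "énergie": "energy", "rapport": "report"},
}

# A hypothetical source searches in one language and returns titles in that language.
Source = Callable[[str], List[str]]


def translate(text: str, src: str, dst: str) -> str:
    """Word-by-word glossary lookup; unknown words pass through unchanged."""
    if src == dst:
        return text
    table = GLOSSARY.get((src, dst), {})
    return " ".join(table.get(word, word) for word in text.lower().split())


def multilingual_search(
    query: str, sources: Dict[str, Tuple[str, Source]], user_lang: str = "en"
) -> List[Tuple[str, str, str]]:
    """Translate the query into each source's language, search, and translate
    each result title back into the user's language.

    Returns (source_name, original_title, translated_title) tuples.
    """
    results = []
    for name, (lang, search) in sources.items():
        local_query = translate(query, user_lang, lang)
        for title in search(local_query):
            results.append((name, title, translate(title, lang, user_lang)))
    return results
```

Because the stub translates word by word, a French title such as “rapport solaire” comes back as “report solar” rather than “solar report”; a real machine-translation engine would handle word order and idiom.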


File Extension List

January 28, 2012

Need a handy list of all known file extensions and types? Look no further. Nosa Lee at Seek The Sun Slowly has kindly provided such a list in “The Known File Extensions/ Types References – A” through “Z.” In a translation from the original Chinese, the listing explains:

Now I have collected all the known file extensions/types for your reference. Because there are so many, I grouped them by their first character.

Yes, there’s a page for each letter, and even for “Number” and “Symbol.” The complete set is also available as a single download.

I knew there were a lot of file types, but seeing them all in one place really puts the matter into perspective.

Cynthia Murrell, January 28, 2012

Sponsored by Pandia.com

SharePoint 2010, Windows Azure, and the Issue of Scalability

January 27, 2012

SharePoint has not seen the same explosive growth in small and medium-sized businesses as it has in larger organizations. One reason is simply the time and money required to run such a demanding infrastructure. Andrew Connell addresses this issue, and a possible remedy in Windows Azure, in his piece “SharePoint 2010 + Windows Azure – Why You Should Care & the Development Story.”

Connell states:

Historically SharePoint has been very popular with large organizations because they can shoulder the financial and resource requirements required to deploy SharePoint. However one place where SharePoint hasn’t grown as fast is in the small and medium sized business (SMB) area. The reason for this is most likely the cost and resource requirements (not just hardware, but people as well) necessary to deploy and maintain it. With Office 365, many of these barriers have been removed and therefore it represents a new era and opportunity to grow the SharePoint customer base. Therefore this is an area you should be very aware of and should learn how you can best leverage and exploit this new and untapped market of possible SharePoint customers!

Connell goes on to give specifics about the solutions in Office 365 and how small and medium-sized businesses might find SharePoint better suited to their size and budgets. Even as SharePoint becomes more accessible to smaller organizations, some third-party vendors already offer options tailored to a variety of business needs.

Fabasoft Mindbreeze is one excellent third-party option for an organization of any size looking for an enterprise solution. Its scalable offerings are customized to the needs of each customer, and expansion is easy whenever it is needed. Read more in “Three Configurations for Dynamic Scalability and Deployment.”

The blog entry begins,

In enterprise search, quality, usability and style are as important as relevancy of results and performance to engage your users right from the start.  Let’s take a look at typical scale-out scenarios that become relevant when implementing enterprise environments with Fabasoft Mindbreeze.

The article then describes the different levels of deployment and how an organization might choose where to begin its installation. Keep in mind that moving up from one level to the next is effortless. Additionally, Fabasoft Mindbreeze is currently expanding into the cloud, where an organization’s storage is effectively unlimited.

So for SharePoint devotees looking for a way to make SharePoint more viable for smaller organizations, consult Connell’s findings above, but also explore the solutions presented by Fabasoft Mindbreeze and find a scalable solution that works for your organization.

Emily Rae Aldridge, January 27, 2012

Sponsored by Pandia.com

Synaptica Independent Taxonomy Resource

January 27, 2012

Synaptica started out as Synapse Corporation under founders Trish Yancey and Dave Clarke. The company offered taxonomies, software solutions, and professional lexicography and indexing services for businesses and organizations based on its Synaptica product, a knowledge management and indexing application that helps enterprises manage taxonomies, thesauri, classification schemes, authority control files, and indexes. In 2005 the company, renamed Synaptica, was acquired by Dow Jones and placed in its Factiva unit. Clarke has since regained control of Synaptica.

The company has also revamped its informational website, Taxonomy Warehouse, a free online resource that answers enquiries about taxonomies. Named one of KM World magazine’s “Trend-Setting Products of 2011,” Synaptica is an editorial tool designed for professional taxonomists. In 2011, the company added a complementary suite of front-end publication tools that make it easy to present any taxonomy or ontology to end users. The Ontology Publishing Suite gives administrators better control over which parts of a master ontology are exposed to end users and how they are laid out on-screen. Other parts of the Synaptica product suite include Synaptica Enterprise, the behind-the-firewall solution for larger organizations; Synaptica Express, a cloud-computing solution for individuals and small-business users; Synaptica IMS, a complementary suite of tools that supports human indexing of content using taxonomies stored in Synaptica; and Synaptica SharePoint Integration, an add-on module that lets taxonomies managed in Synaptica be applied as meta-tags to content stored in SharePoint document libraries and also be used for search.

The technology has found a home in corporate, pharmaceutical, government, and e-commerce markets. Clients include Verizon, ProQuest, the BBC, and Harvard Business Publishing. Competitors include LexisNexis, Dun & Bradstreet, and InsideView. (I would not include Concept Searching or Ontoprise in this short list due to exogenous complexity factors.)

Stephen E Arnold, January 27, 2012

Sponsored by Pandia.com

OpenCalais: From the Innovators at Thomson Reuters

January 27, 2012

Thomson Reuters is now testing a new print publication called Reuters at the World Economic Forum. Before the firm returned to print, it was probing automated tagging.

Founded in 1998, ClearForest was previously an independent software start-up. It was acquired by Reuters in 2007 and is now part of the Markets division of Thomson Reuters. OpenCalais is a strategic initiative from Thomson Reuters, based on ClearForest technology, to support the interoperability of content across the digital landscape.

OpenCalais is free to use in both commercial and non-commercial settings but can only be used on public content. It can process up to 50,000 documents per day (blog posts, news stories, Web pages, etc.) free of charge. For users who need to process more than that, there is Calais Professional. While OpenCalais does not keep a copy of the content, it does keep a copy of the metadata it extracts. Offering a de facto standard for making content interoperable in a way that complies with Semantic Web standards ultimately benefits Thomson Reuters, which can then track themes, memes, and trends on the Web and potentially link to relevant content that gives context to its readers, customers, and other constituents.

After releasing a couple of major upgrades – in particular the incorporation of a whole Linked Data ecosystem underneath OpenCalais for companies, geographies, products and a few other things – with little or no adoption and no fundamentally new capabilities being built, the OpenCalais team, headed by Tom Tague, decided to slow down development and let the market for semantic extraction mature. Thomson Reuters believes that there are massive opportunities for OpenCalais in the areas of news, its integration with social media and its utilization as a massive repository of knowledge.

OpenCalais’ early adopters include CBS Interactive / CNET, Huffington Post, Slate, Al Jazeera, “The New Republic,” the White House and more. Customers include Kodak, Dow Chemical, Eastman Chemical, NASD, EDS, Boeing, the US Department of the Air Force, Reuters, Dow Jones, and Thomson Financial. Competitors include Eqentia and Evri. (I would not include Concept Searching or Ontoprise in this short list due to exogenous complexity factors.)

Stephen E Arnold, January 27, 2012

Sponsored by Pandia.com

Simplified Database Maintenance in SharePoint

January 26, 2012

Database maintenance tasks are a part of general SharePoint infrastructure upkeep and care.  However, as Steve Hord writes, it is not always the easiest task for a SharePoint administrator.  In “Find active databases used by SharePoint Server 2010,” Hord writes the following:

One of the best ways to know what databases your SharePoint deployment uses is to keep a record and add database names each time you create a new database.  This isn’t always easy as there usually isn’t enough extra time during the day to keep records. Plus, more often than not your SharePoint database maintenance tasks tend to occur either late at night or in the pre-dawn hours when no users are accessing the system, so remembering to add a new database name to an ongoing list is really tough.

Hord then gives a few tried-and-true methods for finding the active databases in a SharePoint installation, along with their properties. However, some administrators might want an easier approach to database maintenance. We have found that many third-party solutions offer an ease of maintenance that SharePoint simply cannot supply.

Read about one of our favorite offerings, Fabasoft Mindbreeze, and its Fabasoft Mindbreeze Database Connector solution.

The Database Connector enables the connection of databases to Fabasoft Mindbreeze Enterprise. Furthermore, directory services such as Active Directory can be accessed via LDAP and searched very efficiently via Fabasoft Mindbreeze Enterprise.  The development of the Database Connector is based on the data integration connector, making it easy to connect databases to Fabasoft Mindbreeze Enterprise.

So while there are tips and tricks for almost any SharePoint issue or gripe, more and more quality third-party solutions are available that replace or enhance SharePoint’s functionality. Check out the Fabasoft Mindbreeze offerings and see if they can supplement or replace your organization’s current SharePoint infrastructure.

Emily Rae Aldridge, January 26, 2012

Sponsored by Pandia.com

Intellisophic: Formerly Indraweb

January 26, 2012

Founded in 1999 as Indraweb and later renamed, Intellisophic, Inc., is a privately funded technology company that is the world’s largest provider of taxonomic content. Its technology, originating from the work of founders Henry Kon, PhD., George Burch, and Michael Hoey, is based on the premise that concepts within unstructured information can be systematically derived by leveraging the trusted taxonomies of the reference book community. Building on this idea, Intellisophic developed and patented the Orthogonal Corpus Indexing algorithm for extracting and using taxonomies from reference and education books.

During a stint as principal investigator for MIT’s Context Interchange, CTO Kon researched and implemented methodologies for enterprise integration of structured and semi-structured data over independently managed and disparate schema databases. He researched, designed, and prototyped integration engines for distributed multi-database query and caching over heterogeneous, distributed, and partially connected databases. As a member of MIT’s Composite Information Systems Laboratory, Kon published on multi-database integration engines and the use of ontology for bridging database schema. With Intellisophic, he has pioneered innovation in the conceptual management of unstructured information and in the integration of structured, semi-structured and unstructured content.

Intellisophic content is machine-developed, leveraging knowledge from respected reference works. The taxonomies are not limited in subject coverage and are cost-effective to create. The taxonomy library covers several million topic areas defined by hundreds of millions of terms. In addition to taxonomic content, the company offers intelligent solutions, such as enterprise search and retrieval, business intelligence, categorization and classification, compliance management, portal infrastructure, social networking, content and knowledge management, electronic discovery, data warehousing, and government intelligence.

Its strategic alliance partners include Mark Logic, DataLever, SchemaLogic, DFI International, and Mosaic, Inc. Competitors include Sandpiper, Intellidimension, and HighFleet. The depth and breadth of Intellisophic’s taxonomies, along with its support of the leading text mining, search, and categorization applications, make it a good solution for many industries. (I would not include Concept Searching or Ontoprise in this short list due to exogenous complexity factors.)

Stephen E Arnold, January 26, 2012

Sponsored by Pandia.com

SharePoint How-To: Keep It Simple

January 25, 2012

There is a saying in the library world, and apparently a version of it has made its way into the SharePoint world: “Make it easier to use than not to use and it will get used.” We could all learn a thing or two from this principle, so let’s look at what Kerri Abraham has to say about SharePoint in “Give Them Instructions!”

Referring to the principle above, Abraham says:

Someone in the SharePoint community used this quote in a webinar I watched years ago and it has never left me, I find myself quoting it often because it is just dorky (and easy) enough to remember. And again, proves the point that easy sells! The real gauge of an elegant solution is in its ease of use, not in how complicated it was to build.  So how do I make SharePoint easy? I provide great instructions. I test solutions with users and think creatively about how they might end up frustrated or lost and include those tips in the how-to. Then I place the link at the top of the page for a consistent method of presentation.

It is worth taking a look at Abraham’s entry and viewing her instructive photos and screenshots. Ultimately, the point is well taken. Give clear instructions, make it simple, and any organization’s SharePoint installation will be cleaner and more efficient. However, we also think that third-party enterprise solutions often offer a platform that is more intuitive and easier to use out of the box. One we particularly like is Fabasoft Mindbreeze.

Read what the Upper Austria Chamber of Commerce had to say about the Mindbreeze ease of use:

Fabasoft Mindbreeze Enterprise enables access to information from various data repositories throughout the organization (such as e-mail systems, file systems, data bases). By expanding the scope of Fabasoft Mindbreeze, the service center staff is now able to receive all relevant information at a glance with only one search query. The simple and intuitive user interface eliminates the need for time intensive training.

Abraham’s creative solution for implementing instructions might help end users with the everyday functions of SharePoint.  If additional efficiency is required, research Fabasoft Mindbreeze and see if it can meet your organization’s enterprise needs.

Emily Rae Aldridge, January 25, 2012

Sponsored by Pandia.com

Mondeca: How Smart Is Your Content?

January 25, 2012

Here in Harrod’s Creek, we and our content are not too smart. Mondeca believes it can change this hapless condition.

Founded in 1999 by Jean Delahousse and others, Mondeca asserts that it is the leading European provider of technology for the management of advanced knowledge structures: ontologies, thesauri, taxonomies, terminologies, metadata repositories, knowledge bases, and Linked Open Data.

Based in Paris, France, the company has been financed by its founders, as well as investment funds Trinova and Banque Populaire. Before starting Mondeca, Delahousse worked for Andersen Consulting, Paris Stock Exchange and Diagram, a publisher of financial software. With expertise in semantic web, ontologies, and content management, he has experience in the design and launch of large software applications, as well as in implementation of semantic technologies for large international clients.

Mondeca’s products help enterprises integrate and interlink heterogeneous information by mapping it to explicit knowledge references, and they improve the way information is retrieved, analyzed, and reused by producing consistent, precise, and relevant metadata and supplying the relevant context. Mondeca’s technology is at the core of the Semantic Enterprise Information Architecture, which makes it possible to interconnect people and resources and to extract the most value from information.

Its products include Content Annotation Manager, a platform for building and managing customized workflows for semantic annotation of content that coordinates content analysis, data mapping, human validation, and knowledge enrichment components; and Intelligent Topic Manager, which supports the management of complex knowledge structures throughout their lifecycle, from authoring to delivery and can be either used independently to store and manage complex domain-specific knowledge structures, or as a service that enhances enterprise search, knowledge discovery, and text mining solutions.

Mondeca has also built its credibility in the Semantic Web space as a key contributor to widely-used international standards: OWL, RDF, SKOS, ISO 25964, and Topic Maps. Clients include Hachette Filipacchi, the World Tourism Organization, and Thomson Scientific. Competitors include Layer2 and Wordmap. 

Stephen E Arnold, January 25, 2012

Sponsored by Pandia.com

PolySpot Scales Ten Alps Publishing

January 24, 2012

The economic climate may be uncertain, but it is a great day for scaling Ten Alps. PolySpot announced that it has closed a deal to implement its next-generation, search-enabled applications system for a major publisher. The PolySpot system will be deployed for the Link2Portal system.

Olivier Michel, one of PolySpot’s senior managers, told me:

Ten Alps publish more than 200 publications a year and have developed the unique Link2Portal site, to bring together the day’s news, analysis and exclusive opinions across UK and Global Trade, Logistics, construction and infrastructure,  energy and sustainable development sectors. This information was previously isolated by each publication or subscriber list and as the volume of data was both growing rapidly, and becoming of increasing value to a widening readership, Ten Alps decided to invest in an information search and access solution to facilitate and enhance access to all of its information assets.

According to Mr. Michel, Ten Alps selected PolySpot because of its flexibility, performance, and implementation speed. The PolySpot system was up and running in three days, including integration of the PolySpot solution with other enterprise applications. PolySpot’s robust enterprise search application programming interface was a pivotal element in this implementation.

Stuart Brown, managing director of Ten Alps, said:

With its simple, open architecture, PolySpot was the only platform capable of providing us with a unique B2B search engine, which optimizes our content.

What makes this implementation significant is that PolySpot draws on a range of content, including directory information from a CouchDB database hosted in the Amazon cloud, the site’s editorial content (managed in Drupal), and the unstructured content of the thousands of publications available as PDF files and e-books.

Consequently, PolySpot delivers the type of integrated search experience that some vendors describe but deliver only after weeks or months of effort. With PolySpot, a search on Link2Portal lets the user find news, a sector expert’s opinion, or the e-book for a publication opened at the right page, along with information on industry solutions and suppliers.

Gilles André, the chief executive officer of PolySpot, said:

The aim of Link2Portal is to facilitate information access for visitors to a major UK media group’s Web site. We achieved this objective in just a few days and we are proud to have Ten Alps as a customer.

Founded in 2001, PolySpot designs and sells search and information access solutions designed to improve business efficiency in an environment where data volumes are increasing at an exponential rate.

PolySpot’s solutions offer deep connectivity, so that licensees can access the data they need, regardless of its structure, format, or origin. PolySpot’s solutions are based on an innovative infrastructure offering both versatility and high performance, enabling companies to make best use of their assets and rationalize the strategic costs that today’s businesses and organizations face. PolySpot’s solutions have millions of users worldwide, across all business sectors, with customers including Allianz, BNP Paribas, Bureau Veritas, Crédit Agricole, OSEO, Schlumberger, Veolia, Trinity Mirror and Vinci.

A tip of the search enabled applications hat to the PolySpot team. Autonomy, Endeca, Exalead, and IBM have a frisky competitor on their hands I surmise.

Stephen E Arnold, January 24, 2012

Sponsored by Pandia.com
