Oracle Scores with Text Query Functions

October 17, 2011

Save Your Knowledge, an IT knowledge and experience blog, provides a useful how-to with, “Oracle Text: A Simple Way to Implement Scoring Text Search Engine on Oracle DB.” We quite like the allusions to the work of Edward Hopper too.

The author’s English is interesting, but the concept is clear:

In this post I will describe text query functionality.  My customer wants search functionality on several database columns, and results must be ordered by their relevance.  Using a “like” clause let’s you find results that contains a word but doesn’t say you how much relevant it is.  For this purpose you can use Oracle Text extension.

Most licensees of Oracle’s flagship database will have, or be able to get, access to the Text functions. Although getting long in the tooth, the system will index what’s in an Oracle table. Performance can be an interesting challenge. Scaling and speeding up the creaking technology of Oracle Text requires expertise and resources; that is, money and time. For more information about Oracle Text, click here.

The author goes on to describe how to make this work in various contexts and also provides examples, screenshots, etc.  The technique could be a helpful function for users of the Oracle application and is worth a look.
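To make the difference between a “like” match and a relevance score concrete, here is a minimal Python sketch. It is not Oracle Text and does not reproduce Oracle’s scoring algorithm; in Oracle Text itself the ranking comes from the CONTAINS operator and the SCORE function. The function names, the sample rows, and the simple “hits per 100 words” score are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def like_match(rows, word):
    """Mimic LIKE '%word%': a yes/no substring test with no ranking."""
    return [row_id for row_id, text in rows if word.lower() in text.lower()]


def scored_match(rows, word):
    """Rank matching rows by a toy score: occurrences per 100 words.

    This is an illustrative score only, not Oracle Text's algorithm.
    """
    word = word.lower()
    results = []
    for row_id, text in rows:
        tokens = tokenize(text)
        hits = Counter(tokens)[word]
        if hits:
            results.append((row_id, round(100 * hits / len(tokens))))
    # Highest score first, which is what "ordered by relevance" means here.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data: (row id, column text).
rows = [
    (1, "Oracle database tuning guide"),
    (2, "Oracle Text indexes Oracle tables so Oracle users can search"),
    (3, "Open source search engines"),
]

print(like_match(rows, "oracle"))    # which rows match, in table order
print(scored_match(rows, "oracle"))  # which rows match, best first
```

The substring test can only say that rows 1 and 2 match. The scored version also says that row 2 is the stronger match, which is the behavior the quoted customer asked for.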

Emily Rae Aldridge, October 17, 2011

Sponsored by Pandia.com

Lucid Imagination: Open Source Search Reaches for Big Data

September 30, 2011

We are wrapping up a report about the challenges “big data” pose to organizations. Perhaps the most interesting outcome of our research is that there are very few search and content processing systems which can cope with the digital information required by some organizations. Three examples merit listing before I comment on open source search and “big data”.

The first example is the challenge of filtering information that organizations require and that is produced within the organization by its staff, contractors, and advisors. We learned in the course of our investigation that the promises of processing updates to Web pages, price lists, contracts, sales and marketing collateral, and other routine information are largely unmet. One of the problems is that the disparate content types have different update and change cycles. The most widely used content management system, based on our research results, is SharePoint, and SharePoint is not able to deliver a comprehensive listing of content without significant latency. Fixes are available, but these are engineering tasks which consume resources. Cloud solutions do not fare much better, once again due to latency. The bottom line is that employees are mostly unable to locate information produced within their own organization without a manual double check. Latency is the problem. We did identify one system which delivered documented latency across disparate content types of 10 to 15 minutes. The solution is available from Exalead, but the other vendors’ systems were not able to match it in putting fresh, timely information produced within an organization in front of system users. Shocked? We were.


Reducing latency in search and content processing systems is a major challenge. Vendors often lack the resources required to solve a “hard problem” so “easy problems” are positioned as the key to improving information access. Is latency a popular topic? A few vendors do address the issue; for example, Digital Reasoning and Exalead.

Second, when organizations tap into content produced by third parties, the latency problem becomes more severe. There is the issue of the inefficiency and scaling of frequent index updates. But the larger problem is that once an organization “goes outside” for information, additional variables are introduced. In order to process the broad range of content available from publicly accessible Web sites or the specialized file types used by certain third-party content producers, connectors become a factor. Most search vendors obtain connectors from third parties. These work pretty much as advertised for common file types such as Lotus Notes. However, when one of the targeted Web sites, such as a commercial news service or a third-party research firm, makes a change, the content acquisition system cannot acquire content until the connectors are “fixed”. No problem as long as the company needing the information is prepared to wait. In my experience, broken connectors mean another variable. Again, no problem unless critical information needed to close a deal is overlooked.

Read more

DataDirect Adds Video Management

September 29, 2011

The lines between content management, content processing, and data fusion continue to blur.

DataDirect Networks, the world’s largest privately held information storage company, announced the release of Storage Fusion Architecture (SFA) 10K-X this week. SFA 10K-X is an integrated storage appliance that maximizes application performance while minimizing total cost of ownership for Big Data, cloud, and content-intensive environments. Autonomy has been a player in video for a number of years, and we anticipate that other storage firms will observe Autonomy’s success and explore the burgeoning rich media opportunity.

In Maria Deutscher’s article, “DataDirect Networks Brings Fusion Tech to Big Data Storage,” DataDirect CEO and Cofounder Alex Bouzari said:

The DDN SFA10K-X is a high-performance, scalable solution that will meet the needs of today’s and tomorrow’s data-intensive organizations.

According to an August article, DataDirect now powers more than 60 percent of the top 50 fastest computer storage solutions in the world. While DataDirect is kept busy with expanded partnerships and a new command video management platform, if you want an explanation, be prepared to pay over $200 for a basic book.

DataDirect is an example of a next generation enterprise solution which uses storage as a platform for sophisticated content processing and management services.

Jasmine Ashton, September 29, 2011


SQL Injection: Knowledge Prevents Problems

September 14, 2011

Our modern lives are controlled by databases: health records, financial records, education records, and online search. Even when you are not personally interfacing with a database, there is usually one behind the scenes controlling your enrollment, appointment time, or access to any given record. SQL is the language used to define and query many of these databases, and applications that build SQL statements from unchecked user input are vulnerable to an attack called SQL injection.

SQL injection exploits a security vulnerability in the database layer of an application, typically in the way queries are assembled. It’s considered one of the top 10 web application security vulnerabilities. Our culture of free access to information can be used for good or for evil. One example is this SQL Injection Pocket Reference.

Freely available on the Web, this pocket guide explains the ins and outs of SQL injection. The author could argue that this guide helps creators build more secure databases by recognizing mistakes in the framework or areas of weakness. However, a stronger argument could be made that such a reference is more of a “hacking for dummies” guidebook than anything else. Anyone who’s ever suffered an email or bank account hack would like to see such information be a little harder to find.

We are not fans of hacker-related information or the hacker ethos. Still, information can prevent missteps. We suggest you learn about SQL injection and then double-check that you are not vulnerable.
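For readers who want to see what “vulnerable” looks like, here is a minimal sketch using Python’s built-in sqlite3 module. The table, the column names, and the two lookup functions are hypothetical and chosen only for illustration. The first function pastes user input into the SQL text; the second passes it as a bound parameter, which is the standard defense.

```python
import sqlite3


def build_db():
    """Create a throwaway in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    """Vulnerable: the input becomes part of the SQL statement itself."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    """Safer: the input is bound as data and never parsed as SQL."""
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = build_db()
# The quote characters close the string early and add an always-true clause.
attack = "nobody' OR '1'='1"

print(lookup_unsafe(conn, attack))  # leaks every row in the table
print(lookup_safe(conn, attack))    # matches no user, so nothing is returned
```

A quick way to double-check your own code is to search for places where SQL strings are assembled with concatenation or string formatting and replace them with parameterized queries like the second function.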

Emily Rae Aldridge, September 14, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Jaspersoft Reaches Out to the Open Source Community

September 9, 2011

Dr. Dobb’s examines “Jaspersoft’s Open Source BI Knowledge Center.” Called Self-Service Express, the new subscription service grants access to product documentation, the Jaspersoft Customer Knowledge Base, and the enterprise-search-powered repository of Jaspersoft technical information. The service itself will run from $99 to $399/month, but the content is all freeware. Writer Adrian Bridgwater explains,

Self-Service includes tips and tricks, code samples, and best practices. Developers focused on BI tool construction will be able to use Jaspersoft’s enterprise search service to explore technical information including product documentation and knowledge base materials, as well as other resources found at Jaspersoft.com and JasperForge.org.

Jaspersoft supplies clients with business intelligence (BI) software, with an emphasis on keeping up-to-date. The company focuses on such current concerns as cost sensitivity, refined user interface experiences, Cloud computing, and getting the most out of Big Data.

The company claims the service is one of a kind. Jaspersoft Senior Analyst Jay Lyman says the company hopes the project will help them draw new customers from the open source community. Sounds like a good strategy to me.

Cynthia Murrell, September 9, 2011


Hadoop Gaining Ground on RDBMS Like a Smart Car Climbing Pike’s Peak

September 6, 2011

Open-source Apache Hadoop software is coexisting in the market with the more established relational database management systems (RDBMS). Computer World reports in “Hadoop Growing, Not Replacing RDBMS in Enterprises.” We learned:

Hadoop is designed to help companies manage and process petabytes of data. Much of the technology’s appeal lies in its ability to break up very large data sets into smaller data blocks that are then distributed across a cluster of commodity hardware for faster processing. Early adopters of the technology, including Facebook, Amazon, eBay and Yahoo, have been using Hadoop to store and analyze petabytes of unstructured data that conventional RDBMS setups couldn’t handle easily.
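The “break it into blocks and process them in parallel” idea in that passage can be sketched in a few lines of Python. This is not the Hadoop API: the function names are hypothetical, a thread pool stands in for a cluster of commodity machines, and a word count stands in for the real analysis.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Cut a large list of records into smaller fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: count words within one block, independent of the others."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, block_size=2, workers=3):
    """Split, process each block on its own worker, then merge the results."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


# Hypothetical sample "log lines" standing in for a very large data set.
logs = ["search search click", "click view", "view view search"]
print(word_count(logs).most_common())
```

Because each block is processed without reference to the others, adding more workers (or, in a real cluster, more machines) lets the same job handle more data.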

Computer World’s review is not completely negative, but rather restrictive in our view. RDBMS has organizational inertia on its side, an obstacle any newcomer has to conquer. RDBMS is entrenched in the rigid world of transaction data, customer information, and call records. However, Hadoop is adept in creative sectors such as event data, search engine results, and text and multimedia content from social media sites. Security concerns are also cited, although as adoption becomes more widespread those concerns are sure to lessen.

Our view is that in the present financial environment, open source is likely to suffer severe pressures. Giant, for-profit companies will want to capitalize on open source goodness and then implement a fiercely commercial pricing model for services, training, consulting, engineering, and proprietary extensions. Big money will lure key developers, and the “community” may be subject to London, UK-style dissension. Yikes!

Emily Rae Aldridge, September 6, 2011


Statisticians Weigh In on Big Data

September 5, 2011

The Joint Statistical Meetings, the largest assembly of data scientists in North America, provided fertile ground this summer for a survey by Revolution Analytics on the state of Big Data technologies. Revolution Analytics presents the results in “97 Percent of Data Scientists Say ‘Big Data’ Technology Solutions Need Improvement.”

As the headline suggests, the vast majority of these experts crave improvement in the field:

The survey revealed nearly 97 percent of data scientists believe big data technology solutions need improvement and the top three obstacles data scientists foresee when running analytics on Big Data are: complexity of big data solutions; difficulty of applying valid statistical models to the data; and having limited insight into the meaning of the data.

Results also show a lack of consensus on the definition of “Big Data.” Is the threshold a terabyte? Petabyte? Or does it vary by the job? No accepted standard exists.

Survey-takers were asked about their future use of existing analytics platforms: SPSS, SAS, R, S+, and MATLAB. Most respondents expected to increase use of only one of these, the open source R project (a.k.a. GNU S).

Revolution Analytics bases its data management software and services on the R project. The company also sponsors Inside-R.org, a resource for the R project community. I’d have to see the survey to know whether the emphasis they found on R was skewed, but let’s give them the benefit of the doubt for now.

Cynthia Murrell, September 5, 2011


MetaCarta Offers Geotagging Plug-In

August 19, 2011

Geospatial context is the linchpin for cultural and human ecosystem modeling and analysis. Concept templates can guide models, allowing professionals to consider economic, religious, political and geographic features simultaneously. In “Geotagging with MetaCarta,” the Thetus blog describes a new plug-in solution for creating such models.

MetaCarta’s GeoSearch Toolkit plug-in for Apache Solr, an open source high performance search and index, gives us the ability to combine geographic search constraints such as bounding boxes and heat maps with the many other semantic and text-based search inputs that we have built up using Solr. This toolkit from MetaCarta allows us to run geo-aware searches through one unified and high performance search engine, rather than needing to conglomerate geographic search results from one data source with semantic search results from another source.
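The core idea in that passage, applying a geographic constraint and a text constraint in one pass instead of merging two result sets, can be sketched in plain Python. This is not the MetaCarta toolkit or the Solr API; the classes, field names, and sample documents are hypothetical, and the bounding box test assumes the box does not cross the 180-degree meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """A document with a title and a single geotag."""
    title: str
    lat: float
    lon: float


@dataclass(frozen=True)
class BoundingBox:
    """A latitude/longitude rectangle (assumed not to cross 180 degrees)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def geo_text_search(docs, keyword, box):
    """Return titles that match the keyword AND fall inside the box."""
    keyword = keyword.lower()
    return [
        doc.title
        for doc in docs
        if keyword in doc.title.lower() and box.contains(doc.lat, doc.lon)
    ]


# Hypothetical sample documents and an approximate Pacific Northwest box.
docs = [
    Doc("Flood report Portland", 45.52, -122.68),
    Doc("Flood report Boston", 42.36, -71.06),
    Doc("Election results Portland", 45.52, -122.68),
]
pacific_northwest = BoundingBox(south=42.0, west=-125.0, north=49.0, east=-116.0)

print(geo_text_search(docs, "flood", pacific_northwest))
```

Only the Portland flood report satisfies both constraints; the Boston report fails the geographic test and the election story fails the text test.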

The GeoSearch plug-in by MetaCarta makes a lot of sense for professionals seeking ease and speed when incorporating geographic data into their work. Geography is certainly a specific field, and those not well versed in its intricacies often choose to stay away altogether. Perhaps software such as this offering by MetaCarta can make geography a user-friendly affair. Thetus keeps a low profile, but the company continues to move forward with commercial and government work.

Emily Rae Aldridge, August 19, 2011


Endeca Tackles Big Data, but Is the Concept Valid?

August 17, 2011

This week Endeca announced it would integrate Apache Hadoop with its Endeca Latitude product, thus providing a better environment for processing big data. According to “Endeca Attacks Big data with Hadoop integration,” the enterprise search vendor continues to move away from traditional models and address specific business needs.

Hadoop is an open source data processing tool that according to the Endeca release works particularly well with unstructured data. One of the advantages of working with Hadoop is that it offers what is essentially a fail-safe approach because if one server shuts down or just slows down, Hadoop compensates across the remaining servers and keeps running . . . This all comes together to provide a better environment for processing big data, something that according to Donald Feinberg, VP distinguished analyst at Gartner, is a growing concern at many organizations.

But is big data a growing concern or a corporate myth? In “There’s no such thing as big data,” Alistair Croll contends that big data may exist in theory but not in practice. While they may accumulate virtual vaults of data, Croll contends, “It takes an employee, deciding that the loss of high-value customers is important, to run a query of all their data and find him, and then turn that into a business advantage. Without the right questions, there really is no such thing as big data — and today, it’s the upstarts that are asking all the good questions.”

Croll maintains that small start-ups are winning the marketing game because they are approaching from a more agile, more creative position. However, large companies have plenty of power to leverage in their holdings of big data, if only they knew how to ask the right questions. Why do we have Netflix instead of a reinvented Blockbuster? This is the heart of Croll’s question.

So while Endeca might have found a favorable selling point for Latitude, the business plan is still lacking for how to incorporate the big data concept into a profitable model. Maybe big business will learn from the start-ups, allowing big data to become a topic of relevance.

Emily Rae Aldridge, August 17, 2011


Protected: Microsoft Access and SharePoint: Happy Together

August 16, 2011

