MIT Discovers Object Recognition

June 23, 2015

MIT did not discover object recognition, but researchers did find that a deep-learning system designed to recognize and classify scenes can also be used to recognize individual objects.  Kurzweil describes the exciting development in the article “MIT Deep-Learning System Autonomously Learns To Identify Objects.”  The MIT researchers realized that deep learning could be used for object identification while they were training a machine to identify scenes.  They compiled a library of seven million entries categorized by scene and learned that object recognition and scene recognition could work in tandem.

“ ‘Deep learning works very well, but it’s very hard to understand why it works — what is the internal representation that the network is building,’ says Antonio Torralba, an associate professor of computer science and engineering at MIT and a senior author on the new paper.”

When the deep-learning network was processing scenes, it was fifty percent accurate, compared to a human’s eighty percent accuracy.  While the network was busy identifying scenes, it was also learning how to recognize objects.  The researchers are still trying to work out the kinks in the deep-learning process and have decided to start over.  They are retraining their networks on the same data sets, but taking a new approach to see how scene and object recognition tie together or whether they go in different directions.

Deep-learning networks have major ramifications, including improvements for many industries.  However, will deep learning be applied to basic search?  Image search still does not work well when you search by an actual image.

Whitney Grace, June 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Chrome Restricts Extensions amid Security Threats

June 22, 2015

Despite efforts to maintain an open Internet, malware seems to be pushing online explorers into walled gardens, akin to the old AOL setup. The trend is illustrated by a story at PandoDaily, “Security Trumps Ideology as Google Closes Off its Chrome Platform.” Beginning this July, Chrome users will only be able to download extensions for that browser from the official Chrome Web Store. This change follows one made in March: apps submitted to Google’s Play Store must now pass a review. These are extreme measures to combat an extreme problem with malicious software.

The company tried a middle-ground approach last year, when it imposed the our-store-only policy on all users except those using Chrome’s development build. The makers of malware, though, are adaptable creatures; they found a way to force users into the development channel, then slip in their pernicious extensions. Writer Nathaniel Mott welcomes the changes, given the realities:

“It’s hard to convince people that they should use open platforms that leave them vulnerable to attack. There are good reasons to support those platforms—like limiting the influence tech companies have on the world’s information and avoiding government backdoors—but those pale in comparison to everyday security concerns. Google seems to have realized this. The chaos of openness has been replaced by the order of closed-off systems, not because the company has abandoned its ideals, but because protecting consumers is more important than ideology.”

Better safe than sorry? Perhaps.

Cynthia Murrell, June 22, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Content Grooming: An Opportunity for Tamr

June 20, 2015

Think back. Vivisimo asserted that it deduplicated and presented federated search results. There are folks at Oracle who have pointed to Outside In and other file conversion products available from the database company as a way to deal with different types of data. There are specialist vendors, which I will not name, who are today touting their software’s ability to turn a basket of data types into well-behaved rows and columns complete with metatags.

Well, not so fast.

Unifying structured and unstructured information is a time-consuming, expensive process. That is the reason for the obese exception files where objects which cannot be processed go to live out their short, brutish lives.

I read “Tamr Snaps Up $25.2 Million to Unify Enterprise Data.” The stakeholders know, as do I, that unifying disparate types of data is an elephant in any indexing or content analytics conference room. Only the naive believe that software whips heterogeneous data into Napoleonic War parade formations. Today’s software processing tools cannot get undercover police officers to look shipshape for the mayor.

Ergo, an outfit with an aversion to the vowel “e” plans to capture the flag on top of the money pile available for data normalization and information polishing. The write up states:

Tamr can create a central catalogue of all these data sources (and spreadsheets and logs) spread out across the company and give greater visibility into what exactly a company has. This has value on so many levels, but especially on a security level in light of all the recent high-profile breaches. If you do lose something, at least you have a sense of what you lost (unlike with so many breaches).

Tamr is correct. Organizations don’t know what data they have. I could mention a US government agency which does not know what data reside on the server next to another server managed by the same system administrator. But I shall not. The problem is common and it is not confined to bureaucratic blenders in government entities.

Tamr, despite the odd ball spelling, has Michael Stonebraker, a true wizard, on the task. The write up mentions as a customer an outfit that might be politely described as having a “database challenge.” If Thomson Reuters cannot figure out data after decades of effort and millions upon millions in investment, believe me when I point out that Tamr may be on to something.

Stephen E Arnold, June 20, 2015

LinkedIn: A Pinot for a Flavor Profile with a Narrow Market

June 13, 2015

LinkedIn is the social network for professionals. The company meets the needs of individuals who want to be hired and companies looking for individuals to fill jobs. We use the system to list articles I have written. If you examine some of the functions of LinkedIn, you may discover that sorting is a bit of a disappointment.

LinkedIn has been working hard to find technical solutions to its data management challenges. One of the company’s approaches has been to create software, make it available as open source, and then publicize the contributions.

A recent example is the article “LinkedIn Fills Another SQL-on-Hadoop Niche.” What is interesting is that the write up does not make clear what LinkedIn does with this software home brew. I learned:

Pinot was designed to provide the company with a way to ingest “billions of events per day” and serve “thousands of queries per second” with low latency and near-real-time results — and provide analytics in a distributed, fault-tolerant fashion.

On the surface, it seems that Hadoop is used as a basket. Then the basket’s contents are filtered using SQL queries. But for me the most interesting information in the write up is what the system does not do (a sketch of the kind of query Pinot does handle appears after the list); for example:

  • The SQL-like query language used with Pinot does not have the ability to perform table joins
  • The data is (sic) strictly read-only
  • Pinot is narrow in focus.
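To make those limits concrete, here is a minimal sketch of the kind of single-table, aggregation-style query such a system is built to serve. This is an illustration, not LinkedIn’s code: the broker address, the payload key, and the table and field names are my own assumptions, not details from the write up.

import requests  # third-party HTTP library, assumed to be installed

# Hypothetical broker endpoint; the URL and the "pql" payload key are
# illustrative assumptions, not details taken from the article.
BROKER_URL = "http://pinot-broker.example.com:8099/query"

# A PQL-style query: filter, aggregate, and group over a single table.
# Note what is absent: no JOIN, and no INSERT or UPDATE (the data is read-only).
query = (
    "SELECT count(*) FROM profileViewEvents "
    "WHERE viewerRegion = 'US' "
    "GROUP BY viewedMemberId TOP 10"
)

response = requests.post(BROKER_URL, json={"pql": query}, timeout=5)
response.raise_for_status()
print(response.json())

No join, no update: the engine scans one read-only table and returns aggregates, which matches the narrow focus described above.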

Has LinkedIn learned that its internal team needs more time and money to make Pinot a mash-up with wider appeal? Commercial companies going open source is often a signal that the assumptions of the in-house team have collided with management’s willingness to pay for a sustained coding commitment.

Stephen E Arnold, June 13, 2015

Mongo the Destroyer and JSON and the Datanauts Team Up

June 12, 2015

Hadoop fans, navigate to “A Better Mousetrap: A JSON Data Warehouse Takes on Hadoop.” There are a couple of very interesting statements in this write up. Those who do the Hadoop the loop know that certain operations are sloooow. Other operations are not efficient for certain types of queries. One learns about these Hadoop the Loops over time, but the issues are often a surprise to the Hadoop/Big Data cheerleaders.

The article reports that SonarW may have a good thing with its Mongo and JSON approach. For example, I highlighted:

In other words, Hadoop always tries to maximize resource utilization. But sometimes you need to go grab something real quick and you don’t need 100 nodes to do it.

That means the SonarW approach might address some sharply focused data analysis tasks. I also noted:

What could work to SonarW’s advantage is its simplicity and lower cost (starting at $15,000 per terabyte) compared to traditional data warehouses and MPP systems. That might motivate even non-MongoDB-oriented companies to at least kick the tires.
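As a thought experiment, here is a minimal sketch of what such a quick, narrow question might look like, assuming, as the write up implies, a MongoDB-compatible interface. The host, port, database, and field names are my own illustrative assumptions, not details from the article.

from pymongo import MongoClient  # standard MongoDB client, assumed to work
                                 # against a Mongo-compatible warehouse

# Hypothetical connection details for illustration only.
client = MongoClient("mongodb://sonarw.example.com:27117")
events = client["warehouse"]["web_events"]

# A focused question: error counts per day for one service, top seven days.
# No 100-node batch job is scheduled; one aggregation pipeline does the work.
pipeline = [
    {"$match": {"service": "checkout", "status": "error"}},
    {"$group": {"_id": "$day", "errors": {"$sum": 1}}},
    {"$sort": {"errors": -1}},
    {"$limit": 7},
]

for row in events.aggregate(pipeline):
    print(row["_id"], row["errors"])

The point is that a single aggregation pipeline answers a focused question directly, which is the “grab something real quick” scenario the quote describes.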

Okay, good. One question crossed my mind: will SonarW’s approach provide some cost and performance capabilities that offer options to XML folks thinking JSON thoughts?

I think SonarW warrants watching.

Stephen E Arnold, June 12, 2015

Search Companies: Innovative or Not?

June 11, 2015

Forbes’ article “The 50 Most Innovative Companies Of 2014: Strong Innovators Are Three Times More Likely To Rely on Big Data Analytics” points out how strongly innovation is tied to big data analytics and data mining these days.  The Boston Consulting Group (BCG) studies the methodology of innovation.  The numbers are astounding when companies that use big data are placed against those that still have not figured out how to use their data: 57% of strong innovators rely on big data analytics versus 19% of weak innovators.

Innovation, however, is not entirely defined by big data.  Most of the companies that rely on big data as key to their innovation are software companies.  According to Forbes’ study, 53% see big data as having a huge impact in the future, while BCG found only 41% who saw big data as vital to their innovation.

Big data cannot be and should not be ignored.  Forbes and BCG found that big data analytics are useful and can have huge payoffs:

“BCG also found that big-data leaders generate 12% higher revenues than those who do not experiment and attempt to gain value from big data analytics.  Companies adopting big data analytics are twice as likely as their peers (81% versus 41%) to credit big data for making them more innovative.”

Measuring innovation proves to be subjective, but one cannot deny the positive effect big data analytics and data mining can have on a company.  You have to realize, though, that big data results are useless without a plan to implement and use the data.  Also take note that none of the major search vendors are considered “innovative,” even though a huge part of big data involves searching for results.

Whitney Grace, June 11, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Online Shopping Is Too Hard

June 10, 2015

Online shopping is supposed to drive physical stores out of business, but that might not be the case if online shopping is too difficult.  The Ragtrader article “Why They Abandon” explains that 45 percent of Australian consumers will not make an online purchase if they experience Web site difficulties.  The consumers, instead, return to physical stores to make the purchase.  The article mentions that 44 percent believe that traditional shopping is quicker if they know what to look for and 43 percent prefer in-store service.

The research comes from a Rackspace survey to determine shopping habits in New Zealand and Australia.  The survey also asked participants what other problems they experienced shopping online:

“42 percent said that there were too many pop-up advertisements, 34 percent said that online service is not the same as in-store and 28 percent said it was too time consuming to narrow down options available.”

These are understandable issues.  People don’t want to be hounded to purchase other products when they have a specific item in mind, and thousands of options are overwhelming to search through.  A digital wall is also daunting for people who prefer interpersonal relationships when they shop.  The survey may pinpoint online shopping weaknesses, but it also helps online stores determine the best ways to improve.

“ ‘This survey shows that not enough retailers are leveraging powerful and available site search and navigation solutions that give consumers a rewarding shopping experience.’ ”

People shop online for convenience, variety, lower prices, and deals.  Search is vital for consumers to narrow down their needs, but if they can’t navigate a Web site then search proves as useless as an expired coupon.

Whitney Grace, June 10, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Free Version of InetSoft Style Scope Agile Edition Available

June 10, 2015

The article titled “InetSoft Launches Style Scope Agile Edition for Dashboarding and Visual Analytics” on PRWeb tells of a free version of InetSoft’s application for visual analytics. Business users gain access to an interactive dashboard with easy-to-use drag-and-drop controls. The article offers more details about the launch:

“Advanced visualization types ideal for multi-dimensional charting and point-and-click controls like selection lists and ranger sliders give greater abilities for data exploration and performance monitoring than a simple spreadsheet offers. Any dashboard or analysis can be privately shared with others using just a browser or a mobile device, setting the application apart from other free BI tools… Setting up the software will be straightforward for anyone with power spreadsheet skills or basic knowledge of their database.”

Drawbacks to the free version are mentioned, such as being limited to two concurrent users. Of course, the free version is meant to “showcase” the company’s technology, according to CMO Mark Flaherty. There is a demo available to check out the features of the free application. InetSoft has been working since 1996 to bring users intuitive solutions to business problems. This free version is specifically targeted at smaller businesses that might be unable to afford the full application.

Chelsea Kerwin, June 10, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

IBM Elevates Tape Storage to the Cloud

June 9, 2015

Did you think we left latency and bad blocks behind with tape storage? Get ready to revisit them, because “IBM Cloud Will Reach Back to Tape for Low-Cost Storage,” according to ComputerWorld. We noticed tape storage was back on the horizon earlier this year, and now IBM has made it official at its recent Edge conference in Las Vegas. There, the company was slated to present a cloud-archiving architecture that relies on different storage mediums, including tape, depending on an organization’s needs. Reporter Stephen Lawson writes:

“Enterprises are accumulating growing volumes of data, including new types such as surveillance video that may never be used on a regular basis but need to be stored for a long time. At the same time, new big-data analytics tools are making old and little-used data useful for gleaning new insights into business and government. IBM is going after customers in health care, social media, oil and gas, government and other sectors that want to get to all of their data no matter where it’s stored. IBM’s system, which it calls Project Big Storage, puts all tiers of storage under one namespace, creating a single pool of data that users can manage through folders and directories without worrying about where it’s stored. It incorporates both file and object storage.”

A single pool of data is good. The inclusion of tape storage in this mix is reportedly part of an attempt to undercut IBM’s cloudy competitors, including AWS and Google Cloud. Naturally, the service can be implemented onsite, as a cloud service, or as a hybrid. IBM hopes Big Storage will make cloud pricing more predictable, though complexity there seems inevitable. Tape storage is slower to deliver data, but according to the plan only “rarely needed” data will be stored there, courtesy of IBM’s own Spectrum Scale distributed storage software. Wisely, IBM is relying on the tape-handling experts at Iron Mountain to run the tape-based portion of the Big Storage Project.

Cynthia Murrell, June 9, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

NSA Blanket Data Collection Preventing Accurate Surveillance

June 4, 2015

The article on ZDNet titled “NSA Is So Overwhelmed with Data, It’s No Longer Effective, Says Whistleblower” examines the concept of “bulk data failure” at the NSA and other agencies. William Binney, a whistleblower who left the NSA more than a decade ago, says that the sheer amount of data the NSA collects leads to oversights and ineffective surveillance. The article states:

“Binney said he estimated that a “maximum” of 72 companies were participating in the bulk records collection program — including Verizon, but said it was a drop in the ocean. He also called PRISM, the clandestine surveillance program that grabs data from nine named Silicon Valley giants, including Apple, Google, Facebook, and Microsoft, just a “minor part” of the data collection process. “The Upstream program is where the vast bulk of the information was being collected,” said Binney.”

It appears that big data presents challenges even when storage, servers, and money are available. Binney blames the data overload for intelligence failures preceding the Boston bombing and the Paris shooting. He believes the NSA had the information needed to prevent the attacks but could not see the trees for the forest. Smart data collection, rather than mass data collection, is his suggested fix for the information overload.

Chelsea Kerwin, June 4, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
