Unemployed in Search or Content Processing? Go for Data Science

July 27, 2015

I read an amazing write up. The title of this gem of high school counseling is “7 Skills/Attitudes to Become a Better Data Scientist.” What does one need to be a better data scientist? Better Python or R programming methods? Sharper mathematical intuition? The ability to work out the least upper bound (sup) and greatest lower bound (inf) of a set of real numbers in your head, without paper and without that Mathematica software? Wrong.

What you need is intellectual curiosity, an understanding of business, the ability to communicate (none of the Cool Hand Luke pithiness), knowledge of more than one programming language, knowledge of SQL, participation in competitions, and a habit of reading articles like “7 Skills and Attitudes.”

Yep, follow these tips and you too can be a really capable data scientist. Why wait? Act now. Read the “7 Skills” article. Nah, don’t worry about such silly notions as data integrity or statistical procedures. Talk to someone, anyone and you will be 14.28 percent of the way to your goal.

Stephen E Arnold, July 27, 2015

Data Companies Poised to Leverage Open Data

July 27, 2015

Support for open data, government datasets freely available to the public, has taken off in recent years; the federal government’s launch of Data.gov in 2009 is a prominent example. Naturally, some companies have sprung up to monetize this valuable resource. The New York Times reports, “Data Mining Start-Up Enigma to Expand Commercial Business.”

The article leads with a pro bono example of Enigma’s work: a project in New Orleans that uses that city’s open data to identify households most at risk for fire, so the city can give those folks free smoke detectors. The project illustrates the potential for good lurking in sets of open data. But make no mistake, the potential for profits is big, too.  Reporter Steve Lohr explains:

“This new breed of open data companies represents the next step, pushing the applications into the commercial mainstream. Already, Enigma is working on projects with a handful of large corporations for analyzing business risks and fine-tuning supply chains — business that Enigma says generates millions of dollars in revenue.

“The four-year-old company has built up gradually, gathering and preparing thousands of government data sets to be searched, sifted and deployed in software applications. But Enigma is embarking on a sizable expansion, planning to nearly double its staff to 60 people by the end of the year. The growth will be fueled by a $28.2 million round of venture funding….

“The expansion will be mainly to pursue corporate business. Drew Conway, co-founder of DataKind, an organization that puts together volunteer teams of data scientists for humanitarian purposes, called Enigma ‘a first version of the potential commercialization of public data.’”

Other companies are getting into the game, too, leveraging open data in different ways. There’s Reonomy, which supplies research to the commercial real estate market. Seattle-based Socrata makes data-driven applications for government agencies. Information discovery company Dataminr uses open data in addition to Twitter’s stream to inform its clients’ decisions. Not surprisingly, Google is a contender with its Sidewalk Labs, which plumbs open data to improve city living through technology. Lohr insists, though, that Enigma is unique in the comprehensiveness of its data services. See the article for more on this innovative company.

 

Cynthia Murrell, July 27, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Big Data: Slow Down, Think

July 25, 2015

I read “Contradictions of Big Data.” Few of the articles I see take a common-sense approach to Big Data baloney. (Azure chip consultants bristle at my use of baloney. Too bad.) I liked this article.

The article appeared in my Overflight a day ago even though the write up was posted in March 2015. Big Data does not mean rapid data.

I highlighted this passage:

[I] have been waging an uphill battle against the nonsensical and unsubstantiated idea that more data is better data, but now this view is getting some additional support, and from some surprising corners.

I do not agree. The yap about Big Data has almost overpowered the craziness of search engine optimization’s shouting about semantic search.

The write up points out:

Take it from me [Martyn Jones], most businesses will not be basing their business strategies on the analysis of a glut of selfies, home videos of cute kittens, or the complete works of William Shakespeare or Dan Brown. Almost all business analysis will continue to be carried out on structured data obtained primarily from internal operational systems and external structured data providers.

The write up points out the silliness of velocity and several other slices of marketing baloney. (Make a sandwich, please.)

I found this paragraph insightful:

I have seen data scientists at work, and the word science doesn’t actually jump out and grab you. It’s difficult to make the connection, just as it is to accurately connect some popular science magazines with fundamental scientific research. If a professional and qualified statistician wants to label themselves a data scientist then I have no issue with that, it’s their problem, but I am not willing to lend credibility to the term ‘data scientist’ when it is merely an interesting job title, with at most a tenuous connection to the actual role, and one that is liberally applied, with the almost customary largesse of IT, to creative code hackers and business-averse dabblers in data.

Harsh words for those who combine an undergraduate degree minor in math with Twitter and come up with data scientist.

Hopefully others will pick up this practical approach to the sliced and processed meat wrapped in plastic and branded Big Data.

Stephen E Arnold, July 25, 2015

Plethora of Image Information

July 24, 2015

Humans are visual creatures, and they learn and absorb information better when pictures accompany it. In recent years, the graphic novel medium has gained popularity among all demographics. The amount of information a picture can communicate is astounding, but that information can be hard to find unless someone goes looking for it. It also cannot be searched by a search engine… or can it? Synaptica is in the process of developing “OASIS Deep Image Indexing Using Linked Data.”

OASIS is an acronym for Open Annotation Semantic Imaging System, an application that unlocks image content by letting users examine an image more closely than before and highlight data points. OASIS is a linked data application that enables parts of an image to be identified as linked data URIs, which can then be semantically indexed to controlled vocabulary lists. It builds an interactive map of an image with its features and conceptual ideas.

“With OASIS you will be able to pan-and-zoom effortlessly through high definition images and see points of interest highlight dynamically in response to your interaction. Points of interest will be presented along with contextual links to associated images, concepts, documents and external Linked Data resources. Faceted discovery tools allow users to search and browse annotations and concepts and click through to view related images or specific features within an image. OASIS enhances the ability to communicate information with impactful visual + audio + textual complements.”

OASIS is advertised as a discovery and interactive tool that gives users the chance to fully engage with an image.  It can be applied to any field or industry, which might mean the difference between success and failure.  People want to fully immerse themselves in their data or images these days.  Being able to do so on a much richer scale is the future.

Whitney Grace, July 24, 2015

What We Know About SharePoint 2016

July 23, 2015

Everyone is vying for a first look at the upcoming SharePoint 2016 release. Those details are only now starting to roll in, so little was known until recently. The first true reveal came from Bill Baer at this spring’s Microsoft Ignite event. CIO distills Baer’s presentation in its article, “SharePoint 2016: What Do We Know?”

The article says:

“The session on SharePoint 2016 was presented by Bill Baer, the head of SharePoint at Microsoft. This was the public’s first opportunity to learn what exactly would be in this version of the product, what sorts of changes and improvements have been made, and other things to expect as we look toward the product’s release and general availability in the first quarter of next year. Here’s what we know after streaming Baer’s full presentation.”

The article goes on to discuss cloud integration, migration, upgrades, and what all of this may point to for the future of SharePoint. In order to stay up to date on the latest news, stay tuned to ArnoldIT.com, in particular the dedicated SharePoint feed. Stephen E. Arnold has made a career out of all things search, and his work on SharePoint gives interested parties a lot of information at a glance.

Emily Rae Aldridge, July 23, 2015

Disable Annoying Windows Web Search

July 23, 2015

In another attempt to imitate Apple, Microsoft allows users to search not only their computer’s hard drive but also the Web at the same time. This is a direct copy of Apple OS’s Spotlight Search, but unlike Apple’s version, Windows’s expanded search is annoying. Windows users can disable this supposedly “helpful” feature, and GHacks has the directions: “How To Disable Web Search In Windows 10’s Start Menu.”

Apple’s Spotlight Search does pretty much the same thing, but it sorts results into organized categories and does not search the entire Web, only Wikipedia, iTunes, and preselected search engines. Microsoft has a tendency to go overboard, and that usually equals slow response time. The article’s author gives several reasons for avoiding the Windows 10 Web search:

“I will never use the search for a couple of reasons. First, I don’t need it there as I want local files and settings to be returned exclusively when I run a search on Windows 10. Second, the suggestions are too generic most of the time and third, since a browser is open all the time on my system, I can run a search using it as well without having to add another step to the process.”

The good news is that the Web search feature can be disabled, but the option is not available to all users. Does that surprise you? Microsoft has a tendency to release operating systems without fixing all the bugs. Windows 10 appears to be better than prior releases, but little bugs like this make it annoying.

Whitney Grace, July 23, 2015

A Technical Shift in Banking Security

July 23, 2015

Banks may soon transition from asking for your mother’s maiden name to tracking your physical behavior in the name of keeping you (and their assets) safe. IT ProPortal examines “Fraud Prevention: Knowledge-Based Analytics in Steep Decline.” Writer Lara Lackie cites a recent report from the Aite Group that indicates a shift from knowledge-based analytics to behavioral analytics for virtual security checkpoints. Apparently, “behavioral analytics” is basically biometrics without the legal implications. Lackie writes:

“Examples of behavioural analytics/biometrics can include the way someone types, holds their device or otherwise interacts with it. When combined, continuous behavioural analysis, and compiled behavioural biometric data, deliver far more intelligence than traditionally available without interrupting the user’s experience….

“Julie Conroy, research director, Aite Group, said in the report, ‘When the biometric is paired with strong device authentication, it is even more difficult to defeat. Many biometric solutions also include liveliness checks, to ensure it’s a human being on the other end.’

“NuData Security’s NuDetect online fraud engine, which uses continuous behavioural analysis and compiled behavioral biometric data, is able to predict fraud as early as 15 days before a fraud attempt is made. The early detection offered by NuDetect provides organisations the time to monitor, understand and prevent fraudulent transactions from taking place.”

The Aite report shows over half the banks surveyed plan to move away from traditional security questions over the next year, and six of the 19 institutions plan to enable mobile-banking biometrics by the end of this year. Proponents of the approach laud behavioral analytics as the height of fraud detection. Are Swype patterns and indicators of “liveliness” covered by privacy rights? That seems like a philosophical question to me.

Cynthia Murrell, July 23, 2015

IBM SAP Versus SAS: A Faux Dust Up

July 22, 2015

Ah, the freebie statistics are like gnats. One or two make no difference when one is eating a chicken leg. Toss in 20,000 or more and the leg eating becomes a chore.

I read an oblique write up called “SAS UK Chief: Envious Rivals, Skills Gap and Analytics in the Cloud.” The topics are interesting because they are mixed together, a fruit salad to go with that picnic chicken.

The write up begins with a statement attributed to an IBM SAP executive along the lines: “SAS could be entirely replaced.” That seems a bit of fortune telling which might not be entirely in line with some SAS users’ plans. IBM, as you may know, is fresh from 13 straight quarters of revenue decline. I interpreted the feisty comment as a signal to IBM management that the much loved SAP division is replete with machismo and doing its bit to increase revenues. There’s nothing like a statistics squabble to pump up the sales spice.

As I understand the write up, that allegedly “put ‘em up, chump” statement caused an SAS executive to flounder. SAS’s problem is that it is still a little chunk of graduate school. SAS faces competition from upstarts like Talend. SAP, on the other hand, is chasing consulting and giant IBM cloud-type things. But the two outfits are old school operations. For proof just ask a graduate student in statistics.

The reality is that both SAP and SAS may be victims of the same market shifts. In order to get either company’s products to deliver a perfect grilled chicken, one has to know about statistics and have resources (money, gentle reader).

Big companies are okay with these requirements. But the buzz in the analytics world is for open source, point and click, ready to run solutions. The outputs of these next generation systems may not meet the standards of the SAPs and the SASs of the world, but the customers don’t care.

These two firms are facing many gnats. Neither is going to have a pleasant meal. The good old days of sunshine, blue skies, and a bug free experience are gone.

Stephen E Arnold, July 22, 2015

Neural Networks and Thought Commands

July 22, 2015

If you’ve been waiting for the day you can operate a computer by thinking at it, check out “When Machine Learning Meets the Mind: BBC and Google Get Brainy” at the Inquirer. Reporter Chris Merriman brings our attention to two projects, one about hardware and one about AI, that stand at the intersection of human thought and machine. Neither venture is anywhere near fruition, but a peek at their progress gives us clues about the future.

The internet-streaming platform iPlayer is a service the BBC provides to U.K. residents who wish to catch up on their favorite programmes. In pursuit of improved accessibility, the organization’s researchers are working on a device that allows users to operate the service with their thoughts. The article tells us:

“The electroencephalography wearable that powers the technology requires lucidity of thought, but is surprisingly light. It has a sensor on the forehead, and another in the ear. You can set the headset to respond to intense concentration or meditation as the ‘fire’ button when the cursor is over the option you want.”

Apparently this operation is easier for some subjects than for others, but all users were able to work the device to some degree. Creepy or cool? Perhaps it’s both, but there’s no escaping this technology now.

As for Google’s undertaking, we’ve examined this approach before: the development of artificial neural networks. This is some exciting work for those interested in AI. Merriman writes:

“Meanwhile, a team of Google researchers has been looking more closely at artificial neural networks. In other words, false brains. The team has been training systems to classify images and better recognise speech by bombarding them with input and then adjusting the parameters to get the result they want.

“But once equipped with the information, the networks can be flipped the other way and create an impressive interpretation of objects based on learned parameters, such as ‘a screw has twisty bits’ or ‘a fly has six legs’.”
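For readers who want to see what “bombarding them with input and then adjusting the parameters” can look like, here is a minimal sketch. It is not Google’s system. It assumes a single artificial neuron, a toy data set (logical OR), and plain gradient descent; the function names, learning rate, and epoch count are all invented for illustration.

```python
import math
import random


def train_neuron(samples, epochs=2000, learning_rate=0.5, seed=0):
    """Fit one sigmoid neuron to (inputs, label) pairs by gradient descent.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            # Forward pass: weighted sum squashed into the range (0, 1).
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1.0 / (1.0 + math.exp(-z))
            # Nudge each parameter against the error it contributed to.
            error = output - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Return 1 if the neuron's weighted sum is positive, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


# Toy training set (an assumption): the logical OR of two inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_neuron(OR_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in OR_SAMPLES]
print(predictions)
```

Real image and speech networks stack many layers of such units and train on far more data, but the loop is the same idea: show an input, compare the output with the wanted result, adjust.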

This brain-in-progress still draws some chuckle-worthy and/or disturbing conclusions from images, but it is learning. No one knows what the end result of Google’s neural network research will be, but it’s sure to be significant. On a related note, the article points out that IBM is donating its machine learning platform to Apache Spark. Who knows where the open-source community will take it from here?

Cynthia Murrell, July 22, 2015

Big Data Basics: Garbage In, Garbage Out Still a Problem

July 20, 2015

The person writing “Data Integrity: A Sequence of Words Lost in the World of Big Data” appears to be older than 18. I don’t hear too many young wizards nattering about data integrity. The operative concept is that with enough data, the data work out the bumps in the Big Data tapestry. The cloth may have leaves and twigs in it. But when you make the woven object big enough and hang it on a wall in a poorly illuminated chateau, who can tell? Few visitors demand a ladder and a lanthorn to inspect the handiwork.

According to the write up:

The purpose of this post is to highlight the necessity to keep data clean and orderly so that the results of the analysis are reliable and trustworthy – if data integrity is intact, information derived from this data will be trustworthy resulting in actionable information.

Why tackle this topic in a blog for Big Data professionals?

Answer: No one pays much attention. The author saddles up and does the Don Quixote gallop at the Big Data hyperbole windmill.

The article includes a partial list of questions to ask and, keep this in mind, gentle reader, to answer. One example: “Are values outside of acceptable domain values?”
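The domain question is one of the easier ones to answer in code. What follows is a minimal sketch, not anything taken from the article: the field names, the allowed ranges, and the sample records are all made up for illustration, and a real check would pull its rules from whatever the business defines as acceptable.

```python
def find_domain_violations(records, allowed_domains):
    """Return (row_index, field, value) for each value outside its domain.

    Hypothetical helper: `allowed_domains` maps a field name to a function
    that returns True when a value is acceptable.
    """
    violations = []
    for row_index, record in enumerate(records):
        for field, is_allowed in allowed_domains.items():
            value = record.get(field)
            if not is_allowed(value):
                violations.append((row_index, field, value))
    return violations


# Assumed domain rules, invented for this example.
ALLOWED_DOMAINS = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "state": lambda v: v in {"KY", "NY", "WA"},
}

# Assumed sample data: the second and third rows each break one rule.
SAMPLE_RECORDS = [
    {"age": 34, "state": "KY"},
    {"age": -5, "state": "NY"},
    {"age": 51, "state": "ZZ"},
]

violations = find_domain_violations(SAMPLE_RECORDS, ALLOWED_DOMAINS)
print(violations)  # [(1, 'age', -5), (2, 'state', 'ZZ')]
```

Finding the bad values is the cheap part. Deciding whether to fix, drop, or flag them is where the integrity work the author describes actually happens.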

I found this article refreshing. Take a gander.

Stephen E Arnold, July 20, 2015
