Google Clamps Down on Surprise Costs in BigQuery
December 23, 2015
The article titled Google Promises to Rein in Runaway Query Costs on Fortune discusses the obstacles facing Google’s BigQuery data tool. Google hopes to make BigQuery a major resource for big companies considering cloud technology, but unpredictable costs are getting in the way of the “low-cost big data analytics option” marketing that Google has deployed. Hence the introduction of “custom quota” and Query Explain:
“Google is now offering potential inquisitors a way to set a “custom quota” to ensure that the number crunching on a specified project does not exceed a pre-set daily limit. In addition, a Query Explain feature promises to lay out, how BigQuery will go about processing the question on the table in advance. That way, in theory, you can see if your questions will be “write, read, or compute heavy” and better anticipate where performance bottlenecks could lurk…”
One might fairly ask why there was any delay in these services, since customers are not known for their fondness for mobile-phone-style billing surprises. Amazon is also standing next to Google, waving Redshift, a BigQuery competitor, in the air. But the simpler pricing and efficiency of BigQuery might be more appealing to many companies, especially with the more controlled processes now available.
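For readers who want to picture what a daily cap does, here is a minimal sketch in plain Python. It does not call BigQuery; the class `DailyQuota`, the exception `QuotaExceeded`, and the byte figures are hypothetical names and numbers chosen for illustration, and the real custom quota is configured on Google's side rather than in client code.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


class QuotaExceeded(Exception):
    """Raised when a query would push usage past the daily cap."""


@dataclass
class DailyQuota:
    # Hypothetical cap on bytes processed per day (not a BigQuery setting name).
    limit_bytes: int
    used_bytes: int = 0
    day: date = field(default_factory=date.today)

    def charge(self, estimated_bytes: int, today: Optional[date] = None) -> int:
        """Record a query's estimated bytes and return the remaining allowance."""
        today = today or date.today()
        if today != self.day:
            # A new day starts with a fresh allowance.
            self.day = today
            self.used_bytes = 0
        if self.used_bytes + estimated_bytes > self.limit_bytes:
            raise QuotaExceeded(
                f"{estimated_bytes} bytes would exceed the daily cap of {self.limit_bytes}"
            )
        self.used_bytes += estimated_bytes
        return self.limit_bytes - self.used_bytes
```

The idea is simply that each query is checked against the remaining allowance before it runs, so a runaway query is refused instead of billed.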
Chelsea Kerwin, December 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Microsoft Drops Bing from Pulse, Adds Azure Media Services
December 22, 2015
The article on VentureBeat titled Microsoft Rebrands Bing Pulse to Microsoft Pulse, Extends Snapshot API raises the question: is Bing a dead-end brand? The article states that the rebranding is meant to emphasize that the resource integrates with MS technologies like Power BI, OneNote, and Azure Media Services. It has only been about a year since the original self-service tool was released for broadcast TV and media companies. The article states:
“The launch comes a year after Bing Pulse hit version 2.0 with the introduction of a cloud-based self-service option. Microsoft is today showing a few improvements to the tool, including a greatly enhanced Snapshot application programming interface (API) that allows developers to pull data from Microsoft Pulse into Microsoft’s own Power BI tool or other business intelligence software. Previously it was only possible to use the API with broadcast-specific technologies.”
The news isn’t good for Bing, with Pulse gaining popularity as a crowdsourcing resource among such organizations as CNN, CNBC, the Aspen Institute, and the Clinton Global Initiative. It is meant to be versatile and targeted for broadcast, events, market research, and classroom use. Dropping Bing from the name may indicate that Pulse is moving forward, and leaving Bing in the dust.
Chelsea Kerwin, December 22, 2015
Score One for Yandex
December 21, 2015
Russian search powerhouse Yandex has successfully sued Google, we learn from re/code’s article, “Meet the Russian Company that Got Its Antitrust Watchdog to Bite Google.” Reporter Mark Bergen interviewed Yandex’s Roman Krupenin, who has led this legal campaign. In his intro, Bergen relates:
“In October, Russia’s antitrust authority ruled that Google’s practice of bundling its services on Android handsets violated national law. The case’s lead complainant was Yandex, an 18-year old Web search and advertising company. It’s not a global name, but is big in Russia. Last quarter, Yandex raked in $233.1 million in revenue. (For context, Google averaged about $179 million in sales a day over the same period.) Most Russians use Yandex for Internet searches — an estimated 57 percent in the last quarter, though that share has slipped in recent years. The culprit? According to Yandex, it’s the favored position of Google’s apps, including its search one and its browser, on Android smartphones, which outnumber iPhones in Russia considerably. To fight it off, Yandex has pushed to cut handset agreements of its own: It finalized one with Lenovo last year, and paired with Microsoft last month to make Yandex’s homepage and search results the Russian default for Windows 10.”
Furthermore, we’re reminded, Yandex is also taking part in the EU’s latest antitrust investigation. Naturally, Google is appealing the decision. See the article for text of the interview, where Krupenin discusses the focus on Android over Search, the unique factors that made for victory over the notoriously slippery company, and the call for an end to Google’s service-bundling practices.
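To put the two revenue figures in the quote on the same footing, here is a small back-of-the-envelope calculation. The 92-day quarter is an assumption (the length of a July through September quarter); the dollar amounts are the ones quoted by re/code.

```python
# Figures quoted in the re/code excerpt (US dollars).
yandex_quarter_revenue = 233.1e6
google_daily_sales = 179e6

# Assumption: a 92-day quarter, for example July through September.
days_in_quarter = 92

# Yandex's revenue expressed per day, to match Google's per-day figure.
yandex_daily_revenue = yandex_quarter_revenue / days_in_quarter

# How many times larger Google's daily sales are than Yandex's.
ratio = google_daily_sales / yandex_daily_revenue

# How many days of Google sales equal Yandex's whole quarter.
days_for_google_to_match = yandex_quarter_revenue / google_daily_sales

print(f"Yandex per day: ${yandex_daily_revenue / 1e6:.2f} million")
print(f"Google-to-Yandex daily ratio: about {ratio:.0f}x")
print(f"Google days to match Yandex's quarter: {days_for_google_to_match:.2f}")
```

Under that assumption, Google takes in roughly as much in a day and a third as Yandex does in a quarter.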
Cynthia Murrell, December 21, 2015
The Modern Law Firm and Data
December 16, 2015
We thought it was a problem if law enforcement officials did not know how the Internet and Dark Web worked, or what eDiscovery tools can do. But a law firm that does not know how to work with data-mining tools, much less understand the importance of technology, is losing credibility, profit, and evidence for cases. According to Information Week in “Data, Lawyers, And IT: How They’re Connected,” the modern law firm needs to be aware of how eDiscovery tools, predictive coding, and data science work and how they can benefit its cases.
It can be daunting trying to understand how new technology works, especially in a law firm. The article explains how the above tools and more work in four key segments: what role data plays before trial, how it is changing the courtroom, how new tools pave the way for unprecedented approaches to law practice, and how data is improving how law firms operate.
Data in pretrial amounts to one word: evidence. People live their lives via their computers and create a digital trail without realizing it. With a few eDiscovery tools, lawyers can assemble all necessary information within hours. Data tools in the courtroom make practicing law seem like a scenario out of a fantasy or science fiction novel. Lawyers are able to immediately pull up information to use as evidence for cross-examination or to validate facts. New eDiscovery tools are also good to use because they allow lawyers to prepare their arguments based on the judge and jury pool. More data is available on individual cases rather than just big-name ones.
“The legal industry has historically been a technology laggard, but it is evolving rapidly to meet the requirements of a data-intensive world.
‘Years ago, document review was done by hand. Metadata didn’t exist. You didn’t know when a document was created, who authored it, or who changed it. eDiscovery and computers have made dealing with massive amounts of data easier,’ said Robb Helt, director of trial technology at Suann Ingle Associates.”
Legal eDiscovery is one of the main branches of big data, and it has skyrocketed in the past decade. While the examples discussed here are employed by respected law firms, keep in mind that eDiscovery technology is still new. Ambulance chasers and other law firms probably do not have a full IT squad on staff, so when vetting lawyers, ask about their eDiscovery capabilities.
Whitney Grace, December 16, 2015
Google Timeline Knows Where You Have Been
December 16, 2015
We understand that to get the most out of the Internet, we sacrifice a bit of privacy; but do we all understand how far-reaching that sacrifice can be? The Intercept reveals “How Law Enforcement Can Use Google Timeline to Track Your Every Move.” For those who were not aware, Google helpfully stores all the places you (or your devices) have traveled, down to longitude and latitude, in Timeline. Now, with an expansion launched in July 2015, that information goes back years, instead of just six months. Android users must actively turn this feature off to avoid being tracked.
The article cites a report titled “Google Timelines: Location Investigations Involving Android Devices.” Written by a law-enforcement trainer, the report is a tool for investigators. To be fair, the document does give a brief nod to privacy concerns; at the same time, it calls it “unfortunate” that Google allows users to easily delete entries in their Timelines. Reporter Jana Winter writes:
“The 15-page document includes what information its author, an expert in mobile phone investigations, found being stored in his own Timeline: historic location data — extremely specific data — dating back to 2009, the first year he owned a phone with an Android operating system. Those six years of data, he writes, show the kind of information that law enforcement investigators can now obtain from Google….
“The ability of law enforcement to obtain data stored with privacy companies is similar — whether it’s in Dropbox or iCloud. What’s different about Google Timeline, however, is that it potentially allows law enforcement to access a treasure trove of data about someone’s individual movement over the course of years.”
For its part, Google admits it “responds to valid legal requests,” but insists the bar is high; a simple subpoena has never been enough, the company says. That is some comfort, I suppose.
Cynthia Murrell, December 16, 2015
Big Data Gets Emotional
December 15, 2015
Christmas is the biggest shopping time of the year, and retailers spend months studying consumer data. They want to understand consumer buying habits, popular trends in clothing, toys, and other products, physical versus online retail, and especially what the competition will be doing sales-wise to entice more customers to buy more. Smart Data Collective recently wrote about the science of shopping in “Using Big Data To Track And Measure Emotion.”
Customer experience professionals study three things related to customer spending habits: ease, effectiveness, and emotion. Emotion is the biggest of the three and the strongest spur to customer loyalty. If data specialists could figure out the perfect way to measure emotion, shopping as we know it would change.
“While it is impossible to ask customers how do they feel at every stage of their journey, there is a largely untapped source of data that can provide a hefty chunk of that information. Every day, enterprise servers store thousands of minutes of phone calls, during which customers are voicing their opinions, wishes and complaints about the brand, product or service, and sharing their feelings in their purest form.”
The article describes some of the ways emotional data is gathered: phone recordings, surveys, and, biggest of all, vocal speech layers. Analytic platforms measure vocal speech layers and the relationships between words and phrases to understand the sentiment. The emotions are rated on a five-point scale, from positive to negative, to discover patterns that trigger reactions.
Customer experience input is a data analyst’s dream as well as nightmare, given all of the data constantly coming in.
Whitney Grace, December 15, 2015
Easy as 1, 2, 3: Common Mistakes Made with Data Lakes
December 15, 2015
The article titled Avoiding Three Common Pitfalls of Data Lakes on DataInformed explores several pitfalls that could negate the advantages of data lakes. The article begins with the perks, such as easier data access and, of course, the cost-effectiveness of keeping data in a single hub. The first pitfall is sustainability (or the lack thereof), since the article emphasizes that data lakes actually require much more planning and management of data than conventional databases. The second pitfall raised is resource allocation:
“Another common pitfall of implementing data lakes arises when organizations need data scientists, who are notoriously scarce, to generate value from these hubs. Because data lakes store data in their native format, it is common for data scientists to spend as much as 80 percent of their time on basic data preparation. Consequently, many of the enterprise’s most valued resources are dedicated to mundane, time-consuming processes that considerably lengthen time to action on potentially time-sensitive big data.”
The third pitfall is technology contradictions, or trying to use traditional approaches on a data lake that holds both big and unstructured data. Be not alarmed, however; the article goes into great detail about how to avoid these issues through data lake development with smart data technologies such as semantic tech.
Chelsea Kerwin, December 15, 2015
Bill Legislation Is More Complicated than Sitting on Capitol Hill
December 14, 2015
When I was in civics class back in the day and learning about how a bill became an official law in the United States, my teacher played Schoolhouse Rock’s famous “I’m Just a Bill” song. While that annoying retro earworm still makes the education rounds, the lyrics need to be updated to record some of the new digital “paperwork” that goes into tracking a bill. Engaging Cities focuses on legislation data in “When Lobbyists Write Legislation, This Data Mining Tool Traces The Paper Trail.”
While the process of making a bill might seem simple according to Schoolhouse Rock, it is actually complicated, and it gets even crazier as technology pushes more bills through the legislative process. In 2014, there were 70,000 state bills introduced across the country, and no one has the time to read all of them. Technology can do a much better and faster job.
“A prototype tool, presented in September at Bloomberg’s Data for Good Exchange 2015 conference, mines the Sunlight Foundation’s database of more than 500,000 bills and 200,000 resolutions for the 50 states from 2007 to 2015. It also compares them to 1,500 pieces of “model legislation” written by a few lobbying groups that made their work available, such as the conservative group ALEC (American Legislative Exchange Council) and the liberal group the State Innovation Exchange (formerly called ALICE).”
A data-mining tool for government legislation would increase government transparency. The software follows earmarks in the bills to show how Congressmen are benefiting their states with these projects. The software analyzed earmarks as far back as 1995, and it showed that there are more than anyone knew. The goal of the project is to scour the data that the US government makes available and help people interpret it, while also encouraging them to be active within the laws of the land.
The article uses the metaphor “needle in a haystack” to describe all of the government data. Government transparency is good, but overloading people with information leaves them overwhelmed.
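The quoted passage does not say how the prototype compares bills with model legislation, so the following is only a generic sketch of one common way to spot borrowed text: break each document into overlapping three-word phrases and measure how many phrases two documents share (Jaccard similarity). The function names and the sample texts are hypothetical.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (as tuples) in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of n-grams the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        # Texts shorter than one n-gram cannot be compared this way.
        return 0.0
    return len(sa & sb) / len(sa | sb)


def closest_model(bill: str, models: dict) -> tuple:
    """Return (name, score) for the model text that overlaps most with the bill."""
    scored = ((name, jaccard(bill, text)) for name, text in models.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical sample texts, invented for illustration only.
models = {
    "model_a": "the state shall not regulate the sale of widgets",
    "model_b": "every school shall provide free lunch to all students",
}
bill = "The state shall not regulate the sale of widgets or gadgets"
print(closest_model(bill, models))
```

A high score flags a bill for a human to read more closely; it does not by itself prove that a lobbying group wrote it.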
Whitney Grace, December 14, 2015
Censys Search Engine Used to Blow the Lid off Security Screw-Ups at Dell, Cisco
December 14, 2015
The article on Technology Review intriguingly titled A Search Engine for the Internet’s Dirty Secrets discusses the search engine Censys, which targets security flaws in devices hooked up to the Internet. It has already caused some major waves while being used by SEC Consult to uncover lazy device encryption methods among high-profile manufacturers such as Cisco and General Electric. The article also provides this revealing anecdote about Censys being used by Duo Security to investigate Dell:
“Dell had to apologize and rush out remediation tools after Duo showed that the company was putting rogue security certificates on its computers that could be used to remotely eavesdrop on a person’s encrypted Web traffic, for example to intercept passwords. Duo used Censys to find that a Kentucky water plant’s control system was affected, and the Department of Homeland Security stepped in.”
To harvest data for search, Censys uses software called ZMap, which was developed by Zakir Durumeric, who also directs the open-source project at the University of Michigan. The article also goes into detail on Censys’s main rival, Shodan. The two use different software, and Shodan is a commercial search engine while Censys is free to use. Additionally, the almighty Google has thrown its weight behind Censys by providing infrastructure.
Chelsea Kerwin, December 14, 2015
Search Data from Bing for 2015 Yields Few Surprises
December 11, 2015
The article on Search Engine Watch titled Bing Reveals the Top US and UK Searches of 2015 covers the extremely intellectual categories of Celebs, News, Sport(s), Music, and Film. Starting with the last category, guess what franchise involving wookies and Carrie Fisher took the top place? For Celebrity searches, Taylor Swift took first in the UK, and Caitlyn Jenner in the US, followed closely by Miley Cyrus (and let’s all take a moment to savor the seething rage this data must have caused in Kim Kardashian’s heart). Why does this trivia matter? Ravleen Beeston, UK Sales Director of Bing, is quoted in the article with her two cents:
“Understanding the interests and motivations driving search behaviour online provides invaluable insight for marketers into the audiences they care about. This intelligence allows us to empower marketers to create meaningful connections that deliver more value for both consumers and brands alike. By reflecting back on the key searches over the past 12 months, we can begin to anticipate what will inspire and how to create the right experience in the right context during the year to come.”
Some of the more heartening statistics were related to searches for women’s sports news, which increased from last year. Serena Williams was searched more often than the top five male tennis players combined. And saving the best for last, in spite of the dehumanizing and often racially biased rhetoric we’ve all heard involving Syrian refugees, there was a high volume of searches in the US asking how to provide support and aid for refugees, especially children.
Chelsea Kerwin, December 11, 2015

