Attivio Targets Profitability by the End of 2016 Through $31M Financing Round
July 18, 2016
The article on VentureBeat titled Attivio Raises $31 Million to Help Companies Make Sense of Big Data discusses the promises of profitability that Attivio has made since its inception in 2007. According to Crunchbase, the search vendor has raised over $100 million from four investors. In March 2016, the company closed a $31 million financing round with the expectation of becoming profitable within 2016. The article explains,
“Our increased investment underscores our belief that Attivio has game-changing capabilities for enterprises that have yet to unlock the full value of Big Data,” said Oak Investment Partners’ managing partner, Edward F. Glassmeyer. Attivio also highlighted such recent business victories as landing lab equipment maker Thermo Fisher Scientific as a client and partnering with medical informatics shop PerkinElmer. Oak Investment Partners, General Electric Pension Trust, and Tenth Avenue Holdings participated in the investment, which pushed Attivio’s funding to at least $102 million.”
In the VentureBeat profile about the deal, Stephen Baker, CEO of Attivio, makes it clear that 2015 was a turning point for the company, or in his words, “a watershed year.” Attivio prides itself on both speeding up the data preparation process and empowering its customers to “achieve true Data Dexterity.” And hopefully it will also be profitable, soon.
Chelsea Kerwin, July 18, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.
Mouse Movements Are the New Fingerprints
May 6, 2016
A martial artist once told me that an individual’s fighting style, if defined enough, was like a set of fingerprints. The same can be said for painting style, book preferences, and even Netflix selections, but what about something as anonymous as a computer mouse’s movement? Here is a new scary thought from PC & Tech Authority: “Researcher Can Identify Tor Users By Their Mouse Movements.”
Juan Carlos Norte is a researcher in Barcelona, Spain, and he claims to have developed a series of fingerprinting methods using JavaScript that measure timing, mouse wheel movements, mouse movement speed, CPU benchmarks, and getClientRects. Combining all of this data allowed Norte to identify Tor users based on how they used a computer mouse.
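Norte has not published his code, so the following is only a rough sketch (in Python, purely for illustration) of the general idea: reduce a stream of captured (timestamp, x, y) mouse samples to coarse timing and speed statistics that could serve as a behavioral signature. The event format, the feature choices, and the fingerprint helper are assumptions, not Norte’s actual method.

import hashlib
import statistics

def mouse_features(events):
    # events: list of (timestamp_ms, x, y) samples captured by page JavaScript
    dts, speeds = [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(events, events[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue
        dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        dts.append(dt)
        speeds.append(dist / dt)
    # Coarse statistics stand in for the timing and speed measurements described above.
    return (round(statistics.mean(dts), 1), round(statistics.stdev(dts), 1),
            round(statistics.mean(speeds), 2), round(statistics.stdev(speeds), 2))

def fingerprint(events):
    # Hash the feature tuple into a short identifier (hypothetical, illustrative only).
    return hashlib.sha1(repr(mouse_features(events)).encode()).hexdigest()[:16]

sample = [(0, 10, 10), (16, 14, 12), (33, 25, 20), (50, 40, 31), (66, 52, 40)]
print(fingerprint(sample))

In practice the capturing would happen in page JavaScript; the point of the sketch is simply that a handful of summary statistics can be hashed into a compact identifier, which is why observing enough pages matters.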
It seems far-fetched, especially when one considers how random this data is, but
“’Every user moves the mouse in a unique way,’ Norte told Vice’s Motherboard in an online chat. ‘If you can observe those movements in enough pages the user visits outside of Tor, you can create a unique fingerprint for that user,’ he said. Norte recommended users disable JavaScript to avoid being fingerprinted. Security researcher Lukasz Olejnik told Motherboard he doubted Norte’s findings and said a threat actor would need much more information, such as acceleration, angle of curvature, curvature distance, and other data, to uniquely fingerprint a user.”
This is the age of big data, but looking at Norte’s claim from a logical standpoint, one needs to consider that not all computer mice are made the same: some use lasers, others prefer trackballs, and what about a laptop’s track pad? As diverse as computer users are, there are similarities within the population, and random mouse movement is not individualistic enough to ID a person. Fear not, Tor users: move and click away in peace.
Whitney Grace, May 6, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Do Businesses Have a Collective Intelligence?
May 4, 2016
After working in corporate America for several years, I was amazed by the sheer audacity of its stupidity. I came to the conclusion that many people in corporate America lack intelligence and are slowly skirting insanity’s edge, so Xconomy’s article, “Brainspace Aims To Harness ‘Collective Intelligence’ Of Businesses,” made me giggle. I digress. Intelligence really does run rampant in businesses, especially in the IT departments that keep modern companies up and running. The digital workspace has created a collective intelligence within a company’s enterprise system, and the information is either accessed directly from the file hierarchy or through the (usually quicker) search box.
Keywords within the correct context pertaining to a company are extremely important to semantic search, which is why Brainspace built search software that creates a search ontology for each individual company. Brainspace says that all companies create collective intelligence within their systems; its software takes that digitized “brain” and produces a navigable map that organizes the key items into clusters.
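Brainspace has not disclosed how its “brain” is actually built, but the basic move described here, grouping a company’s documents into clusters of related items, can be approximated with standard open source tools. The snippet below is a generic Python sketch using scikit-learn (TF-IDF vectors plus k-means), not Brainspace’s pipeline; the sample documents and cluster count are made up for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# A handful of made-up enterprise documents standing in for the "collective intelligence."
docs = [
    "quarterly revenue forecast and sales pipeline",
    "sales pipeline review for enterprise accounts",
    "server outage postmortem and incident report",
    "incident report: database failover procedure",
]

# Turn documents into term-weight vectors, then group them into clusters.
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for doc, label in zip(docs, labels):
    print(label, doc)

A real product would layer an ontology and a navigable map on top of clusters like these, but the raw material is the same: plain document text already sitting in the enterprise system.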
“As the collection of digital data on how we work and live continues to grow, software companies like Brainspace are working on making the data more useful through analytics, artificial intelligence, and machine-learning techniques. For example, in 2014 Google acquired London-based Deep Mind Technologies, while Facebook runs a program called FAIR—Facebook AI Research. IBM Watson’s cognitive computing program has a significant presence in Austin, TX, where a small artificial intelligence cluster is growing.”
Building a search ontology by incorporating artificial intelligence into semantic search is a fantastic idea. Big data relies on deciphering information housed in the “collective intelligence,” but it can lack the human reasoning needed to understand context. An intelligent semantic search engine could do wonders that Google has not even built a startup for yet.
Whitney Grace, May 4, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Tips on How to Make the Most of Big Data (While Spending the Least)
April 13, 2016
The article titled The 10 Commandments of Business Intelligence in Big Data on Datanami offers wisdom written on USB sticks instead of stone tablets. In the Business Intelligence arena, apparently moral guidance can take a backseat to Big Data cost savings. Suggestions include: don’t move Big Data unless you must, try to leverage your existing security system, and engage in extensive data visualization sharing (think GitHub). The article explains the importance of avoiding certain price-gouging traps,
“When done right, [Big Data] can be extremely cost effective… That said…some BI applications charge users by the gigabyte… It’s totally common to have geometric, exponential, logarithmic growth in data and in adoption with big data. Our customers have seen deployments grow from tens of billions of entries to hundreds of billions in a matter of months. That’s another beauty of big data systems: Incremental scalability. Make sure you don’t get lowballed into a BI tool that penalizes your upside.”
The Fifth Commandment reminds us all that analyzing data in its natural, messy form is far better than flattening it into tables, due to the risk of losing key relationships (a small sketch of what flattening discards follows below). The Ninth and Tenth Commandments step back and look at the big picture of data analytics in 2016. What was only a buzzword to most people just five years ago is now a key aspect of strategy for any number of organizations. This article reminds us that, thanks to data visualization, Big Data isn’t just for data scientists anymore. Employees across departments can make use of data to make decisions, but only if they are empowered to do so.
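Circling back to the Fifth Commandment, here is a tiny, hypothetical Python example of what flattening can cost: a nested order record keeps the link between an order and each of its line items, while a naive one-row-per-order table has to drop or duplicate that relationship.

import json

# Natural, nested form: the order keeps its line items together.
order = {
    "order_id": 1001,
    "customer": "Acme Corp",
    "items": [
        {"sku": "A-12", "qty": 3},
        {"sku": "B-07", "qty": 1},
    ],
}

# A naive one-row-per-order flattening drops the item-level detail.
flat = {
    "order_id": order["order_id"],
    "customer": order["customer"],
    "item_count": len(order["items"]),
}

print(json.dumps(order, indent=2))
print(flat)  # the sku/qty relationships are gone

The flat version is easier to drop into a spreadsheet or BI tool, but the item-level relationships it discarded are exactly the kind of detail the commandment warns about losing.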
Chelsea Kerwin, April 13, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Change Is Hard, Especially in the User Interface
March 22, 2016
One of the most annoying things in life is when you go to the grocery store and notice they have rearranged the entire place since your last visit. I always ask myself, “Why, grocery store people, did you do this to me?” Part of the reason is to improve the shopping experience and product exposure, while the other half is to screw with customers (I cannot confirm the latter). Over at the Fuzzy Notepad, with its Pokémon Eevee mascot, the post titled “We Have Always Been At War With UI” explains that programmers and users have always been at war with each other when it comes to the user interface.
Face it: Web sites (and other areas of life) need to change to maintain their relevancy. The biggest problem with UI changes is the rollout of said changes. The post points out that users get confused and spend hours trying to understand a change. Sometimes the change is announced; other times it is only applied to a certain number of users.
The post lists several UI changes and how they were handled, describing both the rollout decisions and the programming behind them. One constant thread running through the post is that users simply hate change, yet the inevitable question of “Why?” pops up.
“Ah, but why? I think too many developers trot this line out as an excuse to ignore all criticism of a change, which is very unhealthy. Complaints will always taper off over time, but that doesn’t mean people are happy, just that they’ve gone hoarse. Or, worse, they’ve quietly left, and your graphs won’t tell you why. People aren’t like computers and may not react instantly to change; they may stew for a while and drift away, or they may join a mass exodus when a suitable replacement comes along.”
Big data can measure anything and everything, but the data can be interpreted for or against the changes. Even worse, the analysts may not know exactly what they need to measure. What can be done to avoid total confusion about changes is to have a plan, let users know in advance, and even create a tutorial about how to use the changes. If worse comes to worst, the change can be rolled back and we all move on.
Whitney Grace, March 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Infonomics and the Big Data Market Publishers Need to Consider
March 22, 2016
The article on Beyond the Book titled Data Not Content Is Now Publishers’ Product floats a new buzzword in its discussion of the future of information: infonomics, or the study of the creation and consumption of information. The article compares information to petroleum as the resource that will cause quite a stir in this century. Grace Hong, Vice-President of Strategic Markets & Development for Wolters Kluwer’s Tax & Accounting, weighs in,
“When it comes to big data – and especially when we think about organizations like traditional publishing organizations – data in and of itself is not valuable. It’s really about the insights and the problems that you’re able to solve,” Hong tells CCC’s Chris Kenneally. “From a product standpoint and from a customer standpoint, it’s about asking the right questions and then really deeply understanding how this information can provide value to the customer, not only just mining the data that currently exists.”
Hong points out that the data itself is useless unless it is put to work correctly. That means asking the right questions and using the best technology available to find meaning in the massive collections of information it is now possible to gather. Hong suggests that it is time for publishers to seize the market created by Big Data.
Chelsea Kerwin, March 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The History of ZyLab
February 10, 2016
Big data was a popular buzzword a few years ago, making it seem like a brand new innovation. The eDiscovery process, however, has been around for several decades, and recent technology advancements have allowed it to take off and be implemented in more industries. While many big data startups have sprung up only recently, ZyLab, a leading innovator in eDiscovery and information governance, started its big data venture in 1983. ZyLab created a timeline detailing its history called “ZyLab’s Timeline Of Technical Ingenuity.”
Even though ZyLab was founded in 1983 and introduced the ZyIndex that same year, its big data products did not really take off until the 1990s, when personal computers became an indispensable industry tool. In 1995, ZyLab made history by being used in the O.J. Simpson and Unabomber investigations. Three years later it introduced text search in images, which is now a standard feature for all search engines.
Things really began to take off for ZyLab in the 2000s, as technology advanced to the point where it became easier for companies to create and store data, and masses of unstructured data began to pile up. Advanced text analytics were added in 2005, and ZyLab made history again by becoming the standard for United Nations War Crime Tribunals.
From 2008 onward, ZyLab’s milestones were more technological: the ZyImage SharePoint connector and Google Web search engine integration, the introduction of the ZyLab Information Management Platform, the first integrated machine translation in eDiscovery, audio search, and true native visual search and categorization.
ZyLab continues to make historical as well as market innovations for eDiscovery and big data.
Whitney Grace, February 10, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
IBM Sells Technology Platform with a Throwback to Big Data’s Mysteries
February 2, 2016
The infographic on the IBM Big Data & Analytics Hub titled Extracting Business Value From the 4 V’s of Big Data involves quantifying Volume (scale of data), Velocity (speed of data), Veracity (certainty of data), and Variety (diversity of data). At a time when big data may have been largely demystified, IBM makes an argument for its current relevance and import, not to mention its mystique, with reminders of the tremendous amounts of data being created and consumed on a daily basis. Ultimately the graphic is an ad for the IBM Analytics Technology Platform. The infographic also references a fifth “V,” Value:
“Big data = the ability to achieve greater Value through insights from superior analytics. Case Study: A US-based aircraft engine manufacturer now uses analytics to predict engine events that lead to costly airline disruptions, with 97% accuracy. If this prediction capability had been available in the previous year, it would have saved $63 million.”
IBM struggles for revenue. But, judging from this infographic, IBM knows how to create Value with a capital “V,” if not revenue. The IBM Analytics Technology Platform promises speedier insights and actionable information from trustworthy sources. The infographic reminds us that poor quality in data leads to sad executives, and that data is growing exponentially, with 90% of all data forged in only the last two years.
Chelsea Kerwin, February 2, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Measuring Classifiers by a Rule of Thumb
February 1, 2016
Computer programmers who specialize in machine learning, artificial intelligence, data mining, data visualization, and statistics are smart individuals, but even they sometimes get stumped. Using the same form of communication as Reddit and old-fashioned forums, Cross Validated is a question and answer site run by Stack Exchange. People can post questions about data and related topics and then wait for a response. One user posted a question about “Machine Learning Classifiers”:
“I have been trying to find a good summary for the usage of popular classifiers, kind of like rules of thumb for when to use which classifier. For example, if there are lots of features, if there are millions of samples, if there are streaming samples coming in, etc., which classifier would be better suited in which scenarios?”
The response the user received was that the question was too broad: classifiers perform best depending on the data and the process that generates it. It is kind of like asking for the best way to organize books or your taxes; it depends on the content of said items.
Another user replied that there is an easier way to get a general sense of which classifier to try, and pointed to the scikit-learn.org chart on “choosing the right estimator.” Other users say that the chart is incomplete because it does not include deep learning, decision trees, and logistic regression.
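The “it depends on the data” answer is easy to demonstrate: on any given dataset you can simply cross-validate a handful of standard classifiers and let the scores decide. The Python sketch below uses scikit-learn on synthetic data; the candidate models and their settings are arbitrary choices for illustration, not a recommendation.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

# Synthetic dataset standing in for "your" data.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "linear SVM": LinearSVC(dual=False),
}

# Five-fold cross-validation: the "rule of thumb" is whatever scores best here.
for name, clf in candidates.items():
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")

Swap in your own feature matrix and labels, and the rule of thumb becomes whichever candidate wins the cross-validation.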
We say create some other diagrams and share those. Classifiers are complex, but they are a necessity for the artificial intelligence and big data craze.
Whitney Grace, February 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Data Discrimination Is Real
January 22, 2016
One of the best things about data and numbers is that they do not lie…usually. According to Slate’s article, “FTC Report Details How Big Data Can Discriminate Against The Poor,” big data does a huge disservice to people of lower socioeconomic status by reinforcing existing negative patterns. The Federal Trade Commission (FTC), academics, and activists have warned for some time that big data analytics can discriminate.
“At its worst, big data can reinforce—and perhaps even amplify—existing disparities, partly because predictive technologies tend to recycle existing patterns instead of creating new openings. They can be especially dangerous when they inform decisions about people’s access to healthcare, credit, housing, and more. For instance, some data suggests that those who live close to their workplaces are likely to maintain their employment for longer. If companies decided to take that into account when hiring, it could be accidentally discriminatory because of the racialized makeup of some neighborhoods.”
The FTC stresses that big data analytics has benefits as well. It can yield information that creates more job opportunities, transforms health care delivery, extends credit through “non-traditional” methods, and more.
The way big data can avoid reinforcing these problems, and even improve upon them, is to account for biases from the beginning. Large data sets can make these problems invisible or even harder to recognize. Companies can use prejudiced data to justify the actions they take and even weaken the effectiveness of consumer choice.
Data is supposed to be an objective tool, but the sources behind the data can be questionable. It becomes important for third parties and the companies themselves to investigate the data sources, run multiple tests, and confirm that the data is truly objective. Otherwise we will be dealing with social problems, and more, reinforced by bad data.
Whitney Grace, January 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

