Stroz Friedberg Snaps Up Elysium Digital

August 20, 2015

Cybersecurity, investigation, and risk-management firm Stroz Friedberg has made a new acquisition, we learn from its announcement, “Stroz Friedberg Acquires Technology Litigation Consulting Firm Elysium Digital” (PDF). Though details of the deal are not revealed, the write-up tells us why Elysium Digital is such a welcome addition to the company:

“Founded in 1997, Elysium Digital has worked with law firms, in-house counsel, and government agencies nationally. The firm has provided a broad range of services, including expert testimony, IP litigation consulting, eDiscovery, digital forensics investigations, and security and privacy investigations. Elysium played a role in the key technology/legal issues of its time and established itself as a premier firm providing advice and quality technical analysis in high-stakes legal matters. The firm specialized in deciphering complex technology and effectively communicating findings to clients, witnesses, judges, and juries.

“‘The people of Elysium Digital possess highly sought after technical skills that have allowed them to tackle some of the most complex IP matters in recent history. Bringing this expertise into Stroz Friedberg will allow us to more fully address the needs of our clients around the world, not just in IP litigation and digital forensics, but across our cyber practices as well,’ said Michael Patsalos-Fox, CEO of Stroz Friedberg.”

Elysium Digital’s staff will be moving into Stroz Friedberg’s Boston office, and the firm’s co-founders will continue to play an important role, we’re told. Stroz Friedberg expects the acquisition to bolster its capabilities in digital forensics, intellectual-property litigation consulting, eDiscovery, and data security.

Founded in 2000, Stroz Friedberg says its guiding principle is to “seek truth” for its clients. Headquartered in New York City, the company maintains offices throughout the U.S. as well as in London, Hong Kong, and Zurich.

Cynthia Murrell, August 20, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Chinese Opinion Monitoring Software by Knowlesys

August 18, 2015

Ever wonder what tools the Chinese government uses to keep track of those pesky opinions voiced by its citizens? If so, take a look at “IPOMS : Chinese Internet Public Opinion Monitoring System” at Revolution News. The brief write-up tells us about a software company, Knowlesys, reportedly supplying such software to China (among other clients). Reporter and Revolution News founder Jennifer Baker tells us:

“Knowlesys’ system can collect web pages with some certain key words from Internet news, topics on forum and BBS, and then cluster these web pages according to different ‘event’ groups. Furthermore, this system provides the function of automatically tracking the progress of one event. With this system, supervisors can know what is exactly happening and what has happened from different views, which can improve their work efficiency a lot. Most of time, the supervisor is the government, the evil government. sometimes a company uses the system to collect information for its products. IPOMS is composed of web crawler, html parser and topic detection and tracking tool.”

The piece includes a diagram that lays out the software’s process, from extraction to analysis to presentation (though the specifics are pretty standard to anyone familiar with data analysis in general). Data monitoring and mining firm Knowlesys was founded in 2003. The company has offices in Hong Kong and a development center in Shenzhen, China.
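For readers who want a concrete sense of what such a pipeline does, here is a minimal Python sketch of the keyword collection and “event” grouping the write-up describes. The keywords, pages, and function names are invented for illustration; this is not Knowlesys code.

```python
from collections import defaultdict

# Illustrative watch list and crawled pages; not Knowlesys data.
WATCH_KEYWORDS = {"protest", "strike", "petition"}

def collect_matching_pages(pages):
    """Keep only pages whose text mentions a watched keyword."""
    hits = []
    for page in pages:
        matched = {kw for kw in WATCH_KEYWORDS if kw in page["text"].lower()}
        if matched:
            hits.append({**page, "keywords": matched})
    return hits

def group_into_events(hits):
    """Naive 'event' grouping: bucket pages by the set of keywords they share."""
    events = defaultdict(list)
    for page in hits:
        events[frozenset(page["keywords"])].append(page["url"])
    return events

pages = [
    {"url": "http://example.com/a", "text": "Workers call a strike over wages"},
    {"url": "http://example.com/b", "text": "Strike spreads; protest planned downtown"},
    {"url": "http://example.com/c", "text": "New phone released today"},
]
for event, urls in group_into_events(collect_matching_pages(pages)).items():
    print(sorted(event), urls)
```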

Cynthia Murrell, August 18, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

List of Data Visualization Players

August 17, 2015

I read “CI Radar Delivers New Competitive Intelligence Coverage of the Data Visualization Market.” The story, which explains a tracking and monitoring tool from a competitive intelligence firm, contains a little chunk of information: a list of the players the firm considers important in the Hollywoodization of analytic system outputs. Who loves a great chart? Certainly generals, mid tier consultants, and MBA students.

Here’s the list of data visualization players:

  • Adobe (ah, the magic of the creative cloud)
  • APCON
  • Advizor Solutions
  • Afs Technologies
  • BeyondCore
  • Birst
  • Centrifuge Systems
  • Chartio
  • ClearStory Data
  • DataHero
  • Datameer
  • Datawatch
  • Dell (visualization and not laptops?)
  • Domo
  • Dundas
  • GoodData
  • Halo
  • iDashboards (maybe free for academics?)
  • Inetsoft Technology
  • Infor (I think of this outfit as a CRM vendor)
  • Informatica (now owned by Permira)
  • Information Builders
  • International Business Machines (IBM) (which unit of IBM?)
  • Jinfonet Software
  • Logi Analytics
  • Looker
  • Manthan
  • Microsoft (my goodness)
  • Microstrategy
  • OpenText (is this the Actuate or the Talend acquisition?)
  • Panorama Software
  • Pentaho (don’t forget this is Hitachi)
  • Phocas Software
  • ProfitBase
  • Prognoz
  • Pyramid Analytics
  • Qlik
  • RapidMiner
  • Roambi
  • Salesforce (a surprise to me)
  • SAP (interesting?)
  • SAS (also interesting?)
  • Sisense
  • Splunk (a bit of a surprise)
  • Synerscope
  • Tableau Software
  • Teradata (Is this Rainstor, ThinkBig or another chunk of acquired technology?)
  • ThoughtSpot
  • TIBCO (is this Spotfire?)
  • Viur

I would point out that some of the key players in the law enforcement and intelligence community are not included. Why would a consulting firm want to highlight the companies which are pioneering next generation, dynamic, interactive, and real time visualization tools? Although the list is incomplete from my vantage point, how long will it be before Forrester, Gartner, and other mid tier firms roll out a magic wave rhomboid explaining what these companies are doing to be “players”?

Stephen E Arnold, August 17, 2015

Open Source Tools for IBM i2

August 17, 2015

IBM has made available two open source repositories for the IBM i2 intelligence platform: the Data-Acquisition-Accelerators and Intelligence-Analysis-Platform can both be found on the IBM-i2 page at GitHub. The IBM i2 suite of products includes many parts that work together to give law enforcement, intelligence organizations, and the military powerful data analysis capabilities. For a glimpse of what these products can do, we recommend checking out the videos at the IBM i2 Analyst’s Notebook page. (You may have to refresh the page before the videos will play.)

The Analyst’s Notebook is but one piece, of course. For the suite’s full description, I turned to the product page, IBM i2 Intelligence Analysis Platform V3.0.11. The Highlights summary describes:

“The IBM i2 Intelligence Analysis product portfolio comprises a suite of products specifically designed to bring clarity through the analysis of the mass of information available to complex investigations and scenarios to help enable analysts, investigators, and the wider operational team to identify, investigate, and uncover connections, patterns, and relationships hidden within high-volume, multi-source data to create and disseminate intelligence products in real time. The offerings target law enforcement, defense, government agencies, and private sector businesses to help them maximize the value of the mass of information that they collect to discover and disseminate actionable intelligence to help them in their pursuit of predicting, disrupting, and preventing criminal, terrorist, and fraudulent activities.”

The description goes on to summarize each piece, from the Intelligence Analysis Platform to the Information Exchange Visualizer. I recommend readers check out this page, and, especially, the videos mentioned above for better understanding of this software’s capabilities. It is an eye-opening experience.

Cynthia Murrell, August 17, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Data Lake Alert: Tepid Water, High Concentration of Agricultural Runoff

August 13, 2015

Call me skeptical. Okay, call me a person who is fed up with silly jargon. You know what a database is, right? You know what a data warehouse is, well, sort of, maybe? Do you know what a data lake is? I don’t.

A lake, according to the search engine du jour Giburu:

An area prototypically filled with water, also of variable size.

A data lake, therefore, is an area filled with zeros and ones, also of variable size. How does a data lake differ from a database or a data warehouse?

According to the write up “Sink or Swim – Why your Organization Needs a Data Lake”:

A Data Lake is a storage repository that holds a vast amount of raw data in its native format for processing later by the business.
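To make the distinction concrete, here is a minimal sketch of the quoted definition: a lake lands the record exactly as received, in its native format, while a warehouse-style table accepts only validated, normalized fields. The file layout, field names, and schema are invented for illustration.

```python
import json
import sqlite3
from pathlib import Path

# One raw record as it arrives; field names and values are invented.
raw_record = '{"cust": " 42 ", "amount": "19.99", "note": "called twice"}'

# "Data lake": land the record exactly as received, native format, no schema.
lake = Path("lake/events")
lake.mkdir(parents=True, exist_ok=True)
(lake / "event_0001.json").write_text(raw_record)

# Warehouse-style table: validate and normalize before anything is stored.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
rec = json.loads(raw_record)
db.execute(
    "INSERT INTO orders VALUES (?, ?)",
    (int(rec["cust"].strip()), float(rec["amount"])),
)
print(db.execute("SELECT * FROM orders").fetchall())  # [(42, 19.99)]
```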

The magic in this unnecessary jargon is, in my opinion, a quest (perhaps Quixotic?) for sales leads. The write up points out that a data lake is available. A data lake is accessible. A data lake is—wait for it—Hadoop.

What happens if the water is neither clear nor pristine? One cannot unleash the hounds of the EPA to resolve the problem of data which may not be very good until validated, normalized, and subjected to the ho hum tests which some folks want me to believe may be irrelevant steps in the land of a marketer’s data lakes.

My admonition, “Don’t drink the water until you know it won’t make life uncomfortable—or worse. Think fatal.”

Stephen E Arnold, August 13, 2015

Exclusive Interview: Danny Rogers, Terbium Labs

August 11, 2015

Editor’s note: The full text of the exclusive interview with Dr. Daniel J. Rogers, co-founder of Terbium Labs, is available on the Xenky Cyberwizards Speak Web service at www.xenky.com/terbium-labs. The interview was conducted on August 4, 2015.

Significant innovations in information access, despite the hyperbole of marketing and sales professionals, are relatively infrequent. In an exclusive interview, Danny Rogers, one of the founders of Terbium Labs, explains how the company has developed a way to flip on the lights and make it easy to locate information hidden in the Dark Web.

Web search has been a one-trick pony since the days of Excite, HotBot, and Lycos. For most people, a mobile device takes cues from the user’s location and click streams and displays answers. Access to digital information requires more than parlor tricks and pay-to-play advertising. A handful of companies are moving beyond commoditized search, and they are opening important new markets such as the detection of secret and high value data theft. Terbium Labs can “illuminate the Dark Web.”

In an exclusive interview, Dr. Danny Rogers, who founded Terbium Labs with Michael Moore, explained the company’s ability to change how data breaches are located. He said:

Typically, breaches are discovered by third parties such as journalists or law enforcement. In fact, according to Verizon’s 2014 Data Breach Investigations Report, that was the case in 85% of data breaches. Furthermore, discovery, because it is by accident, often takes months, or may not happen at all when limited personnel resources are already heavily taxed. Estimates put the average breach discovery time between 200 and 230 days, an exceedingly long time for an organization’s data to be out of their control. We hope to change that. By using Matchlight, we bring the breach discovery time down to between 30 seconds and 15 minutes from the time stolen data is posted to the web, alerting our clients immediately and automatically. By dramatically reducing the breach discovery time and bringing that discovery into the organization, we’re able to reduce damages and open up more effective remediation options.

Terbium’s approach, it turns out, can be applied to traditional research into content domains to which most systems are effectively blind. At this time, a very small number of companies are able to index content that is not available to traditional content processing systems. Terbium acquires content from Web sites which require specialized software to access. Terbium’s system then processes the content, converting it into the equivalent of an old-fashioned fingerprint. Real-time pattern matching makes it possible for the company’s system to locate a client’s content, whether in textual form, software binaries, or other digital representations.

One of the most significant information access innovations uses systems and methods developed by physicists to deal with the flood of data resulting from research into the behaviors of difficult-to-differentiate subatomic particles.

One part of the process is for Terbium to acquire (crawl) content and convert it into encrypted 14-byte strings of zeros and ones. A client such as a bank then uses the Terbium content encryption and conversion process to produce representations of the confidential data, computer code, or other data. Terbium’s system, in effect, looks for matching digital fingerprints. The task of locating confidential or proprietary data via traditional means is expensive and often a hit-and-miss affair.
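For the curious, here is a toy sketch of the general fingerprint-and-match idea described above. It truncates ordinary SHA-256 digests to 14 bytes as a stand-in; the actual hashing and encryption scheme is Terbium’s own and is not reproduced here.

```python
import hashlib

FINGERPRINT_BYTES = 14  # length mentioned in the interview; the hashing scheme here is assumed

def fingerprints(text, window=32):
    """Hash overlapping windows of the text into fixed-length digests."""
    data = text.encode("utf-8")
    return {
        hashlib.sha256(data[i:i + window]).digest()[:FINGERPRINT_BYTES]
        for i in range(max(1, len(data) - window + 1))
    }

# The client fingerprints its confidential data locally; the raw data never leaves.
client_index = fingerprints("ACCT-0001 ACCT-0002 ACCT-0003 confidential client list")

# A crawled posting is fingerprinted the same way and checked for overlap.
crawled_page = "dump for sale: ACCT-0001 ACCT-0002 ACCT-0003 confidential client list"
if client_index & fingerprints(crawled_page):
    print("possible breach: fingerprint match found")
```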

Terbium Labs changes the rules of the game and in the process has created a way to provide its licensees with anti-fraud and anti-theft measures which are unique. In addition, Terbium’s digital fingerprints make it possible to find, analyze, and make sense of digital information not previously available. The system has applications for the Clear Web, which millions of people access every minute, to the hidden content residing on the so called Dark Web.


Terbium Labs, a start up located in Baltimore, Maryland, has developed technology that makes use of advanced mathematics—what I call numerical recipes—to perform analyses for the purpose of finding connections. The firm’s approach is one that deals with strings of zeros and ones, not the actual words and numbers in a stream of information. By matching these numerical tokens with content such as a data file of classified documents or a record of bank account numbers, Terbium does what strikes many, including myself, as a remarkable achievement.

Terbium’s technology can identify highly probable instances of improper use of classified or confidential information. Terbium can pinpoint where the compromised data reside on either the Clear Web, another network, or on the Dark Web. Terbium then alerts the organization about the compromised data and works with the victim of Internet fraud to resolve the matter in a satisfactory manner.

Terbium’s breakthrough has attracted considerable attention in the cyber security sector, and applications of the firm’s approach are beginning to surface for disciplines from competitive intelligence to health care.

Rogers explained:

We spent a significant amount of time working on both the private data fingerprinting protocol and the infrastructure required to privately index the dark web. We pull in billions of hashes daily, and the systems and technology required to do that in a stable and efficient way are extremely difficult to build. Right now we have over a quarter trillion data fingerprints in our index, and that number is growing by the billions every day.

The idea for the company emerged from a conversation with a colleague who wanted to find out immediately if a high profile client list was ever leaked to the Internet. But, said Rogers, “This individual could not reveal to Terbium the list itself.”

How can an organization locate secret information if that information cannot be provided to a system able to search for the confidential information?

The solution Terbium’s founders developed relies on novel use of encryption techniques, tokenization, Clear and Dark Web content acquisition and processing, and real time pattern matching methods. The interlocking innovations have been patented (US8,997,256), and Terbium is one of the few companies in the world, perhaps the only one, able to crack open Dark Web content within regulatory and national security constraints.

Rogers said:

I think I have to say that the adversaries are winning right now. Despite billions being spent on information security, breaches are happening every single day. Currently, the best the industry can do is be reactive. The adversaries have the perpetual advantage of surprise and are constantly coming up with new ways to gain access to sensitive data. Additionally, the legal system has a long way to go to catch up with technology. It really is a free-for-all out there, which limits the ability of governments to respond. So right now, the attackers seem to be winning, though we see Terbium and Matchlight as part of the response that turns that tide.

Terbium’s product is Matchlight. According to Rogers:

Matchlight is the world’s first truly private, truly automated data intelligence system. It uses our data fingerprinting technology to build and maintain a private index of the dark web and other sites where stolen information is most often leaked or traded. While the space on the internet that traffics in that sort of activity isn’t intractably large, it’s certainly larger than any human analyst can keep up with. We use large-scale automation and big data technologies to provide early indicators of breach in order to make those analysts’ jobs more efficient. We also employ a unique data fingerprinting technology that allows us to monitor our clients’ information without ever having to see or store their originating data, meaning we don’t increase their attack surface and they don’t have to trust us with their information.

For more information about Terbium, navigate to the company’s Web site. The full text of the interview appears on Stephen E Arnold’s Xenky cyberOSINT Web site at http://bit.ly/1TaiSVN.

Stephen E Arnold, August 11, 2015

Does Math Make Distinctions?

August 8, 2015

I read “What Does a Data Scientist Do That a Traditional Data Analytics Team Can’t?” Good marketing question. Math, until the wholehearted embrace of fuzziness, was reasonably objective. Survivors of introductory statistics learned about the subjectivity involved with Bayesian antics and the wonder of fiddling with thresholds. You remember. Above this value, do this; below this value, do that. Eventually one can string together numerical recipes which make threshold decisions based on inputs. In the hands of responsible, capable, and informed professionals, the systems work reasonably well. Sure, smart software can drift and then run off the rails. There are procedures to keep layered systems on track. They work reasonably well for horseshoes. You know. Close enough for horseshoes. Monte Carlo’s bright lights beckon.
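For the uninitiated, a toy example of the threshold stringing I have in mind appears below; the cutoff values and the fraud-scoring scenario are invented, and real systems tune such values against data.

```python
def route_alert(fraud_score, review_cutoff=0.5, block_cutoff=0.9):
    """String together threshold decisions: above this value, do this; below it, do that."""
    if fraud_score >= block_cutoff:
        return "block transaction"
    if fraud_score >= review_cutoff:
        return "send to human review"
    return "approve"

for score in (0.12, 0.63, 0.95):
    print(score, "->", route_alert(score))
```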

The write up takes a different approach. The idea is that someone who does descriptive procedures is an apple. The folks who do predictive procedures are oranges. One lets the data do the talking. Think of a spreadsheet jockey analyzing historical pre tax profits at a public company. Now contrast that with a person who looks at data and makes judgments about what the data “mean.”

Close enough for horseshoes.

Which is more fun? Go with the fortune tellers, of course.

The write up also raises the apparent black-white issue of structured versus unstructured data. The writer says:

Unstructured or “dirty” data is in many ways the opposite of its more organized counterpart, and is what data scientists rely on for their analysis. Data of this type is made up of qualitative rather than quantitative information — descriptive words instead of measurable numbers —  and comes from more obscure sources such as emails, sentiment expressed in blogs or engagement across social media. Processing this information also involves the use of probability and statistical algorithms to translate what is learned into advanced applications for machine learning or even artificial intelligence, and these skills are often well beyond those of the average data analyst.

There you go. One does not want to be average. I am tempted to ask: mode, median, or mean?

Net net: If the mathematical foundation is wrong, if the selected procedure is inappropriate, if the data are not validated—errors are likely and they will propagate.

One does not need to be too skilled in mathematics to understand that mistakes are not covered or ameliorated with buzz words.

Stephen E Arnold, August 8, 2015

IBM Spends to Make Watson Healthier, Hopefully Quickly

August 7, 2015

I noted the article “IBM Adds Medical Images to Watson, Buying Merge Healthcare for $1 Billion.” The company is in the content management business. Medical images are pretty much a hassle, whether in good old fashioned film form or in digital versions. On the few occasions I have had to look at murky gray or odd duck enhanced color images, I marveled at how a professional could make sense of the data displayed. Did this explanation trigger thoughts of IBM FileNet?

The image processing technology available from specialist firms for satellite or surveillance image analysis is a piece of cake compared to the medical imaging examples I reviewed. From my point of view, the nifty stuff available to an analyst looking at the movement of men and equipment was easier to figure out.

Merge delivers a range of image and content management services to health care outfits. The systems can work with on premises systems and park data in the cloud in a way that keeps the compliance folks happy.

According to the write up:

When IBM set up its Watson health business in April, it began with a couple of smaller medical data acquisitions and industry partnerships with Apple, Johnson & Johnson and Medtronic. Last week, IBM announced a partnership with CVS Health, the large pharmacy chain, to develop data-driven services to help people with chronic ailments like diabetes and heart disease better manage their health.

Now Watson is plopping down $1 billion to get a more substantive, image centric, and—dare I say it—more traditional business.

The idea I learned:

“We’re bringing Watson and analytics to the largest data set in health care — images,” John Kelly, IBM’s senior vice president of research who oversees the Watson business, said in an interview.

The idea, as I understand the management speak, is that Watson will be able to perform image analysis, thus allowing IBM to convert Watson into a significant revenue generator. IBM does need all the help it can get. The company has just achieved a milestone of sorts; IBM’s revenue has declined for 13 consecutive quarters.

My view is that the integration of the Merge systems with the evolving Watson “solution” will be expensive, slow, and frustrating to those given the job of making image analysis better, faster, and cheaper.

My hunch is that the time and cost required to integrate Watson and Merge will be an issue in six or nine months. Once the “integration” is complete, the costs of adding new features and functions to keep pace with regulations and advances in diagnosis and treatment will create a 21st century version of FileNet. (FileNet, as you, gentle reader, know, was IBM’s 2006 acquisition. At the time, nine years ago, IBM said that the FileNet technology would

“advance its Information on Demand initiative, IBM’s strategy for pursuing the growing market opportunity around helping clients capture insights from their information so it can be used as a strategic asset. FileNet is a leading provider of business process and content management solutions that help companies simplify critical and everyday decision making processes and give organizations a competitive advantage.”

FileNet was an imaging technology for financial institutions and a search system which allowed a person with access to the system to locate a check or other scanned document.)

And FileNet today? Well, like many IBM acquisitions it is still chugging along, just part of the services oriented architecture at Big Blue. Why, one might ask, was the FileNet technology not applicable to health care? I will leave you to ponder the answer.

I want to be optimistic about the upside of this Merge acquisition for the companies involved and for the health care professionals who will work with the Watsonized system. I assume that IBM will put on a happy face about Watson’s image analysis capabilities. I, however, want to see the system in action and have some hard data, not M&A fluff, about the functionality and accuracy of the merged systems.

At this moment, I think Watson and other senior IBM managers are looking for a way to make a lemon grove from Watson. Nothing makes bankers and deal makers happier than a big, out of the blue acquisition.

Now the job is to find a way to sell enough lemons to pay for the maintenance and improvement of the lemon grove. I assume Watson has an answer to ongoing costs for maintenance and enhancements, bug finding and stomping, and the PR such activities trigger. Yep, costs and revenue. Boring but important to IBM’s stakeholders.

Stephen E Arnold, August 7, 2015

IT Architecture Needs to Be More Seamless

August 7, 2015

IT architecture might appear to be the same across the board, but the standards change depending on the industry. Rupert Brown wrote “From BCBS To TOGAF: The Need For A Semantically Rigorous Business Architecture” for Bob’s Guide, in which he discusses how TOGAF is the de facto standard for global enterprise architecture. He explains that while TOGAF has its strengths, it also has weaknesses, among them its reliance on diagrams and on PowerPoint to create them.

Brown spends a large portion of the article stressing that the information content and the model are more important, and that a diagram should only be rendered later. He goes on to say that as industries have advanced, the tools have become more complex, and it is very important that there be a more universal approach to IT architecture.

What is Brown’s supposed solution? Semantics!

“The mechanism used to join the dots is Semantics: all the documents that are the key artifacts that capture how a business operates and evolves are nowadays stored by default in Microsoft or Open Office equivalents as XML and can have semantic linkages embedded within them. The result is that no business document can be considered an island any more – everything must have a reason to exist.”

The reason that TOGAF has not been standardized using semantics is the lack of something to connect various architecture models together.  A standardized XBRL language for financial and regulatory reporting would help get the process started, but the biggest problem will be people who make a decent living using PowerPoint (so he claims).

Brown calls for a global reporting standard for all industries, but that is a pie in the sky hope unless the government imposes regulations or all industries have a meeting of the minds. Why? The different industries do not always mesh (think engineering firms vs. a publishing house), and each has its own list of needs and concerns. Why not focus on getting standards for one industry rather than across the board?

Whitney Grace, August 7, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Quality and Text Processing: An Old Couple Still at the Altar

August 6, 2015

I read “Why Quality Management Needs Text Analytics.” I learned:

To analyze customer quality complaints to find the most common complaints and steer the production or service process accordingly can be a very tedious job. It takes time and resources.

This idea is similar to the one expressed by Ronen Feldman in a presentation he gave in the early 2000s. My notes of the event record that he reviewed the application of ClearForest technology to reports from automobile service professionals which presented customer comments and data about repairs. ClearForest’s system was able to pinpoint that a particular mechanical issue was emerging. The client responded to the signals from the ClearForest system and took remediating action. The point was that sometime in the early 2000s, ClearForest had built and deployed a text analytics system with a quality-centric capability.
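A minimal sketch of that kind of quality signal appears below: count complaint categories per period and flag the ones whose frequency jumps. The categories, counts, and growth threshold are invented for illustration and are not ClearForest output.

```python
from collections import Counter

# Invented service-report tags grouped by month; not ClearForest data.
reports = {
    "2015-06": ["brake noise", "wiper streaks", "brake noise"],
    "2015-07": ["brake noise", "door rattle", "brake noise", "brake noise",
                "wiper streaks", "brake noise"],
}

def emerging_issues(reports, growth_factor=2.0):
    """Flag complaint categories whose count at least doubles month over month."""
    earlier, later = (Counter(reports[m]) for m in sorted(reports))
    return [
        issue for issue, count in later.items()
        if earlier.get(issue, 0) > 0 and count >= growth_factor * earlier[issue]
    ]

print(emerging_issues(reports))  # ['brake noise']
```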

I mention this point because many companies are recycling ideas and concepts which are in some cases long beards. ClearForest was acquired by the estimable Thomson Reuters. Some of the technology is available as open source at Calais.

In search and content processing, the case examples, the lingo, and even the technology have entered what I call a “recycling” phase.

I learned about several new search systems this week. I looked at each. One was a portal, another a metasearch system, and a third a privacy centric system with a somewhat modest index. Each was presented as new, revolutionary, and innovative. The reality is that today’s information highways are manufactured from recycled plastic bottles.

Stephen E Arnold, August 6, 2015
