How Collaboration and Experimentation Are Key to Advancing Machine Learning Technology

September 12, 2016

The article on CIO titled Machine Learning “Still a Cottage Industry” conveys the sentiments of a man at the heart of the industry in Australia, Professor Bob Williamson, chief scientist of the Commonwealth Scientific and Industrial Research Organisation’s (CSIRO’s) Data61 group. His work in machine learning and data analytics has led him to the conclusion that for machine learning to truly move forward, scientists must find a way to collaborate. He is quoted in the article,

“There’s these walled gardens: ‘I’ve gone and coded my models in a particular way, you’ve got your models coded in a different way, we can’t share’. This is a real challenge for the community. No one’s cracked this yet.” A number of start-ups have entered the “machine-learning-as-a-service” market, such as BigML, Wise.io and Precog, and the big names including IBM, Microsoft and Amazon haven’t been far behind. Though these MLaaSs herald some impressive results, Williamson warned businesses to be cautious.

Williamson speaks to the possibility of stagnation in machine learning due to the emphasis on data mining as opposed to experimenting. He hopes businesses will do more with their data than simply look for patterns. It is a refreshing take on the industry from an outsider/insider, a scientist more interested in the science of it all than the massive stacks of cash at stake.

Chelsea Kerwin, September 12, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

Computers Pose Barriers to Scientific Reproducibility

December 9, 2015

These days, it is hard to imagine performing scientific research without the help of computers. Phys.org details the problem that poses in its thorough article, “How Computers Broke Science—And What We Can Do to Fix It.” Many of us learned in school that reliable scientific conclusions rest on a foundation of reproducibility. That is, if an experiment’s results can be reproduced by other scientists following the same steps, the results can be trusted. However, now many of those steps are hidden within researchers’ hard drives, making the test of reproducibility difficult or impossible to apply. Writer Ben Marwick points out:

“Stanford statisticians Jonathan Buckheit and David Donoho [PDF] described this issue as early as 1995, when the personal computer was still a fairly new idea.

‘An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.’

“They make a radical claim. It means all those private files on our personal computers, and the private analysis tasks we do as we work toward preparing for publication should be made public along with the journal article.

“This would be a huge change in the way scientists work. We’d need to prepare from the start for everything we do on the computer to eventually be made available for others to see. For many researchers, that’s an overwhelming thought. Victoria Stodden has found the biggest objection to sharing files is the time it takes to prepare them by writing documentation and cleaning them up. The second biggest concern is the risk of not receiving credit for the files if someone else uses them.”

So, do we give up on the test of reproducibility, or do we find a way to address those concerns? Well, this is the scientific community we’re talking about. There are already many researchers in several fields devising solutions. Poetically, those solutions tend to be software-based. For example, some are turning to executable scripts instead of the harder-to-record series of mouse clicks. There are also suggestions for standardized file formats and organizational structures. See the article for more details on these efforts.
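To picture what the script-based approach looks like in practice, here is a minimal, hypothetical sketch of a reproducible analysis step: rather than a series of mouse clicks in a spreadsheet, every operation that produces a reported number lives in a small script that travels with the data. The file names and statistics below are invented for illustration only and are not taken from any project mentioned in the article.

import csv
import statistics

RAW_DATA = "measurements.csv"   # hypothetical raw data file shared alongside the paper
RESULTS = "summary_stats.txt"   # output file regenerated on every run

def load_values(path):
    # Read the numeric "value" column from the raw data file.
    with open(path, newline="") as handle:
        return [float(row["value"]) for row in csv.DictReader(handle)]

def summarize(values):
    # Compute the summary statistics reported in the (hypothetical) paper.
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }

if __name__ == "__main__":
    stats = summarize(load_values(RAW_DATA))
    with open(RESULTS, "w") as out:
        for name, value in stats.items():
            out.write(f"{name}: {value}\n")

Rerunning the script regenerates the same summary file from the same raw data, which is exactly the property the reproducibility advocates are after.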

A final caveat: Marwick notes that computers are not the only problem with reproducibility today. He also cites “poor experimental design, inappropriate statistical methods, a highly competitive research environment and the high value placed on novelty and publication in high-profile journals” as contributing factors. Now we know at least one issue is being addressed.

Cynthia Murrell, December 9, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Genentech Joins the Google Enterprise Crew

October 22, 2015

Enterprise search offers customizable solutions for organizations to locate and organize their data. Most of the time, organizations purchase a search solution to become more efficient, to comply with quality procedures, or to further their business development. The latter usually revolves around sales and operations planning, program research, customer service, contracts, and tech sales collateral.

Life sciences companies are just one of many industries that can benefit from enterprise search solutions. Genentech recently deployed the Google Search Appliance to improve the three areas listed above. Perficient explains the benefits of enterprise search for a life sciences company in the video, “Why Life Sciences Leader Genentech Adopted Google Enterprise Search.”

“…we explore why life sciences leader Genentech executed Google Search Appliance. ‘No company is or should ever be static. You have to evolve,’ said CEO Ian Clark.”

Perficient helps companies like Genentech customize a search solution by evaluating the company and identifying the areas where it can be improved the most. It hosts workshops to find where people in different roles must stop to search for information before returning to the task at hand. From the workshops, Perficient can create a business prototype that takes the existing business process and improves upon it. Perficient follows this procedure whenever it deploys enterprise search at a new company.

The video offers only a short version of the process Perficient used at Genentech to improve its business operations with search. A full webinar is posted on the company’s Web site: “Google Search For Life Sciences Companies.”

 

Whitney Grace, October 22, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Flaws in the Peer Review System

June 2, 2015

The article titled Does Peer Review Do More Harm Than Good? on Maclean’s explores the issues facing today’s peer review system. Peer review is the process of having an expert look over a scientific paper before it is published in order to double-check the findings. The work is typically unpaid and, as a result, can take a long time. In an effort to solve the wait-time problem, some journals started offering “fast tracking,” a hefty fee that guarantees a quick turnaround for peer review. The article quotes Professor Alex Holcombe on the subject,

“It ran contrary to many of the scientific values that I hold dear,” says Holcombe, “which is: What appears in scientific journals is determined not by money, but rather the merit of the actual science.” He says fast-tracking is a formula for taking shortcuts—such tight timelines may force reviewers and editors to make decisions without proper scrutiny—and worries it will jeopardize reviewers’ neutrality.

The article goes on to compare peer review to democracy: the best of all the evils. But now predatory journals are posing as legitimate academic journals in an attempt to get money out of desperate-to-publish scientists. Not only is this exploitative, it also leads to bad science getting published. For scientists, the discrepancies may be obvious, but the article points out that journalists and politicians might not know the difference, leading to the spread of “crackpot views” without a basis in science.

Chelsea Kerwin, June 2, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

NSF Makes Plan for Public Access to Scientific Research

April 16, 2015

The press release on the National Science Foundation site titled National Science Foundation Announces Plan for Comprehensive Public Access to Research Results speaks to the NSF’s interest in increasing communication about federally funded research. The NSF is an independent federal agency with a $7 billion annual budget, dispersed around the country in the form of grants to fund research and education in science and engineering. The article states,

“Scientific progress depends on the responsible communication of research findings,” said NSF Director France A. Córdova… Today’s announcement follows a request from the White House Office of Science and Technology Policy last year, directing science-funding agencies to develop plans to increase access to the results of federally funded research. NSF submitted its proposal to improve the management of digital data and received approval to implement the plan.”

The plan is called Today’s Data, Tomorrow’s Discoveries and promotes the importance of science without creating an undue burden on scientists. All manuscripts that appear in peer-reviewed scholarly journals and the like will be made available for free download within a year of the initial publication. In a time when scientists are less trusted and science itself is deeply misunderstood, public access may be more important than ever.

 

Chelsea Kerwin, April 16, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
