The Machine Learning Textbook

July 19, 2016

Deep learning is another bit of technical jargon floating around, and it is tied to artificial intelligence.  We know that artificial intelligence is the effort to replicate human thought patterns and actions in computer software.  Deep learning is…well, what, specifically?  For a primer on what deep learning is, as well as its many applications, check out “Deep Learning: An MIT Press Book” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Here is how the Deep Learning book is described:

“The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. The print version will be available for sale soon.”

This is a fantastic resource to take advantage of.  MIT is one of the leading technical schools in the nation, if not the world, and material published under its imprint will go a long way toward rounding out your deep learning foundation.  Also, it is free, which cannot be beaten.  Here is how the book explains the goal of machine learning:

“This book is about a solution to these more intuitive problems. This solution is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs.”
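
For readers who want to see the “hierarchy of concepts” idea in code, here is a toy sketch in Python with NumPy. It is our illustration, not the book’s; the layer names and sizes are assumptions chosen for clarity, but they show how each layer is defined in terms of the simpler representation beneath it:

```python
# A toy "hierarchy of concepts": each layer re-describes its input in terms of
# the simpler features computed by the layer below it. Names and sizes are
# illustrative assumptions, not anything from the book.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

W1 = rng.normal(size=(784, 128))   # raw pixels -> simple features (e.g., edges)
W2 = rng.normal(size=(128, 64))    # simple features -> combinations (e.g., shapes)
W3 = rng.normal(size=(64, 10))     # combinations -> high-level concepts (e.g., digits)

def forward(pixels):
    edges  = relu(pixels @ W1)     # simplest concepts, learned from data
    shapes = relu(edges @ W2)      # defined in terms of the edges below them
    scores = shapes @ W3           # high-level concepts defined via simpler ones
    return scores

print(forward(rng.normal(size=(1, 784))).shape)  # -> (1, 10)
```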

If you have time, take a detour and read the book; if you want to save time, there is always Wikipedia.


Whitney Grace, July 19, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

There is a Louisville, Kentucky Hidden Web/Dark Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.


The Google Knowledge Vault Claimed to Be the Future

May 31, 2016

Back in 2014, I heard rumors that the Google Knowledge Vault was supposed to be the next wave of search.  How many times do you hear a company or a product claim it is the next big thing?  After I rolled my eyes, I decided to research what became of the Knowledge Vault and found an old article from Search Engine Land: “Google ‘Knowledge Vault’ To Power Future Of Search.” The Google Knowledge Graph supplied additional information to search results, what we now recognize as the summarized information at the top of Google search results.  The Knowledge Vault was supposedly its successor and would rely less on third-party information providers.

“Sensationally characterized as ‘the largest store of knowledge in human history,’ Knowledge Vault is being assembled from content across the Internet without human editorial involvement. ‘Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it,’ says New Scientist. Google has reportedly assembled 1.6 billion “facts” and scored them according to confidence in their accuracy. Roughly 16 percent of the information in the database qualifies as ‘confident facts.’”
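
To picture what a store of scored “facts” might look like, here is a minimal sketch in Python. It is our illustration, not Google’s actual schema; the example triples and the 0.9 cutoff are assumptions meant only to show how a confidence threshold separates “confident facts” from the rest:

```python
# A minimal sketch (not Google's schema) of confidence-scored fact triples of
# the kind attributed to Knowledge Vault. The triples and threshold are
# hypothetical examples.
from typing import NamedTuple

class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    confidence: float  # how sure the extractor is that the fact is true

facts = [
    Fact("Barack Obama", "born_in", "Honolulu", 0.97),
    Fact("Eiffel Tower", "located_in", "Berlin", 0.08),
]

CONFIDENT = 0.9  # assumed cutoff for a "confident fact"
confident_facts = [f for f in facts if f.confidence >= CONFIDENT]
print(confident_facts)  # only the high-confidence triple survives
```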

Knowledge Vault was also supposed to give Google a leg up in the mobile search market and even serve as the basis for artificial intelligence applications.  It was a lot of hoopla, but I did a bit more research and learned from Wikipedia that Knowledge Vault amounted to nothing more than a research paper.

Since 2014, Google, Apple, Facebook, and other tech companies have concentrated their efforts and resources on developing artificial intelligence and integrating it within their products.  While Knowledge Vault was a red herring, the predictions about artificial intelligence were correct.


Whitney Grace, May 31, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Wikipedia Relies on Crowdsourcing Once More

May 9, 2016

As a non-profit organization, the Wikimedia Foundation relies on charitable donations to fund many of its projects, including Wikipedia.  That is why, every few months when you are browsing the Wiki pages, you will see a donation banner pop up asking you to send money.  Wikimedia uses the funds to keep the online encyclopedia running, but also to start new projects.  Engadget reports that Wikimedia is interested in applying natural language processing to Wikipedia in the form of a speech engine: “Wikipedia Is Developing A Crowdsourced Speech Engine.”

Working with Sweden’s KTH Royal Institute of Technology, Wikimedia researchers are building a speech engine to enable people with reading or visual impairments to access the plethora of information housed in the encyclopedia.  To fund the speech engine, the researchers turned to crowdsourcing.  An estimated twenty-five percent of Wikipedia’s monthly users, some 125 million people, stand to benefit from the speech engine.

“‘Initially, our focus will be on the Swedish language, where we will make use of our own language resources,’ KTH speech technology professor Joakim Gustafson said in a statement. ‘Then we will do a basic English voice, which we expect to be quite good, given the large amount of open source linguistic resources. And finally, we will do a rudimentary Arabic voice that will be more a proof of concept.’”

Wikimedia wants to have a speech engine in Arabic, English, and Swedish by the end of 2016; then it will focus on the other 280 languages its projects support.  Usually you have to pay for accurate, decent natural language processing, but if Wikimedia develops a solid speech engine, it might not be much longer before speech interfaces are commonplace.


Whitney Grace, May 9, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Wikipedia Grants Users Better Search

March 24, 2016

Wikipedia is the de facto encyclopedia for separating fact from fiction, although academic circles shun its use (scholars do use it, but never cite it).  Wikipedia does not usually make the news unless the story is tied to its fundraising campaign or it is confused with WikiLeaks releasing sensitive information meant to remain confidential.  The Register tells us that Wikipedia is in the news for another reason: “Reluctant Wikipedia Lifts Lid On $2.5m Internet Search Engine Project.”  Wikipedia is better known for cataloging and disseminating knowledge, but in order to use that knowledge, it needs to be searched.

Perhaps that is why the Wikimedia Foundation is “doing a Google” and will be putting a Knight Foundation grant toward a search-related project.  The Wikimedia Foundation finally released information about the grant, which is dedicated to providing funds for organizations pursuing innovative solutions related to information, community, media, and engagement.

“The grant provides seed money for stage one of the Knowledge Engine, described as “a system for discovering reliable and trustworthy information on the Internet”. It’s all about search and federation. The discovery stage includes an exploration of prototypes of future versions of Wikipedia.org which are “open channels” rather than an encyclopedia, analysing the query-to-content path, and embedding the Wikipedia Knowledge Engine ‘via carriers and Original Equipment Manufacturers’.”

The discovery stage will last twelve months, ending in August 2016.  The biggest risk for the search project would be if Google or Yahoo decided to invest in something similar.

What is interesting is that Wikipedia co-founder Jimmy Wales denied the Wikimedia Foundation was working on a search engine via the Knowledge Engine.  Andreas Kolbe, however, reported in a Wikipedia Signpost article that a search engine is indeed being built; readers had been led to believe it would merely find information spread across the Wikipedia portals, but it appears to be something much more powerful.

Here is what the actual grant is funding:

“To advance new models for finding information by supporting stage one development of the Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet.”

It sounds like a search engine that provides true and verifiable search results, which is what academic scholars have been after for years!  Wow!  Wikipedia might actually be worth a citation now.


Whitney Grace, March 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Artificial Intelligence Competition Reveals Need for More Learning

March 3, 2016

The capabilities of robots are growing but, on the whole, have not yet surpassed a middle-school education. The article “Why AI Can Still Hardly Pass an Eighth Grade Science Test” from Motherboard shares insights into the current state of artificial intelligence as revealed in a recent artificial intelligence competition. Chaim Linhart, a researcher from the Israeli startup TaKaDu, received the first-place prize of $50,000. However, the winning entry scored only 59.3 percent on a series of tasks tougher than the conventional Turing Test. The article describes how the winners utilized machine learning models:

“Tafjord explained that all three top teams relied on search-style machine learning models: they essentially found ways to search massive text corpora for the answers. Popular text sources included dumps of Wikipedia, open-source textbooks, and online flashcards intended for studying purposes. These models have anywhere between 50 to 1,000 different “features” to help solve the problem—a simple feature could look at something like how often a question and answer appear together in the text corpus, or how close words from the question and answer appear.”
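
To make the quoted description concrete, here is a simplified sketch of one such feature in Python. The tiny corpus, question, and scoring rule are toy assumptions, not the competitors’ actual code; the sketch only counts how often question words and a candidate answer co-occur in a sentence:

```python
# A simplified sketch of one search-style "feature": how often words from the
# question and a candidate answer appear together in a text corpus. The corpus,
# question, and scoring rule are toy assumptions.
import re

corpus = [
    "photosynthesis converts sunlight into chemical energy in plants",
    "plants use chlorophyll to capture sunlight for photosynthesis",
    "mitochondria release energy stored in food",
]

def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def cooccurrence_score(question, answer):
    """Count corpus sentences containing a question word and an answer word."""
    q_words, a_words = tokenize(question), tokenize(answer)
    return sum(
        1 for sentence in corpus
        if tokenize(sentence) & q_words and tokenize(sentence) & a_words
    )

question = "What process do plants use to turn sunlight into energy?"
for candidate in ["photosynthesis", "digestion"]:
    print(candidate, cooccurrence_score(question, candidate))
```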

The second- and third-place winners scored within about one percentage point of Linhart’s entry. This may suggest a competitive market when the time comes. Or, perhaps, as the article suggests, nothing very groundbreaking has been developed quite yet. Will search-based machine learning models continue to be expanded and built upon, or will another paradigm be necessary for AI to earn a grade A?

Megan Feil, March 3, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Data Managers as Data Librarians

December 31, 2015

The tools of a librarian may be the key to better data governance, according to an article at InFocus titled, “What Librarians Can Teach Us About Managing Big Data.” Writer Joseph Dossantos begins by outlining the plight data managers often find themselves in: executives can talk a big game about big data, but want to foist all the responsibility onto their overworked and outdated IT departments. The article asserts, though, that today’s emphasis on data analysis will force a shift in perspective and approach—data organization will come to resemble the Dewey Decimal System. Dossantos writes:

“Traditional Data Warehouses do not work unless there is a common vocabulary and understanding of a problem, but consider how things work in academia.  Every day, tenured professors and students pore over raw material looking for new insights into the past and new ways to explain culture, politics, and philosophy.  Their sources of choice: archived photographs, primary documents found in a city hall, monastery or excavation site, scrolls from a long-abandoned cave, or voice recordings from the Oval Office – in short, anything in any kind of format.  And who can help them find what they are looking for?  A skilled librarian who knows how to effectively search for not only books, but primary source material across the world, who can understand, create, and navigate a catalog to accelerate a researcher’s efforts.”

The article goes on to discuss the influence of the “Wikipedia mindset”; data accuracy and whether it matters; and devising structures to address different researchers’ needs. See the article for details on each of these (especially on meeting different needs). The write-up concludes with a call for data-governance professionals to think of themselves as “data librarians.” Is this approach the key to more effective data search and analysis?

Cynthia Murrell, December 31, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

A Fun Japanese Elasticsearch Promotion Video

September 10, 2015

Elasticsearch is one of the top open source search engines and is employed by many companies, including Netflix, Wikipedia, GitHub, and Facebook.  Elasticsearch wants to gain a foothold in the Japanese technology market.  We can see why: Japan is one of the world’s top producers of advanced technology and has a huge consumer base.  Once a technology is adopted in Japan, you can bet it will see even wider adoption elsewhere.
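
For readers who have not used it, here is a minimal sketch of what working with Elasticsearch looks like over its REST API, using Python’s requests library. The host, index name, and document are assumptions for illustration, and exact endpoint paths vary by Elasticsearch version:

```python
# A minimal sketch of indexing and searching a document with Elasticsearch's
# REST API. The host, index name, and document are illustrative assumptions.
import requests

ES = "http://localhost:9200"

# Store a JSON document in the "articles" index.
requests.put(
    f"{ES}/articles/_doc/1",
    json={"title": "Elasticsearch Product Video", "language": "ja"},
)

# Full-text search over the indexed documents (results appear once the
# index has refreshed, typically within a second).
resp = requests.post(
    f"{ES}/articles/_search",
    json={"query": {"match": {"title": "elasticsearch"}}},
)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```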

The company has launched a Japanese promotional campaign and uploaded a video entitled “Elasticsearch Product Video” to its YouTube channel.  The video carries Japanese subtitles and features appearances by CEO Steven Schuurman, VP of Engineering Kevin Kluge, Elasticsearch creator Shay Banon, and VP of Sales Justin Hoffman.  The video showcases how Elasticsearch is open source software, how it has been integrated into many companies’ frameworks, its worldwide reach, product improvement, as well as the good it can do.

Justin Hoffman said, “I think the concept of an open source company bringing a commercial product to market is very important to our company.  Because the customers want to know on one hand that you have the open source community and its evolution and development at the top of your priority list.  On the other hand, they appreciate that you’re innovating and bringing products to market that solve real problems.”

It is a neat video that runs down what Elasticsearch is capable of; the only complaint is the bland music in the background.  The company could benefit from licensing the Jive Aces’ “Bring Me Sunshine,” which sets the proper mood.

Whitney Grace, September 10, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Quality Peer Reviews Are More Subjective Than Real Science

July 16, 2015

Peer-reviewed journals are supposed to carry an extra degree of authority, because a team of experts has read and critiqued each work.  Science 2.0 points out in the article “Peer Review Is Subjective And The Quality Is Highly Variable” that peer-reviewed journals might not be worth their weight in opinions.

Peer reviews are supposed to be objective criticisms of work, but personal beliefs and political views have been working their way into the process for some time.  It should not come as a surprise, since academia has been plagued by this problem for decades.  The problem has also been discussed, but peer review shortcomings are brushed under the rug.  In true academic fashion, someone is conducting a test to determine how reliable peer review comments are:

“A new paper on peer review discusses the weaknesses we all see – it is easy to hijack peer review when it is a volunteer effort that can drive out anyone who does not meet the political or cultural litmus test. Wikipedia is dominated by angry white men and climate science is dominated by different angry white men, but in both cases they were caught conspiring to block out anyone who dissented from their beliefs.  Then there is the fluctuating nature of guidelines. Some peer review is lax if you are a member, like at the National Academy of Sciences, while the most prominent open access journal is really editorial review, where they check off four boxes and it may never go to peer review or require any data, especially if it matches the aesthetic self-identification of the editor or they don’t want to be yelled at on Twitter.”

The peer review problem is getting worse in the digital landscape.  There are suggested solutions, such as banning all fees associated with academic journals and databases or homogenizing review criteria across fields, but these would leave the problems far from corrected.  Reviewers are paid to review works, which likely involves kickbacks of some kind.  Also, getting different academic journals, much less different fields, to standardize on an issue will take a huge amount of effort, if they can come to any sort of agreement at all.

Fixing the review system will not be done quickly, and anytime money is involved, the process slows even further.  In short, academic journals are far from objective, which is why it pays to do your own research and take everything with a grain of salt.


Whitney Grace, July 16, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
