Virtual China: Beefing Up

April 3, 2020

I want to keep this brief. “Tencent to Build AI Supercomputing Center, Industrial Base in Shanghai.” So what’s new? The write up states:

The internet titan and the city’s Songjiang district government signed a deal today to deepen collaboration in areas such as AI…

DarkCyber noted this checklist:

The center will undertake various large-scale AI algorithm calculations, machine learning, image processing, and scientific and engineering computing tasks based on Tencent’s AI capabilities, and provide cloud computing services to the whole of society with data processing and storage capabilities…

Edge computing? Smart manufacturing? Intercept and data analytics?

Check, check, check.

Stephen E Arnold, April 3, 2020

Forget Weak Priors, Certain Predictive Methods Just Fail

April 2, 2020

Nope. No equations. No stats speak. Tested predictive models were incorrect.

Navigate to “Researchers Find AI Is Bad at Predicting GPA, Grit, Eviction, Job Training, Layoffs, and Material Hardship.” Here’s the finding, which is delightfully clear:

A paper coauthored by over 112 researchers across 160 data and social science teams found that AI and statistical models, when used to predict six life outcomes for children, parents, and households, weren’t very accurate even when trained on 13,000 data points from over 4,000 families.

So what? The write up states in the form of a quote from the author of the paywalled paper:

“Here’s a setting where we have hundreds of participants and a rich data set, and even the best AI results are still not accurate,” said study co-lead author Matt Salganik, a professor of sociology at Princeton and interim director of the Center for Information Technology Policy at the Woodrow Wilson School of Public and International Affairs. “These results show us that machine learning isn’t magic; there are clearly other factors at play when it comes to predicting the life course.”

We noted this comment from a researcher at Princeton University:

In the end, even the best of the over 3,000 models submitted — which often used complex AI methods and had access to thousands of predictor variables — weren’t spot on. In fact, they were only marginally better than linear regression and logistic regression, which don’t rely on any form of machine learning.
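For readers who want to see what “only marginally better” means in practice, here is a minimal sketch, not the study’s code. It assumes one common way to score predictions: holdout R², which compares a model’s squared error on unseen cases with the error from simply predicting the training average. The numbers are invented toy values, and `complex_pred` is a hypothetical stand-in for the output of a fancier model.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: y is approximated by a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def r2_holdout(y_true, y_pred, train_mean):
    """1 - SSE(model) / SSE(always predict the training mean).

    A score of 0 means the model is no better than guessing the average.
    """
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - train_mean) ** 2 for y in y_true)
    return 1 - sse / sst


# Invented toy data, NOT from the study.
train_x, train_y = [0, 1, 2, 3], [1, 3, 2, 4]
test_x, test_y = [4, 5], [3, 6]

a, b = fit_line(train_x, train_y)
baseline_pred = [a + b * x for x in test_x]

# Hypothetical predictions from a more complex model (assumed values).
complex_pred = [4.4, 5.4]

train_mean = mean(train_y)
baseline_r2 = r2_holdout(test_y, baseline_pred, train_mean)
complex_r2 = r2_holdout(test_y, complex_pred, train_mean)
gain = complex_r2 - baseline_r2  # how much the fancy model adds over the simple one

print(f"baseline R2={baseline_r2:.3f}  complex R2={complex_r2:.3f}  gain={gain:.3f}")
```

The point of the comparison is the `gain` line: if a model with thousands of predictors adds only a sliver over a straight line, the extra machinery is hard to justify.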

Several observations:

  1. Nice work AAAS. Keep advancing science with a paywall germane to criminal justice and policeware.
  2. Overinflation of the “value” of outputs from models is common in marketing. DarkCyber thinks that the weaknesses of these methods need more than a few interviews with people like Cathy O’Neil, author of Weapons of Math Destruction.
  3. Are those afflicted with innumeracy willing to delegate certain important actions to procedures which are worse than relying on luck, flipping a coin, or Monte Carlo methods?

Net net: No one made accurate predictions. Yep, no one. Thought-stimulating research with implications for predictive analytics adherents. This open source paper provides some of the information referenced in the AAAS paper: “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.”

Stephen E Arnold, April 2, 2020

A Cheerleading Routine for AI

April 2, 2020

We have come across a good example of cheerleading with minimal facts. Rah rah for AI, cries the SmartData Collective in their write-up, “Experts Debunk the Biggest Myths About AI in Business.” Writer Sean Mallon begins by noting how fast the AI market is growing, which is indeed to be expected given recent developments (and hype). He declares the growth is due to businesses that comprehend how powerful a tool AI is. He writes:

“Companies are now increasing the adoption of this technology in a range of different industries, which covers diverse sectors such as healthcare, finance, marketing and more. Through the incorporation of AI, industries have seen major shifts in how they run. While the true potential of AI is now being recognized by businesses from all different sectors, many myths have floated around causing skepticism and unnecessary fear over this transformative technology. If AI is to reach its true potential in businesses across all industries, it’s important to explore, and further debunk, these common misconceptions.”

The piece magnanimously helps any reluctant companies see the light by deflating these “myths:” that AI steals jobs, that AI is hard to integrate, and, most dastardly, that AI may be unnecessary. On that last point, Mallon asserts:

“This is perhaps one of the biggest myths currently circulating around industries today, limiting businesses from unlocking their true potential. AI technology is increasingly becoming a part of daily life, especially in the business sector, boosting its productivity and furthering its growth and success. Companies everywhere are using AI to gain a competitive advantage, helping their business to work smarter and faster than those around them.”

For some, I’m sure that is the case; for others, not so much. Business is just too complex for such absolutes. As always, the best bet is to ignore the hype, know your organization’s needs and the capabilities of available software, and mix and match accordingly.

Cynthia Murrell, April 2, 2020

Big Data Gets a New Term: DarkCyber Had to Look This One Up

April 2, 2020

In our feed this morning (April 1, 2020) we skipped over the flood of news about Zoom (a Middle Kingdom inspired marvel), the virus stories (output by companies contributing their smart software to find a solution), and the trend of Amazon bashing (firing a worker who wanted to sanitize a facility and Amazon’s organizational skills are wobbling).

What stopped our scanning eyes was “Why Your Business May Be on a Data-Driven Coddiwomple.” DarkCyber admits that one of our team wrote a story for an old school publisher which used the word “cuculus” in its title, “Google in the Enterprise 2009: The Cuculus Strategy.” A “cuculus,” as you probably know, gentle reader, is a remarkable bird, sort of a thief.

But Coddiwomple? That word means to travel in a purposeful manner toward a vague destination. Most of the YouTube train rides and the Kara and Nate trips qualify. Other examples include the aimless wandering of enterprise search vendors who travel to the lands of customer service, analytics, and business process engineering, only occasionally returning to their home base in the 50 year old desert of proprietary enterprise search.

What’s the point of “Why Your Business May Be on a Data-Driven Coddiwomple”? DarkCyber believes the main point is valid:

In practical terms the lack of clarity on the starting point can involve a lack of vision into what the specific objectives of the team are, or what human resources and skills are already in house. Meanwhile, the diverse and siloed stakeholders in a “destination” for the data-driven endeavor may all have slightly different ideas on what the result should be, leading to a divergent and fuzzy path to follow.

In DarkCyber’s lingo, these data and analytics journeys are just hand waving and money spending.

Are businesses and other entities data driven?

Ho ho ho. Most organizations are not sure what the heck is going on. The data are easy to interpret, and no fancy, little understood analytics system is needed to figure out that an iceberg has nicked the good ship Silicon Lollipop.

There are interesting uses of data and clever applications of systems and methods that are quite old.

Like the cuculus, opportunism is important. The coddiwomple is a secondary effect. The cuculus gets into a company’s nest and raises money consumers. When the money suckers are bigger, each flies to another nest and the cycle repeats.

Data driven is a metaphor for doing something even though results are often difficult to explain: Higher costs, increased complexity, and an inability to adapt to the business environment.

I support the cuculus inspired consultants. The management of the nest can enjoy the coddiwomple as they seek a satisfying place to begin again.

Stephen E Arnold, April 2, 2020

Semantic Search: From Whence to What

April 2, 2020

A post from semantic SEO firm InLinks traces “The Evolution of Semantic Search.” The buzzword-filled summary does relate an interesting saga, which prompts us to wonder why enterprise search results are generally still pretty poor.

The write-up traces the evolution from the card-catalogue-like directories of early Yahoo to today’s semantic search. Along the way it details these concepts and milestones: directory-based search vs. text-based search; the crawl and discover phase; JavaScript challenges; turning text into math; the continuous bag of words (CBOW) and n-grams; vectors; semantic markup; and trusted seed sets. See the post for elaboration on any of these headings.
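To make “turning text into math” concrete, here is a minimal sketch, not code from the InLinks post. It assumes the simplest version of the idea: count words or n-grams in each text, treat the counts as a vector, and compare vectors with cosine similarity. This is a plain bag-of-words count, not the neural continuous bag of words model, and the function names are ours.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real systems use richer tokenizers."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n=1):
    """Count n-grams, ignoring where in the text each one occurred."""
    return Counter(ngrams(tokenize(text), n))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = bag_of_ngrams("enterprise search")
doc_a = bag_of_ngrams("enterprise search results are poor")
doc_b = bag_of_ngrams("the cuckoo is a remarkable bird")

score_a = cosine(query, doc_a)  # shares both query words
score_b = cosine(query, doc_b)  # shares no words with the query

print(f"doc_a: {score_a:.3f}  doc_b: {score_b:.3f}")
```

Passing `n=2` to `bag_of_ngrams` counts word pairs instead of single words, which keeps a little of the word order that a pure bag of words throws away.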

The piece concludes:

“We started the journey of search by discussing how human-led web directories like Yahoo Directory and the Open Directory Project was surpassed by full-text search. The move to Semantic search, though, is a blending of the two ideas. At its heart, Google’s Knowledge-based extrapolates ideas from web pages and augments its database. However, the initial data set is trained by using ‘trusted seed sets’. the most visible of these is the Wikipedia foundation. Wikipedia is curated by humans and if something is listed in Wikipedia, it is almost always listed as an entity in Google’s Knowledge Graph. … So in many regards. the Knowledge Graph is the old web Directory going full circle. The original directories used a tree-like structure to give the directory and ontology, whilst the Knowledge Graph is more fluid in its ontology. In addition, the smallest unit of a directory structure was really a web page (or more often a website) whilst the smallest unit of a knowledge graph is an entity which can appear in many pages, but both ideas do in fact stem from humans making the initial decisions.”

Here is where we are reminded of the post’s source: for the SEO platform, the takeaway is that what Google considers an “entity” has become key to effective SEO marketing. For our part, we look forward to the continuation of the saga, hopefully resulting in truly effective enterprise search solutions. Some day.

Cynthia Murrell, April 2, 2020

Nervous about AI? Google Uses It and You Do Too

April 2, 2020

Despite the deployment of smart speakers, virtual assistants, language translation automation, and many other technologies we use every day, AI still feels like a future innovation. We are probably stuck on the idea that AI means walking, talking robots, but AI, in fact, is already part of our daily lives. Techni Pages wrote, “5 Uses Of Advanced AI Already Being Used By Google” to demonstrate how AI is currently being used.

Have you ever sent a text message using the voice-to-text feature on your mobile phone? Surprise, that is a form of AI! Human language is very complex, and in order for machines to understand it, Google uses Deep Neural Networks to model language sounds. Current endeavors aim to make voice-to-text faster, better at filtering out noise, and more accurate.

Google Maps is another huge AI project. Powered by real time predictions, Google Maps delivers the fastest route to destinations. It takes into consideration accidents, traffic, and construction so users can avoid those hindrances. The Google Assistant is another AI tool that acts as your own personal assistant to perform Internet searches, schedule appointments, set reminders, and make simple notes. Gmail also uses AI to categorize emails and filter spam from your inbox.

Google offers the Cloud AutoML too:

“The Cloud AutoML is an advanced AI that helps developers to create other AI smart solutions. The machine learning models are of high quality and enable developers to create AI that suits their business needs. Cloud AutoML has state-of-the-art performance and also enables the machine learning to happen with minimal effort since it uses neural architecture search technology and transfer learning.”

Google is an industry leader in developing innovative AI tools. The AI tools we use might not be robots, but they are very helpful.

Whitney Grace, April 2, 2020

April Surprise: PhpSearch Images

April 1, 2020

For an interesting search experience, navigate to this link which is powered by the SRCH2 search system. The content available from the search box inside the fish is interesting. Running queries on the image search system can be particularly interesting.

I suppose I could provide some queries for you to test, but I will leave that to you, gentle reader.

The SRCH2 technology has been around for a number of years. I tracked down the company when I was working on the New Landscape of Search, but I decided not to include the company because it was focusing on mobile.

For information about the company, navigate to this link.

Stephen E Arnold, April 1, 2020

Not a Joke: More of a Commentary on Allegedly Smart PhDs

April 1, 2020

Trigger warning: This is not about search, cybercrime, intelware, or any of the other hobby horses I flog each day as I have since 2008.

Before I highlight the real news item from the “we beg for dollars” outfit the Guardian, try to answer these questions:

  • Did the PhD get his degree online?
  • Did the PhD understand the equation F = q₂B₁v₂ sin θ?
  • Did the PhD think that people would shove ceramic magnets up their noses?

Okay, now navigate to “Astrophysicist Gets Magnets Stuck Up Nose While Inventing Coronavirus Device.” The allegedly accurate write up states:

Australian Dr Daniel Reardon ended up in hospital after inserting magnets in his nostrils while building a necklace that warns you when you touch your face.

The newspaper provides a number of details. Here’s one:

Before attending the hospital, Reardon attempted to use pliers to pull them out, but they became magnetized by the magnets inside his nose.

You too can get a PhD online, impress your friends, and invent new things. Darwin award nominee?

Stephen E Arnold, April 1, 2020

No Fooling: Copyright Enforcer Does Indexing Too

April 1, 2020

The Associated Press is one of the oldest, most respected, and most widely read news services in the world. As more than half the world reads Associated Press content, it makes one wonder how the news service organizes and distributes its content. Synaptica has more details in the article, “Synaptica Insights: Veronika Zielinska, The Associated Press.”

Veronika Zielinska has a background in computational linguistics and natural language. She was interested in how automated tagging, taxonomies, and statistical engines apply rules to content. She joined the Associated Press’s Information Management team in 2005, then moved up to the Metadata Technology team. Her current responsibilities include developing the Metadata Services platform, fine-tuning search quality and relevancy for content distribution platforms, schema design, data transformations, analytics and business intelligence programs, and developing content enrichment methods.

Zielinska offers information on how the Associated Press builds a taxonomy:

“We looked at all the content that AP produced and scoped our taxonomy to cover all possible topics, events, places, organizations, people, and companies that our news production covered. News can be about anything – it’s broad, but we also took into account there are certain areas where AP produces more content than others. We have verticals that have huge news coverage – this can be government, politics, sports, entertainment and emerging areas like health, environment, nature, and education. Looking at the content and knowing what the news is about helps us to develop the taxonomy framework. We took this content base and divided the entire news domain into smaller domains. Each person on the team was responsible for their three or four taxonomy domains. They became subject and theme matter experts.”

The value of the Associated Press’s taxonomies comes from the entire content package, which includes everything from photos and articles to videos, all centered around descriptive metadata that makes it agreeable and findable.

While the Associated Press is a non-profit news service, it does offer a platform called AP Metadata Services that is used by other news services. The Associated Press frequently updates its taxonomy with new terms when they enter the media. The AP taxonomy team works with the AP Editorial team to identify new terms and topics. The biggest challenges Zielinska faces are maintenance and writing rules in a manner that the natural language processing algorithms can understand.
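For a rough feel of what rule-based tagging against a taxonomy involves, here is a minimal sketch. It is not AP’s system: the topics, the trigger terms, and the `tag` function are all hypothetical, and a production tagger would use far richer rules than whole-word matching. It does show why maintenance matters, because a story goes untagged until someone adds the new term.

```python
import re

# Hypothetical mini taxonomy: topic -> trigger terms (assumed for illustration).
taxonomy = {
    "Politics": ["election", "senate", "parliament"],
    "Sports": ["football", "olympics"],
    "Health": ["virus", "vaccine", "hospital"],
}


def tag(text, taxonomy):
    """Return the sorted topics whose trigger terms appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(topic for topic, terms in taxonomy.items() if words & set(terms))


headline = "Coronavirus cases rise as senate debates relief"

before = tag(headline, taxonomy)  # "coronavirus" is not yet a known term

# Maintenance step: a new term enters the media and is added to the taxonomy.
taxonomy["Health"].append("coronavirus")

after = tag(headline, taxonomy)

print(before, after)
```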

As for the future, Zielinska fears news services losing their budgets, local news not getting as much coverage, and the spread of misinformation. The biggest problem is that automated technologies can take the misinformation and disseminate it. She advises, “Managers can help by creating standardized vocabularies for fact checking across media types, for example, so that deep fakes and other misleading media can be identified consistently across various outlets.”

Whitney Grace, April 1, 2020

Swagiggle? Nope, Not an April Fooler

April 1, 2020

Big ecommerce sites like eBay and Amazon depend on a robust, accurate, and functional search engine. Without a powerful search application, searching for items on eBay and Amazon is like looking through every page of a printed catalog. The only difference is that there are millions of items compared to the thousands in one catalog. Amazon and eBay are not always accurate, especially when users edit and add content without being monitored. That means there is room for improvement and for a startup to worm its way into the big leagues. Swagiggle describes itself this way:

“Swagiggle is a precision shopping search and product discovery website created by WAND, Inc. to demonstrate the capabilities of its taxonomy based product data organization and enrichment abilities featured in the WAND eCommerce Taxonomy Portal and PIM. WAND, Inc. is the world’s leading provider of pre-defined taxonomies, including the WAND Product and Service Taxonomy.

Have you ever had the experience of going to a category on an online retail site and seeing mis-categorized items? Or, a bunch of items dumped into a catch-all “Accessories” category. At Swagiggle, our goal is to provide accurate and specific categories so that our users can quickly find exactly the products they are looking for. From there, we assign product specifications so that users can filter through the items in a category and find exactly what they want.”
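The approach described in the quote, specific categories first and product specifications as filters second, can be sketched in a few lines. This is a minimal illustration under our own assumptions, not WAND’s implementation: the `Product` class, the category paths, and the sample catalog are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    category: tuple  # path from broad to specific, e.g. ("Clothing", "Shoes")
    specs: dict = field(default_factory=dict)


def in_category(product, path):
    """True if the product sits at or below the given category path."""
    return product.category[:len(path)] == tuple(path)


def filter_products(products, path, **required_specs):
    """Drill down to a category, then keep products matching every requested spec."""
    return [
        p for p in products
        if in_category(p, path)
        and all(p.specs.get(key) == value for key, value in required_specs.items())
    ]


# Hypothetical catalog for illustration only.
catalog = [
    Product("Trail Runner", ("Clothing", "Shoes"), {"color": "blue", "size": 10}),
    Product("City Loafer", ("Clothing", "Shoes"), {"color": "brown", "size": 10}),
    Product("Rain Jacket", ("Clothing", "Outerwear"), {"color": "blue"}),
    Product("Floor Cleaner", ("Cleaning", "Floor Care"), {"volume_l": 1}),
]

shoes = filter_products(catalog, ("Clothing", "Shoes"))
blue_shoes = filter_products(catalog, ("Clothing", "Shoes"), color="blue")

print([p.name for p in shoes], [p.name for p in blue_shoes])
```

Storing the category as a path makes the catch-all “Accessories” problem visible: an item filed under a vague top-level bucket never shows up when a shopper drills into a specific branch.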

WAND’s Swagiggle sounds like an awesome product. Using products from its clients, Swagiggle offers an online catalog for users to search for products they wish to buy. These products range from clothing to cleaning products. The items are organized by large categories, then users can drill down to specific items or search with key words. It is a pretty standard search engine, but it has one major problem. The drilling down aspect does feel dated, and half the time pictures and content would not load. The loading time is extraordinarily long too. Plus, due to the variety of its clients, the items offered on Swagiggle are very random. Swagiggle needs to fix the broken pictures and figure out how to make itself faster.

Whitney Grace, April 1, 2020
