Facebook Wants to Help Wikipedia with Factoid Accuracy

September 12, 2022

Yes, Facebook is an arbiter of truth.

Researchers have a love-hate relationship with Wikipedia. They love that it is a constantly updated, digital encyclopedia with quick search and reference tools, but hate its inaccuracies. SingularityHub discusses how Facebook wants to change Wikipedia’s unreliability: “Meta Is Building An AI To Fact-Check Wikipedia-All 6.5 Million Articles.”

Wikipedia’s editors wrote: “The online encyclopedia does not consider itself to be reliable as a source and discourages readers from using it in academic or research settings.” Most basic information on Wikipedia is true, and double-checking it is wise, but most people do not bother. Facebook, armed with its new Meta facade, is working on an AI to verify all of Wikipedia’s information.

The AI would fact-check the information in the articles, but it works differently than expected:

“Meta’s model will ‘understand’ content not by comparing text strings and making sure they contain the same words, but by comparing mathematical representations of blocks of text, which it arrives at using natural language understanding (NLU) techniques. ‘What we have done is to build an index of all these web pages by chunking them into passages and providing an accurate representation for each passage,’ Fabio Petroni, Meta’s Fundamental AI Research tech lead manager, told Digital Trends. ‘That is not representing word-by-word the passage, but the meaning of the passage. That means that two chunks of text with similar meanings will be represented in a very close position in the resulting n-dimensional space where all these passages are stored.’”
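
For readers who want to see what “a very close position in n-dimensional space” might look like, here is a minimal sketch. It is not Meta’s code. The three-number vectors are invented by hand for illustration; a real system would get its vectors from a trained language model with hundreds of dimensions. The names `PASSAGES`, `cosine_similarity`, and `closest_passage` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made embeddings (assumption: a trained encoder
# would produce these in practice). The first two passages share a
# meaning but few words; the third is unrelated.
PASSAGES = {
    "The senator was born in Ohio in 1950.": [0.90, 0.10, 0.00],
    "Ohio is where the lawmaker entered the world in 1950.": [0.85, 0.15, 0.05],
    "The recipe calls for two cups of flour.": [0.00, 0.20, 0.95],
}


def closest_passage(query_vector, passages=PASSAGES):
    """Return the stored passage whose vector is nearest to the query."""
    return max(
        passages,
        key=lambda text: cosine_similarity(query_vector, passages[text]),
    )
```

The two birthplace sentences score close to 1.0 against each other even though they share few words, while the flour sentence scores near zero. That is the gist of matching on meaning rather than on text strings.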

Thankfully the AI’s learning dataset of four million Wikipedia citations is cleaner and better than what other AI systems have learned from in the past. The dataset is constantly being updated. The developers are also teaching the AI how to distinguish a reliable source from a bad one, e.g., a scientific paper vs. a conspiracy theory article.

The Meta team said no one has used AI to verify Wikipedia’s information before. It is great that Facebook is doing the world a favor by fact-checking Wikipedia, but what will Facebook correct in Wikipedia’s information about Facebook?

Whitney Grace, September 12, 2022

Microsoft: Explaining Its Cloud Policies and Revealing Its Thought Processes

September 12, 2022

After I graduated from a so-so university, some other academic entity paid me money to work on a PhD. As part of the deal, I had to teach one class in freshman composition. The students were working like pious beavers to become nuns, priests, and I suppose capable professionals in a religious bookstore or some similar line of work.

I read some wild and crazy essays: Truth: The Path to Salvation, Faith: The Rock in the Thunderstorm of Life, etc etc. I was transported back to my small apartment behind a big estate type house and correcting the type of errors Grammarly eliminates. No computers in 1967 that would fit in my roomy 700 square feet.

The essay which caught my attention is — in modern lingo — a blog post. Its title lacks the metaphorical impact of those freshman essays but the content is quite remarkable.

First, the title: “New Licensing Benefits Make Bringing Workloads and Licenses to Partners’ Clouds Easier.” The main idea is that Microsoft wants to demonstrate that it is not really a quasi-monopoly. Nope, it learned its lesson when Mr. Gates’ testimony successfully thwarted the US government decades ago. Who knew he was a gifted rhetorician or a word-meister capable of The Road Ahead?

The blog title is interesting because it talks about benefits. The idea is that Microsoft wants to make life easier. You know. Just like the Windows 11 changes for the corporations who deploy the operating systems to one or two employees. No big deal. Just add annoyances and kill printing. But the payoffs addressed in the blog “essay” require some linguistic calisthenics.

Here’s a sampling:

CSP or cloud solution provider
Easier
Ecosystem
Empower
Ensure
Excited
Exciting
Flexible virtualization
Hosted
Joint success
Outsourcers
QMTH or qualified multitenant hosting
QOS or qualifying operating system
SPLA or service provider licensing agreements
Scenarios
Virtual core
Workloads

What does the word choice suggest? I am suspicious. How can a giant corporation with a stellar track record of delivering software which often does not work care so much about a provider? What is a provider? A good shepherd, a rock in a storm, a beacon to salvation?

Third, I noted a fascinating but very tiny asterisk in the section title “More Flexibility and Options for Software Outsourcing.” The asterisk points to the foot of the blog essay. Listed at that point are the companies not allowed to get paid to let customers put Microsoft software on these alien, and apparently inappropriate, computer systems. You want multi cloud? You want freedom to run the software for which you pay where you want to run it? Ho ho ho. Not unless a regulator shows some moxie.

Who are the dark and threatening cloudies? Here’s the list with the tiny asterisk:

  • Alibaba
  • Amazon Web Services
  • Google
  • Microsoft.

See, Microsoft puts Microsoft on its own list. How can a giant company be more fair? Impossible to outdo this path to salvation.

Fourth, information which strikes me as important appears toward the end of the blog post; to wit:

At its inception, SPLA was intended to allow partners to offer hosted services from their own datacenters, not for managed service providers buying through SPLA to host on others’ datacenters. We are making changes to the SPLA program, starting in October 2022, to better align with the program’s intent, and with other commercial licensing programs.

Observations:

  1. Microsoft is scrambling to be on the side of its partners and customers but, to me, mostly the customers
  2. The European Union is likely to be confused by the language of the blog post but will muddle through and continue the crackdown on the US technology companies and their business practices
  3. The Microsoft partners need to generate revenue with Microsoft generating leads, engineering service opportunities, and positioning that maximizes the benefit of many happy Windows, Word, and Teams users.

Net net: Not an F, but I would score the write up as a C minus or D plus. The split infinitive in the blog post was bad. But the tiny asterisk and redlining estimable companies like Alibaba, Amazon, and Google. Clumsy clumsy.

Stephen E Arnold, September 12, 2022

Meta: Grade School Behavior?

September 9, 2022

Despite being the domain of Baby Boomers and conspiracy theorists, Facebook is still a powerful tech company. Facebook is not afraid to sell out its users despite proclamations of loving support, and AppleInsider discusses its hypocritical behavior: “Facebook Is Fine When Punishing Others Financially, But Cries When Others Do It To Them.”

Zuckerberg and his company exhibit behavior similar to that of an elementary school bully: the company acts big and tough, but when it is confronted and injured, it runs away crying. Facebook is acting like the aforementioned bully because Apple has affected its profits. Apple changed its privacy policies, thus preventing Facebook from harvesting dollars from user data.

Despite Facebook claiming it does not sell data, it does. Apple added the Apple App Tracking Transparency to mobile devices, so users can prevent third-party Web sites (i.e. Facebook) from sharing data. Facebook did not like that, so Zuckerberg threatened to take Apple to court, then changed his mind. He decided to ruin other companies’ bottom lines to save his own. Facebook released data that showed users are reading less, so the company will not pay publishers for news articles.

This has led large media companies to fire writers and switch over to video production:

“But Facebook had exaggerated its figures by between 150% and 900%. Facebook denies this, but it later settled a lawsuit brought by advertisers over the issue. Facebook paid out $40 million then, but some publishers who had pivoted to video simply could not move back and did not recover. While there are forces beyond Facebook that contributed to this, the University of North Carolina said that even before the coronavirus, 20 newspaper businesses were closing every month.”

Facebook has now turned to VR, but is riding on the backs of creators to drive funding for the platform. Facebook was critical of Apple’s 30% commission fee from its app store purchases, so when Apple discovered Facebook’s Meta fees they had to say something:

“’Now — Meta seeks to charge those same creators significantly more than any other platform,’[said Apple Senior Director of Corporate Communications Fred Sainz.] ‘[Meta’s] announcement lays bare Meta’s hypocrisy. It goes to show that while they seek to use Apple’s platform for free, they happily take from the creators and small businesses that use their own.’”

Would a neutral observer use the word “hypocritical” to describe some Meta actions? Sure, and that observer may add the term “zucker squeeze.”

Whitney Grace, September 4, 2022

Google: Adulting Becomes a Thing

September 8, 2022

My goodness, it has taken more than 20 years for the Backrub-inspired search and ad company to embrace adulting. This term takes a noun like adult and converts it to a verb. This English trick is one that thrills English as a Second Language students. What I am going to do is equate “adulting” with the management precepts of Peter Drucker. Now you see why figuring out what I am saying and not saying is so darned unusual.

First, however, we need some context. That estimable source of real news (Fox) published this story: “Google CEO Sundar Pichai Looking to Improve Tech Giant’s Efficiency.” The Big Dog of the Google is participating in explainers to the tech worshipers that the time is now for adulting. The idea is that the Google is under pressure from several different hypercube vectors; for example:

  1. The lovable and enlightened Amazon with its newfound clicks from product search and a corresponding surge in product related advertising
  2. That affable crowd in Cupertino who are taking steps to make sure the walled garden does not allow Googzilla too much room in which to cause mischief
  3. Those with-it regulators and elected officials in governments near and far who don’t understand how making money on ads as the saloon swinging door with a charge to come in and leave works for the benefit of anyone except the Google
  4. Wizards who find themselves orthogonal to Google’s personnel postures. Yep, Dr. Timnit Gebru et al. “Disagree and Begone” could become a new Xoogler T shirt for diversity conference attendees
  5. Technical debt, which — despite Google’s mostly not talking about it — continues to incur some hefty costs. One can fire people but one cannot do much more than sell data center gear on eBay or Swappa
  6. High school management methods. I have explained this concept in previous posts so use the search box and read the explanation, please. The new idea is that the best high school science club members will not want to work at the Google. Yikes. Regressing toward the mean maybe?

What did the Big Dog say is the future of Google?

One big point is that the 20 percent frittered away on the dorm notion of one day a week of other stuff is over. Now Googlers have to work like a person on the Ford assembly line in 1937. Punch in, do stuff that matters, and punch out. No output, no pay. Simple. I remember reading that programmers write code about 30 minutes a day. What are these wizards going to do in the other 7.5 hours? Well, Foosball, table tennis, and volleyball may be difficult when the kid toys are removed. Google is a place for real work. What is that work? Well, Google doesn’t explain too much, but I assume it is quantifiable, good for humankind, fair, equitable, and unbiased just like Snorkel automated training data.

Another point is that the new Google sets priorities. I think priorities are useful. Why have a couple dozen messaging apps and smart software that displays ads totally unrelated to either the content of a YouTube video or to the interests of a Google customer who pays for Google services? I suppose Google has given up on solving death, which, as I understood the project, was a priority.

I also noted that Google is moving more slowly. My experience suggests that what went quickly was work blessed by the senior management. Some employees are left to their own devices to learn how Google works, snag a project, and produce something that makes money. In order to set priorities, one has to do the Drucker type work. Is that type of thinking in the Google incentive plan?

To sum up: Google is in danger of having to face life as an ageing sled dog or arthritic Googzilla. Maybe some of the “solve death” research can rejuvenate the behemoth before the snow piles up and Googzilla moves even more slowly.

Stephen E Arnold, September 8, 2022

Microsoft and Opaque Clarity

September 7, 2022

Ah, Microsoft, we wish we could say we were surprised. HackerNoon explains “How Bing Is Spying on Users Without Their Consent Using Microsoft Clarity.” A companion to Bing Ads, Clarity collects and analyzes how users interact with one’s website. It can detect how long someone spends on each page, for example, or what kind of device they use. While the tool provides helpful information to webmasters, it appears Microsoft is also helping itself to the data. The writer was dismayed to discover Clarity was collecting their users’ information and promptly banished it from their site. We learn:

“Although Microsoft Clarity does not collect personally identifiable information (PII), it does collect data that could be used to personally identify a website visitor. This data includes the visitor’s IP address, which could be used to approximate their geographic location. Clarity also collects data about the visitor’s browser, device, and operating system, which could be used to identify the visitor’s identity or track their online activity. Ask yourself, are you okay with malicious hackers installing Remote Access Trojan (RAT) on your computer and they promise they couldn’t identify who you are and only wanted to study your behavior?”

The only hint to this activity in the Clarity user agreement is the vague notice data may be used for research and development. That can mean a lot of things. The write-up continues:

“You have to specifically tell Bing Ads not to track you by submitting a request form through this form. But even if you do opt-out, there’s no guarantee that your data won’t be collected. If you would like to continue using Bing Ads for conversion tracking, I urge you to fill up the form beforehand and only start using Bing Ads after they have approved opting you out. So if you’re concerned about your privacy, you might want to avoid Bing Ads altogether. Or at the very least, be aware that your every move is being tracked.”

The writer admits the insights Clarity provides can be valuable, but warns they might not be worth the tradeoff. Yes, it is a free tool, but we are reminded that “when you’re not paying for the product, then you are the product.” They suggest choosing an alternative, like those on this list (conveniently hosted on their now Clarity-free website).

Cynthia Murrell, September 6, 2022

YouTube: Podcasts, Vidcasts, Any Old Casts Will Do for Advertising

September 6, 2022

It appears YouTube is eager to jump onto the podcast bandwagon. The Hustle ponders whether “YouTube = Future Podcast Champ?” Maybe, but Google will have to maintain interest; otherwise, another Google Plus type situation may emerge. Writer Juliet Bennett Rylah reports:

“A new podcasts homepage is now available to US users, going live sans fanfare in late July. TechCrunch speculates YouTube is waiting for its creator event next month to make a formal announcement. But YouTube also:

  • Hired podcast exec Kai Chuk in 2021
  • Offered podcasters and networks $50k-$300k to create videos
  • Discussed audio ads and new analytics for audio-centric creators in a leaked document 
  • Partnered with NPR to bring on 20+ of its most popular shows.

Why’s it matter? While YouTube is often seen as a video-first platform, YouTube Music had 2B+ monthly users and 50m+ paid subs as of September 2021. Though competitors including Spotify, Apple, and Amazon have made big moves in the space, a Cumulus Media analysis found YouTube is America’s most popular podcast platform, capturing 24.2% of listeners compared to Spotify’s 23.8% and Apple’s 16%.”

Rylah, fittingly, points us to a podcast for another perspective. On an episode of Marketing Against the Grain, HubSpot’s Kipp Bodnar and Kieran Flanagan assert YouTube subscribers are now the most valuable subscribers on the Internet. They also make a few predictions. For example, the pair believes YouTube’s discovery platform will give its podcasters a leg up. They also suspect the site’s background listening feature is about to become free for everyone, as it currently is in a Canadian pilot program. At the same time, the site may push both podcasts and the brands that support them toward a more visual format. But wouldn’t that just turn them into more video content? What makes a podcast a podcast? Perhaps that is a philosophical question beyond the ken of this humble, text-based content creator.

Cynthia Murrell, September 6, 2022

Ethics Is a Thing in 2022. Oh, Really?

September 5, 2022

When companies toss around the word ethics, I roll my eyes. If I am not mistaken, the high technology luminaries have created an ethical wasteland. Each day more examples of peak a-ethical behavior flow to me in an electronic Cuyahoga River complete with flames, smoke, and nifty aromas. Now consider “ethical smart software.”

“Why Embedding AI Ethics and Principles into Your Organization Is Critical” is an oddity, almost an elegiac prose appeal. On one hand, the essay admits ethical shortcomings exist. I noted:

Universal adoption of AI in all aspects of life will require us to think about its power, its purpose, and its impact. This is done by focusing on AI ethics and demanding that AI be used in an ethical manner. Of course, the first step to achieving this is to find agreement on what it means to use and develop AI ethically.

On the other hand, businesses must embrace ethics. That sounds like a stretch to me.

Just a possibly irrelevant question: What’s ethics mean? And another: What’s artificial intelligence?

No answers appear in the cited article.

What does appear is this statement:

 If you are not proactively prioritizing inclusivity (among the other ethical principles), you are inherently allowing your model to be subject to overt or internal biases. That means that the users of those AI models — often without knowing it — are digesting the biased results, which have practical consequences for everyday life.

Ah, “you.” I would submit that the cost of developing unbiased training data means automated systems for building training data will be adopted and then packaged like sardines. The users of these data and the libraries of off-the-shelf models, numerical recipes, and workflow modules will further distance smart software from the pipes beneath the Pergo floor.

Costs and financial payoff, not the undefined and foggy “AI ethics”, will create some darned exciting social, political, and financial knock on effects. As I recall that bastion of MBA thinking added charcoal starter to the opioid opportunity. The world’s online bookstore struggles to cope with fake reviews and designer purses. The world’s largest online advertising outfit is — well, let’s just say — trying to look past its handling of smart software professionals who disagree with the company’s management about bias in AI/ML.

Quite a write up. The conclusion is swell too:

My organization’s development and use of AI is a minor subsection of AI in our world. We have committed to our ethical principles, and we hope that other technology firms do as well.

Absolutely.

Stephen E Arnold, September 5, 2022

ISPs and Network Providers: The Big Warming

September 5, 2022

On September 14, 2022, I will be sharing some of my team’s research about ISPs and network providers. Coincidentally, the “open” information services are providing interesting, but not yet rock solid, information about the ISP and network provider world. In a sense, figuring out what ISPs and network providers are doing is like looking at distant star data in the Webb space telescope data stream. There is information flowing, but making those data speak clearly is not an easy job.

I read “I Ran the Worlds Largest DDOS for Hire Empire and CloudFlare Helped.” The write up struck me as quite interesting. I circled this passage as interesting but not backed up with footnotes or cheerful hyperlinks:

As the infrastructure provider for over 20% of all www traffic traversing the internet today, CloudFlare is in a position to enforce it’s beliefs on a global scale. Most of the time this isn’t a problem, lots of nefarious websites try to take advantage of the services CloudFlare offers and are rightfully kicked off. The problems arise in a small category of websites that blur the line.

The “blur” seems to say to me: “Hey, we are big and well known, and maybe some bad actors use our service.”

Here’s another sentence which may catch the attention of legal eagles:

As someone who has previously justified their actions by saying “I am not directly causing harm, the responsibility flows downstream to my end users” I can tell you it is a shaky defense at best. The situation would be different if CloudFlare was unaware of the booter websites they are offering protection to, but that is not the case. CloudFlare knows who they are protecting and chooses to continue doing so, being fully cognizant of the end result their actions will have. Let’s talk about that end result because the hypocrisy of it all stings like a slap in the face as I type this. CloudFlare is responsible for keeping booter websites online and operating, the very same websites who’s sole purpose is to fuel CloudFlare’s very own business model, selling DDoS protection.

I am no lawyer and I certainly don’t understand anything other than my dinobaby world. However, it seems as if a big company is allegedly in a position to do more to protect truth, justice, and the American way than it may be doing. Oh, the American way means operating without meaningful oversight, regulation, and the invisible ethical hand that makes stakeholders quiver with glee.

Worth watching what other ISP and network provider examples emerge as the real journalists reach their coffee shops and begin working this subject.

Stephen E Arnold, September 5, 2022

Amazon and Fake Reviews: Ah, Ha, Fake Reviews Exist

September 5, 2022

I read “Amazon’s Delay for the Rings of Power Reviews on Prime Video Part of New Initiative to Filter Out Trolls.” The write up makes reasonably official the factoid that Amazon reviews are, in many cases, more fanciful than the plot of Rings of Power.

The write up states:

The series appears to have been review bombed — when trolls flood intentionally negative reviews for a show or film — on other sites like Rotten Tomatoes, where it has an 84% rating from professional critics, but a 37% from user-submitted reviews. “The Rings of Power” has been fending off trolls for months, especially ones who take issue with the decision to cast actors of color as elves, dwarves, harfoots and other folk of Tolkien’s fictional Middle-earth.

Amazon wants to be a good shepherd for truth. The write up says:

Amazon’s new initiative to review its reviews, however, is designed to weed out ones that are posted in bad faith, deadening their impact. In the case of “A League of Their Own,” it appears to have worked: To date, the show has an average 4.3 out of 5 star rating on Prime Video, with 80% of users rating the show with five stars and 14% with one star.

Interesting. My view is that Amazon hand waves about fake reviews except for those which could endanger its own video product. Agree with me or not, Amazon is revealing that fake reviews are an issue. What about those reviews for Chinese shirts which appear to have been fabricated for folks in the seventh grade? SageMaker, what’s up?

Stephen E Arnold, September 12, 2022

Bots Are Hot

September 2, 2022

Developer Michael I Lewis had noble intentions when he launched searchmysite.net in 2020. Because Google and other prominent search engines have become little more than SEO and advertising ambushes, he worked evenings and weekends to create a search engine free from both ads and search engine optimization. The site indexes only user-submitted personal and independent sites and leaves content curation up to its community. Naturally, the site also emphasizes privacy and is open source. To keep the lights on, Lewis charges a modest listing fee. Alas, even this principled platform has failed to escape the worst goblins of the SEO field. Lewis laments, “Almost All Searches on my Independent Search Engine Are Now from SEO Spam Bots.”

SEO spam lowers the usual SEO trickery into the realm of hacking. Its black hat practitioners exploit weaknesses, like insecure passwords or out-of-date plugins, in any website they can penetrate and plant their own keywords, links, and other dubious content. That spam then rides its target site up the search rankings as long as it can, ripping off marks along the way. If the infiltration goes on for long, the reputation and ranking of the infected website will tank, leaving its owner wondering what went awry. The results can be devastating for affected businesses.

In spring of 2022, Lewis detected a suspicious jump in non-human visitors on searchmysite.net. He writes:

“I’ve always had some activity from bots, but it has been manageable. However, in mid-April 2022, bot activity started to increase dramatically. I didn’t notice at first because the web analytics only shows real users, and the unusual activity could only be seen by looking at the server logs. I initially suspected that it was another search engine scraping results and showing them on their results page, because the IP addresses, user agents and search queries were all different. I then started to wonder if it was a DDoS attack, as the scale of the problem and the impact it was having on the servers (and therefore running costs) started to become apparent. After some deeper investigation, I noticed that most of the search queries followed a similar pattern. … It turns out that these search patterns are ‘scraping footprints’. These are used by the SEO practitioners, when combined with their search terms, to search for URLs to target, implying that searchmysite.net has been listed as a search engine in one or more SEO tools like ScrapeBox, GSA SEO or SEnuke. It is hard to imagine any legitimate white-hat SEO techniques requiring these search results, so I would have to imagine it is for black-hat SEO operations.”
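
The kind of server-log check Lewis describes can be sketched in a few lines. This is an illustration only, not his code. The footprint phrases in `FOOTPRINTS` are assumptions picked as plausible examples; his post, not this sketch, is the authority on which patterns actually showed up. The function names are hypothetical too.

```python
# Hypothetical footprint phrases (assumption: stand-ins for the
# patterns an SEO scraping tool might prepend to its search terms).
FOOTPRINTS = (
    "powered by wordpress",
    "leave a reply",
    "add your comment",
)


def is_footprint_query(query, footprints=FOOTPRINTS):
    """Return True if the query contains any known footprint phrase."""
    normalized = query.lower()
    return any(phrase in normalized for phrase in footprints)


def footprint_share(queries):
    """Return the fraction of queries that look like scraping footprints."""
    if not queries:
        return 0.0
    flagged = sum(1 for query in queries if is_footprint_query(query))
    return flagged / len(queries)


# Example: search strings as they might be pulled from a server log.
sample_queries = [
    '"Powered by WordPress" dog grooming',
    "static site generators",
    "leave a reply payday loans",
    "personal blog about rust",
]
print(footprint_share(sample_queries))
```

A share that creeps toward 1.0 would match what Lewis reports: queries that differ in IP address and user agent yet share the same telltale phrasing.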

Meanwhile, Lewis’ site has seen very little traffic from actual humans. Though it might be tempting to accuse major search engines of deliberately downplaying the competition, he suspects the site is simply drowning in a sea of SEO spam. Are real people browsing the Web anymore, as opposed to lapping up whatever social media sites choose to dish out? A few, but they are increasingly difficult to detect within the crowd of bots looking to make a buck.

Cynthia Murrell, September 2, 2022
