CD? Ignore That, Big Tech AI

February 27, 2026

Another dinobaby post. No AI unless it is an image. This dinobaby is not Grandma Moses, just Grandpa Arnold.

I try to filter the Epstein Epstein Epstein just as I try to block the AI AI AI. However, one of the people on my team showed me this write up: “AIs Can Generate Near-Verbatim Copies of Novels from Training Data.” I am not surprised that a big-time US big tech AI system would spit out text from a source. These companies struck me as outfits that were going to ingest content and let the lawyers run interference. Publishers and individual authors usually lack the fleets of legal eagles available to big tech outfits.


Thanks, Venice.ai. Good enough.

Furthermore, I am not surprised that some people are surprised that these smart software systems are stupid enough to output content that clearly illustrates that their marvels appear to surf on other people’s creative work. Why do I have this view? Before I stepped away from the work fray, I bounced in and out of some Silicon Valley entities. Heck, I worked at one for several years. I was involved in a couple of zippy start-ups and heard people on the team make clear that if something could be done, just do it. “They” won’t figure it out for a long time. The “they,” of course, were users, regulators, law enforcement, morality watchdogs, and moms.

I assume that the author of the article is not aware that some of the big tech outfits are complaining that other big tech outfits are pirating their systems and methods. Yep, the outfits that just took other people’s work are squawking that a big tech company has the unmitigated gall to use another big tech firm’s intellectual property.

My term for this behavior is cynical duplicity. Its characteristics are:

  • Move fast, break things. Reason: Chaos destabilizes and makes meaningful responses quite difficult
  • Just take it. Reason: Most people don’t know that a well-crafted or even a crappy crawler can suck down a lot of data quickly. By the time the source figures out that the data are gone, the data are — well, what do you think? — gone.
  • Sue people who break the law. Reason: Money buys lawyers. Lawyers, many times, just do what the client wants. The entity with the most time and money wins. Period.
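That second point is not an exaggeration. The link-harvesting step at the heart of any crawler takes only a handful of lines. The sketch below uses Python’s standard-library HTML parser; the page content and class name are made up for illustration, and a real crawler would add fetching, queuing, and politeness logic around this core.

```python
from html.parser import HTMLParser

class LinkHarvester(HTMLParser):
    """Collects every href on a page -- the core loop of any crawler."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A stand-in for a fetched page; a crawler would follow each link it finds.
page = '<a href="/book1">One</a> <p>filler text</p> <a href="/book2">Two</a>'
harvester = LinkHarvester()
harvester.feed(page)
print(harvester.links)  # ['/book1', '/book2']
```

Point each harvested link back into a fetch loop and the “suck down a lot of data quickly” part takes care of itself.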

How do I know cynical duplicity is operating? Check out these headlines and stories:

  1. Anthropic says Deepseek, Moonshot, and MiniMax used 24,000 fake accounts to rip off Claude
  2. Google Blocks Antigravity for OpenClaw-Linked AI Ultra Users, Cites “Malicious Usage”
  3. OpenAI Claims Deepseek Distilled US Models to Gain an Edge

Note: I pulled these headlines from Bing News. If the urls 404, contact the estimable Microsoft, not me.

As a dinobaby, I think the focus of the story about smart software spitting out novels is interesting. However, I think the CD or cynical duplicity is the significant aspect of how big tech AI outfits conduct themselves.

Stephen E Arnold, February 27, 2026

The Google Is Very Unhappy: People Are Copying Its Work. Illegal! Unfair! Terrible!

February 18, 2026

green-dino_thumb_thumb3Another dinobaby post. No AI unless it is an image. This dinobaby is not Grandma Moses, just Grandpa Arnold.

Up front I want to point out that I don’t know if the story is spot on. I am assuming that it contains a kernel of truth, maybe a whole stalk of veracity. But I find it quite interesting.

As a preface, some authors (creators) have accused AI companies of using their content to train AI models. One can pump the name “Esther Mahlangu” into Google’s art-making machine Namo Alabama or whatever and get outputs that look like Ms. Mahlangu’s art. How did Google’s smart software see Ms. Mahlangu’s geometric figures? I think Google must have ingested them, added metadata, decomposed them into Gemini building blocks, and happily and without much worry output facsimiles of what looked like Ms. Mahlangu’s designs.


A senior manager at a big tech company with a tight grip on a number of markets is having a hissy fit. The person in the playpen believes that a corrupt, criminal element is taking his toys. The senior executive’s assistant finds the scene as laughable as I do. Thanks, Venice.ai. Good enough. Not a cartoon. No toys in the air. But I don’t expect much from AI.

The operative word is “copy.” Maybe it should be called “harvesting” or “spidering”? From my point of view, Google has copied the intellectual spark and motif of her original and imaginative work. I am not sure Google is in the originality business unless it comes to claims about its quantum supremacy. I grant that seems like a creative bit of wordsmithing. But I keep coming back to the taking, using, and outputting of zeros and ones that are mostly just copies.

Now, what about this write up: “Google: Hackers Are Trying to ‘Clone’ Gemini for Cyberattacks”? The article states:

As the tech industry races to develop new AI models, Google alleges that “private sector” entities have been trying to reverse-engineer its Gemini chatbot by bombarding it with prompts intended to leak its secrets… But the company says it “observed and mitigated frequent model extraction attacks from private sector entities all over the world and researchers seeking to clone proprietary logic.” Model extraction isn’t your typical hacker-led “break-in.” Instead of exploiting a software glitch or infiltrating a corporate network, these attacks leverage legitimate access via Gemini’s API, which Google sells to software developers who want to build their apps around the chatbot.

The write up includes a diagram of how Google was “compromised.” (Didn’t Google buy Mandiant to deal with simple black-box copying? Doesn’t Google use its own AI to defend its AI?) The write up states:

“This activity effectively represents a form of intellectual property (IP) theft,” Google alleges.


Google does not want its AI wizardry copied. I would remind people about these three examples related to Google’s unauthorized use of published information:

  1. Authors Guild v. Google, Inc. (Google Books)
  2. Perfect 10, Inc. v. Google, Inc.
  3. Oracle America, Inc. v. Google LLC

Google won each case because its taking was ruled fair use. I am no lawyer, but it looks to me as if Google’s money, legal resources, and its ability to suggest that a nifty Google mouse pad might appear from a Googler’s briefcase helped it prevail. I just find it amusing that Google is miffed at people who are copying its stuff.

The legal eagles will take flight. Destination: Court rooms. Argument: You cannot do what we do.

Stephen E Arnold, February 18, 2026

Is This Correct? Google Sues to Protect Copyright

December 30, 2025

Another dinobaby post. No AI unless it is an image. This dinobaby is not Grandma Moses, just Grandpa Arnold.

This headline stopped me in my tracks: “Google Lawsuit Says Data Scraping Company Uses Fake Searches to Steal Web Content.” The way my dinobaby brain works, I expect to see a publisher putting the world’s largest online advertising outfit in the crosshairs. But I trust Thomson Reuters because they tell me they are the trust outfit.


Google apparently cannot stop a third party from scraping content from its Web site. Is this SEO outfit operating at a level of sophistication beyond the ken of Mandiant, Gemini, and the third-party cyber tools the online giant has? Must be, I guess. Thanks, Venice.ai. Good enough.

The alleged copyright violator in this case seems to be one of those estimable, highly professional firms engaged in search engine optimization. Those are the folks Google once saw as helpful to the sale of advertising. After all, if a Web site is not in a Google search result, that Web site does not exist. Therefore, to get traffic or clicks, the Web site “owner” can buy Google ads and, of course, make the Web site Google compliant. Maybe the estimable SEO professional will continue to fiddle and doctor words in a tireless quest to eliminate the notion of relevance in Google search results.

Now an SEO outfit is on the wrong side of what Google sees as the law. The write up says:

Google on Friday [December 19, 2025] sued a Texas company that “scrapes” data from online search results, alleging it uses hundreds of millions of fake Google search requests to access copyrighted material and “take it for free at an astonishing scale.” The lawsuit against SerpApi, filed in federal court in California, said the company bypassed Google’s data protections to steal the content and sell it to third parties.

To be honest, the phrase “astonishing scale” struck me as somewhat amusing. Google itself operates at “astonishing scale.” But what is good for the goose is obviously not good for the gander.

I asked You.com to provide some examples of people suing Google for alleged copyright violations. The AI spit out a laundry list. Here are four I sort of remembered:

  • News Outlets & Authors v. Google (AI Training Copyright Cases)
  • Google Users v. Google LLC (Privacy/Data Class Action with Copyright Claims)
  • Advertisers v. Google LLC (Advertising Content Class Action)
  • Oracle America, Inc. v. Google LLC

My thought is that with some experience in copyright litigation, Google is probably confident that the SEO outfit broke the law. I wanted to word it “broke the law which suits Google” but I am not sure that is clear.

Okay, which company will “win”? An SEO firm with resources slightly less robust than Google’s, or Google?

Place your bet on one of the online gambling sites advertising everywhere at this time. Oh, Google permits online gambling ads in areas allowing gambling and with appropriate certifications, licenses, and compliance functions.

I am not sure what to make of this because Google’s ability to filter, its smart software, and its security procedures obviously are insufficient, don’t work, or are full of exploitable gaps.

Stephen E Arnold, December 30, 2025

Do Not Mess with the Mouse, Google

December 15, 2025

Another dinobaby post. No AI unless it is an image. This dinobaby is not Grandma Moses, just Grandpa Arnold.

“Google Removes AI Videos of Disney Characters After Cease and Desist Letter” made me visualize an image of Godzilla, which I sometimes misspell as Googzilla, frightened of a mouse; specifically, a quite angry Mickey. Why?


A mouse causes a fearsome creature to jump on the kitchen chair. Who knew a mouse could roar? Who knew the green beast would listen? Thanks, Qwen. Your mouse does not look like one of Disney’s characters. Good for you.

The write up says:

Google has removed dozens of AI-generated videos that depicted Disney-owned characters after receiving a cease and desist letter from the studio on Wednesday. Disney flagged the YouTube links to the videos in its letter, and demanded that Google remove them immediately.

What adds an interesting twist to this otherwise ho-hum story about copyright viewed as an old-fashioned concept is that Walt Disney invested in OpenAI and worked out a deal for some OpenAI customers to output Disney-okayed images. How this will work out as prompt wizards try to put Minnie, Snow White, and probably most of the seven dwarves in compromising situations I don’t know. (If I were 15 years old, I would probably experiment to find a way to generate an image involving Prince Charming and the Lion King in a bus station facility violating one another and the law. But now? Nah, I don’t care what Disney, ChatGPT users, and AI do. Have at it.)

The write up says that Google said:

“We have a longstanding and mutually beneficial relationship with Disney, and will continue to engage with them,” the company said. “More generally, we use public data from the open web to build our AI and have built additional innovative copyright controls like Google-extended and Content ID for YouTube, which give sites and copyright holders control over their content.”

Yeah, how did that work out when YouTube TV subscribers lost access to some Disney content? Then, Google asked users who paid for content and did not get it to figure out how to sign up to get the refund. Yep, beneficial.

Something caused the Google to jump on a kitchen chair when the mouse said, “See you in court. Bring your checkbook.”

I thought Google was too big to obey any entity other than its own mental confections. I was wrong again. When will the EU find a mouse-type tactic?

Stephen E Arnold, December 15, 2025

AI Algorithms Are Not Pirates, Just Misunderstood

September 11, 2025

Let’s be clear: AI algorithms are computer programs designed to imitate human brains. They are not sentient. They are taught using huge data sets that contain pirated information. By proxy this makes AI developers thieves. David Carson on Medium wrote “Theft Is Not Fair Use,” arguing that AI is not abiding by one of the biggest laws that powers YouTube. (One of the big AI outfits just wrote a big check for unauthorized content suck-downs. Not guilty, of course.)

Publishers, record labels, entertainment companies, and countless artists are putting AI developers on notice by filing lawsuits against them. Thomson Reuters was victorious against an AI-based legal platform, Ross Intelligence, for harvesting its data. It is a drop in the bucket, however, because Trump’s Artificial Intelligence Action Plan sought input from Big Tech. OpenAI and Google asked for their big data sets to be exempt from copyright. A group of authors is suing Meta, and a gaggle of copyright law professors filed an amicus brief on their behalf. The professors poke holes in Meta’s fair use claim.

Big Tech is powerful and they’ve done this for years:

"Tech companies have a history of taking advantage of legacy news organizations that are desperate for revenue and are making deals with short-term cash infusions but little long-term benefit. I fear AI companies will act as vampires, draining news organizations of their valuable content to train their new AI models and then ride off into the sunset with their multi-billion dollar valuations while the news organizations continue to teeter on the brink of bankruptcy. It wouldn’t be the first time tech companies out-maneuvered (online advertising) or lied to news organizations.”

Unfortunately, creative types are probably screwed. What’s funny is that Carson is a John S. Knight Journalism Fellow at Stanford. It is the same school whose president manipulated content to advance his career. How many of these deep suckers are graduates of this esteemed institution? Who teaches copyright basics? Maybe an AI system?

Whitney Grace, September 11, 2025

AI: Pirate or Robin Hood?

July 30, 2025

One of the most notorious things about the Internet is the pirating of creative properties. The biggest victim is the movie industry, followed closely by publishing. Creative works that people spend endless hours making are freely distributed without proper payment to the creators and related staff. It sounds like a Robin Hood scenario, but creative folks are the ones suffering. Best-selling author David Baldacci ripped into Big Tech for training its AI on stolen creative properties, and he demanded that the federal government step in to rein it in.

LSE says that only a small number of AI developers support using free and pirated data for training models: “Most AI Researchers Reject Free Use of Public Data to Train AI Models.” Data from UCL show that AI developers want ethical standards for training data, and many are in favor of asking permission from content creators. The current UK government places the responsibility on content creators to “opt out” of their work being used for AI models. Anyone with a brain knows that AI developers skirt those regulations.

When LSE polled people about who should protect content creators and regulate AI, their opinions were split among the usual suspects: tech companies, governments, independent bodies, and international standards organizations.

Let’s see what creative genius Paul McCartney said:

While there are gaps between researchers’ and the views of authors, it would be a mistake to see these only as gaps in understanding. Song writer and surviving Beatle Paul McCartney’s comments to the BBC are a case in point: “I think AI is great, and it can do lots of great things,” McCartney told Laura Kuensberg, but it shouldn’t rip creative people off.  It’s clear that McCartney gets the opportunities AI offers. For instance, he used AI to help bring to life the voice of former bandmate John Lennon in a recent single. But like the writers protesting outside of Meta’s office, he has a clear take on what AI is doing wrong and who should be responsible. These views and the views of other members of the public should be taken seriously, rather than viewed as misconceptions that will improve with education or the further development of technologies.

Authors want protection. Publishers want money. AI companies want to do exactly what they want. This is a three-body intellectual property problem with no easy solution.

Whitney Grace, July 30, 2025

BBC Warns Perplexity That the Beeb Lawyers Are Not Happy

July 10, 2025

The BBC has had enough of Perplexity AI gobbling up and spitting out its content. Sometimes with errors. The news site declares, “BBC Threatened AI Firm with Legal Action over Unauthorised Content Use.” Well, less a threat and more a strongly worded letter. Tech reporter Liv McMahon writes:

“The BBC is threatening to take legal action against an artificial intelligence (AI) firm whose chatbot the corporation says is reproducing BBC content ‘verbatim’ without its permission. The BBC has written to Perplexity, which is based in the US, demanding it immediately stops using BBC content, deletes any it holds, and proposes financial compensation for the material it has already used. … The BBC also cited its research published earlier this year that found four popular AI chatbots – including Perplexity AI – were inaccurately summarising news stories, including some BBC content. Pointing to findings of significant issues with representation of BBC content in some Perplexity AI responses analysed, it said such output fell short of BBC Editorial Guidelines around the provision of impartial and accurate news.”

Perplexity answered the BBC’s charges with an odd reference to a third party:

“In a statement, Perplexity said: ‘The BBC’s claims are just one more part of the overwhelming evidence that the BBC will do anything to preserve Google’s illegal monopoly.’ It did not explain what it believed the relevance of Google was to the BBC’s position, or offer any further comment.”

Huh? Of course, Perplexity is not the only AI firm facing such complaints, nor is the BBC the only publisher complaining. The Professional Publishers Association, which represents over 300 media brands, seconds the BBC’s allegations. In fact, the organization charges, Web-scraping AI platforms constantly violate UK copyrights. Though sites can attempt to block models with the Robots Exclusion Protocol (robots.txt), compliance is voluntary. Perplexity, the BBC claims, has not respected the protocol on its site. Perplexity denies that accusation.
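For readers unfamiliar with the protocol the publishers are pointing at, the honor-system nature of robots.txt is easy to demonstrate. The sketch below is illustrative only: the bot name and URLs are made up, and Python’s standard-library parser stands in for whatever logic a real crawler uses. The key point is that `can_fetch` merely answers a question; nothing in the protocol forces a crawler to honor the answer.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that tries to keep one AI crawler out
# while allowing everyone else. The bot name is invented.
ROBOTS_TXT = """\
User-agent: HypotheticalAIBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching; a badly behaved one
# simply skips this check and fetches anyway.
print(rp.can_fetch("HypotheticalAIBot", "https://example.com/article"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))       # True
```

That voluntary-compliance gap is exactly what the BBC alleges Perplexity exploited, and why robots.txt offers publishers a request, not a lock.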

Cynthia Murrell, July 10, 2025

Hey, Creatives, You Are Marginalized. Embrace It

June 20, 2025

Considerations of right and wrong or legality are outdated, apparently. Now, it is about what is practical and expedient. The Times of London reports, “Nick Clegg: Artists’ Demands Over Copyright are Unworkable.” Clegg is both a former British deputy prime minister and former Meta executive. He spoke as the UK’s parliament voted down measures that would have allowed copyright holders to see when their work had been used and by whom (or what). But even that failed initiative falls short of artists’ demands. Writer Lucy Bannerman tells us:

“Leading figures across the creative industries, including Sir Elton John and Sir Paul McCartney, have urged the government not to ‘give our work away’ at the behest of big tech, warning that the plans risk destroying the livelihoods of 2.5 million people who work in the UK’s creative sector. However, Clegg said that their demands to make technology companies ask permission before using copyrighted work were unworkable and ‘implausible’ because AI systems are already training on vast amounts of data. He said: ‘It’s out there already.’”

How convenient. Clegg did say artists should be able to opt out of AI being trained on their works, but insists making that the default option is just too onerous. Naturally, that outweighs the interests of a mere 2.5 million UK creatives. Just how should artists go about tracking down each AI model that might be training on their work and ask them to please not? Clegg does not address that little detail. He does state:

“‘I just don’t know how you go around, asking everyone first. I just don’t see how that would work. And by the way if you did it in Britain and no one else did it, you would basically kill the AI industry in this country overnight. … I think expecting the industry, technologically or otherwise, to preemptively ask before they even start training — I just don’t see. I’m afraid that just collides with the physics of the technology itself.’”

The large technology outfits with the DNA of Silicon Valley have carried the day. So output and be quiet. (And don’t think anyone can use Mickey Mouse art. Different rules are okay.)

Cynthia Murrell, June 20, 2025

India: Fair Use Can Squeeze YouTubers

June 5, 2025

Asian News International (ANI) seems to be leveraging the vagueness of India’s fair-use definition with YouTube’s draconian policies to hold content creators over a barrel. The Reporters’ Collective declares, “ANI Finds Business Niche in Copyright Claims Against YouTubers.” Writer Ayushi Kar recounts the story of Sumit, a content creator ANI accused of copyright infringement. The news agency reported more than three violations at once, a move that triggered an automatic takedown of those videos. Worse, it gave Sumit just a week to make good with ANI or lose his channel for good. To save his livelihood, he forked over between 1,500,000 and 1,800,000 rupees (about $17,600 – $21,100) for a one-year access license. We learn:

“Sumit isn’t the lone guy facing the aggressive copyright claims of ANI, which has adopted a new strategy to punitively leverage YouTube’s copyright policies in India to generate revenue. Using the death clause in YouTube policy and India’s vague provisions for fair use of copyrighted material, ANI is effectively forcing YouTube creators to buy expensive year-long licenses. The agency’s approach is to negotiate pricey licensing deals with YouTubers, including several who are strong critics of the BJP, even as YouTube holds a sword over the content producer’s channel for multiple claims of copyright violation.”

See the write-up for more examples of content creators who went through an ANI shakedown. Kar continues:

“While ANI might be following a business it understands to be legal and fair, the episode has raised larger concern about copyright laws and the fair use rights in India by content producers who are worried about being squeezed out of their livelihoods – sometimes wiping out years of labor to build a community – between YouTube’s policies and copyright owners willingness to play hardball.”

What a cute tactic. Will it come to the US? Is it already here? YouTubers, feel free to comment. There is something special about India’s laws, though, that might make this scheme especially profitable there. Kar tells us:

“India’s Copyright Act 1957 allows … use of copyrighted material without the copyright owner’s permission for purposes such as criticism, comment, news, reporting and many more. In practice, there is a severe lack of specificity in law and regulations about how fair use doctrine is to be practiced.”

That means the courts decide what fair use means case by case. Bringing one’s case to court is, of course, expensive and time consuming, and victory is far from assured. It is no wonder content creators feel they must pay up. It would be a shame if something happened to that channel.

Cynthia Murrell, June 5, 2025

Meta and Torrents: True, False, or Rationalization?

February 26, 2025

AIs gobble datasets for training. It is also a fact that many LLMs and datasets contain biased information, are incomplete, or plain stink. One ethical but cumbersome way to train algorithms would be to notify people that their data, creative content, or other information will be used to train AI. Offering to pay for the right to use the data would be a useful step, some argue.

Will this happen? Obviously not.

Why?

Because it’s sometimes easier to take instead of asking. According to Tom’s Hardware, “Meta Staff Torrented Nearly 82TB of Pirated Books for AI Training, Court Records Reveal Copyright Violations.” The article explains that Meta pirated 81.7 TB of books from the shadow libraries Anna’s Archive, Z-Library, and LibGen. These books were then used to train AI models. Meta is now facing a class action lawsuit over its use of content from the shadow libraries.

The allegations arise from Meta employees’ written communications. Some of these messages provide insight into employees’ concern about tapping pirated materials. The employees were getting frown lines, but then some staffers’ views rotated when they concluded smart software helped people access information.

Here’s a passage from the cited article I found interesting:

“Then, in January 2023, Mark Zuckerberg himself attended a meeting where he said, “We need to move this stuff forward… we need to find a way to unblock all this.” Some three months later, a Meta employee sent a message to another one saying they were concerned about Meta IP addresses being used “to load through pirate content.” They also added, “torrenting from a corporate laptop doesn’t feel right,” followed by laughing out loud emoji. Aside from those messages, documents also revealed that the company took steps so that its infrastructure wasn’t used in these downloading and seeding operations so that the activity wouldn’t be traced back to Meta. The court documents say that this constitutes evidence of Meta’s unlawful activity, which seems like it’s taking deliberate steps to circumvent copyright laws.”

If true, the approach smacks of that suave Silicon Valley style. If false, my faith in a yacht owner with gold chains might be restored.

Whitney Grace, February 26, 2025
