Errors? AI Makes Accuracy Irrelevant
April 4, 2025
This blog post is the work of a humanoid dino baby. If you don’t know what a dinobaby is, you are not missing anything.
We have poked around some AI services. A few are very close to being dark patterns that want to become like Al Capone or moe accurately AI Capone. Am I thinking of 1min.ai? Others just try to sound so friendly when outputting wackiness? Am I thinking about the Softies or ChatGPT? I don’t know.
I did read “AI Search Has A Citation Problem.” The main point is that AI struggles with accuracy. One can gild the lily and argue that it makes work faster. I won’t argue that quick incorrect output may speed some tasks. However, the write up points out:
Premium chatbots provided more confidently incorrect answers than their free counterparts.
I think this means that paying money does not deliver accuracy, judgment, or useful information. I would agree.
A farmer wonders how the steam engine ended up in his corn field. How did smart software get involved in deciding that distorted information was a useful output for students and workers? Thanks, You.com. The train was supposed to be on its side, but by getting the image different from my prompt, you have done the job. Close enough for horse shoes, right?
The write up also points out:
Generative search tools fabricated links and cited syndicated and copied versions of articles.
I agree.
Here’s a useful finding if one accepts the data in the write up as close enough for horseshoes:
Overall, the chatbots often failed to retrieve the correct articles. Collectively, they provided incorrect answers to more than 60 percent of queries. Across different platforms, the level of inaccuracy varied, with Perplexity answering 37 percent of the queries incorrectly, while Grok 3 had a much higher error rate, answering 94 percent of the queries incorrectly.
The alleged error rate of Grok is in line with my experience. I try to understand, but when space ships explode, people set Cybertrucks on fire, and the cratering of Tesla stock cause my widowed neighbor to cry — I see a pattern of missing the mark. Your mileage or wattage may vary, of course.
The write up points out:
Platforms often failed to link back to the original source
For the underlying data and more academic explanations, please, consult the original article.
I want to shift gears and make some observations about the issue the data in the article and my team’s experience with smart software present. Here we go, gentle reader:
- People want convenience or what I call corner cutting. AI systems eliminate the old fashioned effort required to verify information. Grab and go information, like fast food, may not be good for the decision making life.
- The information floating around about a Russian content mill pumping out thousands of weaoonized news stories a day may be half wrong. Nevertheless, it makes clear that promiscuous and non-thinking AI systems can ingest weaponized content and spit it out without a warning level or even recognizing baloney when one expects a slab of Wagu beef.
- Integrating self-driving AI into autonomous systems is probably not yet a super great idea. The propaganda about Chinese wizards doing this party trick is interesting, just a tad risky when a kinetic is involved.
Where are we? Answering this question is a depressing activity. Companies like Microsoft are forging ahead with smart software helping people do things in Excel. Google is allowing its cheese-obsessed AI to write email responses. Outfits like BoingBoing are embracing questionable services like a speedy AI Popeil pocket fisherman as part of its money making effort. And how about those smart Anduril devices? Do they actually work? I don’t want to let one buzz me.
The AI crazy train is now going faster than the tracks permit. How does one stop a speeding autonomous train? I am going to stand back because that puppy is going to fall off the tracks and friction will do the job. Whoo. Whoo.
Stpehen E Arnold, April 4, 2025
Bye-Bye Newsletters, Hello AI Marketing Emails
April 4, 2025
Adam Ryan takes aim at newsletters in the Work Week article, “Perpetual: The Major Shift of Media.” Ryan starts the article saying we’re already in changing media landscape and if you’re not preparing you will be left behind. He then dives into more detail explaining that the latest trend setter is an email newsletter. From his work in advertising, Ryan has seen newsletters rise from the bottom of the food chain to million dollar marketing tools.
He explains that newsletters becoming important marketing tools wasn’t an accident and that it happened through a the democratization process. By democratization Ryan means that newsletters became easier to make through the use of simplification software. He uses the example of Shopify streamlining e-commerce and Beehiiv doing the same for newsletters. Another example is Windows making PCs easier to use with its intuitive UI.
Continuing with the Shopify example, Ryan says that mass adoption of the e-commerce tool has flooded the market place. Top brands that used to dominate the market were now overshadowed by competition. In short, everyone and the kitchen sink was selling goods and services.
Ryan says that the newsletter trend is about to shift and people (operators) who solely focus on this trend will fall out of favor. He quotes Warren Buffet: “Be fearful when others are greedy, and be greedy when others are fearful.” Ryan continues that people are changing how they consume information and they want less of it, not more. Enter the AI tool:
“Here’s what that means:
• Email open rates will drop as people consume summaries instead of full emails.
• Ad clicks will collapse as fewer people see newsletter ads.
• The entire value of an “owned audience” declines if AI decides what gets surfaced.”
It’s not the end of the line for newsletter is you become indispensable such as creating content that can’t be summarized, build relationships beyond emails, and don’t be a commodity:
“This shift is coming. AI will change how people engage with email. That means the era of high-growth newsletters is ending. The ones who survive will be the ones who own their audience relationships, create habit-driven content, and build businesses beyond the inbox.”
This is true about every major change, not just news letters.
Whitney Grace, April 4, 2025
Read AI Implements AI Into Enterprise Search
April 3, 2025
Enterprise search is an essential tool for an organization to function. There have been major shifts in enterprise search, including more accurate search results, and ability to search across multiple platforms. The Seattle startup Read AI wants to be the next enterprise search revolution says Geek Wire: “Seattle Startup Takes A Swing At Enterprise Search With New AI-Fueled Discovery Tool.”
There have been major upgrades in enterprise search before, including AI, but Read AI is doing it at level not before seen. The startup from Seattle began in 2021 with sentiment and engagement products for meetings. Later Read AI developed software tools that analyzed information across various communication channels. Now their latest product upgrade is for enterprise search that enables users to quickly locate and access information across terabytes of data and applications.
Read AI’s enterprise search tool upgrade is for Microsoft Copilot:
“Users can choose what data is discoverable. Search Copilot gets “smarter” as more workers add more data, and allows for collaboration. The new tool can also nudge users to take action based on past interactions with colleagues or customers.
The product is free to use with certain limits. Read AI CEO David Shim said reducing barriers is a key differentiator.
“People haven’t been able to understand the value of enterprise search because they haven’t had access to it,” he said.
Shim said Read AI’s ability to determine what’s important for an individual user also gives Search Copilot an edge over other AI search tools.”
Read AI wants to differentiate itself from its rivals, built-in-platform tools from Zoom, Google, and Microsoft. Read AI does have an advantage over out-of-the-box solutions, because experience tells us those tools stink. Proprietary developed tools are usually better because they’re specifically designed to enhance a specific feature, while out-of-the-box solutions are designed to with the “it’s okay to get by” mentality.
While Read AI made a smart move to upgrade Copilot with the latest AI technology, someone else would have done it eventually.
Whitney Grace, April 3, 2025
Original Research: Not-So-Original Assertions about Content Appropriation
April 2, 2025
No AI. Just a dinobaby sharing an observation about younger managers and their innocence.
The Social Science Research Council published the 30-plus page report “Beyond Public Access in LLM Pre-Training Data.” The subtitle reveals the principal finding: “Non-Public Non-Public Book Content in OpenAI’s Models.”
The write up states:
Using a legally obtained dataset of 34 copyrighted O’Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI’s large language models were trained on copyrighted content without consent. Our AUROC scores show that GPT-4o, OpenAI’s more recent and capable model, demonstrates strong recognition of paywalled O’Reilly book content (AUROC = 82%), compared to OpenAI’s earlier model GPT-3.5 Turbo. In contrast, GPT-3.5 Turbo shows greater relative recognition of publicly accessible O’Reilly book samples. GPT-4o Mini, as a much smaller model, shows no knowledge of public or non-public O’Reilly Media content when tested (AUROC ? 50%). Testing multiple models, with the same cutoff date, helps us account for potential language shifts over time that might bias our findings. These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training.
I want to mention that the DE-COP method provide one way to determine whether a specific data record was part of the training dataset for a machine learning model. The result of the SSRC’s research suggests that as OpenAI enhanced its models, the OpenAI organization appears to have used non-public content. Smaller (less capable OpenAI) models seemed to know less about the test set; the bigger models knew more.
The write up concludes that developers of smart software are sucking down data without obtaining permission or entering into licensing deals with publishers, copyright holders, or entities working on content not yet available as public content like this dinobaby blog post.
To sum up: A not-to-original assertion that certain organizations just take content and ignore rights is apparently accurate in this set of research. OpenAI accelerates. Speed is a positive. Breaking speed limits is believed to be no big deal.
A bit of a flashback: Over 40 years ago, I participated in a series of meetings to license the right to create abstracts of original articles from more than 1000 different publications. Our efforts were successful. Most publishers of academic business and management publications knew zero about online information services. The idea of summarizing an eight page article about executive compensation was unfamiliar to most of the people with whom our organization interacted over a period of years. Nevertheless, the owner of the database unit of the Courier Journal & Louisville Times Company insisted that we get signed agreements. The process was time consuming and expensive. Some outfits had representatives. So a negotiation went through layers of bureaucracy. Other publishers wanted a sit down meeting and a dog-and-pony show to explain why we wanted to boil down a 600 word article to about 150 words. There were worries about the impact of online access. There were worries about control of editorial content. There were worries about signing away rights. We couldn’t assuage worries, but we chugged along. The reputations of the Courier Journal and Barry Bingham Jr. helped but our small database operation had to follow the rules.
If the data in the cited document are accurate, the attitude of Barry Bingham Jr. would not have gained much traction at OpenAI. Instead of my getting a pat on the head for setting up the licensing deals with the able help of other Courier Journal professionals, I would have gotten kicked on the shin or worse.
Several observations:
- Observing content ownership remains important despite the acceleration toward illegal behaviors in the online sector
- Using show-and-tells and explaining the benefits of participating in digital innovations remains important. Knowledge minimizes anger and other negative reactions to pre-emptive behavior
- Managing the process is as important as demonstrating an unwillingness to conform to industry, social, and legal norms.
Let’s assume that the findings in the SSRC are off by 50 percent. Does that change what appears to be a strong possibility of some fancy dancing at the OpenAI disco?
My conclusion is that taking short cuts is part of the Silicon Valley approach of doing whatever and just saying, “Wow, I am sorry.”
I asked ChatGPT 4 via You.com this question:
Should I create an online product and just use content from the Harvard Business Review without permission?
Here’s the response:
No, you should not create an online product using content from the Harvard Business Review (HBR) without permission. Doing so would likely violate copyright laws and HBR’s usage policies. Here’s why:
HBR’s Copyright and Usage Policies
- HBR Material is Copyrighted: Harvard Business Review explicitly states that its content is protected by copyright, and no part of its publications may be reproduced without permission.
- Usage Restrictions: HBR has strict policies regarding how its content can be used. For example, even in academic settings, HBR articles cannot be freely distributed unless done through specific channels like creating course packs via "Harvard Business Publishing for Educators"
Permission is Required: If you want to use HBR content, you must obtain explicit permission. This includes reproducing, distributing, or using their material in any form.
Legal and Ethical Implications
- Copyright Infringement: Using HBR content without permission would likely constitute copyright infringement, which can result in legal action, fines, or other penalties.
- Ethical Concerns: Beyond legal issues, using someone else’s intellectual property without authorization is unethical and could harm your reputation or business.
What You Should Do Instead
- Seek Permission: Contact Harvard Business Review to request permission to use their content. They may have licensing options available for your intended use.
- Create Original Content: Instead of using HBR’s material, consider creating your own original content inspired by their ideas (but without copying their text or proprietary materials).
- Use Public Domain or Licensed Content: Look for content that is in the public domain or available under a license (e.g., Creative Commons) that allows commercial use.
Conclusion
Using HBR content without permission is not only illegal but also against their explicit policies. To avoid legal and ethical issues, either seek proper licensing or create your own original material.
In this case, the OpenAI’s AI may be more intelligent that some people engaged in content acquisition at the firm. But I am a dinobaby. I try to follow the social, legal, and ethical rules as my parents who had only high school educations taught me. I am a dinobaby. A sad dinobaby.
Stephen E Arnold, April 2, 2025
The AI Market: The Less-Educated
April 2, 2025
Writing is an essential function of education and communication. Writing is an innate skill as well as one that can be curated through dedicated practice. Digital writing tools such as spelling and grammar checkers and now AI like Grammarly and ChatGPT have influenced writing. Stanford University studied how AI writing tools have impacted writing in professional industries. The discovered that less-educated parts of the US heavily rely on AI. Ars Technica reviews the study in: “Researchers Surprised To Find Less-Educated Areas Adopting AI Writing Tools Faster.”
Stanford’s AI study tracked LLM adoption from January 2022 to September 2024 with a dataset that included US Consumer Financial Protection Bureau consumer complaints, corporate press releases, job postings, and UN press releases. The researchers used a statistical detection system that tracked word usage patterns. The system found that 14-24% of these communications showed AI assistance. The study also found an interesting pattern:
“The study also found that while urban areas showed higher adoption overall (18.2 percent versus 10.9 percent in rural areas), regions with lower educational attainment used AI writing tools more frequently (19.9 percent compared to 17.4 percent in higher-education areas). The researchers note that this contradicts typical technology adoption patterns where more educated populations adopt new tools fastest.”
The researchers theorize that AI-writing tools serve as equalizing measures for less-educated individuals. They also noted that AI-writing tools are being adopted because the market is saturated or the LLMs are becoming more advanced. IT will be difficult to distinguish between human and machine written text. They predict negative outcomes from this:
“ ‘The growing reliance on AI-generated content may introduce challenges in communication,’ the researchers write. ‘In sensitive categories, over-reliance on AI could result in messages that fail to address concerns or overall release less credible information externally. Over-reliance on AI could also introduce public mistrust in the authenticity of messages sent by firms.’”
It’s not good to blindly trust AI, especially with the current state of datasets. Can you imagine the critical thinking skills these future leaders and entrepreneurs will develop? On that thought, what will happen to imagination?
Whitney Grace, April 2, 2025
The First AI-Written Paper To Pass Peer Review
April 2, 2025
Cheating. I am not going to bring this topic up.
Humans have taken one small stop towards obsolesce when it comes to writing papers. Sakana AI reports that "The AI Scientist Generates Its First Peer-Reviewed Scientific Publication." This is the first known fully AI-generated paper that passed the same review process that human scientists submit their papers too. Here’s how the paper was written:
"The paper was generated by an improved version of the original AI Scientist, called The AI Scientist-v2. We will be sharing the full details of The AI Scientist-v2 in an upcoming release. This paper was submitted to an ICLR 2025 workshop that agreed to work with our team to conduct an experiment to double-blind review AI-generated manuscripts. We selected this workshop because of its broader scope, challenging researchers (and our AI Scientist) to tackle diverse research topics that address practical limitations of deep learning. The workshop is hosted at ICLR, one of three premier conferences in machine learning and artificial intelligence research, along with NeurIPS and ICML.3
The ICLR leadership and organizers were involved with the project. The paper was blindly submitted to the ICLR review team, although they were told that they might be reviewing AI generated papers.
The AI algorithm was told to research and write about a broad topic. When the process was done, three papers were selected for submission so the review board wouldn’t be overburdened. Here are the results:
“We looked at the generated papers and submitted those we thought were the top 3 (factoring in diversity and quality—We conducted our own detailed analysis of the 3 papers, please read on in our analysis section). Of the 3 papers submitted, two papers did not meet the bar for acceptance. One paper received an average score of 6.33, ranking approximately 45% of all submissions. These scores are higher than many other accepted human-written papers at the workshop, placing the paper above the average acceptance threshold. Specifically, the scores were:
• Rating: 6: Marginally above acceptance threshold
• Rating: 7: Good paper, accept
• Rating: 6: Marginally above acceptance threshold”
The AI Scientist conducted the experiment out of pure scientific curiosity to measure how current AI algorithms compare to human intellect. No problem.
Whitney Grace, April 2, 2025
No Joke: Real Secrecy and Paranoia Are Needed Again
April 1, 2025
No AI. Just a dinobaby sharing an observation about younger managers and their innocence.
In the US and the UK, secrecy and paranoia are chic again. The BBC reported “GCHQ Worker Admits Taking top Secret Data Home.” Ah, a Booz Allen / Snowden type story? The BBC reports:
The court heard that Arshad took his work mobile into a top secret GCHQ area and connected it to work station. He then transferred sensitive data from a secure, top secret computer to the phone before taking it home, it was claimed. Arshad then transferred the data from the phone to a hard drive connected to his personal home computer.
Mr. Snowden used a USB drive. The question is, “What are the bosses doing? Who is watching the logs? Who is checking the video feeds? Who is hiring individuals with some inner need to steal classified information?
But outside phones in a top secret meeting? That sounds like a great idea. I attended a meeting held by a local government agency, and phones and weapons were put in little steel boxes. This outfit was no GHCQ, but the security fellow (a former Marine) knew what he was doing for that local government agency.
A related story addresses paranoia, a mental characteristic which is getting more and more popular among some big dogs.
CNBC reported an interesting approach to staff trust. “Anthropic Announces Updates on Security Safeguards for Its AI Models” reports:
In an earlier version of its responsible scaling policy, Anthropic said it would begin sweeping physical offices for hidden devices as part of a ramped-up security effort.
The most recent update to the firm’s security safeguards adds:
updates to the “responsible scaling” policy for its AI, including defining which of its model safety levels are powerful enough to need additional security safeguards.
The actual explanation is a master piece of clarity. Here’s snippet of what Anthropic actually said in its “Anthropic’s Responsible Scaling Policy” announcement:
The current iteration of our RSP (version 2.1) reflects minor updates clarifying which Capability Thresholds would require enhanced safeguards beyond our current ASL-3 standards.
The Anthropic methods, it seems to me, to include “sweeps” and “compartmentalization.”
Thus, we have two examples of outstanding management:
First, the BBC report implies that personal computing devices can plug in and receive classified information.
And:
Second, CNBC explains that sweeps are not enough. Compartmentalization of systems and methods puts in “cells” who can do what and how.
Andy Grove’s observation popped into my mind. He allegedly rattled off this statement:
Success breeds complacency. Complacency breeds failure. Only the paranoid survive.
Net net: Cyber security is easier to “trust” and “assume”. Real fixes edge into fear and paranoia.
Stephen E Arnold, April 9, 2025
Free AI Sites (Well, Mostly Free Sort of)
April 1, 2025
Dinobaby says, “No smart software involved. That’s for “real” journalists and pundits.
One of my team generated images of French bulldogs. After months of effort, he presented me with a picture of our French bulldog complete with one floppy ear. The image was not free. I pay for the service because free image generation systems work and then degrade because of the costs associated with doing smart software without oodles of cash.
Another person proudly emailed everyone a link to Best AI Websites and the page “Free AI Tools.” The interfaces, functionality, and the outputs vary. The linked Web page is a directory presented with some of that mobile interface zip.l
There are more than 30 tools anyone can try. Here’s what the “directory” interface looks like:
The first click displays the BestFreeAIWebsites’ write up for each “service” or “tool.” Then a direct link to the free AI site is displayed. There is a “submit” button to allow those with a free AI tool to add theirs to the listing. The “add” function is a common feature of Telegram bot and Channel listings.
Here is a selection of the “free” services that are available as of March 28, 2025, in alphabetical order:
- HUUK.ai, a trip planner
- Metavoice at https://studio.themetavoice.xyz/, a “one click voice changer”
- Presentpicker.ai, a service to help a user choose a gift.
- Remaker.ai, a face swap tool
- Yomii.app, a real estate investing assistant
ChatGPT features numerous times in the list of “free” AI tools. Google shows up a couple of times with Bard and Gemini. The majority of the services “wrap” functionality around the big dogs in the LLM space.
Are these services “free”? Our view is that the “free” is a way to get people to give the services a try. If the experience is positive, upgrades are available.
As one of my team worked through the listings, he said, “Most of these services have been available as Telegram bots from other developers.” If he is correct, perhaps Telegram’s AI functions should be included in the listing?
Stephen E Arnold, April 1, 2025
Apple CEO Chases Chinese AI and Phone Sales
March 31, 2025
While the hullabaloo about making stakes in China’s burgeoning market has died down, Big Tech companies still want pieces of the Chinese pie or dumpling would be a better metaphor here. An example of Big Tech wanting to entrench itself in the ChinBaiese market is Apple. Mac Rumors reports that Apple CEO Tim Cook was recently in China and he complimented start-up Deepseek for its AI models. The story, “Apple CEO Tim Cook Praises China’s Deepseek”
While Cook didn’t say he would pursue a partnership with Deepseek, he was impressed with their AI models. He called them excellent, because Deepseek delivers AI models with high performance capabilities that have lower costs and compute requirements. Deepseek’s research has been compared to OpenAI for achieving similar results by using less resources.
When Cook visited China he reportedly made an agreement with Alibaba Group to integrate its Qwen models into Apple Intelligence. There are also rumors that Apple’s speaking with Baidu about providing LLMs for the Chinese market.
Does this mean that Tim Apple hopes he can use Chinese smart tech in the iPhone and make that more appealing to Chinese users? Hmmmm.
Cook conducted more business during his visit:
In addition to his comments on AI, Cook announced plans to expand Apple’s cooperation with the China Development Research Foundation, alongside continued investments in clean energy development. Throughout his visit, Cook posted updates on the Chinese social media platform Weibo, showcasing a range of Apple products being used in classrooms, creative environments, and more.
Cook’s comments mark a continuation of Apple’s intensified focus on the Chinese market at a time when the company is facing declining iPhone shipments and heightened competition from domestic brands. Apple’s smartphone shipments in China are believed to have fallen by 25% year-over-year in the fourth quarter of 2024, while annual shipments dropped 17% to 42.9 million units, placing Apple behind local competitors Vivo and Huawei.”
It’s evident that Apple continues to want a piece of the Chinese dumpling, but also seeks to incorporate Chinese technology into its products. Subtle, Tim Apple, subtle.
Whitney Grace, March 31, 2025
Cypersecurity Pros, Bit of an Issue. You Think?
March 28, 2025
Be aware. A dinobaby wrote this essay. No smart software involved.
I read a research report in the Register titled “MINJA Sneak Attack Poisons AI Models for Other Chatbot Users.” The write up is interesting and, I think, important. The weakness is that the essay does not make explicit that this type of vulnerability can be automated and the outputs used to create the type of weaponized content produced by some intelligence agencies (and PR firms).
The write up provides diagrams and useful detail. For this short blog post, my take on the technique is a manipulation of an LLM’s penchant for adapting to the prompts during a human-interface interaction. If the bad actor crafts misleading information, the outputs can be skewed.
How serious is the behavior in LLMs? In my view, the PR and hype about AI renders the intentional fiddling to a trivial concern. That’s not where the technique nor the implications of its effectiveness belong. Triggering wonky behavior is as easy as mismatching patient data as the article illustrates.
Before one gets too excited about autonomous systems using LLMs to just do it, more attention to the intentional weaponization of LLMs is needed.
Will the AI wizards fix this problem? Sure, someday, but it is an issue that requires time, money, and innovation. We live in an era of marketing. I know I cannot trust most people. Now I know that I can’t trust a MINJA that sneaks into my search or research and delivers a knock out blow.
The Register could have been a bit more energetic in its presentation of this issue. The cited essay does a good job of encouraging bad actors and propagandists to be more diligent in their use of LLMs.
Stephen E Arnold, March 28, 2025