Glean Goes Beyond Search: Have Xooglers Done What Google Could Not Do?

August 12, 2025

This blog post is the work of an authentic dinobaby. Sorry. No smart software can help this reptilian thinker.

I read an interesting online essay titled “Glean’s $4.5B Business Model: How Ex-Googlers Built the Enterprise Search That Actually Works.” Enterprise search has been what one might call a Holy Grail application. Many have tried to locate the Holy Grail. Most have failed.

Have a small group of Xooglers (former Google employees) located the Holy Grail and been able to convert its power into satisfied customers? The essay, which reminded me of an MBA write up, argues that the outfit doing business as Glean has done it. The firm has found the Holy Grail, melted it down, and turned it into an endless stream of cash.

Does this sound a bit like the marketing pitch of Autonomy, Fast Search & Transfer, and even Google itself with its descriptions of its deeply wacky yellow servers? For me, Glean has done its marketing homework. The evidence is plumped and oiled for this essay about its business model. But what about search? Yeah, well, the focus of the marketing piece is the business model. Let’s go with what is in front of me. Search remains a bit of a challenge, particularly in corporations, government agencies, and pharmaceutical-type outfits where secrecy is a big part of that type of organization’s way of life.

What is the Glean business model? It is VTDF. Here’s an illustration:

[Image: VTDF business model illustration]

Does this visual look like blue chip consulting art? Is VTDF blue chip speak? Yes. And yes. For those not familiar with the lingo here’s a snapshot of the Glean business model:

  • Value: Focuses on how the company creates and delivers core value to customers, such as solving specific problems
  • Technology: Refers to the underlying tech innovations that allow “search” to deliver what employees need to do their jobs
  • Distribution: Involves strategies for marketing, delivery, and reaching users
  • Finance: Covers revenue models, cash flow management, and financial sustainability. Traditionally this has been the weak spot for the big-time enterprise search plays.

The essay explains in dot points that Glean is a “knowledge liberator.” I am not sure how that will fly in some pharma-type outfits or government agencies in which Palantir is roosting.

Once Glean’s “system” is installed, here’s what happens (allegedly):

  • Single search box for everything
  • Natural language queries
  • Answers, not just documents
  • Context awareness across apps
  • Personalized to user permissions
  • New employees productive in days.

I want to take a moment to comment on each of these payoffs or upsides.

First, a single search box for everything is going to present a bit of a challenge in several important use cases. Consider a company with an inventory control system, vendor evaluations, and a computer-aided design system and a database of specifications. The single search box is going to return what for a specific part? Some users will want to know how many are in stock. Others will want to know the vendor who made the part in a specific batch because it is failing in use. Some will want to know what the part looks like. The fix for this type of search problem has been to figure out how to match the employee’s role with the filters applied to that user’s query. In the last 60 years, that approach sort of worked, but it was and still is incredibly difficult to keep lined up with employee roles, assorted permissions, and the way the information is presented to the person running the query. The quality issue may require stress analysis data and access to the lawsuit the annoyed customer has just filed. I am unsure how the Xooglers have solved this type of search task.

Second, the NLP approach is great but it is early 2000s. The many efforts, including DR-LINK to which my team contributed some inputs, were not particularly home run efforts. The reason has to do with the language skills of the users. Organizations hire people who may be really good at synthesizing synthetics but not so good at explaining what the new molecule does. If the lab crew dies, the answer does not require words. Querying for the “new” is tough, since labs doing secret research do not share their data. Even company officers have a tough time getting an answer. When a search system requires the researcher to input a query, that scientist may want to draw a chemical structure or input a query like this “C8N8O16.” Easy enough if the indexing system has access to the classified research in some companies. But the NLP problem is what is called “prompt engineering.” Most humans are just not very good at expressing what they need in the way of information. So modern systems try to help out the searcher. The reason Google search sucks is that the engineers have figured out how to deliver an answer that is good enough. For C8N8O16 close enough for horseshoes might be problematic.

Third, answers are what people want. The “if” statement becomes the issue. If the user knows a correct answer or just accepts what the system outputs. If the user understands the output well enough to make an informed decision. If the system understood or predicted what the user wanted. If the content is in the search system’s index. This is a lot of ifs. Most of these conditions occur with sufficient frequency to kill outfits that have sold an “enterprise search system.”

Fourth, the context awareness across apps means that the system can access content on proprietary systems within an organization and across third party systems which may or may not run on the organization’s servers. Most enterprise search systems create or have licensed filters to acquire content. However, keeping the filters alive and healthy with the churn in permissions, file tweaks, and assorted issues related to latency creating data gaps remains tricky.

Fifth, the idea of making certain content available only to those authorized to view those data is a very tricky business. Orchestrating permissions is, in theory, easy to automate. The reality in today’s organizations is the complicating factor. Distributed outfits, contractors, and employees who may be secretly working for another country add some excitement to accessing “information.” The reality in many organizations is that there are regular silos, from the legal department keeping certain documents under lock and key to projects for three letter agencies. In the pharma game, knowing “who” is working on a project is often a dead give-away for what the secret project is. The company’s “people” officer may be in the dark. What about consultants? What information is available to them? The reality is that modern organizations have more silos than the corn fields around Canton, Illinois.

Sixth, no training is required. “Employees are productive in days” is the pitch. Maybe, maybe not. Like the glittering generality that employees spend 20 percent of their time searching, the data for this assertion was lacking when the “old” IDC, Sue Feldman, and her team cranked out an even larger number. If anything, search is a larger part of work today for many people. The reasons range from content management systems which cannot easily be indexed in real time to the senior vice president of sales who changes prices for a product at a trade show and tells only his contact in the accounting department. Others may not know for days or months that the apple cart has been tipped.
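The permission problem in the fifth point can be made concrete with a toy. What follows is a minimal sketch, not Glean's method: the `Doc` class, the group names, and the part number are all hypothetical, and real systems have to sync these access lists from every source system as they change. The only idea shown is "security trimming": a hit is returned only if the query matches and the user shares at least one group with the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    allowed_groups: frozenset  # hypothetical ACL: groups permitted to see the doc


def search(docs, query, user_groups):
    """Return titles that match the query AND that the user is allowed to see."""
    q = query.lower()
    return [
        d.title
        for d in docs
        # Keep a hit only when the user shares at least one group with the doc.
        if q in d.text.lower() and d.allowed_groups & user_groups
    ]


# Hypothetical corpus: the same part number lives in two silos.
DOCS = [
    Doc("Part 7731 stock levels", "part 7731 inventory count",
        frozenset({"warehouse", "engineering"})),
    Doc("Part 7731 failure lawsuit", "part 7731 litigation hold",
        frozenset({"legal"})),
]

warehouse_hits = search(DOCS, "part 7731", frozenset({"warehouse"}))
legal_hits = search(DOCS, "part 7731", frozenset({"legal"}))
contractor_hits = search(DOCS, "part 7731", frozenset())

print(warehouse_hits)
print(legal_hits)
print(contractor_hits)
```

The filter itself is one line. The hard part is everything the sketch assumes away: that `allowed_groups` is current, complete, and consistent across every silo.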

Glean saves time. That is the smart software pitch. I need to see some data from a statistically valid sample with a reasonable horizontal x axis. The reference to “all” is troublesome. It underscores the immature understanding of what “enterprise search” means to a licensee versus what the venture backed company can actually deliver. Fast Search found out that a certain newspaper in the UK was willing to sue for big bucks because of this marketing jingo.

I want to comment briefly about “Technology Architecture: Beyond Search.” Hey, isn’t that the name of my blog which has been pumping out information access related articles for 17 years? Yep, it is.

Okay, Glean apparently includes these technologies in their enterprise search quiver:

  • Universal connectors. Note the word “universal.” Nope, very tough.
  • A knowledge graph. Think in terms of Maltego, an open source software. Sure as long as there is metadata. But those mobile workers and their use of cloud services and E2EE messaging services. Sounds great. Execution in a cost sensitive environment takes a bit of work.
  • An AI understanding layer. Yep, smart software. (Google’s smart software tells its users that it is ashamed of its poor performance. OpenAI rolled out ChatGPT 5 and promptly reposted ChatGPT 4o because enough users complained. Deepseek may have links to a nation state unfriendly to the US. Mark Zuckerberg’s Llama is a very old llama. Perplexity is busy fighting with Cloudflare. Anthropic is working to put coders out to pasture. Amazon, Apple, Microsoft, and Telegram are in the bolt it on business.) The idea is that Glean can understand [a] different employee contexts, [b] the rapidly changing real time data in an organization like that PowerPoint on the senior VP’s laptop, and [c] the file formats that have a very persistent characteristic of changing because whoever is responsible for an update or the format itself makes an intentional or unintentional change. I just can’t accept this assertion.
  • Works instantly which I interpret as “real time.” I wonder if Glean can handle changed content in a legacy Ironside system running on AS/400s. I would sure like to see that and work up the costs for that cute real time trick. By the way, years ago, I got paid by a non US government agency to identify and define the types of “real time” data it had to process. I think my team identified six types. Only one could be processed without massive resource investments. Four others needed those investments just to become semi real. The final one was to gain access to the high-speed data about financial instrument pricing at Wall Street big dogs. That simply was not possible without resources and cartwheels. The reason? The government wanted to search for who was making real time trades in certain financial instruments. Yeah, good luck with that in a world where milliseconds require truly big money for gizmos to capture the data and the software to slap metadata on what is little more than a jet engine exhaust of zeros and ones, often encrypted in a way that would baffle some at certain three letter agencies. Remember: These are banks, not some home brew messaging service.

There are some other wild assertions in the write up. I am losing interest in addressing this first year business school “analysis.” The idea that a company with 500 to 50,000 employees can use this ready-to-roll service is interesting. I don’t know of a single enterprise search company I have encountered since I wrestled with IBM STAIRS and the dorky IBM CICS system that has what seems to be a “one size fits all” service. The Google Search Appliance failed with its “one size fits all.” The count of dead bodies on the enterprise search trail is larger than the death toll on the Oregon Trail. I know from my lectures that few if any know what DELPHES’ system did. What about InQuire? And there is IBM WebFountain and Clever. What about Perfect Search? What about Surfray? What about Arikus, Convera, Dieselpoint, or Entopia?

The good news is that a free trial is available. The cost is about $30 per month per user. For an organization like the local outfit that sells hard hats and uses Ironside and AS/400s, that works out to 150 users times $360 per year, or $54,000. I know this company won’t buy. Why? The system in place is good enough. Spreadsheet fever is not the same as identifying prospects and making a solid benefit based argument.
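For anyone who wants to rerun the spreadsheet fever numbers, here is a small sketch of the per-seat arithmetic. The function name is made up for illustration. The $30 monthly price, the 150 seats, and the 500 to 50,000 employee range all come from the text above. It assumes every employee gets a seat and that there are no volume discounts, which may not match actual pricing.

```python
def annual_license_cost(users: int, monthly_per_user: float) -> float:
    """Yearly cost of a per-seat subscription billed monthly (no discounts assumed)."""
    return users * monthly_per_user * 12


# The hard hat outfit from the text: 150 seats at about $30 per month each.
hard_hat_outfit = annual_license_cost(users=150, monthly_per_user=30)
print(f"150 seats: ${hard_hat_outfit:,.0f} per year")

# The 500 to 50,000 employee range the essay targets, at the same assumed price.
for seats in (500, 50_000):
    print(f"{seats:,} seats: ${annual_license_cost(seats, 30):,.0f} per year")
```

The multiplication is the easy part. The benefit side of the ledger, the time allegedly saved, is where verifiable data is needed.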

That’s why free and open source solutions get some love. Then built in “good enough” solutions from Microsoft are darned popular. Finally, some eager beaver in the information technology department will say, “Let me put together a system using Hugging Face.”

Many companies and a number of quite intelligent people (including former Googlers) have tried to wrestle enterprise search to the ground. Good luck. Just make sure you have verifiable data and not the wild assertions about how much time employees spend searching or how much time an employee will save. Don’t believe anything about enterprise search that uses the words “all” or “universal.”

Google said it was “universal search.” Yeah, why after decades of selling ads does the company provide so-so search for the Web, Gmail, YouTube, and images? Just ask, “Why?” Search is a difficult challenge.

Glean this from my personal opinion essay: Search is difficult, and it has yet to be solved except for precisely defined use cases. Google experience or not, the task is out of reach at this time.

Stephen E Arnold, August 12, 2025

Explaining Meta: The 21st Century “Paul” Writes a Letter to Us

August 12, 2025

No AI. Just a dinobaby being a dinobaby.

I read an interesting essay called “Decoding Zuck’s Superintelligence Memo.” The write up is similar to the assignments one of my instructors dumped on hapless graduate students at Duquesne University, a Jesuit university located in lovely Pittsburgh.

The idea is to take a text in Latin and sometimes in English and explain it, tease out its meaning, and try to explain what the author was trying to communicate. (Tortured sentences, odd ball vocabulary, and references only the mother of an ancient author could appreciate were part of the deciphering fun.)

The “Decoding Zuck” is this type of write up. This statement automatically elevates Mr. Zuckerberg to the historical significance of the Biblical Paul or possibly to a high priest of the Aten in ancient Egypt. I mean who knew?

Several points warrant highlighting.

First, the write up includes “The Zuckerberg Manifesto Pattern.” I have to admit that I have not directed much attention to Mr. Zuckerberg or his manifestos. I view outputs from Silicon Valley type outfits as a particular form of delusional marketing for the purpose of doing whatever the visionary wants to do. Apparently they have a pattern and a rhetorical structure. The pattern warrants this observation from “Decoding Zuck”:

Compared to all founders and CEOs, Zuck does seem to have a great understanding of when he needs to bet the farm on an idea and a behavioral shift. Each time he does that, it is because he sees very clearly Facebook is at the end of the product life and the only real value in the company is the attention of his audience. If that attention declines, it takes away the ability to really extend the company’s life into the next cycle.

Yes, a prescient visionary.

Second, the “decoded” message means, according to “Decoding Zuck”:

More than anything, this is a positioning document in the AI arms race. By using “super intelligence” as a marketing phrase, Zuck is making his efforts feel superior to the mere “Artificial Intelligence” of OpenAI, Anthropic, and Google.

I had no idea that documents like Paul’s letter to the Romans and Mr. Zuckerberg’s manifesto were marketing collateral. I wonder if those engaged in studying ancient Egyptian glyphs will discover that the writings about Aten are assertions about the bread sold by Ramose, the thumb on the scale baker.

Third, the context for the modern manifesto of Zuck is puffery. The exegesis says:

So what do I think about this memo, and all the efforts of Meta? I remain skeptical of his ability to invent a new future for his company. In the past, he has been able to buy, snoop, or steal other people’s ideas. It has been hard for him and his company to actually develop a new market opportunity. Zuckerberg also tends to overpromise on timelines and underestimate execution challenges.

I think this analysis of the Zuckerberg Manifesto of 2025 reveals several things about how Meta (formerly Facebook) positions itself and it provides some insight into the author of “Decoding Zuck” as well:

  1. The outputs are baloney packaged as serious thought
  2. The AI race has to produce a winner, and it is not clear if Facebook (sorry Meta) will be viewed as a contender
  3. AI is not yet a slam dunk winner, bigger than the Internet as another Silicon Valley sage suggested.

Net net: The AI push reveals that some distance exists between delivering hefty profits to those who have burned billions and the point at which a social media executive feels compelled to issue a marketing blurb.

Remarkable. Marketing by manifesto.

Stephen E Arnold, August 12, 2025

News Flash: Young Workers Are Not Happy. Who Knew?

August 12, 2025

No AI. Just a dinobaby being a dinobaby.

My newsfeed service pointed me to an academic paper in mid-July 2025. I am just catching up, and I thought I would document this write up from big thinkers at Dartmouth College and University College London titled “Rising Young Worker Despair in the United States.”

The write up is unlikely to become a must-read for recent college graduates or youthful people vaporized from their employers’ payroll. The main point is that the work processes of hiring and plugging away are driving people crazy.

The authors point out this revelation:

In this paper we have confirmed that the mental health of the young in the United States has worsened rapidly over the last decade, as reported in multiple datasets. The deterioration in mental health is particularly acute among young women…. … the relative prices of housing and childcare have risen. Student debt is high and expensive. The health of young adults has also deteriorated, as seen in increases in social isolation and obesity. Suicide rates of the young are rising. Moreover, Jean Twenge provides evidence that the work ethic itself among the young has plummeted. Some have even suggested the young are unhappy having BS jobs.

Several points jumped from the 38 page paper:

  1. The only reference to smart software or AI was in the word “despair”. This word appears 78 times in the document.
  2. Social media gets a few nods with eight references in the main paper and again in the endnotes. Isn’t social media a significant factor? My question is, “What’s the connection between social media and the mental states of the sample?”
  3. YouTube is chock full of first person accounts of job despair. A good example is Dari Step’s video “This Job Hunt Is Breaking Me and Even California Can’t Fix It Though It Tries.” One can feel the inner turmoil of this person. The video runs 23 minutes and you can find it (as of August 4, 2025) at this link: https://www.youtube.com/watch?v=SxPbluOvNs8&t=187s&pp=ygUNZGVtaSBqb2IgaHVudA%3D%3D. A “study” is one thing with numbers and references to hump curves. A first-person approach adds a bit of sizzle in my opinion.

A few observations seem warranted:

  1. The US social system is cranking out people who are likely to be challenging for managers. I am not sure the get-tough approach based on data-centric performance methods will be productive over time.
  2. Whatever is happening in “education” is not preparing young people and recent graduates to support themselves with old-fashioned jobs. Maybe most of these people will become AI entrepreneurs, but I have some doubts about success rates.
  3. Will the National Bureau of Economic Research pick up the slack for the disarray that seems to be swirling through the Bureau of Labor Statistics as I write this on August 4, 2025?

Stephen E Arnold, August 12, 2025

Paywalls. Users Do Not Want Them. Wow. Who Knew?

August 12, 2025

Sometimes research simply confirms the obvious. The Pew Research Center declares, “Few Americans Pay for News when they Encounter Paywalls.” Anyone still hoping the death of journalism could be forestalled with paywalls should reconsider. Writers Emily Tomasik and Michael Lipka cite a March Pew survey that found 83% of Americans have not paid for news in the past year. What do readers do when they hit a paywall? A mere 1% of those surveyed have forked over the dough to continue. However, 53% say they seek the same information elsewhere and 32% just give up on accessing it. Why? The write-up summarizes:

“Among the 83% of U.S. adults who have not paid for news in the past year, the most common reason they cite is that they can find plenty of other news articles for free. About half of those who don’t pay for news (49%) say this is the main reason. Indeed, many news websites do not have paywalls. Others have recently loosened paywalls or removed them for certain content like public emergencies or public interest stories. Another common reason people don’t pay for news is that they are not interested enough (32%). Smaller shares of Americans who don’t pay for news say the main reason is that it’s too expensive (10%) or that the news provided isn’t good enough to pay for (8%).”

The study did find some trends around who does pay for journalism. We learn:

“Overall, 17% of U.S. adults pay for news. However, highly educated adults, Democrats and older Americans – among other demographic groups – are more likely to have paid for news.

For example, 27% of college graduates say they have directly paid a news source by subscribing, donating or becoming a member in the last year – triple the share of those with a high school diploma or less formal education who have done so.”

So, those who paid to acquire knowledge are willing to pay to acquire knowledge. Who could have guessed? The survey also found senior citizens, wealthy folks, and white Americans more often pay up. Anyone curious about the survey’s methodology can read about it here.

The rule of thumb I use is that if one has 100 “readers”, two will pay if the content is really good. Must-have content bumps up the number a bit, but online publishers have to spend big on marketing to move the needle. Stick with ads and sponsored content.

Cynthia Murrell, August 12, 2025

Self-Appointed Gatekeepers and AI Wizards Clash

August 11, 2025

No AI. Just a dinobaby being a dinobaby.

Cloudflare wants to protect those with content. Perplexity wants content. Cloudflare sees an opportunity to put up a Google-type toll booth on the Information Highway. Perplexity sees traffic stops of any type the way a soccer mom perceives an 80 year old driving at the speed limit.

Perplexity has responded to Cloudflare’s words about Perplexity allegedly using techniques to crawl sites which may not want to be indexed.

“Agents or Bots? Making Sense of AI on the Open Web” states:

Cloudflare’s recent blog post managed to get almost everything wrong about how modern AI assistants actually work.

In addition to misunderstanding 20-25M user agent requests are not scrapers, Cloudflare claimed that Perplexity was engaging in “stealth crawling,” using hidden bots and impersonation tactics to bypass website restrictions. But the technical facts tell a different story.

It appears Cloudflare confused Perplexity with 3-6M daily requests of unrelated traffic from BrowserBase, a third-party cloud browser service that Perplexity only occasionally uses for highly specialized tasks (less than 45,000 daily requests).

Because Cloudflare has conveniently obfuscated their methodology and declined to answer questions helping our teams understand, we can only narrow this down to two possible explanations.

  1. Cloudflare needed a clever publicity moment and we–their own customer–happened to be a useful name to get them one.
  2. Cloudflare fundamentally misattributed 3-6M daily requests from BrowserBase’s automated browser service to Perplexity, a basic traffic analysis failure that’s particularly embarrassing for a company whose core business is understanding and categorizing web traffic.

The idea is to provide two choices, a technique much-loved by vaudeville comedians on the Paul Whiteman circuit decades ago; for example, “Have you stopped stealing office supplies?”

I find this situation interesting for several reasons:

  1. Smart software outfits have been sucking down data.
  2. The legal dust ups, the license fees, even the posture of the US government seem dynamic; that is, uncertain.
  3. Clever people often find themselves tripped by their own clever lines.

My view is that when tech companies squabble, the only winners are the lawyers, and the users lose.

Stephen E Arnold, August 11, 2025

The Human Mind in Software. It Is Alive!

August 11, 2025

Has this team of researchers found LLM’s holy grail? Science magazine reports, “Researchers Claim their AI Model Simulates the Human Mind. Others are Skeptical.” The team’s paper, published in Nature, claims the model can both predict and simulate human behavior. Predict is believable. Simulate? That is a much higher bar.

The team started by carefully assembling data from 160 previously published psychology experiments. Writer Cathleen O’Grady tells us:

“The researchers then trained Llama, an LLM produced by Meta, by feeding it the information about the decisions participants faced in each experiment, and the choices they made. They called the resulting model ‘Centaur’—the closest mythical beast they could find to something half-llama, half-human, [researcher Marcel] Binz says.”

Cute. The data collection represents a total of over 60,000 participants who made over 10 million choices. That sounds like a lot. But, as computational cognitive scientist Federico Adolfi notes, 160 experiments is but “a grain of sand in the infinite pool of cognition.” See the write-up for the study’s methodology. The paper claims Centaur’s choices closely aligned with those of human subjects. This means, researchers assert, Centaur could be used to develop experiments before involving human subjects. Hmm, this sounds vaguely familiar.

Other cognitive scientists remain unconvinced. For example:

“Jeffrey Bowers, a cognitive scientist at the University of Bristol, thinks the model is ‘absurd.’ He and his colleagues tested Centaur … and found decidedly un-humanlike behavior. In tests of short-term memory, it could recall up to 256 digits, whereas humans can commonly remember approximately seven. In a test of reaction time, the model could be prompted to respond in ‘superhuman’ times of 1 millisecond, Bowers says. This means the model can’t be trusted to generalize beyond its training data, he concludes.

More important, Bowers says, is that Centaur can’t explain anything about human cognition. Much like an analog and digital clock can agree on the time but have vastly different internal processes, Centaur can give humanlike outputs but relies on mechanisms that are nothing like those of a human mind, he says.”

Indeed. Still, even if the central assertion turns out to be malarkey, there may be value in this research. Both vision scientist Rachel Heaton and computational visual neuroscientist Katherine Storrs are enthusiastic about the dataset itself. Heaton is also eager to learn how, exactly, Centaur derives its answers. Storrs emphasizes a lot of work has gone into the dataset and the model, and is optimistic that work will prove valuable in the end. Even if Centaur turns out to be less human and more Llama.

Cynthia Murrell, August 11, 2025

DuckDuck Privacy. Go, Go, Go

August 8, 2025

We all know Google tracks us across the Web. But we can avoid that if we use a privacy-touting alternative, right? Not necessarily. Simple Analytics reveals, “Google Is Tracking You (Even When You Use DuckDuckGo).” Note that Simple Analytics is a Google Analytics competitor. So let us keep that in mind as we consider its blog’s assertions. Still, writer Iron Brands cites a study by Safety Detectives as he writes:

“The study analyzed browsing patterns in the US, UK, Switzerland, and Sweden. They used a virtual machine and VPN to simulate users in these countries. By comparing searches on Google and DuckDuckGo, researchers found Google still managed to collect data (often without the user knowing). Here’s how: Google doesn’t just track people through Search or Gmail. Its invisible code runs on millions of sites through Google Analytics, AdSense ads, YouTube embeds, and other background services like Fonts or Maps. That means even if you’re using DuckDuckGo, you’re not totally out of Google’s reach. In Switzerland and Sweden, using DuckDuckGo cut Google tracking by half. But in the US, more than 40% of visited pages still sent data back to Google, despite using a privacy search engine. That’s largely because many US websites rely on Google’s tools for ads and traffic analysis.”

And here we thought Google made such tools affordable out of generosity. The post continues:

“This isn’t just about search engines. It’s about how deeply Google is embedded into the internet’s infrastructure. Privacy-conscious users often assume that switching to DuckDuckGo or Brave is enough. This research says otherwise. … You need more than just a private browser or search engine to reduce tracking. Google’s reach comes from third-party scripts that websites willingly add.”

Brands implores the owners of those websites to stop contributing to the problem. The write-up emphasizes that laws like the EU’s GDPR do not stem the tide. Countries covered by such laws, we are told, are still awash in Google’s trackers. The solution? For both websites and users to divest themselves of Google as much as possible. As it happens, Brands’s firm offers site owners just such a solution: an analytics platform that is “privacy-first and cookie-free.” Note that Beyond Search has not independently verified these claims. Concerned site owners may also want to check out other Google alternatives.

Cynthia Murrell, August 8, 2025

Cannot Read? Students Cannot Imagine Either

August 8, 2025

Students are losing the ability to imagine and self-reflect on their own lives says the HuffPost in the article: “I Asked My Students To Write An Essay About Their Lives. The Reason 1 Student Began To Panic Left Me Stunned.” While Millennials were the first generation to be completely engrossed in the Internet, Generation Z is the first generation to have never lived without screens. Because of the Internet’s constant presence, kids have unfortunately developed bad habits where they zone out and don’t think.

Zen masters work for years to shut off their brains, but Gen Z can do it automatically with a screen. This is a horrible thing for critical thinking skills and imagination, because these kids don’t know how to think without the assistance of AI. The article writer Liz Rose Shulman is a teacher of high school and college students. She assigned them essays and without hesitation all of them relied on AI to complete the assignments.

The students either use Grammarly to help them write everything or they rely on ChatGPT to generate an essay. The overreliance on AI tools means they don’t know how to use their brains. They’re unfamiliar with the standard writing process, problem solving, and being creative. The kids don’t believe there’s a problem using AI. Many teachers also believe the same thing and are adopting it into their curriculums.

The students are flummoxed when they’re asked to write about themselves:

“I assigned a writing prompt a few weeks ago that asked my students to reflect on a time when someone believed in them or when they believed in someone else.

One of my students began to panic.

‘I have to ask Google the prompt to get some ideas if I can’t just use AI,’ she pleaded and then began typing into the search box on her screen, ‘A time when someone believed in you.’ ‘It’s about you,’ I told her. ‘You’ve got your life experiences inside of your own mind.’ It hadn’t occurred to her — even with my gentle reminder — to look within her own imagination to generate ideas. One of the reasons why I assigned the prompt is because learning to think for herself now, in high school, will help her build confidence and think through more complicated problems as she gets older — even when she’s no longer in a classroom situation.”

What’s even worse is that kids are addicted to their screens and they lack basic communication skills. Every generation goes through issues with older generations. Society will adapt and survive, but let’s start teaching how to think and imagine again! Maybe if they brought back recess and enforced time without screens that would help, even with older people.

Whitney Grace, August 8, 2025

Billions at Stake: The AI Bot Wars Begin

August 7, 2025

No AI. Just a dinobaby being a dinobaby.

I noticed that the puffs of smoke were actually cannon fire in the AI bot wars. The most recent battle pits Cloudflare (a self-declared policeman of the Internet) against Perplexity, one of the big buck AI outfits. What is the fight? Cloudflare believes there is a good way to crawl and obtain publicly accessible content. Perplexity is just doing what those Silicon Valley folks have done for decades: Do stuff and apologize (or not) later.

WinBuzzer’s “Cloudflare Accuses Perplexity of Using ‘Stealth Crawlers’ to Evade Web Standards” said on August 4, 2025, at a time that has not yet appeared on my atomic clock:

Web security giant Cloudflare has accused AI search firm Perplexity of using deceptive “stealth crawlers” to bypass website rules and scrape content. In a report Cloudflare states Perplexity masks its bots with generic browser identities to ignore publisher blocks. Citing a breach of internet trust, Cloudflare has removed Perplexity from its verified bot program and is now actively blocking the behavior. This move marks a major escalation in the fight between AI companies and content creators, placing Perplexity’s aggressive growth strategy under intense scrutiny.

I like the characterization of Cloudflare as a Web security giant. Colorful.

What is the estimable smart software company doing? Workarounds. Using assorted tricks, Perplexity is engaging in what WinBuzzer calls “stealth activity.” The method is a time-honored one among some bad actors. The idea is to make it difficult for routine filtering to stop the Perplexity bot from sucking down data.

If you want the details of the spoofs that Perplexity’s wizards have been using, navigate to this Ars Technica post. There is a diagram that makes absolutely crystal clear to everyone in my old age home exactly what Perplexity is doing. (The diagram captures a flow I have seen some nation state actors employ to good effect.)

The part of the WinBuzzer story I liked addressed the issue of “explosive growth and ethical scrutiny.” The idea of “growth” is interesting. From my point of view, the growth is in the amount of cash that Perplexity and other AI outfits are burning. The idea is, “By golly, we can make this AI stuff generate oodles of cash.” The ethical part is a puzzler. Suddenly Silicon Valley-type AI companies are into ethics. Remarkable.

I wish to share several thoughts:

  1. I love the gatekeeper role of the “Web security giant.” Aren’t commercial gatekeepers the obvious way to regulate smart software? I am not sharing my viewpoint. I suggest you formulate your own opinion and do with it what you will.
  2. The behavior of Perplexity, if the allegations are accurate, is not particularly surprising. In fact, in my opinion it is SOP or standard operating procedure for many companies. It is easier to apologize than ask for permission. Does that sound familiar? It should. From Google to the most recent start up, that’s how many of the tech savvy operate. Is change afoot? Yeah, sure. Right away, chief.
  3. The motivation for the behavior is pragmatic. Outfits like Perplexity have to pull a rabbit out of the hat to make a profit from the computational runaway fusion reactor that is the cost of AI. The fix is to get more content and burn more resources. Very sharp thinking, eh?

Net net: I predict more intense AI fighting. Who will win? The outfits with the most money. Isn’t that the one true way of the corporate world in the US in 2025?

Stephen E Arnold, August 7, 2025

The China Smart, US Dumb Push Is Working

August 7, 2025

This blog post is the work of an authentic dinobaby. Sorry. No smart software can help this reptilian thinker.

I read “The US Should Run Faster on AI Instead of Trying to Trip Up China.” In a casual way, I am keeping an eye open for variations on the “China smart, US dumb” information I spot. The idea is that China is not just keeping pace with US innovation; the Middle Kingdom is either even or leading. The context is that the star burning bright for the American era has begun collapsing into a black hole or maybe into a brown dwarf. Avoidance of the US may be the best policy. As one of Brazil’s leaders noted: “America is not bullying our country [Brazil]. America is bullying the world.”

Right or wrong? I have zero idea.

The cited essay suggests that certain technology and economic policies have given China an advantage. The idea is that the disruptive kid in high school sits in the back of the room and thinks up a better Facebook-type system and then implements it.

The write up states:

The ostensible reason for the [technology and economic] controls was to cripple China’s AI progress. If that was the goal, it has been a failure.

As I zipped through the essay, I noted that the premise of the write up is that the US has goofed. The proof is no farther away than the data about China’s capabilities in smart software. I think that any large language model will evidence bias. Bias is encapsulated in many human-created utterances. I, for example, have written critically about search and retrieval for decades. Am I biased about enterprise search? Absolutely. I know from experience that software that attempts to index content in an organization inevitably disappoints a user of that system. Why? No system to which I have been exposed has access to the totality of “information” generated by an organization. Maybe someday? But for the last 40 years, systems simply could not deliver what the marketers promised. Therefore, I am biased against claims that an enterprise search system can answer employees’ questions.

China is a slippery fish. I had a brief and somewhat weird encounter with a person deeply steeped in China’s somewhat nefarious effort to gain access to US pharma-related data. I have encountered a similar effort afoot in the technical disciplines related to nuclear power. These initiatives illustrate that China wants to be a serious contender for the title of world leader in bio-science and nuclear. Awareness of this type of information access is low even today.

I am, as a dinobaby, concerned that the lack of awareness issue creates more opportunities for information exfiltration from a proprietary source to an “open source” concept. To be frank, I am in favor of a closed approach to technology.

The reason I am making sure I have this source document and my comments is that it is a very good example of how the China good, America dumb information is migrating into what might be termed a more objective looking channel.

Net net: China’s weaponizing of information is working reasonably well. We are no longer in TikTok territory.

Stephen E Arnold, August 6, 2025
