Glean Goes Beyond Search: Have Xooglers Done What Google Could Not Do?

August 12, 2025

This blog post is the work of an authentic dinobaby. Sorry. No smart software can help this reptilian thinker.

I read an interesting online essay titled “Glean’s $4.5B Business Model: How Ex-Googlers Built the Enterprise Search That Actually Works.” Enterprise search has been what one might call a Holy Grail application. Many have tried to locate the Holy Grail. Most have failed.

Have a small group of Xooglers (former Google employees) located the Holy Grail and been able to convert its power into satisfied customers? The essay, which reminded me of an MBA write up, argues that the outfit doing business as Glean has done it. The firm has found the Holy Grail, melted it down, and turned it into an endless stream of cash.

Does this sound a bit like the marketing pitch of Autonomy, Fast Search & Transfer, and even Google itself with its descriptions of its deeply wacky yellow servers? For me, Glean has done its marketing homework. The evidence is plumped and oiled for this essay about its business model. But what about search? Yeah, well, the focus of the marketing piece is the business model. Let’s go with what is in front of me. Search remains a bit of a challenge, particularly in corporations, government agencies, and pharmaceutical-type outfits where secrecy is a big part of that type of organization’s way of life.

What is the Glean business model? It is VTDF. Here’s an illustration:

Does this visual look like blue chip consulting art? Is VTDF blue chip speak? Yes. And yes. For those not familiar with the lingo here’s a snapshot of the Glean business model:

Value: Focuses on how the company creates and delivers core value to customers, such as solving specific problems
Technology: Refers to the underlying tech innovations that allow “search” to deliver what employees need to do their jobs
Distribution: Involves strategies for marketing, delivery, and reaching users
Finance: Covers revenue models, cash flow management, and financial sustainability. Traditionally this has been the weak spot for the big-time enterprise search plays.

The essay explains in dot points that Glean is a “knowledge liberator.” I am not sure how that will fly in some pharma-type outfits or government agencies in which Palantir is roosting.

Once Glean’s “system” is installed, here’s what happens (allegedly):

Single search box for everything
Natural language queries
Answers, not just documents
Context awareness across apps
Personalized to user permissions
New employees productive in days.

I want to take a moment to comment on each of these payoffs or upsides.

First, a single search box for everything is going to present a bit of a challenge in several important use cases. Consider a company with an inventory control system, vendor evaluations, and a computer-aided design system with a database of specifications. The single search box is going to return what for a specific part? Some users will want to know how many are in stock. Others will want to know the vendor who made the part in a specific batch because it is failing in use. Some will want to know what the part looks like. The fix for this type of search problem has been to figure out how to match the employee’s role with the filters applied to that user’s query. In the last 60 years, that approach sort of worked, but it was and still is incredibly difficult to keep lined up with employee roles, assorted permissions, and the way the information is presented to the person running the query. The quality issue may require stress analysis data and access to the lawsuit the annoyed customer has just filed. I am unsure how the Xooglers have solved this type of search task.
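The role-matching fix described above can be sketched in a few lines of Python. This is a minimal illustration, not how Glean or any vendor does it. The roles, source system names, and records are all hypothetical, invented here to mirror the parts example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str  # hypothetical source system name
    part: str
    text: str


# Hypothetical mapping from an employee role to the sources that role's
# queries are filtered to. Someone has to maintain this table by hand.
ROLE_SOURCES = {
    "warehouse": {"inventory"},
    "quality": {"inventory", "vendor_evaluations"},
    "engineer": {"cad_specs"},
}

# Hypothetical records for one part spread across three systems.
RECORDS = [
    Record("inventory", "P-100", "412 units in stock"),
    Record("vendor_evaluations", "P-100", "Batch 7 supplied by vendor A"),
    Record("cad_specs", "P-100", "Drawing rev C, 40 mm flange"),
]


def search(part: str, role: str) -> list[Record]:
    """Return records for a part, limited to the sources mapped to the role."""
    allowed = ROLE_SOURCES.get(role, set())
    return [r for r in RECORDS if r.part == part and r.source in allowed]


for role in ("warehouse", "quality", "engineer", "intern"):
    print(role, [r.source for r in search("P-100", role)])
```

The same query returns different results for each role, and a role missing from the table gets nothing. Every reorganization, new system, or new job title means the table must change, which is where the approach has historically gone out of alignment.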

Second, the NLP approach is great but it is early 2000s. The many efforts, including DR-LINK to which my team contributed some inputs, were not particularly home run efforts. The reason has to do with the language skills of the users. Organizations hire people who may be really good at synthesizing synthetics but not so good at explaining what the new molecule does. If the lab crew dies, the answer does not require words. Querying for the “new” is tough, since labs doing secret research do not share their data. Even company officers have a tough time getting an answer. When a search system requires the researcher to input a query, that scientist may want to draw a chemical structure or input a query like this “C8N8O16.” Easy enough if the indexing system has access to the classified research in some companies. But the NLP problem is what is called “prompt engineering.” Most humans are just not very good at expressing what they need in the way of information. So modern systems try to help out the searcher. The reason Google search sucks is that the engineers have figured out how to deliver an answer that is good enough. For C8N8O16 close enough for horseshoes might be problematic.

Third, answers are what people want. The “if” statement becomes the issue. If the user knows a correct answer or just accepts what the system outputs. If the user understands the output well enough to make an informed decision. If the system understood or predicted what the user wanted. If the content is in the search system’s index. This is a lot of ifs. Most of these conditions occur with sufficient frequency to kill outfits that have sold an “enterprise search system.”

Fourth, the context awareness across apps means that the system can access content on proprietary systems within an organization and across third party systems which may or may not run on the organization’s servers. Most enterprise search systems create or have licensed filters to acquire content. However, keeping the filters alive and healthy with the churn in permissions, file tweaks, and assorted issues related to latency creating data gaps remains tricky.

Fifth, the idea of making certain content available only to those authorized to view those data is a very tricky business. Orchestrating permissions is, in theory, easy to automate. The reality in today’s organizations is the complicating factor. Distributed outfits, contractors, and employees who may be secretly working for another country add some excitement to accessing “information.” The reality in many organizations is that there are regular silos, ranging from the legal department keeping certain documents under lock and key to projects for three letter agencies. In the pharma game, knowing “who” is working on a project is often a dead give-away for what the secret project is. The company’s “people” officer may be in the dark. What about consultants? What information is available to them? The reality is that modern organizations have more silos than the corn fields around Canton, Illinois.
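A small Python sketch shows why automating permissions is easy in theory and touchy in practice. Everything in it is an assumption for illustration: the document names, the group names, and the idea that the index holds a crawl-time snapshot of group membership. It does not describe any particular product.

```python
def visible(docs, user_groups):
    """Return ids of documents whose allowed groups overlap the user's groups."""
    return [d["id"] for d in docs if d["allowed_groups"] & user_groups]


# Hypothetical documents with hypothetical access groups.
docs = [
    {"id": "contract-draft", "allowed_groups": {"legal"}},
    {"id": "cafeteria-menu", "allowed_groups": {"everyone"}},
    {"id": "project-x-roster", "allowed_groups": {"project-x"}},
]

# Group membership as captured when the index was last refreshed (assumed).
indexed_groups = {"consultant-1": {"everyone", "project-x"}}

# Group membership right now: the consultant rolled off the project.
current_groups = {"consultant-1": {"everyone"}}

stale = visible(docs, indexed_groups["consultant-1"])
fresh = visible(docs, current_groups["consultant-1"])

print("stale snapshot shows:", stale)
print("current groups show:", fresh)
```

With the stale snapshot the consultant still sees the project roster, which is exactly the “who is working on it” give-away. The filter itself is one line. Keeping the membership data current across every silo is the hard part.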

Sixth, no training is required. “Employees are productive in days” is the pitch. Maybe, maybe not. Like the glittering generality that employees spend 20 percent of their time searching, the data for this assertion was lacking when the “old” IDC, Sue Feldman, and her team cranked out an even larger number. If anything, search is a larger part of work today for many people. The reasons range from content management systems which cannot easily be indexed in real time to the senior vice president of sales who changes prices for a product at a trade show and tells only his contact in the accounting department. Others may not know for days or months that the apple cart has been tipped.

Glean saves time. That is the smart software pitch. I need to see some data from a statistically valid sample with a reasonable horizontal x axis. The reference to “all” is troublesome. It underscores the immature understanding of what “enterprise search” means to a licensee versus what the venture backed company can actually deliver. Fast Search found out that a certain newspaper in the UK was willing to sue for big bucks because of this marketing jingo.

I want to comment briefly about “Technology Architecture: Beyond Search.” Hey, isn’t that the name of my blog which has been pumping out information access related articles for 17 years? Yep, it is.

Okay, Glean apparently includes these technologies in their enterprise search quiver:

Universal connectors. Note the word “universal.” Nope, very tough.
A knowledge graph. Think in terms of Maltego, an open source software. Sure, as long as there is metadata. But what about those mobile workers and their use of cloud services and E2EE messaging services? Sounds great. Execution in a cost sensitive environment takes a bit of work.
An AI understanding layer. Yep, smart software. (Google’s smart software tells its users that it is ashamed of its poor performance. OpenAI rolled out ChatGPT 5 and promptly reposted ChatGPT 4o because enough users complained. Deepseek may have links to a nation state unfriendly to the US. Mark Zuckerberg’s Llama is a very old llama. Perplexity is busy fighting with Cloudflare. Anthropic is working to put coders out to pasture. Amazon, Apple, Microsoft, and Telegram are in the bolt it on business.) The idea is that Glean can understand [a] different employee contexts, [b] the rapidly changing real time data in an organization like that PowerPoint on the senior VP’s laptop, and [c] the file formats that have a very persistent characteristic of changing because whoever is responsible for an update or the format itself makes an intentional or unintentional change. I just can’t accept this assertion.
Works instantly which I interpret as “real time.” I wonder if Glean can handle changed content in a legacy Ironside system running on AS/400s. I would sure like to see that and work up the costs for that cute real time trick. By the way, years ago, I got paid by a non US government agency to identify and define the types of “real time” data it had to process. I think my team identified six types. Only one could be processed without massive resource investments to make the other four semi real. The final one was to gain access to the high-speed data about financial instrument pricing in Wall Street big dogs. That simply was not possible without resources and cartwheels. The reason? The government wanted to search for who was making real time trades in certain financial instruments. Yeah, good luck with that in a world where milliseconds require truly big money for gizmos to capture the data and the software to slap metadata on what is little more than a jet engine exhaust of zeros and ones, often encrypted in a way that would baffle some at certain three letter agencies. Remember: These are banks, not some home brew messaging service.

There are some other wild assertions in the write up. I am losing interest in addressing this first year business school “analysis.” The idea that a company with 500 to 50,000 employees can use this ready-to-roll service is interesting. I don’t know of a single enterprise search company I have encountered since I wrestled with IBM STAIRS and the dorky IBM CICS system that has what seems to be a “one size fits all” service. The Google Search Appliance failed with its “one size fits all.” The number of dead bodies on the enterprise search trail is larger than the death toll on the Oregon Trail. I know from my lectures that few if any know what DELPHES’ system did. What about InQuire? And there is IBM WebFountain and Clever. What about Perfect Search? What about Surfray? What about Arikus, Convera, Dieselpoint, or Entopia?

The good news is that a free trial is available. The cost is about $30 per month per user, or $360 per user per year. For an organization like the local outfit that sells hard hats and uses Ironside and AS/400s, that works out to 150 users times $360 or $54,000 a year. I know this company won’t buy. Why? The system in place is good enough. Spreadsheet fever is not the same as identifying prospects and making a solid benefit based argument.
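The arithmetic behind that figure is easy to check. The sketch below uses the roughly $30 monthly price and the 150 multiplier from the paragraph above. Reading the 150 as a user count and the total as an annual cost is an assumption, as are the variable names.

```python
# Figures taken from the paragraph above; the price is "about $30".
PRICE_PER_USER_PER_MONTH = 30  # US dollars
USERS = 150                    # assumed to be the hard hat outfit's user count
MONTHS_PER_YEAR = 12

annual_per_user = PRICE_PER_USER_PER_MONTH * MONTHS_PER_YEAR
annual_total = USERS * annual_per_user

print(f"Per user per year: ${annual_per_user:,}")
print(f"Total per year for {USERS} users: ${annual_total:,}")
```

This counts only the license fee. Connector setup, permission cleanup, and ongoing administration would sit on top of it.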

That’s why free and open source solutions get some love. Then built in “good enough” solutions from Microsoft are darned popular. Finally, some eager beaver in the information technology department will say, “Let me put together a system using Hugging Face.”

Many companies and a number of quite intelligent people (including former Googlers) have tried to wrestle enterprise search to the ground. Good luck. Just make sure you have verifiable data and not the wild assertions about how much time employees spend searching or how much time an employee will save. Don’t believe anything about enterprise search that uses the words “all” or “universal.”

Google said it was “universal search.” Yeah, why after decades of selling ads does the company provide so-so search for the Web, Gmail, YouTube, and images? Just ask, “Why?” Search is a difficult challenge.

Glean this from my personal opinion essay: Search is difficult, and it has yet to be solved except for precisely defined use cases. Google experience or not, the task is out of reach at this time.

Stephen E Arnold, August 12, 2025

Written by Stephen E. Arnold · Filed Under AI, Enterprise search, Marketing, News

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.