Commercial Open Source: Fantastic Pipe Dream or Revenue Pipe Line?

March 26, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Open source is a term which strikes me as au courant. Artificial intelligence software is often described as “open source.” The term has a bit of “do good” mixed with the idea that commercial software puts customers in handcuffs. (I think I hear Kumbaya playing faintly in the background.) Is it possible to blend the idea of free and open software with the principles of commercial software lock-in? Notable open source companies have become difficult to differentiate from run-of-the-mill technology companies. Examples include Red Hat, Elastic, and OpenAI. Ooops. Sorry. OpenAI is a different type of company. I think.


Will open source software, particularly open source AI components, end up like this private playground? Thanks, MSFT Copilot. You are into open source, aren’t you? I hope your commitment is stronger than for server and cloud security.

I had these open source thoughts when I read “AI and Data Infrastructure Drives Demand for Open Source Startups.” The source of the information is Runa Capital, now located in Luxembourg. The firm publishes a report called the Runa Open Source Start Up Index, and it is a “rosy” document. The point of the article is that Runa sees open source as a financial opportunity. You can start your exploration of the tables and charts at this link on the Runa Capital Web site.

I want to focus on some information tucked into the article, just not presented in bold face or with a snappy chart. Here’s the passage I noted:

Defining what constitutes “open source” has its own inherent challenges too, as there is a spectrum of how “open source” a startup is — some are more akin to “open core,” where most of their major features are locked behind a premium paywall, and some have licenses which are more restrictive than others. So for this, the curators at Runa decided that the startup must simply have a product that is “reasonably connected to its open-source repositories,” which obviously involves a degree of subjectivity when deciding which ones make the cut.

The word “reasonably” invokes an image of lawyers negotiating on behalf of their clients. Nothing is quite so far from the kumbaya of the “real” open source software initiative as lawyers. Just look at the licenses for open source software.

I also noted this statement:

Thus, according to Runa’s methodology, it uses what it calls the “commercial perception of open-source” for its report, rather than the actual license the company attaches to its project.

What is “open source”? My hunch: it is whatever the lawyers and courts conclude.

Why is this important?

The talk about “open source” is relevant to the “next big thing” in technology. And what is that? ANSWER: A fresh set of money making plays.

I know that there are true believers in open source. I wish them financial and kumbaya-type success.

My take is different: Open source, as the term is used today, is one of the phrases repurposed to breathe life into what some critics call a techno-feudal world. I don’t have a dog in the race. I don’t want a dog in any race. I am a dinobaby. I find amusement in how language becomes the Teflon on which money (one hopes) glides effortlessly.

And the kumbaya? Hmm.

Stephen E Arnold, March 26, 2024

AI Job Lawnmowers: Will Your Blooms Be Chopped Off and Put a Rat King in Your Future?

March 25, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I love “you will lose your job to AI” articles. I spotted an interesting one titled “The Job Sectors That Will Be Most Disrupted By AI, Ranked.” This is not so much an article as a billboard for an outfit named Voronoi, “where data tells the story.” That’s interesting because there is no data, no methodology, and no indication of the confidence level for each “nuked job.” Nevertheless, we have a ranking.


Thanks, MSFT Copilot. Will you be sparking human rat kings? I would wager that you will.

As I understand the analysis of 19,000 tasks, here’s what is most likely to be chopped down and converted to AI silage:

  • IT / programmers: 73 percent of the job will experience a large impact
  • Finance / bean counters: 70 percent of the job will experience a large impact
  • Customer sales: 67 percent of the job will experience a large impact
  • Operations (well, that’s a fuzzy category, isn’t it?): 65 percent of the job will experience a large impact
  • Personnel / HR: 57 percent of the job will experience a large impact
  • Marketing: 56 percent of the job will experience a large impact
  • Legal eagles: 46 percent of the job will experience a large impact
  • Supply chain (another fuzzy wuzzy bucket): 43 percent of the job will experience a large impact

The kicker in the data is that the numbers date from September 2023. Six months in the faerie land of smart software is a long, long time. Let’s assume that the data meet 2024’s gold standard.

Technology, finance, sales, marketing, and lawyering may shatter the future of employees deemed of less value in terms of compensation, cost to the organization, or whatever management legerdemain the top dogs and their consultants whip up. Imagine: eliminating the overhead for humans like office space, health care, retirement baloney, and vacations makes smart software into an attractive “play.”

And what about the fuzzy buckets? My thought is that many people will be trimmed because a chatbot can close a sale for a product without the hassle which humans drag into the office; for example, sexual harassment; mental, drug, and alcohol “issues”; and the unfortunate workplace shooting. I think that a person sitting in a field office to troubleshoot issues related to a state or county contract might fall into the “operations” category even though the employee sees the job as something smart software cannot perform. Ho ho ho.

Several observations:

  • A trivial cost analysis of human versus software over a five-year period means humans lose (a back-of-the-envelope sketch follows this list)
  • AI systems, which may suck initially, will be improved over time. These initial failures may lull the once-alert employee into a false sense of security
  • Once displaced, former employees will have to scramble to produce cash. With lots of individuals chasing available work and money plays, life is unlikely to revert to the good old days of the Organization Man. (The world will be Organization AI. No suit and white shirt required.)
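
Here is a minimal sketch of the kind of five-year comparison I have in mind, in Python. Every dollar figure below is a hypothetical placeholder invented for illustration, not data from the cited article:

```python
# Hypothetical five-year cost comparison: one human employee versus
# AI software. Every number is an illustrative assumption.

YEARS = 5

# Assumed annual costs for one human: salary plus the overhead the
# essay mentions (office space, health care, retirement, vacations).
human_annual = 75_000 + 12_000 + 15_000 + 8_000 + 5_000

# Assumed AI costs: one-time integration plus annual licensing/compute.
ai_setup = 50_000
ai_annual = 20_000

human_total = human_annual * YEARS
ai_total = ai_setup + ai_annual * YEARS

print(f"Human, {YEARS} years:    ${human_total:,}")
print(f"Software, {YEARS} years: ${ai_total:,}")
print("Humans lose" if ai_total < human_total else "Humans win")
```

With these made-up numbers the software wins by a wide margin, which is the bean counter’s whole argument.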

Net net: I am glad I am old and not quite as enthralled by efficiency.

Stephen E Arnold, March 25, 2024

AI Innovation: Do Just Big Dogs Get the Fat, Farmed Salmon?

March 20, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Let’s talk about statements like “AI will be open source” and “AI has spawned hundreds, if not thousands, of companies.” Those are assertions which seem to be slightly different from what’s unfolding at some of the largest technology outfits in the world. The circling and sniffing allegedly underway between the Apple pack and the Google pack is interesting. Apple and Google have a relationship, probably one that will need a marriage counselor, but it is a relationship.


The wizard scientists have created an interesting digital construct. Thanks, MSFT Copilot. How are you coming along with your Windows 11 updates and Azure security today? Oh, that’s too bad.

The news, however, is that Microsoft is demonstrating that it wants to eat the fattest salmon in the AI stream. Microsoft has a deal of some type with OpenAI, operating under the steady hand of Sam AI-Man. Plus the Softies have cozied up to the French outfit Mistral. Today at 5:30 am US Eastern I learned that Microsoft has embraced an outstanding thinker, sensitive manager, and pretty much the entire Inflection AI outfit.

The number of stories about this move reflects the interest in smart software and in what may be one of the world’s top purveyors of software which attracts bad actors from around the globe. Thinking about breaches in the new Microsoft world is not a topic in the write ups about this deal. Why? I think the management move has captured attention because it is surprising, disruptive, and big in terms of money and implications.

“Microsoft Hires DeepMind Co-Founder Suleyman to Run Consumer AI” states:

DeepMind workers complained about his [former Googler Mustafa Suleyman and subsequent Inflection.ai senior manager] management style, the Financial Times reported. Addressing the complaints at the time, Suleyman said: “I really screwed up. I was very demanding and pretty relentless.” He added that he set “pretty unreasonable expectations” that led to “a very rough environment for some people. I remain very sorry about the impact that caused people and the hurt that people felt there.” Suleyman was placed on leave in 2019 and months later moved to Google, where he led AI product management until exiting in 2022.

Okay, a sensitive manager who learns from his mistakes joins Microsoft.

And Microsoft demonstrates that the AI opportunity is wide open. “Why Microsoft’s Surprise Deal with $4 Billion Startup Inflection Is the Most Important Non-Acquisition in AI” states:

Even since OpenAI launched ChatGPT in November 2022, the tech world has been experiencing a collective mania for AI chatbots, pouring billions of dollars into all manner of bots with friendly names (there’s Claude, Rufus, Poe, and Grok — there’s event a chatbot name generator). In January, OpenAI launched a GPT store that’s chock full of bots. But how much differentiation and value can these bots really provide? The general concept of chatbots and copilots is probably not going away, but the demise of Pi may signal that reality is crashing into the exuberant enthusiasm that gave birth to a countless chatbots.

Several questions will be answered in the weeks ahead:

  1. What will regulators in the EU and US do about the deal when its moving parts become known?
  2. How will the kumbaya evolve when Microsoft senior managers, its AI partners, and reassigned Microsoft employees have their first all-hands Teams or off-site meeting?
  3. Does Microsoft senior management have the capability of addressing the attack surface of the new technologies and the existing Microsoft software?
  4. What happens to the AI ecosystem which depends on open source software related to AI if Microsoft shifts into “commercial proprietary” to hit revenue targets?
  5. With multiple AI systems, how are Microsoft Certified Professional agents going to [a] figure out what broke and [b] fix it?
  6. With AI the apparent “next big thing,” how will adversaries like nations not pals with the US respond?

Net net: How unstable is the AI ecosystem? Let’s ask IBM Watson because its output is going to be as useful as any other in my opinion. My hunch is that the big dogs will eat the fat, farmed salmon. Who will pull that luscious fish from the big dog’s maw? Not me.

Stephen E Arnold, March 20, 2024

Old Code, New Code: Can You Make It Work Again… Sort Of?

March 18, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Even hippy dippy super slick AI start ups have a technical debt problem. It is, in my opinion, no different from the “costs” imposed on outfits like JPMorgan Chase or (heaven help us) AMTRAK. Software which mostly works is subject to two environmental problems. First, the people who wrote the code or made it work that last time catastrophe struck (hello, AT&T, how are those pushed updates working for you now?) move on, quit, or whatever. Second, the technical options for remediating the problem are evolving (how are those security hot fixes working out, Microsoft?).


The helpful father asks a question the aspiring engineer cannot answer. Thus it was when the wizard was a child, and thus it is now that the wizard is working on a modern engineering project. Buildings tip; aircraft lose doors and wheels. Software updates kill computers. Self-driving cars cannot drive themselves. Thanks, MSFT Copilot. Did you get your model airplane to fly when you were a wee lad? I think I know the answer.

I thought about this problem of the cost of remediating, fixing, redoing, or upgrading code, or whatever term fast-talking sales engineers use in their Zooms and PowerPoints, as I read “The High-Risk Refactoring.” The write up does a good job of explaining in a gentle way what happens when suits authorize making old code like new again. (The suits do not know the agonies of the original developers, but why should “history” intrude on a whiz bang GenX or GenY management type?)

The article says:

it’s highly important to ensure the system works the same way after the swap with the new code. In that regard, immediately spotting when something breaks throughout the whole refactoring process is very helpful. No one wants to find that out in production.

No kidding.

In most cases, there are insufficient skilled people and money to create a new or revamped system, get it up and running in parallel for an appropriate period of time, identify the problems, remediate them, and then make the cut over. People buy cars this way, but that’s not how most organizations, regardless of size, “do” software. Okay, the “take your car in, buy a new one, and drive off” approach will not work in today’s business environment.

The write up focuses on what most organizations do; that is, write or fix new code and stick it into a system. There may or may not be resources for a staging server, but the result is the same. The old software has been “fixed” and the documentation is “sort of written” and people move on to other work or in the case of consulting engineering firms, just get replaced by a new, higher margin professional.

The write up takes a different approach and concludes with four suggestions or questions to ask. I quote:

“Refactor if things are getting too complicated, but  stop if can’t prove it works.

Accompany new features with refactoring for areas you foresee to be subject to a change, but copy-pasting is ok until patterns arise.

Be proactive in finding new ways to ensure refactoring predictability, but be conservative about the assumption QA will find all the bugs.

Move business logic out of busy components, but be brave enough to keep the legacy code intact if the only argument is “this code looks wrong”.

These are useful points. I would like to suggest some bright white lines for those who have to tackle an IRS-mainframe- or AT&T-billing system type of challenge as well as tweaking an artificial intelligence solution to respond to those wonky multi-ethnic images Google generated in order to allow the Sundar & Prabhakar Comedy Team to smile sheepishly and apologize again for lousy software.

Are you ready? Let’s go:

  1. Fixes add to the complexity of the code base. As time goes stumbling forward, the complexity of the software becomes greater. The cost of making sure the fix works and does not create exciting dependency behavior goes up. Thus, small fixes “cost” more, and these costs are tough to control.
  2. The safest fixes are “wrappers”; that is, no one in his or her right mind wants to change software written in 1978 for a machine no longer in production by the manufacturer. Therefore, new software is written to interact in a “safe” way with the original software. The new code “fixes up” the problem without screwing up what grandpa programmer wrote almost half a century ago. The problem is that “wrappers” tend to slow stuff down (see the sketch after this list). The fix is to say one will optimize the system while one looks for a new project or job.
  3. The software used for “fixing” a problem is becoming the equivalent of repairing an aircraft component with Dawn dish detergent. The “fix” is cheap, easy to use, and good enough. The software equivalent of this Dawn solution is that it will not stand the test of time. Instead of code crafted in good old COBOL or Assembler, we have some Fancy Dan tools which may fall out of favor in a matter of months, not decades.
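
To make the “wrapper” idea concrete, here is a minimal sketch in Python. The legacy routine and every name in it are hypothetical stand-ins invented for illustration, not code from any real system:

```python
# Minimal sketch of the "wrapper" approach: new code mediates between
# modern callers and an untouchable legacy routine. Names are hypothetical.

def legacy_billing_1978(acct: str, amt_cents: int) -> str:
    """Stand-in for what grandpa programmer wrote; nobody dares modify it."""
    return f"{acct}|{amt_cents}|OK"

def bill_customer(account_id: str, amount_dollars: float) -> dict:
    """Wrapper: modern types in, modern types out; legacy code untouched."""
    # Translate modern inputs into the forms the old routine expects.
    raw = legacy_billing_1978(account_id.upper(), round(amount_dollars * 100))
    acct, cents, status = raw.split("|")
    # Every call pays this translation tax, which is the slowdown noted above.
    return {"account": acct, "amount": int(cents) / 100, "ok": status == "OK"}

print(bill_customer("cust-42", 19.99))
```

The wrapper keeps the 1978 logic intact, but each call goes through an extra layer of translation, which is why wrapped systems tend to get slower as wrappers accumulate.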

Many projects promise better, faster, and cheaper. The reminder “Pick two” is helpful.

Net net: Fixing up lousy or flawed software is going to increase risks and costs. The question asked by bean counters is, “How much?” The answer is, “No one knows until the project is done … if ever.”

Stephen E Arnold, March 18, 2024

Thomson Reuters Is Going to Do AI: Run Faster

March 11, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Thomson Reuters, a mostly low profile outfit, is going to do AI. Why’s this interesting to law schools, lawyers, accountants, special librarians, libraries, and others who “pay” for “real” information? There are three reasons:

  1. Money
  2. Markets
  3. Mania.

Thomson Reuters has been a tech talker for decades. The company created skunk works. It hired quirky MIT wizards. It bought businesses with information technology. But underneath the professional publishing clear coat, the firm is the creation of Lord Thomson of Fleet. The firm has a track record of being able to turn a profit on its $7 billion in revenues. But the future, if news reports are accurate, is artificial intelligence or smart software.


The young publishing executive says, “I have got to get ahead of this AI bus before it runs over me.” Thanks, MSFT Copilot. Working on security today?

But wait! What makes Thomson Reuters different from the New York Times or (heaven forbid the question) Rupert Murdoch’s confections? The answer, in my opinion: Thomson Reuters does the trust thing and is a professional publisher. I don’t want to explain that in the world of Lord Thomson of Fleet publishing is publishing. Nope. Not going there. Thomson Reuters is a custom made billiard cue, not one of those bar pool cheapos.

As appropriate to today’s Thomson Reuters, the news appeared in Thomson’s own news releases first; for example, “Thomson Reuters Profit Beats Estimates Amid AI Push.” Yep, AI drives profits. That’s the “m” in money. Plus, late last year this article found its way to the law firm market (yep, that’s the second “m”): “Morgan Lewis and Thomson Reuters Enter into Partnership to Put Law Firms’ Needs at the Heart of AI Development.”

Now the third “m” or mania. Here’s a representative story, “Thomson Reuters to Invest US$8 billion in a Substantial AI-Focused Spending Initiative.” You can also check out the Financial Times’s report at this link.

Thomson Reuters is a $7 billion corporation. If the $8 billion number is on the money, the venerable news outfit is going to spend the equivalent of one year’s revenue acquiring and investing in smart software. In terms of professional publishing, this chunk of change is roughly the equivalent of Sam AI-Man’s need for trillions of dollars for his smart software business.

Several thoughts struck me as I was reading about the $8 billion investment in smart software:

  1. In terms of publishing or more narrowly professional publishing, $8 billion will take some time to spend. But time is not on the side of publishing decision making processes. When the check is written for an AI investment, there may be some who ask, “Is this the correct investment? After all, aren’t we professional publishers serving lawyers, accountants, and researchers?”
  2. The US legal processes are interesting. But the minor challenge of Crown copyright adds a bit of spice to certain investments. The UK government itself is reluctant to push into some AI areas due to concerns that certain information may not be available unless the red tape about copyright has been trimmed, rolled, and put on the shelf. Without being disrespectful, Thomson Reuters could find that some of the $8 billion headed into its clients’ pockets as legal challenges make their way through courts in Britain, Canada, and the US and probably some frisky EU states.
  3. The game for AI seems to be breaking into the two tiers of what a former Greek minister calls the techno-feudal set up. On one hand, there are giant technology centric companies (of which Thomson Reuters is not one of the club members). These are Google- and Microsoft-scale outfits with infrastructure, data, customers, and multiple business models. On the other hand, there are the Product Watch outfits which are using open source and APIs to create “new” and “important” AI businesses, applications, and solutions. In short, there are some barons and a whole grab-bag of lesser folk. Is Thomson Reuters going to be able to run with the barons? Remember, please, the barons are riding stallions. Thomson Reuters-type firms either walk or ride donkeys.

Net net: If Thomson Reuters spends $8 billion on smart software, how many lawyers, accountants, and researchers will be put out of work? The risks are not just bad AI investments. The threat may be to gut the billing power of the paying customers for Thomson Reuters’ content. This will be entertaining to watch.

PS. The third “m”? It is mania, AI mania.

Stephen E Arnold, March 11, 2024


The Internet as a Library and Archive? Ho Ho Ho

March 8, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I know that I find certain Internet-related items a knee slapper. Here’s an example: “Millions of Research Papers at Risk of Disappearing from the Internet.” A number of individuals — young at heart and allegedly-informed seniors — think the “Internet” is a library or, better yet, an archive like the Library of Congress’ collection of “every” book.


A person deleting data with some degree of fierceness. Yep, thanks MSFT Copilot. After three tries, this is the best of the lot for a prompt asking for an illustration of data being deleted from a personal computer. Not even good enough but I like the weird orange coloration.

Here are some basics of how “Internet” services work:

  1. Every year, the cost of storing old and usually never or rarely accessed data goes up. A bean counter calls a meeting and asks, “Do we need to keep paying for ping, power, and pipes?” Someone points out, “Usage of the data described as ‘old’ is 0.0003 percent,” or whatever number the bright young sprout has guess-timated. The decision is, as you might guess, dump the old files and reduce costs immediately. (A back-of-the-envelope sketch of this arithmetic follows this list.)
  2. Doing “data” or “online” is expensive, and the costs associated with each are very difficult, if not impossible, to control. Neither government agencies, non-governmental outfits, the United Nations, a library in Cleveland, nor the estimable Harvard University has sufficient money to make information available or keep it at hand. Thus, stuff disappears.
  3. Well-intentioned outfits like the Internet Archive or Project Gutenberg are in the same accountant ink pot. Not every Web site is indexed and archived comprehensively. Not every book can be digitized and converted to a format someone thinks will last “forever.” As a result, one has a better chance of discovering new information browsing through donated manuscripts at the Vatican Library than running an online query.
  4. If something unique is online “somewhere,” that item may be unfindable. Hey, what about Duke University’s collection of “old” books from the 17th century? Who knew?
  5. Will a government agency archive digital content in a comprehensive manner? Nope.
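
Here is a minimal sketch of the meeting’s arithmetic in Python. Every figure is an invented placeholder, not a number from the essay:

```python
# Sketch of the bean counter's rolloff decision. All figures are
# hypothetical placeholders.

TB_STORED = 500               # "old" data kept online, in terabytes
COST_PER_TB_YEAR = 120.0      # assumed annual cost of ping, power, and pipes
USAGE_RATE = 0.0003 / 100     # 0.0003 percent of the old data touched per year

annual_cost = TB_STORED * COST_PER_TB_YEAR
tb_actually_used = TB_STORED * USAGE_RATE

print(f"Annual cost to keep old data: ${annual_cost:,.2f}")
print(f"Terabytes actually accessed:  {tb_actually_used:.4f}")

# The meeting's outcome, in one line:
print("Decision:", "dump the old files" if tb_actually_used < 1 else "keep paying")
```

The point is not the particular numbers; it is that the ratio of cost to measured usage makes the “dump it” decision nearly automatic.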

The article about “risks of disappearing” is a hoot. Notice this passage:

“Our entire epistemology of science and research relies on the chain of footnotes,” explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. “If you can’t verify what someone else has said at some other point, you’re just trusting to blind faith for artefacts that you can no longer read yourself.”

I like that word “epistemology.” Just one small problem: Trust. Didn’t the president of Stanford University have an opportunity to find his future elsewhere due to some data wonkery? Google wants to earn trust. Other outfits don’t fool around with trust; these folks gather data, exploit it, and resell it. Archiving and making it findable to a researcher or law enforcement? Not without friction, lots and lots of friction. Why verify? Estimates of non-reproducible research range from 15 percent to 40 percent of scientific, technical, and medical peer reviewed content. Trust? Hello, it’s time to wake up.

Many estimate how much new data are generated each year. I would suggest that data falling off the back end of online systems has been an active process. The first time an accountant hears the IT people say, “We can just roll off the old data and hold storage stable” is right up there with avoiding an IRS audit, finding a life partner, and billing an old person for much more than the accounting work is worth.

After 25 years, there is “risk.” Wow.

Stephen E Arnold, March 8, 2024

ACM: Good Defense or a Business Play?

March 8, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Professional publishers want to use the trappings of peer review, standards, tradition, and quasi academic hoo-hah to add value to their products; others want a quasi-monopoly. Think public legal filings and the stuff in a high school chemistry book. The customers of professional publishers are typically not the folks at the pizza joint on River Road in Prospect, Kentucky. The business of professional publishing is an interesting one, but in the wild and crazy world of collapsing next-gen publishing, professional publishing is often ignored. A publisher conference aimed at professional publishers is quite different from the Jazz Age South by Southwest shindig.


Yep, free. Thanks, MSFT Copilot. How’s that security today?

But professional publishers have been in the news. Examples include the dust up about academics making up data. The big time president of the much-honored Stanford University took intellectual short cuts and quit late last year. Then there was some nasty issue about data and bias at the esteemed Harvard University. Plus, a number of bookish types have guess-timated that a hefty percentage of research studies contain made-up data. Hey, you gotta publish to get tenure or get a grant, right?

But there is an intruder in the basement of the professional publishing club. The intruder positions itself in the space between the making up of some data and the professional publishing process. That intruder is ArXiv, an open-access repository of electronic preprints and postprints (known as e-prints) approved for posting after moderation, according to Wikipedia. (Wikipedia is the cancer which killed the old-school encyclopedias.) Plus, there are services which offer access to professional content without paying for the right to host the information. I won’t name these services because I have no desire to have legal eagles circle about my semi-functioning head.

Why do I present this grade-school level history? I read “CACM Is Now Open Access.” Let’s let the Association for Computing Machinery explain its action:

For almost 65 years, the contents of CACM have been exclusively accessible to ACM members and individuals affiliated with institutions that subscribe to either CACM or the ACM Digital Library. In 2020, ACM announced its intention to transition to a fully Open Access publisher within a roughly five-year timeframe (January 2026) under a financially sustainable model. The transition is going well: By the end of 2023, approximately 40% of the ~26,000 articles ACM publishes annually were being published Open Access utilizing the ACM Open model. As ACM has progressed toward this goal, it has increasingly opened large parts of the ACM Digital Library, including more than 100,000 articles published between 1951–2000. It is ACM’s plan to open its entire archive of over 600,000 articles when the transition to full Open Access is complete.

The decision was not an easy one. Money issues rarely are.

I want to step back and look at this interesting change from a different point of view:

  1. Getting a degree today is less of a must have than when I was a wee dinobaby. My parents told me I was going to college. Period. I learned how much effort was required to get my hands on academic journals. I was a master of knowing that Carnegie-Mellon had new but limited bound volumes of certain professional publications. I knew what journals were at the University of Pittsburgh. I used these resources when the Duquesne Library was overrun with the faithful. Now “researchers” can zip online and whip up astonishing results. Google-type researchers prefer the phrase “quantumly supreme results.” This social change is one factor influencing the ACM.
  2. Stabilizing revenue streams means pulling off a magic trick. Sexy conferences and special events complement professional association membership fees. Reducing costs means knocking off the now, very very expensive printing, storing, and shipping of physical journals. The ACM seems to have figured out how to keep the lights on and the computing machine types spending.
  3. ACM members can use ACM content the way they use a pirate library or the feel-good ArXiv outfit. The move helps neutralize discontent among the membership, and it is good PR.

These points raise a question; to wit: In today’s world, how relevant will a professional association and its professional publications be going forward? The ACM states:

By opening CACM to the world, ACM hopes to increase engagement with the broader computer science community and encourage non-members to discover its rich resources and the benefits of joining the largest professional computer science organization. This move will also benefit CACM authors by expanding their readership to a larger and more diverse audience. Of course, the community’s continued support of ACM through membership and the ACM Open model is essential to keeping ACM and CACM strong, so it is critical that current members continue their membership and authors encourage their institutions to join the ACM Open model to keep this effort sustainable.

Yep, surviving in a world of faux expertise.

Stephen E Arnold, March 8, 2024

Engineering Trust: Will Weaponized Data Patch the Social Fabric?

March 7, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Trust is a popular word. Google wants me to trust the company. Yeah, I will jump right on that. Politicians want me to trust their attestations that citizen interests are important. I worked in Washington, DC, for too long. Nope, I just have too much first-hand exposure to the way “things work.” What about my bank? It wants me to trust it. But isn’t the institution the subject of a couple of government investigations? Oh, not important. And what about the images I see when I walk gingerly between the guard rails? I trust them, right? Ho ho ho.

In our post-Covid, pre-US national election world, the word “trust” is carrying quite a bit of freight. Whom do I trust? Not too many people. What about good old Socrates, who was an Athenian when Greece was not yet a collection of ferocious football teams and sun seekers? As you may recall, he trusted fellow residents of Athens. He ended up dead from either a lousy snack bar meal and beverage, or his friends did him in.

One of his alleged precepts in his pre-artificial intelligence world was:

“We cannot live better than in seeking to become better.” — Socrates

Got it, Soc.


Thanks MSFT Copilot and provider of PC “moments.” Good enough.

I read “Exclusive: Public Trust in AI Is Sinking across the Board.” Then I thought about Socrates being convicted for corruption of youth. See. Education does not bring unlimited benefits. Apparently Socrates asked annoying questions which opened him to charges of impiety. (Side note: Hey, Socrates, go with the flow. Just pray to the carved mythical beast, okay?)

A loss of public trust? Who knew? I thought the norm was common courtesy, a desire to discuss and compromise, not whipping out a weapon and shooting, bludgeoning, or stabbing someone to death. In the case of Haiti, a twist is that a victim is bound and then barbequed in a steel drum. Cute, and to me a variation of stacking seven tires in a pile, dousing them with gasoline, inserting a person, and igniting the combo. I noted a variation in Ukraine. Elderly women make cookies laced with poison and provide them to special operation fighters. Subtle and effective due to troop attrition, I hear. Should I trust US Girl Scout cookies? No thanks.

What’s interesting about the write up is that it provides statistics to back up this brilliant and innovative insight about modern life and its focus on artificial intelligence. Let me pluck several examples from the dot-point-filled write up:

  1. “Globally, trust in AI companies has dropped to 53%, down from 61% five years ago.”
  2. “Trust in AI is low across political lines. Democrats trust in AI companies is 38%, independents are at 25% and Republicans at 24%.”
  3. “Eight years ago, technology was the leading industry in trust in 90% of the countries Edelman studies. Today, it is the most trusted in only half of countries.”

AI is trendy; crunchy click bait is highly desirable even for an estimable survivor of Silicon Valley style news reporting.

Let me offer several observations which may either be troubling or typical outputs from a dinobaby working in an underground computer facility:

  1. Close knit groups are more likely to have some concept of trust. The exception, of course, is the behavior of the Hatfields and McCoys
  2. Outsiders are viewed with suspicion. Often for no reason, a newcomer becomes the default bad entity
  3. In my lifetime, I have watched institutions take actions which erode trust on a consistent basis.

Net net: Old news. AI is not new. Hyperbole and click obsession are factors which illustrate the erosion of social cohesion. Get used to it.

Stephen E Arnold, March 7, 2024

Philosophy and Money: Adam Smith Remains Flexible

March 6, 2024

This essay is the work of a dumb dinobaby. No smart software required.

In the early twenty-first century, China was slated to overtake the United States as the world’s top economy. Unfortunately for the “sleeping dragon,” China’s economy has tanked due to many factors. The country, however, still remains a strong spot for technology development such as AI and chips. The Register explains why China is still doing well in the tech sector: “How Did China Get So Good At Chips And AI? Congressional Investigation Blames American Venture Capitalists.”

Venture capitalists are always interested in increasing their wealth and subverting anything preventing that. While the US government has choked China’s semiconductor industry and denied it the use of tools to develop AI, venture capitalists are funding those sectors. The US House Select Committee on the Chinese Communist Party (CCP) shared that five venture capital firms are funneling billions into these two industries: Walden International, Sequoia Capital, Qualcomm Ventures, GSR Ventures, and GGV Capital. Chinese semiconductor and AI businesses are linked to human rights abuses and the People’s Liberation Army. These five venture capital firms don’t appear interested in respecting human rights or preventing the spread of communism.

The House Select Committee on the CCP discovered that $1.9 billion went to AI companies that support China’s mega-surveillance state and aided in the Uyghur genocide. The US blacklisted these AI-related companies. The committee also found that $1.2 billion was sent to 150 semiconductor companies.

The committee also accused the VCs of sharing more than funding with China:

“The committee also called out the VCs for "intangible" contributions – including consulting, talent acquisition, and market opportunities. In one example highlighted in the report, the committee singled out Walden International chairman Lip-Bu Tan, who previously served as the CEO of Cadence Design Systems. Cadence develops electronic design automation software which Chinese corporates, like Huawei, are actively trying to replicate. The committee alleges that Tan and other partners at Walden coordinated business opportunities and provided subject-matter expertise while holding board seats at SMIC and Advanced Micro-Fabrication Equipment Co. (AMEC).”

Sharing knowledge and business connections is as bad as (if not worse than) funding China’s tech sector. It’s like providing instructions and resources on how to build a nuclear weapon. If China had only the resources, it wouldn’t be as frightening.

Whitney Grace, March 6, 2024

Synthetic Data: From Science Fiction to Functional Circumscription

March 4, 2024

This essay is the work of a dumb humanoid. No smart software required.

Synthetic data are information produced by algorithms, not by real-world events. They are created using real-world data and numerical recipes. The appeal is that synthetic data are easier than collecting real life information, cheaper than dealing with data from real life, and faster than fooling around with surveys, monitoring devices, and lawsuits. In theory, synthetic data are one promising way of skirting the expense of getting humans involved.
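
To illustrate the “numerical recipe” idea, here is a minimal sketch in Python: fit a simple distribution to a handful of real observations, then sample new records from it. The figures are invented for illustration, and production synthetic data generators are far more elaborate:

```python
# Minimal sketch: derive synthetic data from real-world data by fitting
# a distribution and sampling from it. All figures are illustrative.
import random
import statistics

real_purchase_amounts = [12.5, 18.0, 22.4, 9.9, 30.1, 15.7, 21.3]  # hypothetical

mu = statistics.mean(real_purchase_amounts)
sigma = statistics.stdev(real_purchase_amounts)

# The "synthetic sample": algorithmically generated, statistically similar,
# but no record corresponds to a real person or event.
synthetic = [round(random.gauss(mu, sigma), 2) for _ in range(100)]
print(synthetic[:5])
```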

What Is [a] Synthetic Sample – And Is It All It’s Cracked Up to Be?” tackles the subject of a synthetic sample, a topic which is one slice of the synthetic data universe. The article seeks “to uncover the truth behind artificially created qualitative and quantitative market research data.” I am going to avoid the question, “Is synthetic data useful?” because the answer is, “Yes.” Bean counters and those looking to find a way out of the pickle barrel filled with expensive brine are going to chase after the magic of algorithms producing data to do some machine learning magic.

image

In certain situations, fake flowers are super. Other times, the faux blooms are just creepy. Thanks, MSFT Copilot Bing thing. Good enough.

Are synthetic data better than real world data? The answer from my vantage point is, “It depends.” Fancy math can prove that for some use cases, synthetic data are “good enough”; that is, the data produce results close enough to what a “real” data set provides. Therefore, just use synthetic data. But for other applications, synthetic data might throw some sand in the well-oiled marketing collateral describing the wonders of synthetic data. (Some university research labs are quite skilled in PR speak, but the reality of their methods may not line up with the PowerPoints used to raise venture capital.)

This essay discusses a research project to figure out if a synthetic sample works or, in my lingo, if the synthetic sample is good enough. The idea is that as long as the synthetic data are within a specified error range, the synthetic sample can be used and may produce “reliable” or useful results. (At least one hopes this is the case.)
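
One way to make “within a specified error range” concrete is to compare summary statistics of the real and synthetic samples against a tolerance. The sketch below is my own simplification, not the methodology from the cited research, and the data and tolerance are assumptions:

```python
# Sketch of a "good enough" test: accept the synthetic sample only if its
# summary statistics fall within a chosen tolerance of the real sample's.
import statistics

def good_enough(real, synthetic, tolerance=0.10):
    """True if synthetic mean and stdev are within `tolerance` (relative)."""
    for stat in (statistics.mean, statistics.stdev):
        r, s = stat(real), stat(synthetic)
        if abs(r - s) > tolerance * abs(r):
            return False
    return True

real = [12.5, 18.0, 22.4, 9.9, 30.1, 15.7, 21.3]        # hypothetical
synthetic = [14.1, 17.2, 20.8, 11.4, 27.9, 16.0, 22.6]  # hypothetical

print("Usable?", good_enough(real, synthetic))
```

Pick the tolerance to match the application: loose for broad attitude questions, tight (or not at all) for anything where a wrong answer does real harm.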

I want to focus on one portion of the cited article and invite you to read the complete Kantar explanation.

Here’s the passage which snagged my attention:

… right now, synthetic sample currently has biases, lacks variation and nuance in both qual and quant analysis. On its own, as it stands, it’s just not good enough to use as a supplement for human sample. And there are other issues to consider. For instance, it matters what subject is being discussed. General political orientation could be easy for a large language model (LLM), but the trial of a new product is hard. And fundamentally, it will always be sensitive to its training data – something entirely new that is not part of its training will be off-limits. And the nature of questioning matters – a highly ’specific’ question that might require proprietary data or modelling (e.g., volume or revenue for a particular product in response to a price change) might elicit a poor-quality response, while a response to a general attitude or broad trend might be more acceptable.

These sentences present several thorny problems in academic speak. Let’s look at them in the vernacular of rural Kentucky where I live.

First, we have the issue of bias. Training data can be unintentionally or intentionally biased. Sample radical trucker posts on Telegram, and use those messages to train a model like Reor. That output is going to express views that some people might find unpalatable. Therefore, building a synthetic data recipe which includes this type of Telegram content is going to be oriented toward truck driver views. That’s good and bad.

Second, a synthetic sample may require mixing in data from a “real” sample. That’s a common sense approach which reduces some costs. But will the outputs be good enough? The question then becomes, “Good enough for what applications?” Big, general questions about how a topic is presented might be close enough for horseshoes. Other topics, like those focusing on a specific technical issue, might warrant more caution or outright avoidance of synthetic data. Do you want your child or wife to die because the synthetic data about a treatment regimen were close enough for horseshoes? But in today’s medical structure, that may be what the future holds.

Third, many years ago, one of the early “smart” software companies was Autonomy, founded by Mike Lynch. In the 1990s, Bayesian methods were known, but some — believe it or not — were classified and, thus, not widely known. Autonomy packed some smart software into the Autonomy black box. Users of this system learned that the smart software had to be retrained because new terms and novel ideas not in the original training set were not findable by the neuro-linguistic program’s engine. Yikes, retraining requires human content curation of data sets, time to retrain the system, and the expense of redeploying the brains of the black boxes. Clients did not like this and some, to be frank, did not understand why a product did not work like an MG sports car. Synthetic data have to be retrained to “know” about new terms and avoid the “certain blindness” probability-based systems possess.

Fourth, the topic of “proprietary data modeling” means big bucks. The idea behind synthetic data is that it is cheaper. Building proprietary training data and keeping it current is expensive. Is it better? Yeah, maybe. Is it faster? Probably not when humans are doing the curation, cleaning, verifying, and training.

The write up states:

But it’s likely that blended models (human supplemented by synthetic sample) will become more common as LLMs get even more powerful – especially as models are finetuned on proprietary datasets.

Net net: Synthetic data warrants monitoring. Some may want to invest in synthetic data set companies like Kantar, for instance. I am a dinobaby, and I like the old-fashioned Stone Age approach to data. The fancy math embodies sufficient risk for me. Why increase risk? Remember my reference to a dead loved one? That type of risk.

Stephen E Arnold, March 4, 2024
