The Internet as a Library and Archive? Ho Ho Ho
March 8, 2024
This essay is the work of a dumb dinobaby. No smart software required.
I know that I find certain Internet-related items a knee slapper. Here’s an example: “Millions of Research Papers at Risk of Disappearing from the Internet.” A surprising number of individuals, young at heart and allegedly informed seniors alike, think the “Internet” is a library, or better yet an archive like the Library of Congress’ collection of “every” book.
A person deleting data with some degree of fierceness. Yep, thanks MSFT Copilot. After three tries, this is the best of the lot for a prompt asking for an illustration of data being deleted from a personal computer. Not even good enough but I like the weird orange coloration.
Here are some basics of how “Internet” services work:
- Every year the cost of storing old and rarely accessed data goes up. A bean counter calls a meeting and asks, “Do we need to keep paying for ping, power, and pipes?” Someone points out, “Usage of the data described as ‘old’ is 0.0003 percent,” or whatever number the bright young sprout has guess-timated. The decision is, as you might guess, dump the old files and reduce costs immediately.
- Doing “data” or “online” is expensive, and the costs associated with each are very difficult, if not impossible, to control. Neither government agencies, non-governmental outfits, the United Nations, a library in Cleveland, nor the estimable Harvard University has sufficient money to keep all information available or at hand. Thus, stuff disappears.
- Well-intentioned outfits like the Internet Archive or Project Gutenberg are in the same accountant ink pot. Not every Web site is indexed and archived comprehensively. Not every book can be digitized and converted to a format someone thinks will last “forever.” As a result, one has a better chance of discovering new information browsing through donated manuscripts at the Vatican Library than running an online query.
- If something unique is online “somewhere,” that item may be unfindable. Hey, what about Duke University’s collection of “old” books from the 17th century? Who knew?
- Will a government agency archive digital content in a comprehensive manner? Nope.
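The cost-driven roll-off in the first bullet can be sketched in a few lines of Python. This is a hypothetical policy script, not any provider’s actual process; the five-year age threshold and the access-count cutoff are invented purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class StoredObject:
    path: str
    age_days: int            # time since the object was stored
    accesses_last_year: int  # how often anyone actually asked for it

def should_roll_off(obj: StoredObject,
                    max_age_days: int = 365 * 5,
                    min_accesses: int = 1) -> bool:
    """Flag an object for deletion when it is old AND effectively unused.

    The thresholds are illustrative: older than five years with fewer
    than one access in the past year. Real operators tune these to
    their 'ping, power, and pipes' costs.
    """
    return obj.age_days > max_age_days and obj.accesses_last_year < min_accesses

archive = [
    StoredObject("papers/2003/study.pdf", age_days=7600, accesses_last_year=0),
    StoredObject("papers/2023/study.pdf", age_days=200, accesses_last_year=40),
]
doomed = [o.path for o in archive if should_roll_off(o)]
```

When the bean counter runs something like this, the 2003 paper quietly drops off the back end while the fresh, frequently used file survives.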
The article about “risks of disappearing” is a hoot. Notice this passage:
“Our entire epistemology of science and research relies on the chain of footnotes,” explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. “If you can’t verify what someone else has said at some other point, you’re just trusting to blind faith for artefacts that you can no longer read yourself.”
I like that word “epistemology.” Just one small problem: Trust. Didn’t the president of Stanford University have an opportunity to find his future elsewhere due to some data wonkery? Google wants to earn trust. Other outfits don’t fool around with trust; these folks gather data, exploit it, and resell it. Archiving and making it findable to a researcher or law enforcement? Not without friction, lots and lots of friction. Why verify? Estimates of non-reproducible research range from 15 percent to 40 percent of scientific, technical, and medical peer reviewed content. Trust? Hello, it’s time to wake up.
Many estimate how much new data are generated each year. I would suggest that data falling off the back end of online systems has been an active process. The first time an accountant hears the IT people say, “We can just roll off the old data and hold storage stable” is right up there with avoiding an IRS audit, finding a life partner, and billing an old person for much more than the accounting work is worth.
After 25 years, there is “risk.” Wow.
Stephen E Arnold, March 8, 2024
ACM: Good Defense or a Business Play?
March 8, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Professional publishers want to use the trappings of peer review, standards, tradition, and quasi academic hoo-hah to add value to their products; others want a quasi-monopoly. Think public legal filings and the stuff in a high school chemistry book. The customers of professional publishers are typically not the folks at the pizza joint on River Road in Prospect, Kentucky. The business of professional publishing is an interesting one, but in the wild and crazy world of collapsing next-gen publishing, professional publishing is often ignored. A publisher conference aimed at professional publishers is quite different from the Jazz Age South by Southwest shindig.
Yep, free. Thanks, MSFT Copilot. How’s that security today?
But professional publishers have been in the news. Examples include the dust up about academics making up data. The big time president of the much-honored Stanford University took intellectual short cuts and quit late last year. Then there was some nasty issue about data and bias at the esteemed Harvard University. Plus, a number of bookish types have guess-timated that a hefty percentage of research studies contain made-up data. Hey, you gotta publish to get tenure or get a grant, right?
But there is an intruder in the basement of the professional publishing club. The intruder positions itself in the space between the making up of some data and the professional publishing process. That intruder is ArXiv, an open-access repository of electronic preprints and postprints (known as e-prints) approved for posting after moderation, according to Wikipedia. (Wikipedia is the cancer which killed the old-school encyclopedias.) Plus, there are services which offer access to professional content without paying for the right to host the information. I won’t name these services because I have no desire to have legal eagles circle about my semi-functioning head.
Why do I present this grade-school level history? I read “CACM Is Now Open Access.” Let’s let the Association of Computing Machinery explain its action:
For almost 65 years, the contents of CACM have been exclusively accessible to ACM members and individuals affiliated with institutions that subscribe to either CACM or the ACM Digital Library. In 2020, ACM announced its intention to transition to a fully Open Access publisher within a roughly five-year timeframe (January 2026) under a financially sustainable model. The transition is going well: By the end of 2023, approximately 40% of the ~26,000 articles ACM publishes annually were being published Open Access utilizing the ACM Open model. As ACM has progressed toward this goal, it has increasingly opened large parts of the ACM Digital Library, including more than 100,000 articles published between 1951–2000. It is ACM’s plan to open its entire archive of over 600,000 articles when the transition to full Open Access is complete.
The decision was not an easy one. Money issues rarely are.
I want to step back and look at this interesting change from a different point of view:
- Getting a degree today is less of a must have than when I was a wee dinobaby. My parents told me I was going to college. Period. I learned how much effort was required to get my hands on academic journals. I was a master of knowing that Carnegie-Mellon had new but limited bound volumes of certain professional publications. I knew what journals were at the University of Pittsburgh. I used these resources when the Duquesne Library was overrun with the faithful. Now “researchers” can zip online and whip up astonishing results. Google-type researchers prefer the phrase “quantumly supreme results.” This social change is one factor influencing the ACM.
- Stabilizing revenue streams means pulling off a magic trick. Sexy conferences and special events complement professional association membership fees. Reducing costs means knocking off the now, very very expensive printing, storing, and shipping of physical journals. The ACM seems to have figured out how to keep the lights on and the computing machine types spending.
- ACM members can use ACM content the way they use a pirate library or the feel-good ArXiv outfit. The move helps neutralize discontent among the membership, and it is good PR.
These points raise a question; to wit: In today’s world, how relevant will a professional association and its professional publications be going forward? The ACM states:
By opening CACM to the world, ACM hopes to increase engagement with the broader computer science community and encourage non-members to discover its rich resources and the benefits of joining the largest professional computer science organization. This move will also benefit CACM authors by expanding their readership to a larger and more diverse audience. Of course, the community’s continued support of ACM through membership and the ACM Open model is essential to keeping ACM and CACM strong, so it is critical that current members continue their membership and authors encourage their institutions to join the ACM Open model to keep this effort sustainable.
Yep, surviving in a world of faux expertise.
Stephen E Arnold, March 8, 2024
AI May Kill Jobs Plus It Can Kill Bambi, Koalas, and Whales
March 8, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Amid the AI hype is little mention of a huge problem.
As Nature’s Kate Crawford reports, “Generative AI’s Environmental Costs Are Soaring—and Mostly Secret.” Besides draining us of fresh water, AI data centers also consume immense amounts of energy. We learn:
“One assessment suggests that ChatGPT, the chatbot created by OpenAI in San Francisco, California, is already consuming the energy of 33,000 homes. It’s estimated that a search driven by generative AI uses four to five times the energy of a conventional web search. Within years, large AI systems are likely to need as much energy as entire nations.”
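The quoted multiplier can be turned into rough arithmetic. The sketch below assumes a commonly cited ballpark of about 0.3 Wh for a conventional web search; that figure is an assumption on my part, not something from the article, while the four-to-five-times multiplier comes from the quoted estimate.

```python
# Back-of-the-envelope energy arithmetic for generative vs. conventional search.
# ASSUMPTION: 0.3 Wh per conventional search is a commonly cited ballpark,
# not a figure from the article. The 4x-5x multiplier is from the article.
CONVENTIONAL_WH = 0.3
MULTIPLIER_LOW, MULTIPLIER_HIGH = 4, 5

gen_low = CONVENTIONAL_WH * MULTIPLIER_LOW    # roughly 1.2 Wh per generative query
gen_high = CONVENTIONAL_WH * MULTIPLIER_HIGH  # roughly 1.5 Wh per generative query

# Scale the extra cost to a hypothetical one billion generative queries per day:
queries_per_day = 1_000_000_000
extra_wh_per_day = (gen_high - CONVENTIONAL_WH) * queries_per_day
extra_mwh_per_day = extra_wh_per_day / 1_000_000

print(f"{gen_low:.1f}-{gen_high:.1f} Wh per query; "
      f"~{extra_mwh_per_day:,.0f} MWh/day extra at 1B generative queries")
```

Even at these made-up volumes, the multiplier turns a rounding error into utility-scale demand, which is the point of the “entire nations” comparison.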
Even OpenAI’s head Sam Altman admits this is not sustainable, but he has a solution in mind. Is he pursuing more efficient models, or perhaps redesigning data centers? Nope. Altman’s hopes are pinned on nuclear fusion. But that technology has been “right around the corner” for the last 50 years. We need solutions now, not in 2050 or later. Sadly, it is unlikely AI companies will make the effort to find and enact those solutions unless forced to. The article notes a piece of legislation, the Artificial Intelligence Environmental Impacts Act of 2024, has finally been introduced in the Senate. But in the unlikely event the bill makes it through the House, it may be too feeble to make a real difference. Crawford considers:
“To truly address the environmental impacts of AI requires a multifaceted approach including the AI industry, researchers and legislators. In industry, sustainable practices should be imperative, and should include measuring and publicly reporting energy and water use; prioritizing the development of energy-efficient hardware, algorithms, and data centers; and using only renewable energy. Regular environmental audits by independent bodies would support transparency and adherence to standards. Researchers could optimize neural network architectures for sustainability and collaborate with social and environmental scientists to guide technical designs towards greater ecological sustainability. Finally, legislators should offer both carrots and sticks. At the outset, they could set benchmarks for energy and water use, incentivize the adoption of renewable energy and mandate comprehensive environmental reporting and impact assessments. The Artificial Intelligence Environmental Impacts Act is a start, but much more will be needed — and the clock is ticking.”
Tick. Tock. Need a dead dolphin? Use a ChatGPT-type system.
Cynthia Murrell, March 8, 2024
Engineering Trust: Will Weaponized Data Patch the Social Fabric?
March 7, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Trust is a popular word. Google wants me to trust the company. Yeah, I will jump right on that. Politicians want me to trust their attestations that citizen interests are important. I worked in Washington, DC, for too long. Nope, I just have too much first-hand exposure to the way “things work.” What about my bank? It wants me to trust it. But isn’t the institution the subject of a couple of government investigations? Oh, not important. And what about the images I see when I walk gingerly between the guard rails. I trust them, right? Ho ho ho.
In our post-Covid, pre-US national election world, the word “trust” is carrying quite a bit of freight. Whom do I trust? Not too many people. What about good old Socrates, who was an Athenian when Greece was not yet a collection of ferocious football teams and sun seekers? As you may recall, he trusted fellow residents of Athens. He ended up dead either from a lousy snack bar meal and beverage, or because his friends did him in.
One of his alleged precepts in his pre-artificial intelligence world was:
“We cannot live better than in seeking to become better.” — Socrates
Got it, Soc.
Thanks MSFT Copilot and provider of PC “moments.” Good enough.
I read “Exclusive: Public Trust in AI Is Sinking across the Board.” Then I thought about Socrates being convicted for corruption of youth. See? Education does not bring unlimited benefits. Apparently Socrates asked annoying questions which opened him to charges of impiety. (Side note: Hey, Socrates, go with the flow. Just pray to the carved mythical beast, okay?)
A loss of public trust? Who knew? I thought it was common courtesy, a desire to discuss and compromise, not whip out a weapon and shoot, bludgeon, or stab someone to death. In the case of Haiti, a twist is that a victim is bound and then barbequed in a steel drum. Cute, and to me a variation of stacking seven tires in a pile, dousing them with gasoline, inserting a person, and igniting the combo. I noted a variation in Ukraine. Elderly women make cookies laced with poison and provide them to special operation fighters. Subtle and effective due to troop attrition, I hear. Should I trust US Girl Scout cookies? No thanks.
What’s interesting about the write up is that it focuses on artificial intelligence and provides statistics to back up this brilliant and innovative insight about modern life. Let me pluck several examples from the dot point filled write up:
- “Globally, trust in AI companies has dropped to 53%, down from 61% five years ago.”
- “Trust in AI is low across political lines. Democrats trust in AI companies is 38%, independents are at 25% and Republicans at 24%.”
- “Eight years ago, technology was the leading industry in trust in 90% of the countries Edelman studies. Today, it is the most trusted in only half of countries.”
AI is trendy; crunchy click bait is highly desirable even for an estimable survivor of Silicon Valley style news reporting.
Let me offer several observations which may either be troubling or typical outputs from a dinobaby working in an underground computer facility:
- Close knit groups are more likely to have some concept of trust. The exception, of course, is the behavior of the Hatfields and McCoys.
- Outsiders are viewed with suspicion. Often for no reason, a newcomer becomes the default bad entity.
- In my lifetime, I have watched institutions take actions which erode trust on a consistent basis.
Net net: Old news. AI is not new. Hyperbole and click obsession are factors which illustrate the erosion of social cohesion. Get used to it.
Stephen E Arnold, March 7, 2024
NSO Group: Pegasus Code Wings Its Way to Meta and Mr. Zuckerberg
March 7, 2024
This essay is the work of a dumb dinobaby. No smart software required.
NSO Group’s senior managers and legal eagles will have an opportunity to become familiar with an okay Brazilian restaurant and a waffle shop. That lovable leader of Facebook, Instagram, Threads, and WhatsApp may have put a stick in the now-ageing digital bicycle doing business as NSO Group. The company’s mark is Pegasus, which is a flying horse. Pegasus’s dad was Poseidon, and his mom was the knockout Gorgon Medusa, who did some innovative hair treatments. The mythical Pegasus helped out other gods until Zeus stepped in and acted with extreme prejudice. Quite a myth.
Poseidon decides to kill the mythical Pegasus, not for its software, but for its getting out of bounds. Thanks, MSFT Copilot. Close enough.
Life imitates myth. “Court Orders Maker of Pegasus Spyware to Hand Over Code to WhatsApp” reports that the hand over decision:
is a major legal victory for WhatsApp, the Meta-owned communication app which has been embroiled in a lawsuit against NSO since 2019, when it alleged that the Israeli company’s spyware had been used against 1,400 WhatsApp users over a two-week period. NSO’s Pegasus code, and code for other surveillance products it sells, is seen as a closely and highly sought state secret. NSO is closely regulated by the Israeli ministry of defense, which must review and approve the sale of all licenses to foreign governments.
NSO Group hired former DHS and NSA official Stewart Baker to fix up the NSO Group gyro compass. Mr. Baker is a podcaster and is affiliated with the law firm Steptoe and Johnson. For more color about Mr. Baker, please scan “Former DHS/NSA Official Stewart Baker Decides He Can Help NSO Group Turn A Profit.”
A decade ago, Israel’s senior officials might have been able to prevent a social media company from getting a copy of the Pegasus source code. Not anymore. Israel’s home-grown intelware technology simply did not thwart, prevent, or warn about the Hamas attack in the autumn of 2023. If NSO Group were battling in court with Harris Corp. or Textron, I would not worry. Mr. Zuckerberg’s companies are not directly involved with national security technology. From what I have heard at conferences, Mr. Zuckerberg’s commercial enterprises are responsive to law enforcement requests when a bad actor uses Facebook for an allegedly illegal activity. But Mr. Zuckerberg’s managers are really busy with higher priority tasks. Some folks engaged in investigations of serious crimes must be patient. Presumably the investigators can pass their time scrolling through #Shorts. If the Guardian’s article is accurate, now those Facebook employees can learn how Pegasus works. Will any of those learnings stick? One hopes not.
Several observations:
- Companies which make specialized software guard their systems and methods carefully. Well, that used to be true.
- The reorganization of NSO Group has not lowered the firm’s public relations profile. NSO Group can make headlines, which may not be desirable for those engaged in national security.
- Disclosure of the specific Pegasus systems and methods will get a warm, enthusiastic reception from those who exchange ideas for malware and related tools on private Telegram channels, Dark Web discussion groups, or via one of the “stealth” communication services which pop up like mushrooms after rain in rural Kentucky.
Will the software Pegasus be terminated? I remain concerned that source code revealing how to perform certain tasks may lead to downstream, unintended consequences. Specialized software companies try to operate with maximum security. Now Pegasus may be flying away unless another legal action prevents this.
Where is Zeus when one needs him?
Stephen E Arnold, March 7, 2024
AI and Warfare: Gaza Allegedly Experiences AI-Enabled Drone Attacks
March 7, 2024
This essay is the work of a dumb dinobaby. No smart software required.
We have officially crossed a line. DeepNewz reveals: “AI-Enabled Military Tech and Indian-Made Hermes 900 Drones Deployed in Gaza.” Is this what they mean by “helpful AI”? We cannot say we are surprised. The extremely brief write-up tells us:
“Reports indicate that Israel has deployed AI-enabled military technology in Gaza, marking the first known combat use of such technology. Additionally, Indian-made Hermes 900 drones, produced in collaboration between Adani‘s company and Elbit Systems, are set to join the Israeli army’s fleet of unmanned aerial vehicles. This development has sparked fears about the implications of autonomous weapons in warfare and the role of Indian manufacturing in the conflict in Gaza. Human rights activists and defense analysts are particularly worried about the potential for increased civilian casualties and the further escalation of the conflict.”
On a minor but poetic note, a disclaimer states the post was written with ChatGPT. Strap in, fellow humans. We are just at the beginning of a long and peculiar ride. How are those assorted government committees doing with their AI policy planning?
Cynthia Murrell, March 7, 2024
Kagi Hitches Up with Wolfram
March 6, 2024
This essay is the work of a dumb dinobaby. No smart software required.
“Kagi + Wolfram” reports that the for-fee Web search engine with AI has hooked up with one of the pre-eminent mathy people innovating today. The write up includes PR about the upsides of Kagi search and Wolfram’s computational services. The article states:
…we have partnered with Wolfram|Alpha, a well-respected computational knowledge engine. By integrating Wolfram Alpha’s extensive knowledge base and robust algorithms into Kagi’s search platform, we aim to deliver more precise, reliable, and comprehensive search results to our users. This partnership represents a significant step forward in our goal to provide a search engine that users can trust to find the dependable information they need quickly and easily. In addition, we are very pleased to welcome Stephen Wolfram to Kagi’s board of advisors.
The basic wagon gets a rethink with other animals given a chance to make progress. Thanks, MSFT Copilot. Good enough, but in truth I gave up trying to get a similar image with the dog replaced by a mathematician and the pig replaced with a perky entrepreneur.
The integration of mathiness with smart search is a step forward, certainly more impressive than other firms’ recycling of Web content into bubble gum cards presenting answers. Kagi is taking steps — small, methodical ones — toward what I have described as “search enabled applications” and my friend Dr. Greg Grefenstette described in his book with the snappy title “Search-Based Applications: At the Confluence of Search and Database Technologies (Synthesis Lectures on Information Concepts, Retrieval, and Services, 17).”
It may seem like a big step from putting mathiness in a Web search engine to creating a platform for search enabled applications. It may be, but I like to think that some bright young sprouts will figure out that linking a mostly brain dead legacy app with a Kagi-Wolfram service might be useful in a number of disciplines. Even some super confident really brilliantly wonderful Googlers might find the service useful.
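What might a search-enabled application built on a Kagi-plus-Wolfram combination look like in miniature? The sketch below is hypothetical: the two backends are stubs standing in for a computational engine and a text index, and nothing in it reflects Kagi’s or Wolfram’s actual APIs.

```python
import re

def looks_computational(query: str) -> bool:
    """Crude heuristic: arithmetic operators or math/conversion keywords
    suggest the query belongs to a computational engine."""
    return bool(re.search(r"\d\s*[-+*/^]\s*\d|convert|integral|derivative",
                          query, re.IGNORECASE))

def answer(query: str,
           compute_backend=lambda q: f"[computed: {q}]",
           search_backend=lambda q: f"[web results for: {q}]") -> str:
    """Route a query: computational ones go to the math engine,
    everything else to the conventional index. The backends here are
    placeholder stubs, not real service calls."""
    if looks_computational(query):
        return compute_backend(query)
    return search_backend(query)

print(answer("2 + 2"))                # routed to the compute backend
print(answer("history of Kentucky"))  # routed to the search backend
```

The design point is the router, not the stubs: a legacy application only needs this thin dispatch layer to start sending mathy questions to a computational service while ordinary lookups go to the index.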
Net net: I am gratified that Kagi’s for-fee Web search is evolving. Google’s apparent ineptitude might give Kagi the chance Neeva never had.
Stephen E Arnold, March 6, 2024
Philosophy and Money: Adam Smith Remains Flexible
March 6, 2024
This essay is the work of a dumb dinobaby. No smart software required.
In the early twenty-first century, China was slated to overtake the United States as the world’s top economy. Unfortunately for the “sleeping dragon,” China’s economy has tanked due to many factors. The country, however, remains a strong spot for technology development such as AI and chips. The Register explains why China is still doing well in the tech sector: “How Did China Get So Good At Chips And AI? Congressional Investigation Blames American Venture Capitalists.”
Venture capitalists are always interested in increasing their wealth and subverting anything preventing that. While the US government has choked China’s semiconductor industry and denied it the use of tools to develop AI, venture capitalists are funding those sectors. The US House Select Committee on the Chinese Communist Party (CCP) shared that five venture capital firms are funneling billions into these two industries: Walden International, Sequoia Capital, Qualcomm Ventures, GSR Ventures, and GGV Capital. Chinese semiconductor and AI businesses are linked to human rights abuses and the People’s Liberation Army. These five venture capital firms don’t appear interested in respecting human rights or preventing the spread of communism.
The House Select Committee on the CCP discovered that $1.9 billion went to AI companies that support China’s mega-surveillance state and aided in the Uyghur genocide. The US blacklisted these AI-related companies. The committee also found that $1.2 billion was sent to 150 semiconductor companies.
The committee also accused the VCs of sharing more than funding with China:
“The committee also called out the VCs for "intangible" contributions – including consulting, talent acquisition, and market opportunities. In one example highlighted in the report, the committee singled out Walden International chairman Lip-Bu Tan, who previously served as the CEO of Cadence Design Systems. Cadence develops electronic design automation software which Chinese corporates, like Huawei, are actively trying to replicate. The committee alleges that Tan and other partners at Walden coordinated business opportunities and provided subject-matter expertise while holding board seats at SMIC and Advanced Micro-Fabrication Equipment Co. (AMEC).”
Sharing knowledge and business connections is as bad as (if not worse than) funding China’s tech sector. It’s like providing instructions and resources on how to build a nuclear weapon. If China had only the resources, it wouldn’t be as frightening.
Whitney Grace, March 6, 2024
Poohbahs Poohbahing: Just Obvious Poohbahing
March 6, 2024
This essay is the work of a dumb dinobaby. No smart software required.
We’re already feeling the effects of AI technology in deepfake videos, soundbites, and generative text. While our present circumstances are only the beginning of AI technology, so-called experts are already claiming AI has gone bananas. The Verge, a popular Silicon Valley news outlet, released a new podcast episode where they declare that “The AIs Are Officially Out Of Control.”
AI generated images and text aren’t 100% accurate. AI images are prone to include extra limbs and false representations of people, and can even entirely miss the prompt. AI generative text is about as accurate as a Wikipedia article, so you need to double check and edit the response. Unfortunately AIs are only as smart as the datasets that train them. AIs have been called “racist” and “sexist” due to limited data. Google Gemini has also gone too far on diversity and inclusion, returning images that aren’t historically accurate when asked for historical depictions.
The podcast panelists made an obvious point when they said that the quality of Google’s results has declined. Bad SEO, crappy content, and paid results pollute search. They claim that the best results Google returns come from Reddit posts. Reddit is a catch-all online forum that Google recently negotiated a deal with to use its content to train AI. That’s a great idea, especially when Reddit is going public on the stock market.
The problem is that Reddit is full of trolls who do things for %*^ and giggles. While Reddit is a brilliant source of information because it is created by real people, the bad actors will train the AI chatbots to be “racist” and “sexist” like previous iterations. The worst incident involves ethnically diverse Nazis:
“Google has apologized for what it describes as “inaccuracies in some historical image generation depictions” with its Gemini AI tool, saying its attempts at creating a “wide range” of results missed the mark. The statement follows criticism that it depicted specific white figures (like the US Founding Fathers) or groups like Nazi-era German soldiers as people of color, possibly as an overcorrection to long-standing racial bias problems in AI.”
I am not sure which is the problem: uninformed generalizations, flawed AI technology capable of zapping billions in a few hours, or minimum viable products that are the equivalent of a blue jay fouling up a sparrow’s nest. Chirp. Chirp. Chirp.
Whitney Grace, March 6, 2024
The RCMP: Monitoring Sparks Criticism
March 5, 2024
This essay is the work of a dumb dinobaby. No smart software required.
The United States and United Kingdom get a bad rap for monitoring their citizens’ Internet usage. Thankfully it is not as bad as in China, Russia, and North Korea. Canada, the “hat” of the United States, is hardly criticized for anything, but even it has its foibles. Canada’s Royal Canadian Mounted Police (RCMP) is in water hot enough to melt all its snow, says The Madras Tribune: “RCMP Slammed For Private Surveillance Used To Trawl Social Media, ‘Darknet’.”
It’s been known that the RCMP has used private surveillance tools to monitor public-facing information and other social media since 2015. The Privacy Commissioner of Canada (OPC) revealed that when the RCMP was collecting information, the police force failed to comply with privacy laws. The RCMP doesn’t agree with the OPC’s suggestion to make its monitoring activities with third-party vendors more transparent. The RCMP also argued that because it was using third-party vendors, it wasn’t required to ensure the information was collected according to Canadian law.
The Mounties’ non-compliance began in 2014 after three police officers were shot. An information monitoring initiative called Project Wideawake started and it involved the software Babel X from Babel Street, a US threat intelligence company. Babel X allowed the RCMP to search social media accounts, including private ones, and information from third party data brokers.
Despite the backlash, the RCMP will continue to use Babel X:
“ ‘Despite the gaps in (the RCMP’s) assessment of compliance with Canadian privacy legislation that our report identifies, the RCMP asserted that it has done enough to review Babel X and will therefore continue to use it,’ the report noted. ‘In our view, the fact that the RCMP chose a subcontracting model to pay for access to services from a range of vendors does not abrogate its responsibility with respect to the services that it receives from each vendor.’”
Canada might be the politest country in North America, but behind the polite facade its government is as dedicated to law enforcement surveillance as the US.
Whitney Grace, March 5, 2024