Which Is It, City of Columbus: Corrupted or Not Corrupted Data

August 23, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I learned that Columbus, Ohio, suffered one of those cyber security missteps. The good news, courtesy of the ever-reliable Associated Press: “Mayor of Columbus, Ohio, Says Ransomware Attackers Stole Corrupted, Unusable Data.” But then I read the StateScoop story “Columbus, Ohio, Ransomware Data Might Not Be Corrupted After All.”


The answer is, “I don’t know.” Thanks, MSFT Copilot. Good enough.

The story is a Groundhog Day tale. A bad actor compromises a system. The bad actor delivers ransomware. The senior officers know little about ransomware and even less about the cyber security systems marketed as a proactive, intelligent defense against bad stuff like ransomware. My view, as you know, is that it is easier to create sales decks and marketing collateral than it is to deliver cyber security software that works. Keep in mind that I am a dinobaby. I like products that underpromise and overdeliver. I like software that works, not sort of works or mostly works. Works. That’s it.

What’s interesting about Columbus, other than its zoo, its annual flower festival, and the OCLC organization, is that no one can agree on this issue. I believe this is a variation on the Bud Abbott and Lou Costello routine “Who’s on First.”

StateScoop’s story reported:

An anonymous cybersecurity expert told local news station WBNS Tuesday that the personal information of hundreds of thousands of Columbus residents is available on the dark web. The claim comes one day after Columbus Mayor Andrew Ginther announced to the public that the stolen data had been “corrupted” and most likely “unusable.” That assessment was based on recent findings of the city’s forensic investigation into the incident.

The article noted:

Last week, the city shared a fact sheet about the incident, which explains: “While the city continues to evaluate the data impacted, as of Friday August 9, 2024, our data mining efforts have not revealed that any of the dark web-posted data includes personally identifiable information.”

What are the lessons I have learned from these two stories about a security violation and ransomware extortion?

  1. Lousy cyber security is a result of indifferent (maybe lousy) management. How do I know? The City of Columbus cannot generate a consistent story.
  2. The compromised data were described in two different and opposite ways. The confusion underscores that the individuals involved are struggling with basic data processes. Who’s on first? I don’t know. No, he’s on third.
  3. The generalization that no one wants the data misses an important point. Data, once available, are of considerable interest to state actors who might want to learn about the employees of the university, Chemical Abstracts, or some other information-centric entity in Columbus, Ohio.

Net net: The incident is one more grim reminder of the vulnerabilities which “managers” choose to ignore or leave to people who may lack certain expertise. The fix may begin in the hiring process.

Stephen E Arnold, August 23, 2024

Phishers: Targeting Government Contract Shoemakers Who Do Not Have Shoes But Talk about Them

August 22, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The proverb "The shoemaker’s children go barefoot" has inspired some bad actors who phish for online credentials. The obvious targets, some might suggest, are executives at major US government agencies. Those individuals are indeed targets, but a number of bad actors have found ways to get a GS-9 to click on a link designed to steal credentials. An even more promising barrel containing lots of fish may be the vendors who sell professional services, including cyber security, to US government agencies.


Of course, our systems are secure. Thanks, MSFT Copilot. How is Word doing today? Still crashing?

“This Sophisticated New Phishing Campaign Is Going after US Government Contractors” explains:

Researchers from Perception Point revealed the “Uncle Scam” campaign bypasses security checks to deliver sophisticated phishing emails designed by LLMs to be extremely convincing. The attackers use advanced tools, including AI-powered phishing kits and the Microsoft Dynamics 365 platform, to execute convincing multi-step attacks.

The write up then reveals one of the keys (maybe the principal key) to success:

One of the key elements that makes this phishing campaign particularly effective is the abuse of Microsoft’s Dynamics 365 Marketing platform. The attackers leverage the domain "dyn365mktg.com," associated with Dynamics 365, to send out their malicious emails. Because this domain is pre-authenticated by Microsoft and complies with DKIM and SPF standards, phishing emails are more likely to bypass spam filters and reach the inboxes of unsuspecting recipients.

If I understand this statement, the recipient sees an email with a pattern set up to suck credentials. Why would a government contractor click on such an email? The domain is “pre-authenticated by Microsoft.” If it looks like a duck and walks like a duck, the email must be a duck. Yes, it is a digital duck which is designed to take advantage of yet another “security” and “trust” facet of the Microsoft ecosystem.
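
For readers who want to see why “pre-authenticated” does not mean “trustworthy,” here is a minimal Python sketch, assuming a simplified model of DMARC-style relaxed alignment. SPF and DKIM establish which domain took responsibility for sending a message; alignment only asks whether that domain matches the domain in the From line. The helper names `org_domain` and `is_aligned` are hypothetical, and taking the last two labels of a host name is a shortcut (a real checker consults the Public Suffix List so that domains such as example.co.uk are handled correctly).

```python
from email.utils import parseaddr

def org_domain(host: str) -> str:
    # Simplification: keep the last two labels of the host name.
    # A real implementation would use the Public Suffix List.
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def is_aligned(from_header: str, authenticated_domain: str) -> bool:
    """Return True if the From address shares an organizational domain
    with the domain that passed SPF or DKIM (relaxed alignment)."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return False  # no usable address in the From header
    from_domain = address.rsplit("@", 1)[1]
    return org_domain(from_domain) == org_domain(authenticated_domain)

# A message sent from the marketing domain itself aligns and passes:
print(is_aligned('"Contracts Office" <notices@dyn365mktg.com>', "dyn365mktg.com"))
# A message claiming a different domain does not:
print(is_aligned('"Agency Notices" <noreply@agency.example.gov>', "dyn365mktg.com"))
```

If the bad actor sends from the marketing domain, every check passes. Authentication confirms who sent the digital duck, not whether the duck is honest. The display name in quotes is never verified at all.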

I found this series of statements interesting. Once again, the same old truisms are trotted out to help a victim avoid a similar problem in the future. I quote:

To safeguard your organization from falling victim to sophisticated phishing attacks like "Uncle Scam," Perception Point recommends taking the following precautions:

  • Double-check the Sender’s Email: Always scrutinize the sender’s email address for any signs of impersonation.
  • Hover Before You Click: Before clicking any link, hover over it to reveal the actual URL and ensure it is legitimate. 
  • Look for Errors: Pay attention to minor grammatical mistakes, unusual phrasing, or inconsistencies in the email content.
  • Leverage Advanced Detection Tools: Implement AI-powered multi-layered security solutions to detect and neutralize sophisticated phishing attempts.
  • Educate Your Team: Regularly train employees on how to identify phishing emails and the importance of verifying unsolicited communications.
  • Trust Your Instincts: If an email or offer seems too good to be true, it probably is. Always verify the authenticity of such communications through trusted channels.
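
The “Hover Before You Click” tip boils down to comparing what a link says with where it actually goes. Here is a minimal Python sketch using only the standard library, assuming an HTML message body. The names `LinkCollector`, `host_of`, and `mismatched_links` are hypothetical, the example domains are made up, and treating any visible text with a dot and no spaces as a URL is a crude heuristic.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs from <a> tags in an HTML body."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) tuples
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def host_of(url: str) -> str:
    # Visible text is often a bare "www.example.com", so add a scheme.
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()

def mismatched_links(html: str):
    """Return (text, href) pairs whose visible text looks like a URL
    but whose actual target is a different host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    suspicious = []
    for href, text in parser.links:
        looks_like_url = "." in text and " " not in text
        if looks_like_url and host_of(text) != host_of(href or ""):
            suspicious.append((text, href))
    return suspicious
```

Note what the sketch cannot do. A link whose text and target both point at a legitimate but abused service passes without complaint, which is one reason tips like these work poorly.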

How well do these tips work in today’s government contractor workspace? Answer: Not too well.

The issue is the underlying software. The fix is going to be difficult to implement. Microsoft is working to make its systems more secure. The government contractors can make shoes in the form of engineering change orders, scope changes, and responses to RFQs which hit every requirement in the RFP. But many of those firms have assumed that the cyber security systems will do their job.

Ignorance is bliss. Maybe not for the compromised contractor, but the bad actors are enjoying the Uncle Scam play and may for years to come.

Stephen E Arnold, August 22, 2024

AI Balloon: Losing Air and Boring People

August 22, 2024

Though tech bros who went all-in on AI still promise huge breakthroughs just over the horizon, Windows Central’s Kevin Okemwa warns: “The Generative AI Bubble Might Burst, Sending the Tech to an Early Deathbed Before Its Prime: ‘Don’t Believe the Hype’.” Sadly, it is probably too late to save certain career paths, like coding, from an AI takeover. But perhaps a slowdown would conserve some valuable resources. Wouldn’t that be nice? The write-up observes:

“While AI has opened up the world to endless opportunities and untapped potential, its hype might be short-lived, with challenges abounding. Aside from its high water and power demands, recent studies show that AI might be a fad and further claim that 30% of its projects will be abandoned after proof of concept. Similar sentiments are echoed in a recent Blood In The Machine newsletter, which points out critical issues that might potentially lead to ‘the beginning of the end of the generative AI boom.’ From the Blood in the Machine newsletter analysis by Brian Merchant, who is also the Los Angeles Times’ technology columnist:

‘This is it. Generative AI, as a commercial tech phenomenon, has reached its apex. The hype is evaporating. The tech is too unreliable, too often. The vibes are terrible. The air is escaping from the bubble. To me, the question is more about whether the air will rush out all at once, sending the tech sector careening downward like a balloon that someone blew up, failed to tie off properly, and let go—or, more slowly, shrinking down to size in gradual sputters, while emitting embarrassing fart sounds, like a balloon being deliberately pinched around the opening by a smirking teenager.’”

Such evocative imagery. Merchant’s article also notes that, though enterprise AI was meant to be the way AI firms made their money, it is turning out to be a dud. There are several reasons for this, not the least of which is AI models’ tendency to “hallucinate.”

Okemwa offers several points to support Merchant’s deflating-balloon claim. For example, Microsoft was recently criticized by investors for wasting their money on AI technology. Then there is NVIDIA: The chipmaker recently became the most valuable company in the world thanks to astronomical demand for its hardware to power AI projects. However, a delay of its latest powerful chip dropped its stock’s value by 5%, and market experts suspect its value will continue to decline. The write-up also points to trouble at generative AI’s flagship firm, OpenAI. The company is plagued by a disturbing exodus of top executives, rumors of pending bankruptcy, and a pesky lawsuit from Elon Musk.

Speaking of Mr. Musk, how do those who say AI will kill us all respond to the potential AI downturn? Crickets.

Cynthia Murrell, August 22, 2024

Cyber Security Outfit Wants Its Competition to Be Better Fellow Travelers

August 21, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I read a write up which contains some lingo that is not typical Madison Avenue sales speak. The sort of odd orange newspaper published “CrowdStrike Hits Out at Rivals’ Shady Attacks after Global IT Outage.” [This is a paywalled story, gentle reader. Gone are the days when the orange newspaper was handed out in Midtown Manhattan.] CrowdStrike is a company with interesting origins. The firm has become a player in the cyber security market, and it has been remarkably successful. Microsoft, definitely a Grade A outfit focused on making system administrators’ lives as calm as Lake Paseco on a summer morning, allowed CrowdStrike to interact with the most secure component of its software.

What does the leader of CrowdStrike reveal? Let’s take a quick look at a point or two.

First, I noted this passage from the write up which seems a bit like a proactive tactic to make sure those affected by the tiny misstep know that software is not perfect. I mean, who knew?

CrowdStrike’s president hit out at “shady” efforts by its cyber security rivals to scare its customers and steal market share in the month since its botched software update sparked a global IT outage. Michael Sentonas told the Financial Times that attempts by competitors to use the July 19 disruption to promote their own products were “misguided”.

I am not sure what misguided means, but I think the idea is that competitors should not try to surf on the little ripples the CrowdStrike misstep caused. A few airline passengers were inconvenienced, sure. But that happens anyway. The people in hospitals whose surgeries were affected seem to be mostly okay in a statistical sense. And those interrupted financial transactions. No big deal. The market is chugging along.


Cyber vendors are ready and eager to help those with a problematic and possibly dangerous vehicle. Thanks, MSFT Copilot. Are your hands full today?

I also circled this passage:

SentinelOne chief executive Tomer Weingarten said the global shutdown was the result of “bad design decisions” and “risky architecture” at CrowdStrike, according to trade magazine CRN. Alex Stamos, SentinelOne’s chief information security officer, warned in a post on LinkedIn it was “dangerous” for CrowdStrike “to claim that any security product could have caused this kind of global outage”.

Yep, dangerous. Other vendors’ software is unlikely to create a CrowdStrike problem. I like this type of assertion. Also, I find the ambulance-chasing approach to closing deals and boosting revenue a normal part of some companies’ marketing. I think one outfit made FUD (fear, uncertainty, and doubt) a useful wrench in the firm’s deal-closing guide to hitting a sales target. As a dinobaby, I could be hallucinating like some of the smart software and the even smarter top dogs in cyber security companies.

I have to include this passage from the orange outfit’s write up:

Sentonas [a big dog at CrowdStrike], who this month went to Las Vegas to accept the Pwnie Award for Epic Fail at the 2024 security conference Def Con, dismissed fears that CrowdStrike’s market dominance would suffer long-term damage. “I am absolutely sure that we will become a much stronger organization on the back of something that should never have happened,” he said. “A lot of [customers] are saying, actually, you’re going to be the most battle-tested security product in the industry.”

The Def Con crowd was making fun of CrowdStrike for its inconsequential misstep. I assume CrowdStrike’s leadership realizes that the award is like having the “old” Mad Magazine devote a cover to a topic.

My view is that [a] the incident will be forgotten. SolarWinds seems to be fading as an issue in the courts and in some experts’ List of Things to Worry About. [b] Microsoft and CrowdStrike can make marketing hay by pointing out that each company has addressed the “issue.” Life will be better going forward. And, [c] competitors will have to work overtime to cope with a sales retention tactic more powerful than any PowerPoint or PR campaign: discounts, price cuts, and free upgrades to AI-infused systems.

But what about that headline? Will cyber security marketing firms change their sales lingo and tell the truth? Can one fill the tank of a hydrogen-powered vehicle in Eastern Kentucky?

PS. Buying cyber security, real-time alerts, and other gizmos allows an organization to think, “We are secure, right?”

Stephen E Arnold, August 21, 2024

Threat. What Threat? Google Does Not Behave Improperly. No No No.

August 21, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Amazing write up from a true poohbah publication: “Google Threatened Tech Influencers Unless They Preferred the Pixel.” Even more amazing is the Googley response: “We missed the mark?”


Thanks, MSFT Copilot. Good enough.

Let’s think about this.

The poohbah publication reports:

A Pixel 9 review agreement required influencers to showcase the Pixel over competitors or have their relationship terminated. Google now says the language ‘missed the mark.’

What?

I thought Google was working overtime to build relationships and develop trust. I thought Google was characterized unfairly as a monopolist. I thought Google had some of that good old “Don’t be evil” DNA.

These misconceptions demonstrate how out of touch a dinobaby like me can be.

The write up points out:

The Verge has independently confirmed screenshots of the clause in this year’s Team Pixel agreement for the new Pixel phones, which various influencers began posting on X and Threads last night. The agreement tells participants they’re “expected to feature the Google Pixel device in place of any competitor mobile devices.” It also notes that “if it appears other brands are being preferred over the Pixel, we will need to cease the relationship between the brand and the creator.” The link to the form appears to have since been shut down.

Does that sound like a threat? As a dinobaby and non-influencer, I think the Google is just trying to prevent miscreants like those people posting information about Russia’s special operation from misinterpreting the Pixel gadgets. Look. Google was caught off guard and flipped into Code Red or whatever. Now the Gemini smart software is making virtually everyone’s life online better.

I think the Google is trying to be “honest.” The term, like the word “ethical”, can house many meanings. Consequently, non-Googley phones, thoughts, ideas, and hallucinations are not permitted. Otherwise what? The write up explains:

Those terms certainly caused confusion online, with some assuming such terms apply to all product reviewers. However, that isn’t the case. Google’s official Pixel review program for publications like The Verge requires no such stipulations. (And, to be clear, The Verge would never accept such terms, in accordance with our ethics policy.)

The poohbah publication has ethics. That’s super.

Here are the “last words” in the article about this issue that missed the mark:

Influencer is a broad term that encompasses all sorts of creators. Many influencers adhere to strict ethical standards, but many do not. The problem is there are no guidelines to follow and limited disclosure to help consumers if what they’re reading or watching was paid for in some way. The FTC is taking some steps to curtail fake and misleading reviews online, but as it stands right now, it can be hard for the average person to spot a genuine review from marketing. The Team Pixel program didn’t create this mess, but it is a sobering reflection of the murky state of online reviews.

Why would big outfits appear to threaten people? There are no consequences. And most people don’t care. Threats are enervating. There’s probably a course at Stanford University on the subject.

Net net: This is new behavior? Nope. It is characteristic of a largely unregulated outfit with lots of money which, at the present time, feels threatened. Why not do what’s necessary to remain wonderful, loved, and trusted? Or else!

Stephen E Arnold, August 21, 2024

Moving Quickly: School Cell Phone Bans

August 21, 2024

In a victory for common sense, 9to5Mac reports, “More Schools Banning Students from Using Smartphones During Class Time.” Proponents of bans argue they improve learning outcomes and reduce classroom disruption. To which we reply: well, duh. They also claim bans protect children from cyberbullying. Maybe. Writer Ben Lovejoy states:

“More schools are banning students from using smartphones in classes, with calls for a federal ban rather than the current mix of state laws. Apple’s home state of California is expected to be the next state to introduce a ban. Orlando has so far taken the toughest line, banning smartphone use during the entire day, and blocking access to social media networks on the school Wi-Fi. Worldwide, around one in four countries has implemented bans or restrictions on the use of smartphones in schools. A 9to5Mac poll conducted a year ago found strong support for the same happening in the US, with 73% in favor and only 21% opposed. … Within the US, four states have already implemented bans, or are in the process of doing so: Florida, Indiana, Louisiana, and South Carolina. Exact policies vary. Some schools allow phones to [be] used during breaks, while the strictest insist that they are placed in lockers or other safe places at the beginning of the school day, and not retrieved until the end of the day.”

“Cellphone-free education” laws in Minnesota and Ohio will go into effect next year. The governors of California, Virginia, and New York indicate their states may soon follow suit. Meanwhile, according to a survey by the National Parents Union, 70% of parents support bans. But most want students to have access to their phones during lunchtime and other official breaks. Whether just during class times or all day, it can be expensive to implement these policies.

“Pennsylvania recently allotted millions of dollars in grants for schools to purchase lockable bags to store pupils’ phones while Delaware recently allocated $250,000 for schools to test lockable phone pouches.”

Leaving phones at home is not an option—today’s parents would never stand for it. The days of being unable to reach one’s offspring for hours at a time are long gone. How did parents manage to live with that for thousands of years?

Cynthia Murrell, August 21, 2024

Good Enough: The New Standard of Excellence

August 20, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I read an interesting essay about software development. “[The] Biggest Productivity Killers in the Engineering Industry” presents three issues which add to the time and cost of a project. Let’s look at each of these factors and then one trivial downstream consequence of implementing these productivity touchpoints.

The three killers are:

  1. Working on a project until it meets one’s standards of “perfectionism.” Like “love” and “ethics”, perfectionism is often hard to define without a specific context. A designer might look at an interface and its colors and say, “It’s perfect.” The developer or, heaven forbid, the client looks and says, “That sucks.” Oh, oh.
  2. Stalling; that is, not jumping right into a project and making progress. I worked at an outfit which valued what it called “an immediate and direct response.” The idea is that action is better than reaction. Plus it demonstrates that one is not fooling around.
  3. Context switching; that is, dealing with other priorities or interruptions.

I want to highlight one of these “killers”: the need for “good enough.” The essay contains some useful illustrations, including one for the perfectionism-good enough trade off. The idea is pretty clear. The closer one pushes software or some other task toward “perfect,” the more time is required. If something takes too long, then chasing perfection hits a cost wall. Therefore, one should trade off time and value by turning in the work when it is good enough.


The logic is understandable. I do have one concern not addressed in the essay. I believe my concern applies to the other two productivity killers, stalling and interruptions (my term for context switching).

What is this concern?

How about doors falling off aircraft, stranded astronauts, cybersecurity which fails to protect Social Security Numbers, and city governments that cannot determine if compromised data were “good” or “corrupted.” We just know the data were compromised. There are other examples; for instance, the CrowdStrike misstep which affected only a few million people. How did CrowdStrike happen? My hunch is that “good enough” thinking was involved, along with someone putting off making sure the internal controls were actually controlling, plus interruptions that pulled the person responsible for software controls into a meeting instead of finishing and checking his or her work.

The difficulty turns on several capabilities; specifically:

  1. Does the person doing the job know how to make it work in a good enough manner? In my experience, the boss may not and simply wants the fix implemented now or the product shipped immediately.
  2. Does the company have a culture of excellence or is it similar to big outfits which cannot deliver live streaming content, allow reviewers to write about a product without threatening them, or provide tactics which kill people because no one on the team understands the concept of ethical behavior? Frankly, today I am not sure any commercial enterprise cares about much other than revenue.
  3. Does anyone in a commercial organization have responsibility to determine the practical costs of shipping a product or delivering a service that does not deliver reliable outputs? Reaction to failed good enough products and services is, in my opinion, the management method applied to downstream problems.

Net net: Good enough, like it or not, is the new gold standard. Or is that standard, like the Olympic medals, an amalgam? The “real” gold is a veneer; the “good” is a coating on enough.

Stephen E Arnold, August 20, 2024


A Xoogler Reveals Why Silicon Valley Is So Trusted, Loved, and Respected

August 20, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Amazing as it seems, a Xoogler (in this case, the former adult at Google) is spilling the deepest, darkest secrets of success. The slick executive gave a talk at Stanford University. Was the talk a deep fake? I heard that the video was online and then disappeared. Then I spotted a link to a video which purported to be the “real” ex-Google CEO’s banned interview. It may or may not be at this link because… Google’s policies about censorship are mysterious to me.


Which persona is real: The hard edged executive or the bad actor in the mirror? Thanks, MSFT Copilot. How is the security effort going?

Let’s cut to the chase. I noted the Wall Street Journal’s story “Eric Schmidt Walks Back Claim Google Is Behind on AI Because of Remote Work.” Someone more alert than I noticed an interesting comment; to wit:

If TikTok is banned, here’s what I propose each and every one of you do: Say to your LLM the following: “Make me a copy of TikTok, steal all the users, steal all the music, put my preferences in it, produce this program in the next 30 seconds, release it, and in one hour, if it’s not viral, do something different along the same lines.” That’s the command. Boom, boom, boom, boom. So, in the example that I gave of the TikTok competitor — and by the way, I was not arguing that you should illegally steal everybody’s music — what you would do if you’re a Silicon Valley entrepreneur, which hopefully all of you will be, is if it took off, then you’d hire a whole bunch of lawyers to go clean the mess up, right? But if nobody uses your product, it doesn’t matter that you stole all the content.
And do not quote me.

I want to point out that this snip comes from the Slashdot post from Msmash on August 16, 2024.

Several points dug into my dinobaby brain:

  1. Granting an interview, letting it be captured to video, and trying to explain away the remarks strikes me as a little wild and frisky. Years ago, this same Googler was allegedly annoyed when an online publication revealed facts about him located via Google.
  2. Remaining in the news cycle in the midst of a political campaign, a “special operation” in Russia, and the wake of the Department of Justice’s monopoly decision is interesting. Those comments, like the allegedly accurate one quoted above, put the Google on some people’s radar. Legal eagles, are your sensing devices beeping?
  3. The entitled behavior of saying one thing to students and then mansplaining that the ideas were not reflective of the inner self is an illustration of behavior my mother would have found objectionable. I listened to my mother. To whom does the Xoogler listen?

Net net: Stanford’s president was allegedly making up information and he subsequently resigned. Now a guest lecturer explains that it is okay to act in what some might call an illegal manner. What are those students learning? I would assert that it is not ethical behavior.

Stephen E Arnold, August 20, 2024

Microsoft and Palantir: Moving Up to Higher Impact Levels

August 20, 2024

Microsoft And Palantir Sell AI Spyware To US Government

While AI is making the news for how it will end jobs, be used for deep fakes, and upend creative industries, there’s something that’s not being mentioned: spyware. The Verge writes about how two big technology players are planning to bring spyware to the US government: “Palantir Partners With Microsoft To Sell AI To The Government.”

Palantir and Microsoft recently announced they will combine their software to power services for US defense and intelligence agencies. Microsoft’s large language models (LLMs) will be available via Azure OpenAI Service alongside Palantir’s AI Platform (AIP), and both will run in Microsoft’s classified government cloud environments. The announcement doesn’t explain exactly what the combined software will do, but there is speculation.

Palantir is known for its software that analyzes people’s personal data and for helping governments and organizations with surveillance. Palantir has been very successful when it comes to government contracts:

“Despite its large client list, Palantir didn’t post its first annual profit until 2023. But the AI hype cycle has meant that Palantir’s “commercial business is exploding in a way we don’t know how to handle,” the company’s chief executive officer Alex Karp told Bloomberg in February. The majority of its business is from governments, including that of Israel, though the risk factors section of its annual filing notes that it does not and will not work with “the Chinese communist party.””

Eventually the details about Palantir’s and Microsoft’s partnership will be revealed. It probably won’t be far off from what people imagine, but it is guaranteed to be shocking.

Whitney Grace, August 20, 2024

Suddenly: Worrying about Content Preservation

August 19, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Digital preservation may be becoming a hot topic for those who rarely think about finding today’s information tomorrow or even later today. Two write ups provide some hooks on which thoughts about finding information could be hung.


The young scholar faces some interesting knowledge hurdles. Traditional institutions are not much help. Thanks, MSFT Copilot. Is Outlook still crashing?

The first concerns PDFs. The essay and how-to is “Classifying All of the PDFs on the Internet.” A happy quack to the individual who pursued this project, presented findings, and provided links to the data sets. Several items struck me as important in this project research report:

  1. Tracking down PDF files on the “open” Web is not something that can be done with a general Web search engine. The takeaway for me is that PDFs, like PowerPoint files, are either skipped or not crawled. The author had to resort to other, programmatic methods to find these file types. If an item cannot be “found,” it ceases to exist. How about that for an assertion, archivists?
  2. The distribution of document “source” across the author’s prediction classes splits out mathematics, engineering, science, and technology. Treating these separate categories as one makes clear that they account for about 25 percent of the PDF universe. Since technology is a big deal for innovators and money types, losing or not being able to access these data suggests a knowledge hurdle today and tomorrow, in my opinion. An entity capturing these PDFs and making them available might have a knowledge advantage.
  3. Entities like national libraries and individualized efforts like the Internet Archive are not capturing the full sweep of PDFs based on my experience.

My reading of the essay made me recognize that access to content on the open Web is perceived to be easy and comprehensive. It is not. Your mileage may vary, of course, but this write up illustrates a large, multi-terabyte problem.

The second story about knowledge comes from the Epstein-enthralled institution’s magazine. This article is “The Race to Save Our Online Lives from a Digital Dark Age.” To make the urgency of the issue more compelling and better for the Google crawling and indexing system, the subtitle adds some lemon zest to the dish of doom:

We’re making more data than ever. What can—and should—we save for future generations? And will they be able to understand it?

The write up states:

For many archivists, alarm bells are ringing. Across the world, they are scraping up defunct websites or at-risk data collections to save as much of our digital lives as possible. Others are working on ways to store that data in formats that will last hundreds, perhaps even thousands, of years.

The article notes:

Human knowledge doesn’t always disappear with a dramatic flourish like GeoCities; sometimes it is erased gradually. You don’t know something’s gone until you go back to check it. One example of this is “link rot,” where hyperlinks on the web no longer direct you to the right target, leaving you with broken pages and dead ends. A Pew Research Center study from May 2024 found that 23% of web pages that were around in 2013 are no longer accessible.

Well, the MIT story has a fix:

One way to mitigate this problem is to transfer important data to the latest medium on a regular basis, before the programs required to read it are lost forever. At the Internet Archive and other libraries, the way information is stored is refreshed every few years. But for data that is not being actively looked after, it may be only a few years before the hardware required to access it is no longer available. Think about once ubiquitous storage mediums like Zip drives or CompactFlash.

To recap, one individual made clear that PDF content is a slippery fish. The other write up says the digital content itself across the open Web is a lot of slippery fish.

The fix remains elusive. The hurdles are money, copyright litigation, and technical constraints like storage and indexing resources.

Net net: If you want to preserve an item of information, print it out on some of the fancy Japanese archival paper. An outfit can say it archives, but in reality the information on the shelves is a tiny fraction of what’s “out there”.

Stephen E Arnold, August 19, 2024
