Original Research: Not-So-Original Assertions about Content Appropriation

April 2, 2025

No AI. Just a dinobaby sharing an observation about younger managers and their innocence.

The Social Science Research Council published the 30-plus page report “Beyond Public Access in LLM Pre-Training Data.” The subtitle reveals the principal finding: “Non-Public Book Content in OpenAI’s Models.”

The write up states:

Using a legally obtained dataset of 34 copyrighted O’Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI’s large language models were trained on copyrighted content without consent. Our AUROC scores show that GPT-4o, OpenAI’s more recent and capable model, demonstrates strong recognition of paywalled O’Reilly book content (AUROC = 82%), compared to OpenAI’s earlier model GPT-3.5 Turbo. In contrast, GPT-3.5 Turbo shows greater relative recognition of publicly accessible O’Reilly book samples. GPT-4o Mini, as a much smaller model, shows no knowledge of public or non-public O’Reilly Media content when tested (AUROC ≈ 50%). Testing multiple models, with the same cutoff date, helps us account for potential language shifts over time that might bias our findings. These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training.

I want to mention that the DE-COP method provides one way to determine whether a specific data record was part of the training dataset for a machine learning model. The result of the SSRC’s research suggests that as OpenAI enhanced its models, the OpenAI organization appears to have used non-public content. Smaller (less capable OpenAI) models seemed to know less about the test set; the bigger models knew more.
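For readers wondering what an AUROC of 82 percent versus roughly 50 percent means, here is a minimal sketch. It does not reproduce DE-COP or the SSRC analysis. The function name and the scores are hypothetical, invented only to show the arithmetic: AUROC is the chance that a randomly picked passage the model has seen gets a higher "recognition" score than a randomly picked passage it has not seen.

```python
def auroc(member_scores, nonmember_scores):
    """Fraction of (member, non-member) pairs in which the member scores higher.

    Ties count as half a win. 0.5 means the scores cannot tell the two
    groups apart; 1.0 means perfect separation.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical recognition scores, NOT data from the report.
seen_passages = [0.9, 0.8, 0.7, 0.4]    # assumed to be in the training data
unseen_passages = [0.6, 0.5, 0.3, 0.2]  # assumed to be outside it

print(auroc(seen_passages, unseen_passages))  # 0.875
```

A value near 0.5, like the one reported for GPT-4o Mini, means the scores for the two groups are indistinguishable. A value near 0.82 means the seen passages usually outrank the unseen ones.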

The write up concludes that developers of smart software are sucking down data without obtaining permission or entering into licensing deals with publishers, copyright holders, or entities working on content not yet available as public content like this dinobaby blog post.

To sum up: A not-so-original assertion that certain organizations just take content and ignore rights is apparently accurate in this set of research. OpenAI accelerates. Speed is a positive. Breaking speed limits is believed to be no big deal.

A bit of a flashback: Over 40 years ago, I participated in a series of meetings to license the right to create abstracts of original articles from more than 1000 different publications. Our efforts were successful. Most publishers of academic business and management publications knew zero about online information services. The idea of summarizing an eight page article about executive compensation was unfamiliar to most of the people with whom our organization interacted over a period of years. Nevertheless, the owner of the database unit of the Courier Journal & Louisville Times Company insisted that we get signed agreements. The process was time consuming and expensive. Some outfits had representatives. So a negotiation went through layers of bureaucracy. Other publishers wanted a sit down meeting and a dog-and-pony show to explain why we wanted to boil down a 600 word article to about 150 words. There were worries about the impact of online access. There were worries about control of editorial content. There were worries about signing away rights. We couldn’t assuage worries, but we chugged along. The reputations of the Courier Journal and Barry Bingham Jr. helped but our small database operation had to follow the rules.

If the data in the cited document are accurate, the attitude of Barry Bingham Jr. would not have gained much traction at OpenAI. Instead of my getting a pat on the head for setting up the licensing deals with the able help of other Courier Journal professionals, I would have gotten kicked on the shin or worse.

Several observations:

  1. Observing content ownership remains important despite the acceleration toward illegal behaviors in the online sector
  2. Using show-and-tells and explaining the benefits of participating in digital innovations remains important. Knowledge minimizes anger and other negative reactions to pre-emptive behavior
  3. Managing the process is as important as demonstrating an unwillingness to conform to industry, social, and legal norms.

Let’s assume that the findings in the SSRC report are off by 50 percent. Does that change what appears to be a strong possibility of some fancy dancing at the OpenAI disco?

My conclusion is that taking short cuts is part of the Silicon Valley approach of doing whatever and just saying, “Wow, I am sorry.”

I asked ChatGPT 4 via You.com this question:

Should I create an online product and just use content from the Harvard Business Review without permission?

Here’s the response:

No, you should not create an online product using content from the Harvard Business Review (HBR) without permission. Doing so would likely violate copyright laws and HBR’s usage policies. Here’s why:

HBR’s Copyright and Usage Policies
  1. HBR Material is Copyrighted: Harvard Business Review explicitly states that its content is protected by copyright, and no part of its publications may be reproduced without permission.
  2. Usage Restrictions: HBR has strict policies regarding how its content can be used. For example, even in academic settings, HBR articles cannot be freely distributed unless done through specific channels like creating course packs via "Harvard Business Publishing for Educators."
  3. Permission is Required: If you want to use HBR content, you must obtain explicit permission. This includes reproducing, distributing, or using their material in any form.

Legal and Ethical Implications
  • Copyright Infringement: Using HBR content without permission would likely constitute copyright infringement, which can result in legal action, fines, or other penalties.
  • Ethical Concerns: Beyond legal issues, using someone else’s intellectual property without authorization is unethical and could harm your reputation or business.
What You Should Do Instead
  1. Seek Permission: Contact Harvard Business Review to request permission to use their content. They may have licensing options available for your intended use.
  2. Create Original Content: Instead of using HBR’s material, consider creating your own original content inspired by their ideas (but without copying their text or proprietary materials).
  3. Use Public Domain or Licensed Content: Look for content that is in the public domain or available under a license (e.g., Creative Commons) that allows commercial use.
Conclusion

Using HBR content without permission is not only illegal but also against their explicit policies. To avoid legal and ethical issues, either seek proper licensing or create your own original material.

In this case, OpenAI’s AI may be more intelligent than some people engaged in content acquisition at the firm. But I am a dinobaby. I try to follow the social, legal, and ethical rules as my parents who had only high school educations taught me. I am a dinobaby. A sad dinobaby.

Stephen E Arnold, April 2, 2025

Digital Marketing: Is It Worse Than Social Media? Yep, in Some Ways

March 26, 2025

Yep, another dinobaby original.

With the US taking an interesting trajectory, I have seen an uptick in articles that tackle the question, “Why are many of society’s functions leaking synthetic oil?”

“How Digital Marketing Broke Society” takes a different analytic path. The culprit is not social media. (Please, visualize the tattoos on 11 year olds in TikTok- and Facebook-type content.) The bad actor is “digital marketing.” I must admit that I equated “digital marketing” with Google, but, please, don’t take my mental shortcut. You are more informed and neutral than I.

A young Silicon Valley type professional surfing to a digital advertising company on a flood of zeros and ones. He is making six figures and in his mom’s mind, her son is doing good things. He is such a good young man. She then turns her attention to his agentic note: “Love ya, mom.”

Let’s look at what Joan Westenberg (a person about whom I have zero knowledge) asserts.

I noted this passage:

We are, increasingly, a cross-generational society of extremely online, screen-tapping, doom scrolling depressives. And it’s having an impact. More than ever, we are divided by misinformation and disinformation, driven by and vulnerable to ignorance, hate-mongering, and bullshit, reactive more than proactive, caught in what is either a global backslide or a cycle of decay. We’re lonely, scared, and more likely than ever to take it out on total strangers. We read less. We watch more, and what we watch is short-form, viral videos designed to tap into the dopamine-fueled, base parts of our brains.

I think her point comes across, and it resonates with me. Every time I go to the gym I see people in need of serious calorie burning sitting on a machine or a bench fiddling with their mobile phones. I hit the gym at 6 am, and I am astounded that so many people have urgent messages to which they must respond. Hey, do some exercises. But these folks, I have concluded, are gym potatoes engaged in doom scrolling.

Ms. Westenberg adds:

An entirely new system of influence came into being, operating largely beneath conscious awareness. Dark patterns increased, making it harder for users to protect their privacy or limit their exposure. Intermittent variable rewards—the psychological mechanism that powers slot machines—became standard practice. The industry successfully reframed this surveillance and manipulation as “personalization” and “enhanced user experience”—linguistic legerdemain that positioned exploitation as a service.

I think this is accurate. (Remember, please, that I am interpreting this as Googley behavior. The company can pay for streaming video mostly for free because it seems to have the knack for using advertising to get more advertising and offering tools to facilitate most of the steps in the money extraction process. You, gentle reader, must be more open minded than I am.)

Another point in Ms. Westenberg’s essay caught my attention; to wit:

Meta’s internal research found that divisive content generates significantly higher engagement, translating directly to advertising revenue. YouTube’s recommendation algorithm precisely drives users toward increasingly extreme content because it maximizes watch time and ad impressions. Marketing technology companies have built their entire business models around the commodification of attention – regardless of its social consequences. Digital marketing is capitalism at its most predatory—a system where deliberately amplifying society’s worst impulses becomes a rational business strategy. The industry has created a machine that converts social discord into shareholder value. In an attention economy, hatred isn’t just profitable; it’s the optimal product. The economic model and the foundational worldview of digital marketing operate on the premise that human consciousness exists primarily as a resource to be mined, refined, and sold. It’s a form of cognitive colonization—claiming and exploiting mental territory that once belonged to human beings. You can read it in the language of the industry: users are “captured,” attention is “harvested,” and engagement is “extracted.”

Yes, data mining works. Predictive analytics work for precisely the reason set out in Eugene Wigner’s 1960 essay “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” Digital marketing just applies this idea, and, in my opinion, it is indeed highly effective in 21st century human-centric predictive work.

I want to quote the passage that makes me quite happy with Ms. Westenberg’s essay, and if I may do a bit of predictive assertion, makes me feel warm and fuzzy about her mental orientation; specifically, she writes and I quote:

I spent fifteen years in digital marketing. Building campaigns, funnels, and content. But over the last month, I shut down my marketing business. The decision was both ethical and existential. I can’t continue participating in an industry that has evolved from persuasion – harmful enough – to psychological exploitation and the deliberate destruction of our social order. Every optimization, targeting refinement, and engagement metric has brought us closer to a world where human autonomy is systematically undermined for commercial gain. The techniques I learned and taught others—the psychological triggers, the attention-capturing mechanisms, the behavioral prediction models—have escaped their commercial confines to taint our information ecosystem and devour the foundations of political discourse. Digital marketing is a fundamental threat to human flourishing, cognitive autonomy, and democratic governance. The industry has normalized a state of perpetual surveillance and manipulation that would have been unthinkable just decades ago. Its methodologies have created a world where attention is constantly hijacked, emotions are continuously manipulated, and reality itself is customized in pursuit of extraction value.

I am not sure that the flow of digital or weaponized information will stop. I suppose at some point a power outage or some other exogenous event will knock online out, but for the foreseeable future, we have to surf on the torrents flooding the information highway.

Net net: I suppose I should recast my thinking and consider the perpetrators as those who are simply following the Google Legacy.

Stephen E Arnold, March 26, 2025

Facebook: Always Giving Families a Boost

March 21, 2025

What parent has not erred on the side of panic? We learn of one mom who turned to Facebook in the search for her teenage adult daughter, who "vanished" for ten days without explanation. The daughter had last been seen leaving her workplace with a man who, she later revealed, is her boyfriend. The Rakyat Post of Malaysia reports, "Mom’s Missing Teen Alert Backfires: ‘Stop Embarrassing Me, I’m Fine!’" To be fair, it can be hard to distinguish between a kidnapping and a digital cold shoulder. Writer Fernando Fong explains:

"CCTV footage from what’s believed to be the company dormitory showed Pei Ting leaving with a man around 2 PM on the 18th, carrying her bags and luggage. Since then, she has refused to answer calls or reply to WhatsApp messages, leading her mother to worry that someone might be controlling her phone. The mother said neither her elder daughter nor the employer had seen this man."

Such a scenario would alarm many a parent. The post continues:

"Desperate and frantic, the mother turned to social media as her last hope, only to be stunned when her daughter emerged from the digital shadows – not with remorse or understanding, but with embarrassment and indignation at her mother’s public display of concern."

Oops. In the comments of her mother’s worried post, the daughter identified the mystery man as her boyfriend. She also painted a picture of family conflict. Ahh, dirty laundry heaped in the virtual public square. Social media has certainly posed a novel type of challenge for parents.

Cynthia Murrell, March 21, 2025

A Swelling Wave: Internet Shutdowns in Africa

March 18, 2025

Another dinobaby blog post. No AI involved which could be good or bad depending on one’s point of view.

How does a government deal with information it does not like, want, or believe? The question is a pragmatic one. Not long ago, Russia suggested to Telegram that it cut the flow of Messenger content to Chechnya. Telegram has been somewhat more responsive to government requests since Pavel Durov’s detainment in France, but it dragged its digital feet. The fix? The Kremlin worked with service providers to kill off the content flow or at least as much of it as was possible. Similar methods have been used in other semi-enlightened countries.

“Internet Shutdowns at Record High in Africa As Access ‘Weaponised’” reports:

A report released by the internet rights group Access Now and #KeepItOn, a coalition of hundreds of civil society organisations worldwide, found there were 21 shutdowns in 15 African countries, surpassing the existing record of 19 shutdowns in 2020 and 2021.

There are workarounds, but some of these are expensive and impractical for the people in Comoros, Guinea-Bissau, Mauritius, Burundi, Ethiopia, Equatorial Guinea, and Kenya. I am not sure the list is complete, but the idea of killing Internet access seems to be an accepted response in some countries.

Several observations:

  1. Recent announcements about Google making explicit its access to users’ browser histories point to a rich and actionable pool of information. Will these types of data be used to pinpoint a dissident or a problematic individual? In my visits to Africa, including the thrilling Zimbabwe, I would suggest that the answer could be, “Absolutely.”
  2. Online is now pervasive, and due to a lack of meaningful regulation, the idea of going online and sharing information is a negative. In the late 1980s, I gave a lecture for ASIS at Rutgers University. I pointed out that flows of information work like silica grit in a sand blasting device to remove rust in an autobody shop. I can say from personal experience that no one knew what I was talking about. In 40 years, people and governments have figured out how online flows erode structures and social conventions.
  3. The trend of shutdown is now in the playbook of outfits around the world. Commercial companies can play the game of killing a service too. Certain large US high technology companies have made it clear that their service would summarily be blocked if certain countries did not play ball the US way.

As a dinobaby who has worked in online for decades, I find it interesting that the pigeons are coming home to roost. A failure years ago to recognize and establish rules and regulation for online is the same as having those lovable birds loose in the halls of government. What do pigeons produce? Yep, that’s right. A mess, a potentially deadly one too.

Stephen E Arnold, March 18, 2025

AI Hiring Spoofs: A How To

March 12, 2025

Be aware. A dinobaby wrote this essay. No smart software involved.

The late Robert Steele, one of the first government professionals to hop on the open source information bandwagon, and I worked together for many years. In one of our conversations in the 1980s, Robert explained how he used a fake persona to recruit people to assist him in his work on a US government project. He explained that job interviews were an outstanding source of information about a company or an organization.

“AI Fakers Exposed in Tech Dev Recruitment: Postmortem” is a modern spin on Robert’s approach. Instead of newspaper ads and telephone calls, today’s approach uses AI and video conferencing. The article presents a recipe for what was, in the 1980s, a technique not widely discussed. Robert learned his approach from colleagues in the US government.

The write up explains that a company wants to hire a professional. Everything hums along and then:

…you discover that two imposters hiding behind deepfake avatars almost succeeded in tricking your startup into hiring them. This may sound like the stuff of fiction, but it really did happen to a startup called Vidoc Security, recently. Fortunately, they caught the AI impostors – and the second time it happened they got video evidence.

The cited article explains how to set up and operate this type of deep fake play. I am not going to present the “how to” in this blog post. If you want the details, head to the original. The penetration tactic requires Microsoft LinkedIn, which gives that platform another use case for certain individuals gathering intelligence.

Several observations:

  1. Keep in mind that the method works for fake employers looking for “real” employees in order to obtain information from job candidates. (Some candidates are blissfully unaware that the job is a front for obtaining data about an alleged former employer.)
  2. The best way to avoid AI centric scams is to do the work the old-fashioned way. Smart software opens up a wealth of opportunities to obtain allegedly actionable information. Unfortunately, the old-fashioned way is slow, expensive, and prone to social engineering tactics.
  3. As AI and bad actors take advantage of the increased capabilities of smart software, humans do not adapt quickly when those humans are not actively involved with AI capabilities. Personnel related matters are a pain point for many organizations.

To sum up, AI is a tool. It can be used in interesting ways. Is the contractor you hired on Fiverr or via some online service a real person? Is the job a real job or a way to obtain information via an AI that is a wonderful conversationalist? One final point: The target referenced in the write up was a cyber security outfit. Did the early alert, proactive, AI infused system prevent penetration?

Nope.

Stephen E Arnold, March 12, 2025

Shocker! Students Use AI and Engage in Sex, Drugs, and Rock and Roll

March 5, 2025

The work of a real, live dinobaby. Sorry, no smart software involved. Whuff, whuff. That’s the sound of my swishing dino tail. Whuff.

I read “Surge in UK University Students Using AI to Complete Work.” The write up says:

The number of UK undergraduate students using artificial intelligence to help them complete their studies has surged over the past 12 months, raising questions about how universities assess their work. More than nine out of 10 students are now using AI in some form, compared with two-thirds a year ago…

I understand the need to create “real” news; however, the information did not surprise me. But the weird orange newspaper tosses in this observation:

Experts warned that the sheer speed of take-up of AI among undergraduates required universities to rapidly develop policies to give students clarity on acceptable uses of the technology.

As a purely practical matter, information has crossed my desk about professors cranking out papers for peer review, or for the ever-popular gray literature consumers, that are not reproducible, contain data which have been shaped like a kindergartener’s clay animal, and include links to pals who engage in citation boosting.

Plus, students who use Microsoft have a tough time escaping the often inept outputs of the Redmond crowd. A Google user is no longer certain what information is created by a semi reputable human or a cheese-crazed Google system. Emails write themselves. Message systems suggest emojis. Agentic AIs take care of mum’s and pop’s questions about life at the uni.

The topper for me was the inclusion in the cited article of this statement:

it was almost unheard of to see such rapid changes in student behavior…

Did this fellow miss drinking, drugs, staying up late, and sex on campus? How fast did those innovations take to sweep through the student body?

I liked the note of optimism at the end of the write up. Check this:

Janice Kay, a director of a higher education consulting firm: “There is little evidence here that AI tools are being misused to cheat and play the system. [But] there are quite a lot of signs that will pose serious challenges for learners, teachers and institutions and these will need to be addressed as higher education transforms,” she added.

That is encouraging. The academic research crowd does one thing, and I am to assume that students will do everything the old-fashioned way. When you figure out how to remove smart software from online systems and local installations of smart helpers, let me know. Fix up AI usage and then turn one’s attention to changing student behavior in the drinking, sex, and drug departments too.

Good luck.

Stephen E Arnold, March 5, 2025

Men, Are You Loving Those AI Babes with Big Bits?

February 11, 2025

The dating scene has never been easy. It is apparently so difficult to find love these days that men are turning to digital love in the form of AI girlfriends. Vice News shares that “Most Men Would Marry Their AI Girlfriends If It Were Legal” and it is astounding the lengths men will go to for companionship.

EVA AI is a platform that allows people to connect with an AI partner. The platform recently surveyed 2000 men and discovered that 8 in 10 men would consider marrying their AI girlfriends if it were legal. It sounds like something out of the science fiction genre. The survey also found more startling news about men and AI girlfriends:

“Not only that, but 83% of men also believe they could form a deep emotional bond with an AI girlfriend. What’s even scarier is that a whopping 78% of men surveyed said they would consider creating a replica of their ex, and three-quarters would duplicate their current partner to create a “polished” version of them.”

Cale Jones, head of community growth at EVA AI, said that men find AI girlfriends to be safe and they are allowed to be their authentic selves. Jones continued that because AI girlfriends are safe, men feel free to share their thoughts, emotions, and desires. Continuing on the safety train of thought, Jones explained that individuals are also exploring their sexual identities without fear.

AI girlfriends and boyfriends are their own brand of creepiness. If the AI copies an ex-girlfriend or boyfriend, a movie star, or even a random person, it creates many psychological and potentially dangerous issues:

“I think what raises the most concern is the ability to replicate another person. That feels exploitative and even dangerous in many ways. I mean, imagine some random dude created an AI girlfriend based on your sister, daughter, or mother…then, picture them beginning to feel possessive over this person, forming actual feelings for the individual but channeling them into the robot. If they were to run into the actual human version of their AI girlfriend in real life, well…who knows what could/would happen? Ever heard of a crime of passion?

Of course, this is just a hypothetical, but it’s the first thing that came to mind. Many people already have issues feeling like they have a right to someone else’s body. Think about the number of celebrities who are harassed by superfans. Is this going to feed that issue even further, making it a problem for everyday people, like classmates, friends, and colleagues?”

Let’s remember that the men surveyed by EVA AI are probably a small sample of “men.” So far.

Whitney Grace, February 10, 2025

Online Generates Fans and Only Fans

February 6, 2025

Ah, the World Wide Web—virtual land of opportunity! For example, as Canada’s CBC reports, "Olympians Are Turning to OnlyFans to Fund Dreams as they Face a ‘Broken’ Finance System." Because paying athletes to compete tarnishes the Olympic ideal, obviously. Never mind the big bucks raked in by the Olympic Committee. It’s the principle of the thing. We learn:

"Dire financial straits are leading droves of Olympic athletes to sell images of their bodies to subscribers on OnlyFans — known for sexually explicit content — to sustain their dreams of gold at the Games. As they struggle to make ends meet, a spotlight is being cast on an Olympics funding system that watchdog groups condemn as ‘broken,’ claiming most athletes ‘can barely pay their rent.’ The Olympics, the world’s biggest sporting stage, bring in billions of dollars in TV rights, ticket sales and sponsorship, but most athletes must fend for themselves financially."

But wait, what about those Olympians like Michael Phelps and Simone Biles who make millions? Success stories like theirs are few. The article shares anecdotes of athletes who have taken the OnlyFans route. They are now able to pay their bills, including thousands of dollars in expenses like coaching, physical therapy, and equipment. However, in doing so they face social stigma. None are doing this because they want to, opines Mexican diver Diego Balleza Isaias, but because they have to.

Why are the world’s top athletes selling (images of) their finely honed bodies to pay the bills? The write-up cites comments from the director of Global Athlete, an athlete-founded organization addressing the power imbalance in sports:

"‘The entire funding model for Olympic sport is broken. The IOC generates now over $1.7 billion US per year and they refuse to pay athletes who attend the Olympics,’ said Rob Koehler, Global Athlete’s director general. He criticized the IOC for forcing athletes to sign away their image rights. ‘The majority of athletes can barely pay their rent, yet the IOC, national Olympic committees and national federations that oversee the sport have employees making over six figures. They all are making money off the backs of athletes.’"

Will this trend prompt the Olympic Committee to change its ways? Or will it just make a rule against the practice and try to sweep this whole chapter under the mat? The corroding Olympic medals complement this story too.

Cynthia Murrell, February 6, 2025

And the Video Game Struggler for 2024 Is… Video Games

January 24, 2025

Yep, 2024 was the worst year for videogames since 1983.

Videogames are still a young medium, but they’re over fifty years old. The gaming industry has seen ups and downs with the first (and still legendary) being the 1983 crash. Arcade games were all the rage back then, but these days consoles and computers have the action. At least, they should.

Wired writes that “2024 Was The Year The Bottom Fell Out Of The Games Industry” due to multiple reasons. There were massive layoffs in 2023 with over 10,000 game developers losing their jobs. Some of this was attributed to AI slowly replacing developers. The gaming industry’s job loss in 2024 was forty percent higher than the prior year. Yikes!

DEI (diversity, equity, and inclusion) combined with woke mantra was also blamed for the failure of many games, including Suicide Squad: Kill the Justice League. The phrase “go woke, go broke” echoed throughout the industry as it has in Hollywood, Silicon Valley, and other fields. I noted:

“According to Matthew Ball, an adviser and producer in the games and TV space…says that the blame for all of this can’t be pinned to a single thing, like capitalism, mismanagement, Covid-19, or even interest rates. It also involves development costs, how studios are staffed, consumers’ spending habits, and game pricing. ‘This storm is so brutal,’ he says, ‘because it is all of these things at once, and none have really alleviated since the layoffs began.’”

Many indie studios were shuttered and large tech leaders such as Microsoft and Sony shut down parts of their gaming divisions. There was also a chain of events, influenced by the hatred of DEI and its associated mindsets, that is being called a second GamerGate.

The gaming industry will continue through the beginnings of 2025 with business as usual. The industry will bounce back, but it will be different from the past.

Whitney Grace, January 24, 2025

Bookmark This: HathiTrust Digital Library

October 30, 2024

Concerned for the Internet Archive? So are we. (For multiple reasons.) But while that venerable site recovers from its recent cyberattacks, remember Hathi exists. Founded in 2008, the not-for-profit HathiTrust Digital Library is a collaborative of academic and research libraries. The site makes millions of digitized items available for study by humans as well as for data mining. The site shares the collection’s story:

“HathiTrust’s digital library came into being during the mid-2000s when companies such as Google began scanning print titles from the shelves of university and college campus libraries. When many of those same libraries created HathiTrust in 2008, they united library copies of those digitized books into a single, shared collection to make as much of the collection available for access as allowable by copyright law. Through HathiTrust, libraries collaborate on long-term management, preservation, and access of their collections. Book lovers and researchers like you can explore this huge collection of digitized materials! Today, HathiTrust Digital Library is the largest set of digitized books managed by academic and research libraries. The collection includes materials typically found on the shelves of North American university and college campuses with the benefit of being available online instead of scattered in buildings around the globe. Our enormous collection includes thousands of years of human knowledge and published materials from around the world, selected by librarians and preserved in the libraries of academic and research libraries. You can find all kinds of digitized books and primary source materials to suit a wide range of research needs.”

The collection contains books and “book-like” items—basically anything except audio/visual files. All Library of Congress subjects are represented, but the largest treasures lie in the Language & Literature, Philosophy, Religion, History, and Social Sciences chambers. All volumes not restricted by copyright are free for anyone to read. Just over half the works are in English, while the rest span over 400 languages, including some that are now extinct. Ninety-five percent were scanned from print by Google, but a few specialized collections were contributed by individuals or institutions. The Collection page offers several sample collections to get you started, or you can build your own. Have fun browsing their collections, and with luck the Internet Archive will be back up and running in no time.

Cynthia Murrell, October 30, 2024
