Wikipedia: Good for Students, Good for the Google
November 14, 2019
There may be some help for overstressed PhD students.
The Internet Archive is making it even easier to check out online citations, beginning in the most logical place. The organization’s blog describes how it is “Weaving Books into the Web—Starting with Wikipedia.” Writer Brewster Kahle tells us:
“The Internet Archive has transformed 130,000 references to books in Wikipedia into live links to 50,000 digitized Internet Archive books in several Wikipedia language editions including English, Greek, and Arabic. And we are just getting started. By working with Wikipedia communities and scanning more books, both users and robots will link many more book references directly into Internet Archive books. In these cases, diving deeper into a subject will be a single click. … For example, the Wikipedia article on Martin Luther King, Jr. cites the book To Redeem the Soul of America, by Adam Fairclough. That citation now links directly to page 299 inside the digital version of the book provided by the Internet Archive. There are 66 cited and linked books on that article alone. Readers can see a couple of pages to preview the book and, if they want to read further, they can borrow the digital copy using Controlled Digital Lending in a way that’s analogous to how they borrow physical books from their local library.”
The Internet Archive hopes to bring four million more books online over the next few years. It costs about $20 per book, and anyone can help by sponsoring the digitization of specific books or simply donating to the organization. As the director of their Wayback Machine declares, “Together we can achieve universal access to all knowledge, one linked book, paper, web page, news article, music file, video and image at a time.”
Who benefits? Students and, of course, Google. There’s a reason many queries’ results pages point to the Wikipedia service.
Cynthia Murrell, November 14, 2019
Search System Bayard
November 1, 2019
Are you looking for an open source search and retrieval tool written in Rust and built on top of Tantivy (a Lucene-inspired library)? Point your browser to GitHub and grab the files. The README file highlights these features:
- Full-text search/indexing
- Index replication
- Bringing up a cluster
- Command line interface.
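For readers who wonder what "full-text search/indexing" means in practice, here is a minimal sketch of an inverted index in Python. It is an illustration only: it is not Bayard's or Tantivy's code or API, and the `TinyIndex` class, its method names, and the sample documents are all hypothetical. Real engines add ranking, on-disk segments, and replication.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def add(self, doc_id, text):
        """Store the document and record which terms it contains."""
        self.documents[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = TinyIndex()
index.add(1, "Open source search written in Rust")
index.add(2, "Enterprise search systems alienate users")
index.add(3, "Rust is a systems language")

print(index.search("search rust"))
print(index.search("systems"))
```

The sketch answers queries by intersecting posting sets, so a document must contain every query term to be returned. That is the simplest possible matching rule, chosen here for brevity.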
DarkCyber has not tested it, but a journalist contacted us on October 31, 2019, and was interested in the future of search. I pointed out that there are free and open source options.
What people want to buy, however, is something that does not alienate two thirds of the search system’s users the first day the software is deployed.
Surprised? You may not know what you don’t know, but, gentle reader, you are an exception.
Stephen E Arnold, November 1, 2019
Open Source Fact Checking Service
October 18, 2019
Misinformation is not new, and neither is its wide, mass distribution. The problem nowadays is the sheer number of platforms available to spread it. Another problem is that people who believe and spread misinformation can now find each other and congregate. It is important to verify facts, but with so many sources claiming to post the truth (online and offline), how can you check?
Reddit is one platform where misinformation spreads; however, it is also a gathering place for people to find truth and check facts. One of its popular formats is the “Ask Me Anything (AMA)” thread, and recently it hosted one with Yaz Sinan. His AMA was titled “I Built A Platform For Journalism With ‘Open Source’ Fact Checking. In The Age Of Information (And Misinformation) Overload, The Goal Is To Help The Best Journalists Stand Out By Making Their Fact Checking Process Fully Transparent And Reviewable.”
Yaz Sinan is a programmer living in Toronto, Canada. For the past three years, he has built fact checking tools. To test his tools, he has participated in over 500 fact checks. Sinan dubbed his platform Sourced Fact and the best thing about it is that it is open source! Sinan built this platform because, although many projects in production intend to battle misinformation, he does not think they will be able to keep up. His belief is that it takes more energy to refute BS than to produce it.
Sourced Fact takes a different approach than other projects, because journalists upload their articles and annotate their articles with verified checks for readers. Sinan wants to make it easy for journalists to “show their work” so that readers can review it and so that those journalists stand out from their peers. Sinan approaches Sourced Fact with an open mind and a lot of common sense:
“– This approach only works for journalism covering information based on publicly reviewable evidence. This includes legislation, public government initiatives, whistle blower documents, and scientific data. This isn’t a good fit for journalism based on undocumented sources.
– This approach doesn’t eliminate bias. One can provide completely accurate facts and still introduce bias by omitting facts that don’t agree with their views. I do think however that helping the accurate provable facts stand out from everything else would still be a meaningful improvement to what we have today.
– I don’t expect the average reader to click into and explore the evidence for every claim. Just like the average consumer of open source code rarely reads the code. The point though is that it’s out there for anyone who wants to check it, so whoever wants to double check can do so anytime.”
I want Sinan’s platform to become an industry standard for news outlets around the world, particularly the United States. Sinan, please apply for grants to make your genius Sourced Fact work!
Whitney Grace, October 18, 2019
Open Source: Everything New Is Old Again
October 7, 2019
The Andreessen Horowitz open source info blitz contains some good stuff. You will want to read the essay “Open Source: From Community to Commercialization” and, if you qualify, download the pdf of lecture notes. We noted this statement from the essay about the SaaS open source business model:
In a SaaS model, you provide a complete hosted offering of the software. If your value and competitive edge is in the operational excellence of the software, then SaaS is a good choice. However, since SaaS is usually based around cloud hosting, there is the potential risk that public clouds will choose to take your open source code and compete.
Accurate.
We noted this statement at the end of the article:
I [Peter Levine / Jennifer Li?] believe Open Source 3.0 will expand how we think of and define open source businesses. Open source will no longer be RedHat, Elastic, Databricks, and Cloudera; it will be – at least in part – Facebook, Airbnb, Google, and any other business that has open source as a key part of its stack. When we look at open source this way, then the renaissance underway may only be in its infancy. The market and possibilities for open source software are far greater than we have yet realized.
Correct.
Years ago, the DarkCyber team undertook a study of a dozen open source software vendors specializing in search and retrieval. Today, most of those vendors have embraced “artificial intelligence”, “predictive analytics”, and “natural language processing”. That’s because search is a utility and the developers and vendors of general purpose open source software have to differentiate themselves. In the course of that research, DarkCyber noted several things.
- Big companies in 2008 were among the most enthusiastic testers and eventually users of open source software. Why? Our data suggested that open source allowed users of commercial proprietary software more freedom to make changes. Bug fixes would often arrive in a more timely way. Plus, the IBM- and Oracle-style license fees did not come along for the ride. That is probably true in some cases today.
- Open source was a free lunch. The developers often contributed for the common good; others created and made available open source software as a way to demonstrate and prove their capabilities. Translation, as one person told one of my researchers, “A job, man. Big bucks.”
- Monetization was mostly “little plays”; that is, use our free stuff and then pay for support or proprietary extensions.
Flash forward to today. Some of these three decade-old findings may still be in play, but the context is now very different.
What’s changed?
For the first time, meta plays are possible. Forget the investment, merger, and acquisition angles that motivate venture capital firms. Think in terms of just using Amazon and paying for what you need.
Start ups no longer just use Microsoft because it is available and works. Start ups use Amazon because it appears to be open source, cheap or subsidized, and available globally.
The challenge this presents to open source is significant. DarkCyber is not convinced that open source developers, users of open source software, analysts, and other professionals recognize what Amazon’s meta play and strategy is doing; that is, creating a new context of open source.
Want to learn more about Amazon’s meta play for open source? Write seaky2000 at yahoo dot com and inquire about our Amazon strategy webinar. Note: It’s not a freebie.
Everything new is old again, including vendor lock-in.
Stephen E Arnold, October 7, 2019
Roy Cohn Documents Released by FBI
September 30, 2019
If you are interested in Roy Cohn, a New York attorney, new information is available. Released by the FBI, the documents contain about 700 pages of information. You can access the data at this link. The documents are redacted. Mr. Cohn interacted with a number of individuals with a high profile. Mr. Cohn died in 1986; that’s 33 years ago. The New York Post ran a photo of Mr. Cohn with a youthful President Trump and mentioned some of Mr. Cohn’s high profile activities.
Stephen E Arnold, September 30, 2019
Open Source Software: Just So Darned Good
August 9, 2019
The Trump administration’s proscription against doing business with Chinese tech company Huawei has cast a wide net, and one blogger suspects such a net may soon ensnare one of our favorite things. Bunnie’s Blog warns, “Open Source Could Be a Casualty of the Trade War.” The writer checked out Executive Order 13873, and considers how the incredibly broad text could be used to target just about any tech company around the world. They also extensively criticize the technique of weaponizing supply chains and its unintended consequences, so navigate to the blog post to delve into that reasoning.
One of those consequences, they fear, may be the very existence of open-source projects. Huawei, as our immediate example, has contributed significantly to the Linux Foundation. Linux has, so far, escaped the Huawei blacklist net because of a license exemption; however, Bunnie writes:
“Should Huawei be designated as a ‘foreign adversary’ under EO13873, it greatly expands the scope of the ban because it prohibits transactions with entities under the direction or influence of foreign adversaries. The executive order also broadly includes any information technology including hardware and software with no exemption for open source. In fact, it explicitly states that ‘…openness must be balanced by the need to protect our country against critical national security threats’. While the context of ‘open’ in this case refers to an ‘investment climate’, I worry the text is broad enough to easily extend its reach into open source technologies.”
We noted this statement too:
“There’s nothing in Github (or any other source-sharing platform) that prevents your code from being accessed by a foreign adversary and incorporated into their technological base, so there is an argument that open source developers are aiding and abetting an enemy by effectively sharing technology with them. Furthermore, in addition to considering requests to merge code from a technical standpoint, one has to also consider the possibility that the requester could be subject to the influence of Huawei, in which case accepting the merge may put you at risk of stiff penalties under the IEEPA (up to $250K for accidental violations; $1M and 20 years imprisonment for willful violations).”
The beauty of open source is, well, its openness. Bunnie argues that if the government gets to decide what entities can contribute and which cannot, the freedom that underpins open source software will vanish.
Cynthia Murrell, August 9, 2019
Capital One and Surprising Consequences
August 4, 2019
DarkCyber noted the ZDNet article “GitHub Sued for Aiding Hacking in Capital One Breach.” According to the “real news” outfit:
While Capital One is named in the lawsuit because it was its data that the hacker stole, GitHub was also included because the hacker posted some of the stolen information on the code-sharing site.
GitHub (now owned by Microsoft) allegedly failed to detect the stolen data. GitHub did not block the posting of Social Security numbers. These follow a specific pattern. Many text parsing methods identify and index the pattern and link the number to other data objects.
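To make the "specific pattern" point concrete, here is a minimal sketch in Python of how a text parser might flag strings shaped like Social Security numbers. It is not GitHub's method or any vendor's. The function name `find_ssn_candidates` and the sample text are hypothetical, and the sketch assumes the hyphenated AAA-GG-SSSS layout only. It skips area numbers 000, 666, and 900 through 999, group 00, and serial 0000, which are not issued.

```python
import re

# Assumption: numbers are written with hyphens as AAA-GG-SSSS. Production
# scanners typically also handle spaces, missing separators, and context.
SSN_PATTERN = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"
)


def find_ssn_candidates(text):
    """Return (line_number, matched_text) pairs for SSN-shaped strings."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SSN_PATTERN.finditer(line):
            hits.append((line_number, match.group()))
    return hits


# Hypothetical sample data, made up for illustration.
sample = (
    "name: Jane Doe\n"
    "ssn: 123-45-6789\n"
    "phone: 555-867-5309\n"
    "bad: 000-12-3456"
)

print(find_ssn_candidates(sample))
```

A match is only a candidate: a nine digit string in this shape may be something else entirely, so a real system would review or score hits before blocking a post.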
What law did GitHub violate? Management lapses are not usually the stuff that makes for a good legal drama, at least on “Law and Order” reruns. The write up reports:
The lawsuit alleges that by allowing the hacker to store information on its servers, GitHub violated the federal Wiretap Act.
DarkCyber thanks ZDNet for including a link to the complaint.
Lawyers, gotta love ‘em because we have a former Amazon employee, a financial institution with a remarkable track record of security issues, and a company owned by Microsoft. What about the people affected? Oh, them. What if GitHub is “guilty”? Perhaps a new chapter in open source and public posting sites begins?
Stephen E Arnold, August 4, 2019
Open Source: No Handcuffs, Freedom, and Maybe Problems
July 26, 2019
DarkCyber has noted the use of open source technology in policeware (software and systems for law enforcement) and in intelware (software and systems for intelligence professionals). The reasons mentioned to me when I get a demonstration include avoiding the handcuffs clicked on when one licenses proprietary software, the ability to get bug fixes and enhancements without waiting for the proprietary software vendor to get around to these adjustments, and a bigger pool of technical talent from which to draw. “12 Challenges Businesses face when Using Open-Source Software” does a good job of identifying some issues to consider when adopting open source code.
Let’s look at three of these which I have encountered in the last few months. I won’t name the vendors of the policeware and intelware systems, and if you want the other nine “challenges”, please, navigate to the original article.
Here are the three “challenges”, which in some cases may be deal breakers:
Cost. Note that the article pegs cost last in the list of 12 issues. My thought is that cost is the number one consideration. I have heard, “Our software is more value centric because we use open source software.” My response is, “So the license fee is reduced, but what about the cost of support, training, and coding special widgets to get the system working to meet our specifications?” No policeware or intelware system is “cheap.” Less expensive than another product, sure. But in terms of headcount, direct and indirect system costs, and time, vendors often understate costs and licensees say “Wow, I’ll go with you.”
Compatibility. Because a chunk of code or a system is open source and perceived as open, the software may not be compatible with one’s existing code. More problematic is the assumption that open source software can happily ingest whatever “common” or “database” content one wants it to process. Think in terms of finding, licensing, or writing “filters,” “import routines,” or “file conversion” routines. Vendors of proprietary software may not have what you need, but you can buy filters from a cheerful sales professional or directly from the company. Working out “compatibility” can be expensive and slow down the process.
Mystery Sources. Open source is perceived as one way for a developer to demonstrate his open sourciness and his expertise. However, intelligence agencies in some countries create or contribute code to open source projects. Assuming that what looks like a benign tool is in fact benign may prove to be somewhat problematic. How problematic? Data about compromised open source software are elusive. In the US, third parties who use open source software for projects sub contracted by a prime contractor can be a vector for backdoors, exploits, and malware. Paranoiac project managers and contracting officers may wish to ponder this issue. Legalese will not reduce the aperture for fancy dancing.
Is open source inherently more risky than proprietary solutions? No, risk is about equal. Proprietary software is fraught with problems. So is open source. That’s a point of fact that is often glossed over.
Stephen E Arnold, July 26, 2019
Spy on the Competition: Sounds Good, Right?
July 11, 2019
DarkCyber noted this consumer and small business oriented write up about spying. Navigate to “7 ways to Spy on Your Competitor’s Facebook Ads [2019 Update].” The update promises to add some nifty new, useful methods to the original story.
What are the methods? Here’s a run down of four of them. You will have to navigate to the original story for the other three, or you could just not bother. Spoiler: None of the methods reference commercially available tools and services available from specialist vendors. Who’s a specialist vendor? Attend one of our LE and intel training sessions, and we will share a list of 30 firms with you.
Here are four methods we found interesting:
- Use services which report about a firm’s online advertising activities.
- Use services which report about a firm’s online advertising activities.
- Use services which report about a firm’s online advertising activities.
- Use services which report about a firm’s online advertising activities.
There you go. The spying methods.
DarkCyber wants to point out that these methods are different from the persistent tracking bug data some vendors helpfully install on one’s Internet connected device.
Plus, these methods are quite different from the approaches implemented in commercial OSINT and intercept analysis systems.
My next relatively public lecture will be in October in San Antonio. After the session, look me up. I might share a couple of solutions. Better yet, write darkcyber333 at yandex dot com and sign up for a for-fee intelligence systems webinar.
Stephen E Arnold, July 11, 2019
Short Honk: NLP Tools
March 26, 2019
Making sense of unstructured content is tricky. If you are looking for open source natural language processing tools, “12 Open Source Tools for Natural Language Processing” provides a list.
Stephen E Arnold, March 26, 2019