Open Source Panda Simplifies Data Analysis

March 20, 2018

An article at Quartz draws our attention to a potential alternative to Excel, the open source Pandas, in “Meet the Man Behind the Most Important Tool in Data Science.” Writer Dan Kopf profiles Pandas’ developer, Wes McKinney, who launched the Python tool in 2009. In 2012, Pandas’ popularity took off. Now, Kopf tells us:

Millions of people around the world use Pandas. In October 2017 alone, Stack Overflow, a website for programmers, recorded 5 million visits to questions about Pandas from more than 1 million unique visitors. Data scientists at Google, Facebook, JP Morgan, and virtually every other major company that analyze data uses Pandas. Most people haven’t heard of it, but for many people who do heavy data analysis, a rapidly growing group these days, life wouldn’t be the same without it. (Pandas is open source, so it’s free to use.) So what does Pandas do that is so valuable? I asked McKinney how he explains it to non-programmer friends. ‘I tell them that it enables people to analyze and work with data who are not expert computer scientists,’ he says. ‘You still have to write code, but it’s making the code intuitive and accessible. It helps people move beyond just using Excel for data analysis.’

McKinney is inspired to improve data science tools because he likes to “empower people to solve problems.” In fact, Pandas sprang from his frustration at the limitations of available tools when he first came to embrace Python. See the article to follow the developer from his time as a high school athlete to his current, full-time work on Pandas and other open source projects, as well as more on Pandas itself.

Cynthia Murrell, March 20, 2018

Written by Stephen E. Arnold · Filed Under Facebook, Google, News, Open source | Comments Off on Open Source Panda Simplifies Data Analysis

Quote to Note: Facebook and Open Source As a Wooden Club

February 24, 2018

I read “Serverless & GraphQL.” Here’s the passage which caught my attention because I did not know about this use of open source as a wooden club:

And I don’t know how many of you know about some of the Facebook technologies and the patents and licensing issues that are around those- they had an interesting clause, if you sue Facebook, you lose the right to use any of their stuff in any of their products and some people were really scared about it.

That’s one way to earn a “like.”

Stephen E Arnold, February 25, 2018

Written by Stephen E. Arnold · Filed Under Business strategy, News, Open source | Comments Off on Quote to Note: Facebook and Open Source As a Wooden Club

The Companies Leading Open Source

November 6, 2017

Open-source enthusiasts will want to check out this roster from Datamation, “35 Top Open Source Companies.” We’re reminded that the open-source community has moved well beyond a collection of individual hobbyists to include many corporate initiatives. The article notes:

While independent developers are still an important part of the open source community, today much of the work on open source projects is being done by corporate developers. In a recent appearance at the Open Source Summit, Linux founder Linus Torvalds acknowledged this corporate influence and welcomed it. ‘It’s very important to have companies in open source,’ he said. ‘It’s one thing I have been very happy about.’ The list below highlights some of the leading for-profit companies that are using, sponsoring and contributing to open source projects. It includes a mix of large enterprises, small startups and everything in between. Some of the companies exclusively offer products based on open source software, while others sell a mix of proprietary and open source solutions. But all of these companies play a significant role in the open source community.

The write-up emphasizes that the list is alphabetical, not a ranking of any sort. Red Hat is there, of course; they are behind Apache and OpenStack, after all, and boast the most popular Linux iteration for large organizations. We also see Cloudera and Hortonworks, homes of popular supported Hadoop versions, and the vast open-source repository, GitHub. As for search, Elastic makes the roll with its Elasticsearch project, and MongoDB is recognized for its popular NoSQL database. Some of the biggest companies we see include Adobe, Facebook, Google, IBM, Intel, Microsoft, Oracle, and Samsung. See the write-up for the complete list.

Cynthia Murrell, November 6, 2017

Written by Stephen E. Arnold · Filed Under Data, Google, News, Open source | Comments Off on The Companies Leading Open Source

Internet Archive: The Bono Books

October 16, 2017

I read “Books from 1923 to 1941 Now Liberated!” The collection is based on books which libraries can scan. The write up explains the provision of the US copyright law which makes these books eligible for inclusion in the Internet Archive. Hopefully libraries will find the resources to contribute books. I did some spot checks. One gap is history books. There are others. This is an excellent effort. The interface to the Bono books retains the Internet Archive’s unique approach to interfaces; for example, clicking on a book displays the scanned pages. Clicking on a page turns the page. The outside edge of the scanned image allows one to “jump” to a particular page. Getting back to a book’s table of contents takes a bit of effort, however. Those looking for anthologies can find a collection of 20th century poetry by hunting. The search system is just good enough. Worth checking out. Libraries, scan those history books. Who doesn’t love Theodor Mommsen’s early work?

Stephen E Arnold, October 16, 2017

Written by Stephen E. Arnold · Filed Under News, Open source, Reference tool | Comments Off on Internet Archive: The Bono Books

Enterprise Search: Still Floundering after All These Years

October 11, 2017

Enterprise search conferences once had pride of place. Enterprise search or “search” was the Big Data, artificial intelligence, and cyber intelligence solution from 1998 to 2007.

But by 2007, the fanciful claims of enterprise search vendors were perceived as “big hat, no cattle” posturing. Unable to generate sustainable revenues, the high profile enterprise search systems began looking for a buyer. Those who failed disappeared. Do you know where Convera, Delphes, Entopia, and Siderean are today? What’s the impact of Exalead on Dassault? Autonomy on Hewlett Packard Enterprise? Vivisimo on IBM?

Easy questions to ignore. Time marches on. Proprietary search cost a bundle to keep working. The “fix” to the development, enhancement, and bug fix problems was open source.

A solution emerged. Lucene. That brings us to the title of this blog post: “Enterprise Search: Still Floundering after All These Years.”

The money from license fees is insufficient to make enterprise search work in a good enough way. Open source search, which seems to be largely free of license fees, allows vendors to offer search and highly profitable services to the organizations who want or need an “enterprise search system.”

This means that a vendor who makes more money offering search services can be perceived as a problem to a venture funded company built on promises and tens of millions in venture capital.

The truth of this observation was revealed in an article written by or for Search Technologies, a unit of a Fancy Dan consulting firm. If I understand the Search Technologies’ write up, Lucidworks (né Lucid Words) told Search Technologies that it was not welcome at a conference designed to promote Solr.

Here’s what Search Technologies said in “Why Wasn’t Search Technologies at Lucene/Solr Revolution 2017?”

Lucene/Solr Revolution’s organizer, Lucidworks, informed us that we were no longer welcome to exhibit or speak at the event. Lucidworks considered us a company that:

Competes with their professional services group (maybe)

Is not likely to resell Lucidworks’ platform exclusively (we are vendor-agnostic, after all), and,

Has technology assets that compete with their Fusion platform (partially true)

I don’t care too much about venture funded outfits running conferences to make their “one true way” evident to the attendees. I don’t worry about a blue chip consulting firm’s ability to generate sales leads.

No.

I find that some of enterprise search’s most problematic weaknesses have not been solved after 50 years of flailing. Examples include:

The cost of moving beyond “good enough” information access
Revealing that enterprise search systems are expensive to tune and shape to the needs of an organization
Developing solutions which keep indexes current and searches responsive
Seamlessly handling different types of content, including video, engineering drawings, and data tucked inside legacy systems
Keeping the majority of the users happy so bootleg search systems are not installed to meet departmental or operating unit needs.

The “search” problem is an illustration of innovation running out of gas. I have zero stake in Lucidworks, Search Technologies, or enterprise search. I am content to be an observer who points out that search vendors, their marketing, the consultants, and the conference organizers are their own worst enemies.

That’s why enterprise search imploded about a decade ago. Search today is pretty much “good enough.” Antidot, Lucene, Solr, dtSearch, X1, Fabasoft, Funnelback, et al. Each does “good enough” search in my opinion.

To make any system better takes consulting and engineering services. These deliver high margins. Users? Well, users want enterprise search to answer questions and work like Google. After 50 years of effort, no company has been able to meet the users’ needs.

That says more than two consulting firms trading digital jabs. What’s at stake is consulting revenue and proprietary fixes. Users? Yes, what about the users?

Stephen E Arnold, October 10, 2017

Written by Stephen E. Arnold · Filed Under Enterprise search, News, Open source | 1 Comment

Microsoft and Open Source Software: Cost Cutting Tactic or a RedHat Type Play

September 1, 2017

Short honk: We were delighted to read “Windows 10: New Feature Sees Microsoft Blur the Line between Windows and Linux.” The write up explains that Windows allows a person to move outputs to a Linux distribution.

Few have covered Microsoft’s dalliance with Solr and the increased interest in using open source software to reduce development costs at Microsoft.

I suppose that’s understandable. The new president is not giving talks about following in the footsteps of IBM which has based dear old Watson on Lucene, home brew code, and technology from acquisitions.

Open source is an easy way to reduce development costs, keep pace with the innovations from the “community,” and free up time for marketing and sales.

Microsoft is becoming a close cousin to IBM, complete with major league strike outs like the Windows phone adventure.

A more significant misperception appears in the write up. I noted this passage:

The Free Software Foundation Europe, has previously said Microsoft’s gradual acceptance of Linux is a compliment, and a net gain for the Free Software movement.

Microsoft’s enthusiasm for some open source technology may be a precursor of Microsoft’s getting in the open source software business, emulating or duplicating the business models of RedHat and Elastic (the Elasticsearch folks).

Worth watching.

Stephen E Arnold, September 1, 2017

Written by Stephen E. Arnold · Filed Under Microsoft, News, Open source | Comments Off on Microsoft and Open Source Software: Cost Cutting Tactic or a RedHat Type Play

Support for Open Source AI from Financial Firms

August 31, 2017

Financial tech reporter Ian Allison at the International Business Times finds it interesting that financial services firms are joining tech companies like Google and Microsoft in supporting open source AI solutions. In his piece, “Finance and Artificial Intelligence Are Going ‘Fintech’ and Open Source,” Allison points to one corporate software engineer as instrumental to the trend:

AQR Capital Management was probably patient zero when it came to opening up their code around data storage – and this move, shepherded by software engineer Wes McKinney, kickstarted the popular Pandas libraries project. Now he has returned to open source work at Two Sigma. We have also seen open source data storage offerings coming out of Man AHL in the form of Arctic. Taking part in a panel on open source infrastructure, McKinney said investment in an open source project yields dividends later: data storage underlies other verticals, and when other people use the software and build libraries on top of it, that makes in-house systems more compatible.

See this link for more about the Pandas library. In the same panel Allison cites above, participants were asked how best to sustain the open source community. McKinney gave this advice:

I feel a compulsion not to let open source projects die. But without sponsorship it can become hard to sustain. So when commercials ask me how they can help, I say sponsor an individual – to triage issues, do patches; that goes a long way.

So, what industry will be next to throw its weight behind open source projects?

Cynthia Murrell, August 31, 2017

Written by Stephen E. Arnold · Filed Under AI, Investment, News, Open source | Comments Off on Support for Open Source AI from Financial Firms

A Brilliant List of Open Source Localization Tools

August 24, 2017

Open source projects offer technology developers the ability to access technology usually locked behind pay walls. One trouble with open source technology is language translation and the ability for developers to localize their projects. Language continues to remain a barrier in our technology driven world, but there are tools to overcome it. OpenSource.com curated a list of “18 Open Source Translation Tools To Localize Your Project.”

The curator understands the pains of proprietary software:

The proprietary versions of these tools can be quite expensive. A single license for SDL Trados Studio (the leading CAT tool) can cost thousands of euros, and even then it is only useful for one individual and the customizations are limited (and psst, they cost more, too). Open source projects looking to localize into many languages and streamline their localization processes will want to look at open source tools to save money and get the flexibility they need with customization.

The list includes tools for machine translation, which is a hot commodity. Software that can generate a digestible and accurate translation from one language to another is a must-have for many localization projects. The list recommends checking out Apertium and Moses. Computer-assisted translation tools are a must-have for all translators and language students, because they can save hours of looking up information in dead tree lexicons. They also work in real time, saving countless more hours, so you should check out OmegaT, Subtitles Translator, and Anaphraseus. If you are working with multiple translators on your project, you will need a translation management system (TMS) to organize everyone (think SharePoint). Jabylon, Zanata, GlobalSight, and Pootle are some good TMS software to check out. Also included are localization automation tools that can ease your work burden, such as Okapi Framework and Mojito.

Whitney Grace, August 24, 2017

Written by Stephen E. Arnold · Filed Under News, Open source, SharePoint, software | Comments Off on A Brilliant List of Open Source Localization Tools

DARPA Open Catalog

January 18, 2017

If you are interested in DARPA’s open catalog of open source software, you can find the pointers at this link. The public facing Web site does not provide the names of the companies or research organizations working on the software. The cyber-related listings available in 2015 and early 2016 no longer appear. Links do point to the program manager for specific projects; for example, the office responsible for ADAMS, which detects anomalies in Big Data sets. For generalists interested in DARPA Dark Web projects, the information is difficult to locate using open source tools. The change in the scope of the public facing Open Catalog appears to have taken place in July 2016. Some information about specific software can be located if one knows the name of a research entity involved in the Memex project; for example, a query for Stanford University’s DeepDive, which was updated in early 2016. One use of DeepDive is to identify spouses in the news.

Stephen E Arnold, January 18, 2017

Written by Stephen E. Arnold · Filed Under Government, News, Open source | Comments Off on DARPA Open Catalog

Google Looks to Curb Hate Speech with Jigsaw

January 6, 2017

No matter how advanced technology becomes, certain questions continue to vex us. For example, where is the line between silencing expression and prohibiting abuse? Wired examines Google’s efforts to walk that line in its article, “Google’s Digital Justice League: How Its Jigsaw Projects are Hunting Down Online Trolls.” Reporter Merjin Hos begins by sketching the growing problem of online harassment and the real-world turmoil it creates, arguing that rampant trolling serves as a sort of censorship — silencing many voices through fear. Jigsaw, a project from Google, aims to automatically filter out online hate speech and harassment. As Jared Cohen, Jigsaw founder and president, put it, “I want to use the best technology we have at our disposal to begin to take on trolling and other nefarious tactics that give hostile voices disproportionate weight, to do everything we can to level the playing field.”

The extensive article also delves into Cohen’s history, the genesis of Jigsaw, how the team is teaching its AI to identify harassment, and problems they have encountered thus far. It is an informative read for anyone interested in the topic.

Hos describes how the Jigsaw team has gone about instructing their algorithm:

The group partnered with The New York Times (NYT), which gave Jigsaw’s engineers 17 million comments from NYT stories, along with data about which of those comments were flagged as inappropriate by moderators.

Jigsaw also worked with the Wikimedia Foundation to parse 130,000 snippets of discussion around Wikipedia pages. It showed those text strings to panels of ten people recruited randomly from the CrowdFlower crowdsourcing service and asked whether they found each snippet to represent a ‘personal attack’ or ‘harassment’. Jigsaw then fed the massive corpus of online conversation and human evaluations into Google’s open source machine learning software, TensorFlow. …

By some measures Jigsaw has now trained Conversation AI to spot toxic language with impressive accuracy. Feed a string of text into its Wikipedia harassment-detection engine and it can, with what Google describes as more than 92 per cent certainty and a ten per cent false-positive rate, come up with a judgment that matches a human test panel as to whether that line represents an attack.

There is still much to be done, but soon Wikipedia and the New York Times will be implementing Jigsaw, at least on a limited basis. At first, the AI’s judgments will be checked by humans. This is important, partially because the software still returns some false positives—an inadvertent but highly problematic overstep. Though a perfect solution may be impossible, it is encouraging to know Jigsaw’s leader understands how tough it will be to balance protection with freedom of expression. “We don’t claim to have all the answers,” Cohen emphasizes.
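For readers who want to see how figures like those quoted above are typically derived, here is a minimal Python sketch. It assumes that "certainty" can be read as the share of comments on which the model and the human panel agree, which is only one possible interpretation of the quoted number. The function name `score_against_panel` and the toy labels are hypothetical; this is not Jigsaw's code or data.

```python
def score_against_panel(panel_labels, model_labels):
    """Compare model judgments with human panel judgments.

    Each label is True when a comment is judged to be an attack.
    Returns a tuple: (agreement, false_positive_rate).
    """
    if len(panel_labels) != len(model_labels):
        raise ValueError("label lists must be the same length")
    if not panel_labels:
        raise ValueError("at least one labeled comment is required")

    pairs = list(zip(panel_labels, model_labels))

    # Agreement: share of comments where model and panel reach the same verdict.
    agreement = sum(p == m for p, m in pairs) / len(pairs)

    # False positives: comments the panel cleared but the model flagged.
    benign = [m for p, m in pairs if not p]
    false_positive_rate = sum(benign) / len(benign) if benign else 0.0

    return agreement, false_positive_rate


# Hypothetical toy data for ten comments (not real Jigsaw results).
panel = [True, True, False, False, False, False, False, False, False, False]
model = [True, False, True, False, False, False, False, False, False, False]

agreement, fpr = score_against_panel(panel, model)
print(f"agreement={agreement:.2f}, false_positive_rate={fpr:.3f}")
```

The sketch reports the two numbers separately because a system can agree with human reviewers most of the time and still flag a meaningful share of harmless comments, which is the overstep the human checks are meant to catch.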

Cynthia Murrell, January 6, 2017

Written by Stephen E. Arnold · Filed Under AI, Google, News, Open source, Search quality, Technology | Comments Off on Google Looks to Curb Hate Speech with Jigsaw


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
