A Brilliant List of Open Source Localization Tools

August 24, 2017

Open source projects over technology developers the ability to access technology usually locked behind pay walls.  One trouble with open source technology is language translation and the ability for developers to localize their projects.  Language continues to remain a barrier in our technology driven world, but there are tools to overcome it.  OpenSource.com curated a list of, “18 Open Source Translation Tools To Localize Your Project.”

The curator understands the pains of proprietary software:

The proprietary versions of these tools can be quite expensive. A single license for SDL Trados Studio (the leading CAT tool) can cost thousands of euros, and even then it is only useful for one individual and the customizations are limited (and psst, they cost more, too). Open source projects looking to localize into many languages and streamline their localization processes will want to look at open source tools to save money and get the flexibility they need with customization.

The list includes tools for machine translation, which is a hot commodity.  Software that can generate a digestible and accurate translation from one language to another is a must have for many localization projects.  The list recommends checking out Apertium and Moses.  Computer-assisted translation tools are a must have for all translations and language students, because they can save hours of looking up information in dead tree lexicons.  They also work in real time, saving more countless hours, so you should check out OmegaT, Subtitles Translator, and Anaphraseus.  If you are working with multiple translators on your project you will need to utilize a translation management system to organize everyone-think SharePoint.  Jabylon, Zanata, GlobalSight, and Pootle are some good TMS software to check out.  Also included are localization automation tools that can ease your work burden, such as Okapi Framework and Mojito.

Whitney Grace, August 24, 2017

DARPA Open Catalog

January 18, 2017

If you are interested in DARPA’s open catalog of open source software, you can find the pointers at this link. The public facing Web site does not provide the names of the companies or research organizations working on the software. The cyber-related listings available in 2015 and early 2-16 no longer appear. Links do point to the program manager for specific projects; for example, the office responsible to ADAMS which detects anomalies in Big Data sets. For generalists interested in DARPA Dark Web projects, the information is difficult to locate using open source tools. The change in the scope of the public facing Open Catalog appears to have taken place July 2016. Some information about specific software can be located if one knows the name of a research entity involved in the Memex project; for example, a query for Stanford University’s DeepDive which was updated in early 2016. One use of DeepDive is to identify spouses in the news.

Stephen E Arnold, January 18, 2017

Google Looks to Curb Hate Speech with Jigsaw

January 6, 2017

No matter how advanced technology becomes, certain questions continue to vex us. For example, where is the line between silencing expression and prohibiting abuse? Wired examines Google’s efforts to walk that line in its article, “Google’s Digital Justice League: How Its Jigsaw Projects are Hunting Down Online Trolls.” Reporter Merjin Hos begins by sketching the growing problem of online harassment and the real-world turmoil it creates, arguing that rampant trolling serves as a sort of censorship — silencing many voices through fear. Jigsaw, a project from Google, aims to automatically filter out online hate speech and harassment. As Jared Cohen, Jigsaw founder and president, put it, “I want to use the best technology we have at our disposal to begin to take on trolling and other nefarious tactics that give hostile voices disproportionate weight, to do everything we can to level the playing field.”

The extensive article also delves into Cohen’s history, the genesis of Jigsaw, how the team is teaching its AI to identify harassment, and problems they have encountered thus far. It is an informative read for anyone interested in the topic.

Hos describes how the Jigsaw team has gone about instructing their algorithm:

The group partnered with The New York Times (NYT), which gave Jigsaw’s engineers 17 million comments from NYT stories, along with data about which of those comments were flagged as inappropriate by moderators.

Jigsaw also worked with the Wikimedia Foundation to parse 130,000 snippets of discussion around Wikipedia pages. It showed those text strings to panels of ten people recruited randomly from the CrowdFlower crowdsourcing service and asked whether they found each snippet to represent a ‘personal attack’ or ‘harassment’. Jigsaw then fed the massive corpus of online conversation and human evaluations into Google’s open source machine learning software, TensorFlow. …

By some measures Jigsaw has now trained Conversation AI to spot toxic language with impressive accuracy. Feed a string of text into its Wikipedia harassment-detection engine and it can, with what Google describes as more than 92 per cent certainty and a ten per cent false-positive rate, come up with a judgment that matches a human test panel as to whether that line represents an attack.

There is still much to be done, but soon Wikipedia and the New York Times will be implementing Jigsaw, at least on a limited basis. At first, the AI’s judgments will be checked by humans. This is important, partially because the software still returns some false positives—an inadvertent but highly problematic overstep. Though a perfect solution may be impossible, it is encouraging to know Jigsaw’s leader understands how tough it will be to balance protection with freedom of expression. “We don’t claim to have all the answers,” Cohen emphasizes.

Cynthia Murrell, January 6, 2017

Lucidworks Sees Watson as a Savior

December 21, 2016

Lucidworks (really?). A vision has appeared to the senior managers of Lucidworks, an open source search outfit which has ingested $53 million and sucked in another $6 million in debt financing in June 2016. Yep, that Lucidworks. The “really” which the name invokes is an association I form when someone tells me that commercializing open source search is going to knock off the pesky Elastic of Elasticsearch fame while returning a juicy payoff to the folks who coughed up the funds to keep the company founded in 2007 chugging along. Yep, Lucid works. Sort of, maybe.

I read “Lucidworks Integrates IBM Watson into Fusion Enterprise Discovery Platform.” The write up explains that Lucidworks is “tapping into” the IBM Watson developer cloud. The write up explains that Lucidworks has:

an application framework that helps developers to create enterprise discovery applications so companies can understand their data and take action on insights.

Ah, so many buzzwords. Search has become applications. “Action on insights” puts some metaphorical meat on the bones of Solr, the marrow of Lucidworks. Really?

With Watson in the company’s back pocket, Lucidworks will deliver. I learned:

Customers can rely on Fusion to develop and deploy powerful discovery apps quickly thanks to its advanced cognitive computing features and machine learning from Watson. Fusion applies Watson’s machine learning capabilities to an organization’s unique and proprietary mix of structured and unstructured data so each app gets smarter over time by learning to deliver better answers to users with each query. Fusion also integrates several Watson services such as Retrieve and Rank, Speech to Text, Natural Language Classifier, and AlchemyLanguage to bolster the platform’s performance by making it easier to interact naturally with the platform and improving the relevance of query results for enterprise users.

But wait. Doesn’t Watson perform these functions already. And if Watson comes up a bit short in one area, isn’t IBM-infused Yippy ready to take up the slack?

That question is not addressed in the write up. It seems that the difference between Watson, its current collection of partners, and affiliated entities like Yippy are vast. The write up tells me:

customers looking for hosted, pre-tuned machine learning and natural language processing capabilities can point and click their way to building sophisticated applications without the need for additional resources. By bringing Watson’s cognitive computing technology to the world of enterprise data apps, these discovery apps made with Fusion are helping professionals understand the mountain of data they work with in context to take action.

This sounds like quite a bit of integration work. Lucidworks. Really?

Stephen E Arnold, December 21, 2016

IBM Open Sourciness Goes Only So Far

December 19, 2016

I love IBM, Big Blue, creator of Watson. Watson, as you may know, is a confection consisting of goodies from IBM’s internal code wizards, acquired technologies like the instantly Big Data friendly Vivisimo, and Lucene. Yep, like Attivio and many other “search” vendors, open source Lucene is the way to reduce the costs for basic information retrieval.

I assume you know about OpenLava, which is an open source system for managing certain types of IBM systems. The Open Lava Web page here states:

With an active community of users and developers, OpenLava development is accelerating, delivering high-quality implementations of important new features including:

  • Fair-share scheduling – allocate resources between users and groups according to configurable policies
  • Job pre-emption – Ensure that critical users, jobs and groups have the resources they need – when they need them
  • Docker support – Providing application isolation, fast service deployment and cloud mobility
  • Cloud & VM friendly auto-scaling – Easily add or remove cluster nodes on the fly without cluster re-configuration

These features are in addition to the many advanced capabilities already in OpenLava including job arrays, run-windows, n-way host failover, job limits, dependencies for multi-step workflows, parallel job support and much more.

I read “OpenLava under IBM Attack.” I believe everything I read on the Internet. The write up explains that that Big Blue wants the OpenLava open source code removed. The write up states:

IBM claims that the versions of OpenLava starting from 3.0 infringe their copyright
and that some source code have been stolen from them, copied, or otherwise taken
from their code base.

Several thoughts:

  1. The folks involved with OpenLava did knowingly and intentionally rip off IBM’s software, and the marketer of Watson and its open source tinged Watson is taking a logical and appropriate action against the open source alternative to IBM’s own management software
  2. IBM is unhappy with OpenLava’s adoption by IBM customers. IBM customers should buy only software from IBM-authorized sources. Other old school enterprise software companies have this philosophy too.
  3. There is a failure to communicate. OpenLava is not making its case understandable to the outfit poised to hire 25,000 more employees and IBM is not making itself clear to the crafty folks at OpenLava.

I don’t have a dog in the fight. But I find it interesting that IBM Watson with its Lucene tinged capabilities is finding open source distasteful in some circumstances.

Life was far simpler when open source projects were more malleable. Next stop? The legal eagles’ nests.

Stephen E Arnold, December 19, 2016

Tor Phone to Take on Google

December 13, 2016

Tor users have nil or very limited options to surf Underground Web anonymously as Android-powered phones still manage to scrape user data. The Tor Project intends to beat Google at its own game with Tor-enabled smartphone.

An article that appeared on arsTechnica and titled Tor Phone Is Antidote to Google “Hostility” Over Android, Says Developer, says:

The prototype is meant to show a possible direction for Tor on mobile. We are trying to demonstrate that it is possible to build a phone that respects user choice and freedom, vastly reduces vulnerability surface, and sets a direction for the ecosystem with respect to how to meet the needs of high-security users.

The phone is powered by custom-made CopperHead OS and can be run only on Google Nexus or Pixel hardware phones. Of course due to high technicalities involved, it is recommended only for Linux geeks.

For voice calls, according to the article:

To protect user privacy, the prototype runs OrWall, the Android firewall that routes traffic over Tor, and blocks all other traffic. Users can punch a hole through the firewall for voice traffic, for instance, to enable Signal.

Google’s Android is an Open Source platform that OEMs can customize. This creates multiple security threats enabling hackers and snoopers to create backdoors. CopperHead OS, on the other hand, plugs these security holes with verified boot and also stops Google Play Store from overriding native apps. Seems the days of mobile Tor are finally here.

Vishal Ingole, December  13, 2016

Super Secretive Google DeepMind Open Sources AI Rocket Science

December 6, 2016

Take that IBM. And you Microsoft, I see you and raise you more smart software. The high stakes poker game for the control of the burgeoning market for smart software is getting exciting and expensive. The Google either bursts with confidence, or it fears that outfits like IBM and Microsoft are poised to run the table.

Navigate to the “real” news story “Google DeepMind Makes AI Training Platform Publicly Available.” You will learn that:

DeepMind is putting the entire source code for its training environment — which it previously called Labyrinth and has now renamed as DeepMind Lab — on the open-source depository GitHub, the company said Monday. Anyone will be able to download the code and customize it to help train their own artificial intelligence systems. They will also be able to create new game levels for DeepMind Lab and upload these to GitHub.

Is the move a response to auto maker, dreamer, and launcher of expensive fireworks Elon Musk’s puttering in smart software? The “real” news story said:

OpenAI, a rival research shop set up by billionaire entrepreneur Elon Musk, venture capitalist Peter Thiel and Sam Altman, a founder of Silicon Valley startup accelerator Y Combinator, made its own AI training platform, called OpenAI Gym, available to the public in April [2016]. On Monday it also announced that it was making public an interface called Universe that lets an AI agent “use a computer like a human does: by looking at screen pixels and operating a virtual keyboard and mouse,” the company said in a statement. In short, it’s a go-between that lets an AI system learn the skills needed to play games or operate other applications. Researchers can use tools in OpenAI’s Gym to measure how these agents perform.

Why pay for artificial intelligence to perform the handful of tasks that the Harvard Business Review documented in the helpful write up “What Artificial Intelligence Can and Can’t Do Right Now.”

Perhaps free software will allow the capabilities of smart software to achieve the heights of wonder the marketers envision? How does one spell lock in?

Stephen E Arnold, December 6, 2016

Iran-Russia Ink Pact for Search Engine Services

November 28, 2016

Owing to geopolitical differences, countries like Iran are turning towards like-minded nations like Russia for technological developments. Russian Diplomat posted in Iran recently announced that home-grown search engine service provider Yandex will offer its services to the people of Iran.

Financial Tribune in a news report Yandex to Arrive Soon said that:

Last October, Russian and Iranian communications ministers Nikolay Nikiforov and Mahmoud Vaezi respectively signed a deal to expand bilateral technological collaborations. During the meeting, Russian Ambassador Vaezi said, We are familiar with the powerful Russian search engine Yandex. We agreed that Yandex would open an office in Iran. The system will be adapted for the Iranian people and will be in Persian.

Iran traditionally has been an extremist nation and at the center of numerous international controversies that indirectly bans American corporations from conducting business in this hostile territory. On the other hand, Russia which is seen as a foe to the US stands to gain from these sour relations.

As of now, .com and .com.tr domains owned by Yandex are banned in Iran, but with the MoU signed, that will change soon. There is another interesting point to be observed in this news piece:

Looking at Yandex.ir, an official reportedly working for IRIB purchased the website, according to a domain registration search.  DomainTools, a portal that lists the owners of websites, says Mohammad Taqi Mozouni registered the domain address back in July.

Technically, and internationally accepted, no individual or organization can own a domain name of a company with any extension (without necessary permissions) that has already carved out a niche for itself online. It is thus worth pondering what prompted a Russian search engine giant to let a foreign governmental agency acquire its domain name.

Vishal Ingole November 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Android Has No Competition in Mobile OS Market

November 23, 2016

Google’s Android OS currently powers 88% of the smartphones in the world, leaving minuscule 12.1 percent to Apple’s iOS and the remaining 0.3 percent for Windows Mobile, BlackBerry OS and Tizen.

IBTimes in an article titled Android Rules! 9 out of Every 10 Phones Run Google’s OS says:

Google’s Android OS dominated the world by powering 88 percent of the world’s smartphone market in the third quarter of 2016. This means 9 out of every 10 mobile phones in the world are using Android, while the rest rely on iOS or other mobile OS such as BlackBerry OS, Tizen and Windows Phone.

The growth occurred despite the fact that smartphone shipments are falling. China and Africa which were big markets have been performing poorly since last three-quarters. Android’s gain thus can be attributed to the fact that Android is an OpenSource system that can be used by any device manufacturer.

Despite being the clear leader, the mobile OS is full of bugs and other inherent problems, as the article points out:

Android platform is getting overcrowded with hundreds of manufacturers, few Android device vendors make profits, and Google’s new Pixel range is attacking its own hardware partners that made Android popular in the first place.

At present, Samsung, Huawei, Oppo and Vivo are the leading Android phone makers. However, Google recently unveiled Pixel, its flagship phone for the premium category. Does it mean that Google has its eyes set on the premium handset category market? Only time can tell.

Vishal Ingole, November 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Code.gov: Missing Some Stuff

November 9, 2016

I love US government Web sites. They come and then they fade. The new kid on the block is Code.gov. The idea is that the US government has created a portal for open source software. Here are the entities whose code is available to anyone able to navigate to this link.

  • Agriculture
  • Commerce
  • EPA
  • Energy
  • Executive Office of the President
  • GSA (home of 18F)
  • Labor
  • NASA
  • National Archives and Records Administration
  • OPM (yep, the security conscious folks)
  • Treasury
  • Veterans Affairs (what are you looking at, cupcake?)

I did notice some interesting gaps; for example, does the Department of Defense have open source software? Well, maybe not. We do include a pointer to more than 100 useful programs in the forthcoming Dark Web Notebook. Want to reserve a copy? Write benkent2020 at yahoo dot com and we’ll put your name on the list.

Stephen E Arnold, November 9, 2016

« Previous PageNext Page »

  • Archives

  • Recent Posts

  • Meta