Local News Station Produces Dark Web Story

April 22, 2016

The Dark Web continues to emerge as a subject of media interest for growing audiences. An article from CBS Chicago, Dark Web Makes Illegal Drug, Gun Purchases Hard To Trace, also appears to have been aired as a news segment recently. Offering some light education on the topic, the story explains the anonymity the Dark Web and Bitcoin make possible for criminal activity. The post describes how these tools are typically used,

“Within seconds of exploring the deep web we found over 15,000 sales for drugs including heroin, cocaine and marijuana. In addition to the drugs we found fake Illinois drivers licenses, credit card and bank information and dangerous weapons. “We have what looks to be an assault rifle, AK 47,” said Petefish. That assault rifle AK 47 was selling for 10 bitcoin which would be about $4,000. You can buy bitcoins at bitcoin ATM machines using cash, leaving very little trace of your identity. Bitcoin currency along with the anonymity and encryption used on the dark web makes it harder for authorities to catch criminals, but not impossible.”

As expected, this piece touches on the infamous Silk Road case along with some nearby cases involving local police. While the Dark Web and cybercrime have been on our radar for quite some time, it appears mainstream media interest in the topic is slowly growing. Perhaps those at risk of being affected, such as businesses, government agencies, and law enforcement, will also continue catching on to the issues surrounding the Dark Web.

 

Megan Feil, April 22, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Data Intake: Still a Hassle

April 21, 2016

I read “Big Data’s Biggest Problem: It’s Too Hard to Get the Data In.” Here’s a quote I noted:

According to a study by data integration specialist Xplenty, a third of business intelligence professionals spend 50% to 90% of their time cleaning up raw data and preparing to input it into the company’s data platforms. That probably has a lot to do with why only 28% of companies think they are generating strategic value from their data.

My hunch is that with the exciting hyperbole about Big Data, the problem of normalizing, cleaning, and importing data is ignored. The challenge of taking file A in a particular file format and converting it to another file type is indeed a hassle. A number of companies offer expensive filters to perform this task. The one I remember is Outside In, which sort of worked. I recall that when odd ball characters appeared in the file, there would be some issues. (Does anyone remember XyWrite?) Stellent purchased Outside In in order to move content into that firm’s content management system. Oracle purchased Stellent in 2006. Then Kapow “popped” on the scene. The firm promoted lots of functionality, but I remember it as a vendor whose software could take a file in one format and convert it into another format. Kofax (yep, the scanner oriented outfit) bought Kapow to move content from one format into one that Kofax systems could process. Then Lexmark bought Kofax and ended up with Kapow. With that deal, Palantir and other users of the Kapow technology probably had a nervous moment, or are having one now, as Lexmark marches toward a new owner. EntropySoft, a French outfit, was another file conversion company. It sold out to Salesforce. Once again, converting files from Type A to another desired format seems to have been the motivating factor.

Let us not forget the wonderful file conversion tools baked into software. I can save a Word file as an RTF file. I can import a comma separated file into Excel. I can even fire up FrameMaker and save a .fm file as RTF. In fact, many programs offer these import and export options. The idea is to lessen the pain of having a file in one format which another system cannot handle. Hey, for fun, try opening a macro filled XyWrite file in FrameMaker or InDesign. Just change the file extension to one the system thinks it recognizes. This is indeed entertaining.
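The same sort of conversion can be scripted in a few lines. Here is a minimal sketch, assuming the pandas and openpyxl packages are installed; the file names are hypothetical:

```python
# Minimal sketch of programmatic "save as" conversion.
# Assumes pandas and openpyxl are installed; file names are made up.
import pandas as pd

# Import a comma separated file, much as Excel's import wizard would.
frame = pd.read_csv("orders.csv")

# Export it in formats other systems can handle.
frame.to_excel("orders.xlsx", index=False)       # Excel workbook (needs openpyxl)
frame.to_json("orders.json", orient="records")   # one JSON object per row
```

The scripted route hits the same snags as the built-in menus: one stray delimiter or odd ball character in the source file and the import step fails.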

The write up is not interested in the companies which have sold for big bucks because their technology could make file conversion a walk in the Hounz Lane Park. (Watch out for the rats, gentle reader.) The write up points out three developments which will make the file intake issues go away:

  1. The software performing file conversion “gets better.” Okay, I have been waiting for decades for this happy time to arrive. No joy at the moment.
  2. “Data preparers become the paralegals of data science.” Now that’s a special idea. I am not clear on what a “data preparer” is, but it sounds like a task that will be outsourced pretty quickly to some country far from the home of NASCAR.
  3. “Artificial intelligence” will help cleanse data. Excuse me, but smart software has been operative in file conversion methods for quite a while. In my experience, the exception files keep on piling up.

What is the problem with file conversion? I don’t want to convert this free blog post into a lengthy explanation. I can highlight five issues which have plagued me and my work in file conversion for many years:

First, file types change over time. Some of the changes are not announced. Others, like the Microsoft Word XML thing, are the subject of months long marketing. The problem is that unless the outfit responsible for the file conversion system creates a fix, the exception files can overrun a system’s capacity to keep track of problems. If someone is asleep at the switch, data in the exception folder can have an adverse impact on some production systems. Loss of data is interesting, but trashing the file structure is a carnival. Who does not pay attention? In my experience, vendors, licensees, third parties, and probably most of the people responsible for a routine file conversion task.

Second, the thrill of XML is that it is not particularly consistent. Somewhere along the line, creativity takes precedence over well formed markup. How does one deal with a couple hundred thousand XML files in an exception folder? What do you think about deleting them?
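The well-formedness check itself is trivial; the volume of failures is what hurts. Below is a minimal sketch of the kind of intake test that fills those folders, using Python’s standard xml.etree parser; the directory names are my assumptions, not any vendor’s actual pipeline:

```python
# Minimal sketch: park XML files that are not well formed in an exception
# folder instead of deleting them. Directory names are assumptions.
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

incoming = Path("incoming")
exceptions = Path("exceptions")
exceptions.mkdir(exist_ok=True)

for xml_file in incoming.glob("*.xml"):
    try:
        ET.parse(xml_file)  # well formed: leave it in the intake queue
    except ET.ParseError:
        # Not well formed: move it aside so a human can decide later.
        shutil.move(str(xml_file), str(exceptions / xml_file.name))
```

Run that against a couple hundred thousand creatively tagged files and the exceptions directory grows; someone still has to decide what to do with it.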

Third, the file conversion software works as long as the person creating a document does not use Fancy Dan “inserts” in the source document. Problems arise from videos, certain links, macros, and odd ball formatting of the source document. Yep, some folks create text in Excel and wonder why the resulting text is a bit of a mess.

Fourth, workflows get screwed up. A file conversion system is semi smart. If a process creates a file with an unrecognized extension, the file conversion system fills the exception folder. But what if a valid extension is changed to a supported but incorrect extension? Yep, XML users, be aware that there are proprietary XML formats. The files converted and made available to a system are “sort of right.” Unfortunately, “sort of right” in mission critical applications can have some interesting consequences.
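Extension-based routing is about as smart as many intake systems get. A minimal sketch, with hypothetical handler names and an invented extension list, shows why a supported but incorrect extension sails straight through:

```python
# Minimal sketch of extension-based routing. Handler names and the
# extension list are hypothetical.
from pathlib import Path

HANDLERS = {
    ".xml": "xml_converter",
    ".csv": "csv_converter",
    ".rtf": "rtf_converter",
}

def route(path: Path) -> str:
    handler = HANDLERS.get(path.suffix.lower())
    # Unrecognized extension: off to the exception folder.
    # A supported but incorrect extension is routed anyway, which is
    # how "sort of right" files reach a production system.
    return handler if handler else "exception_folder"

print(route(Path("invoice.xml")))  # xml_converter, even if the XML is proprietary
print(route(Path("invoice.xyw")))  # exception_folder
```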

Fifth, attention to detail is often less popular than fiddling with one’s mobile phone or reading Facebook posts. Human inattention can make large scale data conversion fail. I have watched as a person of my acquaintance deleted the folder of exception files. Yo, it is time for lunch.

So what? Smart software makes certain assumptions. At this time, file intake is perceived as a problem which has been solved. My view is that file intake is a core function which needs a little bit more attention. I do not need to be told that smart software will make file intake pain go away.

Stephen E Arnold, April 21, 2016

Artificial Intelligence Algorithms Want Wilde Byrons

April 21, 2016

I read “The Next Hot Job in Silicon Valley Is for Poets.” The idea is that English majors, among others of this ilk, will be contributors to more human artificial intelligence systems. The write up informed me:

As in fiction, the AI writers for virtual assistants dream up a life story for their bots. Writers for medical and productivity apps make character decisions such as whether bots should be workaholics, eager beavers or self-effacing. “You have to develop an entire backstory — even if you never use it,” Ewing [a Hollywood writer] said.

With the apparent boom in smart software, English majors and other word oriented creative types may see an end to their employment problems. Now what can be done about unemployed lawyers? Perhaps one can ask IBM Watson?

Stephen E Arnold, April 21, 2016

Google Removes Pirate Links

April 21, 2016

A few weeks ago, YouTube was abuzz with discontent from some of its most popular YouTube stars.  Their channels had been shut down due to copyright claims by third parties, even though the content in question fell under the Fair Use defense.  YouTube is not the only company that has to deal with copyright claims.  TorrentFreak reports that “Google Asked To Remove 100,000 ‘Pirate Links’ Every Hour.”

Google handles on average two million DMCA takedown notices about pirated content from copyright holders each day.  TorrentFreak discovered that the number has doubled since 2015 and quadrupled since 2014.  The volume breaks down to roughly one hundred thousand notices per hour.  If the rate continues, Google will deal with one billion DMCA notices this year; it had previously taken a decade to reach that number.

“While not all takedown requests are accurate, the majority of the reported links are. As a result many popular pirate sites are now less visible in Google’s search results, since Google downranks sites for which it receives a high number of takedown requests.  In a submission to the Intellectual Property Enforcement Coordinator a few months ago Google stated that the continued removal surge doesn’t influence its takedown speeds.”

Google does not take broad sweeping actions, such as removing entire domain names from search indexes, as it does not want to become a censorship board.  The copyright holders, though, are angry and want Google to promote only legal services over the hundreds of thousands of Web sites that pop up with illegal content.   The battle is compared to an endless whack-a-mole game.

Pirated content does harm the economy, but the damage is far smaller than the large copyright holders claim.  The smaller players who file DMCA takedowns are hurt more.  YouTube stars, on the other hand, are the butt of an unfunny joke, and it would be wise for the rules to be revised.

 

Whitney Grace, April 21, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Digging for a Direction of Alphabet Google

April 21, 2016

Is Google trying to emulate BAE Systems’ NetReveal, IBM i2, and systems from Palantir? Looking back at an older article from Search Engine Watch, “How the Semantic Web Changes Everything for Search,” may provide insight. At that time, the Knowledge Graph had just launched, and along with it came a wave of communications generating buzz about a new era of search: a move from string-based queries to a semantic approach organized around “things.” The write-up explains,

“The cornerstone of any march to a semantic future is the organization of data and in recent years Google has worked hard in the acquisition space to help ensure that they have both the structure and the data in place to begin creating “entities”. In buying Wavii, a natural language processing business, and Waze, a business with reams of data on local traffic and by plugging into the CIA World Factbook, Freebase and Wikipedia and other information sources, Google has begun delivering in-search info on people, places and things.”

The article noted the Knowledge Graph’s implications for Google’s ability to deliver stronger and more relevant advertising through this semantic approach. Even today, we see the Alphabet Google thing continuing to shift from search to other interesting information access functions in order to sell ads.

 

Megan Feil, April 21, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Yahoo: An Interesting Hatchet Job

April 20, 2016

I think that the Huffington Post is part of America Online, which is part of Verizon, which is supposed to be interested in buying poor old Yahoo.

I thought about that chain of dependent clauses right after I read “The One-Time Ruler of the Web Has Lost More Than Its Mojo — A Lesson for Us All.” The write up does a good job of pointing out that Yahoo has been lost in space for a long, long time. I highlighted this statement:

One researcher tracked Yahoo!’s “boilerplate,” the block of text that describes a company’s self-description found at the bottom of most press releases. In 24 years the boilerplate changed 24 times!

What’s interesting in this mélange of popular and academic statements about the home of the Yahooligans is that the company has lacked vision. Ah, the vision thing. This argument is supported by a statement allegedly made by Helen Keller:

It is a terrible thing to see and have no vision.

Yahoo strikes me as an interesting company. One could argue that Yahoo did try to change to adapt to technology, competitors, and market opportunities. The effort, unlike Google’s approach, lacked the steady flow of cash produced by Google’s online advertising model.

Where did that Google online advertising model originate? GoTo.com which became Overture. Overture became a Yahoo property. The Google was “inspired” by that model. Perhaps one can view Google as a sibling of Yahoo, just a younger, sighted relative?

Stephen E Arnold, April 20, 2016

List of Acquisitions Related to Smart Software

April 20, 2016

I loathe the buzzwords “artificial intelligence,” “cognitive,” and their ilk. I am okay with smart software or, better yet, semi smart software. If you want a listing of the outfits acquiring smart software companies in the last few years, navigate to “The Race For AI: Google, Facebook, Amazon, Apple In A Rush To Grab Artificial Intelligence Startups.” Mid tier consulting firms will be charging big bucks for their round up of these deals. The list contains the names of 21 outfits and their new owners. Don’t you wish you were a start up owned by IBM or Yahoo?

Stephen E Arnold, April 20, 2016

Software That Contains Human Reasoning

April 20, 2016

Computer software has progressed far and keeps advancing faster than we can purchase the latest product.  Software is now capable of holding simple conversations, accurately translating languages, navigating by GPS, driving cars, and more.  The one thing that computer developers cannot yet program is human thought and reason.  The New York Times wrote “Taking Baby Steps Toward Software That Reasons Like Humans” about this goal that remains just out of reach.

The article focuses on Richard Socher and his company MetaMind, a deep learning startup working on pattern recognition software.  He, along with other companies focused on artificial intelligence, is slowly inching toward replicating human thought on computers.  The progress is slow but steady, according to a MetaMind paper describing how machines are now capable of answering questions about both digital images and textual documents.

“While even machine vision is not yet a solved problem, steady, if incremental, progress continues to be made by start-ups like Mr. Socher’s; giant technology companies such as Facebook, Microsoft and Google; and dozens of research groups.  In their recent paper, the MetaMind researchers argue that the company’s approach, known as a dynamic memory network, holds out the possibility of simultaneously processing inputs including sound, sight and text.”

The software that allows computers to answer questions about digital images and text is sophisticated, but the data needed to approach human capabilities is limited at best, if not nonexistent.  We are coming closer to understanding the human brain’s complexities, but artificial intelligence is not near Asimov levels yet.


Whitney Grace, April 20, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Lessons to Learn from Instagram Translation Systems

April 20, 2016

Social media services attempt to eliminate the publishing of pornographic content on their sites through a combination of user reporting and algorithms. Nevertheless, the Daily Star reports “Shock as one million explicit porn films found on Instagram.” This content existed on Instagram despite the service’s no-nudity policy. According to the article, however, many of the pornographic videos and photos were removed after the news broke. Summarizing how the content was initially published, the article states,

“The videos were unearthed by tech blogger Jed Ismael, who says he’s discovered over one million porn films on the site. Speaking on his blog, Ismael said: “Instagram has banned certain English explicit hashtags from being showed in search. “Yet users seem to find a way around the policy, by using non English terms or hashtags. “I came across this discovery by searching for the hashtag “افلام” which means movies in Arabic.” Daily Star Online has performed our own search and easily found hardcore footage without the need for age verification checks.”

While Tor has typically been seen as the home for such services, it appears some users have found a workaround. Who needs the Dark Web? As for those online translation systems, perhaps some services should consider their utility.

 

Megan Feil, April 20, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Chinese Restaurant Names as Journalism

April 19, 2016

I read an article in Jeff Bezos’ newspaper. The title was “We Analyzed the Names of Almost Every Chinese Restaurant in America. This Is What We Learned.” The “almost” is a nifty way of slip sliding around the sampling method, which used restaurants listed in Yelp. Close enough for “real” journalism.

Using the notion of a frequency count, the write up revealed:

  • The word appearing most frequently in the names of the sample was “restaurant.”
  • The words “China” and “Chinese” appear in about 15,000 of the sample’s restaurant names.
  • “Express” is a popular word, not far ahead of “panda”.
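A frequency count of this kind takes only a few lines of standard library code. Here is a minimal sketch; the sample names are made up:

```python
# Minimal sketch of a word frequency count over restaurant names.
# The sample names are made up.
from collections import Counter

names = ["Panda Express", "China Garden Restaurant", "Golden Dragon Restaurant"]
counts = Counter(word.lower() for name in names for word in name.split())
print(counts.most_common(3))  # e.g. [('restaurant', 2), ('panda', 1), ('express', 1)]
```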

The word list and their frequencies were used to generate a word cloud:

[Word cloud of the most frequent words in the sampled restaurant names]

To answer the question where Chinese food is most popular in the US, the intrepid data wranglers at Jeff Bezos’ newspaper output a map:

[Map of Chinese restaurant popularity across the US]

Amazing. I wonder if law enforcement and intelligence entities know that one can map data to discover things like the fact that the word “restaurant” is the most used word in a restaurant’s name.

Stephen E Arnold, April 19, 2016
