Shorter Content Means Death for Scientific Articles
December 26, 2016
The digital age is a culture that subsists on digesting quick bits of information before moving on to the next. Scientific journals are hardly the heralds of popular trends, but to maintain relevance with audiences, journals are pushing for shorter articles. Shorter articles, however, present a problem for authors, says Ars Technica in “Scientific Publishers Are Killing Research Papers.”
Journals also push for shorter articles because print space is limited, and they are pressured to prioritize results and conclusions over methods to keep articles short. The methods, in fact, are usually relegated to a separate document labeled supplementary information:
Supplementary information doesn’t come in the print version of journals, so good luck understanding a paper if you like reading the hard copy. Neither is it attached to the paper if you download it for reading later—supplementary information is typically a separate download, sometimes much larger than the paper itself, and often paywalled. So if you want to download a study’s methods, you have to be on a campus with access to the journal, use your institutional proxy, or jump through whatever hoops are required.
The lack of methodological detail can hurt researchers who rely on it to judge whether a study is relevant to their own work. The shortened articles also reference the supplementary materials, and without them the published results can be hard to understand. Shorter scientific articles may be better for general interest, but if they lack significant information, then how can general audiences understand them?
In short, the supplementary material should be published online alongside the paper and should be easy to access.
Whitney Grace, December 26, 2016
The Noble Quest Behind Semantic Search
November 25, 2016
A brief write-up at the ontotext blog, “The Knowledge Discovery Quest,” presents a noble vision of the search field. Philologist and blogger Teodora Petkova observed that semantic search is the key to bringing together data from different sources and exploring connections. She elaborates:
On a more practical note, semantic search is about efficient enterprise content usage. As one of the biggest losses of knowledge happens due to inefficient management and retrieval of information. The ability to search for meaning not for keywords brings us a step closer to efficient information management.
If semantic search had a separate icon from the one traditional search has it would have been a microscope. Why? Because semantic search is looking at content as if through the magnifying lens of a microscope. The technology helps us explore large amounts of systems and the connections between them. Sharpening our ability to join the dots, semantic search enhances the way we look for clues and compare correlations on our knowledge discovery quest.
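Petkova’s distinction between searching for meaning and searching for keywords can be sketched in a few lines of Python. This is only a toy illustration, not Ontotext’s technology: the documents, the hand-assigned concept vectors, and the similarity threshold are all invented for the example.

# Toy contrast between keyword search and "semantic" search.
# The concept vectors stand in for whatever semantic model (embeddings,
# ontology annotations) a real platform would supply; they are invented here.
from math import sqrt

DOCS = {
    "doc1": "Quarterly revenue figures for the sales team",
    "doc2": "How our income grew last quarter",
    "doc3": "Office relocation schedule",
}

# Hypothetical concept weights per document: (finance, operations)
CONCEPTS = {
    "doc1": (0.9, 0.1),
    "doc2": (0.8, 0.2),
    "doc3": (0.1, 0.9),
}

def keyword_search(term):
    """Return documents containing the literal query term."""
    return [d for d, text in DOCS.items() if term.lower() in text.lower()]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def semantic_search(query_concepts, threshold=0.7):
    """Return documents whose concept vector is close to the query's meaning."""
    return [d for d, vec in CONCEPTS.items() if cosine(query_concepts, vec) >= threshold]

print(keyword_search("revenue"))     # ['doc1'] -- misses doc2, which never says "revenue"
print(semantic_search((1.0, 0.0)))   # ['doc1', 'doc2'] -- a finance-weighted query finds both

The point is the last two lines: the keyword query misses a document that is plainly about the same subject, while the concept-based query does not.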
At the bottom of the post is a slideshow on this “knowledge discovery quest.” Sure, it also serves to illustrate how ontotext could help, but we can’t blame them for drumming up business through their own blog. We actually appreciate the company’s approach to semantic search, and we’d be curious to see how they manage the intricacies of content conversion and normalization. Founded in 2000, ontotext is based in Bulgaria.
Cynthia Murrell, November 25, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Surprise, Most Dark Web Content Is Legal
November 21, 2016
If you have been under the impression that the Dark Web is that big chunk of the Internet where all activity and content are illegal, you are wrong.
A news report published by Neowin, titled Terbium Labs: Most of the Dark Web Content, Visible Through Tor, Is Legal, reveals:
Contrary to popular belief, the majority of the dark web accessible through Tor is mostly legal… or offline, with extremism making up just a minuscule 0.2% of the content looked at.
According to this Quora thread, the Dark Web was developed by the US military and intelligence community to communicate securely with their assets. The research started in 1995, and in 1997 mathematicians at the Naval Research Laboratory developed The Onion Router Project, or Tor. People outside military intelligence then started using Tor to communicate securely for various reasons. Of course, people with ulterior motives spotted the opportunity and began utilizing Tor as well, including arms and drug dealers, human traffickers, and pedophiles. Mainstream media thus propagated the perception that the Dark Web is an illegal place where criminal actors lurk and all content is illegal.
The Terbium Labs study indicates that 47.7 percent of the content is legal and that much of the rest is borderline legal, such as hacking services. Very little content is technically illegal, such as material related to child pornography, arms dealing, drug dealing, and human trafficking.
The Dark Web, however, is not a fairyland where illegal activities do not occur. As the news report points out:
While this report does prove that seedy websites exist on the dark web, they are in fact a minority, contradictory to what many popular news reports would have consumers believe.
Multiple research agencies have indicated, with figures to back it up, that most content on the Dark Web is legal. But they still have not revealed what this major chunk of legal content is made of. Any views?
Vishal Ingole, November 21, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Gleaning Insights and Advantages from Semantic Tagging for Digital Content
September 22, 2016
The article titled Semantic Tagging Can Improve Digital Content Publishing on the Aptara Corp. blog reveals the importance of indexing. The article waves the flag of semantic tagging at the publishing industry, which has been pushed into digital content kicking and screaming. The difficulties involved in compatibility across networks, operating systems, and devices are quite a headache. Semantic tagging could help, if only anyone understood what it is. The article enlightens us,
Put simply, semantic markups are used in the behind-the-scene operations. However, their importance cannot be understated; proprietary software is required to create the metadata and assign the appropriate tags, which influence the level of quality experienced when delivering, finding and interacting with the content… There have been many articles that have agreed the concept of intelligent content is best summarized by Ann Rockley’s definition, which is “content that’s structurally rich and semantically categorized and therefore automatically discoverable, reusable, reconfigurable and adaptable.”
The application to the publishing industry is obvious when put in terms of increasing searchability. Any student who has used JSTOR knows the frustrations of searching digital content. It is a complicated process that indexing, if administered correctly, will make much easier. The article points out that authors are competing not only with each other, but also with the endless stream of content being created on social media platforms like Facebook and Twitter. Publishers need to take advantage of semantic markups and every other resource at their disposal to level the playing field.
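To make the idea concrete, here is a minimal Python sketch of content carrying semantic tags that a discovery system can act on. The tag vocabulary and data structures are invented for illustration; they are not Aptara’s tooling or any particular publishing standard.

# Minimal sketch: attach machine-readable semantic tags to content so it can be
# discovered and reused automatically. The tag names below are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ContentItem:
    title: str
    body: str
    tags: set = field(default_factory=set)  # semantic categories, not keywords

LIBRARY = [
    ContentItem("Intro to Tokenization", "...", {"topic:nlp", "level:beginner", "format:tutorial"}),
    ContentItem("Q3 Market Outlook", "...", {"topic:finance", "format:report"}),
    ContentItem("Advanced Parsing", "...", {"topic:nlp", "level:advanced", "format:tutorial"}),
]

def discover(required_tags):
    """Return every item whose semantic tags satisfy the request."""
    return [item for item in LIBRARY if required_tags <= item.tags]

# A reader, or another system, can now pull all NLP tutorials without relying
# on keyword matches against the body text.
for item in discover({"topic:nlp", "format:tutorial"}):
    print(item.title)

The payoff is that discovery runs against the tags rather than the prose, which is what makes the content “automatically discoverable, reusable, reconfigurable and adaptable” in Rockley’s sense.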
Chelsea Kerwin, September 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/
Content Cannot Be Searched If It Is Not There
August 16, 2016
Google Europe is already dealing with a slew of “right to be forgotten” requests, but Twitter had its own recent fight with a deletion-related issue. TechCrunch shares the story in “Deleted Tweet Archive PostGhost Shut Down After Twitter Cease And Desist.” PostGhost was a Web site that archived tweets from famous public figures, and it gained its own fame for recording deleted tweets.
The idea behind PostGhost was to provide a transparent and accurate record. The Library of Congress already does something similar, as it archives every Tweet. Twitter, however, did not like PostGhost and sent it a cease-and-desist letter threatening to remove its API access. Apparently, it is illegal to post deleted tweets, something that evolved from the European “right to be forgotten” laws.
So is PostGhost or Twitter wrong?
“There are two schools of thought when something like this happens. The first is that it’s Twitter’s prerogative to censor anything and all the things. It’s their sandbox and we just play in it. The second school of thought says that Twitter is free-riding on our time and attention and in exchange for that they should work with their readers and users in a sane way.”
Twitter is a platform where a small percentage of users, the famous and public figures, instantly have access to millions of people when they voice their thoughts. When these figures put their thoughts on the Internet, it carries more weight than the average tweet. Other Web sites do the same sort of archiving, but it looks like public figures are exempt from this rule. Why? I am guessing money is changing hands.
Whitney Grace, August 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on August 23, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/
Facebook Acknowledges Major Dependence on Artificial Intelligence
July 28, 2016
The article on Mashable titled Facebook’s AI Chief: ‘Facebook Today Could Not Exist Without AI’ relates the current conversation involving Facebook and AI. Joaquin Candela, the director of applied machine learning at Facebook, states that “Facebook could not exist without AI.” He uses the examples of the News Feed, ads, and offensive content, all of which rely on AI to deliver a vastly more engaging and personalized experience. He explains,
“If you were just a random number and we changed that random number every five seconds and that’s all we know about you then none of the experiences that you have online today — and I’m not only talking about Facebook — would be really useful to you. You’d hate it. I would hate it. So there is value of course in being able to personalize experiences and make the access of information more efficient to you.”
And we thought all Facebook required was humans and ad revenue. Candela makes it very clear that Facebook is driven by machine learning and personalization. He paints a very bleak picture of what Facebook would look like without AI: completely random ads, an unranked News Feed, and offensive content splashing around like a beached whale. Only in the last few years has computer vision changed Facebook’s process for removing such content; what used to take reports and human raters is now automated.
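Candela’s contrast between a random experience and a personalized one can be illustrated with a toy ranking sketch in Python. The posts, interest vectors, and scoring rule are invented for the example; Facebook’s actual models are, of course, far more elaborate.

# Toy contrast between an "AI-free" random feed and a personalized ranking.
import random

POSTS = {
    "cat video": (0.9, 0.1, 0.0),            # hypothetical (pets, sports, news) topic weights
    "election recap": (0.0, 0.1, 0.9),
    "basketball highlights": (0.0, 0.9, 0.1),
}

def random_feed():
    """What an unpersonalized feed might look like: arbitrary order."""
    order = list(POSTS)
    random.shuffle(order)
    return order

def personalized_feed(user_interests):
    """Rank posts by a simple dot product with the user's interest vector."""
    def score(post):
        return sum(u * w for u, w in zip(user_interests, POSTS[post]))
    return sorted(POSTS, key=score, reverse=True)

print(random_feed())                        # order changes on every run
print(personalized_feed((0.8, 0.2, 0.0)))   # ['cat video', 'basketball highlights', 'election recap']

Even this crude dot-product ranking shows why a feed driven by some model of the user beats the shuffled alternative Candela describes.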
Chelsea Kerwin, July 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google Changes Its Algorithm Again
May 26, 2016
As soon as we think we have figured out how to get our content to the top of Google’s search rankings, the search engine goes and changes its algorithms. The Digital Journal offers some insight into “Op-Ed: How Will The Google 2016 Algorithm Change Affect Our Content?”
In early 2016, Google announced it was going to update its Truth Algorithm, and the update carries on many of the aspects the company has been trying to push. Quality content over quantity is still very important. Keyword-heavy content is downgraded in favor of Web sites that offer relevant, in-depth content and better answer a user’s intent.
SEO took a dramatic turn with a Penguin update and changes to the core algorithm. The biggest game changer involves mobile technologies:
“The rapid advancement of mobile technologies is deeply affecting the entire web scenario. Software developers are shifting towards the development of new apps and mobile websites, which clearly represent the future of information technology. Even the content for mobile websites and apps is now different, and Google had to account for that with the new ranking system changes. The average mobile user is very task oriented and checks his phones just to quickly accomplish a specific task, like finding a nearby café or cinema. Mobile-oriented content must be much shorter and concise than web-oriented one. The average web surfer wants to know, learn and explore things in a much more relaxed setting.”
Google wants to clear its search results of what is known as unviable information and offer users a better quality search experience on both their mobile devices and standard desktop computers. Good to know that someone wants to deliver a decent product.
Whitney Grace, May 26, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Extensive Cultural Resources Available at Europeana Collections
May 17, 2016
Check out this valuable cultural archive, highlighted by Open Culture in the piece, “Discover Europeana Collections, a Portal of 48 Million Free Artworks, Books, Videos, Artifacts & Sounds from across Europe.” Writer Josh Jones is clearly excited about the Internet’s ability to place information and artifacts at our fingertips, and he cites the Europeana Collections as the most extensive archive he’s discovered yet. He tells us the works are:
“… sourced from well over 100 institutions such as The European Library, Europhoto, the National Library of Finland, University College Dublin, Museo Galileo, and many, many more, including contributions from the public at large. Where does one begin?
“In such an enormous warehouse of cultural history, one could begin anywhere and in an instant come across something of interest, such as the stunning collection of Art Nouveau posters like that fine example at the top, ‘Cercle Artstique de Schaerbeek,’ by Henri Privat-Livemont (from the Plandiura Collection, courtesy of Museu Nacional d’Art de Catalynya, Barcelona). One might enter any one of the available interactive lessons and courses on the history of World War I or visit some of the many exhibits on the period, with letters, diaries, photographs, films, official documents, and war propaganda. One might stop by the virtual exhibit, ‘Photography on a Silver Plate,’ a fascinating history of the medium from 1839-1860, or ‘Recording and Playing Machines,’ a history of exactly what it sounds like, or a gallery of the work of Swiss painter Jean Antoine Linck. All of the artifacts have source and licensing information clearly indicated.”
Jones mentions the archive might be considered “endless,” since content is being added faster than anyone could hope to keep up with. While such a wealth of information and images could easily overwhelm a visitor, he advises us to look at it as an opportunity for discovery. We concur.
Cynthia Murrell, May 17, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Data on Dark Web Not Excused from Fact or Fiction Debate
April 19, 2016
Remember when user information was leaked from the extramarital affairs website AshleyMadison? While the leak caused many controversies, the release of this information specifically on the Dark Web gives reason to revisit an article from Mashable, “Another blow for Ashley Madison: User emails leaked on Dark Web,” as a refresher on the role Tor played. A 10-gigabyte file, which included emails and credit card information among other user data, was posted as a torrent on the Dark Web. The article concluded,
“With the data now out there, Internet users are downloading and sifting through it for anything – or, rather, anyone – of note. Lists of email addresses of AshleyMadison users are being circulated on social media. Several appear to be connected to members of the UK government but are likely fake. As Wired notes, the site doesn’t require email verification, meaning the emails could be fake or even hijacked.”
The future of data breaches and leaks may be unclear, but the falsification of information — leaked or otherwise — always remains a possibility. Regardless of the element of scandal existing in future leaks, it is important to note that hackers and other groups are likely not above manipulation of information.
Megan Feil, April 19, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Bigger Picture Regarding Illegal Content Needed
March 25, 2016
Every once in a while an article on the Dark Web comes along that takes a step back from the latest action on Tor and offers a deep dive on the topic at large. “Delving into the World of the Dark Web,” recently published on Raconteur, is one example. In this article, we learned the definition of darknets: networks accessible only through particular software, such as Tor, and trusted peer authorization. The article continues,
“The best known, and by far the most popular, darknet is the Onion Router (Tor), which was created by the US Naval Research Labs in the 90s as an enabler of secure communication and funded by the US Department of Defense. To navigate it you use the Tor browser, similar to Google Chrome or Internet Explorer apart from keeping the identity of the person doing the browsing a secret. Importantly, this secrecy also applies to what the user is looking at. It is because servers hosting websites on the Tor network, denoted by their .onion (dot onion) designation, are able to mask their location.”
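For the technically curious, reaching a .onion site boils down to routing traffic through a locally running Tor client instead of connecting directly. The Python sketch below assumes Tor is running with its SOCKS proxy on the default port 9050 and that the requests library is installed with SOCKS support (requests[socks]); the .onion address shown is a placeholder, not a real site.

# Route an HTTP request through a local Tor client's SOCKS proxy.
# Assumes Tor is running on 127.0.0.1:9050 and `pip install requests[socks]`.
import requests

TOR_PROXY = "socks5h://127.0.0.1:9050"  # socks5h: DNS resolution also happens inside Tor

def fetch_over_tor(url):
    response = requests.get(
        url,
        proxies={"http": TOR_PROXY, "https": TOR_PROXY},
        timeout=60,
    )
    response.raise_for_status()
    return response.text

# For a .onion address the connection stays inside the Tor network, so neither
# the client's nor the server's location is exposed. The address is a placeholder.
html = fetch_over_tor("http://exampleonionaddress.onion/")
print(html[:200])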
Today, the Dark Web is publicly available to anyone with darknet software, can be used anonymously, and is home to a fair amount of criminal activity. Researchers at King’s College London scraped .onion sites, and their results suggested about 57 percent of Tor sites host illegal content. We wonder about the larger context; for example, what percent of sites viewed on mainstream internet browsers host illegal content?
Megan Feil, March 25, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

