In this post, I look at the alleged Russian hacking of the United States elections and the ethics of such unauthorized intervention, from the viewpoint of privacy concerns and the social impact of such a cyber attack. [Read more…]
The following article is a step-by-step guide on how to sync Fitbit data into a Google spreadsheet. [Read more…]
A little bit of research on browser support for function keys. The keys marked with OK are available to programmers; the rest are reserved for browser-specific functionality. [Read more…]
Web administrators transitioning existing WordPress sites (and image resources) to Amazon S3 have a couple of paid plugins to choose from to ease the move. However, most of these WordPress plugins come at pretty hefty prices, some as high as a couple of hundred dollars. The following guide outlines a step-by-step option that is completely free of charge. [Read more…]
Tor is free software that prevents people from learning your location or browsing habits by letting you communicate anonymously on the Internet. Over the years, I have seen Tor installed in a variety of environments, and I think it’s important to mention that, outside of using Tor Browser for anonymous web browsing, there are various other ways of using Tor. In the following article, I outline one of these options. [Read more…]
The following article analyses how well anonymization methods and various other obfuscation techniques fare in addressing privacy concerns. [Read more…]
Nowadays, the volume, velocity, veracity and variety of Big Data are the primary factors and true amplifiers of the security issues experienced in large-scale cloud infrastructures. The upsurge in security issues in Big Data installations is predominantly driven by an overall increase in the volume and velocity of the data. However, the existence of enormous amounts of data is no longer the single factor creating new security challenges: dealing with the diversity of data sources (the variety of data) is quickly becoming another pressing security concern. Data variety is one of the newest security challenges of Big Data infrastructures.
Just a short guide on how to fix the annoying “Cast to a nearby device” popup on Android devices. [Read more…]
In the following post, I cover a brief history of the Enterprise Data Warehouse (EDW), analyze the major challenges of EDW solutions, and discuss traditional EDWs and their capacity to handle Volume, Variety, and Velocity (three of the V’s of Big Data). I also explore Big Data platforms as a potential alternative to the EDW. [Read more…]
The following post discusses the ‘percentage correct’ method of scoring predictions and explains why it may not be the most precise way to measure performance. I also examine analytic measurement techniques in general and recommend a substitute measure for situations where ‘percentage correct’ is not a suitable performance measurement approach. [Read more…]
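To make the ‘percentage correct’ concern concrete, here is a minimal Java sketch. It assumes one common reason the measure misleads, namely heavily imbalanced classes, and it uses recall purely as an illustrative alternative; this summary does not name the substitute the post recommends, so both choices are assumptions. The class name, method names, and data are made up for the illustration.

```java
public class AccuracySketch {

    /** 'Percentage correct': the share of predictions that match the actual labels. */
    public static double accuracy(boolean[] actual, boolean[] predicted) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    /** Recall: the share of actual positives that the model found (illustrative alternative). */
    public static double recall(boolean[] actual, boolean[] predicted) {
        int truePositives = 0;
        int positives = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i]) {
                positives++;
                if (predicted[i]) {
                    truePositives++;
                }
            }
        }
        return positives == 0 ? 0.0 : (double) truePositives / positives;
    }

    public static void main(String[] args) {
        // Made-up data: 100 cases, only the first 5 are positive.
        boolean[] actual = new boolean[100];
        for (int i = 0; i < 5; i++) {
            actual[i] = true;
        }
        // A useless model that always predicts "negative".
        boolean[] predicted = new boolean[100];

        System.out.println("accuracy = " + accuracy(actual, predicted));
        System.out.println("recall   = " + recall(actual, predicted));
    }
}
```

In this made-up case the always-negative model scores 95% correct while finding none of the positive cases, which is the kind of gap a single ‘percentage correct’ figure can hide.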
In this article, I continue exploring logging as a data set. I described this type of data set earlier in the Log Management and Big Data Analytics post. Here, I suggest applying a particular partitioning method, k-means clustering, because I think it is the most suitable candidate for a specific part of the log file management problem domain. I explain why I consider the k-means technique the most appropriate for this type of data. I also cover the advantages that this analysis brings to logging in general, and demonstrate the k-means cluster analysis method on a real data set. [Read more…]
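As a rough idea of what k-means does with log-derived numbers, here is a minimal one-dimensional sketch in Java. It is not the analysis from the article: the response times are made up, the starting centroids are passed in by hand rather than chosen randomly, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class LogKMeansSketch {

    /** Index of the centroid closest to the given value. */
    public static int nearest(double[] centroids, double value) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(value - centroids[c]) < Math.abs(value - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Plain k-means (Lloyd's algorithm) on 1-D values; returns the final centroids. */
    public static double[] kMeans(double[] values, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        int k = centroids.length;
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double[] sums = new double[k];
            int[] sizes = new int[k];
            // Assignment step: attach every value to its nearest centroid.
            for (double value : values) {
                int c = nearest(centroids, value);
                sums[c] += value;
                sizes[c]++;
            }
            // Update step: move each centroid to the mean of its members.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (sizes[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double mean = sums[c] / sizes[c];
                if (mean != centroids[c]) {
                    centroids[c] = mean;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no centroid moved
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Made-up response times (ms), as might be parsed from access-log lines.
        double[] responseTimesMs = {12, 15, 11, 14, 480, 510, 495, 13};
        double[] centroids = kMeans(responseTimesMs, new double[] {10, 500}, 100);
        System.out.println(Arrays.toString(centroids)); // prints [13.0, 495.0]
    }
}
```

With these values the two clusters separate "normal" requests from slow ones; calling `nearest(centroids, value)` on a new log entry then tells you which group it falls into.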
In the following article, I explore the issue of log collection and analysis, a very specific problem domain for many large organizations. Logging is a suitable example of a high-volume, high-velocity data set, which makes it a good candidate for the application of Big Data analytic techniques. This article is not meant to go into the details of how analytic methods perform data classification or other analytic tasks; it is mainly meant to shed some light on the application of Big Data techniques to logging and to outline some of the business benefits of log analysis. [Read more…]
This article is just me thinking out loud about creating something better than the simple wordcount.java example that is usually bundled with Big Data solutions such as Hadoop, which I covered in the previous post. I wanted a script that would be a bit more complex and relate more to meaningful web indexing. I wrote a Java program that acts as a web parser and can programmatically derive the meaning of any website by statistically judging its content. If run against Google search results, it can also provide AI-like answers to complex questions (such as ‘who is the president of some country’), or guess the closest meaning behind a set of keywords (for example, ‘gold, color, breed’ results in the response ‘Golden Retriever’); see the examples below. Of course, this is just the result of a bit of spare time. But it’s something that could perhaps be explored further, as a method to derive the basic meaning behind textual content in big data (to get the gist of the content in a couple of words). Anyhow, in its current form it’s just a further play on Hadoop’s wordcount.java.
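The general idea of getting the gist of a text from word statistics can be sketched in a few lines of Java. This is not the program described above, only a minimal illustration under simple assumptions: the text is already extracted from the page, the stop-word list is a tiny hypothetical one, and the "gist" is just the most frequent remaining words. The class and method names are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class GistSketch {

    // Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS =
        Set.of("the", "a", "an", "is", "of", "and", "to", "in", "it", "that");

    /** Returns the n most frequent non-stop-words, most frequent first (ties in alphabetical order). */
    public static List<String> topWords(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Lower-case the text and split on anything that is not a letter a-z.
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for parsed page content.
        String page = "The golden retriever is a breed of dog. "
            + "The retriever has a golden coat, and the breed is friendly.";
        System.out.println(topWords(page, 3)); // prints [breed, golden, retriever]
    }
}
```

The counting step is the same map-and-sum that wordcount.java performs; the only additions are the stop-word filter and the ranking at the end.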
This is a short guide on how to install a Hadoop single-node cluster on a Windows computer without Cygwin. The intention behind this little test is to have a test environment for Hadoop in your own local Windows environment. [Read more…]
The following article analyses the applicability of the CAP theorem to Big Data. I explain the CAP theorem, explore its three characteristics, and provide a proof of the theorem using an example that is closely related to a Big Data use case. I also briefly discuss a couple of possible ways to deal with CAP-related issues in distributed Big Data applications and offer an overview of the implementations that best fit each of the CAP properties. [Read more…]