The following short article shows just how simple it is to use the Python programming language in a data science project. In this example, we’ll first go to Statista.com (a public dataset provider) and download the MS Excel dataset that contains the list of the 100 largest companies in the world. The file contains only two columns: the company name and its current market value. Our goal is to use Python to read the rows and cells inside the Excel file and use them to search the internet for some additional information, such as each company’s headquarters location and its map coordinates (latitude and longitude). You’ll see how easily this can be done by using Python’s web-scraping capabilities. We’ll also show how to write the newly found information back into the Excel sheet and use it to create an infographic that shows the headquarters locations of 100 of the world’s top companies on a map.
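As a rough illustration of that workflow, the sketch below takes rows of (company name, market value), asks a lookup function for each company’s headquarters, and appends the location and coordinates to the row. The `enrich_rows` and `fake_lookup` names and the sample values are hypothetical placeholders, not taken from the article; in a real project the rows would come from the Excel file (for example through a spreadsheet library) and the lookup would be a web query.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (headquarters, latitude, longitude), or None when nothing was found
Location = Optional[Tuple[str, float, float]]


def enrich_rows(rows: Sequence[Tuple[str, float]],
                lookup: Callable[[str], Location]) -> List[list]:
    """Append headquarters, latitude and longitude to each (name, value) row."""
    enriched = []
    for name, market_value in rows:
        location = lookup(name)
        if location is None:
            # Keep the row, but leave the three new columns empty
            enriched.append([name, market_value, None, None, None])
        else:
            headquarters, lat, lon = location
            enriched.append([name, market_value, headquarters, lat, lon])
    return enriched


def fake_lookup(name: str) -> Location:
    """Stand-in for the web lookup; the data here is made up."""
    known = {"Example Corp": ("Example City", 10.0, 20.0)}
    return known.get(name)


rows = [("Example Corp", 123.4), ("Unknown Ltd", 56.7)]
result = enrich_rows(rows, fake_lookup)
for row in result:
    print(row)
```

Passing the lookup in as a function keeps the spreadsheet handling separate from the web-scraping part, so either side can be swapped out or tested on its own.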
Recently, I came across a post from 2001 that offered a download of the entire UTZOO NetNews Archive, a collection of the earliest USENET posts. These were essentially the earliest available discussions posted to the Internet by people working at various universities connected to the Internet: millions of posts created between February 1981 and June 1991.
Until 2001, these early Usenet discussions were considered lost, but miraculously Henry Spencer from the University of Toronto’s Department of Zoology had been backing them up onto magnetic tapes and kept the tapes stored for all these years (apparently at great cost).
H. Spencer had 141 of these magnetic tapes altogether, but in that form they were of no use, so eventually he and a few motivated people, such as David Wiseman (who dragged the 141 tapes back and forth in his pickup truck), Lance Bailey, Bruce Jones, Bob Webber, Brewster Kahle, and Sue Thielen, embarked on the process of converting all of these tapes into a regular format accessible to everyone.
And that’s the copy I downloaded. What a treasure, right?
Well, not so fast. Once I unzipped the data, I realized that the TGZ archives contain literally millions of small text files (each post in its own file). While it was certainly nice to have, it wasn’t something that I or anyone else could read, certainly not in a forum-like discussion format: it wasn’t obvious which post started a discussion or which ones were replies to the thread. And forget about searching through these files; that was simply not possible. Just to put things into perspective, it took me over 5 hours just to unzip the archives.
That said, it didn’t take long for me to decide to develop a Java-based converter that would attempt to convert the entire collection from millions of flat files into a fully searchable MySQL database. The following post talks about the process and also includes the Java code of the solution released as open source.
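To illustrate the threading problem (this is not the Java converter itself), here is a minimal Python sketch that works out which post starts a discussion and which posts are replies. It assumes each post carries `Message-ID` and `References` headers in the usual header-block layout, which may not hold for the oldest posts in the archive; the sample posts and the `parse_post` and `group_replies` names are made up for the example.

```python
from collections import defaultdict
from email import message_from_string


def parse_post(raw: str) -> dict:
    """Extract the fields needed for threading from one raw post."""
    msg = message_from_string(raw)
    references = (msg.get("References") or "").split()
    return {
        "id": (msg.get("Message-ID") or "").strip(),
        "subject": (msg.get("Subject") or "").strip(),
        # By convention the last reference is the post being replied to
        "parent": references[-1] if references else None,
    }


def group_replies(posts):
    """Return (thread starters, mapping of parent id -> list of reply ids)."""
    roots = []
    children = defaultdict(list)
    for post in posts:
        if post["parent"] is None:
            roots.append(post["id"])
        else:
            children[post["parent"]].append(post["id"])
    return roots, dict(children)


raw_posts = [
    "Message-ID: <1@example>\nSubject: Hello\n\nFirst post.\n",
    "Message-ID: <2@example>\nSubject: Re: Hello\n"
    "References: <1@example>\n\nA reply.\n",
]
posts = [parse_post(raw) for raw in raw_posts]
roots, children = group_replies(posts)
print(roots)
print(children)
```

In a database-backed version, the `parent` value would typically become a column that a query can follow to rebuild each thread.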
As SnagIt currently doesn’t support Linux, Flameshot is likely the only screenshot-taking utility worth considering if you’re a Linux user. Here are step-by-step instructions on how to install Flameshot under the Kubuntu KDE Plasma desktop and how to associate it with the PRT SC (Print Screen) key.
Recently I came across a situation where I needed to create a one-click, web-based solution that allows clients to open and edit Photoshop, AutoCAD, WordPerfect, PDF, LibreOffice and other files hosted remotely on my WebDAV-enabled Apache web server by using their own desktop tools. Essentially, I needed a way to open and edit remotely hosted documents (located on my web server) through a simple HREF link on a web page. This link, when clicked, would automatically open the PSD file hosted on my server in the copy of Photoshop installed on the client’s Windows computer, allowing them to edit such files seamlessly, without any need to download and upload the files back and forth to my server. Initially, it seemed like an impossible task, but then I realized that I should be able to create my very own custom browser URI scheme and some kind of Windows-based client that handles opening remote files in the local installation of Photoshop. After a bit of playing around with it, I got it working and created a solution capable of opening for remote editing not only Photoshop files, but virtually any other remotely hosted file type in its associated application, meaning the software installed on the client’s desktop computer. The following article describes how I went about it.
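One common way to register a custom URI scheme on Windows is through a few registry keys under `HKEY_CLASSES_ROOT`. The sketch below only generates the text of a `.reg` file for such a registration; whether the article uses this exact mechanism is an assumption, and the `remoteedit` scheme name and handler path are hypothetical.

```python
def uri_scheme_reg(scheme: str, handler_path: str) -> str:
    """Build .reg file text that maps scheme:// links to a local handler."""
    # Backslashes must be doubled inside quoted .reg string values
    escaped = handler_path.replace("\\", "\\\\")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[HKEY_CLASSES_ROOT\\{scheme}]",
        f'@="URL:{scheme} Protocol"',
        # The presence of this (empty) value marks the key as a URI scheme
        '"URL Protocol"=""',
        "",
        f"[HKEY_CLASSES_ROOT\\{scheme}\\shell\\open\\command]",
        # %1 is replaced by the full URI that was clicked
        f'@="\\"{escaped}\\" \\"%1\\""',
    ]
    return "\r\n".join(lines) + "\r\n"


reg_text = uri_scheme_reg("remoteedit", r"C:\Tools\handler.exe")
print(reg_text)
```

The handler program receives the clicked URI as its first argument and is then responsible for turning it into a path or URL that the desktop application can open.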
A couple of times already, I have had to go through the process of enabling WebDAV on Apache HTTP Server 2.4.x, and even though it is a fairly simple thing to do, I always end up Googling for a proper working solution. Here are the steps…
I don’t mind advertising, as long as it remains unobtrusive to the end user. As a matter of fact, I am using ads on this blog. However, Viber recently introduced a new type of video ad in its Windows and Macintosh desktop apps. These aren’t your usual standalone picture ads; they are video ads that auto-play with sound, which is especially annoying when one forgets to turn off their PC speakers. Unfortunately, Viber does not provide application settings to disable or control ads. So, here is how you can easily disable these ads on your Windows or Mac machine.
Tor is free software that prevents people from learning your location or browsing habits by letting you communicate anonymously on the Internet. This article outlines the process of installing Tor on a CentOS/RHEL 7 server and shows how to communicate programmatically through the Tor network by using cURL in PHP.
The following is a simple example of how to query the Microsoft Cognitive Services Text Analytics API by posting a JSON array through cURL in PHP and extracting the entities from the response.
Recently I came across a need to create a Linux shell script that runs a PostgreSQL SQL query, exports the result in CSV format and sends it as an email attachment. The following is a self-explanatory Linux shell script outlining the process.
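The post itself uses a shell script; purely as an illustration of the export-and-attach steps, here is a minimal Python sketch using only the standard library. The sample rows stand in for a query result, and the addresses, subject and file name are hypothetical. The message is only built here, not sent.

```python
import csv
import io
from email.message import EmailMessage


def rows_to_csv(header, rows) -> str:
    """Serialize a header and data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_report_email(sender, recipient, subject, csv_text,
                       filename="report.csv") -> EmailMessage:
    """Create an email with the CSV text attached as a text/csv file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("The query result is attached as a CSV file.")
    msg.add_attachment(csv_text, subtype="csv", filename=filename)
    return msg


# Sample rows standing in for the output of a PostgreSQL query
csv_text = rows_to_csv(["id", "name"], [[1, "alpha"], [2, "beta"]])
message = build_report_email("reports@example.com", "team@example.com",
                             "Daily report", csv_text)
attachments = list(message.iter_attachments())
print(attachments[0].get_filename())

# Sending would need a reachable mail server, for example:
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(message)
```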
The following short article explains how to use the Netsh command-line scripting utility to add or delete inbound and outbound Windows firewall rules.
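As a small sketch of what such commands look like, the Python helpers below assemble `netsh` argument lists for adding and deleting a rule, assuming the `netsh advfirewall firewall` context. The rule name and port are hypothetical, and the commands are only built and printed here; actually running them requires Windows and an elevated prompt.

```python
def add_rule_command(name, direction, action, protocol, port):
    """Build the netsh arguments that add a port-based firewall rule."""
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={name}",
        f"dir={direction}",      # "in" for inbound, "out" for outbound
        f"action={action}",      # "allow" or "block"
        f"protocol={protocol}",  # for example "TCP" or "UDP"
        f"localport={port}",
    ]


def delete_rule_command(name):
    """Build the netsh arguments that delete all rules with this name."""
    return ["netsh", "advfirewall", "firewall", "delete", "rule",
            f"name={name}"]


add_cmd = add_rule_command("AllowHTTP8080", "in", "allow", "TCP", 8080)
del_cmd = delete_rule_command("AllowHTTP8080")
print(" ".join(add_cmd))
print(" ".join(del_cmd))

# On Windows, from an elevated prompt, the list could be executed with:
#   import subprocess
#   subprocess.run(add_cmd, check=True)
```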
The following is a simple example of how to query the Google Natural Language API by posting JSON as text using cURL in PHP.
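The post uses cURL in PHP; as a language-neutral illustration of the request shape, here is a minimal Python sketch that builds (but does not send) a JSON POST to the API’s `documents:analyzeEntities` method. The choice of entity analysis, the API-key query parameter and the `YOUR_API_KEY` value are assumptions made for the example.

```python
import json
import urllib.request

ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Prepare a JSON POST request asking for entity analysis of the text."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Python was created by Guido van Rossum.",
                        "YOUR_API_KEY")
print(request.get_method(), request.full_url)

# Sending requires a valid key and network access, for example:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```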
Technology is dramatically changing our workplaces. In this post, I touch on the ethical dilemmas associated with electronic surveillance of employees.
Recently I was asked about my views on Amazon AWS and their 100% redundant cloud computing SLA. Here are a couple of notes I took on the subject…