Converting UTZOO Usenet archive from magnetic tapes to MySQL database using Java

Recently, I came across a post from 2001 that offered a download of the entire UTZOO NetNews Archive of the earliest USENET posts. These are essentially the earliest available discussions posted to the Internet by people working at the various universities connected to it: millions of posts created between February 1981 and June 1991.

Until 2001, these early Usenet discussions were considered lost, but miraculously Henry Spencer from the University of Toronto, Department of Zoology had been backing them up onto magnetic tapes and kept the tapes stored for all these years (apparently at great cost).

H. Spencer had 141 of these magnetic tapes altogether, but as tapes they were of no use. Eventually, he and several motivated people, among them David Wiseman (who dragged the 141 tapes back and forth in his pickup truck), Lance Bailey, Bruce Jones, Bob Webber, Brewster Kahle, and Sue Thielen, embarked on the process of converting all of the tapes into a regular format accessible to everyone.

And that’s the copy I downloaded. What a treasure, right?

Well, not so fast. Once I unzipped the data, I realized that the TGZ archives contain literally millions of small text files (each post in its own file). While that was certainly nice to have, it wasn’t something that I or anyone else could read, certainly not in a forum-like discussion format: it wasn’t obvious which post starts a discussion or which posts are replies in the thread. And forget about searching through these files; that was simply not possible. Just to put things into perspective, it took me over 5 hours just to unzip the archives.

That said, it didn’t take long for me to decide to develop a Java-based converter that would attempt to turn the entire collection of millions of flat files into a fully searchable MySQL database. The following post describes the process and includes the Java code of the solution, released as open source.
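As a rough illustration of the threading problem (this is not the converter from the full post), the sketch below reads the header block of a single post and works out whether it starts a discussion or replies to another post. It assumes RFC 1036-style "Message-ID" and "References" headers; the oldest posts in the archive may use different header names, and the class and method names here are hypothetical. Folded (multi-line) headers are ignored for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class UsenetHeaderSketch {

    /** Reads "Name: value" lines up to the first blank line, which ends the header block. */
    static Map<String, String> parseHeaders(String article) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : article.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line separates the headers from the body
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                headers.putIfAbsent(name, value); // keep the first occurrence
            }
        }
        return headers;
    }

    /** Returns the ID of the post being replied to, or null if this post starts a thread. */
    static String parentId(Map<String, String> headers) {
        String refs = headers.get("references");
        if (refs == null || refs.isEmpty()) {
            return null; // assumption: no References header means a thread starter
        }
        String[] ids = refs.split("\\s+");
        return ids[ids.length - 1]; // assumption: the last ID listed is the direct parent
    }

    public static void main(String[] args) {
        String post = "Message-ID: <2@site>\nReferences: <0@site> <1@site>\n\nHello";
        System.out.println(parentId(parseHeaders(post)));
    }
}
```

With a (message ID, parent ID) pair for each post, rows in a database table can be linked into threads with an ordinary self-referencing column, which is one way to make the reply structure searchable.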

[Read more…]

How to create a custom browser URI scheme and C# protocol handler client that supports opening and editing of remotely hosted documents through WebDAV

Recently I came across a situation where I needed to create a one-click web-based solution that allows clients to open and edit Photoshop, AutoCAD, WordPerfect, PDF, LibreOffice and other files hosted remotely on my WebDAV-enabled Apache web server, using their own desktop tools. Essentially, I needed a way to open and edit remotely hosted documents (located on my web server) through a simple HREF link on a web page. This link, when clicked, would automatically open the PSD file hosted on my server in the Photoshop installed on the client’s Windows computer, allowing the client to seamlessly open and edit such files without any need to download and upload them back and forth to my server. Initially, it seemed like an impossible task, but then I realized that I should be able to create my very own custom browser URI scheme and some kind of Windows-based client that handles opening remote files in a local installation of Photoshop. After a bit of playing around with it, I got it working and created a solution capable of opening for remote editing not only Photoshop files, but virtually any other remotely hosted file type in its associated application, meaning the software installed on the client’s desktop computer. The following article describes how I went about it. [Read more…]

Open Source Bananagrams Solver in JavaScript, PHP & MySQL

I had some time over the Christmas holidays and decided to find a programmatic solution to a game of Bananagrams… yeah, why not :)… So I built a variety of MySQL dictionaries from Aspell, styled the page with HTML and CSS, coded the backend in PHP, and placed the solver logic in a JavaScript web worker. Here is a short outline of the entire process, along with a website and a YouTube demo. [Read more…]

Java for Beginners 8 – Creating a Multi-User Application

With the recent rapid adoption of Internet and cloud environments, the focus of software development has shifted from building single-user applications to constructing software capable of handling numerous local or remote users. The core benefit of multi-user applications lies in their capacity to support synchronized access by simultaneous users. Other advantages often include the ability to support different types of desktop and mobile devices running a variety of operating systems. [Read more…]

Java for Beginners 7 – Abstract Classes and Interfaces

Abstract classes and interfaces in Java are limited by design: neither can be instantiated as an object the way a traditional class can. Nevertheless, it is precisely this limitation, together with the ability to reuse them across many different classes, that proves incredibly useful to developers. In this article, I briefly outline possible applications of abstract classes and interfaces and provide examples that highlight the contrast between them.
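To make the contrast concrete, here is a minimal sketch (the type names are hypothetical and not taken from the full article). The interface only declares what an implementer must provide, the abstract class adds shared state and behaviour on top of it, and only the concrete subclass can actually be instantiated.

```java
import java.util.Locale;

// Hypothetical names throughout, chosen only for this illustration.
interface Shape {
    double area(); // an interface declares what implementers must provide
}

abstract class NamedShape implements Shape {
    private final String name; // an abstract class can hold state

    protected NamedShape(String name) {
        this.name = name;
    }

    // Shared, concrete behaviour that relies on the still-abstract area().
    public String describe() {
        return name + " with area " + String.format(Locale.ROOT, "%.2f", area());
    }
}

final class Circle extends NamedShape {
    private final double radius;

    Circle(double radius) {
        super("circle");
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}

final class ShapeDemo {
    public static void main(String[] args) {
        NamedShape shape = new Circle(1.0);       // fine: Circle is concrete
        // Shape s = new Shape();                 // would not compile: interface
        // NamedShape n = new NamedShape("x");    // would not compile: abstract class
        System.out.println(shape.describe());     // prints "circle with area 3.14"
    }
}
```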

[Read more…]

Java for Beginners 4 – Working with Java Loop Statements

Java has three types of loops, namely ‘for’, ‘while’ and ‘do-while’, which all accomplish the similar goal of repeating a block of code until a certain condition is met. The following article explores the differences between the loop variants and demonstrates situations in which choosing one type of loop over another can be a benefit. There are many examples of using all three types of loops, but I’ve selected the general form and the infinite loop to illustrate the primary differences.
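As a quick sketch of those general forms (the class and method names are made up for this example), the three methods below compute the same sum with each loop type, and a fourth shows the infinite-loop form that exits with a break.

```java
// Hypothetical example class, for illustration only.
final class LoopFormsSketch {

    // 'for': initialisation, condition and update all sit in the loop header.
    static int sumWithFor(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // 'while': the condition is checked before every pass, so the body may run zero times.
    static int sumWithWhile(int n) {
        int sum = 0;
        int i = 1;
        while (i <= n) {
            sum += i;
            i++;
        }
        return sum;
    }

    // 'do-while': the condition is checked after the body, so the body runs at least once.
    static int sumWithDoWhile(int n) {
        int sum = 0;
        int i = 1;
        do {
            sum += i;
            i++;
        } while (i <= n);
        return sum;
    }

    // Infinite loop: 'while (true)' (or 'for (;;)') runs until an explicit break.
    static int firstPowerOfTwoAbove(int limit) {
        int value = 1;
        while (true) {
            if (value > limit) {
                break;
            }
            value *= 2;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(sumWithFor(4));           // prints 10
        System.out.println(sumWithWhile(4));         // prints 10
        System.out.println(sumWithDoWhile(4));       // prints 10
        System.out.println(sumWithDoWhile(0));       // prints 1: the body ran once anyway
        System.out.println(firstPowerOfTwoAbove(10)); // prints 16
    }
}
```

The n = 0 case is where the variants diverge: the ‘for’ and ‘while’ versions return 0, while the ‘do-while’ version returns 1 because its body executes before the condition is first tested.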

[Read more…]

Polyomino Combinatorics: How to count distinct n-letter long array permutations

Let’s say we have three arrays for the following tetromino letters: J, L and O. Tetromino J is a 4-member array (J[0], J[1], J[2], J[3]), where each member represents one of the positions the J block can take. The L tetromino is also a 4-member array, and O is a square tetromino that has only one position (a 1-member array). How do we calculate the total number of unique n-letter long permutations without programmatically permuting through them? [Read more…]
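One possible reading of the question, sketched below, is that each of the n slots holds any of the letters (repeats allowed, order matters) in any one of that letter's positions. That reading is my assumption and may not be the one used in the full article. Under it, every slot has 4 + 4 + 1 = 9 choices, so the total is 9^n. The hypothetical class below computes that closed form and cross-checks it with a slot-by-slot recursion.

```java
// Hypothetical sketch; assumes letters may repeat and that order matters.
final class TetrominoCountSketch {

    /** Closed form: (sum of positions per letter) raised to the power n. */
    static long countClosedForm(int[] positionsPerLetter, int n) {
        long choicesPerSlot = 0;
        for (int positions : positionsPerLetter) {
            choicesPerSlot += positions;
        }
        long result = 1;
        for (int i = 0; i < n; i++) {
            result = Math.multiplyExact(result, choicesPerSlot); // throws on overflow
        }
        return result;
    }

    /** Cross-check: pick a letter for the first slot, then count the remaining slots. */
    static long countByRecursion(int[] positionsPerLetter, int n) {
        if (n == 0) {
            return 1; // exactly one way to fill zero slots
        }
        long count = 0;
        for (int positions : positionsPerLetter) {
            count += positions * countByRecursion(positionsPerLetter, n - 1);
        }
        return count;
    }

    public static void main(String[] args) {
        int[] jlo = {4, 4, 1}; // positions of J, L and O
        System.out.println(countClosedForm(jlo, 3));  // prints 729
        System.out.println(countByRecursion(jlo, 3)); // prints 729
    }
}
```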