Recently I came across a situation where I needed to perform the MySQL search in such way, that it would account for typos in user search queries. For example, if the database of words in MySQL contained only the word “assistance” and the user typed the misspelled word “asistence”, I had to be able to return the correct word “assistance” as a closest possible suggestion from the database. It seemed like a trivial task at first, but it took me quite a while to figure the SQL query that had a good balance between performance and overall quality of results. [Read more…]
Recently, I came across a post from 2001 which allowed downloading the entire collection of UTZOO NetNews Archive of earliest USENET posts. These were essentially the earliest available discussions posted to the Internet by folks working for various Universities connected to the Internet. Millions of posts created between Feb 1981 and June of 1991.
Until 2001, these early Usenet discussions were considered being lost, but miraculously Henry Spencer from the University of Toronto, Department of Zoology was backing it up onto magnetic tapes and kept them stored for all these years (apparently at the great cost).
H. Spencer had altogether 141 of these magnetic tapes, but there were of no use, so eventually, him and a couple of motivated people such as David Wiseman (who dragged 141 tapes back and forth in his a pickup truck), Lance Bailey, Bruce Jones, Bob Webber, Brewster Kahle, and Sue Thielen; embarked on a process of converting all of these tapes into regular format, accessible to everyone.
And that’s the copy I downloaded. What a treasure, right?
Well, not so fast, once I unzipped the data, I realized that the TGZ format contains literally millions of small text files (each post in its own file). While it was certainly nice to have, it wasn’t something that I or anyone else could read. Certainly not in a forum like discussion format, it wasn’t obvious which post is the one that starts the discussion or which ones are the replies to the thread. And forget about searching through these files, that was utterly not possible. Just to put things into perspective, it took me over 5 hours to just unzip the archives.
That said, it didn’t take long for me to decide to develop a Java-based converter that would attempt to convert the entire collection from millions of flat files into a fully searchable MySQL database. The following post talks about the process and also includes the Java code of the solution released as open source.
Recently I came across a need to create a Linux shell script to run a PostgreSQL SQL query, export it in CSV Format and send it attached to an email. The following is a self-explanatory Linux shell script outlining the process.
The following post outlines the process of converting Aspell language dictionary for any language into Mysql or MariaDB database. [Read more…]
It happens from time to time that we have duplicate queries in our database Unique index type tables that are caused by difficult to detect carriage returns. The following method cleans up the database by first removing the hard returns, and then by removing the duplicates. [Read more…]
The following article provides a high-level overview of NoSQL databases and the various associated data store types related to these kinds of databases. A particular section of the article is dedicated to a brief summary of the Document Oriented NoSQL databases. I provide example data that illustrate how Document NoSQL database store the data and also outline the most significant differences between the relational type of SQL database and document-oriented NoSQL. [Read more…]
Recently I came across a statement that said: “MongoDB (btw. that’s MongoDB) uses the BSON format which extends the JSON model to provide additional data types” and I think this topic deserves a bit of clarification. [Read more…]
This is just a short look at the popularity of MongoDB, Redis and Apache Cassandra. [Read more…]
This post is a placeholder for a future venture into commercial benefits of Shared-Nothing (SN) distributed database architecture. [Read more…]
In this article, we’ll analyze the situations in which it is more appropriate to implement a distributed database instead of a centralized (local) database. But before we do so, let’s see how the two of the database concepts differ. [Read more…]
This post outlines how to configure MariaDB or MySQL on Centos 7 in order to store a specific database outside of the typical /var/lib/mysql/DBNAME directory. Or in other words, how you can have each of your databases residing on a different drive (or a different folder on the same drive), while running only a single instance of MariaDB or MySQL. [Read more…]
This article describes how to fix MySQL connection error: (“Warning: PDO::__construct(): The server requested authentication method unknown to the client [mysql_old_password]”), experienced during MySQL or MariaDB authentication (typically during establishing a database connection). [Read more…]
I get this asked quite often, so here is a simple example of using MySQL command for replacing one string with another in multiple rows: [Read more…]
This article illustrates how to create MySQL database containing entire Wikipedia out of downloaded Wikipedia XML dump (enwiki-latest-pages-articles-multistream.xml)