KDnuggets : News : 2001 : n05 : item19

Q&A

From: Wray Buntine wray@dynaptics.com
Date: Wed, 21 Feb 2001 13:06:57 -0800
Subject: Suitability of MYSQL for Data Mining
In response to Alan Mclean's (?) question in
KDnuggets News 2001 : n04 : item38 about MySQL.

MySQL's strength is as a back-end for an Internet
server, not for traditional database applications.
It is routinely twice as fast on many applications
to which it is suited.

Historically, MySQL has differed from the major
commercial SQL systems in the following ways:
	*  Transaction support is weaker, so don't
	   use it for financial transactions.
	*  Row locking has not traditionally been supported,
	   which makes it awkward when your transactions
	   require multiple statements.
	*  SQL statements cannot be precompiled for efficient reuse.
	*  Support for distributed environments is weaker.
	*  The thread-safe version requires special compilation,
	   although precompiled thread-safe versions are now available.
Some of these issues are being addressed in the latest versions,
and some are less critical for data mining:
	*  VA Linux, for instance, is using the latest distributed support
	   in MySQL to deliver systems targeted at server farms in which
	   reader/writer CPUs share disks.  They are thus offering a
	   Linux/MySQL solution in an area dominated by Sun.
	*  Transaction support and row locking are less critical in
	   data mining if you are using MySQL as a load-rarely, read-often
	   system.

These caveats aside, MySQL is a great system.  As a successful
open-source project, it is robust and is being developed and
updated at a rapid pace.

If you're building a dynamic or embedded data mining system,
such as a personalization engine, then SQL might not be the
way to go anyway.  You might get a tenfold speed-up from an
embedded B-tree system, because it works directly in binary
within your own process and avoids socket communication for
database work.  However, this assumes you have high-quality
programmers who know how to build the architecture needed to
make this work in a distributed/multi-processor environment.
SQL-based systems are intrinsically easier to maintain if you're
relying on grunt/plug-compatible programmers!
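To make the embedded alternative a little more concrete, here is a minimal Python sketch. It assumes Python's standard-library `dbm` module as a stand-in for a dedicated embedded B-tree library; depending on platform and Python version, `dbm` may pick a hash-based or SQLite-based backend rather than a B-tree, and the sketch makes no claim about the tenfold figure. The record layout and the helpers `put_profile`/`get_profile` are hypothetical. What it does illustrate is the point above: the store is a file opened inside the application's own process, with no database server or socket, and records are kept as packed binary rather than SQL rows.

```python
import dbm
import os
import struct
import tempfile

# Hypothetical fixed-width binary record for a personalization engine:
# (user_id: uint32, visits: uint32, score: float64), little-endian.
RECORD = struct.Struct("<IId")
KEY = struct.Struct("<I")


def put_profile(db, user_id, visits, score):
    """Store one profile: the key is the packed user id, the value the packed record."""
    db[KEY.pack(user_id)] = RECORD.pack(user_id, visits, score)


def get_profile(db, user_id):
    """Return the unpacked (user_id, visits, score) tuple, or None if absent."""
    key = KEY.pack(user_id)
    if key not in db:
        return None
    return RECORD.unpack(db[key])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "profiles")
        # dbm.open() opens a file-backed store inside this process:
        # no server to connect to and no socket round-trip per lookup.
        with dbm.open(path, "c") as db:
            put_profile(db, 42, 7, 0.875)
            print(get_profile(db, 42))  # (42, 7, 0.875)
            print(get_profile(db, 99))  # None
```

Because keys and values are raw bytes, the application rather than the database owns the schema; that is part of the extra engineering burden the paragraph above warns about.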

At Dynaptics, we're not a traditional data mining company
trawling large databases, but an embedded data mining company,
and MySQL is a great part of our solutions.

Wray Buntine
Dir. of Advanced Dev.
Dynaptics Inc.


Copyright © 2001 KDnuggets.