KDnuggets : News : 2001 : n11 : item8



Subject: Interview: Usama Fayyad on data mining successes, failures, NASDAQ, and predictions for the future

9) Excluding your own work, what were the largest successes of data mining so far?
What were the biggest failures?

I think some of the early applications that IBM did between 1996 and 1999 were very important in establishing credibility for the field in business settings. Some of the application companies, such as SGI, NCR, and many others, drove serious long-term advances in the field by productizing the technology. The most widespread impacts on the field will come from efforts to standardize the way data mining is used, invoked, persisted, and shared. This includes some of the IBM work, some of the Microsoft work (OLE DB for Data Mining), some of the Oracle attempts, and groups like the Data Mining Group (DMG) with its PMML standard for predictive models.

The other big development is that the providers of standard statistical packages have begun to offer data mining tools. This includes S+, SPSS, SAS, and others. These are still complicated tools, but they send an important signal to statisticians and practitioners that data mining is something that needs attention and is getting attention.

The field has now evolved beyond the need for basic technology like trees, clustering, SVMs, and so forth. The strong need now is to figure out how to build applications that work, that scale, and that are easy for people to understand as business solutions. Work in the context of business solutions is likely to bring about the next breakthroughs. The science must continue as well, of course, to support the business solutions. Science and business need to work hand in hand to push the field forward.

10) Currently, many dot-coms and data mining companies are having a hard time. Which companies will recover? What is your prediction for NASDAQ a year from now?

As every data miner knows, the prediction business is tricky. Only predictions that come with an attached uncertainty interval are reasonable. If I were to attempt to predict the NASDAQ, my expected variance would be fairly large. But since you asked, I'll take a quick attempt: I believe that the real historical growth rate should be between 10% and 12% per year. Technology companies still command higher P/E ratios than stable companies, but that is justified because technology will continue to grow. Given the flat regime we're in now, I expect we'll be about 10% higher next year, so my range for the NASDAQ a year from now is 2100 to 2300. The biggest factor that will affect it is what happens to oil prices. If they come down, technology will come back up big time. If they rise from here, we could be in trouble.

As for dot-coms and data mining companies: the two categories of companies are very different from each other. Data mining companies are technology infrastructure companies or business solution companies. Dot-coms are content, commerce, or services portals. Dot-coms saw an irrational acceleration in valuation. Now the pendulum is swinging the other way. Will the Internet continue to be crucial? Absolutely. It is too fundamental a change to disappear. Businesses and consumers are more connected to each other today than at any time in history. The Internet infrastructure will fundamentally change the world. It has not even begun to do so yet. But change never happens overnight.

Data mining companies never really went through a big bump in valuations. I think we saw a flurry of activity with over 300 vendors in the field. These vendors are selling the wrong tools to the wrong audience (sophisticated DM tools to business end-users). They need to transition to selling solutions. They are learning. But many will disappear in the process.

However, for anyone in the data mining business, there are some very encouraging, powerful forces in technology today that indicate that KDD and data mining will continue to grow in importance. The new "natural laws" of our digital data world paint a very bright picture for data miners. We are all familiar with Moore's law, which says that processing capacity doubles every 18 months. Few people are aware of a more aggressive cousin of that law.

Data storage capacity doubles every 9 months. This law has been in operation for over 10 years now. The results of the two laws are all around us: people have far more data than they know what to do with. The "natural laws" lead to a prediction that I am willing to bet my whole fortune on: the gap between how much data we can generate and our ability to process it will continue to grow dramatically. This means that technologies to help reduce, understand, mine, and exploit this data will grow in importance. Taking a simple processing approach to this data will not help. We need the next generation of tools and regimes.

This is a HUGE opportunity for data mining. It is up to us to make sure we respond to the opportunity by delivering the tools of the future. This is nothing less than building the bulldozers, the cranes, and all the power tools of the digital data universe. There is a lot of useful structure in the data out there. Now it is time for data mining tools to help us discover and exploit it. The future is indeed very bright. However, no one knows what kinds of companies are likely to figure out the right formula to benefit effectively from it. I promise to keep trying to figure it out...
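The widening gap between storage and processing follows directly from the two doubling periods quoted above. The short Python sketch below makes that arithmetic explicit. It takes the interview's figures (processing doubling every 18 months, storage every 9 months) as given assumptions rather than measured values, and the function names are hypothetical, chosen only for illustration.

```python
def capacity_growth(months: float, doubling_period_months: float) -> float:
    """Growth factor after `months` for a quantity that doubles every
    `doubling_period_months` months: 2 ** (months / period)."""
    return 2 ** (months / doubling_period_months)


# Doubling periods as stated in the interview (assumptions, not measurements).
PROCESSING_DOUBLING = 18  # Moore's law, as quoted
STORAGE_DOUBLING = 9      # storage capacity, as quoted


def storage_to_processing_gap(months: float) -> float:
    """Ratio of storage growth to processing growth after `months`.

    Because 2**(t/9) / 2**(t/18) == 2**(t/18), the gap itself
    doubles every 18 months under these assumptions."""
    storage = capacity_growth(months, STORAGE_DOUBLING)
    processing = capacity_growth(months, PROCESSING_DOUBLING)
    return storage / processing


if __name__ == "__main__":
    for years in (1.5, 3, 5, 10):
        months = years * 12
        print(f"{years:>4} years: "
              f"storage x{capacity_growth(months, STORAGE_DOUBLING):,.0f}, "
              f"processing x{capacity_growth(months, PROCESSING_DOUBLING):,.0f}, "
              f"gap x{storage_to_processing_gap(months):,.1f}")
```

Under these assumptions, ten years of growth multiplies storage roughly 10,000-fold but processing only about 100-fold, so the data-to-processing gap grows about 100-fold as well. That exponential divergence is the arithmetic behind the prediction.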

Brief Bio

Usama Fayyad is President & CEO of digiMine, Inc. He received his Ph.D. from The University of Michigan, Ann Arbor in 1991. He is an Editor-in-Chief of Data Mining and Knowledge Discovery, the primary technical journal in the field, and has chaired several past KDD conferences.

Prior to digiMine, he was at Microsoft where he founded and led Microsoft Research's Data Mining & Exploration (DMX) Group. His work with Microsoft product groups included the development of data mining prediction components that ship with Microsoft Site Server (Commerce Server 3.0 and 4.0) and developing scalable algorithms for mining large databases and architecting their fit with server products such as Microsoft SQL Server and OLAP Services. In addition to managing the DMX research group, he managed the core part of the development team that is building providers in the SQL Server product group. He was also a driving force behind establishing a new industry standard in data mining based on Microsoft's OLE DB API.

Prior to joining Microsoft, Usama was at the Jet Propulsion Laboratory (JPL), California Institute of Technology (1989-1995) where he founded and headed the Machine Learning Systems Group and developed data mining systems for the analysis of large scientific databases. For this work he received the most distinguished excellence awards from Caltech/JPL and a U.S. Government Medal from NASA. He remained affiliated with JPL as Distinguished Visiting Scientist after he moved to Microsoft.

Usama lives with his wife and their three children in the Seattle area.



Copyright © 2001 KDnuggets.