Subject: KDD Nuggets 95:16

KDD Nuggets 95:16, e-mailed 95-07-07

Contents:
 * Cen Li, new KDD related page
     http://www.vuse.vanderbilt.edu/~biswas/ResearchPages/kdd.html
 * W. Sarle, How are neural networks related to statistical methods?
 * GPS, useful pointers about Montreal,
     http://www.showboat.com/showboat/montreal.htm
 * R. Kohavi, MLC++ Utilities 1.2,
     http://robotics.stanford.edu:/users/ronnyk/mlc.html
 * M. Pazzani, Postdoc at UC Irvine
 * R. Tucker, Renascence Partners -- http://www.rpl.com/
 * S. Dixon, Data Mining related jobs at SmithKline Beecham

The KDD Nuggets is a moderated mailing list for news and information relevant to Knowledge Discovery in Databases (KDD), also known as Data Mining, Knowledge Extraction, etc.  Relevant items include tool announcements and reviews, summaries of publications, information requests, interesting ideas, clever opinions, etc.  Please include a descriptive subject line in your submission.  Nuggets frequency is approximately bi-weekly.

Back issues of Nuggets, a catalog of S*i*ftware (data mining tools), references, FAQ, and other KDD-related information are available at Knowledge Discovery Mine, URL http://info.gte.com/~kdd/ or by anonymous ftp to ftp.gte.com, cd /pub/kdd, get README

E-mail add/delete requests to kdd-request@gte.com
E-mail contributions to kdd@gte.com

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"The Universe is full of magical things, patiently waiting for our wits to grow sharper."
   -Eden Phillpotts  (thanks to Susan Tafolla)

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path:
Date: Tue, 27 Jun 95 11:26:50 CDT
From: cenli@vuse.vanderbilt.edu (Cen Li)
To: kdd%eureka@gte.com
Subject: new WWW page to be included in the kdd home page

Dr. Shapiro:

I am a graduate student working on KDD at Vanderbilt University under Prof. Gautam Biswas.  We have been putting together a web page in this area.  We would like it to be included in your KDD page.  The link is:

   "http://www.vuse.vanderbilt.edu/~biswas/ResearchPages/kdd.html"

Thank you very much.

Sincerely,
Cen Li.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is a very good excerpt from Warren Sarle that I picked up on CLASS-L list.  GPS.

Return-Path:
Date: Tue, 27 Jun 1995 14:28:57 -0400
Reply-To: "Classification, clustering, and phylogeny estimation"
Sender: "Classification, clustering, and phylogeny estimation"
From: Warren Sarle
Subject: Re: Neural Network Modeling Q's
X-To: CLASS-L@ccvm.sunysb.edu
X-Cc: dihost!fjurden@dirsrch.attmail.com
To: Multiple recipients of list CLASS-L
In-Reply-To: <199506271322.AA04755@lamb.sas.com> from "Art Kendall" at Jun 27, 95 09:16:44 am

> does anybody have any info/citations on neural network modeling and its
> relation to other stat techniques such as regression, structural equation
> modeling, etc.?
>
> thanks.
> Frank H. Jurden, Ph.D.
> Decision Insight, Inc.
> 2600 Grand
> Kansas City, MO USA
> dihost!fjurden@dirsrch.attmail.com

Below is an excerpt that I wrote for the comp.ai.neural-nets FAQ, a discussion of neural nets and structural equation models, a compilation of neural net and statistical jargon, and directions for obtaining more information via ftp.

______________________________________________________________________

Q: How are neural networks related to statistical methods?

A: There is considerable overlap between the fields of neural networks and statistics.  Statistics is concerned with data analysis.  In neural network terminology, statistical inference means learning to generalize from noisy data.  Some neural networks are not concerned with data analysis (e.g., those intended to model biological systems) and therefore have little to do with statistics.  Some neural networks do not learn (e.g., Hopfield nets) and therefore have little to do with statistics.  Some neural networks can learn successfully only from noise-free data (e.g., ART or the perceptron rule) and therefore would not be considered statistical methods.  But most neural networks that can learn to generalize effectively from noisy data are similar or identical to statistical methods.  For example:

 * Feedforward nets with no hidden layer (including functional-link neural nets and higher-order neural nets) are basically generalized linear models.
 * Feedforward nets with one hidden layer are closely related to projection pursuit regression.
 * Probabilistic neural nets are identical to kernel discriminant analysis.
 * General regression neural nets are identical to Nadaraya-Watson kernel regression.
 * Kohonen nets for adaptive vector quantization are very similar to k-means cluster analysis.
 * Hebbian learning is closely related to principal component analysis.

Some neural network areas that appear to have no close relatives in the existing statistical literature are:

 * Kohonen's self-organizing maps.
 * Reinforcement learning (although this is treated in the operations research literature as Markov decision processes).
 * Stopped training (the purpose and effect of stopped training are similar to shrinkage estimation, but the method is quite different).

Feedforward nets are a subset of the class of nonlinear regression and discrimination models.  Statisticians have studied the properties of this general class but had not considered the specific case of feedforward neural nets before such networks were popularized in the neural network field.  Still, many results from the statistical theory of nonlinear models apply directly to feedforward nets, and the methods that are commonly used for fitting nonlinear models, such as various Levenberg-Marquardt and conjugate gradient algorithms, can be used to train feedforward nets.

While neural nets are often defined in terms of their algorithms or implementations, statistical methods are usually defined in terms of their results.  The arithmetic mean, for example, can be computed by a (very simple) backprop net, by applying the usual formula SUM(x_i)/n, or by various other methods.  What you get is still an arithmetic mean regardless of how you compute it.  So a statistician would consider standard backprop, Quickprop, and Levenberg-Marquardt as different algorithms for implementing the same statistical model such as a feedforward net.  On the other hand, different training criteria, such as least squares and cross entropy, are viewed by statisticians as fundamentally different estimation methods with different statistical properties.
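To make the algorithm-versus-model point above concrete, here is a minimal sketch (not part of Sarle's text; the data values, learning rate, and iteration count are made-up illustrative choices) that computes the same arithmetic mean two ways: by the usual formula SUM(x_i)/n, and by batch gradient descent on the least-squares criterion, i.e. "training" the simplest possible network consisting of a single bias weight.

    # Minimal sketch: one statistic (the arithmetic mean), two algorithms.
    # Data and learning rate below are arbitrary illustrative choices.
    x = [2.0, 4.0, 6.0, 8.0, 10.0]

    # Algorithm 1: the usual closed formula SUM(x_i)/n.
    mean_formula = sum(x) / len(x)

    # Algorithm 2: batch gradient descent on E(w) = SUM_i (x_i - w)^2,
    # the simplest possible "backprop" training loop with one weight w.
    w = 0.0
    rate = 0.01
    for epoch in range(2000):
        gradient = sum(2.0 * (w - xi) for xi in x)   # dE/dw
        w = w - rate * gradient

    print(mean_formula)   # 6.0
    print(round(w, 6))    # ~6.0 -- same estimate, different algorithm

Either route yields the same statistic; only the algorithm differs, which is exactly the distinction drawn above between implementations and statistical models.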
It is sometimes claimed that neural networks, unlike statistical models, require no distributional assumptions.  In fact, neural networks involve exactly the same sort of distributional assumptions as statistical models, but statisticians study the consequences and importance of these assumptions while most neural networkers ignore them.  For example, least-squares training methods are widely used by statisticians and neural networkers.  Statisticians realize that least-squares training involves implicit distributional assumptions in that least-squares estimates have certain optimality properties for noise that is normally distributed with equal variance for all training cases and that is independent between different cases.  These optimality properties are consequences of the fact that least-squares estimation is maximum likelihood under those conditions.  Similarly, cross-entropy is maximum likelihood for noise with a Bernoulli distribution.  If you study the distributional assumptions, then you can recognize and deal with violations of the assumptions.  For example, if you have normally distributed noise but some training cases have greater noise variance than others, then you may be able to use weighted least squares instead of ordinary least squares to obtain more efficient estimates.

References:

Balakrishnan, P.V., Cooper, M.C., Jacob, V.S., and Lewis, P.A. (1994), "A study of the classification capabilities of neural networks using unsupervised learning: A comparison with k-means clustering", Psychometrika, 59, 509-525.

Chatfield, C. (1993), "Neural networks: Forecasting breakthrough or passing fad", International Journal of Forecasting, 9, 1-3.

Cheng, B. and Titterington, D.M. (1994), "Neural Networks: A Review from a Statistical Perspective", Statistical Science, 9, 2-54.

Geman, S., Bienenstock, E. and Doursat, R. (1992), "Neural Networks and the Bias/Variance Dilemma", Neural Computation, 4, 1-58.

Kuan, C.-M. and White, H. (1994), "Artificial Neural Networks: An Econometric Perspective", Econometric Reviews, 13, 1-91.

Kushner, H. & Clark, D. (1978), _Stochastic Approximation Methods for Constrained and Unconstrained Systems_, Springer-Verlag.

Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994), _Machine Learning, Neural and Statistical Classification_, Ellis Horwood.

Ripley, B.D. (1993), "Statistical Aspects of Neural Networks", in O.E. Barndorff-Nielsen, J.L. Jensen and W.S. Kendall, eds., _Networks and Chaos: Statistical and Probabilistic Aspects_, Chapman & Hall.  ISBN 0 412 46530 2.

Ripley, B.D. (1994), "Neural Networks and Related Methods for Classification," Journal of the Royal Statistical Society, Series B, 56, 409-456.

Sarle, W.S. (1994), "Neural Networks and Statistical Models," Proceedings of the Nineteenth Annual SAS Users Group International Conference, Cary, NC: SAS Institute, pp 1538-1550.

White, H. (1989), "Learning in Artificial Neural Networks: A Statistical Perspective," Neural Computation, 1, 425-464.

White, H. (1989), "Some Asymptotic Results for Learning in Single Hidden Layer Feedforward Network Models", J. of the American Statistical Assoc., 84, 1008-1013.

White, H. (1992), _Artificial Neural Networks: Approximation and Learning Theory_, Blackwell.

______________________________________________________________________

> Is anyone familiar with both LISREL and neural networks who can explain (or
> cite a publication which explains) how neural network analysis differs from
> using LISREL to work with structural equations with latent variables?
Linear regression is a special case of a structural equation model (LISREL) with no latent variables.  Linear regression is also a special case of a feedforward neural net with no hidden layer and a linear activation function.  Aside from that, there is no connection between structural equation models and neural networks.

Latent variables are things that are assumed to exist in the real world but can't be measured directly.  Hidden units are computational conveniences--they are not latent variables.  Note that in a network diagram for a neural network, the arrows go from the inputs to the hidden units.  In a path diagram for a latent variable model, the arrows go in the opposite direction, from the latent variables to the manifest variables.

You can set up a neural net with linear hidden units that is equivalent to certain principal component or maximum redundancy models.  However, principal components and maximum redundancy are not latent variable models.  If you use principal components to estimate common factors (which are latent variables), you will get wrong answers.  Similarly, if you use a neural net to estimate a superficially similar latent variable model, you will get wrong answers.  This is because standard estimation methods such as ordinary least squares (OLS) apply to various linear and nonlinear models such as principal components and feedforward neural nets, but OLS does _not_ produce consistent estimates for latent variable models.

______________________________________________________________________

Neural Network and Statistical Jargon
=====================================

Warren S. Sarle    saswss@unx.sas.com    May 12, 1995

The neural network (NN) and statistical literatures contain many of the same concepts but usually with different terminology.  Sometimes the same term or acronym is used in both literatures but with different meanings.  Only in very rare cases is the same term used with the same meaning, although some cross-fertilization is beginning to happen.  Below is a list of such corresponding terms or definitions.  Particularly loose correspondences are marked by a ~ between the two columns.  A < indicates that the term on the left is roughly a subset of the term on the right, and a > indicates the reverse.  Terminology in both fields is often vague, so precise equivalences are not always possible.

The list starts with some basic definitions.  There is disagreement in the NN literature on how to count layers.  Some people count inputs as a layer and some don't.  I specify the number of hidden layers instead.  This is awkward but unambiguous.
Definition                      Statistical Jargon
==========                      ==================

generalizing from noisy data    Statistical inference
 and assessment of the
 accuracy thereof

the set of all cases one        Population
 wants to be able to
 generalize to

a function of the values in     Parameter
 a population, such as the
 mean or a globally optimal
 synaptic weight

a function of the values in     Statistic
 a sample, such as the mean
 or a learned synaptic weight


Neural Network Jargon           Definition
=====================           ==========

Neuron, neurode, unit,          a simple linear or nonlinear computing
 node, processing element        element that accepts one or more inputs,
                                 computes a function thereof, and may
                                 direct the result to one or more other
                                 neurons

Neural networks                 a class of flexible nonlinear regression
                                 and discriminant models, data reduction
                                 models, and nonlinear dynamical systems
                                 consisting of an often large number of
                                 neurons interconnected in often complex
                                 ways and often organized into layers


Neural Network Jargon           Statistical Jargon
=====================           ==================

Statistical methods             Linear regression and discriminant
                                 analysis, simulated annealing,
                                 random search

Architecture                    Model

Training, Learning,             Estimation, Model fitting, Optimization
 Adaptation

Classification                  Discriminant analysis

Mapping, Function               Regression
 approximation

Supervised learning             Regression, Discriminant analysis

Unsupervised learning,          Principal components, Cluster analysis,
 Self-organization               Data reduction

Competitive learning            Cluster analysis

Hebbian learning,               Principal components
 Cottrell/Munro/Zipser technique

Training set                    Sample, Construction sample

Test set, Validation set        Hold-out sample

Pattern, Vector, Case           Observation, Case

Reflectance pattern             an observation normalized to sum to 1

Binary(0/1),                    Binary, Dichotomous
 Bivalent or Bipolar(-1/1)

Input                           Independent variables, Predictors,
                                 Regressors, Explanatory variables,
                                 Carriers

Output                          Predicted values

Training values,                Dependent variables, Responses,
 Target values                   Observed values

Training pair                   Observation containing both inputs and
                                 target values

Shift register,                 Lagged variable
 (Tapped) (time) delay (line),
 Input window

Errors                          Residuals

Noise                           Error term

Generalization                  Interpolation, Extrapolation, Prediction

Error bars                      Confidence interval

Prediction                      Forecasting

Adaline                         Linear two-group discriminant analysis
 (ADAptive LInear NEuron)        (not Fisher's but generic)

(No-hidden-layer) perceptron  ~ Generalized linear model (GLIM)

Activation function,          > Inverse link function in GLIM
 Signal function,
 Transfer function

Softmax                         Multiple logistic function

Squashing function              bounded function with infinite domain

Semilinear function             differentiable nondecreasing function

Phi-machine                     Linear model

Linear 1-hidden-layer           Maximum redundancy analysis, Principal
 perceptron                      components of instrumental variables

1-hidden-layer perceptron     ~ Projection pursuit regression

Weights,                      < (Regression) coefficients,
 Synaptic weights                Parameter estimates

Bias                          ~ Intercept

the difference between the      Bias
 expected value of a statistic
 and the corresponding true
 value (parameter)

Shortcuts, Jumpers,           ~ Main effects
 Bypass connections,
 direct linear feedthrough
 (direct connections from
 input to output)

Functional links                Interaction terms or transformations

Second-order network            Quadratic regression,
                                 Response-surface model

Higher-order network            Polynomial regression, Linear model
                                 with interaction terms

Instar, Outstar                 iterative algorithms of doubtful
                                 convergence for approximating an
                                 arithmetic mean or centroid

Delta rule, adaline rule,       iterative algorithm of doubtful
 Widrow-Hoff rule,               convergence for training a linear
 LMS (Least Mean Squares) rule   perceptron by least squares, similar
                                 to stochastic approximation

training by minimizing the      LMS (Least Median of Squares)
 median of the squared errors

Generalized delta rule          iterative algorithm of doubtful
                                 convergence for training a nonlinear
                                 perceptron by least squares, similar
                                 to stochastic approximation

Backpropagation                 Computation of derivatives for a
                                 multilayer perceptron and various
                                 algorithms such as the generalized
                                 delta rule based thereon

Weight decay, Regularization  > Shrinkage estimation, Ridge regression

Jitter                          random noise added to the inputs to
                                 shrink the estimates

Growing, Pruning, Brain         Subset selection, Model selection,
 damage, Self-structuring,       Pre-test estimation
 Ontogeny

Optimal brain surgeon           Wald test

LMS (Least mean squares)        OLS (Ordinary least squares)
 (see also "LMS rule" above)

Relative entropy, Cross         Kullback-Leibler divergence
 entropy

Evidence framework              Empirical Bayes estimation

OLS (Orthogonal least squares)  Forward stepwise regression

Probabilistic neural network    Kernel discriminant analysis

General regression neural       Kernel regression
 network

Topologically distributed     < (Generalized) Additive model
 encoding

Adaptive vector quantization    iterative algorithms of doubtful
                                 convergence for K-means cluster
                                 analysis

Adaptive Resonance Theory 2a  ~ Hartigan's leader algorithm

Learning vector quantization    a form of piecewise linear discriminant
                                 analysis using a preliminary cluster
                                 analysis

Counterpropagation              Regressogram based on k-means clusters

Encoding, Autoassociation       Dimensionality reduction
 (Independent and dependent
 variables are the same)

Heteroassociation               Regression, Discriminant analysis
 (Independent and dependent
 variables are different)

Epoch                           Iteration

Continuous training,            Iteratively updating estimates one
 Incremental training,           observation at a time via difference
 On-line training,               equations, as in stochastic
 Instantaneous training          approximation

Batch training,                 Iteratively updating estimates after
 Off-line training               each complete pass over the data as in
                                 most nonlinear regression algorithms

______________________________________________________________________

Further information on neural networks is available by anonymous ftp from ftp.sas.com (Internet gateway IP 192.35.83.8) in the directory /pub/sugi19/neural :

README        This document.
neural1.ps    Sarle, W.S. (1994), "Neural Networks and Statistical Models,"
               Proceedings of the Nineteenth Annual SAS Users Group
               International Conference, Cary, NC: SAS Institute,
               pp 1538-1550.  (Postscript file)
neural2.ps    Sarle, W.S. (1994), "Neural Network Implementation in SAS
               Software," Proceedings of the Nineteenth Annual SAS Users
               Group International Conference, Cary, NC: SAS Institute,
               pp 1551-1573.  (Slightly revised version, postscript file)
plots.ps      Plots from the 2nd paper in high-resolution graphics.
               (Postscript file)
macros.sas    Macros from the 2nd paper.
example.sas   Examples using the macros with the XOR and sine data.
example.bls   Output from example.sas.
example2.sas  Examples using the macros with the motorcycle data.
example2.bls  Output from example2.sas.
tnn2.sas      The TNN system of macros for feedforward neural nets,
               alpha release, version 2.
tnn2.doc      Introductory documentation for TNN.
tnn2.ref      Reference guide to TNN macros and arguments.
tnn2ex.sas    Examples using TNN with the XOR, iris, and sine data.
tnn2ex.bls    Output from tnn2ex.sas.
tnn2exm.sas   Examples using TNN with the motorcycle data.
tnn2exm.bls   Output from tnn1ex.sas.
netiml.sas    The NETIML system of IML modules and macros for multilayer
               perceptrons.
netiml.ps     Documentation for netiml.sas.
netimlex.sas  Examples using netiml.sas.
netimlex.bls  Output from netimlex.sas.
paint.sas     Macro for setting colors and symbols in SAS/INSIGHT.
jargon        Translations of neural network and statistical jargon.
kangaroos     Nontechnical explanation of training methods and nonlinear
               optimization (plain ascii version of material from
               neural2.ps, plus related posts from the
               comp.ai.neural-nets newsgroup on Usenet).

Please note that postscript files (those with a .ps extension) require a postscript printer or viewer in order for you to read them.

--
Warren S. Sarle       SAS Institute Inc.    The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive      are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA   those of SAS Institute.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path:
Date: Tue, 27 Jun 1995 16:36:26 -0400
From: gps0@eureka (Gregory Piatetsky-Shapiro)
Subject: Re: useful pointers about Montreal

http://www.showboat.com/showboat/montreal.htm -- montreal home page
http://www.cam.org/~delisle/Montreal.html      Montreal at night
Montreal weather   http://www.droit.umontreal.ca/cgi-bin/weather

and, of course,
http://www-aig.jpl.nasa.gov/kdd95 -- KDD-95 home page

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From ML-LIST
From: Ronny Kohavi
Date: Tue, 20 Jun 1995 20:01:47 -0700
Subject: MLC++ Utilities V1.2: Machine Learning Library in C++

                        MLC++ Utilities 1.2

MLC++ is a Machine Learning library of C++ classes being developed at Stanford.  The utilities are compiled versions of programs for Sun and can be used without access to a C++ compiler.  The utilities (and sources) are available freely.  More information about the library can be obtained at URL http://robotics.stanford.edu:/users/ronnyk/mlc.html.

Version 1.2, to be released 30 June 1995, includes the following new additions to 1.1:
 *. Discretization (binning, Holte, Entropy) [ML-95].
 *. Better feature subset selection using the wrapper approach.
 *. Automatic tuning of C4.5 [paper to appear in ML-95].
 *. Combining classifiers (bagging/ensemble).
 *. Utility to display trees generated by C4.5 using dot.
 *. Interface to Aha IB series.
 *. Entropy-based decision graphs [EODG, IJCAI-95].

Ronny Kohavi (ronnyk@CS.Stanford.EDU, http://robotics.stanford.edu/~ronnyk)

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path:
To: kdd@gte.com, comp.ai@super-pan.ICS.UCI.EDU
Subject: Postdoc at UC Irvine
Date: Thu, 29 Jun 1995 08:53:08 -0700
From: Michael Pazzani
Content-Type: text
Content-Length: 1166

I would like to hire a PostDoctoral researcher to do research on Machine learning algorithms and applications of machine learning to biomedical problems and intelligent agents.  This is a 3 year position starting in the Fall of 1995.  To apply, send a resume to pazzani@ics.uci.edu.  Application screening will begin immediately upon receipt.  Maximum consideration will be given to applications received by July 1, 1995.

UC Irvine is located in Southern California, three miles from the Pacific Ocean adjacent to Newport Beach, and approximately forty miles south of Los Angeles.  The campus is situated in the heart of a national center of high-technology enterprise.  Both the campus and the enterprise area are growing rapidly and offer exciting professional and cultural opportunities.
The University of California is an Affirmative Action/Equal Opportunity Employer, committed to excellence through diversity.

Michael Pazzani
Associate Professor
Department of Information and Computer Science
University of California
Irvine, CA 92717-3425
phone (714) 824-5888   fax (714) 824-4056
e-mail pazzani@ics.uci.edu
http://www.ics.uci.edu/dir/faculty/AI/pazzani

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path:
From: Roth Tucker
Date: Fri, 30 Jun 1995 11:14:43 -0400
To: kdd%eureka@gte.com (KDD Nuggets Moderator)
Subject: Renascence Partners

Renascence Partners Limited is a team of business consulting professionals who bring the practical science of decision making to bear on those strategic, tactical and procedural issues that are under-addressed using the simplistic tools of yesterday.

Like many decision science firms, we believe that data alone is not enough.  Neither is good old seat-of-the-pants intuition.  A seamless melding of the two requires a system that can aggressively seek data and combine it with an accurate representation of relevant human knowledge to reach an inclusive, complete solution.  Our tools include proprietary software engines using data mining, fuzzy sets, neural networks, cognitive process capture and knowledge representation.

Recent examples of work include:

   Building a model to forecast physician prescribing behavior, which takes seasonal illness and physician attitudes into account

   Creating a system to select future customer "partners" which relies on both hard data (such as financial figures) and more fuzzy measures (such as "clarity of management vision")

Above all, we are dedicated to bringing the most powerful tools to bear on the tough problems faced by our clients, but doing so in a non-threatening, inclusive process.

Roth Tucker
Managing Director, Systems and Analytics

Our address is http://www.rpl.com/
Please address inquiries to: Rtucker@rpl.com

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path:
To: kdd@gte.com
Subject: Job available in database mining
Reply-To: dixon%phmms0.mms@sb.com
Date: Fri, 30 Jun 1995 15:55:14 -0400
From: Scott Dixon
Content-Type: text
Content-Length: 2879

We have several positions open.  Note in particular the jobs in database mining and in protein structure (which could include pattern recognition/machine learning approaches in biopolymer sequence databases):

OPPORTUNITIES IN COMPUTATIONAL CHEMISTRY/MOLECULAR MODELING

SmithKline Beecham, a worldwide leader in pharmaceutical research, has a number of openings in our Physical & Structural Chemistry Department.

Database Mining
The appropriate individual will have a Ph.D. in chemistry or biochemistry or computer science (or related field) with extensive experience in pattern recognition, machine learning or chemometrics.  Excellent communication and computer skills, including the ability to develop new algorithms, are required.  Experience in chemical or biological database analysis is desirable.  Job Code #H00L.

Protein Modeling
This individual will be responsible for the development and application of methods to assign structure and function to genome sequence information.  Qualifications required include a Ph.D. in chemistry, biophysics or bioinformatics with extensive experience in biopolymer sequence analysis and protein modeling.  Other requirements include excellent computer skills and the ability to develop computer programs and new algorithms.  A knowledge of DNA sequencing methods and molecular biology is desirable.
Refer to Job Code #H0118.

Scientific Programmer
As a member of an established group, the selected candidate will assist in the development of state-of-the-art software.  Qualifications include BS/MS in computer science or electrical engineering (with experience or course work in chemistry) or BS/MS in chemistry with demonstrated abilities in scientific computer programming.  Excellent programming skills and knowledge of UNIX, C and FORTRAN are needed.  Experience with Silicon Graphics workstations and computational chemistry software is desirable.  Refer to Job Code #H0117.

Combinatorial Chemistry/Molecular Diversity
The selected individual will join with other team members to develop and apply methods for the design of combinatorial chemical libraries and the analysis of diversity in chemical databases.  The necessary qualifications include a Ph.D. in chemistry or a related field and extensive experience in computational chemistry.  Excellent computer skills and communication skills are also necessary.  Experience in pattern recognition or chemical diversity methods is desirable.  Job Code #H00M.

Located in our state-of-the-art research facility in suburban Philadelphia, SmithKline Beecham offers an excellent compensation/benefits/relocation package.  Interested candidates should send resume with salary requirements, indicating desired Job Code, to:

   SmithKline Beecham Pharmaceuticals, Job Code ____
   P.O. Box 2645
   Bala Cynwyd, PA 19004.

We are an Equal Opportunity Employer, M/F/D/V.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~