Tuesday, September 8, 2009

ADAPTIVE RESONANCE THEORY (ART)

Adaptive Resonance Theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. It describes a number of neural network models which use supervised and unsupervised learning methods, and address problems such as pattern recognition and prediction.

The basic ART system is an unsupervised learning model. It typically consists of a comparison field and a recognition field composed of neurons, a vigilance parameter, and a reset module.

The vigilance parameter has considerable influence on the system: higher vigilance produces highly detailed memories (many, fine-grained categories), while lower vigilance results in more general memories (fewer, more-general categories).

The comparison field takes an input vector (a one-dimensional array of values) and transfers it to its best match in the recognition field. Its best match is the single neuron whose set of weights (weight vector) most closely matches the input vector.

Each recognition field neuron outputs a negative signal (proportional to that neuron’s quality of match to the input vector) to each of the other recognition field neurons and inhibits their output accordingly.

In this way the recognition field exhibits lateral inhibition, allowing each neuron in it to represent a category to which input vectors are classified. After the input vector is classified, the reset module compares the strength of the recognition match to the vigilance parameter.

If the vigilance threshold is met, training commences. Otherwise, if the match level does not meet the vigilance parameter, the firing recognition neuron is inhibited until a new input vector is applied; training commences only upon completion of a search procedure.

In the search procedure, recognition neurons are disabled one by one by the reset function until the vigilance parameter is satisfied by a recognition match.

If no committed recognition neuron’s match meets the vigilance threshold, then an uncommitted neuron is committed and adjusted towards matching the input vector.


ART ARCHITECTURE

ART consists of two layers of neurons, labeled "comparison" and "recognition". Gain 1, Gain 2, and the reset module provide the control information needed for training and classification.

COMPARISON LAYER

The comparison layer accepts an input vector X and initially passes it through unchanged to become the vector C. In a later phase, the binary vector produced by the recognition layer modifies C, as described below.

Each neuron in the comparison layer receives three binary inputs: (1) a component x of the input vector X; (2) a feedback signal, the weighted sum of the recognition layer outputs; and (3) the gain signal (Gain 1). To output a one, at least two of a neuron's three inputs must be one; otherwise its output is zero. This is called the two-thirds rule. Initially the gain signal is set to one, providing one of the needed inputs, and all components of the output vector from the recognition layer are set to zero, so vector C is initially equal to X.
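A minimal sketch of the two-thirds rule for a single comparison-layer neuron (the function and variable names are illustrative, not from the original):

```python
def comparison_neuron(x, feedback, gain1):
    """Two-thirds rule: output one only if at least two of the three
    binary inputs (input component, feedback, Gain 1) are one."""
    return 1 if (x + feedback + gain1) >= 2 else 0

# Initially Gain 1 is one and all recognition-layer outputs (feedback)
# are zero, so C is simply a copy of the input vector X:
X = [1, 0, 1, 1]
C = [comparison_neuron(x, 0, 1) for x in X]
# C -> [1, 0, 1, 1]
```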


RECOGNITION LAYER

The recognition layer serves to classify the input vector. Each recognition-layer neuron R has an associated weight vector B. Only the neuron whose weight vector best matches the input vector "fires"; all others are inhibited.

In operation, each recognition-layer neuron computes a dot product between its weights and the incoming vector C. The neuron whose weights are most like vector C has the largest output, thereby winning the competition and inhibiting all other neurons in the layer.
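The winner-take-all competition can be sketched as a dot-product comparison (the weight values below are invented for illustration):

```python
def recognition_winner(C, B):
    """Return the index of the recognition neuron whose weight vector
    in B has the largest dot product with the comparison output C."""
    scores = [sum(c * b for c, b in zip(C, weights)) for weights in B]
    return max(range(len(B)), key=lambda j: scores[j])

C = [1, 0, 1, 1]
B = [[0.2, 0.9, 0.1, 0.0],   # weight vector of neuron 0 (poor match)
     [0.8, 0.1, 0.7, 0.9]]   # weight vector of neuron 1 (good match)
winner = recognition_winner(C, B)   # -> 1
```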


GAIN 2

The output of Gain 2 is one if the input vector X has any component that is one. More precisely, G2 is the logical OR of the components of X.

GAIN 1

Like Gain 2, the output of Gain 1 is one if any component of the binary input vector X is one; however, if any component of R is one, G1 is forced to zero.

RESET

The reset module measures the similarity between the input and the recognition category. If they differ by more than the vigilance parameter allows, a reset signal is sent to disable the firing neuron in the recognition layer.

In operation, the reset module calculates similarity as the ratio of the number of ones in vector C to the number of ones in vector X. If this ratio is below the vigilance parameter, the reset signal is applied; otherwise resonance occurs.
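A sketch of this similarity test (the function name and sample vectors are illustrative):

```python
def resonates(C, X, rho):
    """Similarity = (number of ones in C) / (number of ones in X).
    Returns True (resonance) if it reaches the vigilance rho,
    False if a reset should be applied."""
    return sum(C) / sum(X) >= rho

X = [1, 1, 1, 1, 0]
C = [1, 0, 1, 1, 0]        # 3 of the 4 ones in X survived comparison
resonates(C, X, rho=0.7)   # 0.75 >= 0.7 -> True, resonance
resonates(C, X, rho=0.9)   # 0.75 <  0.9 -> False, reset
```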

WORKING EXAMPLE


ART CATEGORIES

I. ART 1: It is the simplest variety of ART networks, accepting only binary inputs.

II. ART 2: It extends ART 1 to accept analog as well as binary inputs.

III. ART 3: It builds on ART 2 by incorporating a chemical-transmitter mechanism, providing a robust mechanism for parallel search of learned pattern codes.

IV. FUZZY ART: It is formed by incorporating fuzzy dynamics into the ART modules.

V. GART: It is formed by incorporating Gaussian (normal) distributions into the ART modules.

ART 1

It is an unsupervised learning technique used for classifying input data, and it can classify only binary inputs. ART 1 is the first version of the ART-based networks proposed by Carpenter and Grossberg. It uses ordinary set-theoretic operations to classify the inputs. The main advantage of the ART 1 algorithm is that it solves the stability-plasticity dilemma. It also overcomes some of the difficulties of back-propagation networks.

The network was intended for unsupervised clustering of binary data. It has two major subsystems: the attentional subsystem and the orienting subsystem. The attentional subsystem is a one-layer neural network. It has D input neurons to learn D-dimensional data and C output neurons to map at most C clusters. Initially all output neurons are uncommitted; once an output neuron has learned from a pattern, it becomes committed. The activation function is computed at all committed output neurons. The input and output layers are connected by both top-down and bottom-up weights. Baraldi and Parmiggiani proved mathematically that the bottom-up and top-down attentional module is equivalent to an attentional system with only forward connections. Baraldi and Alpaydin generalized this result to all ART 1-based networks by stating: "the attentional module of all ART 1-based systems is functionally equivalent to a feed-forward network featuring no top-down connections." The architecture of simplified ART is shown below.


The search proceeds until a candidate meets the vigilance constraint or no more candidates are left. If none of the output nodes can encode the pattern, a new node is committed to the pattern.

ART 1 ALGORITHM

STEP 1: Create the initial prototype vector (PV) from the first example vector (EV): PV = EV.

STEP 2: Take the next example vector (EV).

STEP 3: Check how close EV is to the current PV. They are close if the proximity test holds:

||PV ∩ EV|| / (β + ||PV||) > ||EV|| / (β + d)

where β is a small positive constant, d is the dimension of the vectors, and ||·|| counts the number of ones in a binary vector.

STEP 4: If EV is not close to PV, check against the next PV.

STEP 5: If EV is close to no PV, then EV becomes a new PV.

STEP 6: If EV matches a PV, check it against the vigilance parameter ρ, that is:

||PV ∩ EV|| / ||EV|| ≥ ρ

STEP 7: If EV passes the vigilance test, add it to the cluster of PV.

STEP 8: Update PV as

PV = PV ∩ EV

STEP 9: Repeat from STEP 2 until all example vectors have been processed, then stop.
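The steps above can be sketched in Python for binary vectors (the function name and demo data are illustrative, not from the original):

```python
def art1_cluster(examples, rho=0.7, beta=1.0):
    """Sketch of the ART 1 steps above for binary example vectors (EVs).

    rho is the vigilance parameter, beta a small positive constant.
    Returns the learned prototype vectors (PVs) and, for each example,
    the index of the cluster it was assigned to.
    """
    d = len(examples[0])          # dimension of the vectors
    prototypes, labels = [], []
    for ev in examples:
        assigned = False
        for j, pv in enumerate(prototypes):
            inter = [p & e for p, e in zip(pv, ev)]   # PV ∩ EV
            # STEP 3 proximity test: ||PV ∩ EV|| / (β + ||PV||) > ||EV|| / (β + d)
            if sum(inter) / (beta + sum(pv)) > sum(ev) / (beta + d):
                # STEP 6 vigilance test: ||PV ∩ EV|| / ||EV|| >= ρ
                if sum(inter) / sum(ev) >= rho:
                    prototypes[j] = inter             # STEP 8: PV = PV ∩ EV
                    labels.append(j)
                    assigned = True
                    break
        if not assigned:
            prototypes.append(list(ev))               # STEP 5: commit a new PV
            labels.append(len(prototypes) - 1)
    return prototypes, labels

# Small demo: the first two vectors share a cluster, the third starts its own.
protos, labels = art1_cluster([[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]], rho=0.6)
# labels -> [0, 0, 1]; protos -> [[1, 1, 0, 0], [0, 0, 1, 1]]
```

Note that a prototype is only ever intersected with new members, so learned memories can narrow but never drift toward unrelated inputs; this is how the procedure addresses the stability-plasticity dilemma mentioned above.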

FUZZY ADAPTIVE RESONANCE THEORY

Fuzzy ART is an unsupervised learning method used for data clustering in the data mining field. It is formed by incorporating fuzzy logic into the ART modules. Fuzzy ART was developed by Gail Carpenter, Stephen Grossberg, and David Rosen in 1991; its supervised extension, fuzzy ARTMAP, followed in 1992.

Fuzzy Logic

It is a form of multi-valued logic derived from fuzzy set theory to deal with reasoning that is approximate rather than precise. In contrast with binary sets governed by binary (crisp) logic, fuzzy logic variables may have membership values other than just 0 or 1. Just as set-membership values in fuzzy set theory can range (inclusively) between 0 and 1, in fuzzy logic the degree of truth of a statement can range between 0 and 1 and is not constrained to the two truth values {true (1), false (0)} of classical propositional logic.

FL is a problem-solving control system methodology that lends itself to implementation in systems ranging from simple, small, embedded micro-controllers to large, networked, multi-channel PC or workstation-based data acquisition and control systems. It can be implemented in hardware, software, or a combination of both. FL provides a simple way to arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy, or missing input information. FL's approach to control problems mimics how a person would make decisions, only much faster.

HOW FL DIFFERENT FROM CONVENTIONAL METHODS?

FL incorporates a simple, rule-based "IF X AND Y THEN Z" approach to solving a control problem rather than attempting to model the system mathematically. The FL model is empirically based, relying on an operator's experience rather than their technical understanding of the system. For example, rather than dealing with temperature control in terms such as "SP = 500F", "T < 1000F", or "210C < TEMP < 220C", terms like "IF (process is too cool) AND (process is getting colder) THEN (add heat to the process)" are used. These terms are imprecise and yet very descriptive of what must actually happen.

HOW DOES FL WORK?

FL requires some numerical parameters in order to operate, such as what is considered a significant error and a significant rate-of-change-of-error, but exact values of these numbers are usually not critical unless very responsive performance is required, in which case empirical tuning would determine them. For example, a simple temperature control system could use a single temperature feedback sensor whose data is subtracted from the command signal to compute "error" and then time-differentiated to yield the error slope or rate-of-change-of-error, hereafter called "error-dot". Error might have units of degrees F, with a small error considered to be 2F and a large error 5F. "Error-dot" might then have units of degrees/min, with a small error-dot being 5F/min and a large one 15F/min. These values don't have to be symmetrical and can be "tweaked" once the system is operating in order to optimize performance. Generally, FL is so forgiving that the system will probably work the first time without any tweaking.

WHY USE FL?

FL offers several unique features that make it a particularly good choice for many control problems.

1) It is inherently robust since it does not require precise, noise-free inputs and can be programmed to fail safely if a feedback sensor quits or is destroyed. The output control is a smooth control function despite a wide range of input variations.

2) Since the FL controller processes user-defined rules governing the target control system, it can be modified and tweaked easily to improve or drastically alter system performance. New sensors can easily be incorporated into the system simply by generating appropriate governing rules.

3) FL is not limited to a few feedback inputs and one or two control outputs, nor is it necessary to measure or compute rate-of-change parameters in order for it to be implemented. Any sensor data that provides some indication of a system's actions and reactions is sufficient. This allows the sensors to be inexpensive and imprecise thus keeping the overall system cost and complexity low.

4) Because of the rule-based operation, any reasonable number of inputs can be processed (1-8 or more) and numerous outputs (1-4 or more) generated, although defining the rule base quickly becomes complex if too many inputs and outputs are chosen for a single implementation since rules defining their interrelations must also be defined. It would be better to break the control system into smaller chunks and use several smaller FL controllers distributed on the system, each with more limited responsibilities.

5) FL can control nonlinear systems that would be difficult or impossible to model mathematically. This opens doors for control systems that would normally be deemed unfeasible for automation.

LINGUISTIC VARIABLES

In 1973, Professor Lotfi Zadeh proposed the concept of linguistic or "fuzzy" variables. Think of them as linguistic objects or words, rather than numbers. The sensor input is a noun, e.g. "temperature", "displacement", "velocity", "flow", "pressure", etc. Since error is just the difference, it can be thought of the same way. The fuzzy variables themselves are adjectives that modify the variable (e.g. "large positive" error, "small positive" error, “zero" error, "small negative" error, and "large negative" error). As a minimum, one could simply have "positive", "zero", and "negative" variables for each of the parameters. Additional ranges such as "very large" and "very small" could also be added to extend the responsiveness to exceptional or very nonlinear conditions, but aren't necessary in a basic system.

HOW FL IS USED?

1) Define the control objectives and criteria: What am I trying to control? What do I have to do to control the system? What kind of response do I need? What are the possible (probable) system failure modes?

2) Determine the input and output relationships and choose a minimum number of variables for input to the FL engine (typically error and rate-of-change-of-error).

3) Using the rule-based structure of FL, break the control problem down into a series of IF X AND Y THEN Z rules that define the desired system output response for given system input conditions. The number and complexity of the rules depend on the number of input parameters to be processed and the number of fuzzy variables associated with each parameter. If possible, use at least one variable and its time derivative. Although it is possible to use a single, instantaneous error parameter without knowing its rate of change, this cripples the system's ability to minimize overshoot for step inputs.

4) Create FL membership functions that define the meaning (values) of Input/output terms used in the rules.

5) Create the necessary pre- and post-processing FL routines if implementing in S/W, otherwise program the rules into the FL H/W engine.

6) Test the system, evaluate the results, tune the rules and membership functions, and retest until satisfactory results are obtained.
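As a concrete illustration of step 4, membership functions for a linguistic variable are often simple triangles; the set names and ranges below are invented for the example:

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside [a, c],
    rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Linguistic variable "error" with three fuzzy sets over [-10, 10]:
error_sets = {
    "negative": lambda e: tri(e, -10.0, -5.0, 0.0),
    "zero":     lambda e: tri(e, -5.0, 0.0, 5.0),
    "positive": lambda e: tri(e, 0.0, 5.0, 10.0),
}
memberships = {name: f(2.0) for name, f in error_sets.items()}
# an error of 2.0 is partly "zero" (0.6) and partly "positive" (0.4)
```

An input can thus belong to more than one fuzzy set at once, with graded degrees, which is exactly what the rules in step 3 operate on.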

FUZZY ART ARCHITECTURE


The basic architecture of fuzzy ARTMAP is shown in Figure 2. It consists of a pair of fuzzy ART modules, ARTa and ARTb, connected by an associative learning network called a map field. The other component of this architecture is an internal controller that uses a minimax learning rule to conjointly minimize predictive error and maximize code compression, or predictive generalization. It enables the system to operate in real time and to determine the number of hidden units (or recognition categories) needed to meet the accuracy or matching standards. The 'hidden units' in ARTa represent learned recognition categories.

In the training phase, the ARTa and ARTb modules of the system are presented with a stream of input patterns ap and their desired output patterns bp, respectively. The two modules classify the ap and bp vectors into categories, and the map field forms the association between ARTa and ARTb categories. A mismatch between the actual and predicted bp triggers a mechanism called match tracking, which raises the ARTa vigilance by the minimum amount necessary to cause a memory search in ARTa. This can lead to the selection of a new ARTa category that is a better predictor of bp. Between learning trials, the vigilance relaxes back to its baseline value. Match tracking therefore sacrifices only the minimum amount of generalization needed to correct a predictive error. Fast learning and match tracking enable fuzzy ARTMAP to learn to predict novel events while maximizing code compression and preserving code stability.

During training, the weights of the F2 layer of nodes are created and updated as a result of learning. At the start of training there are no nodes in the F2 layer. The first input vector triggers the generation of the first category (F2 node), which represents the properties of that input vector. Each subsequent input vector is matched against the existing F2 categories. A new category is created only if no existing category can match the statistics of the input vector under the learning and vigilance parameters. When there is a match, the category satisfying the match function uses the current input vector to refine its weight vector so as to incorporate more general (spectral) characteristics for that category. This weight-refining process can be regarded as broadening the categorization. The matching process generates more categories and ensures that similarity in spectral space is transferred to and represented in each category's weight vector.

Fuzzy ARTMAP may use several categories to represent one class in order to capture the spectral variance in the inputs belonging to that class. Fuzzy ARTMAP categories thus represent both inter-class and intra-class variability.

WORKING PROCEDURE

The description of ART networks given below outlines the essential features of adaptive resonance theory. A minimal ART1 module consists of a two-level network of interconnecting neurons: a comparison level (F1) and a recognition level (F2). An input signal in the form of a binary vector (F0) presented to F1 is propagated forward to F2. F1 and F2 are connected by both feed-forward (F1 to F2) and feedback (F2 to F1) connections. Long-term memory is encoded in both these feed-forward and feedback connections. F2 nodes interact with each other by lateral inhibition. The result is a competitive winner-take-all response, producing an F2 activity pattern in which only the node associated with a single category is significantly activated.

The corresponding learned weight vector is then propagated backward to the F1 level, where it is compared with the original input vector. If the two patterns are close according to a matching criterion determined by an ART 'vigilance' parameter, 'resonance' occurs and long-term memory is altered to incorporate the new observation. If the two patterns differ significantly, the ART module enters a search mode. During this phase, the network attempts to find a new category node in the F2 layer for the current input vector A. The presently active node in F2 is disabled and a second category node is selected. The new weight vector is then propagated back to the F1 level and compared with the input vector as before. If the two match, resonance proceeds. If not, the second F2 node is disabled and another attempt is made to find a good match. The process repeats until eventually a matching F2 node is found or a new category is established.

The binary ART1 module described above is not an associative memory system. However, a minimal ART1 architecture embedded in a larger system can perform as an associative memory system.
Fuzzy ARTMAP incorporates fuzzy logic in its ART1 modules and has fuzzy set theoretic operations instead of ART1's binary set theoretic operations.
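The fuzzy replacement for binary intersection is the fuzzy AND (component-wise minimum), which gives category choice and match functions of the following form; alpha and the sample vectors below are illustrative, not taken from the original:

```python
def fuzzy_and(a, b):
    """Fuzzy AND: component-wise minimum, replacing binary intersection."""
    return [min(x, y) for x, y in zip(a, b)]

def choice(I, w, alpha=0.001):
    """Choice function T_j = |I ^ w_j| / (alpha + |w_j|),
    where |.| is the L1 norm (sum of components)."""
    return sum(fuzzy_and(I, w)) / (alpha + sum(w))

def match(I, w):
    """Match function |I ^ w_j| / |I|, compared against the vigilance rho."""
    return sum(fuzzy_and(I, w)) / sum(I)

I = [0.8, 0.2, 0.5]          # analog input
w = [0.6, 0.4, 0.5]          # category weight vector
# fuzzy_and(I, w) -> [0.6, 0.2, 0.5]; match(I, w) = 1.3 / 1.5, about 0.87
```

Because min reduces each component, these functions behave on binary vectors exactly like the set intersection used by ART 1, while extending naturally to analog inputs.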


ADVANTAGES

Ø It is a fast learning method

Ø It can solve the stability-plasticity dilemma

Ø It is a supervised learning method (in its ARTMAP form)

Ø It can also process vague inputs

DISADVANTAGES

Ø High complexity

Ø The cost of the network is high

APPLICATIONS

Ø It is used for signature verification

Ø It is used for image processing

Ø It is used for speech recognition

Ø It is used in genome classification

Ø It is used for medical diagnosis


check the link below for images

http://www.4shared.com/dir/WEkR8B4-/ADAPTIVE_RESONANCE_THEORY.html

check the link below for pdf files

http://cns.bu.edu/Profiles/Grossberg/CarGro2003HBTNN2.pdf

http://www.cns.bu.edu/Profiles/Grossberg/Gro1987CogSci.pdf

http://cns-web.bu.edu/Profiles/Grossberg/CarGro1987AppliedOptics.pdf

http://cns.bu.edu/Profiles/Grossberg/CarGroRos1991NNART2A.pdf

http://cns.bu.edu/Profiles/Grossberg/CarGro1990NN.pdf

http://cns.bu.edu/Profiles/Grossberg/CarGroRos1991NNFuzzyART.pdf

http://cns.bu.edu/Profiles/Grossberg/CarGroRey1991NN.pdf

http://cns.bu.edu/Profiles/Grossberg/CarGroMarRey1992IEEETransNN.pdf

http://medusa.sdsu.edu/Robotics/Neuromuscular/Articles/ATM_articles/fart.txt