
Brief description of KET technology

Knowledge Extraction Technology (KET) is a set of methods and tools for finding dependence patterns in data (knowledge discovery in data). It was created by Dr. N. N. Liachenko (N. N. Lyashenko) in the U.S.S.R. in the early 1980s and considerably upgraded afterwards in the U.S.A. The efficiency of KET was demonstrated on large, complex databases. In many projects, the databases were contaminated, contained a mixture of quantitative and qualitative fields and a considerable amount of missing values, and no theoretical knowledge of the underlying relations between variables was available. It was mostly these large and difficult tasks that motivated the creation of KET.

Methodology

KET is a methodology and a tool kit that can read a client's database and find dependence between parts of the data. For example, it can read patient histories and known outcomes (consequences) and create a diagnostic rule, or predict the future cost of care (a usual concern of insurers); it can derive the logic of treatment for a certain type of disease from a large number of example cases and create a "best practice chart" (or a software tool for decision support). It can also derive prevention rules for security systems or industrial production lines, derive optimal policies for the insurance business, etc. All of this assumes that a sufficient set of examples is available.
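To make the first example concrete, here is a minimal sketch of the kind of task described: deriving a diagnostic rule from example cases with known outcomes. KET's own algorithms are proprietary and are not shown here; a standard decision-tree learner from scikit-learn stands in for them, and all field names and values are hypothetical.

# A generic illustration of the task described above: deriving a
# diagnostic rule from example cases with known outcomes. KET's own
# algorithms are proprietary; a standard decision-tree learner is used
# here purely as a stand-in. Field names and data are hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

# Example cases: (age, systolic_bp, smoker), each with a known outcome.
cases = [
    [45, 130, 0],
    [62, 160, 1],
    [38, 118, 0],
    [70, 172, 1],
    [55, 150, 1],
    [41, 125, 0],
]
outcomes = [0, 1, 0, 1, 1, 0]  # 1 = adverse outcome observed

# "Find dependence between parts of the data": fit a rule that maps
# patient fields to the outcome.
rule = DecisionTreeClassifier(max_depth=2).fit(cases, outcomes)

# The learned dependence pattern, printed as human-readable logic --
# the kind of artifact a "best practice chart" could be built from.
print(export_text(rule, feature_names=["age", "systolic_bp", "smoker"]))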

Six principal questions motivated the creation of KET:

• How to avoid assumptions about the underlying relationships between variables in a database?

• How to identify an adequate model automatically?

• How to distinguish between the influence of data quality and the influence of model selection on the results of analysis?

• How to combine quantitative, qualitative and even "anecdotal" data into one consistent analysis procedure?

• How to incorporate expert knowledge and general semantic hierarchies into the analytical process along with structured data?

• How to provide an automatic choice of known methods, when they are applicable, and to build a procedure that can apply them in conjunction with "internal" KET utilities?

These questions are unavoidable when either conventional statistical methods or fashionable AI approaches (such as Artificial Neural Nets, Genetic Algorithms or Support Vector Machines) are applied. The major flaw of well-known knowledge discovery (KD) methods is the necessity to make strong assumptions about the knowledge to be discovered. Although it is not possible to create a KD method with no inductive bias, it is possible to build a consistent strategy that radically minimizes assumptions. Such a strategy is labeled here the "Model-Free™" approach. Of course, the agenda is not to avoid models altogether; it is to build models that are not assumed in advance but empirically motivated.

Important Features

Below we summarize some features of KET which are especially useful in the analysis of large and complicated data sets.

• KET is a "Model-Free™" technology that creates an adequate model without requiring any prior knowledge of relations or a predetermined class of models, and is not restricted by any type of analytical structure. In particular, KET consistently separates steps of analysis focused on scale-invariant and scale-dependent features of data.

• KET employs a purely semantic approach to the input of analysis. We did not say "data sources", because KET can utilize structured data, semantic nets, conceptual frame systems, knowledge bases and the results of knowledge acquisition from experts, including free-text memos in restricted languages.

• KET will perform with any combination of quantitative and qualitative scales.

• KET will identify essential variables before constructing a model (see the sketch after this list).

• KET will evaluate the potential accuracy of a future data descriptor (predictor, classification rule, etc.) before it is constructed.

• KET will deliver the informationally optimal data descriptor under given complexity constraints. In this sense, we can talk about an informationally exhaustive approach.

• KET possesses meta-decision capability, i.e., it can be used as a tool that selects KET-specific or known tools.

• Many KET procedures are based on the search for invariants in data and on the use of these invariants. In particular, KET systematically distinguishes between models (in some invariant form) and their numerous equivalent representations. For example, KET can quickly generate a pattern and then transform it into an equivalent, already-trained neural net, instead of using traditional ANN training algorithms.

• The KET internal Self-Referencing Engine is an important part of the tool kit. The Engine builds internal self-referencing structures, self-evaluates and continuously self-improves. The technology can optionally create "Self-Modifying Intelligent Agents", or kernels of expert systems.
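As a minimal illustration of two of the features above (handling mixed quantitative/qualitative scales, and identifying essential variables before any model is constructed), the following sketch screens variables with a mutual-information criterion. The criterion is a generic stand-in; KET's actual screening procedure is not public, and the data here are synthetic.

# A minimal sketch, not KET itself: screen mixed-type variables for
# relevance *before* building any predictive model, using mutual
# information as a stand-in criterion.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 500

# One quantitative field, one qualitative field (coded as integers),
# and one pure-noise field; the target depends only on the first two.
x_quant = rng.normal(size=n)                     # e.g., a lab value
x_qual = rng.integers(0, 3, size=n)              # e.g., a category code
x_noise = rng.normal(size=n)                     # irrelevant variable
y = ((x_quant > 0) & (x_qual == 2)).astype(int)  # hidden dependence

X = np.column_stack([x_quant, x_qual, x_noise])

# Score each variable's relevance without fitting a model;
# discrete_features marks the qualitative column.
scores = mutual_info_classif(X, y, discrete_features=[1], random_state=0)
for name, s in zip(["x_quant", "x_qual", "x_noise"], scores):
    print(f"{name}: {s:.3f}")  # the noise variable should score near zero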

For details and the theoretical basis of these features, see the "White Paper" page on this site.

Comparison

When comparing KET to other AI technologies, we have to take two aspects into account. On the one hand, KET tools incorporate a unique combination of features that distinguishes them from other tools (see the previous section). On the other hand, it is not necessary to consider KET a rival to other AI methodologies and software systems.

KET can be used as a meta-tool to facilitate the use of other technologies, to enhance them and to cover issues that were not sufficiently addressed by other tools.

From this "cooperative" point of view, KET addresses the following issues.

• There are about 400 software tools for data analysis on the U.S. market. The right choice is itself becoming an issue. It would be very attractive to have a tool that selects tools on some "early detection" basis (i.e., not after trying each of them). KET makes a step in this direction.

• There is a general confusion in KDD practice and literature between the concepts of a "pattern of dependence", a "representation of a pattern" and "pattern finders". For example, it is well known that tools creating reasonably compact artificial neural nets are very slow to train. On the other hand, it is also known that Boolean nets can be converted into equivalent Boolean logic formulas, decision trees, production rules, finite automata, neural nets and other forms, and vice versa. The training process depends considerably on this conversion. Therefore, it is conceptually more productive to view neural nets simply as one form of representation of a pattern, not as a method of finding patterns. A decision to use neural nets under this paradigm does not imply that one of the known net training processes should be used for finding patterns. KET consistently distinguishes between the deployment form of a pattern and efficient algorithms that build the pattern. (A sketch of this distinction follows.)
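The following sketch illustrates the distinction under simplified assumptions: a pattern found by some other means (here, the hand-picked Boolean rule (a AND b) OR c) is converted directly into an equivalent neural net with hand-set weights, so the net arrives "already trained" and no training algorithm ever runs. This illustrates the general idea only, not KET's actual conversion procedure.

# A pattern is *converted* into an equivalent neural net -- no training.
# The pattern here is the Boolean rule (a AND b) OR c, chosen for brevity.
import numpy as np

def step(z):
    """Hard-threshold activation: 1 if z >= 0, else 0."""
    return (z >= 0).astype(int)

# Hidden layer: one unit computes (a AND b), one passes c through.
# Weights are written down directly from the Boolean formula.
W1 = np.array([[1.0, 0.0],   # a feeds the AND unit
               [1.0, 0.0],   # b feeds the AND unit
               [0.0, 1.0]])  # c feeds the pass-through unit
b1 = np.array([-1.5, -0.5])  # AND fires only when a + b >= 1.5 (both 1)

# Output layer: OR of the two hidden units.
W2 = np.array([1.0, 1.0])
b2 = -0.5                    # OR fires when at least one input is 1

def net(x):
    return step(step(x @ W1 + b1) @ W2 + b2)

# The converted net reproduces the rule on every input combination.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            x = np.array([a, b, c], dtype=float)
            assert net(x) == ((a and b) or c)
print("net is equivalent to (a AND b) OR c on all 8 inputs")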

 

The meta-capabilities of KET and the representational part of its software, along with invariant approaches to model search, have made it possible to achieve progress in addressing the issues above. As a result, KET is capable of employing direct approaches to model identification and of representing the descriptor in a form that meets user criteria (other than accuracy and complexity).

 

For comparative details, see the "White Paper" page on this site. Reviewing recently published research results, we conclude that many issues that KET considers important are currently getting attention from the AI community. What distinguishes KET from this point of view is its integrated nature, which allows one to approach the problem in a unified conceptual framework, while the contemporary literature, with all its interesting achievements, creates the impression of a very fragmented picture with a set of disparate topics and groups of methods. In KET, the unified framework enhances performance, because links between many potential approaches become visible and can therefore be exploited to the advantage of the analysis.

Implementation

The software implementation of KET consists of 800+ modules. Each module is designed to support a particular step in KET processing. Because of their functional autonomy, the modules can be used not only in applications built on the KET ideology, but also as add-ons to enhance many other applications. (See more about software tools in the "Tools" section of this site.)

Applications

As a result of these differences, the tool kit was often used for projects that others had attempted and failed at. The role of KET in many cases was therefore that of a troubleshooter in data analysis.

Since the 1980s, Dr. Liachenko has tested his technology on various applications in the U.S.S.R., U.S.A. and Canada. The KET Tool Kit has been used to discover and resolve problems in various applications (for the World Health Organization, EPA, General Electric, etc.) and has operated successfully on various platforms.

KET Data Mining features provide efficient solutions for prediction, classification, selection of essential variables, creation of simulation models, and design of control and early-warning systems.

Based on these features, KET identifies adequate (and sometimes unexpected) models and automatically represents them in a form convenient for the user (e.g., as an analytical structure, C++ code, an executable for a given platform, a neural net, etc.).
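As a toy illustration of the last point, the sketch below emits an identified pattern (a hypothetical single-threshold rule) as C++ source code. Real KET output formats are not documented publicly, so the pattern, field name and function shown are all assumed for illustration only.

# A toy sketch of the "convenient form" idea: once a pattern is
# identified, it can be emitted as source code for deployment. The
# pattern and field names are hypothetical; real KET output formats
# are not documented publicly.

def rule_to_cpp(feature: str, threshold: float, fn_name: str = "predict") -> str:
    """Render a single-threshold classification rule as a C++ function."""
    return (
        f"// Auto-generated from an identified pattern.\n"
        f"bool {fn_name}(double {feature}) {{\n"
        f"    return {feature} > {threshold};\n"
        f"}}\n"
    )

print(rule_to_cpp("systolic_bp", 140.0))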

Links

For methodological details see "White Paper".

For services provided by KET, LLC, review "Services".

For examples of some projects accomplished in the past see “Projects”.

For information about software implementations and tools see "Tools".


   
New feature!

A new KET module was recently introduced to support interaction with the "Mathematica 8" system.

 

News!

KET, LLC has joined BioMed Content Group, Inc. in an initiative to use AI agents to facilitate the work of physicians and educators.

Copyright 2002-2006, Knowledge Extraction Tools, LLC. All rights reserved