Brief description of KET technology
Knowledge Extraction Technology (KET) is a set of methods and tools for finding dependence patterns in data (knowledge discovery in data). It was created by Dr. N. N. Liachenko (N. N. Lyashenko) in the U.S.S.R. in the early 1980s and considerably upgraded afterwards in the U.S.A. The efficiency of KET was demonstrated on large, complex databases. In many projects, the databases were contaminated, contained a mixture of quantitative and qualitative fields and a considerable number of missing values, and no theoretical knowledge of the underlying relations between variables was available. It was mostly these large and difficult tasks that motivated the creation of KET.
KET is a methodology and a tool kit that can read a client's database and find dependences between parts of the data. For example, it can read patient histories and known outcomes (consequences) and create a diagnostic rule; it can predict the future cost of care (a usual concern of insurers); it can derive a logic of treatment for a certain type of disease from a large number of example cases and create a "best practice chart" (or a software tool for decision support). It can also derive prevention rules for security systems or industrial production lines, derive optimal policies for the insurance business, and so on. All of this assumes that a sufficient set of examples is available.
The following questions motivated the creation of KET:

How to avoid assumptions about the underlying relationship between variables in a database?

How to identify an adequate model?

How to distinguish between the influence of data quality and that of model selection on the results of analysis?

How to combine quantitative, qualitative and even "anecdotal" data into one consistent analysis?

How to incorporate expert knowledge and general semantic hierarchies into the analytical process along with structured data?

How to provide an automatic choice among known methods, when they are applicable, and to build a procedure that can apply them in conjunction with "internal" KET procedures?
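KET's own procedures for combining differently scaled data are not published here. As a simple, generic illustration (not KET's actual algorithm) of how quantitative fields can enter one analysis without assuming any functional form of the relationship, a rank statistic measures dependence in a way that is invariant under any monotone rescaling of a variable:

```python
# Generic illustration (not KET's algorithm): a scale-invariant
# dependence measure.  Replacing a variable by any monotone transform
# of itself leaves the statistic unchanged, so no model of the
# relationship between the variables has to be assumed in advance.

def ranks(xs):
    """Average 1-based ranks; tied values share the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Rank correlation: depends only on the ordering of the data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Cubing x is a monotone rescaling, so the statistic is unchanged.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]
assert abs(spearman(x, y) - spearman([v ** 3 for v in x], y)) < 1e-12
```

The same ordering-only view extends naturally to qualitative ordinal fields, which is one reason rank methods are a common first step for mixed-scale data.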
These questions are unavoidable whenever either conventional statistical methods or fashionable AI approaches (such as Artificial Neural Nets, Genetic Algorithms or Support Vector Machines) are applied. The major flaw of well-known knowledge discovery (KD) methods is the necessity to make assumptions about the knowledge to be discovered. Although it is not possible to create a KD method with no inductive bias, it is possible to build a consistent strategy that radically minimizes assumptions. Such a strategy is labeled here the "Model-Free™" approach. Of course, the agenda is not to avoid models; it is a tool that builds models that are not assumed, but empirically motivated.
Below we summarize some features of KET which are especially useful in the analysis of big and complicated data sets.
KET is a "Model-Free™" technology that creates an adequate model without requiring any prior knowledge of relations or a predetermined class of models, and it is not restricted to any type of analytical structure. In particular, KET consistently separates the steps of analysis focused on scale-invariant and scale-dependent features of data.
KET employs a purely semantic approach to the input of analysis. We did not say "data sources", because KET is capable of utilizing semantic nets, conceptual frame systems, knowledge bases and the results of knowledge acquisition from experts, including free-text memos.
KET will perform with any
combination of quantitative and qualitative scales.
KET will identify essential
variables before constructing a model.
KET will evaluate the potential accuracy of a future data descriptor (predictor, classification rule, etc.) before it is constructed.
KET will deliver the informationally optimal data descriptor under given complexity constraints. In this sense, we can talk about informationally optimal data mining.
KET possesses a meta-decision capability, i.e., it can be used as a tool that selects KET-specific or known tools.
Many KET procedures are based on the search for invariants in data and on the use of these invariants. In particular, KET systematically distinguishes between models (in some invariant form) and their numerous equivalent representations. For example, KET can quickly generate a pattern and then transform it into an equivalent, already trained neural net instead of using traditional ANN training algorithms.
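The idea of writing down an "already trained" net can be sketched with a generic textbook construction (this is not KET's code): once a pattern is known as a logic formula, an equivalent threshold network can be built directly, with weights chosen by hand rather than learned iteratively.

```python
# Generic construction, not KET internals: the pattern
# XOR(a, b) = (a OR b) AND NOT (a AND b) written directly as a
# 2-2-1 threshold network.  No training loop is ever run; the
# weights encode the already-known formula.

def step(v):
    """Threshold activation."""
    return 1 if v >= 0 else 0

def xor_net(a, b):
    h1 = step(a + b - 1)          # hidden unit 1: fires for OR
    h2 = step(a + b - 2)          # hidden unit 2: fires for AND
    return step(h1 - 2 * h2 - 1)  # output: h1 AND NOT h2

for a in (0, 1):
    for b in (0, 1):
        assert xor_net(a, b) == (a ^ b)
```

The network is exact by construction, which illustrates why treating a net as a representation, rather than as the product of a training algorithm, can be much cheaper.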
The KET internal Self-Referencing Engine is an important part of the tool kit. The Engine builds internal self-referencing structures, self-evaluates and continuously self-improves. The technology can optionally create "Agents", or kernels of expert systems.
For details and the theoretical basis of these features, see the "White Paper" page on this site.
When comparing KET to other AI technologies, we have to take into account two aspects. On the one hand, KET tools incorporate a unique combination of features that distinguishes them from other tools (see the previous section). On the other hand, it is not necessary to consider KET a rival to other AI methodologies and software systems. KET can be used as a meta-tool to facilitate the use of other technologies, to enhance them, and to cover issues that are not sufficiently addressed by other tools. From this "cooperative" point of view, KET addresses the following issues.
There are about 400 software tools for data analysis on the U.S. market, so the right choice is itself becoming an issue. It would be very attractive to have a tool that selects tools on an "early detection" basis (i.e., not after trying each of them). KET makes a step in this direction.
There is a general confusion in KDD practice and literature among the concepts of "pattern of dependence", "representation of a pattern" and "pattern finder". For example, it is well known that tools creating reasonably compact artificial neural nets are very slow to train. On the other hand, it is also known that Boolean nets can be converted into equivalent Boolean logic formulas, decision trees, production rules, finite automata, neural nets and other forms, and vice versa. The training process depends considerably on this conversion. Therefore, it is conceptually more productive to view neural nets simply as one form of representing a pattern, not as a method of finding patterns. A decision to use neural nets under this paradigm does not imply that one of the known net training processes should be used to find the patterns. KET consistently distinguishes between the deployment form of a pattern and the efficient algorithms that build the pattern.
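The equivalence of representations mentioned above can be made concrete with a small generic example (again, a textbook construction rather than KET internals): the same pattern stored as a decision tree and, after mechanical conversion, as a set of production rules, with both forms classifying identically.

```python
# Generic illustration: one pattern, two equivalent representations.
# The tree is a nested dict; rules are (conditions, outcome) pairs.
# Attribute and outcome names here are hypothetical.

tree = {"attr": "fever",
        1: {"attr": "cough", 1: "flu", 0: "cold"},
        0: "healthy"}

def to_rules(node, conditions=()):
    """Flatten a binary decision tree into production rules."""
    if not isinstance(node, dict):
        return [(conditions, node)]
    rules = []
    for value in (0, 1):
        rules += to_rules(node[value],
                          conditions + ((node["attr"], value),))
    return rules

def classify_tree(node, case):
    while isinstance(node, dict):
        node = node[case[node["attr"]]]
    return node

def classify_rules(rules, case):
    for conds, outcome in rules:
        if all(case[a] == v for a, v in conds):
            return outcome

rules = to_rules(tree)
case = {"fever": 1, "cough": 0}
assert classify_tree(tree, case) == classify_rules(rules, case) == "cold"
```

Which representation to deploy (tree, rules, net, formula) is then a separate question from which algorithm found the pattern, which is exactly the distinction drawn in the paragraph above.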
The meta-capabilities of KET and the representational part of its software, along with invariant approaches to model search, have allowed progress on the issues above. As a result, KET is capable of employing direct approaches to model identification and of representing the descriptor in a form that meets user criteria (other than accuracy and complexity).
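Delivering a descriptor "in a form that meets user criteria" can be pictured with a minimal sketch, assuming a hypothetical discovered rule and a hypothetical code generator (neither is KET's actual interface): the same pattern is held abstractly and emitted in whatever deployment form the user's pipeline needs, here as C source text.

```python
# Hypothetical sketch: a discovered threshold rule held in an abstract
# form and emitted as C source.  The rule and the generator are
# invented for illustration only.

rule = {"variable": "temperature", "threshold": 38.0, "outcome": "fever"}

def emit_c(rule):
    """Render the abstract rule as a self-contained C function."""
    return (f"int is_{rule['outcome']}(double {rule['variable']}) {{\n"
            f"    return {rule['variable']} >= {rule['threshold']};\n"
            f"}}\n")

print(emit_c(rule))
```

The point of the sketch is only the separation of concerns: the pattern itself is one object, and C code, a neural net, or an executable are interchangeable renderings of it.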
For comparative details, see the "White Paper" page on this site.
Reviewing recently published research results, we conclude that many issues that KET considers important are currently receiving attention from the AI community. What distinguishes KET from this point of view is its integrated nature, which allows it to approach the problem in a unified conceptual framework, while the contemporary literature, for all its interesting achievements, creates the impression of a very fragmented picture with disparate topics and groups of methods. In KET, the unified framework enhances performance, because links between many potential approaches become visible and can therefore be exploited to the advantage of the analysis.
The software implementation of KET consists of 800+ modules, each designed to support a particular step in KET processing. Because of their functional autonomy, the modules can be used not only in applications built on the KET ideology, but also as add-ons to enhance many other applications. (See more about software tools in the "Tools" section of this site.)
As a result of these differences, the tool kit has often been used for projects that others had attempted and failed to complete. In many cases, therefore, the role of KET was that of a troubleshooter in data analysis.
Since the 1980s, Dr. Liachenko has tested his technology on various applications in the U.S.S.R., the U.S.A. and Canada. The KET Tool Kit has been used to discover and resolve problems in various applications (for the World Health Organization, the EPA, General Electric, etc.) and has operated successfully on various platforms.
KET data mining features provide efficient solutions for prediction, classification, selection of essential variables, creation of simulation models, and design of control systems. Based on these features, KET identifies adequate (and sometimes unexpected) models and automatically represents them in a form convenient for the user (e.g., as an analytical structure, C++ code, an executable for a given platform, a neural net, etc.).
For methodological details, for services provided by KET, LLC, for examples of projects accomplished in the past, and for information about software implementations and tools, see the corresponding pages on this site.
New feature !!
A new KET module was recently introduced to support interaction with the "Mathematica 8" system.
KET, LLC has joined Content Group, Inc. in an initiative to use AI agents to facilitate the work of physicians.