Microbiological Environmental Data Analysis
(Chesapeake Bay area)
Object of Investigation
A bacteriological database was collected over several
years by the EPA and given to George Mason University, VA
(Department of Biology) for analysis. The EPA presumed that the data
might reveal the causes of the high bacteria concentrations and
bacterial production rates observed during certain periods. The
database contained measurements of biochemical oxygen demand,
chlorophyll concentration, particulate organic carbon, and several other parameters.
A pilot project was initiated by Professor Robert
Jonas (GMU) and Dr. N. N. Lyashenko to analyze part of the Chesapeake
Bay data set with Dr. Lyashenko's Knowledge Extraction Technology (KET).
The pilot project had the following goals:
· To assess the potential predictability of the Bacterial Abundance and Production variables.
· To evaluate the quality of the database in relation to that prediction task.
· To identify an adequate processing strategy among the possible KET strategies.
· To classify the database as essentially dynamic or static with respect to the possibility of predicting Bacterial Abundance and Production.
· To estimate the potential for optimizing the data collection process.
· To formulate recommendations for more substantial research in the future.
Before Dr. Jonas and Dr. Lyashenko undertook the project,
many researchers had analyzed the data set with conventional
statistical methods. The results were controversial, and three types of
difficulty were identified: the data were noisy and contained many missing
values; the database was mixed, i.e., it consisted of both numerical and
qualitative parameters; and the underlying numerical dependencies were
certainly nonlinear and dynamic.
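The difficulties listed above can be illustrated with a minimal sketch. It is not the KET method, which is not described here. It is a generic illustration under stated assumptions: the data are synthetic toy values (not Chesapeake Bay measurements), the function names are hypothetical, and mutual information is used only as a familiar stand-in for a dependence measure that tolerates nonlinear relationships, qualitative variables, and missing values (here simply skipped).

```python
import math
from collections import Counter


def complete_pairs(xs, ys):
    """Keep only the observations where both values are present (not None)."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def pearson(xs, ys):
    """Pearson correlation over complete pairs; it captures linear dependence only."""
    pairs = complete_pairs(xs, ys)
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete or categorical sequences."""
    pairs = complete_pairs(xs, ys)
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


# Synthetic toy values: y = x**2, plus two incomplete records that are skipped.
x = [-3, -2, -1, 0, 1, 2, 3, None, 2]
y = [9, 4, 1, 0, 1, 4, 9, 5, None]

r_linear = pearson(x, y)            # linear measure misses the dependence
mi_bits = mutual_information(x, y)  # information measure detects it

# A qualitative variable needs no numeric coding for the information measure.
season = ["spring", "spring", "summer", "summer", "fall", "fall", None]
level = ["high", "high", "low", "low", "low", "low", "high"]
mi_season = mutual_information(season, level)

print(f"Pearson r = {r_linear:.3f}")
print(f"MI(x, y) = {mi_bits:.3f} bits")
print(f"MI(season, level) = {mi_season:.3f} bits")
```

In this toy case the linear correlation vanishes even though y is fully determined by x, which is the kind of situation in which conventional linear statistics can give inconclusive results.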
The Bacterial Abundance and Production variables were
predicted. In the static model, the first variable was predicted with
99.7% accuracy and the second with 81% accuracy. The major results were as follows:
· Nonlinear predictors were essentially the most accurate.
· Chesapeake Bay Ecological System was essentially
· The dynamics of the system were essentially nonlinear.
· Most unstable processes occurred in May.
· Nearly all unstable areas were located in the western part of the river.
As a result of the KET analysis, considerable reductions
in, and relocation of, measurement activities in the area were recommended
without loss of prediction accuracy. The predictors made it possible to create a
dynamic model that can be used to evaluate different cleaning procedures
in the future.
How It Was Done
An Information Analysis Module from the KET Tool
Kit was used to identify the informative variables from which predictors
were constructed. Analytical descriptors were used to obtain the
KET analysis and directed the construction of an optimal
data collection map identifying the prime locations and
schedules for measuring pollutant concentrations. KET Logic Descriptors
could identify the type, character, and quantity of unstable