Tuesday 22 May 2012

Confusion Matrix

Data sets collected over a period of time without clear boundaries are often difficult to understand and simulate. Results generated from them can be confusing when the data sets are not properly initialized, and the values predicted during data mining often do not match the actual (original) values. Based on the data collected, a prediction model can determine attributes. In the simplest case an attribute value is either true or false, so the model classifies the data into two categories: category one matches one attribute value and category two matches the other. The results are analyzed using observers' expertise, historical data and simulation results. Tools supporting data prediction models are widely used by the research community. Complex data sets can define more than two attributes, which can refine the prediction model. For example, when dealing with patient data, a patient can be anaemic or non-anaemic. A correlation model can then relate anaemia to depression: an anaemic patient with or without depression, and vice versa. A query using the Pearson coefficient [1] to determine the correlation between two attributes:

-- Note: Group is a reserved word in SQL; quote it as your dialect requires.
SELECT
    Group1, Group2,
    -- Pearson coefficient computed from the sums gathered in step1
    ((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
    n
FROM
    (SELECT
        n1.Group AS Group1,
        n2.Group AS Group2,
        SUM(n1.depression) AS sum1,
        SUM(n2.depression) AS sum2,
        SUM(n1.depression * n1.depression) AS sum1sq,
        SUM(n2.depression * n2.depression) AS sum2sq,
        SUM(n1.depression * n2.depression) AS psum,
        COUNT(*) AS n
    FROM
        testdata AS n1
    LEFT JOIN
        testdata AS n2
    ON
        n1.anaemic = n2.nonanaemic
    WHERE
        n1.Group > n2.Group
    GROUP BY
        n1.Group, n2.Group) AS step1
ORDER BY
    r DESC,
    n DESC
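
To check the arithmetic of the query outside the database, the same sums can be computed on two plain lists. This is only a minimal Python sketch: the function name pearson_r and the sample values are hypothetical and not part of the original query, and the variable names simply mirror the column aliases used above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson coefficient from the same running sums the query uses."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("both lists must have the same non-zero length")

    sum1 = sum(xs)
    sum2 = sum(ys)
    sum1sq = sum(x * x for x in xs)
    sum2sq = sum(y * y for y in ys)
    psum = sum(x * y for x, y in zip(xs, ys))

    numerator = psum - (sum1 * sum2 / n)
    denominator_sq = (sum1sq - sum1 ** 2 / n) * (sum2sq - sum2 ** 2 / n)
    if denominator_sq <= 0:
        # One of the lists has no variance, so r is undefined.
        return None
    return numerator / sqrt(denominator_sq)


# Hypothetical depression scores for two groups of patients.
group_a = [1, 2, 3, 4]
group_b = [2, 4, 6, 8]
print(pearson_r(group_a, group_b))
```

A value close to 1 or -1 indicates a strong linear relationship, while None signals that one list is constant and the coefficient cannot be computed.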

[1] http://www.vanheusden.com/misc/pearson.php
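
The two-category (true/false) prediction described at the start of this post is usually summarized in a confusion matrix, which counts how often the predicted value agrees with the actual value. The following Python sketch shows one way to build it. The function names and the sample anaemic/non-anaemic flags are made up for illustration and do not come from any real patient data.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1   # predicted true, actually true
        elif p:
            counts["FP"] += 1   # predicted true, actually false
        elif a:
            counts["FN"] += 1   # predicted false, actually true
        else:
            counts["TN"] += 1   # predicted false, actually false
    return counts


def accuracy(counts):
    """Share of predictions that match the actual values."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0


# Hypothetical flags: True = anaemic, False = non-anaemic.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

matrix = confusion_matrix(actual, predicted)
print(matrix)
print(accuracy(matrix))
```

The off-diagonal counts (FP and FN) are the cases where the predicted values do not match the original values.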

Sunday 1 April 2012

SQL Based Ontology Applications

The Knowledge Management (KM) and Data Mining (DM) capabilities of MS SQL Server form a decision support system for business and medical applications. SQL is the standard for relational database applications, and OWL is the standard for ontology-based applications. Researchers have adopted various approaches for ontology-based database applications. Mapping and transformation are very common integration models for interfacing between database and ontology-based applications [1]. OWL uses XSD data types such as string, integer, float, boolean, time and date. Transformation is in effect the conversion of data types from XSD to SQL. Mapping requires knowledge of data types and object properties. For specific applications, the researcher or user can pre-define a set of rules for mapping or transformation. An ontology combines such rules with an existing database to create a smart decision support system that enhances the conventional relational database management system [2].
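
A pre-defined rule set for the transformation step could be as simple as a lookup table from XSD types to SQL column types. The sketch below is an assumption for illustration only. The chosen SQL Server types, the function names and the Patient example are hypothetical and are not taken from [1] or [2]; a real application would pick types and sizes to suit its data.

```python
# Hypothetical transformation rules: XSD data type -> SQL Server column type.
XSD_TO_SQL = {
    "xsd:string": "NVARCHAR(MAX)",
    "xsd:integer": "INT",
    "xsd:float": "REAL",
    "xsd:boolean": "BIT",
    "xsd:time": "TIME",
    "xsd:date": "DATE",
}


def column_definition(property_name, xsd_type):
    """Turn one datatype property into a SQL column definition."""
    try:
        sql_type = XSD_TO_SQL[xsd_type]
    except KeyError:
        raise ValueError(f"no transformation rule for {xsd_type}") from None
    return f"[{property_name}] {sql_type}"


def create_table_statement(class_name, properties):
    """Map a class and its (name, xsd_type) properties to CREATE TABLE."""
    columns = ",\n    ".join(
        column_definition(name, xsd_type) for name, xsd_type in properties
    )
    return f"CREATE TABLE [{class_name}] (\n    {columns}\n);"


# Hypothetical ontology class with two datatype properties.
print(create_table_statement(
    "Patient",
    [("name", "xsd:string"), ("anaemic", "xsd:boolean")],
))
```

Keeping the rules in one table makes it easy for the researcher or user to adjust them per application without touching the generation logic.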

References:
1- Irina Astrova et al., "Storing OWL Ontologies in SQL Relational Databases", in Proceedings of World Academy of Science, Engineering and Technology 29, 2007.

2- Peter Mika et al., "Towards a New Synthesis of Ontology, Technology and Knowledge Management", Technical Report IR-BI-001, 2004.