Vous n'êtes pas identifié(e).

#1 14/12/2017 17:47:13

Sotos
Membre

Procédure Stockée d'Analyse Statistique en SQL

Bonjour à tous,

Je développe en ce moment une SP d'analyse statistique univariée quantitative.
En clair, une sous-partie de la PROC SAS UNIVARIATE en rajoutant d'autres métriques stat.

  • Partie 1:
    Une table résultat TEMP TABLE qui s'appelle __UNIVARIATE__

    Drop Table If Exists __UNIVARIATE__ Cascade ;
    Create Temporary Table __UNIVARIATE__
    (
        "TimeStamp"    TIMESTAMP( 6) WITH TIME ZONE ,
        "DatabaseName" VARCHAR( 63) ,
        "SchemaName"   VARCHAR( 63) ,
        "TableName"    VARCHAR( 63) ,
        "ColumnName"   VARCHAR( 63) ,
    
        "N w Null"  BIGINT ,
        "N"         BIGINT ,
        "1/N"       FLOAT ,
        "N-1"       BIGINT ,
    
        "Min"  FLOAT ,
        "Max"  FLOAT ,
    
        "Avg"     FLOAT ,
        "Avgx 2"  FLOAT ,
        "Avgx 3"  FLOAT ,
        "Avgx 4"  FLOAT ,
    
        "Sum"     FLOAT ,
        "Sum x2"  FLOAT ,
        "Sum x3"  FLOAT ,
        "Sum x4"  FLOAT ,
    
        "Uniq" BIGINT ,
        "0"    BIGINT ,
        "POS"  BIGINT ,
        "NEG"  BIGINT ,
        "Null" BIGINT ,
    
        "Geometric Mean" FLOAT ,
        "Harmonic Mean"  FLOAT ,
    
        "Range" FLOAT ,
        "Mid-range" FLOAT ,
    
        "Kurtosis" FLOAT ,
        "Skewness" FLOAT ,
    
        "Perc Uniq"        FLOAT ,
        "Perc Uniq w Null" FLOAT ,
        "Perc 0"           FLOAT ,
        "Perc POS"         FLOAT ,
        "Perc NEG"         FLOAT ,
        "Perc Null"        FLOAT ,
    
        "Corrected Sum of Squares" FLOAT ,
        "Student Test Statistic"   FLOAT ,
    
        "Standard Deviation Population" FLOAT ,
        "Standard Deviation Sample"     FLOAT ,
        "Variance Population"           FLOAT ,
        "Variance Sample"               FLOAT ,
    
        "Coefficient Variation"         FLOAT ,
    
        "Sturges" FLOAT ,
        "Rice"    FLOAT ,
        "Sqrt"    FLOAT ,
        "Doane"   FLOAT ,
        "Scott"   FLOAT
    )
    On Commit Preserve Rows
    ;
  • Partie 2:
    Une vue dynamique basée sur l'objet système:  information_schema.columns qui génère toutes les requêtes à exécuter

    Drop View If Exists stat.__QBUILDER_UNIVARIATE__ Cascade ;
    Create Or Replace View stat.__QBUILDER_UNIVARIATE__
    AS
    Select
          table_catalog AS "DatabaseName"
        , table_schema  AS "SchemaName"
        , table_name    AS "TableName"
        , ROW_NUMBER() Over ( Partition By table_catalog, table_schema, table_name Order By table_name) AS RN
        , ' Select ' ||
                'Current_Timestamp( 6 ) AS "TimeStamp" ' ||
                ', Cast( '''|| col.table_catalog ||''' AS Varchar(63) ) AS "DatabaseName" ' ||
                ', Cast( '''|| col.table_schema ||''' AS Varchar(63) ) AS "SchemaName" ' ||
                ', Cast( '''|| col.table_name ||''' AS Varchar(63) ) AS "TableName" ' ||
                ', Cast( '''|| col.column_name ||''' AS Varchar(63) ) AS "ColumnName" ' ||
                ', t.* ' ||
            'From ' ||
            '( ' ||
                'Select ' ||
                    't.* ' ||
                    ', "Max" - "Min" AS "Range" ' ||
                    ', ( "Max" + "Min" ) / 2 AS "Mid-range" ' ||
                    ', ( ' ||
                            '( ' ||
                                '( ' ||
                                    '"1/N" * "Sum x4" ' ||
                                '- ' ||
                                    '4 * "Avg" * "Sum x3" * "1/N" ' ||
                                '+ ' ||
                                    '6 * "Avgx 2" * ( ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ) ' ||
                                ') ' ||
                                '+ ' ||
                                '3 * "Avgx 4" ' ||
                            ') ' ||
                            '/ ' ||
                            'NULLIF( ' ||
                                'Power( ' ||
                                    'SQRT( ' ||
                                        'Case ' ||
                                            'When ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' ||
                                            'Then ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 )' ||
                                            'Else NULL ' ||
                                        'End ' ||
                                    ') ' ||
                                ', 4 ) ' ||
                            ', 0 ) ' ||
                        ') ' ||
                    '- ' ||
                        '( ' ||
                            '( 3 * Power( NULLIF( NULLIF( "N-1", 0 ), -1 ), 2 ) ) ' ||
                            '/ ' ||
                            'NULLIF( ( NULLIF( "N", 2 ) -2 ) * ( NULLIF( "N", 3 ) -3 ), 0 ) ' ||
                        ') ' ||
                    'AS "Kurtosis" ' ||
                    ', ( ' ||
                        '"1/N" * "Sum x3" - "Avgx 3" - 3 * "Avg" * ( ( "Sum x2" - ( "N" * "Avgx 2" ) ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ) ' ||
                    ') ' ||
                    '/ ' ||
                    'NULLIF( ' ||
                        'Power( ' ||
                            'SQRT( ' ||
                                'Case ' ||
                                    'When ( "Sum x2" - ( "N" * "Avgx 2" )) ' ||
                                    '/ ' ||
                                    'NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' ||
                                    'Then ( "Sum x2" - ( "N" * "Avgx 2" )) ' ||
                                    '/ ' ||
                                    'NULLIF( NULLIF( "N-1", 0 ),- 1 ) ' ||
                                'Else ' ||
                                    'NULL ' ||
                                'End ' ||
                                ') ' ||
                        ', 3 ) ' ||
                    ', 0 ) ' ||
                    'AS "Skewness" ' ||
                    ', "Uniq" / Cast( "N" AS Float ) * 100 AS "Perc Uniq" ' ||
                    ', "Uniq" / Cast( "N w Null" AS Float ) * 100 AS "Perc Uniq w Null" ' ||
                    ', "0" / Cast( "N" AS Float ) * 100 AS "Perc 0" ' ||
                    ', "POS" / Cast( "N" AS Float ) * 100 AS "Perc POS" ' ||
                    ', "NEG" / Cast( "N" AS Float ) * 100 AS "Perc NEG" ' ||
                    ',"Null" / Cast( "N w Null" AS Float ) * 100 AS "Perc Null" ' ||
                    ', "Sum x2" - ( "N" * "Avgx 2" ) AS "Corrected Sum of Squares" ' ||
                    ', ( "Avg" - 0 ) ' ||
                        '/ ' ||
                        'NULLIF( ' ||
                            '( ' ||
                                'SQRT( ' ||
                                    'Case ' ||
                                        'When ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' ||
                                        'Then ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' ||
                                        'Else NULL ' ||
                                    'End ' ||
                                ') '||
                                '/ ' ||
                                'NULLIF( SQRT( Case When "N" >= 0 Then "N" Else NULL End ), 0 ) ' ||
                            ') ' ||
                        ', 0 ) ' ||
                    'AS "Student Test Statistic" ' ||
                    ', SQRT( ' ||
                        'Case ' ||
                            'When ( "Sum x2" -( "N" * "Avgx 2" )) / NULLIF( "N", 0 ) >= 0 ' ||
                            'Then ( "Sum x2" -( "N" * "Avgx 2" )) / NULLIF( "N", 0 ) ' ||
                            'Else NULL ' ||
                        'End ' ||
                    ') AS "Standard Deviation Population" ' ||
                    ', SQRT( ' ||
                        'Case ' ||
                            'When ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' ||
                            'Then ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' ||
                            'Else NULL ' ||
                        'End ' ||
                    ') AS "Standard Deviation Sample" ' ||
                    ', ( "Sum x2" - ( "N" * "Avgx 2" ) ) / NULLIF( "N", 0 ) AS "Variance Population" ' ||
                    ', ( "Sum x2" - ( "N" * "Avgx 2" ) ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) AS "Variance Sample" ' ||
                    ', SQRT( ' ||
                        'Case ' ||
                            'When ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' ||
                            'Then ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' ||
                            'Else NULL ' ||
                        'End ' ||
                        ') ' ||
                    '/ ' ||
                        'NULLIF( "Avg", 0 ) AS "Coefficient Variation" ' ||
                    ', ( 1 + 10 / Cast( 3 AS Float ) * LOG( Case When "N" > 0 Then "N" Else NULL End ) ) AS "Sturges" ' ||
                    ', ( 2 * Power( "N",( 1 / Cast( 3 AS Float ))) ) AS "Rice" ' ||
                    ', ( SQRT( Case When "N" >= 0 Then "N" Else NULL End ) ) AS "Sqrt" ' ||
                    ', ( 1 + ( LOG( Case When "N" > 0 Then "N" Else NULL End ) / LOG( 2 ) ) ' ||
                    '+ ' ||
                    '( ' ||
                        'LOG( ' ||
                            'Case ' ||
                                'When 1 + ' ||
                                    '( ' ||
                                        '( "1/N" * "Sum x3" - "Avgx 3" - 3 * "Avg" * (( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ))) ' ||
                                        '/ ' ||
                                        'NULLIF( ' ||
                                            'Power( ' ||
                                                'SQRT( ' ||
                                                    'Case ' ||
                                                        'When( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' ||
                                                        'Then( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' ||
                                                        'Else NULL ' ||
                                                    'End ' ||
                                                ') ' ||
                                            ', 3 ) ' ||
                                        ', 0 ) ' ||
                                    ') > 0 ' ||
                                'Then 1 + ' ||
                                    '( ' ||
                                        '( "1/N" * "Sum x3" - "Avgx 3" - 3 * "Avg" * (( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ))) ' ||
                                        '/ ' ||
                                        'NULLIF( ' ||
                                            'Power( ' ||
                                                'SQRT( ' ||
                                                    'Case ' ||
                                                        'When( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' ||
                                                        'Then( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' ||
                                                        'Else NULL ' ||
                                                    'End ' ||
                                                ') ' ||
                                            ', 3 ) ' ||
                                        ', 0 ) ' ||
                                    ') ' ||
                                'Else NULL ' ||
                            'End ) ' ||
                        '/ ' ||
                        'LOG( 2 ) ' ||
                    ') ' ||
                    ') ' ||
                    '/* Log_2 (x) = Log_10 (x) / Log_10 (2) */ AS "Doane" ' ||
                    ', ( ' ||
                        'Cast( 3.49 AS Float ) ' ||
                        '* ' ||
                        'SQRT( ' ||
                            'Case ' ||
                                'When ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' ||
                                'Then ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' ||
                                'Else NULL ' ||
                            'End ' ||
                        ') ' ||
                        '/ ' ||
                    'Power( Case When "N" >= 0 Then "N" Else NULL End, ( -1 / Cast( 3 AS Float ))) ' ||
                    ') AS "Scott" ' ||
                'From ' ||
                '( ' ||
                    'Select ' ||
                        'Cast( COUNT(*) AS Bigint ) AS "N w Null" ' ||
                        ', Cast( COUNT( "'|| col.column_name || '" ) AS Bigint ) AS "N" ' ||
                        ', ( 1 / NULLIF( Cast( COUNT( "'|| col.column_name || '" ) AS Float ), 0 ) ) AS "1/N" ' ||
                        ', Cast( COUNT( "'|| col.column_name || '" ) AS Bigint ) - 1 AS "N-1" ' ||
                        ', MIN( "'|| col.column_name || '") AS "Min" ' ||
                        ', MAX( "'|| col.column_name || '") AS "Max" ' ||
                        ', AVG( "'|| col.column_name || '") AS "Avg" ' ||
                        ', Power( AVG( "'|| col.column_name || '"), 2 ) AS "Avgx 2" ' ||
                        ', Power( AVG( "'|| col.column_name || '"), 3 ) AS "Avgx 3" ' ||
                        ', Power( AVG( "'|| col.column_name || '"), 4 ) AS "Avgx 4" ' ||
                        ', Sum( "'|| col.column_name || '") AS "Sum" ' ||
                        ', Sum( Power( "'|| col.column_name || '", 2 )) AS "Sum x2" /* Also called: "Uncorrected Sum of Squares" */ ' ||
                        ', Sum( Power( "'|| col.column_name || '", 3 )) AS "Sum x3" ' ||
                        ', Sum( Power( "'|| col.column_name || '", 4 )) AS "Sum x4" ' ||
                        ', Cast( COUNT( DISTINCT "'|| col.column_name || '" ) AS Bigint ) AS "Uniq" ' ||
                        ', Cast( Sum( Case When "'|| col.column_name || '" = 0 Then 1 Else 0 End ) AS Bigint ) AS "0" ' ||
                        ', Cast( Sum( Case When "'|| col.column_name || '" > 0 Then 1 Else 0 End ) AS Bigint ) AS "POS" ' ||
                        ', Cast( Sum( Case When "'|| col.column_name || '" < 0 Then 1 Else 0 End ) AS Bigint ) AS "NEG" ' ||
                        ', Cast( Sum( Case When "'|| col.column_name || '" is NULL Then 1 Else 0 End ) AS Bigint ) AS "Null" ' ||
                        ', EXP( AVG( LN( Case When "'|| col.column_name || '"> 0 Then "'|| col.column_name || '" Else NULL End )) ) AS "Geometric Mean" ' ||
                        ', 1 / NULLIF( AVG( 1 / NULLIF(( Case When "'|| col.column_name || '"> 0 Then "'|| col.column_name || '" Else NULL End ), 0 )), 0 ) AS "Harmonic Mean" ' ||
                    'From ' ||
                    '( ' ||
                        'Select ' ||
                            'Cast( "'|| col.column_name || '" AS Float ) AS "'|| col.column_name || '" ' ||
                        'From "'|| col.table_schema ||'"."'|| col.table_name ||'" ' ||
                    ') t ' ||
                ') t ' ||
            ') t ' AS "TheQuery"
    From
        information_schema.columns AS col
    Where 1=1
        AND col.data_type IN
        (
              'numeric'
            , 'integer'
            , 'double precision'
            , 'bigint'
            , 'real'
            , 'smallint'
        )
    ;


    C'est dans cette table qu'on va aller chercher avec la SP, la requête à exécuter.
    Ainsi, pour un certain schéma et une certaine table dans votre database, on dressera un profil de variation pour chaque colonne numérique de cette table.

  • Partie 3:
    La SP qui prend simplement la sous-partie des requêtes (liées aux colonnes numériques) à exécuter.
    On retourne un résultat sous forme de table du DDL de __UNIVARIATE__

    DROP FUNCTION If Exists stat.f_udf_sql_trans_trim( col CHARACTER VARYING ) ;
    CREATE OR REPLACE FUNCTION stat.f_udf_sql_trans_trim( col CHARACTER VARYING )
    RETURNS CHARACTER VARYING AS
    $func$
    BEGIN
    RETURN
        TRIM( BOTH E'\\x0D' FROM /* gets rid of CR: 'Carriage Return' special character */
        TRIM( BOTH E'\\x0A' FROM /* gets rid of LF: 'Line Feed' special character */
        TRIM( BOTH E'\\x0B' FROM /* gets rid of vertical TAB: '\v' special character */
        TRIM( BOTH E'\\x09' FROM /* gets rid of horizontal TAB: '\t' special character */
        TRIM( col) ) ) ) )       /* gets rid of Space character: ' ' special character */
    ;
    END ;
    $func$
    LANGUAGE plpgsql
    VOLATILE
    RETURNS NULL ON NULL INPUT
    EXTERNAL SECURITY DEFINER
    ;
    
    
    
    
    Drop Function If Exists stat.f_sp_sql_stat_univariate( CHARACTER VARYING, CHARACTER VARYING) ;
    Create Or Replace Function stat.f_sp_sql_stat_univariate( IN sch VARCHAR( 63), IN tbl VARCHAR( 63) )
    
    RETURNS SETOF __UNIVARIATE__ AS
    
    $func$
    
    Declare
    
        q_row RECORD ;
        ddl VARCHAR := '' ;
    
    Begin
    
        For q_row IN
            Select
                *
            From
                stat.__QBUILDER_UNIVARIATE__
            Where 1=1
                And "SchemaName" = stat.f_udf_sql_trans_trim( sch )
                And "TableName" = stat.f_udf_sql_trans_trim( tbl )
        Loop
            ddl = 'INSERT INTO __UNIVARIATE__ ' || q_row."TheQuery" ;
            Execute ddl ;
            ddl := '' ;
            Return Query Execute q_row."TheQuery" ;
        End Loop ;
        Return ;
    
    End ;
    
    $func$
    
    LANGUAGE plpgsql
    VOLATILE
    RETURNS NULL ON NULL INPUT
    EXTERNAL SECURITY DEFINER
    ;

J'ai testé cela sur les données d'iris des statisticiens.

begin ;

Create Schema If Not Exists data ;
Create Schema If Not Exists stat ;

Create Table If Not Exists data.iris
(
      SepalLength Float
    , SepalWidth  Float
    , PetalLength Float
    , PetalWidth  Float
    , Species     Character Varying
)
;

INSERT INTO data.iris VALUES( 5.1,3.5,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.9,3.0,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.7,3.2,1.3,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.6,3.1,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.6,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.9,1.7,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.6,3.4,1.4,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.4,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.4,2.9,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.9,3.1,1.5,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.7,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.4,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.0,1.4,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.3,3.0,1.1,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.8,4.0,1.2,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.7,4.4,1.5,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.9,1.3,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.5,1.4,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.7,3.8,1.7,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.8,1.5,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.4,1.7,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.7,1.5,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.6,3.6,1.0,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.3,1.7,0.5,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.4,1.9,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.0,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.4,1.6,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.2,3.5,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.2,3.4,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.7,3.2,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.1,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.4,1.5,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.2,4.1,1.5,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.5,4.2,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.9,3.1,1.5,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.2,1.2,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.5,3.5,1.3,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.9,3.1,1.5,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.4,3.0,1.3,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.4,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.5,1.3,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.5,2.3,1.3,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.4,3.2,1.3,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.5,1.6,0.6,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.8,1.9,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.0,1.4,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.8,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.6,3.2,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.3,3.7,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.3,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 7.0,3.2,4.7,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.4,3.2,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.9,3.1,4.9,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.3,4.0,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.5,2.8,4.6,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,2.8,4.5,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.3,3.3,4.7,1.6,'Iris-versicolor');
INSERT INTO data.iris VALUES( 4.9,2.4,3.3,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.6,2.9,4.6,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.2,2.7,3.9,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.0,2.0,3.5,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.9,3.0,4.2,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.0,2.2,4.0,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.1,2.9,4.7,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,2.9,3.6,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.7,3.1,4.4,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,3.0,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.8,2.7,4.1,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.2,2.2,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,2.5,3.9,1.1,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.9,3.2,4.8,1.8,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.1,2.8,4.0,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.3,2.5,4.9,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.1,2.8,4.7,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.4,2.9,4.3,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.6,3.0,4.4,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.8,2.8,4.8,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.7,3.0,5.0,1.7,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.0,2.9,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,2.6,3.5,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.4,3.8,1.1,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.4,3.7,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.8,2.7,3.9,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.0,2.7,5.1,1.6,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.4,3.0,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.0,3.4,4.5,1.6,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.7,3.1,4.7,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.3,2.3,4.4,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,3.0,4.1,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.5,4.0,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.6,4.4,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.1,3.0,4.6,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.8,2.6,4.0,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.0,2.3,3.3,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,2.7,4.2,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,3.0,4.2,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,2.9,4.2,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.2,2.9,4.3,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.1,2.5,3.0,1.1,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,2.8,4.1,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.3,3.3,6.0,2.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.8,2.7,5.1,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.1,3.0,5.9,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,2.9,5.6,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.5,3.0,5.8,2.2,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.6,3.0,6.6,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 4.9,2.5,4.5,1.7,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.3,2.9,6.3,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,2.5,5.8,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.2,3.6,6.1,2.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.5,3.2,5.1,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,2.7,5.3,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.8,3.0,5.5,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.7,2.5,5.0,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.8,2.8,5.1,2.4,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,3.2,5.3,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.5,3.0,5.5,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.7,3.8,6.7,2.2,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.7,2.6,6.9,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.0,2.2,5.0,1.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.9,3.2,5.7,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.6,2.8,4.9,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.7,2.8,6.7,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,2.7,4.9,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,3.3,5.7,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.2,3.2,6.0,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.2,2.8,4.8,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.1,3.0,4.9,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,2.8,5.6,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.2,3.0,5.8,1.6,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.4,2.8,6.1,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.9,3.8,6.4,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,2.8,5.6,2.2,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,2.8,5.1,1.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.1,2.6,5.6,1.4,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.7,3.0,6.1,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,3.4,5.6,2.4,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,3.1,5.5,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.0,3.0,4.8,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.9,3.1,5.4,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,3.1,5.6,2.4,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.9,3.1,5.1,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.8,2.7,5.1,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.8,3.2,5.9,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,3.3,5.7,2.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,3.0,5.2,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,2.5,5.0,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.5,3.0,5.2,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.2,3.4,5.4,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.9,3.0,5.1,1.8,'Iris-virginica');



commit ;

On lance la SP en appelant:

Select * From stat.f_sp_sql_stat_univariate('data','iris') ;

Cela fonctionne mais je suis très newbie en procédure stockée et en best practice, donc à vos commentaires assassins ;-)
N'hésitez pas à critiquer et ré-engineeré la manière.
Merci d'avoir lu jusqu'ici !

Sotos

Hors ligne

#2 14/12/2017 18:17:27

gleu
Administrateur

Re : Procédure Stockée d'Analyse Statistique en SQL

A priori, la fonction f_sp_sql_stat_univariate peut très bien être remplacée par une seule requête, du style (pas testé) :

INSERT INTO __UNIVARIATE__
Select "TheQuery" From stat.__QBUILDER_UNIVARIATE__
Where "SchemaName" = stat.f_udf_sql_trans_trim( sch )
    And "TableName" = stat.f_udf_sql_trans_trim( tbl )
RETURNING "TheQuery"

Guillaume.

Hors ligne

#3 15/12/2017 11:18:29

Sotos
Membre

Re : Procédure Stockée d'Analyse Statistique en SQL

Hello Guillaume,


Merci de ta réponse,
Plus généralement, est-ce que le monde PostgreSQL a besoin de ce genre de matériel ?
Cela peut aussi être une UDF codée en langage externe "protégé", ou un autre type d'objet de la database.



Il n'y a pas d'index ni de PK sur la table à analyser, parce qu'apriori, je ne sais absolument rien de la démographie de la table.
On se place dans le cas où on analyse des données qui sont uniquement "bien" typées (et c'est déjà pas mal...).

Peut-être que ces fonctions existent déjà ? Je sais que c'est possible dans d'autres technos mais je ne veux pas de mouvement de données, la data reste en DB (je connais le SAS in-database, les éditeurs propriétaires  avec le built-in functions analytiques oracle, teradata, les add-ons comme Fuzzy Logix, etc bref.


PostgreSQL semble manquer de fonctions built-in d'analyse statistique et je ne suis pas convaincu que la DB ne puisse pas faire ce travail correctement ==> la stat suppose de petite volumétrie donc facile à envoyer sur une session SAS, R, Python, mais allez encore une fois, restons en DB ;-).


Aussi, si vous avez des tutos pour "packager" un ensemble de SPs et l'installer en un click, ça serait cool :-)

Merci !

Sotos

Hors ligne

#4 15/12/2017 20:06:22

gleu
Administrateur

Re : Procédure Stockée d'Analyse Statistique en SQL

Plus généralement, est-ce que le monde PostgreSQL a besoin de ce genre de matériel ?

Je ne comprends pas la question, ça va être difficile de répondre smile

Aussi, si vous avez des tutos pour "packager" un ensemble de SPs et l'installer en un click, ça serait cool :-)

Regardez du côté des extensions, c'est fait pour. Voir https://docs.postgresql.fr/10/extend-extensions.html pour les détails.


Guillaume.

Hors ligne

#5 18/12/2017 10:45:21

Sotos
Membre

Re : Procédure Stockée d'Analyse Statistique en SQL

En disant matériel, je voulais dire "le code" :-), un portfolio de fonctions d'analyse statistique allant plus loin que la régression linéaire simple.

Hors ligne

Pied de page des forums