#1 14/12/2017 17:47:13
- Sotos
- Member
Stored Procedure for Statistical Analysis in SQL
Hello everyone,
I am currently developing an SP (stored procedure) for quantitative univariate statistical analysis.
In plain terms, a subset of the SAS PROC UNIVARIATE, with some additional statistical metrics.
Part 1:
A result table, a TEMP TABLE named __UNIVARIATE__:

Drop Table If Exists __UNIVARIATE__ Cascade ;
Create Temporary Table __UNIVARIATE__
(
  "TimeStamp" TIMESTAMP( 6) WITH TIME ZONE
, "DatabaseName" VARCHAR( 63)
, "SchemaName" VARCHAR( 63)
, "TableName" VARCHAR( 63)
, "ColumnName" VARCHAR( 63)
, "N w Null" BIGINT
, "N" BIGINT
, "1/N" FLOAT
, "N-1" BIGINT
, "Min" FLOAT
, "Max" FLOAT
, "Avg" FLOAT
, "Avgx 2" FLOAT
, "Avgx 3" FLOAT
, "Avgx 4" FLOAT
, "Sum" FLOAT
, "Sum x2" FLOAT
, "Sum x3" FLOAT
, "Sum x4" FLOAT
, "Uniq" BIGINT
, "0" BIGINT
, "POS" BIGINT
, "NEG" BIGINT
, "Null" BIGINT
, "Geometric Mean" FLOAT
, "Harmonic Mean" FLOAT
, "Range" FLOAT
, "Mid-range" FLOAT
, "Kurtosis" FLOAT
, "Skewness" FLOAT
, "Perc Uniq" FLOAT
, "Perc Uniq w Null" FLOAT
, "Perc 0" FLOAT
, "Perc POS" FLOAT
, "Perc NEG" FLOAT
, "Perc Null" FLOAT
, "Corrected Sum of Squares" FLOAT
, "Student Test Statistic" FLOAT
, "Standard Deviation Population" FLOAT
, "Standard Deviation Sample" FLOAT
, "Variance Population" FLOAT
, "Variance Sample" FLOAT
, "Coefficient Variation" FLOAT
, "Sturges" FLOAT
, "Rice" FLOAT
, "Sqrt" FLOAT
, "Doane" FLOAT
, "Scott" FLOAT
)
On Commit Preserve Rows ;
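All of the derived statistics above can be recovered from the five raw aggregates "N", "Sum", "Sum x2", "Sum x3" and "Sum x4", which is what makes the single-scan approach work. A standalone Python sketch (illustration only; it uses the population-moment formulas, whereas the view's Kurtosis/Skewness use sample-variance denominators, so its values differ slightly):

```python
def power_sums(xs):
    """Single pass: the only aggregates the SQL needs per column."""
    n = len(xs)
    return n, sum(xs), sum(x**2 for x in xs), sum(x**3 for x in xs), sum(x**4 for x in xs)

def stats_from_sums(n, s1, s2, s3, s4):
    mean = s1 / n
    var_pop = s2 / n - mean**2                      # "Variance Population"
    var_smp = (s2 - n * mean**2) / (n - 1)          # "Variance Sample"
    m3 = s3 / n - 3 * mean * s2 / n + 2 * mean**3   # 3rd central moment
    m4 = s4 / n - 4 * mean * s3 / n + 6 * mean**2 * s2 / n - 3 * mean**4
    skew = m3 / var_pop**1.5                        # population skewness
    kurt = m4 / var_pop**2 - 3                      # excess kurtosis
    return mean, var_pop, var_smp, skew, kurt

xs = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4]
mean, var_pop, var_smp, skew, kurt = stats_from_sums(*power_sums(xs))

# Cross-check against the direct two-pass definitions
mu = sum(xs) / len(xs)
m2d = sum((x - mu)**2 for x in xs) / len(xs)
m3d = sum((x - mu)**3 for x in xs) / len(xs)
assert abs(var_pop - m2d) < 1e-9
assert abs(skew - m3d / m2d**1.5) < 1e-9
```

The same identities underlie the big view below: every moment is expanded into raw power sums so the whole profile comes out of one scan of the table.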
Part 2:
A dynamic view built on the system object information_schema.columns, which generates all the queries to execute:

Drop View If Exists stat.__QBUILDER_UNIVARIATE__ Cascade ; Create Or Replace View stat.__QBUILDER_UNIVARIATE__ AS Select table_catalog AS "DatabaseName" , table_schema AS "SchemaName" , table_name AS "TableName" , ROW_NUMBER() Over ( Partition By table_catalog, table_schema, table_name Order By table_name) AS RN , ' Select ' || 'Current_Timestamp( 6 ) AS "TimeStamp" ' || ', Cast( '''|| col.table_catalog ||''' AS Varchar(63) ) AS "DatabaseName" ' || ', Cast( '''|| col.table_schema ||''' AS Varchar(63) ) AS "SchemaName" ' || ', Cast( '''|| col.table_name ||''' AS Varchar(63) ) AS "TableName" ' || ', Cast( '''|| col.column_name ||''' AS Varchar(63) ) AS "ColumnName" ' || ', t.* ' || 'From ' || '( ' || 'Select ' || 't.* ' || ', "Max" - "Min" AS "Range" ' || ', ( "Max" + "Min" ) / 2 AS "Mid-range" ' || ', ( ' || '( ' || '( ' || '"1/N" * "Sum x4" ' || '- ' || '4 * "Avg" * "Sum x3" * "1/N" ' || '+ ' || '6 * "Avgx 2" * ( ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ) ' || ') ' || '+ ' || '3 * "Avgx 4" ' || ') ' || '/ ' || 'NULLIF( ' || 'Power( ' || 'SQRT( ' || 'Case ' || 'When ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' || 'Then ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 )' || 'Else NULL ' || 'End ' || ') ' || ', 4 ) ' || ', 0 ) ' || ') ' || '- ' || '( ' || '( 3 * Power( NULLIF( NULLIF( "N-1", 0 ), -1 ), 2 ) ) ' || '/ ' || 'NULLIF( ( NULLIF( "N", 2 ) -2 ) * ( NULLIF( "N", 3 ) -3 ), 0 ) ' || ') ' || 'AS "Kurtosis" ' || ', ( ' || '"1/N" * "Sum x3" - "Avgx 3" - 3 * "Avg" * ( ( "Sum x2" - ( "N" * "Avgx 2" ) ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ) ' || ') ' || '/ ' || 'NULLIF( ' || 'Power( ' || 'SQRT( ' || 'Case ' || 'When ( "Sum x2" - ( "N" * "Avgx 2" )) ' || '/ ' || 'NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' || 'Then ( "Sum x2" - ( "N" * "Avgx 2" )) ' || '/ ' || 'NULLIF( NULLIF( "N-1", 0 ),- 1 ) ' ||
'Else ' || 'NULL ' || 'End ' || ') ' || ', 3 ) ' || ', 0 ) ' || 'AS "Skewness" ' || ', "Uniq" / Cast( "N" AS Float ) * 100 AS "Perc Uniq" ' || ', "Uniq" / Cast( "N w Null" AS Float ) * 100 AS "Perc Uniq w Null" ' || ', "0" / Cast( "N" AS Float ) * 100 AS "Perc 0" ' || ', "POS" / Cast( "N" AS Float ) * 100 AS "Perc POS" ' || ', "NEG" / Cast( "N" AS Float ) * 100 AS "Perc NEG" ' || ',"Null" / Cast( "N w Null" AS Float ) * 100 AS "Perc Null" ' || ', "Sum x2" - ( "N" * "Avgx 2" ) AS "Corrected Sum of Squares" ' || ', ( "Avg" - 0 ) ' || '/ ' || 'NULLIF( ' || '( ' || 'SQRT( ' || 'Case ' || 'When ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' || 'Then ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' || 'Else NULL ' || 'End ' || ') '|| '/ ' || 'NULLIF( SQRT( Case When "N" >= 0 Then "N" Else NULL End ), 0 ) ' || ') ' || ', 0 ) ' || 'AS "Student Test Statistic" ' || ', SQRT( ' || 'Case ' || 'When ( "Sum x2" -( "N" * "Avgx 2" )) / NULLIF( "N", 0 ) >= 0 ' || 'Then ( "Sum x2" -( "N" * "Avgx 2" )) / NULLIF( "N", 0 ) ' || 'Else NULL ' || 'End ' || ') AS "Standard Deviation Population" ' || ', SQRT( ' || 'Case ' || 'When ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' || 'Then ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' || 'Else NULL ' || 'End ' || ') AS "Standard Deviation Sample" ' || ', ( "Sum x2" - ( "N" * "Avgx 2" ) ) / NULLIF( "N", 0 ) AS "Variance Population" ' || ', ( "Sum x2" - ( "N" * "Avgx 2" ) ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) AS "Variance Sample" ' || ', SQRT( ' || 'Case ' || 'When ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' || 'Then ( "Sum x2" - "N" * "Avgx 2" ) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' || 'Else NULL ' || 'End ' || ') ' || '/ ' || 'NULLIF( "Avg", 0 ) AS "Coefficient Variation" ' || ', ( 1 + 10 / Cast( 3 AS Float ) * LOG( Case When "N" > 0 Then "N" Else NULL End ) ) AS "Sturges" ' || ', ( 2 * Power( "N",( 1 / Cast( 3 AS Float 
))) ) AS "Rice" ' || ', ( SQRT( Case When "N" >= 0 Then "N" Else NULL End ) ) AS "Sqrt" ' || ', ( 1 + ( LOG( Case When "N" > 0 Then "N" Else NULL End ) / LOG( 2 ) ) ' || '+ ' || '( ' || 'LOG( ' || 'Case ' || 'When 1 + ' || '( ' || '( "1/N" * "Sum x3" - "Avgx 3" - 3 * "Avg" * (( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ))) ' || '/ ' || 'NULLIF( ' || 'Power( ' || 'SQRT( ' || 'Case ' || 'When( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' || 'Then( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' || 'Else NULL ' || 'End ' || ') ' || ', 3 ) ' || ', 0 ) ' || ') > 0 ' || 'Then 1 + ' || '( ' || '( "1/N" * "Sum x3" - "Avgx 3" - 3 * "Avg" * (( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ))) ' || '/ ' || 'NULLIF( ' || 'Power( ' || 'SQRT( ' || 'Case ' || 'When( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' || 'Then( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' || 'Else NULL ' || 'End ' || ') ' || ', 3 ) ' || ', 0 ) ' || ') ' || 'Else NULL ' || 'End ) ' || '/ ' || 'LOG( 2 ) ' || ') ' || ') ' || '/* Log_2 (x) = Log_10 (x) / Log_10 (2) */ AS "Doane" ' || ', ( ' || 'Cast( 3.49 AS Float ) ' || '* ' || 'SQRT( ' || 'Case ' || 'When ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) >= 0 ' || 'Then ( "Sum x2" - ( "N" * "Avgx 2" )) / NULLIF( NULLIF( "N-1", 0 ), -1 ) ' || 'Else NULL ' || 'End ' || ') ' || '/ ' || 'Power( Case When "N" >= 0 Then "N" Else NULL End, ( -1 / Cast( 3 AS Float ))) ' || ') AS "Scott" ' || 'From ' || '( ' || 'Select ' || 'Cast( COUNT(*) AS Bigint ) AS "N w Null" ' || ', Cast( COUNT( "'|| col.column_name || '" ) AS Bigint ) AS "N" ' || ', ( 1 / NULLIF( Cast( COUNT( "'|| col.column_name || '" ) AS Float ), 0 ) ) AS "1/N" ' || ', Cast( COUNT( "'|| col.column_name || '" ) AS Bigint ) - 1 AS "N-1" ' || ', MIN( "'|| col.column_name || '") AS "Min" ' || ', MAX( "'|| col.column_name || '") AS "Max" ' || ', AVG( "'|| 
col.column_name || '") AS "Avg" ' || ', Power( AVG( "'|| col.column_name || '"), 2 ) AS "Avgx 2" ' || ', Power( AVG( "'|| col.column_name || '"), 3 ) AS "Avgx 3" ' || ', Power( AVG( "'|| col.column_name || '"), 4 ) AS "Avgx 4" ' || ', Sum( "'|| col.column_name || '") AS "Sum" ' || ', Sum( Power( "'|| col.column_name || '", 2 )) AS "Sum x2" /* Also called: "Uncorrected Sum of Squares" */ ' || ', Sum( Power( "'|| col.column_name || '", 3 )) AS "Sum x3" ' || ', Sum( Power( "'|| col.column_name || '", 4 )) AS "Sum x4" ' || ', Cast( COUNT( DISTINCT "'|| col.column_name || '" ) AS Bigint ) AS "Uniq" ' || ', Cast( Sum( Case When "'|| col.column_name || '" = 0 Then 1 Else 0 End ) AS Bigint ) AS "0" ' || ', Cast( Sum( Case When "'|| col.column_name || '" > 0 Then 1 Else 0 End ) AS Bigint ) AS "POS" ' || ', Cast( Sum( Case When "'|| col.column_name || '" < 0 Then 1 Else 0 End ) AS Bigint ) AS "NEG" ' || ', Cast( Sum( Case When "'|| col.column_name || '" is NULL Then 1 Else 0 End ) AS Bigint ) AS "Null" ' || ', EXP( AVG( LN( Case When "'|| col.column_name || '"> 0 Then "'|| col.column_name || '" Else NULL End )) ) AS "Geometric Mean" ' || ', 1 / NULLIF( AVG( 1 / NULLIF(( Case When "'|| col.column_name || '"> 0 Then "'|| col.column_name || '" Else NULL End ), 0 )), 0 ) AS "Harmonic Mean" ' || 'From ' || '( ' || 'Select ' || 'Cast( "'|| col.column_name || '" AS Float ) AS "'|| col.column_name || '" ' || 'From "'|| col.table_schema ||'"."'|| col.table_name ||'" ' || ') t ' || ') t ' || ') t ' AS "TheQuery" From information_schema.columns AS col Where 1=1 AND col.data_type IN ( 'numeric' , 'integer' , 'double precision' , 'bigint' , 'real' , 'smallint' ) ;
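The last five columns ("Sturges" through "Scott") are the classic histogram bin-count rules. For orientation, here they are in a short standalone Python sketch for N = 150 (the size of the iris table used below). Two things appear worth double-checking against the view: it approximates Sturges' factor log2(10) ≈ 3.3219 by 10/3, and Scott's rule as usually stated is a bin width h = 3.49·s·n^(-1/3), while the view divides by Power(n, -1/3), i.e. multiplies by n^(1/3):

```python
import math

n = 150  # row count of data.iris

sturges = 1 + math.log2(n)   # exact form; the view uses 1 + (10/3)*LOG10(n)
rice    = 2 * n ** (1 / 3)
sqrt_r  = math.sqrt(n)

# Scott's rule yields a bin *width*, not a count: h = 3.49 * s * n**(-1/3)
s = 0.828  # hypothetical sample std-dev (roughly SepalLength's), for illustration
scott_width = 3.49 * s * n ** (-1 / 3)

print(f"Sturges={sturges:.2f}  Rice={rice:.2f}  Sqrt={sqrt_r:.2f}  Scott width={scott_width:.3f}")
```

Doane's rule is similar: the usual statement feeds log2(1 + |g1|/σ_g1) into the count, where σ_g1 depends only on n, whereas the view uses log2(1 + skewness) directly, which may also be worth a second look.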
This is the table the SP reads to fetch the query to execute.
Thus, for a given schema and a given table in your database, a variation profile is drawn up for every numeric column of that table.
Part 3:
The SP, which simply takes the relevant subset of queries (those tied to the numeric columns) to execute.
The result is returned as a table matching the DDL of __UNIVARIATE__:

DROP FUNCTION If Exists stat.f_udf_sql_trans_trim( col CHARACTER VARYING ) ;
CREATE OR REPLACE FUNCTION stat.f_udf_sql_trans_trim( col CHARACTER VARYING )
RETURNS CHARACTER VARYING AS
$func$
BEGIN
  RETURN
    TRIM( BOTH E'\\x0D' FROM /* gets rid of CR: 'Carriage Return' special character */
    TRIM( BOTH E'\\x0A' FROM /* gets rid of LF: 'Line Feed' special character */
    TRIM( BOTH E'\\x0B' FROM /* gets rid of vertical TAB: '\v' special character */
    TRIM( BOTH E'\\x09' FROM /* gets rid of horizontal TAB: '\t' special character */
    TRIM( col ) /* gets rid of the Space character: ' ' */
    ) ) ) ) ;
END ;
$func$ LANGUAGE plpgsql VOLATILE RETURNS NULL ON NULL INPUT EXTERNAL SECURITY DEFINER ;

Drop Function If Exists stat.f_sp_sql_stat_univariate( CHARACTER VARYING, CHARACTER VARYING ) ;
Create Or Replace Function stat.f_sp_sql_stat_univariate( IN sch VARCHAR( 63), IN tbl VARCHAR( 63) )
RETURNS SETOF __UNIVARIATE__ AS
$func$
Declare
  q_row RECORD ;
  ddl VARCHAR := '' ;
Begin
  For q_row IN
    Select *
    From stat.__QBUILDER_UNIVARIATE__
    Where 1=1
    And "SchemaName" = stat.f_udf_sql_trans_trim( sch )
    And "TableName" = stat.f_udf_sql_trans_trim( tbl )
  Loop
    ddl = 'INSERT INTO __UNIVARIATE__ ' || q_row."TheQuery" ;
    Execute ddl ;
    ddl := '' ;
    Return Query Execute q_row."TheQuery" ;
  End Loop ;
  Return ;
End ;
$func$ LANGUAGE plpgsql VOLATILE RETURNS NULL ON NULL INPUT EXTERNAL SECURITY DEFINER ;
I tested this on the statisticians' iris dataset.
begin ;
Create Schema If Not Exists data ;
Create Schema If Not Exists stat ;
Create Table If Not Exists data.iris
(
SepalLength Float
, SepalWidth Float
, PetalLength Float
, PetalWidth Float
, Species Character Varying
)
;
INSERT INTO data.iris VALUES( 5.1,3.5,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.9,3.0,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.7,3.2,1.3,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.6,3.1,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.6,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.9,1.7,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.6,3.4,1.4,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.4,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.4,2.9,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.9,3.1,1.5,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.7,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.4,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.0,1.4,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.3,3.0,1.1,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.8,4.0,1.2,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.7,4.4,1.5,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.9,1.3,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.5,1.4,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.7,3.8,1.7,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.8,1.5,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.4,1.7,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.7,1.5,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.6,3.6,1.0,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.3,1.7,0.5,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.4,1.9,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.0,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.4,1.6,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.2,3.5,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.2,3.4,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.7,3.2,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.1,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.4,3.4,1.5,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.2,4.1,1.5,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.5,4.2,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.9,3.1,1.5,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.2,1.2,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.5,3.5,1.3,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.9,3.1,1.5,0.1,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.4,3.0,1.3,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.4,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.5,1.3,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.5,2.3,1.3,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.4,3.2,1.3,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.5,1.6,0.6,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.8,1.9,0.4,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.8,3.0,1.4,0.3,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.1,3.8,1.6,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 4.6,3.2,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.3,3.7,1.5,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 5.0,3.3,1.4,0.2,'Iris-setosa');
INSERT INTO data.iris VALUES( 7.0,3.2,4.7,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.4,3.2,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.9,3.1,4.9,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.3,4.0,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.5,2.8,4.6,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,2.8,4.5,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.3,3.3,4.7,1.6,'Iris-versicolor');
INSERT INTO data.iris VALUES( 4.9,2.4,3.3,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.6,2.9,4.6,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.2,2.7,3.9,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.0,2.0,3.5,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.9,3.0,4.2,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.0,2.2,4.0,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.1,2.9,4.7,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,2.9,3.6,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.7,3.1,4.4,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,3.0,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.8,2.7,4.1,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.2,2.2,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,2.5,3.9,1.1,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.9,3.2,4.8,1.8,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.1,2.8,4.0,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.3,2.5,4.9,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.1,2.8,4.7,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.4,2.9,4.3,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.6,3.0,4.4,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.8,2.8,4.8,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.7,3.0,5.0,1.7,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.0,2.9,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,2.6,3.5,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.4,3.8,1.1,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.4,3.7,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.8,2.7,3.9,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.0,2.7,5.1,1.6,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.4,3.0,4.5,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.0,3.4,4.5,1.6,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.7,3.1,4.7,1.5,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.3,2.3,4.4,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,3.0,4.1,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.5,4.0,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.5,2.6,4.4,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.1,3.0,4.6,1.4,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.8,2.6,4.0,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.0,2.3,3.3,1.0,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.6,2.7,4.2,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,3.0,4.2,1.2,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,2.9,4.2,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.2,2.9,4.3,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.1,2.5,3.0,1.1,'Iris-versicolor');
INSERT INTO data.iris VALUES( 5.7,2.8,4.1,1.3,'Iris-versicolor');
INSERT INTO data.iris VALUES( 6.3,3.3,6.0,2.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.8,2.7,5.1,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.1,3.0,5.9,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,2.9,5.6,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.5,3.0,5.8,2.2,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.6,3.0,6.6,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 4.9,2.5,4.5,1.7,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.3,2.9,6.3,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,2.5,5.8,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.2,3.6,6.1,2.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.5,3.2,5.1,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,2.7,5.3,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.8,3.0,5.5,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.7,2.5,5.0,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.8,2.8,5.1,2.4,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,3.2,5.3,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.5,3.0,5.5,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.7,3.8,6.7,2.2,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.7,2.6,6.9,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.0,2.2,5.0,1.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.9,3.2,5.7,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.6,2.8,4.9,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.7,2.8,6.7,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,2.7,4.9,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,3.3,5.7,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.2,3.2,6.0,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.2,2.8,4.8,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.1,3.0,4.9,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,2.8,5.6,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.2,3.0,5.8,1.6,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.4,2.8,6.1,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.9,3.8,6.4,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,2.8,5.6,2.2,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,2.8,5.1,1.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.1,2.6,5.6,1.4,'Iris-virginica');
INSERT INTO data.iris VALUES( 7.7,3.0,6.1,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,3.4,5.6,2.4,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.4,3.1,5.5,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.0,3.0,4.8,1.8,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.9,3.1,5.4,2.1,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,3.1,5.6,2.4,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.9,3.1,5.1,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.8,2.7,5.1,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.8,3.2,5.9,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,3.3,5.7,2.5,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.7,3.0,5.2,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.3,2.5,5.0,1.9,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.5,3.0,5.2,2.0,'Iris-virginica');
INSERT INTO data.iris VALUES( 6.2,3.4,5.4,2.3,'Iris-virginica');
INSERT INTO data.iris VALUES( 5.9,3.0,5.1,1.8,'Iris-virginica');
commit ;
The SP is launched by calling:
Select * From stat.f_sp_sql_stat_univariate('data','iris') ;
It works, but I am very much a newbie with stored procedures and best practices, so fire away with your scathing comments ;-)
Feel free to criticize and re-engineer the approach.
Thanks for reading this far!
Sotos
#2 14/12/2017 18:17:27
- gleu
- Administrator
Re: Stored Procedure for Statistical Analysis in SQL
A priori, the f_sp_sql_stat_univariate function could very well be replaced by a single query, along these lines (not tested):
INSERT INTO __UNIVARIATE__
Select "TheQuery" From stat.__QBUILDER_UNIVARIATE__
Where "SchemaName" = stat.f_udf_sql_trans_trim( sch )
And "TableName" = stat.f_udf_sql_trans_trim( tbl )
RETURNING "TheQuery"
Guillaume.
#3 15/12/2017 11:18:29
- Sotos
- Member
Re: Stored Procedure for Statistical Analysis in SQL
Hello Guillaume,
Thanks for your reply,
More generally, does the PostgreSQL world need this kind of material?
It could also be a UDF coded in a "trusted" external language, or some other type of database object.
There is no index or PK on the table being analyzed because, a priori, I know absolutely nothing about the table's demographics.
We assume the data being analyzed is at least properly typed (which is already something...).
Perhaps these functions already exist? I know this is possible in other technologies, but I don't want any data movement; the data stays in the DB (I am aware of SAS in-database, the proprietary vendors with built-in analytic functions in Oracle and Teradata, add-ons such as Fuzzy Logix, and so on).
PostgreSQL seems to lack built-in statistical-analysis functions, and I am not convinced the DB cannot do this job properly. Statistics usually implies small volumes, hence data that is easy to ship to a SAS, R or Python session, but once again, let's stay in the DB ;-).
Also, if you have tutorials on how to "package" a set of SPs and install it in one click, that would be cool :-)
Thanks!
Sotos
#4 15/12/2017 20:06:22
- gleu
- Administrator
Re: Stored Procedure for Statistical Analysis in SQL
"More generally, does the PostgreSQL world need this kind of material?"
I don't understand the question, so it's going to be hard to answer.
"Also, if you have tutorials on how to 'package' a set of SPs and install it in one click, that would be cool :-)"
Have a look at extensions; that is exactly what they are for. See https://docs.postgresql.fr/10/extend-extensions.html for the details.
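For what it's worth, a minimal extension skeleton (hypothetical name stat_univariate; layout per the linked documentation) is just a control file plus a versioned SQL script, both placed in PostgreSQL's SHAREDIR/extension directory:

```
# stat_univariate.control
comment         = 'Univariate statistical analysis helpers'
default_version = '1.0'
relocatable     = true

# stat_univariate--1.0.sql holds the CREATE VIEW / CREATE FUNCTION statements
```

After installing both files, CREATE EXTENSION stat_univariate; pulls everything in, and ALTER EXTENSION ... UPDATE handles later upgrades.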
Guillaume.
#5 18/12/2017 10:45:21
- Sotos
- Member
Re: Stored Procedure for Statistical Analysis in SQL
By "material" I meant "the code" :-), a portfolio of statistical-analysis functions going further than simple linear regression.