Commit
Merge pull request #110 from HuemulSolutions/develop_2.5
Develop 2.5
huemulDeveloper authored Apr 7, 2020
2 parents 45ee105 + 72e63c3 commit 3347370
Showing 14 changed files with 1,180 additions and 288 deletions.
16 changes: 11 additions & 5 deletions pom.xml
@@ -2,18 +2,16 @@
<modelVersion>4.0.0</modelVersion>
<groupId>com.huemulsolutions.bigdata</groupId>
<artifactId>huemul-bigdatagovernance</artifactId>
<version>2.4</version>
<version>2.5</version>
<name>HuemulSolutions - BigDataGovernance</name>
<description>Enable full data quality and data lineage for BigData Projects.
Huemul BigDataGovernance is a library that works on top of Spark, Hive, and HDFS. It enables the implementation of a **corporate single source of truth data strategy**, based on Data Governance best practices.

It allows implementing tables with Primary Key and Foreign Key enforcement when inserting and updating data through the library, along with validation of nulls, text lengths, numeric and date maximums/minimums, unique values, and default values. It also allows classifying fields by ARCO-rights applicability to ease compliance with GDPR-style data protection laws, and recording security levels and whether any kind of encryption is applied. Additionally, it supports more complex validation rules over the same table.

It eases the configuration and reading of input interfaces, allowing read parameters to be adjusted for highly changing schemas; it automatically builds traceability between interfaces and tables, and stores the data dictionaries in a central repository.

Finally, it also automates code generation from the definitions of the input interfaces, and the creation of the initial business-logic code.</description>
<url>http://www.HuemulSolutions.com</url>
<inceptionYear>2018</inceptionYear>
<licenses>
@@ -159,6 +157,14 @@
<version>1.1.2</version>
</dependency>

<!-- https://mvnrepository.com/artifact/com.hortonworks.hive/hive-warehouse-connector -->
<dependency>
<groupId>com.hortonworks.hive</groupId>
<artifactId>hive-warehouse-connector_2.11</artifactId>
<version>1.0.0.3.1.0.0-78</version>
</dependency>




<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-core -->
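For orientation, a minimal initialization sketch of how the library described above is typically wired up. This is not part of the diff: it assumes huemul_BigDataGovernance and huemul_GlobalPath live in com.huemulsolutions.bigdata.common, and it relies only on the constructor signature and close() method shown further down in this commit.

import com.huemulsolutions.bigdata.common._

object ExampleProcess {
  def main(args: Array[String]): Unit = {
    // Assumption: paths and the control-repository connection are configured here.
    val globalSettings = new huemul_GlobalPath()
    // Constructor signature as shown in huemul_BigDataGovernance below;
    // the LocalSparkSession parameter is optional and defaults to null.
    val huemulBigDataGov = new huemul_BigDataGovernance("example_app", args, globalSettings)
    // ... define tables, load interfaces, run data quality checks ...
    huemulBigDataGov.close()
  }
}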
474 changes: 474 additions & 0 deletions src/main/resources/Instalacion/huemul_bdg_2.5_minor.sql

Large diffs are not rendered by default.

44 changes: 44 additions & 0 deletions src/main/resources/Instalacion/huemul_cluster_setting_2.5.sh
@@ -0,0 +1,44 @@
#!/bin/bash
clear
echo "Creating HDFS Paths: START"
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/data
hdfs dfs -mkdir /user/data/production
hdfs dfs -mkdir /user/data/production/temp
hdfs dfs -mkdir /user/data/production/raw
hdfs dfs -mkdir /user/data/production/master
hdfs dfs -mkdir /user/data/production/dim
hdfs dfs -mkdir /user/data/production/analytics
hdfs dfs -mkdir /user/data/production/reporting
hdfs dfs -mkdir /user/data/production/sandbox
hdfs dfs -mkdir /user/data/production/dqerror
hdfs dfs -mkdir /user/data/production/mdm_oldvalue
hdfs dfs -mkdir /user/data/production/backup
hdfs dfs -mkdir /user/data/experimental
hdfs dfs -mkdir /user/data/experimental/temp
hdfs dfs -mkdir /user/data/experimental/raw
hdfs dfs -mkdir /user/data/experimental/master
hdfs dfs -mkdir /user/data/experimental/dim
hdfs dfs -mkdir /user/data/experimental/analytics
hdfs dfs -mkdir /user/data/experimental/reporting
hdfs dfs -mkdir /user/data/experimental/sandbox
hdfs dfs -mkdir /user/data/experimental/dqerror
hdfs dfs -mkdir /user/data/experimental/mdm_oldvalue
hdfs dfs -mkdir /user/data/experimental/backup
echo "Creating HDFS Paths: FINISH"
echo "STARTING HIVE SETUP"
hive -e "CREATE DATABASE production_master"
hive -e "CREATE DATABASE experimental_master"
hive -e "CREATE DATABASE production_dim"
hive -e "CREATE DATABASE experimental_dim"
hive -e "CREATE DATABASE production_analytics"
hive -e "CREATE DATABASE experimental_analytics"
hive -e "CREATE DATABASE production_reporting"
hive -e "CREATE DATABASE experimental_reporting"
hive -e "CREATE DATABASE production_sandbox"
hive -e "CREATE DATABASE experimental_sandbox"
hive -e "CREATE DATABASE production_DQError"
hive -e "CREATE DATABASE experimental_DQError"
hive -e "CREATE DATABASE production_mdm_oldvalue"
hive -e "CREATE DATABASE experimental_mdm_oldvalue"
echo "STARTING HIVE SETUP"
@@ -0,0 +1,14 @@

ALTER TABLE control_columns ADD column_datatypedeploy varchar(50);

UPDATE control_columns SET column_datatypedeploy = column_datatype WHERE column_datatypedeploy IS NULL;


UPDATE control_config
SET version_mayor = 2
,version_minor = 5
,version_patch = 0
where config_id = 1;
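For reference, the version written here (mayor 2, minor 5, patch 0) is what the new guards in huemul_Control later in this commit compare against. A hedged Scala sketch of the assumed packing, consistent with the getVersionFull() >= 20500 checks:

// Assumption: the control repository packs the version as mayor*10000 + minor*100 + patch,
// so 2.5.0 becomes 20500, matching the ">= 20500" guards that gate column_datatypedeploy.
def versionFull(mayor: Int, minor: Int, patch: Int): Int =
  mayor * 10000 + minor * 100 + patch

assert(versionFull(2, 5, 0) == 20500)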



@@ -52,7 +52,7 @@ import org.apache.log4j.Level
* @param LocalSparkSession (optional) allows passing an already started Spark session.
*/
class huemul_BigDataGovernance (appName: String, args: Array[String], globalSettings: huemul_GlobalPath, LocalSparkSession: SparkSession = null) extends Serializable {
val currentVersion: String = "2.4"
val currentVersion: String = "2.5"
val GlobalSettings = globalSettings
val warehouseLocation = new File("spark-warehouse").getAbsolutePath
//@transient lazy val log_info = org.apache.log4j.LogManager.getLogger(s"$appName [with huemul]")
@@ -504,8 +504,12 @@ class huemul_BigDataGovernance (appName: String, args: Array[String], globalSett
""")
}

private var hive_HWC: huemul_ExternalHWC = null
def getHive_HWC: huemul_ExternalHWC = {return hive_HWC}


if (GlobalSettings.externalBBDD_conf.Using_HWC.getActive() == true || GlobalSettings.externalBBDD_conf.Using_HWC.getActiveForHBASE() == true) {
hive_HWC = new huemul_ExternalHWC(this)
}

/*********************
* START METHOD
@@ -537,6 +541,13 @@ class huemul_BigDataGovernance (appName: String, args: Array[String], globalSett
connHIVE.connection.close()
}
}

//FROM 2.5 --> ADD HORTONWORKS WAREHOUSE CONNECTOR
if (GlobalSettings.externalBBDD_conf.Using_HWC.getActive() == true || GlobalSettings.externalBBDD_conf.Using_HWC.getActiveForHBASE() == true) {
if (getHive_HWC != null)
getHive_HWC.close
}

}

def close() {
@@ -845,6 +856,10 @@ class huemul_BigDataGovernance (appName: String, args: Array[String], globalSett
_huemul_showDemoLines = value
}

//replicated in huemul_columns
def getCaseType(tableStorage: com.huemulsolutions.bigdata.tables.huemulType_StorageType.huemulType_StorageType, value: String): String = {
return if (tableStorage == com.huemulsolutions.bigdata.tables.huemulType_StorageType.AVRO) value.toLowerCase() else value
}

/**
* Get execution Id from spark monitoring url
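A quick behavior sketch for the new getCaseType helper above, assuming huemulBigDataGov is an initialized huemul_BigDataGovernance. Only the AVRO member appears in this diff; the PARQUET member below is an assumption used to show the pass-through branch.

import com.huemulsolutions.bigdata.tables.huemulType_StorageType

// AVRO storage forces lower-case names; any other storage passes the value through.
huemulBigDataGov.getCaseType(huemulType_StorageType.AVRO, "CustomerId")    // "customerid"
huemulBigDataGov.getCaseType(huemulType_StorageType.PARQUET, "CustomerId") // "CustomerId"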
@@ -870,7 +870,7 @@ class huemul_DataFrame(huemulBigDataGov: huemul_BigDataGovernance, Control: huem
Values.Table_Name =dfTableName
Values.BBDD_Name =dfDataBaseName
Values.DF_Alias =AliasToQuery
Values.ColumnName =if (x.getFieldName == null) null else x.getFieldName.get_MyName()
Values.ColumnName =if (x.getFieldName == null) null else x.getFieldName.get_MyName(dMaster.getStorageType)
Values.DQ_Name =x.getMyName()
Values.DQ_Description =s"(Id ${x.getId}) ${x.getDescription}"
Values.DQ_QueryLevel =x.getQueryLevel() // .getDQ_QueryLevel
@@ -910,7 +910,7 @@ class huemul_DataFrame(huemulBigDataGov: huemul_BigDataGovernance, Control: huem
val SQL_Detail = DQ_GenQuery(AliasToQuery
,s"not (${x.getSQLFormula()})"
,!(x.getFieldName == null) //asField
,if (x.getFieldName == null) "all" else x.getFieldName.get_MyName() //fieldName
,if (x.getFieldName == null) "all" else x.getFieldName.get_MyName(dMaster.getStorageType) //fieldName
,Values.DQ_Id
,x.getNotification()
,x.getErrorCode()
@@ -3,4 +3,7 @@ package com.huemulsolutions.bigdata.common
class huemul_ExternalDB() extends Serializable {
var Using_SPARK: huemul_ExternalDBType = new huemul_ExternalDBType().setActive(true).setActiveForHBASE(false)
var Using_HIVE: huemul_ExternalDBType = new huemul_ExternalDBType()
var Using_HWC: huemul_ExternalDBType = new huemul_ExternalDBType()


}
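A hedged sketch of how a project might turn the new connector on, following the same chainable huemul_ExternalDBType setters used for Using_SPARK above. The exact setup of the huemul_GlobalPath instance is project-specific.

val globalSettings = new huemul_GlobalPath()
// Using_HWC exposes the same setActive / setActiveForHBASE API as Using_SPARK above.
globalSettings.externalBBDD_conf.Using_HWC.setActive(true).setActiveForHBASE(false)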
@@ -0,0 +1,36 @@
package com.huemulsolutions.bigdata.common

import com.hortonworks.hwc.HiveWarehouseSession
import org.apache.spark.sql._
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl

/**
* connect using Hortonworks Warehouse connector
* used by huemul_ExternalDB.Using_HWC
*/
class huemul_ExternalHWC(huemulBigDataGov: huemul_BigDataGovernance) extends Serializable {
@transient private var _HWC_Hive: HiveWarehouseSessionImpl = null
def getHWC_Hive: HiveWarehouseSessionImpl = {
if (_HWC_Hive != null)
return _HWC_Hive

_HWC_Hive = HiveWarehouseSession.session(huemulBigDataGov.spark).build()


return _HWC_Hive
}

def execute_NoResulSet(sql: String): Boolean = {
val _hive = getHWC_Hive
if (_hive == null)
sys.error("can't connect with HIVE, HiveWarehouseSession.session doesnt works")

return _hive.executeUpdate(sql)
}

def close {
val _hive = getHWC_Hive
if (_hive != null)
_hive.session().close()
}
}
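Usage sketch (hedged): when Using_HWC is active, the huemul_BigDataGovernance constructor shown earlier in this commit creates this object and exposes it through getHive_HWC, so DDL/DML can be routed through the Hive Warehouse Connector. The SQL string below is illustrative only; huemulBigDataGov is assumed to be an initialized instance.

// execute_NoResulSet returns the Boolean result of the underlying executeUpdate call.
if (huemulBigDataGov.getHive_HWC != null)
  huemulBigDataGov.getHive_HWC.execute_NoResulSet("CREATE DATABASE IF NOT EXISTS example_db")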
@@ -103,6 +103,16 @@ class huemul_GlobalPath() extends Serializable {
*
*/

//FROM 2.5
//ADD AVRO SUPPORT
private var _avro_format: String = "com.databricks.spark.avro"
def getAVRO_format(): String = {return _avro_format}
def setAVRO_format(value: String) {_avro_format = value}

private var _avro_compression: String = "snappy"
def getAVRO_compression(): String = {return _avro_compression}
def setAVRO_compression(value: String) {_avro_compression = value}


/**
Returns true if path has a value, otherwise returns false
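Configuration sketch for the new Avro settings above (hedged): the defaults are the com.databricks.spark.avro reader with snappy compression; on Spark 2.4+ the built-in Avro source can be swapped in. The format and compression names below are standard Spark values, not taken from this diff, and globalSettings is assumed to be the project's huemul_GlobalPath instance.

globalSettings.setAVRO_format("avro")         // Spark 2.4+ built-in source; default is "com.databricks.spark.avro"
globalSettings.setAVRO_compression("deflate") // default is "snappy"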
@@ -813,10 +813,11 @@ class huemul_Control (phuemulBigDataGov: huemul_BigDataGovernance, ControlParent
control_Columns_addOrUpd( LocalNewTable_id
,Column_Id
,i
,x.get_MyName()
,x.get_MyName(DefMaster.getStorageType)
,x.Description
,null //--as Column_Formula
,x.DataType.sql
,x.getDataTypeDeploy(huemulBigDataGov.GlobalSettings.getBigDataProvider(), DefMaster.getStorageType).sql
,false //--as Column_SensibleData
,x.getMDM_EnableDTLog
,x.getMDM_EnableOldValue
@@ -863,9 +864,9 @@ class huemul_Control (phuemulBigDataGov: huemul_BigDataGovernance, ControlParent
x.Relationship.foreach { y =>
control_TablesRelCol_add (IdRel
,PK_Id
,y.PK.get_MyName()
,y.PK.get_MyName(InstanceTable.getStorageType)
,LocalNewTable_id
,y.FK.get_MyName()
,y.FK.get_MyName(DefMaster.getStorageType)
,Control_ClassName)
}
}
@@ -2283,7 +2284,7 @@ class huemul_Control (phuemulBigDataGov: huemul_BigDataGovernance, ControlParent
huemulBigDataGov.logMessageInfo("control version: insert new version")
val ExecResultCol = huemulBigDataGov.CONTROL_connection.ExecuteJDBC_NoResulSet(s"""
INSERT INTO control_config (config_id, version_mayor, version_minor, version_patch)
VALUES (1,2,2,0)
VALUES (1,2,5,0)
""")

_version_mayor = 2
@@ -2483,6 +2484,7 @@ class huemul_Control (phuemulBigDataGov: huemul_BigDataGovernance, ControlParent
,p_Column_Description: String
,p_Column_Formula: String
,p_Column_DataType: String
,p_Column_DataTypeDeploy: String
,p_Column_SensibleData: Boolean
,p_Column_EnableDTLog: Boolean
,p_Column_EnableOldValue: Boolean
@@ -2526,7 +2528,8 @@ class huemul_Control (phuemulBigDataGov: huemul_BigDataGovernance, ControlParent
SET column_position = ${p_Column_Position}
,column_description = CASE WHEN mdm_manualchange = 1 THEN column_description ELSE ${ReplaceSQLStringNulls(p_Column_Description,1000)} END
,column_formula = CASE WHEN mdm_manualchange = 1 THEN column_formula ELSE ${ReplaceSQLStringNulls(p_Column_Formula,1000)} END
,column_datatype = ${ReplaceSQLStringNulls(p_Column_DataType,50)}
${if (getVersionFull() >= 20500) s",column_datatypedeploy = ${ReplaceSQLStringNulls(p_Column_DataTypeDeploy,50)} " else ""}
,column_sensibledata = CASE WHEN mdm_manualchange = 1 THEN column_sensibledata ELSE ${if (p_Column_SensibleData) "1" else "0"} END
,column_enabledtlog = ${if (p_Column_EnableDTLog) "1" else "0"}
,column_enableoldvalue = ${if (p_Column_EnableOldValue) "1" else "0"}
@@ -2561,6 +2564,7 @@ class huemul_Control (phuemulBigDataGov: huemul_BigDataGovernance, ControlParent
,column_description
,column_formula
,column_datatype
${if (getVersionFull() >= 20500) s",column_datatypedeploy" else ""}
,column_sensibledata
,column_enabledtlog
,column_enableoldvalue
@@ -2592,6 +2596,7 @@ class huemul_Control (phuemulBigDataGov: huemul_BigDataGovernance, ControlParent
,${ReplaceSQLStringNulls(p_Column_Description,1000)}
,${ReplaceSQLStringNulls(p_Column_Formula,1000)}
,${ReplaceSQLStringNulls(p_Column_DataType,50)}
${if (getVersionFull() >= 20500) s",${ReplaceSQLStringNulls(p_Column_DataTypeDeploy,50)}" else ""}
,${if (p_Column_SensibleData) "1" else "0"}
,${if (p_Column_EnableDTLog) "1" else "0"}
,${if (p_Column_EnableOldValue) "1" else "0"}
@@ -2,6 +2,6 @@ package com.huemulsolutions.bigdata.datalake

object huemulType_FileType extends Enumeration {
type huemulType_FileType = Value
val TEXT_FILE, PDF_FILE = Value
val TEXT_FILE, PDF_FILE, AVRO_FILE, PARQUET_FILE, ORC_FILE, DELTA_FILE = Value

}
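A hedged sketch of how the new file types might map onto Spark reader formats. The format strings are standard Spark/Delta names, not taken from this commit, and TEXT_FILE and PDF_FILE are assumed to keep their existing dedicated readers.

import com.huemulsolutions.bigdata.datalake.huemulType_FileType._

def sparkFormatFor(fileType: huemulType_FileType): Option[String] = fileType match {
  case AVRO_FILE    => Some("avro")    // built-in since Spark 2.4
  case PARQUET_FILE => Some("parquet")
  case ORC_FILE     => Some("orc")
  case DELTA_FILE   => Some("delta")   // requires the Delta Lake dependency
  case _            => None            // TEXT_FILE / PDF_FILE handled by huemul's own readers
}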