Data transform

The transformation module receives a Spark dataframe and applies the platform's main transformations to it.

SQL transformation

This module applies a SQL expression to the dataframe and stores the result in the column given by to_column. Example:

from pyiris.ingestion.transform import SqlTransformation

sql_transformation = SqlTransformation(name='divide', 
                                       description='Unit price division', 
                                       to_column="cost", 
                                       sql_expression="price/unit")

transformed_dataframe = sql_transformation.transform(dataframe=extracted_dataframe)
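Judging from the example, SqlTransformation evaluates sql_expression on every row and writes the result to the to_column column. The sketch below reproduces that behaviour using Python's built-in sqlite3 module as a stand-in engine, so it runs without Spark. The function name add_sql_column and the sample rows are hypothetical; the real class runs the expression in Spark, not SQLite.

```python
import sqlite3

def add_sql_column(rows, to_column, sql_expression):
    """Hypothetical stand-in for SqlTransformation: evaluate sql_expression
    on every row and store the result in to_column (SQLite instead of Spark)."""
    columns = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    try:
        # Untyped columns keep the Python values exactly as inserted.
        conn.execute(f"CREATE TABLE t ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO t VALUES ({placeholders})",
            [tuple(row[c] for c in columns) for row in rows],
        )
        # Keep the original columns and append the computed one.
        cursor = conn.execute(f"SELECT *, {sql_expression} AS {to_column} FROM t")
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, values)) for values in cursor.fetchall()]
    finally:
        conn.close()

rows = [{"price": 10.0, "unit": 4.0}, {"price": 9.0, "unit": 3.0}]
print(add_sql_column(rows, to_column="cost", sql_expression="price/unit"))
```

SQLite performs integer division when both operands are integers, while Spark SQL's / performs fractional division, which is why the sample data uses floats.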

Hash transformation

This module hashes the values of the columns listed in from_columns. Example:

from pyiris.ingestion.transform import HashTransformation

hash_transformation = HashTransformation(name='Hash CPF',
                                         description='Hash CPF to comply with the LGPD',
                                         from_columns=["cpf"])

transformed_dataframe = hash_transformation.transform(dataframe=extracted_dataframe)
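This section does not say which hash function HashTransformation uses or whether it overwrites the original column. As an illustration only, the sketch below assumes SHA-256 (the function Spark exposes as pyspark.sql.functions.sha2(col, 256)) and in-place replacement of each value in from_columns by its hex digest. The function name hash_columns and the sample CPF are hypothetical.

```python
import hashlib

def hash_columns(rows, from_columns):
    """Hypothetical stand-in for HashTransformation: replace each value in
    from_columns with the SHA-256 hex digest of its string form (assumed algorithm)."""
    hashed = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for column in from_columns:
            # Missing values stay None instead of being hashed.
            if new_row.get(column) is not None:
                digest = hashlib.sha256(str(new_row[column]).encode("utf-8"))
                new_row[column] = digest.hexdigest()
        hashed.append(new_row)
    return hashed

rows = [{"name": "Ana", "cpf": "123.456.789-09"}]
print(hash_columns(rows, from_columns=["cpf"]))
```

The same input always produces the same digest, so hashed CPFs can still be used as join keys while the raw value is no longer stored.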

Custom transformation

This module gives users tools to customize the dataframe through ready-made custom functions, such as divide. Example of use:

from pyiris.ingestion.transform.transformations.custom.custom import divide
from pyiris.ingestion.transform.transformations.custom_transformation import CustomTransformation

custom_transformation = CustomTransformation(name='divisao_preco_medio',
                                             description='Divide two fictitious columns (valor_venda/quantidade) to generate the preco_medio column',
                                             method=divide,
                                             to_column='preco_medio',
                                             column1='valor_venda',
                                             column2='quantidade')

transformed_dataframe = custom_transformation.transform(dataframe=extracted_dataframe)
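The example suggests that CustomTransformation stores method together with the extra keyword arguments (to_column, column1, column2) and calls method with the dataframe and those arguments when transform runs. The sketch below mimics that pattern on a list of dicts instead of a Spark dataframe. The SketchCustomTransformation class and this divide function are hypothetical stand-ins, not the Pyiris implementations, and returning None for a zero or missing divisor is an assumption.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

def divide(rows: List[Row], to_column: str, column1: str, column2: str) -> List[Row]:
    """Hypothetical custom function: to_column = column1 / column2."""
    result = []
    for row in rows:
        numerator, denominator = row.get(column1), row.get(column2)
        # Return None instead of raising when a value is missing or the divisor is zero.
        value = None if numerator is None or not denominator else numerator / denominator
        result.append({**row, to_column: value})
    return result

class SketchCustomTransformation:
    """Hypothetical wrapper: keeps a function and its keyword arguments, applies them in transform()."""

    def __init__(self, name: str, description: str, method: Callable[..., List[Row]], **kwargs: Any):
        self.name = name
        self.description = description
        self.method = method
        self.kwargs = kwargs

    def transform(self, dataframe: List[Row]) -> List[Row]:
        return self.method(dataframe, **self.kwargs)

custom_transformation = SketchCustomTransformation(
    name="divisao_preco_medio",
    description="Average price: valor_venda / quantidade",
    method=divide,
    to_column="preco_medio",
    column1="valor_venda",
    column2="quantidade",
)
print(custom_transformation.transform([{"valor_venda": 100.0, "quantidade": 4}]))
```

Keeping the column names as keyword arguments lets the same function be reused for any pair of columns without writing a new transformation class.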

Transform Service

The pyiris.ingestion.transform.TransformService class works as a service: it runs a list of transformations in sequence, and the list may also contain a single transformation. Example of use:

from pyiris.ingestion.transform import TransformService, HashTransformation, SqlTransformation

transform_service = TransformService(
    transformations=[
        SqlTransformation(name='divide',
                          description='Example only: divides two integer columns',
                          to_column="teste",
                          sql_expression="ID/ID_QUEM_REPORTOU"),
        HashTransformation(name='Hash CPF',
                           description='Hash CPF to comply with the LGPD',
                           from_columns=["cpf"])
    ]
)

# The transformations run in list order: first the SQL division, then the CPF hash.
transformed_dataframe = transform_service.handler(dataframe=extracted_dataframe)
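Conceptually, the service feeds the output of each transformation into the next one. The sketch below shows that pattern with functools.reduce, assuming every transformation exposes a transform(dataframe=...) method as in the examples above. SketchTransformService, AddColumn, and the list-of-dicts dataframe are hypothetical stand-ins, not the Pyiris classes.

```python
from functools import reduce
from typing import Any, Dict, List

Row = Dict[str, Any]

class AddColumn:
    """Hypothetical transformation: adds a column holding a constant value."""

    def __init__(self, to_column: str, value: Any):
        self.to_column = to_column
        self.value = value

    def transform(self, dataframe: List[Row]) -> List[Row]:
        return [{**row, self.to_column: self.value} for row in dataframe]

class SketchTransformService:
    """Hypothetical service: applies transformations in list order."""

    def __init__(self, transformations: List[Any]):
        self.transformations = transformations

    def handler(self, dataframe: List[Row]) -> List[Row]:
        # The output of each step becomes the input of the next one.
        return reduce(lambda df, t: t.transform(dataframe=df), self.transformations, dataframe)

service = SketchTransformService(transformations=[AddColumn("a", 1), AddColumn("b", 2)])
print(service.handler(dataframe=[{"id": 1}]))
```

A single-element list goes through the same code path, so running only one transformation needs no special case; order matters whenever two steps touch the same column.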

For more information, see the docstrings in the Pyiris modules.