# Data Transform

The transform module is responsible for applying transformations to the data.

### SQL transformation

This transformation uses a SQL expression to compute a column of the dataframe.

Example:

~~~Python
from pyiris.ingestion.transform import SqlTransformation

sql_transformation = SqlTransformation(
    name='divide',
    description='Unit price division',
    to_column="unit_price",
    sql_expression="price/quantity"
)

transformed_dataframe = sql_transformation.transform(dataframe=extracted_dataset)
~~~

### Hash transformation

This transformation hashes the values of the given input columns.

Example:

~~~Python
from pyiris.ingestion.transform import HashTransformation

hash_transformation = HashTransformation(
    name='Hash CPF',
    description='Hash CPF to be according to LGPD',
    from_columns=["cpf"]
)

transformed_dataframe = hash_transformation.transform(dataframe=extracted_dataset)
~~~

### Spark Transformation

This module lets users define one or more Spark transformations to apply to the dataframe. They can either define their own UDFs or use functions from the `pyspark.sql.functions` module.

Example:

```python
import pyspark.sql.functions as f
from pyiris.ingestion.transform import SparkTransformation

spark_transformation = SparkTransformation(
    name="circular_transformations",
    description="Circular calculations on salary",
    from_column="salary",
    functions=[f.cos, f.sin, f.tan]
)

transformed_dataframe = spark_transformation.transform(dataframe=extracted_dataframe)
```

Users can also define aggregate calculations on the desired column, using one or more window definitions built with the `pyiris.ingestion.transform.transform_window.TransformWindow` class, as shown below:

```python
import pyspark.sql.functions as f
from pyiris.ingestion.transform import SparkTransformation
from pyiris.ingestion.transform.transform_window import TransformWindow

range_window = TransformWindow.build_with_range(
    window_name="range_window",
    partition_by="department",
    order_by="user_id",
    upper_bound=4,
    lower_bound=0
)

spark_transformation = SparkTransformation(
    name="window_transformations",
    description="Window calculations on salary",
    from_column="salary",
    functions=[f.sum, f.min, f.max, f.count, f.avg],
    windows=[range_window]
)

transformed_dataframe = spark_transformation.transform(dataframe=extracted_dataframe)
```

Check the documentation for the `TransformWindow` module to learn how to define your transformation window correctly.

### Custom transformation

This module gives the user tools to customize the dataframe through a set of predefined custom methods, such as `divide` and `snakecase_column_names`.

Example:

~~~Python
from pyiris.ingestion.transform.transformations.custom.custom import divide
from pyiris.ingestion.transform.transformations.custom_transformation import CustomTransformation

custom_transformation = CustomTransformation(
    name='middle_price',
    description='Dividing two fictitious columns (price/quantity) to generate column middle_price',
    method=divide,
    to_column='middle_price',
    column1='price',
    column2='quantity'
)

transformed_dataframe = custom_transformation.transform(dataframe=extracted_dataset)
~~~

#### Custom transformation - snakecase_column_names

This method renames all columns of a given dataframe to snake case. The transformations applied are:

- replacing accented letters and special characters (e.g. replacing 'á', 'à', 'ã' or 'â' with 'a');
- replacing uppercase letters with an underscore followed by the lowercase letter;
- removing duplicated underscores;
- removing leading and trailing underscores;
- removing all characters that are not allowed (a-z0-9_).

###### Example code

~~~Python
from pyiris.ingestion.transform.transformations.custom.custom import snakecase_column_names
from pyiris.ingestion.transform.transformations.custom_transformation import CustomTransformation

custom_transformation = CustomTransformation(
    name="snakecase_column_names",
    description="rename columns to snake case",
    method=snakecase_column_names
)

transformed_dataframe = custom_transformation.transform(dataframe=extracted_dataset)
~~~

###### Example outputs

- `already_snake_case_column_name` → `already_snake_case_column_name`
- `notSNAKECaseColumnNameOne` → `not_snake_case_column_name_one`
- `NÕTSnákêCãsèColùmnNãmêTWÕ` → `not_snake_case_column_name_two`

## Transform Service

The class **pyiris.ingestion.transform.TransformService** works as a service that runs a single transformation or several transformations in sequence.

Example:

~~~Python
from pyiris.ingestion.transform import TransformService, HashTransformation, SqlTransformation

transform_service = TransformService(
    transformations=[
        SqlTransformation(
            name='divide',
            description='Getting middle price',
            to_column="middle_price",
            sql_expression="price/quantity"
        ),
        HashTransformation(
            name='Hash CPF',
            description='Hash CPF to be according to LGPD',
            from_columns=["seller_cpf"]
        )
    ]
)

transformed_dataframe = transform_service.handler(dataframe=extracted_dataset)
~~~

For more information on how to use each module, see the code docstrings in the **Pyiris modules** section on the left panel.