# Data Transform
The transform module is responsible for applying transformations to the data.
### SQL transformation
This module applies a SQL expression to the dataframe; in the example below, the result of `price/quantity` is written to the `unit_price` column. Example:

~~~Python
from pyiris.ingestion.transform import SqlTransformation

sql_transformation = SqlTransformation(name='divide', 
                                       description='Unit price division',
                                       to_column="unit_price", 
                                       sql_expression="price/quantity")

transformed_dataframe = sql_transformation.transform(dataframe=extracted_dataset)
~~~

### Hash transformation
This module hashes the values of the given input columns. Example:

~~~Python
from pyiris.ingestion.transform import HashTransformation

hash_transformation = HashTransformation(name='Hash CPF', 
                                         description='Hash CPF to be according to LGPD', 
                                         from_columns=["cpf"])

transformed_dataframe = hash_transformation.transform(dataframe=extracted_dataset)
~~~
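
To make the idea of column hashing concrete, here is a minimal sketch in plain Python, independent of pyiris and Spark. It assumes SHA-256 as the hash function; the algorithm actually used by `HashTransformation` is not stated here, and the helper name `hash_columns` is hypothetical.

```python
import hashlib


def hash_columns(rows, from_columns):
    """Return copies of `rows` with the given columns replaced by SHA-256 hex digests.

    Hypothetical helper for illustration only; not part of pyiris.
    """
    hashed = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are not modified
        for column in from_columns:
            value = new_row[column]
            if value is not None:  # leave missing values untouched
                new_row[column] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        hashed.append(new_row)
    return hashed


rows = [{"cpf": "123.456.789-09", "name": "Ana"}]
hashed_rows = hash_columns(rows, from_columns=["cpf"])
```

The hashed value is deterministic, so the same CPF always maps to the same digest and the column can still be used for joins or deduplication.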

### Spark Transformation
This module lets users define one or more Spark transformations to apply to the dataframe.
They can either define their own UDFs or use functions from the `pyspark.sql.functions` module. Example:

```python
import pyspark.sql.functions as f

from pyiris.ingestion.transform import SparkTransformation

spark_transformation = SparkTransformation(name="circular_transformations",
                                           description="Circular calculations on salary",
                                           from_column="salary",
                                           functions=[f.cos, f.sin, f.tan])

transformed_dataframe = spark_transformation.transform(dataframe=extracted_dataframe)
```

Users can also define aggregate calculations on the desired column by passing one or more window
definitions built with the `pyiris.ingestion.transform.transform_window.TransformWindow` module,
as shown below:

```python
import pyspark.sql.functions as f

from pyiris.ingestion.transform import SparkTransformation
from pyiris.ingestion.transform.transform_window import TransformWindow 

range_window = TransformWindow.build_with_range(
    window_name="range_window",
    partition_by="department",
    order_by="user_id",
    upper_bound=4,
    lower_bound=0,
)


spark_transformation = SparkTransformation(name="window_transformations",
                                           description="Window calculations on salary",
                                           from_column="salary",
                                           functions=[f.sum, f.min, f.max, f.count, f.avg],
                                           windows=[range_window])

transformed_dataframe = spark_transformation.transform(dataframe=extracted_dataframe)
```

Check the documentation for the `TransformWindow` module to learn how to define your
transformation window properly.
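
To build intuition for what a range-based window computes, the sketch below reproduces the idea in plain Python. It assumes that `lower_bound` and `upper_bound` are offsets relative to the current row's `order_by` value, as in a Spark range frame; how `TransformWindow.build_with_range` interprets them should be confirmed in its documentation. The helper `range_window_sum` and the `salary_sum` output name are hypothetical.

```python
from collections import defaultdict


def range_window_sum(rows, partition_by, order_by, value, lower_bound, upper_bound):
    """For each row, sum `value` over rows of the same partition whose
    `order_by` key lies in [key + lower_bound, key + upper_bound].

    Hypothetical helper for illustration only; not part of pyiris.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)

    result = []
    for row in rows:
        key = row[order_by]
        total = sum(
            other[value]
            for other in partitions[row[partition_by]]
            if key + lower_bound <= other[order_by] <= key + upper_bound
        )
        result.append({**row, f"{value}_sum": total})
    return result


rows = [
    {"department": "a", "user_id": 1, "salary": 100},
    {"department": "a", "user_id": 3, "salary": 200},
    {"department": "a", "user_id": 9, "salary": 400},
    {"department": "b", "user_id": 2, "salary": 50},
]
windowed = range_window_sum(
    rows, "department", "user_id", "salary", lower_bound=0, upper_bound=4
)
```

With these bounds, the row with `user_id` 1 aggregates the rows with `user_id` between 1 and 5 in department `a`, so its sum covers users 1 and 3 but not user 9.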

### Custom transformation
This module gives users tools to customize the dataframe using the main built-in custom methods. Example:

~~~Python
from pyiris.ingestion.transform.transformations.custom.custom import divide
from pyiris.ingestion.transform.transformations.custom_transformation import CustomTransformation

custom_transformation = CustomTransformation(name='middle_price', 
                                             description='Dividing two fictitious columns (price/quantity) to generate column middle_price', 
                                             method=divide, 
                                             to_column='middle_price', 
                                             column1='price', 
                                             column2='quantity')

transformed_dataframe = custom_transformation.transform(dataframe=extracted_dataset)
~~~

#### Custom transformation - snakecase_column_names

This method renames all columns of a given dataframe to snake case.

The transformations applied are:
- replacing accented letters and special characters with their plain equivalents (e.g. replacing 'á', 'à', 'ã' or 'â' with 'a');
- replacing uppercase letters with an underscore followed by the lowercase letter;
- removing duplicated underscores;
- removing leading and trailing underscores;
- removing all characters that are not allowed (only `a-z0-9_` are kept).

###### Example code
~~~Python
from pyiris.ingestion.transform.transformations.custom.custom import snakecase_column_names
from pyiris.ingestion.transform.transformations.custom_transformation import CustomTransformation

custom_transformation = CustomTransformation(name="snakecase_column_names", 
                                             description="rename columns to snake case", 
                                             method=snakecase_column_names)
transformed_dataframe = custom_transformation.transform(dataframe=extracted_dataset)
~~~
###### Example outputs


- `already_snake_case_column_name` → `already_snake_case_column_name`
- `notSNAKECaseColumnNameOne` → `not_snake_case_column_name_one`
- `NÕTSnákêCãsèColùmnNãmêTWÕ` → `not_snake_case_column_name_two`
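
The renaming steps can be approximated with a short plain-Python sketch that reproduces the example outputs above. This is not the pyiris implementation: the function name `to_snake_case` is hypothetical, and treating a run of uppercase letters as a single word is an assumption inferred from the example outputs.

```python
import re
import unicodedata


def to_snake_case(name):
    """Hypothetical illustration of the renaming steps; not part of pyiris."""
    # Decompose accented letters and drop the combining marks (e.g. 'á' -> 'a').
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_like = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Insert an underscore at word boundaries: lowercase/digit followed by
    # uppercase, and the end of an uppercase run followed by a capitalized word.
    with_breaks = re.sub(
        r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "_", ascii_like
    )
    lowered = with_breaks.lower()
    # Drop characters outside a-z0-9_, then collapse and trim underscores.
    allowed_only = re.sub(r"[^a-z0-9_]", "", lowered)
    collapsed = re.sub(r"_+", "_", allowed_only)
    return collapsed.strip("_")


renamed = [
    to_snake_case(column)
    for column in ["already_snake_case_column_name", "notSNAKECaseColumnNameOne"]
]
```

Disallowed characters are removed before duplicated underscores are collapsed, so that a removal cannot leave two underscores side by side.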

## Transform Service
The class **pyiris.ingestion.transform.TransformService** works as a service that executes a single transformation or several in sequence. Example:

~~~Python
from pyiris.ingestion.transform import TransformService, HashTransformation, SqlTransformation

transform_service = TransformService(
    transformations=[
        SqlTransformation(name='divide', 
                          description='Getting middle price', 
                          to_column="middle_price", 
                          sql_expression="price/quantity"),
        HashTransformation(name='Hash CPF', 
                           description='Hash CPF to be according to LGPD', 
                           from_columns=["seller_cpf"])
    ]

)
transformed_dataframe = transform_service.handler(dataframe=extracted_dataset)
~~~
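
The sequential behaviour can be pictured with the following plain-Python sketch, in which the output of each transformation becomes the input of the next. The classes `AddColumn` and `SequentialService` are hypothetical stand-ins that operate on lists of dictionaries; they only mirror the `transform` and `handler` method names used above and say nothing about how `TransformService` is implemented internally.

```python
class AddColumn:
    """Hypothetical stand-in for a transformation: adds a computed key to each row."""

    def __init__(self, to_column, function):
        self.to_column = to_column
        self.function = function

    def transform(self, dataframe):
        return [{**row, self.to_column: self.function(row)} for row in dataframe]


class SequentialService:
    """Hypothetical stand-in for a service that runs transformations in order."""

    def __init__(self, transformations):
        self.transformations = transformations

    def handler(self, dataframe):
        for transformation in self.transformations:
            # The output of one step becomes the input of the next.
            dataframe = transformation.transform(dataframe=dataframe)
        return dataframe


service = SequentialService(
    transformations=[
        AddColumn("middle_price", lambda row: row["price"] / row["quantity"]),
        AddColumn("is_expensive", lambda row: row["middle_price"] > 10),
    ]
)
result = service.handler(dataframe=[{"price": 30.0, "quantity": 2}])
```

Because the second step reads `middle_price`, the order of the list matters: a transformation can only use columns created by the steps before it.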

For more information on how to use each module, see the code docstrings in
our **Pyiris modules** section on the left panel.