Analysis

The analysis module was built to help data scientists explore and work with data in a more seamless and compute-efficient way.

DataAnalysis

With this object, the user can take a Spark DataFrame and convert it into the DataFrame type of their choice: a Pandas DataFrame, a Koalas DataFrame, or an EDA DataFrame (also based on Pandas). The main advantage is that the user is always prevented from converting an overly large DataFrame to Pandas. Because a Pandas conversion collects every row into the driver's memory, this guard helps users get the most out of their compute environment. Another benefit of using this module is that it lays the groundwork for adding monitoring and optimization in the future.
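The size guard described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pyiris implementation: the threshold `MAX_PANDAS_ROWS`, the exception `DataFrameTooLargeError`, and the helpers `fits_in_pandas` and `guarded_to_pandas` are hypothetical names. The sketch relies only on the PySpark `DataFrame.limit`, `DataFrame.count`, and `DataFrame.toPandas` methods.

```python
# Hypothetical sketch of a "too large for Pandas" guard; not the pyiris code.
# MAX_PANDAS_ROWS, DataFrameTooLargeError, fits_in_pandas and
# guarded_to_pandas are illustrative names chosen for this example.
from typing import Any

# Assumed row limit; a real limit would depend on the driver's memory.
MAX_PANDAS_ROWS = 1_000_000


class DataFrameTooLargeError(ValueError):
    """Raised when a DataFrame is too large to collect onto the driver."""


def fits_in_pandas(row_count: int, max_rows: int = MAX_PANDAS_ROWS) -> bool:
    """Return True if row_count is within the allowed limit."""
    return row_count <= max_rows


def guarded_to_pandas(spark_df: Any, max_rows: int = MAX_PANDAS_ROWS):
    """Convert a pyspark.sql.DataFrame to pandas only if it is small enough.

    limit(max_rows + 1) caps how many rows are counted: if the result
    exceeds max_rows, the DataFrame is known to be too large.
    """
    row_count = spark_df.limit(max_rows + 1).count()
    if not fits_in_pandas(row_count, max_rows):
        raise DataFrameTooLargeError(
            f"DataFrame has more than {max_rows} rows; sample it "
            "or keep working with a distributed API instead"
        )
    # toPandas() collects every row into the driver's memory.
    return spark_df.toPandas()
```

Applying `limit(max_rows + 1)` before `count()` caps the number of rows each partition has to produce, which is typically cheaper than a full count of a very large DataFrame; the exact cost still depends on the upstream query plan.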

```python
from pyiris.intelligence import DataAnalysis

# `dataframe` is an existing Spark DataFrame

# Transforming a Spark DataFrame to Pandas
pandas_df = DataAnalysis(dataframe).to_pandas()

# Making an EDA DataFrame (based on Pandas)
eda_df = DataAnalysis(dataframe).make_eda(frac=0.1)

# Transforming a Spark DataFrame to Koalas
koalas_df = DataAnalysis(dataframe).to_koalas()
```