Set Operation

Adds or subtracts rows across dataframes that have identical schemas but different data.
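Because the inputs are expected to share a schema, it can help to check this before combining them. The following is a minimal sketch, not part of the Set Operation gem itself: `assert_same_schema` is a hypothetical helper, and it assumes each input exposes a comparable `schema` attribute, as PySpark dataframes do.

```python
def assert_same_schema(*dfs):
    """Raise ValueError unless every input has the same schema as the first.

    Hypothetical helper: relies only on each input having a `.schema`
    attribute that supports equality comparison.
    """
    if not dfs:
        return
    expected = dfs[0].schema
    for index, df in enumerate(dfs[1:], start=1):
        if df.schema != expected:
            raise ValueError(
                f"input {index} has schema {df.schema}, expected {expected}"
            )
```

Calling `assert_same_schema(in0, in1)` before a set operation turns a schema mismatch into an explicit error that names the offending input. Note that this check is stricter than a positional column match, since it also compares column names.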

Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| Dataframe 1 | First input dataframe | True |
| Dataframe 2 | Second input dataframe | True |
| Dataframe N | Nth input dataframe | False |
| Operation type | Operation to perform: Union, Intersect All, or Except All (see the list after this table) | True |

Operation types:
- Union: Returns a dataset containing rows in any one of the input Datasets, while preserving duplicates.
- Intersect All: Returns a dataset containing rows in all of the input Datasets, while preserving duplicates.
- Except All: Returns a dataset containing rows in the first Dataset, but not in the other datasets, while preserving duplicates.
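The duplicate-preserving behaviour of the three operation types can be modelled without Spark. The following is a minimal pure-Python sketch, assuming rows are hashable tuples. The helper names (`union_all`, `intersect_all`, `except_all`) are hypothetical and only mirror the multiset semantics described above, not Spark's implementation.

```python
from collections import Counter


def union_all(*datasets):
    """Rows from every input, duplicates kept."""
    result = []
    for rows in datasets:
        result.extend(rows)
    return result


def intersect_all(first, *others):
    """Rows present in all inputs; each row kept the minimum number of times it occurs."""
    counts = Counter(first)
    for rows in others:
        counts &= Counter(rows)  # multiset intersection: per-row minimum count
    return sorted(counts.elements())


def except_all(first, *others):
    """Rows of the first input, minus one occurrence per matching row in the others."""
    counts = Counter(first)
    for rows in others:
        counts -= Counter(rows)  # multiset difference: counts never drop below zero
    return sorted(counts.elements())


a = [(1,), (1,), (2,)]
b = [(1,), (3,)]

print(union_all(a, b))      # [(1,), (1,), (2,), (1,), (3,)]
print(intersect_all(a, b))  # [(1,)]
print(except_all(a, b))     # [(1,), (2,)]
```

The row `(1,)` appears twice in `a` and once in `b`, so Intersect All keeps it once and Except All leaves one copy behind, which is where these operations differ from their duplicate-removing counterparts.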
Info: To add more input dataframes, click the + icon on the left sidebar.
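This page only shows generated code for two inputs. As a sketch of how an operation might extend to N inputs, the helper below folds a two-input method over the inputs from left to right. `apply_set_operation` is a hypothetical name, and the left-to-right order is an assumption rather than documented behaviour.

```python
from functools import reduce


def apply_set_operation(operation, *dfs):
    """Fold a two-input set operation over any number of inputs, left to right.

    `operation` is the name of a dataframe method, for example
    "unionAll", "intersectAll" or "exceptAll".
    """
    if len(dfs) < 2:
        raise ValueError("a set operation needs at least two input dataframes")
    # reduce applies: ((in0 op in1) op in2) op ...
    return reduce(lambda acc, df: getattr(acc, operation)(df), dfs)
```

Under this assumption, `apply_set_operation("exceptAll", in0, in1, in2)` behaves like `in0.exceptAll(in1).exceptAll(in2)`: rows in the first dataframe that remain after subtracting each of the others in turn.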

Examples


Operation Type - Union

Example usage of Set Operation - Union

def union(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    # unionAll keeps duplicate rows and matches columns by position
    return in0.unionAll(in1)

Operation Type - Intersect All

Example usage of Set Operation - Intersect All

def intersectAll(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    # keep rows present in both inputs, preserving duplicates
    return in0.intersectAll(in1)

Operation Type - Except All

Example usage of Set Operation - Except All

def exceptAll(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    # keep rows of in0 that are not in in1, preserving duplicates
    return in0.exceptAll(in1)