Set Operation
Adds or subtracts rows across dataframes that share an identical schema but contain different data.
Parameters
Parameter | Description | Required |
---|---|---|
Dataframe 1 | First input dataframe | True |
Dataframe 2 | Second input dataframe | True |
Dataframe N | Nth input dataframe | False |
Operation type | Operation to perform. - Union: Returns a dataset containing the rows of all input datasets, preserving duplicates. - Intersect All: Returns a dataset containing only the rows present in every input dataset, preserving duplicates. - Except All: Returns a dataset containing the rows of the first dataset that are not in the other datasets, preserving duplicates. | True |
info
To add more input dataframes, click the + icon on the left sidebar.
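The phrase "preserving duplicates" in the operation descriptions means each operation treats its inputs as multisets (bags) of rows. The following is a minimal sketch of that behavior in plain Python, using `collections.Counter` as a stand-in for a dataframe. The sample rows and the helper names `union_all`, `intersect_all`, and `except_all` are illustrative assumptions, not part of this component, and results are sorted only to make the output deterministic.

```python
from collections import Counter

# Illustrative sample rows (assumed data); each string stands for one row.
in0 = ["a", "a", "b", "c"]
in1 = ["a", "b", "b", "d"]


def union_all(left, right):
    # Row counts add up: every copy from both inputs is kept.
    return sorted((Counter(left) + Counter(right)).elements())


def intersect_all(left, right):
    # Each row is kept min(count in left, count in right) times.
    return sorted((Counter(left) & Counter(right)).elements())


def except_all(left, right):
    # Each row is kept (count in left - count in right) times, never below zero.
    return sorted((Counter(left) - Counter(right)).elements())


print(union_all(in0, in1))      # ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'd']
print(intersect_all(in0, in1))  # ['a', 'b']
print(except_all(in0, in1))     # ['a', 'c']
```

Note that Except All removes only one copy of `a`, because the second input contains `a` once while the first contains it twice.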
Examples
Operation Type - Union
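- Python
from pyspark.sql import DataFrame, SparkSession

def union(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    return in0.unionAll(in1)  # appends the rows of in1 to in0, keeping duplicates
- Scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object union {
  // Appends the rows of in1 to in0, keeping duplicates
  def apply(spark: SparkSession, in0: DataFrame, in1: DataFrame): DataFrame =
    in0.unionAll(in1)
}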
Operation Type - Intersect All
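- Python
from pyspark.sql import DataFrame, SparkSession

def intersectAll(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    return in0.intersectAll(in1)  # keeps rows present in both inputs, with duplicates
- Scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object intersectAll {
  // Keeps rows present in both inputs, with duplicates
  def apply(spark: SparkSession, in0: DataFrame, in1: DataFrame): DataFrame =
    in0.intersectAll(in1)
}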
Operation Type - Except All
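- Python
from pyspark.sql import DataFrame, SparkSession

def exceptAll(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    return in0.exceptAll(in1)  # keeps rows of in0 that are not in in1, with duplicates
- Scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object exceptAll {
  // Keeps rows of in0 that are not in in1, with duplicates
  def apply(spark: SparkSession, in0: DataFrame, in1: DataFrame): DataFrame =
    in0.exceptAll(in1)
}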
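The examples above take two inputs. When more inputs are attached (Dataframe N), one plausible way to apply the same operation is to fold it pairwise from left to right. The helper below is a hypothetical sketch under that assumption, not the code this component generates; `combine_all` and its `operation` argument are names invented for illustration. It relies only on the `unionAll`, `intersectAll`, and `exceptAll` DataFrame methods used in the examples above.

```python
from functools import reduce

# Method names taken from the examples above.
SUPPORTED_OPERATIONS = ("unionAll", "intersectAll", "exceptAll")


def combine_all(frames, operation="unionAll"):
    """Fold a set operation across a list of dataframes, left to right.

    Hypothetical helper: `frames` is a non-empty list of DataFrames with
    identical schemas, and `operation` is one of SUPPORTED_OPERATIONS.
    """
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    if not frames:
        raise ValueError("at least one dataframe is required")
    # getattr looks up the method on the running result (for example
    # left.exceptAll) and applies it to the next input in the list.
    return reduce(lambda left, right: getattr(left, operation)(right), frames)
```

With PySpark installed, a call such as `combine_all([df1, df2, df3], "exceptAll")` would evaluate `df1.exceptAll(df2).exceptAll(df3)`. For Except All the input order matters, since rows are always subtracted from the first dataframe.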