CSV

Allows you to read or write a delimited file (often called a Comma-Separated Values, or CSV, file).

Source Parameters

CSV Source supports all the available Spark read options for CSV.

The following additional parameters are used to read a CSV file:

| Parameter | Description | Required |
|---|---|---|
| Dataset Name | Name of the Dataset (read more about Datasets) | True |
| Location | Location of the file(s) to be loaded, e.g. `dbfs:/data/test.csv` | True |
| Schema | Schema to be applied on the loaded data. Can be defined/edited as JSON or inferred using the Infer Schema button | True |
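When defined as JSON, the Schema follows the shape Spark's `StructType` serializes to (`type`/`fields`, with `name`, `type`, `nullable`, and `metadata` per field). A minimal sketch of that shape for two of the order columns, shown for illustration only:

```python
import json

# Hypothetical JSON schema fragment in the shape Spark's StructType uses;
# field list shortened to two columns for illustration.
schema_json = """
{
  "type": "struct",
  "fields": [
    {"name": "order_id", "type": "integer", "nullable": true, "metadata": {}},
    {"name": "order_date", "type": "date", "nullable": true, "metadata": {}}
  ]
}
"""

parsed = json.loads(schema_json)
field_names = [f["name"] for f in parsed["fields"]]
print(field_names)  # ['order_id', 'order_date']
```

The same JSON can be pasted into the schema editor or generated with the Infer Schema button.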

Target Parameters

CSV Target supports all the available Spark write options for CSV.

The following additional parameters are used to write a CSV file:

| Parameter | Description | Required |
|---|---|---|
| Dataset Name | Name of the Dataset (read more about Datasets) | True |
| Location | Location where the file(s) will be written, e.g. `dbfs:/data/output.csv` | True |

Loading a CSV file

Step 1 - Create Source Component

```python
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.types import (
    StructType, StructField, IntegerType,
    StringType, DateType, DoubleType,
)

def load_csv(spark: SparkSession) -> DataFrame:
    return spark.read\
        .schema(
            StructType([
                StructField("order_id", IntegerType(), True),
                StructField("customer_id", IntegerType(), True),
                StructField("order_status", StringType(), True),
                StructField("order_category", StringType(), True),
                StructField("order_date", DateType(), True),
                StructField("amount", DoubleType(), True)
            ])
        )\
        .option("header", True)\
        .option("quote", "\"")\
        .option("sep", ",")\
        .csv("dbfs:/Prophecy/anshuman@simpledatalabs.com/OrdersDatasetInput.csv")
```
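The `header`, `quote`, and `sep` options are the usual CSV dialect settings: treat the first row as column names, quote fields containing the separator, and split on commas. A minimal stdlib sketch (plain Python `csv`, not Spark; the sample rows are invented for illustration) of how those settings parse a line:

```python
import csv
import io

# Invented sample data mirroring the header/quote/sep settings
# used in the Spark reader above.
raw = 'order_id,order_status\n1,"shipped, partial"\n'

reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')
rows = list(reader)
header, first = rows[0], rows[1]
print(header)  # ['order_id', 'order_status']
print(first)   # ['1', 'shipped, partial']
```

Note how the quoted field keeps its embedded comma intact, which is exactly what the `quote` option buys you in the Spark reader.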


Writing a CSV file

Step 1 - Create Target Component

```python
from pyspark.sql import SparkSession, DataFrame

def write_as_csv(spark: SparkSession, in0: DataFrame):
    in0.write\
        .option("header", True)\
        .option("sep", ",")\
        .mode("error")\
        .csv("dbfs:/Prophecy/anshuman@simpledatalabs.com/output.csv")
```