Hive Table
Reads data from or writes data to Hive tables managed by your workspace's Metastore.
note
Set the Provider property to `hive` on the properties page.
Source
Source Parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| Database name | Name of the database | True | |
| Table name | Name of the table | True | |
| Provider | Must be set to `hive` | True | |
| Filter Predicate | `WHERE` clause used to filter the table | False | (all records) |
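The filter predicate determines whether the generated read is a direct `spark.read.table` call or a `spark.sql` query with a `WHERE` clause. A minimal sketch of that choice (the helper name and string assembly are illustrative, not Prophecy's actual code generator):

```python
def source_query(database, table, filter_predicate=None):
    """Return the read expression the codegen would emit (illustrative)."""
    if filter_predicate is None:
        # No predicate: read the whole table directly.
        return f'spark.read.table("{database}.{table}")'
    # With a predicate: fall back to a SQL query with a WHERE clause.
    return f'spark.sql("SELECT * FROM {database}.{table} WHERE {filter_predicate}")'
```

Both branches correspond to the generated code shown below.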
Source Example
Generated Code
Without filter predicate
- Python
- Scala
def Source(spark: SparkSession) -> DataFrame:
    return spark.read.table("test_db.test_table")
object Source {
  def apply(spark: SparkSession): DataFrame = {
    spark.read.table("test_db.test_table")
  }
}
With filter predicate
- Python
- Scala
def Source(spark: SparkSession) -> DataFrame:
    return spark.sql("SELECT * FROM test_db.test_table WHERE col > 10")
object Source {
  def apply(spark: SparkSession): DataFrame =
    spark.sql("SELECT * FROM test_db.test_table WHERE col > 10")
}
Target
Target Parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| Database name | Name of the database | True | |
| Table name | Name of the table | True | |
| Custom file path | Custom file path at which to store the underlying files | False | |
| Provider | Must be set to `hive` | True | |
| Write Mode | How to handle existing data; see the write modes below | True | |
| File Format | File format to use when saving data | True | parquet |
| Partition Columns | Columns to partition by | False | (empty) |
| Use insert into | If true, use `.insertInto` instead of `.save` when generating code | False | false |
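The `Use insert into` flag changes only the final call on the writer chain. A rough sketch of the difference (the function name and string assembly are hypothetical, for illustration only):

```python
def target_write_call(database, table, file_format="parquet",
                      mode="overwrite", use_insert_into=False):
    """Build the generated writer chain as a string (illustrative, not actual codegen)."""
    base = (f'in0.write.format("hive")'
            f'.option("fileFormat", "{file_format}")'
            f'.mode("{mode}")')
    if use_insert_into:
        # .insertInto writes by column position into an existing table's schema
        return f'{base}.insertInto("{database}.{table}")'
    return f'{base}.saveAsTable("{database}.{table}")'
```

Note that `.insertInto` requires the target table to already exist, whereas `.saveAsTable` can create it.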
The Prophecy-provided Hive catalog supports the following write modes:
| Write Mode | Description |
|---|---|
| overwrite | If data already exists, it is overwritten by the contents of the DataFrame. |
| append | If data already exists, the contents of the DataFrame are appended to it. |
| ignore | If data already exists, the save operation does not save the contents of the DataFrame and leaves the existing data unchanged. This is similar to CREATE TABLE IF NOT EXISTS in SQL. |
| error | If data already exists, an exception is thrown. |
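The four modes follow Spark's SaveMode semantics. A small dict-backed sketch of that behavior (the dict here is a toy stand-in for the Metastore, not how Spark stores tables):

```python
def write_table(catalog, name, rows, mode="error"):
    """Apply SaveMode-like semantics to a dict acting as a toy metastore."""
    if name not in catalog:
        catalog[name] = list(rows)   # table doesn't exist: always create it
    elif mode == "overwrite":
        catalog[name] = list(rows)   # replace the existing data
    elif mode == "append":
        catalog[name].extend(rows)   # add to the existing data
    elif mode == "ignore":
        pass                         # leave the existing data untouched
    elif mode == "error":
        raise ValueError(f"table {name} already exists")
```

Only the first write to a nonexistent table behaves identically in all four modes.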
The Prophecy-provided Hive catalog supports the following file formats when writing:
- Parquet
- Text file
- Avro
- ORC
- RC file
- Sequence file
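The chosen format is passed to the writer through the `fileFormat` option, using the lowercase names Spark's Hive integration accepts. A hedged sketch of mapping the UI labels above to those option values (the helper itself is illustrative; the value names follow Spark's Hive-tables documentation):

```python
# Maps the UI labels above to the values accepted by the Hive "fileFormat"
# writer option in Spark.
FILE_FORMATS = {
    "Parquet": "parquet",
    "Text file": "textfile",
    "Avro": "avro",
    "ORC": "orc",
    "RC file": "rcfile",
    "Sequence file": "sequencefile",
}

def file_format_option(label):
    """Return the (key, value) pair for DataFrameWriter.option, or raise."""
    try:
        return ("fileFormat", FILE_FORMATS[label])
    except KeyError:
        raise ValueError(f"unsupported file format: {label}")
```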
Target Example
Generated Code
- Python
- Scala
def Target(spark: SparkSession, in0: DataFrame):
    in0.write\
        .format("hive")\
        .option("fileFormat", "parquet")\
        .mode("overwrite")\
        .saveAsTable("test_db.test_table")
object Target {
  def apply(spark: SparkSession, in: DataFrame): Unit = {
    in.write
      .format("hive")
      .option("fileFormat", "parquet")
      .mode("overwrite")
      .saveAsTable("test_db.test_table")
  }
}