File Operation
Helps perform file operations, such as copy and move, on different file systems.
Parameters
Parameter | Description | Required |
---|---|---|
File System | `Local`: for operations on the driver node file system. `DBFS`: for operations on the Databricks file system. | True |
Operation | Operation to perform: `Copy` or `Move` | True |
Recurse | Boolean that controls whether the operation is performed recursively. Default is False | False |
Source Path | Path of the source file or directory. For example: /dbfs/source_file.txt, dbfs:/source_file.txt | True |
Destination Path | Path of the destination file or directory. For example: /dbfs/target_file.txt, dbfs:/target_file.txt | True |
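To illustrate how the Operation and Recurse parameters might interact on the Local file system, here is a minimal sketch built on Python's standard `shutil` module. The function name `local_file_operation` and its arguments are hypothetical and are not the code this gem generates. The sketch also assumes that a directory source is only processed when Recurse is enabled.

```python
import shutil
from pathlib import Path


def local_file_operation(operation: str, source: str, destination: str,
                         recurse: bool = False) -> None:
    """Hypothetical helper: copy or move a path on the local file system."""
    src = Path(source)
    if src.is_dir() and not recurse:
        # Assumption: directories are only processed when Recurse is enabled.
        raise ValueError(f"{source} is a directory; set recurse=True")

    if operation == "Copy":
        if src.is_dir():
            # Copy the directory contents, merging into an existing target
            # (dirs_exist_ok requires Python 3.8 or later).
            shutil.copytree(src, destination, dirs_exist_ok=True)
        else:
            # copy2 also tries to preserve file metadata such as timestamps.
            shutil.copy2(src, destination)
    elif operation == "Move":
        shutil.move(str(src), destination)
    else:
        raise ValueError(f"Unsupported operation: {operation}")
```

Because DBFS is reachable through the /dbfs mount, the same sketch would apply to DBFS files when both paths start with /dbfs.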
info
You can also perform operations on DBFS files using the Local file system by providing a path under /dbfs.
This is because Databricks uses a FUSE mount to provide local access to the files stored in the cloud.
A FUSE mount is a secure, virtual filesystem.
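The relationship between the two path styles can be sketched as a small conversion function. The helper name `dbfs_uri_to_local_path` is hypothetical and only illustrates the mapping between a `dbfs:/` path and its counterpart under the `/dbfs` mount, as seen in the example paths in the parameter table.

```python
def dbfs_uri_to_local_path(uri: str) -> str:
    """Hypothetical helper: map a dbfs:/ path to its location under /dbfs."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a DBFS path: {uri}")
    # Drop the scheme and re-root the remainder under the FUSE mount point.
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


print(dbfs_uri_to_local_path("dbfs:/Prophecy/example/source/person.json"))
# prints /dbfs/Prophecy/example/source/person.json
```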
Examples
Copy Single File
- DBFS
def copy_file(spark: SparkSession):
    from pyspark.dbutils import DBUtils
    # Copy one file within DBFS; recursion is not needed for a single file.
    DBUtils(spark).fs.cp(
        "dbfs:/Prophecy/example/source/person.json",
        "dbfs:/Prophecy/example/target/person.json",
        recurse = False
    )
- Local
def copy_file(spark: SparkSession):
    import os
    import shutil
    # Copy one file through the /dbfs mount; copy2 also tries to preserve file metadata.
    shutil.copy2("/dbfs/Prophecy/example/source/person.json",
                 "/dbfs/Prophecy/example/target/person.json")
Copy All Files From A Directory
- DBFS
def copy_file(spark: SparkSession):
    from pyspark.dbutils import DBUtils
    # Copy the contents of source/ directly into target/.
    DBUtils(spark).fs.cp(
        "dbfs:/Prophecy/example/source/",
        "dbfs:/Prophecy/example/target/",
        recurse = True
    )
- Local
def copy_file(spark: SparkSession):
    import os
    import shutil
    # Copy the contents of source/ directly into target/.
    # dirs_exist_ok (Python 3.8+) allows target/ to exist already.
    shutil.copytree(
        "/dbfs/Prophecy/example/source/",
        "/dbfs/Prophecy/example/target/",
        copy_function = shutil.copy2,
        dirs_exist_ok = True
    )
Copy Entire Directory
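- DBFS
def copy_file(spark: SparkSession):
    from pyspark.dbutils import DBUtils
    # The destination ends with the directory name, so source/ is recreated under target/.
    DBUtils(spark).fs.cp(
        "dbfs:/Prophecy/example/source/",
        "dbfs:/Prophecy/example/target/source",
        recurse = True
    )
- Local
def copy_file(spark: SparkSession):
    import os
    import shutil
    # The destination ends with the directory name, so source/ is recreated under target/.
    # dirs_exist_ok (Python 3.8+) allows target/source to exist already.
    shutil.copytree(
        "/dbfs/Prophecy/example/source/",
        "/dbfs/Prophecy/example/target/source",
        copy_function = shutil.copy2,
        dirs_exist_ok = True
    )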
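The difference between copying all files from a directory and copying the entire directory comes down to the destination path. The following sketch reproduces both layouts in a temporary directory so that it can run anywhere. The directory names `target_a` and `target_b` and the empty `person.json` file are placeholders for illustration only, and `dirs_exist_ok` assumes Python 3.8 or later.

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway source directory holding one placeholder file.
root = Path(tempfile.mkdtemp())
source = root / "source"
source.mkdir()
(source / "person.json").write_text("{}")

# Copy all files from a directory: contents land directly under target_a.
target_a = root / "target_a"
shutil.copytree(source, target_a, dirs_exist_ok=True)

# Copy the entire directory: the destination ends with the directory name,
# so the contents land under target_b/source.
target_b = root / "target_b"
shutil.copytree(source, target_b / "source", dirs_exist_ok=True)

layout_a = sorted(p.relative_to(target_a).as_posix() for p in target_a.rglob("*"))
layout_b = sorted(p.relative_to(target_b).as_posix() for p in target_b.rglob("*"))
print(layout_a)  # ['person.json']
print(layout_b)  # ['source', 'source/person.json']
```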