Page tree

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents


Intake-spark is an Intake plugin that provides a unified interface for loading and accessing data in Apache Spark using the Intake data catalog system. Spark is a powerful distributed computing framework for processing large-scale data, but working with it can be challenging because it requires specific knowledge of Spark's API and data sources. Intake-spark simplifies this process by providing a consistent and intuitive interface for loading data into Spark DataFrame. Intake-spark supports several file formats, including Apache Parquet, Avro, CSV, and JSON, and can read data from various storage systems such as HDFS, Amazon S3, and Azure Blob Storage. Intake-spark also allows users to configure advanced settings such as partitioning and caching for improved performance.