
Intake-spark

Intake-spark is an Intake plugin that provides a unified interface for loading and accessing data in Apache Spark through the Intake data catalog system. Spark is a distributed computing framework for processing large-scale data, but working with it can be challenging because it requires specific knowledge of Spark's API and data sources. Intake-spark simplifies this by providing a consistent interface for loading data into Spark DataFrames. It supports several file formats, including Apache Parquet, Avro, CSV, and JSON, and can read data from storage systems such as HDFS, Amazon S3, and Azure Blob Storage. It also lets users configure advanced settings such as partitioning and caching for improved performance.
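As a rough illustration of what loading data through Intake-spark can look like, the sketch below assumes the plugin's `spark_dataframe` driver accepts a list of method-call steps that mirror Spark's reader chain (`read`, `format`, `option`, `load`) and that installing the plugin exposes `intake.open_spark_dataframe`. The helper names `build_read_args` and `load_spark_dataframe` and the file name `data.csv` are hypothetical and exist only for this example; check the calling convention against the installed Intake-spark version before relying on it.

```python
def build_read_args(fmt, path, options=None):
    """Build a Spark reader call chain as a list of [method, args] steps.

    Hypothetical helper: the step layout follows the argument format
    assumed for the intake-spark `spark_dataframe` driver.
    """
    steps = [["read"], ["format", [fmt]]]
    for key, value in (options or {}).items():
        steps.append(["option", [key, value]])
    steps.append(["load", [path]])
    return steps


def load_spark_dataframe(fmt, path, options=None):
    """Open a source and return a Spark DataFrame.

    Assumes intake, intake-spark, and pyspark are installed; the import
    is done lazily so build_read_args can be used without Spark present.
    """
    import intake

    source = intake.open_spark_dataframe(build_read_args(fmt, path, options))
    return source.to_spark()


# Describe a CSV read with a header row; "data.csv" is a placeholder path.
csv_args = build_read_args("csv", "data.csv", {"header": "true"})
```

Keeping the call chain as plain data means the same description could, in principle, be stored in an Intake catalog entry instead of being built in code.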
