Scenario-2
Scenario: A retail company receives daily sales transaction files from multiple store locations in an Azure Data Lake folder. Instead of reprocessing all historical data every day, the data engineering team uses Spark Structured Streaming to incrementally load only the newly arrived files into a Delta table. This ensures timely updates to analytics dashboards while optimising compute costs and processing time.
> Get the data. I used two datasets here, dataset_initial and data_incremental, both in CSV format.
> Create a volume named stream_src and upload the data into it (see the setup sketch after this list).
> Write the code to read the data and sink it into another volume.
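For step two, the sketch below assumes a Unity Catalog workspace; the catalog and schema names (main and retail) and the second volume (stream_sink) are placeholders I introduced, so substitute your own.

```python
# Hypothetical setup, assuming a Unity Catalog workspace.
# "main" (catalog) and "retail" (schema) are placeholder names.
spark.sql("CREATE VOLUME IF NOT EXISTS main.retail.stream_src")   # source volume
spark.sql("CREATE VOLUME IF NOT EXISTS main.retail.stream_sink")  # sink volume

# Upload dataset_initial.csv to the source volume through the Catalog
# Explorer UI, or copy it from a staging path with dbutils, for example:
# dbutils.fs.cp("file:/tmp/dataset_initial.csv",
#               "/Volumes/main/retail/stream_src/dataset_initial.csv")
```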
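For step three, here is a minimal sketch of the incremental read and write. It uses Auto Loader (the cloudFiles source) as one way to pick up only newly arrived files with Structured Streaming; the paths and the availableNow trigger are my assumptions, not the author's exact code.

```python
# Minimal sketch: incrementally load new CSV files from the source volume
# into a Delta sink. The paths below are hypothetical; adjust them to your workspace.
src_path        = "/Volumes/main/retail/stream_src/"
sink_path       = "/Volumes/main/retail/stream_sink/daily_sales"
checkpoint_path = "/Volumes/main/retail/stream_sink/_checkpoints/daily_sales"

# Auto Loader records which files it has already processed in the checkpoint,
# so each run picks up only files added since the previous run.
df = (
    spark.readStream
         .format("cloudFiles")
         .option("cloudFiles.format", "csv")
         .option("cloudFiles.schemaLocation", checkpoint_path)
         .option("header", "true")
         .load(src_path)
)

# availableNow processes everything that is currently new, then stops,
# which suits a daily, batch-style run of the same streaming query.
(
    df.writeStream
      .format("delta")
      .option("checkpointLocation", checkpoint_path)
      .trigger(availableNow=True)
      .outputMode("append")
      .start(sink_path)
)
```

Dropping data_incremental.csv into stream_src and rerunning the same query should append only the new rows. If you prefer a governed Delta table over a path-based sink, one option is to replace .start(sink_path) with .toTable("main.retail.daily_sales").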