An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:
Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.
If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?
Eertyy
Highly Voted 1 year, 2 months agobenni_ale
Most Recent 2 months agopanya
5 months agoimatheushenrique
5 months, 3 weeks agoDavidRou
8 months, 3 weeks agoJay_98_11
10 months, 2 weeks ago5ffcd04
10 months, 4 weeks agokz_data
11 months, 1 week agovivekla
12 months agosturcu
1 year, 1 month agoStarvosxant
1 year, 1 month agothxsgod
1 year, 2 months ago