A task orchestrator has been configured to run two hourly tasks. First, an outside system writes Parquet data to a directory mounted at /mnt/raw_orders/. After this data is written, a Databricks job containing the following code is executed:
Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order, and that the time field indicates when the record was queued in the source system.
If the upstream system is known to occasionally enqueue duplicate entries for a single order hours apart, which statement is correct?
alexvno
Highly Voted 1 year, 6 months agoIsio05
Highly Voted 1 year, 1 month agoKadELbied
Most Recent 2 months agoa85becd
3 months, 1 week agoEZZALDIN
3 months, 2 weeks agoRandomForest
5 months, 3 weeks agoarekm
6 months, 1 week agobenni_ale
7 months, 3 weeks agocf56faf
8 months agoAnanth4Sap
8 months, 2 weeks agom79590530
8 months, 3 weeks agoarekm
6 months, 1 week agoshaojunni
9 months agoquaternion
11 months agoQuangTrinh
1 year, 1 month agoarekm
6 months, 1 week ago