A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because the output format is Parquet rather than Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction are not available.
Which strategy will yield the best performance without shuffling data?
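One commonly cited approach (not spelled out in the question itself) is to size the read partitions to the target file size via the Spark setting spark.sql.files.maxPartitionBytes, since each write task then emits roughly one part file per input partition and no repartition or coalesce, and therefore no shuffle, is needed. The sketch below is a minimal illustration assuming a PySpark session and hypothetical input/output paths; note that Parquet compression will typically make each output file somewhat smaller than the 512 MB of JSON its task reads.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

# Cap each read partition at 512 MB so the downstream write produces
# roughly one ~512 MB part file per task, with no shuffle involved.
spark.conf.set("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))

# Hypothetical paths, for illustration only.
df = spark.read.json("/mnt/raw/events_json/")
df.write.mode("overwrite").parquet("/mnt/curated/events_parquet/")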