A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction cannot be used.
Which strategy will yield the best performance without shuffling data?
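One strategy commonly proposed for this scenario is to control the size of the input partitions at read time, since a narrow read-then-write pipeline carries partition boundaries straight through to the output files without a shuffle. The sketch below illustrates that idea using the real Spark setting `spark.sql.files.maxPartitionBytes`; the paths and app name are hypothetical placeholders, and in practice the value is often tuned upward because Parquet compression makes output files smaller than the input partitions they come from.

```python
from pyspark.sql import SparkSession

# Hypothetical example: size each input partition at ~512 MB so the
# subsequent narrow write produces part files near the target size
# without triggering a shuffle (no repartition/coalesce needed).
spark = (
    SparkSession.builder
    .appName("json-to-parquet-ingest")  # placeholder app name
    .config("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))
    .getOrCreate()
)

# Placeholder paths; substitute your own source and target locations.
df = spark.read.json("s3://bucket/raw/events/")
df.write.mode("overwrite").parquet("s3://bucket/curated/events/")
```

Because no wide transformation is introduced between the read and the write, each input partition is written out as a single part file, so the output file size tracks the configured read partition size.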