A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto-Optimize and Auto-Compaction cannot be used.
Which strategy will yield the best performance without shuffling data?
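Below is a minimal sketch of one approach that fits the constraints as stated: since the requirement is to avoid a shuffle, the part-file size can be controlled at read time by raising spark.sql.files.maxPartitionBytes to roughly 512 MB, so each input split (and therefore each writing task) produces an output file near the target size. The paths are hypothetical, and this is an illustration under those assumptions rather than the confirmed answer from the discussion.

```python
from pyspark.sql import SparkSession

# Hypothetical locations; substitute the real source and target paths.
SOURCE_PATH = "/mnt/raw/events_json/"
TARGET_PATH = "/mnt/curated/events_parquet/"

spark = SparkSession.builder.getOrCreate()

# Size each input split at ~512 MB so that, with only narrow transformations,
# each task writes roughly one 512 MB Parquet part file.
spark.conf.set("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))

df = spark.read.json(SOURCE_PATH)

# No repartition(), coalesce(), or wide transformation is introduced,
# so the job stays shuffle-free: one output file per input partition.
df.write.mode("overwrite").parquet(TARGET_PATH)
```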