A data ingestion task requires writing a 1 TB JSON dataset out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction are unavailable.
Which strategy will yield the best performance without shuffling data?
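One approach often discussed for this scenario (an assumption here, since the original answer thread is not preserved) is to tune the read-side partition size with spark.sql.files.maxPartitionBytes so that each input partition holds roughly 512 MB; a write with no repartition or coalesce then emits one Parquet part-file per partition, avoiding any shuffle. A minimal PySpark sketch, with placeholder paths:

```python
# Hedged sketch: size read partitions so a shuffle-free write lands near the
# 512 MB part-file target. Input/output paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

# Pack up to 512 MB of input data into each read partition (default is 128 MB).
spark.conf.set("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))

# Narrow read-then-write: no repartition(), coalesce(), or wide transformation,
# so each read partition is written out as a single Parquet part-file.
df = spark.read.json("/mnt/raw/events/")
df.write.mode("overwrite").parquet("/mnt/curated/events/")
```

Because Parquet applies columnar compression, the written part-files will usually come out smaller than the 512 MB of JSON read into each partition, so in practice the setting may need to be raised to hit the target on disk.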