Exam AWS Certified Data Analytics - Specialty topic 1 question 87 discussion

A marketing company is storing its campaign response data in Amazon S3. A consistent set of sources has generated the data for each campaign. The data is saved into Amazon S3 as .csv files. A business analyst will use Amazon Athena to analyze each campaign's data. The company needs the cost of ongoing data analysis with Athena to be minimized.
Which combination of actions should a data analytics specialist take to meet these requirements? (Choose two.)

  • A. Convert the .csv files to Apache Parquet.
  • B. Convert the .csv files to Apache Avro.
  • C. Partition the data by campaign.
  • D. Partition the data by source.
  • E. Compress the .csv files.
Suggested Answer: AC

Comments

AjithkumarSL
Highly Voted 3 years, 6 months ago
A, C. See https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/ and refer to the sections "Partition your data" and "Optimize columnar data store generation".
upvoted 33 times
Huy
3 years, 5 months ago
What about compression? Compression also offers a good chance to reduce the data scanned; however, A and C are correct because Apache Parquet compresses data by default.
upvoted 5 times
pk349
Most Recent 1 year, 12 months ago
AC: I passed the test
upvoted 1 times
CleverMonkey092
2 years, 1 month ago
A: reduces the data size. C: the user will query the data by campaign.
upvoted 1 times
cloudlearnerhere
2 years, 6 months ago
The correct answers are A & C: it is recommended to partition the data according to how it is queried and to use a columnar data store such as Parquet, which compresses the data and is splittable. Option D is wrong because the data should be partitioned according to the query pattern, and here the data is queried per campaign. Options B & E are wrong because compressing the .csv files or using Avro would not provide as many benefits as Parquet files.
upvoted 3 times
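To make the reasoning above concrete, here is a rough back-of-the-envelope sketch of why A and C together cut cost. It assumes Athena bills in proportion to bytes scanned; the per-TB rate, the 20 campaigns, the 25% column fraction, and the 4x compression ratio are all made-up illustrative figures, not numbers from the question.

```python
# Rough Athena cost model: billing is proportional to bytes scanned.
PRICE_PER_TB_USD = 5.0      # ASSUMPTION: illustrative per-TB rate; check current pricing
BYTES_PER_TB = 1024 ** 4


def scanned_bytes(total_bytes, partitions, partitions_read,
                  column_fraction, compression_ratio):
    """Estimate bytes scanned, assuming data is spread evenly across partitions."""
    per_partition = total_bytes / partitions
    return per_partition * partitions_read * column_fraction / compression_ratio


def query_cost_usd(num_bytes):
    """Convert bytes scanned into an estimated query cost."""
    return num_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


total = 1 * BYTES_PER_TB  # hypothetical: 1 TB of raw CSV

# Unpartitioned CSV: every query reads everything.
csv_cost = query_cost_usd(scanned_bytes(total, 1, 1, 1.0, 1.0))

# Hypothetical: 20 campaigns, one queried, 25% of columns read, 4x compression.
parquet_cost = query_cost_usd(scanned_bytes(total, 20, 1, 0.25, 4.0))

print(f"CSV, unpartitioned:   ${csv_cost:.4f} per query")
print(f"Parquet, partitioned: ${parquet_cost:.4f} per query")
```

The model is deliberately simple: it ignores the one-time cost of converting the files, which is the objection raised elsewhere in this thread, but the question asks about the cost of ongoing analysis.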
cloudlearnerhere
2 years, 6 months ago
Apache Parquet and Apache ORC are popular columnar data stores. They provide features that store data efficiently by employing column-wise compression, different encoding, compression based on data type, and predicate pushdown. They are also splittable. Generally, better compression ratios or skipping blocks of data means reading fewer bytes from Amazon S3, leading to better query performance.
upvoted 2 times
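A toy illustration of why a columnar layout means reading fewer bytes when a query touches only one column. This is not how Parquet actually encodes data (it adds encodings, compression, and metadata); the sample rows and column names are hypothetical, and the sketch only compares how many characters must be read in each layout.

```python
# Hypothetical campaign response rows.
COLUMNS = ["campaign", "source", "date", "response"]
ROWS = [
    ("spring-sale", "email", "2021-03-01", "clicked"),
    ("spring-sale", "social", "2021-03-02", "ignored"),
    ("summer-promo", "email", "2021-06-10", "purchased"),
]


def row_layout_bytes_read(rows):
    """Row-oriented text (like CSV): each full line is read to extract one field."""
    return sum(len(",".join(row)) + 1 for row in rows)  # +1 for the newline


def columnar_bytes_read(rows, columns, wanted):
    """Column-oriented storage: only the wanted column's values are read."""
    idx = columns.index(wanted)
    return sum(len(row[idx]) + 1 for row in rows)  # +1 for a value separator


row_bytes = row_layout_bytes_read(ROWS)
col_bytes = columnar_bytes_read(ROWS, COLUMNS, "response")
print(f"row layout reads {row_bytes} bytes, columnar reads {col_bytes} bytes")
```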
cloudlearnerhere
2 years, 6 months ago
Partitioning divides your table into parts and keeps the related data together based on column values such as date, country, region, etc. Partitions act as virtual columns. You define them at table creation, and they can help reduce the amount of data scanned per query, thereby improving performance. You can restrict the amount of data scanned by a query by specifying filters based on the partition.
upvoted 2 times
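A minimal sketch of what partitioning by campaign looks like in practice, assuming Hive-style `key=value` prefixes in the S3 object keys. The bucket name, table prefix, and file names are hypothetical, and `prune` only imitates in plain Python what the query engine does when a filter on the partition column is present.

```python
def partition_key(table_prefix, campaign, filename):
    """Build a Hive-style object key with the campaign as the partition value."""
    return f"{table_prefix}/campaign={campaign}/{filename}"


def prune(keys, campaign):
    """Keep only the objects a query filtered on this campaign would need to read."""
    marker = f"/campaign={campaign}/"
    return [key for key in keys if marker in key]


# Hypothetical table location and campaigns.
prefix = "s3://example-bucket/responses"
keys = [
    partition_key(prefix, campaign, "part-0.parquet")
    for campaign in ("spring-sale", "summer-promo", "fall-launch")
]

to_scan = prune(keys, "summer-promo")
print(f"{len(to_scan)} of {len(keys)} objects scanned: {to_scan}")
```

Partitioning by source (option D) would produce the same kind of layout, but a query filtered on campaign could not skip any of those prefixes.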
cloudlearnerhere
2 years, 6 months ago
Compressing your data can speed up your queries significantly, as long as the files are either of an optimal size (see the next section), or the files are splittable. The smaller data sizes reduce network traffic from Amazon S3 to Athena. Splittable files allow the execution engine in Athena to split the reading of a single file by multiple readers to increase parallelism. If you have a single unsplittable file, then only a single reader can read the file while all other readers may sit idle. Not all compression algorithms are splittable. For Athena, we recommend using either Apache Parquet or Apache ORC, which compress data by default and are splittable.
upvoted 2 times
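A small sketch of option E on its own, using only the Python standard library: it builds a synthetic .csv file (the rows are made up) and gzips it. The file does shrink, but gzip is one of the formats that is not splittable, so a single large .csv.gz would still be read by one reader, and every column is still scanned.

```python
import csv
import gzip
import io


def make_csv(num_rows):
    """Build a synthetic campaign-response CSV in memory and return its bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["campaign", "source", "row_id", "response"])
    for i in range(num_rows):
        response = "clicked" if i % 3 == 0 else "ignored"
        writer.writerow([f"campaign-{i % 5}", "email", i, response])
    return buf.getvalue().encode("utf-8")


raw = make_csv(10_000)
compressed = gzip.compress(raw)

# Round-trip check: compression must be lossless.
assert gzip.decompress(compressed) == raw

ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.1f}x")
```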
Raje14k
2 years, 9 months ago
It's A & C.
upvoted 1 times
rocky48
2 years, 9 months ago
Selected Answer: AC
A: A columnar format improves performance. Parquet is splittable and compresses by default, so option E is already covered. C: Partitioning improves performance. Since the bulk of the analysis is done per campaign, campaign is an ideal partition key (a limited number of partitions and low cardinality).
upvoted 1 times
dushmantha
2 years, 9 months ago
Selected Answer: AC
AC is correct.
upvoted 1 times
Ob1KN0B
2 years, 11 months ago
Selected Answer: AC
A: A columnar format improves performance. Parquet is splittable and compresses by default, so option E is already covered. C: Partitioning improves performance. Since the bulk of the analysis is done per campaign, campaign is an ideal partition key (a limited number of partitions and low cardinality).
upvoted 2 times
Bik000
2 years, 11 months ago
Selected Answer: BC
Answer should be B & C
upvoted 1 times
YahiaAglan74
2 years, 11 months ago
Selected Answer: AC
AC is the correct answer
upvoted 1 times
certificationJunkie
2 years, 11 months ago
C and E seem to be the correct answers. Since all sources are consistent, it is better to partition by campaign. Also, compression means less data is scanned, which saves cost. Why would you change the format to either Avro or Parquet? You would need to build an ETL job (using Glue, for example) to do this, which would add to the cost.
upvoted 1 times
pidkiller
3 years, 1 month ago
A and C. The data analyst will query per campaign, so campaign is a good partition key. Also, columnar formats such as Parquet are better for analysis than row-based formats such as Avro.
upvoted 1 times
PravinT
3 years, 1 month ago
A and C it is
upvoted 1 times
Ajithkt
3 years, 4 months ago
Selected Answer: AC
A & C is correct
upvoted 4 times
aws2019
3 years, 5 months ago
A and C
upvoted 1 times
Monika14Sharma
3 years, 6 months ago
Correct answer is A&C
upvoted 2 times
VikG12
3 years, 6 months ago
A,C it is.
upvoted 2 times