
Exam Professional Data Engineer topic 1 question 70 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 70
Topic #: 1

You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

  • A. Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query.
  • B. Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.
  • C. Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query.
  • D. Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.
Suggested Answer: B

Comments

Ganshank
Highly Voted 4 years, 7 months ago
B. The question is focused on designing storage for very large files, with support for compression, ANSI SQL queries, and parallel loading from the input locations. This can be met using GCS for storage and BigQuery permanent tables with an external data source in GCS (see the sketch after this thread).
upvoted 59 times
atnafu2020
4 years, 4 months ago
Why use GCS as an external source when BigQuery can be used for storage as well?
upvoted 10 times
atnafu2020
4 years, 4 months ago
A seems correct to me.
upvoted 11 times
atnafu2020
4 years, 3 months ago
Since it's the best practice, I go with B, not A.
upvoted 4 times
...
gopinath_k
3 years, 8 months ago
They want to store the files; if you try it with BigQuery native storage, I think you will need to strike the word "compression".
upvoted 2 times
...
...
jkhong
1 year, 11 months ago
The question focuses on "designing storage", rather than designing a data warehouse.
upvoted 5 times
...
...
...
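To make the accepted approach concrete, here is a minimal sketch of a BigQuery permanent table backed by Avro files in Cloud Storage, using the google-cloud-bigquery Python client. All project, dataset, table, and bucket names are hypothetical placeholders.

```python
# Sketch: a BigQuery permanent table with an external data source in GCS.
# Assumes the google-cloud-bigquery client library; all names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Describe the external (federated) data source: Avro files in Cloud Storage.
external_config = bigquery.ExternalConfig("AVRO")
external_config.source_uris = ["gs://my-bucket/avro/*.avro"]  # hypothetical bucket

# Create a permanent table whose data stays in GCS; only metadata lives in BigQuery.
table = bigquery.Table("my-project.my_dataset.text_lines_ext")
table.external_data_configuration = external_config
client.create_table(table)

# The table is now queryable with standard (ANSI) SQL:
rows = client.query(
    "SELECT COUNT(*) AS n FROM my_dataset.text_lines_ext"
).result()
```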
[Removed]
Highly Voted 4 years, 8 months ago
Should be A
upvoted 15 times
tavva_prudhvi
2 years, 5 months ago
Not A: importing data into BigQuery may take more time compared to creating external tables over the data. BigQuery's additional storage cost is another issue, and it can be more expensive than Cloud Storage.
upvoted 7 times
...
...
Nittin
Most Recent 3 months ago
Selected Answer: B
Copy to GCS and use an external table in BQ.
upvoted 1 times
...
carmltekai
4 months ago
Selected Answer: A
Should be A. Check this link for the advantages of loading Avro data into BigQuery: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#advantages_of_avro """The Avro binary format: * Is faster to load. The data can be read in parallel, even if the data blocks are compressed. * Doesn't require typing or serialization. * Is easier to parse because there are no encoding issues found in other formats such as ASCII. When you load Avro files into BigQuery, the table schema is automatically retrieved from the self-describing source data."""
upvoted 1 times
carmltekai
4 months ago
While option B can work, it introduces additional complexity by linking Cloud Storage with BigQuery. Directly storing data in BigQuery is more efficient for querying purposes (a sketch of the load follows this thread). There are no requirements about cost, so simpler is better.
upvoted 1 times
...
...
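For comparison, a rough sketch of option A's load into native BigQuery storage, using the same Python client; the names are again hypothetical.

```python
# Sketch of option A: load the Avro files into native BigQuery storage.
# Hypothetical resource names throughout.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)

# Avro data blocks can be read in parallel even when compressed, which is
# why the docs call it the preferred format for loading into BigQuery.
load_job = client.load_table_from_uri(
    "gs://my-bucket/avro/*.avro",
    "my-project.my_dataset.text_lines",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
```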
SK1594
8 months ago
B makes sense
upvoted 2 times
...
MaxNRG
11 months, 2 weeks ago
Selected Answer: B
1. Store Avro files in GCS
2. Query them in BigQuery (federated tables)
upvoted 3 times
...
forepick
1 year, 5 months ago
Selected Answer: B
Answer is B. The requirements are: storage for compressed text files, and parallel loads into a SQL tool. Avro is a compressed format for text files, which makes it possible to load chunks of a very large file into BigQuery in parallel. gzip files are seamless in GCS, but cannot be loaded into BQ in parallel.
upvoted 6 times
...
samdhimal
1 year, 10 months ago
Correct Answer: A. Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query. This option offers several advantages (a sketch of the pipeline follows this thread):
- Transforming the text files to compressed Avro using Cloud Dataflow allows for parallel processing of the input data, improving the efficiency of the pipeline.
- Compressing the data in Avro format reduces the storage space required and improves data-transfer performance.
- Storing the data in BigQuery supports ANSI SQL queries and allows for easy querying of the data.
- BigQuery is a fully managed data warehousing solution; it is scalable and can handle large datasets and concurrent queries, so it is suitable for large text files.
upvoted 3 times
samdhimal
1 year, 10 months ago
Option B is similar to option A, but it uses a permanent linked table between Cloud Storage and BigQuery. This approach is not recommended because it is less efficient, could lead to data duplication, and does not take advantage of the parallel processing capabilities of Cloud Dataflow. Options C and D are incorrect because they do not take advantage of the parallel processing capabilities of Cloud Dataflow, and they do not use the Avro format for compression, which is more efficient and recommended by Google. Cloud Bigtable also does not support ANSI SQL queries, which is a requirement for this use case.
upvoted 1 times
...
...
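As an illustration of the Dataflow transform described above, here is a sketch of an Apache Beam (Python) pipeline that converts text lines to block-compressed Avro in GCS; the schema, paths, and pipeline options are hypothetical.

```python
# Sketch: Cloud Dataflow (Apache Beam) job turning large text files into
# compressed Avro in GCS. Paths, schema, and options are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# fastavro-style record schema: one string field per input line.
SCHEMA = {
    "type": "record",
    "name": "TextLine",
    "fields": [{"name": "line", "type": "string"}],
}

options = PipelineOptions(
    runner="DataflowRunner",  # run on Cloud Dataflow
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
        | "ToRecord" >> beam.Map(lambda line: {"line": line})
        # 'deflate' compresses each Avro data block; BigQuery can still
        # read compressed blocks in parallel.
        | "WriteAvro" >> beam.io.WriteToAvro(
            "gs://my-bucket/avro/part",
            schema=SCHEMA,
            file_name_suffix=".avro",
            codec="deflate",
        )
    )
```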
jkhong
1 year, 11 months ago
Selected Answer: B
Designing a storage solution, not data warehousing -> so Cloud Storage.
Support compression -> just use Avro.
Parallel load -> refers to upload from the input locations, NOT download. Load in parallel using the -m flag for gsutil cp (a sketch follows this thread): https://cloud.google.com/storage/docs/uploads-downloads#console
upvoted 3 times
...
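The gsutil -m flag handles the parallel upload from the command line. As a rough Python analogue (a different tool than the comment names, swapped in for illustration), the google-cloud-storage transfer manager can upload many files concurrently; the bucket and file names here are hypothetical.

```python
# Sketch of a parallel upload from the input location, the Python analogue
# of `gsutil -m cp`. Bucket, directory, and file names are hypothetical.
from google.cloud.storage import Client, transfer_manager

bucket = Client().bucket("my-bucket")
filenames = ["part-000.txt", "part-001.txt", "part-002.txt"]

# Upload many files concurrently using a pool of workers.
results = transfer_manager.upload_many_from_filenames(
    bucket,
    filenames,
    source_directory="/data/input",
    max_workers=8,
)
```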
odacir
1 year, 11 months ago
Selected Answer: B
C and D are discarded. A and B are possible. A is the best for query, but the question says: "You also want to support compression and parallel load from the input locations using Google recommended practices." BigQuery only supports parallel load from storage; storage supports parallel load from the CLI. So the only option is B.
upvoted 3 times
...
zellck
1 year, 11 months ago
Selected Answer: B
B is the answer.
upvoted 1 times
...
nkit
1 year, 11 months ago
Selected Answer: B
"Very large files" and "long term storage" are two key phrases- both of which indicate to pick cloud storage as option. Hence B is correct.
upvoted 1 times
...
NicolasN
1 year, 11 months ago
Selected Answer: B
All the comments argue about [A] and [B] as a storage destination. But there is a limitation on loading compressed Avro files into BigQuery that cuts the Gordian knot: ❗ "... Compressed Avro files are not supported, but compressed data blocks are ..." From: https://cloud.google.com/bigquery/docs/batch-loading-data#loading_compressed_and_uncompressed_data
upvoted 3 times
izekc
1 year, 7 months ago
No, it is not: https://github.com/GoogleCloudPlatform/bigquery-ingest-avro-dataflow-sample
upvoted 1 times
...
ffggrre
1 year, 1 month ago
Compressed AVRO files are supported by BQ
upvoted 1 times
...
...
cloudmon
2 years ago
Selected Answer: A
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro
Advantages of Avro: Avro is the preferred format for loading data into BigQuery. Loading Avro files has the following advantages over CSV and JSON (newline delimited). The Avro binary format:
- Is faster to load. The data can be read in parallel, even if the data blocks are compressed.
- Doesn't require typing or serialization.
- Is easier to parse because there are no encoding issues found in other formats such as ASCII.
When you load Avro files into BigQuery, the table schema is automatically retrieved from the self-describing source data.
upvoted 2 times
...
Lui1979
2 years, 6 months ago
Selected Answer: B
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro The Avro binary format: Is faster to load. The data can be read in parallel, even if the data blocks are compressed
upvoted 5 times
cloudmon
2 years ago
Your comment supports A more than B
upvoted 2 times
...
...
Didine_22
2 years, 7 months ago
Selected Answer: B
B, because they are talking about parallel loading from the input locations.
upvoted 1 times
...
devric
2 years, 7 months ago
Selected Answer: B
B. The objective is to follow the best practices.
upvoted 2 times
devric
2 years, 7 months ago
Sorry, I meant A, not B :-)
upvoted 1 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other