Welcome to ExamTopics
ExamTopics Logo
- Expert Verified, Online, Free.
exam questions

Exam Professional Data Engineer All Questions

View all questions & answers for the Professional Data Engineer exam

Exam Professional Data Engineer topic 1 question 24 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 24
Topic #: 1
[All Professional Data Engineer Questions]

You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change its data type to the TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?

  • A. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.
  • B. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column TS for each row. Reference the column TS instead of the column DT from now on.
  • C. Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.
  • D. Add two columns to the table CLICK STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.
  • E. Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.
Show Suggested Answer Hide Answer
Suggested Answer: E 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
jvg637
Highly Voted 4 years, 8 months ago
"E" looks better. For D, the database will be double in size (which increases the storage price) and the user has to spend some more days reloading all the data.
upvoted 31 times
jkhong
1 year, 11 months ago
Also D doesn't make sense since we're filtering IS_NEW to true to only consider future data, which disregards our previously loaded data
upvoted 5 times
...
assU2
2 years ago
"You want to minimize the migration effort without making future queries computationally expensive." Nothing about storage price.
upvoted 4 times
...
...
[Removed]
Highly Voted 4 years, 8 months ago
E - more simple and reasonable. Also recommended if not concerned about cost but simplicity. https://cloud.google.com/bigquery/docs/manually-changing-schemas#changing_a_columns_data_type
upvoted 22 times
Tanzu
2 years, 10 months ago
Due to the hard limitations of bq, Not E is the simple answer by the way!
upvoted 1 times
...
...
Nittin
Most Recent 3 months, 2 weeks ago
Selected Answer: C
Create a view no data migration easy to do but computational efficient queries not sure (?)
upvoted 2 times
...
mark1223jkh
6 months, 1 week ago
E. It recreates the table one time and everything is fixed. Next time you load, load to the new table, you can delete the previous one. Definitely not C. The question says I have to minimize future query effort, which literally means "don't create a view that converts from STR to TIMESTAMP for every row."
upvoted 1 times
...
suwalsageen12
6 months, 2 weeks ago
Selected Answer: E
Option E is correct. The question is asking to consider the Query cost for future. This is a one time job to fix the Timestamp column. no views were created.
upvoted 1 times
...
teka112233
7 months, 1 week ago
Selected Answer: E
Why Option E is the best choice: It modifies the schema with minimal data movement. The original table remains untouched for potential future needs. Future data loads can directly go to the new table with the desired schema. Queries referencing the new table (NEW_CLICK_STREAM) will benefit from the optimized data type for timestamp operations.
upvoted 1 times
...
GYORK
8 months, 2 weeks ago
Selected Answer: C
minimizing effort is key.
upvoted 1 times
...
TVH_Data_Engineer
11 months, 2 weeks ago
Selected Answer: C
A view in Google BigQuery is a virtual table defined by a SQL query. By creating a view that casts the DT column as a TIMESTAMP, you can transform the data format without altering the underlying data in the CLICK_STREAM table. This means you don't have to reload any data, thereby minimizing migration effort.
upvoted 3 times
...
axantroff
1 year ago
Selected Answer: E
Good point about the logical views and the desire to reduce costs. I would vote for E
upvoted 1 times
...
mk_choudhary
1 year, 1 month ago
The best way to minimize the migration effort without making future queries computationally expensive is to create a view and reference it instead of the table. This is because views are materialized when they are queried, so they do not incur any additional overhead. So the answer is (C).
upvoted 1 times
brokeasspanda
1 year ago
C doesn't say materialized view, there's a difference with a regular view so it'll be slower and more expensive on every call to that view.
upvoted 1 times
...
...
rtcpost
1 year, 1 month ago
Selected Answer: E
Option "E" It avoids the need to delete and recreate the entire CLICK_STREAM table, which is time-consuming and requires reloading all data. It allows you to use a simple query to cast the existing DT column as TIMESTAMP values and store the results in a new table, NEW_CLICK_STREAM. You can gradually migrate to the new data format, and your future queries will be able to utilize the TIMESTAMP data type for more efficient processing.
upvoted 2 times
...
sergiomujica
1 year, 2 months ago
Option D duplicates, not a good solution
upvoted 1 times
...
NeoNitin
1 year, 4 months ago
E. E. You can use a special command to change the time on the old cards to the better type "TIMESTAMP" and create a new box called "NEW_CLICK_STREAM." From now on, you'll look at the new box whenever you want to know the time. It's like having a new and better box to keep things tidy and organized. So, the best way to change the time on the little cards to the better type "TIMESTAMP" is option E. It's like using magic to create a new box and making sure everything is still easy to find and work with. It's a clever way to keep track of time and make your website even better!
upvoted 1 times
...
tal_
1 year, 5 months ago
Selected Answer: E
they asked to "change its data type"
upvoted 1 times
...
momosoundz
1 year, 6 months ago
Selected Answer: E
it's E. computationally less expensive than running a view every time
upvoted 1 times
...
boca_2022
1 year, 7 months ago
Selected Answer: E
E is best option
upvoted 1 times
...
techtitan
1 year, 9 months ago
Selected Answer: E
its C vs E. E is better because C will try to do a cast operation everytime query is run making it computationally expensive.
upvoted 5 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...