Exam Professional Data Engineer topic 1 question 243 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 243
Topic #: 1

You are preparing data that your machine learning team will use to train a model using BigQuery ML. They want to predict the price per square foot of real estate. The training data has a column for the price and a column for the number of square feet. Another feature column called ‘feature1’ contains null values due to missing data. You want to replace the nulls with zeros to keep more data points. Which query should you use?

  • A.
  • B.
  • C.
  • D.
Suggested Answer: A

Comments

52ed0e5
Highly Voted 8 months, 2 weeks ago
Selected Answer: A
Option A is the correct choice because it retains all the original columns and addresses the null values in ‘feature1’ by replacing them with zeros, without altering any other columns or performing unnecessary calculations. This makes the data ready for use in BigQuery ML without losing any important information. Option C is not the best choice because its EXCEPT clause excludes the price and square_feet columns from the results. That is undesirable, since the machine learning model needs these columns to predict the price per square foot.
upvoted 5 times
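For illustration, one way to keep every original column while zeroing the nulls in feature1 is BigQuery's SELECT * REPLACE. This is only a sketch: the option bodies are not reproduced on this page, so it is not necessarily the text of option A, and the inline rows and the neighborhood column are made up.

```sql
-- Hypothetical sample rows standing in for the real training_data table.
WITH training_data AS (
  SELECT 300000 AS price, 1500 AS square_feet, 2.5 AS feature1, 'north' AS neighborhood
  UNION ALL
  SELECT 450000, 1800, NULL, 'south'
)
SELECT
  -- Keep all columns; only feature1 is rewritten, with NULL mapped to 0.
  * REPLACE (IFNULL(feature1, 0) AS feature1)
FROM training_data;
```

The output has the same four columns as the input, and the second row's feature1 is 0 instead of NULL.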
datapassionate
Highly Voted 10 months, 1 week ago
Selected Answer: C
The correct answer is C. It both replaces NULL with 0 and passes the price per square foot of real estate.
upvoted 5 times
George_Zhu
9 months, 2 weeks ago
Option C isn't good practice. What if the square_feet column contains a 0? Then price / 0 will throw an exception. Guard against it: IF(IFNULL(square_feet, 0) = 0, 0, price / square_feet).
upvoted 6 times
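A minimal sketch of the guard described above, next to BigQuery's built-in SAFE_DIVIDE, which returns NULL instead of raising an error when the divisor is zero. The sample rows are invented for illustration.

```sql
-- Hypothetical rows: a normal listing, a zero-size listing, and a missing size.
WITH training_data AS (
  SELECT 300000 AS price, 1500 AS square_feet
  UNION ALL
  SELECT 250000, 0
  UNION ALL
  SELECT 100000, NULL
)
SELECT
  price,
  square_feet,
  -- The guard from the comment: yields 0 when square_feet is 0 or NULL.
  IF(IFNULL(square_feet, 0) = 0, 0, price / square_feet) AS guarded_price_per_sqft,
  -- Built-in alternative: yields NULL when square_feet is 0 or NULL.
  SAFE_DIVIDE(price, square_feet) AS safe_price_per_sqft
FROM training_data;
```

The two differ in what they emit for bad rows. A 0 would be read by the model as a real price per square foot, while a NULL can be filtered out before training.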
baimus
1 month, 2 weeks ago
I think the assumption here is that no houses are zero feet in size. If they are, that should be caught in preprocessing, which is outside the short scope of this question. If the answer isn't C, then it's A, which would mean the question is suggesting you need an ML model to calculate price per square foot for data where you already have both price and square feet as features. In that case you clearly only need to divide one by the other. Those columns must be intended to be the target, or the whole question is nonsense.
upvoted 1 times
ToiToi
Most Recent 3 weeks, 1 day ago
Selected Answer: C
Gemini told me C. Here's why it's the best of the limited choices. Calculates price_per_sqft: it includes the calculation for the target variable your model needs. Handles nulls: it uses IFNULL(feature1, 0) to replace nulls in feature1 with 0, similar to COALESCE. Most comprehensive: while it excludes the original price, square_feet, and feature1 columns, it still retains any other columns that might be present in the training_data table.
upvoted 1 times
SamuelTsch
3 weeks, 2 days ago
Selected Answer: C
It should be C.
upvoted 1 times
baimus
1 month, 2 weeks ago
Selected Answer: C
This must be C, though the wording isn't great. If price and square feet are included in the data, they are either intended to be the target, in which case you need to create that target as per C, or they are genuinely features, in which case you DO NOT need a machine learning model. If you already know price and square feet, price per square foot is just price/ft2. You don't need ML to predict that, it's just a division. The only context in which this makes sense is if they mean "price and square feet are the target, and feature1 is the predictive feature", which means C is correct. C both replaces the nulls in feature1 and creates price per square foot.
upvoted 1 times
47767f9
4 months, 3 weeks ago
Selected Answer: C
According to Claude 3.5 and GPT-4o, in theory it is better to keep fewer features, so price_per_sqft plus the cleaned feature1 is the best option.
upvoted 1 times
srinidutt
6 months, 3 weeks ago
EXCEPT means it won't select that column.
upvoted 1 times
demoro86
8 months, 4 weeks ago
Selected Answer: A
C is not a valid answer. It introduces a redundant variable, which could be acceptable, but it also removes from the dataset two variables that directly influence the predictions you are trying to make.
upvoted 3 times
demoro86
8 months, 4 weeks ago
C is not a valid answer. It introduces a redundant variable, which could be acceptable, but it also removes from the dataset two variables that directly influence the predictions you are trying to make.
upvoted 2 times
baimus
1 month, 2 weeks ago
Just to clarify, they don't "influence" the prediction, they are in fact the target. The model needs to predict price per square foot. If you have price and square feet, they are either the prediction target (price / square feet), or, if not, you absolutely do not need a machine learning model: you just divide price by square feet.
upvoted 1 times
PetrSz
9 months ago
Selected Answer: C
Option C not only handles the null values in feature1 by replacing them with zeros (using IFNULL(feature1, 0) as feature1_cleaned), but it also creates a new feature price_per_sqft by dividing the price by the number of square feet (price/square_feet as price_per_sqft). This new feature directly corresponds to what your team wants to predict (the price per square foot of real estate), and could therefore be very useful for the machine learning model.
upvoted 1 times
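Pieced together from the expressions quoted in the comments (the EXCEPT clause, price/square_feet as price_per_sqft, and IFNULL(feature1, 0) as feature1_cleaned), a query of the shape attributed to option C might look like the sketch below. The exact option text is not shown on this page, and the sample rows and the neighborhood column are hypothetical.

```sql
-- Hypothetical sample rows standing in for the real training_data table.
WITH training_data AS (
  SELECT 300000 AS price, 1500 AS square_feet, 2.5 AS feature1, 'north' AS neighborhood
  UNION ALL
  SELECT 450000, 1800, NULL, 'south'
)
SELECT
  -- Drop the raw inputs that are replaced by derived columns below.
  * EXCEPT (price, square_feet, feature1),
  -- Derived label: the value the model is meant to predict.
  price / square_feet AS price_per_sqft,
  -- NULLs in feature1 become 0 so those rows are kept for training.
  IFNULL(feature1, 0) AS feature1_cleaned
FROM training_data;
```

The result keeps neighborhood and adds price_per_sqft and feature1_cleaned. The price, square_feet, and original feature1 columns no longer appear.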
Selected Answer: C
It should be C. "They want to predict the price per square foot of real estate. The training data has a column for the price and a column for the number of square feet." You need to create the column the model is going to predict.
upvoted 2 times
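To show why the label column has to exist before training, here is a rough sketch of a BigQuery ML training statement. The dataset name mydataset, the model name, the table prepared_training_data, and the choice of linear regression are all assumptions for illustration. The question itself does not specify any of them.

```sql
-- Assumes a hypothetical table mydataset.prepared_training_data that already
-- contains a price_per_sqft column plus the cleaned feature columns.
CREATE OR REPLACE MODEL `mydataset.price_per_sqft_model`
OPTIONS (
  model_type = 'LINEAR_REG',              -- assumed model type
  input_label_cols = ['price_per_sqft']   -- the column the model learns to predict
) AS
SELECT *
FROM `mydataset.prepared_training_data`;
```

input_label_cols must name a column present in the training query's output, which is why the derived price_per_sqft column needs to be created during data preparation.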
JyoGCP
9 months, 1 week ago
Selected Answer: A
Option A
upvoted 2 times
oleg25
9 months, 1 week ago
I don't get why the task mentions the price and square feet columns. Just to irritate us? Do we need to do something with these columns, or only with the feature1 column?
upvoted 5 times
d11379b
8 months ago
I think they just want us to build a “label” (target) column ourselves since there’s no direct value in the training set
upvoted 1 times
d11379b
8 months ago
But I still prefer A, since the square_feet column itself may influence the price and so shouldn’t be removed.
upvoted 1 times
Matt_108
10 months, 2 weeks ago
Selected Answer: A
Option A, clearly.
upvoted 1 times
raaad
10 months, 3 weeks ago
Selected Answer: A
Straightforward.
upvoted 5 times