Exam Professional Data Engineer topic 1 question 89 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 89
Topic #: 1

You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.
What should you do?

  • A. Provide latitude and longitude as input vectors to your neural net.
  • B. Create a numeric column from a feature cross of latitude and longitude.
  • C. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L1 regularization during optimization.
  • D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization.
Suggested Answer: C
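Options C and D both describe bucketizing the coordinates at the minute level and crossing them. A minimal sketch of what that could look like, assuming coordinates in decimal degrees. The function names `minute_bucket` and `crossed_bucket` and the sample coordinates are hypothetical and do not come from the question.

```python
import math
from typing import Tuple


def minute_bucket(coord_degrees: float) -> int:
    """Index of the 1-arc-minute bucket that contains a decimal-degree coordinate."""
    return math.floor(coord_degrees * 60)


def crossed_bucket(lat: float, lon: float) -> Tuple[int, int]:
    """Cross of the bucketized latitude and longitude: one ID per grid cell."""
    return (minute_bucket(lat), minute_bucket(lon))


# Two hypothetical properties a few hundred metres apart share a grid cell...
a = crossed_bucket(37.7701, -122.4301)
b = crossed_bucket(37.7720, -122.4320)
# ...while a hypothetical property several kilometres away falls in another cell.
c = crossed_bucket(37.8044, -122.2712)
print(a, b, c)
```

In a model, each cell ID would typically be one-hot encoded or embedded so that every grid cell gets its own weight.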

Comments

AHUI
Highly Voted 2 years, 1 month ago
Ans C, use L1 regularization because we know the feature is a strong feature. L2 will evenly distribute weights.
upvoted 9 times
...
dish11dish
Highly Voted 2 years ago
Selected Answer: C
Option C is correct. Use L1 regularization when you need to assign greater importance to more influential features; it shrinks the weights of less important features to 0. L2 regularization performs better when all input features influence the output and all the weights are of roughly equal size.
upvoted 8 times
...
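The contrast drawn in the comments above (L1 zeroes out small weights, L2 only shrinks them) can be illustrated with the standard one-step shrinkage rules. This is a sketch, not a training loop: the weights and the penalty strength `lam` are made-up values, and the function names are hypothetical.

```python
def l1_shrink(w: float, lam: float) -> float:
    """Soft-thresholding: the proximal step for the penalty lam * |w|."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def l2_shrink(w: float, lam: float) -> float:
    """Proportional shrinkage: the proximal step for the penalty (lam / 2) * w**2."""
    return w / (1.0 + lam)


weights = [2.0, 0.3, -0.05, 0.0, -1.2]  # hypothetical weights, one per grid cell
lam = 0.5

after_l1 = [l1_shrink(w, lam) for w in weights]
after_l2 = [l2_shrink(w, lam) for w in weights]

print("L1:", after_l1)
print("L2:", after_l2)
print("zeros after L1:", sum(1 for w in after_l1 if w == 0.0))
print("zeros after L2:", sum(1 for w in after_l2 if w == 0.0))
```

Under these assumptions L1 sets every weight with magnitude at most `lam` to exactly zero, while L2 scales all weights by the same factor and leaves nonzero weights nonzero.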
SamuelTsch
Most Recent 1 month ago
Selected Answer: D
I would choose D. L1 will ignore the irrelevant features. However, we know that lat and long are crucial for this model. We can't take away their influence. L2 helps in preventing overfitting.
upvoted 2 times
...
MohaSa1
1 month, 1 week ago
Selected Answer: A
This does not seem to be useful: minute-level bucketizing will create 3,600 possible buckets per square degree, which is not logical and gives a sparse feature space. Option A seems to be a better choice.
upvoted 1 times
...
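The 3,600 figure in the comment above follows from 60 minutes per degree in each direction. A small sketch of the arithmetic, assuming a rectangular bounding box measured in degrees; `cells_in_box` is a hypothetical helper and the 0.5-degree box is an invented example.

```python
import math

MINUTES_PER_DEGREE = 60


def cells_in_box(lat_span_deg: float, lon_span_deg: float) -> int:
    """Number of 1-minute by 1-minute cells needed to tile a bounding box."""
    lat_cells = math.ceil(lat_span_deg * MINUTES_PER_DEGREE)
    lon_cells = math.ceil(lon_span_deg * MINUTES_PER_DEGREE)
    return lat_cells * lon_cells


print(cells_in_box(1.0, 1.0))  # one square degree -> 3600
print(cells_in_box(0.5, 0.5))  # hypothetical metro-sized box -> 900
```

How sparse the crossed feature ends up depends on how many of those cells actually contain training examples, which the question does not state.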
Snnnnneee
4 months ago
Selected Answer: B
Bucketing into minutes is inaccurate: areas up to 1.8 km across are grouped together. That is way too coarse for real estate. Therefore B.
upvoted 1 times
...
uday_examtopic
1 year, 2 months ago
Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization. Like option C, we bucketize at the minute level, but this time we apply L2 regularization. L2 regularization, or Ridge Regression, discourages large values of weights in the model without forcing them to become sparse. It can help prevent overfitting, especially when we have a large number of features (as a result of bucketizing and crossing). Given the options, D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization seems to be the most appropriate. Bucketizing at the minute level captures localized patterns, and L2 regularization can help control the complexity of the model without enforcing sparsity.
upvoted 2 times
...
ckanaar
1 year, 2 months ago
What does bucketizing at the minute level mean in the context of this question?
upvoted 3 times
Surely1987
1 year ago
Coordinates are written in degrees, minutes and seconds (one minute being equal to about 1.8 km). So you group your coordinates into buckets with one-minute precision.
upvoted 4 times
...
...
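The "about 1.8 km" figure quoted above can be checked with a spherical-Earth approximation. This sketch assumes a mean Earth radius of 6371 km; the function names are hypothetical. It also shows that a minute of longitude covers less ground away from the equator, so the grid cells are not square in kilometres.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; the Earth is treated as a sphere


def minute_of_latitude_km() -> float:
    """Ground distance covered by one arc minute of latitude."""
    return 2 * math.pi * EARTH_RADIUS_KM / (360 * 60)


def minute_of_longitude_km(lat_deg: float) -> float:
    """Ground distance covered by one arc minute of longitude at a given latitude."""
    return minute_of_latitude_km() * math.cos(math.radians(lat_deg))


print(round(minute_of_latitude_km(), 3))
for lat in (0, 40, 60):
    print(lat, round(minute_of_longitude_km(lat), 3))
```

Under this approximation a minute of latitude is roughly 1.85 km everywhere, while a minute of longitude at 60 degrees latitude is half of that.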
FP77
1 year, 2 months ago
Selected Answer: B
I strongly believe it's B.
upvoted 2 times
...
Mathew106
1 year, 4 months ago
Selected Answer: B
The right answer is B. What the hell does bucketizing the feature cross of latitude and longitude even mean? They are not a time feature. C and D don't even make sense. The L1 regularization is something that doesn't answer anything in the question. The only valid feature engineered here is option B. A is not an engineered feature. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L1 regularization during optimization.
upvoted 1 times
baimus
2 months ago
Bucketising means that we're saying "anyone in this square 1.8 km (minute) region is considered a single area". It's actually recommended as a default way to deal with lat/lon, as they don't really work as separate columns (or at least we'd be hoping the FCNN buckets them intelligently itself, which it mostly won't).
upvoted 1 times
...
...
Jojo9400
1 year, 4 months ago
D. You have to use L2: since you have created a new variable from two already existing ones, the risk of multicollinearity is high. L1 is good for selecting features to avoid the curse of dimensionality, not for multicollinearity.
upvoted 1 times
...
ga8our
1 year, 6 months ago
Why not L2? L2 (Ridge) uses a squared value coefficient as a penalty term to the loss function, while L1 (Lasso) uses an absolute value coefficient. Isn't a squared penalty stronger than an absolute one? https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
upvoted 1 times
ckanaar
1 year, 2 months ago
L1 regularization forces unimportant coefficients to zero. Since the location is extremely important, L1 will force less important coefficients to zero, thereby further increasing the importance of the location coefficient.
upvoted 2 times
...
...
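On the question of whether a squared penalty is "stronger" than an absolute one, it can help to compare the two penalties and their gradients at a few weight values. A minimal sketch with hypothetical function names and arbitrarily chosen weights:

```python
def l1_penalty(w: float) -> float:
    return abs(w)


def l2_penalty(w: float) -> float:
    return w * w


def l1_grad(w: float) -> float:
    """Subgradient of |w|: constant magnitude, however small w is."""
    if w > 0:
        return 1.0
    if w < 0:
        return -1.0
    return 0.0


def l2_grad(w: float) -> float:
    """Gradient of w**2: proportional to w, so it fades as w approaches zero."""
    return 2.0 * w


for w in (2.0, 0.5, 0.01):
    print(
        f"w={w}: |w|={l1_penalty(w):.4f}, w^2={l2_penalty(w):.4f}, "
        f"d|w|/dw={l1_grad(w):.2f}, d(w^2)/dw={l2_grad(w):.2f}"
    )
```

The squared penalty is larger only when the weight magnitude exceeds 1. For small weights its pull toward zero fades, whereas the absolute penalty keeps a constant pull, which is why L1 tends to produce exact zeros.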
Oleksandr0501
1 year, 7 months ago
gpt: Options C and D suggest bucketizing the feature cross of latitude and longitude at the minute level and using L1 or L2 regularization during optimization. While regularization can help prevent overfitting, bucketizing at such a granular level may not be necessary and could lead to overfitting. It's also not clear how bucketizing at the minute level would capture the spatial relationship between the latitude and longitude features.
upvoted 2 times
...
PolyMoe
1 year, 10 months ago
Selected Answer: D
D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization. This will create a new feature that captures the physical dependency of the location of the property on the price, and bucketing it at the minute level will reduce the number of unique values and prevent overfitting. L2 regularization will also help to prevent overfitting by penalizing large weights in the model.
upvoted 1 times
cetanx
1 year, 6 months ago
ChatGPT also says D. Explanation: This approach effectively creates a grid of the geographical area in your data, allowing the model to learn weights for each grid cell (bucket). This helps capture the spatial relationship between latitude and longitude, which can be crucial for real estate prices. Additionally, using L2 regularization helps prevent overfitting by discouraging complex models, which can be particularly important when working with high-dimensional crossed features.
upvoted 1 times
...
...
zellck
1 year, 11 months ago
Selected Answer: C
C is the answer. https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture A feature cross is a synthetic feature formed by multiplying (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually. https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
upvoted 3 times
...
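The "multiplying (crossing)" definition quoted above can be made concrete for bucketized inputs: crossing two one-hot vectors multiplies every pair of entries, which yields a single one-hot vector over all bucket combinations. A sketch with hypothetical helper names and tiny made-up bucket counts (3 latitude buckets, 4 longitude buckets):

```python
from typing import List


def one_hot(index: int, size: int) -> List[int]:
    """One-hot vector of the given size with a 1 at position index."""
    return [1 if i == index else 0 for i in range(size)]


def cross(a: List[int], b: List[int]) -> List[int]:
    """Flattened outer product: one entry per (a-bucket, b-bucket) pair."""
    return [x * y for x in a for y in b]


lat_vec = one_hot(1, 3)  # property falls in latitude bucket 1 of 3
lon_vec = one_hot(2, 4)  # property falls in longitude bucket 2 of 4
crossed = cross(lat_vec, lon_vec)

print(crossed)
print("active cell:", crossed.index(1))
```

The crossed vector has 3 x 4 = 12 entries with exactly one set, so a model can learn a separate weight for each latitude and longitude combination.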
crismo04
2 years, 2 months ago
https://medium.com/riga-data-science-club/geographic-coordinate-encoding-with-tensorflow-feature-columns-e750ae338b7c#:~:text=to%20the%20rescue!-,Feature%20Crosses,-Combining%20features%20into
upvoted 2 times
crismo04
2 years, 2 months ago
Feature cross seems to be the right feature option
upvoted 1 times
crismo04
2 years, 2 months ago
So it's B option
upvoted 4 times
...
...
...
[Removed]
2 years, 2 months ago
Selected Answer: C
Regularization + location into one
upvoted 1 times
...
AWSandeep
2 years, 2 months ago
Selected Answer: C
C. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L1 regularization during optimization.
upvoted 5 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other