Welcome to ExamTopics
ExamTopics Logo
- Expert Verified, Online, Free.
exam questions

Exam Professional Data Engineer All Questions

View all questions & answers for the Professional Data Engineer exam

Exam Professional Data Engineer topic 1 question 91 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 91
Topic #: 1
[All Professional Data Engineer Questions]

You work for a bank. You have a labelled dataset that contains information on already granted loan application and whether these applications have been defaulted. You have been asked to train a model to predict default rates for credit applicants.
What should you do?

  • A. Increase the size of the dataset by collecting additional data.
  • B. Train a linear regression to predict a credit default risk score.
  • C. Remove the bias from the data and collect applications that have been declined loans.
  • D. Match loan applicants with their social profiles to enable feature engineering.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment (?) , you can switch to a simple comment.
Switch to a voting comment New
GHN74
Highly Voted 4 years, 2 months ago
A is incorrect as you need to work with the data you have available C is an optimisation not a solution D is ethically incorrect and invasion to privacy, there could be several legal implications with this B although oversimplified but is a workable solution
upvoted 38 times
sergio6
3 years, 1 month ago
Information in social profiles are public
upvoted 1 times
sergio6
3 years, 1 month ago
according to the privacy settings and shareable informations
upvoted 1 times
...
...
...
sumanshu
Highly Voted 3 years, 7 months ago
We have labelled data that contains whether a loan application is accepted or defaulted - So Classification Problem Data. We need to predict (Default Rates for applicants) - I think whether application will be granted or defaulted. - So Binary Classification. No option matches the answer. - if we mark 'B' - It should be Logistic Regression, Instead of Linear Regression
upvoted 21 times
szefco
2 years, 11 months ago
Question says: "to predict default RATES for credit applicants". It is not binary classification, so Linear Regression would work here. I think B is correct answer.
upvoted 22 times
cchen8181
1 year, 6 months ago
Correct approach is to use logistic regression to predict default/not default, and then take the confidence/probability of the outcome as the "default rate". Linear regression doesn't make sense since we are not given a default rate label in our data, we are just given the labels default vs no default.
upvoted 1 times
...
Aaronn14
1 year, 8 months ago
You cannot predict rate. You predict a realization, which is either default or not. This question is terribly written.
upvoted 2 times
...
...
...
nairoh
Most Recent 1 month, 1 week ago
"social" does not mean Social Media. This could be linked to demograhic data, si it could improve the score.
upvoted 1 times
...
TVH_Data_Engineer
11 months, 1 week ago
Selected Answer: B
To predict default rates for credit applicants using the labeled dataset of granted loan applications, the most appropriate course of action would be: B. Train a linear regression to predict a credit default risk score. Here's the rationale for this approach: Appropriate Model for Prediction: Linear regression is a common statistical method used for predictive modeling, particularly when the outcome variable (in this case, the likelihood of default) is continuous. In the context of credit scoring, linear regression can be used to predict a risk score that represents the probability of default. Utilization of Labeled Data: Since you already have a labeled dataset containing information on loans that have been granted and whether they have defaulted, you can use this data to train the regression model. This historical data provides the model with examples of borrower characteristics and their corresponding default outcomes.
upvoted 3 times
...
rocky48
11 months, 3 weeks ago
Selected Answer: B
B. Train a linear regression to predict a credit default risk score.
upvoted 1 times
...
gaurav0480
1 year, 3 months ago
What would be the target variable if B is correct i.e. training a linear regression model? Default/No-Default is a categorical variable one cannot train a linear regression model with this target variable
upvoted 1 times
...
FDS1993
1 year, 4 months ago
Selected Answer: C
C - it is a typical approach in credit loans. Keeping only the accepted loans leads to a bias in the application
upvoted 1 times
...
Mathew106
1 year, 4 months ago
Selected Answer: B
Linear regression is not the good way to solve such a problem, but you can totally apply linear regression to solve a classification problem. Just set the labels to numeric values 0 and 1 and linear regression will try to predict a value inbetween and round to the closest label (0 or 1). Totally not the way to go about it, but actually it's possible.
upvoted 1 times
...
juliobs
1 year, 8 months ago
Selected Answer: D
Cannot be B. This is logistic regression, not linear regression. D is the only acceptable option. Social profile can include things like high or low income, for example. When you apply for a credit you usually have to give this information, so totally legal.
upvoted 1 times
baimus
2 months ago
Credit scores are numbers, so this is a regression. Whether or not a client defaults could be a classification, but the option specifies the use of scores, which is fine.
upvoted 1 times
...
Oleksandr0501
1 year, 7 months ago
D. Matching loan applicants with their social profiles to enable feature engineering is not recommended as it raises privacy concerns and may not be legal in some jurisdictions. Additionally, social profiles may not be a good indicator of creditworthiness, and relying on them may introduce bias or discrimination.
upvoted 1 times
...
...
jin0
1 year, 8 months ago
Because there is no option to know what dataset schema is even though B is needed for this question's purpose Nobody can't select B. So there is none to answer
upvoted 1 times
...
musumusu
1 year, 9 months ago
all options are wrong: Still in favour of B A: ofc its good to have more data but its not clear how much data we have B: Linear can be a workable approach but current situation is not for linear approach, decision tree, random forest etc can be good for it. C: DAta should be unbiased, removing bias is negative for tranining
upvoted 1 times
...
Besss
1 year, 10 months ago
Selected Answer: B
default rates can be predicted with linear regression.
upvoted 1 times
21c17b3
11 months, 3 weeks ago
default rates is classification probability
upvoted 1 times
...
...
Whoswho
1 year, 11 months ago
Answer should actually be a logistic Regression model
upvoted 2 times
...
zellck
1 year, 11 months ago
Selected Answer: B
B is the answer.
upvoted 2 times
...
ladistar
1 year, 11 months ago
The question asks about default RATES, as in you are predicting a continuous variable, not a discrete one (classification). This is a regression problem, so choice B.
upvoted 2 times
...
woyaolai
2 years ago
I used to be a Credit Risk modeler and I think this question is stupid.
upvoted 8 times
...
Azlijaffar
2 years, 1 month ago
Data that you have is binary - Defaulted or not. You want default rates - Linear Regression. HOWEVER. The data that you have is for "ALREADY GRANTED" loan applications and whether they have defaulted or not. But you want to "train a model to predict default rates for credit applicants", which would include applicants who would not be granted the loan. If you just work on that dataset your model will not be as accurate as it wont have considered profiles of applicants that normally would not be granted those loans in the first place. Am I missing something here?
upvoted 2 times
Azlijaffar
2 years, 1 month ago
So answer should be C i think.
upvoted 2 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...