Exam Certified Associate Developer for Apache Spark topic 1 question 18 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 18
Topic #: 1

Which of the following operations can be used to create a DataFrame with a subset of columns from DataFrame storesDF that are specified by name?

A. storesDF.subset()
B. storesDF.select()
C. storesDF.selectColumn()
D. storesDF.filter()
E. storesDF.drop()

Suggested Answer: B

by 4be8126 at April 26, 2023, 8:27 a.m.

Comments

4be8126

Highly Voted 2 years, 2 months ago

Selected Answer: B

storesDF.select() creates a DataFrame with a subset of columns from storesDF that are specified by name. You pass the names of the columns you want to keep as arguments, and select() returns a new DataFrame containing only those columns. For example, to create a new DataFrame with only the store_id and store_name columns from storesDF:

newDF = storesDF.select("store_id", "store_name")

upvoted 6 times

YoSpark

Most Recent 11 months, 3 weeks ago

E. storesDF.drop() is also correct. It is just the opposite of select(). If you need to keep a large number of columns and drop only a few, drop() is easier than select().

upvoted 1 time

TmData

2 years ago

Selected Answer: B

The select() operation on a Spark DataFrame lets you specify the columns to include in the resulting DataFrame. Pass the column names as arguments to create a new DataFrame containing only those columns.

upvoted 2 times
