Exam Certified Associate Developer for Apache Spark All Questions

View all questions & answers for the Certified Associate Developer for Apache Spark exam

Exam Certified Associate Developer for Apache Spark topic 1 question 53 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 53
Topic #: 1


Which of the following pairs of arguments cannot be used in DataFrame.join() to perform an inner join on two DataFrames, named and aliased with "a" and "b" respectively, to specify two key columns?

A. on = [a.column1 == b.column1, a.column2 == b.column2]
B. on = [col("column1"), col("column2")]
C. on = [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")]
D. All of these options can be used to perform an inner join with two key columns.
E. on = ["column1", "column2"]

Suggested Answer: B

by Jtic at May 29, 2023, 3:58 a.m.

Comments


azure_bimonster

9 months, 3 weeks ago

Selected Answer: B

B cannot be used, as its column references seem ambiguous.

upvoted 1 times

...

Gurdel

11 months, 1 week ago

Selected Answer: B

B throws AnalysisException: [AMBIGUOUS_REFERENCE] Reference `column1` is ambiguous, could be: [`a`.`column1`, `b`.`column1`]

upvoted 1 times

...

juliom6

According to the following code, only option B returns an error. The key concept here is that the DataFrames must be "named" AND "aliased".

from pyspark.sql.functions import col

a = spark.createDataFrame([(1, 2), (3, 4)], ['column1', 'column2'])
b = spark.createDataFrame([(1, 2), (5, 6)], ['column1', 'column2'])
a = a.alias('a')
b = b.alias('b')

# Option A
df = a.join(b, on = [a.column1 == b.column1, a.column2 == b.column2])
display(df)

# Option B (commented out because it raises an error)
# df = a.join(b, on = [col("column1"), col("column2")])

# Option C
df = a.join(b, on = [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")])
display(df)

# Option E
df = a.join(b, on = ["column1", "column2"])
display(df)

upvoted 3 times

...

newusername

1 year ago

Selected Answer: B

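100% B. Code to test is below:

from pyspark.sql import Row
from pyspark.sql.functions import col

# Sample data for DataFrame 'a'
dataA = [Row(column1=1, column2=2), Row(column1=2, column2=4), Row(column1=3, column2=6)]
dfA = spark.createDataFrame(dataA)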

upvoted 3 times

newusername

1 year ago

# Sample data for DataFrame 'b'
dataB = [Row(column1=1, column2=2), Row(column1=2, column2=5), Row(column1=3, column2=4)]
dfB = spark.createDataFrame(dataB)

# Alias DataFrames as 'a' and 'b'
a = dfA.alias("a")
b = dfB.alias("b")
a.show()
b.show()

# Option A
joinedDF_A = a.join(b, [a.column1 == b.column1, a.column2 == b.column2])
joinedDF_A.show()

# Option B
# joinedDF_B = a.join(b, [col("column1"), col("column2")])
# joinedDF_B.show()

# Option C
joinedDF_C = a.join(b, [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")])
joinedDF_C.show()

# Option E
joinedDF_E = a.join(b, ["column1", "column2"])
joinedDF_E.show()

upvoted 3 times

...

juadaves

1 year, 1 month ago

I tried all of the options and got two errors:

B: [AMBIGUOUS_REFERENCE] Reference `Category` is ambiguous, could be: [`Category`, `Category`]

C: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `df_1`.`Category` cannot be resolved. Did you mean one of the following? [`Category`, `Category`, `Truth`, `Truth`, `Value`].;

upvoted 2 times

Ahmadkt

1 year, 1 month ago

It's B. It seems you didn't apply the aliases:

a = df1.alias("a")
b = df2.alias("b")

upvoted 1 times

...

Singh_Sumit

1 year, 1 month ago

from pyspark.sql.functions import col

df2.alias('a').join(
    df3.alias('b'),
    [col("a.name") == col("b.name"), col("a.name") == col("b.name")],
    'full_outer',
).select(df2['name'], 'height', 'age').show()

It worked, so every answer is correct.

upvoted 1 times

...

cookiemonster42

1 year, 3 months ago

Selected Answer: C

It should be C, since in col() we specify only a column name as a string, not a DataFrame.

upvoted 3 times

...

Jtic

1 year, 6 months ago

Selected Answer: A

A. on = [a.column1 == b.column1, a.column2 == b.column2] This option is valid and can be used to perform an inner join on two key columns. It specifies the key columns using the syntax a.column1 == b.column1 and a.column2 == b.column2.

upvoted 2 times

ZSun

1 year, 5 months ago

I think the question, "which one cannot be used to perform an inner join", is confusing, because only A works and the rest of the answers are incorrect. The question should be "which one can be used".

upvoted 2 times

...


