Exam Certified Associate Developer for Apache Spark topic 1 question 53 discussion

Which of the following pairs of arguments cannot be used in DataFrame.join() to perform an inner join on two DataFrames, named and aliased with "a" and "b" respectively, to specify two key columns?

  • A. on = [a.column1 == b.column1, a.column2 == b.column2]
  • B. on = [col("column1"), col("column2")]
  • C. on = [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")]
  • D. All of these options can be used to perform an inner join with two key columns.
  • E. on = ["column1", "column2"]
Suggested Answer: B

Comments

azure_bimonster
9 months, 3 weeks ago
Selected Answer: B
B cannot be used because its column references seem ambiguous.
upvoted 1 times
Gurdel
11 months, 1 week ago
Selected Answer: B
B throws AnalysisException: [AMBIGUOUS_REFERENCE] Reference `column1` is ambiguous, could be: [`a`.`column1`, `b`.`column1`]
upvoted 1 times
juliom6
1 year ago
Selected Answer: B
According to the following code, only option B raises an error. The key point is that the DataFrames must be both "named" and "aliased".

    from pyspark.sql.functions import col

    a = spark.createDataFrame([(1, 2), (3, 4)], ['column1', 'column2'])
    b = spark.createDataFrame([(1, 2), (5, 6)], ['column1', 'column2'])

    a = a.alias('a')
    b = b.alias('b')

    # Option A
    df = a.join(b, on = [a.column1 == b.column1, a.column2 == b.column2])
    display(df)

    # Option B (commented out because it raises an error)
    # df = a.join(b, on = [col("column1"), col("column2")])

    # Option C
    df = a.join(b, on = [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")])
    display(df)

    # Option E
    df = a.join(b, on = ["column1", "column2"])
    display(df)
upvoted 3 times
newusername
1 year ago
Selected Answer: B
100% B. Code to test below:

    from pyspark.sql import Row
    from pyspark.sql.functions import col

    # Sample data for DataFrame 'a'
    dataA = [Row(column1=1, column2=2), Row(column1=2, column2=4), Row(column1=3, column2=6)]
    dfA = spark.createDataFrame(dataA)
upvoted 3 times
newusername
1 year ago
    # Sample data for DataFrame 'b'
    dataB = [Row(column1=1, column2=2), Row(column1=2, column2=5), Row(column1=3, column2=4)]
    dfB = spark.createDataFrame(dataB)

    # Alias DataFrames as 'a' and 'b'
    a = dfA.alias("a")
    b = dfB.alias("b")
    a.show()
    b.show()

    # Option A
    joinedDF_A = a.join(b, [a.column1 == b.column1, a.column2 == b.column2])
    joinedDF_A.show()

    # Option B
    # joinedDF_B = a.join(b, [col("column1"), col("column2")])
    # joinedDF_B.show()

    # Option C
    joinedDF_C = a.join(b, [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")])
    joinedDF_C.show()

    # Option E
    joinedDF_E = a.join(b, ["column1", "column2"])
    joinedDF_E.show()
upvoted 3 times
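For the sample rows in the test code above, an inner join on both key columns should keep only the row (1, 2), since that is the only pair present in both DataFrames. The following is a minimal pure-Python sketch of that expectation, not Spark code: `inner_join_on_keys` is a hypothetical helper, and the rows are plain dicts standing in for `Row` objects. It mirrors the behavior of passing a list of column names as `on`, where each key column appears once in the result.

```python
def inner_join_on_keys(left, right, keys):
    """Inner equi-join of two lists of dict rows on the given key columns.

    Hypothetical helper for illustration only. Key columns appear once in
    each output row; non-key columns with the same name on both sides
    would be overwritten by the right-hand value in this simple sketch.
    """
    # Index the right-hand rows by their key tuple for fast lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    joined = []
    for lrow in left:
        key = tuple(lrow[k] for k in keys)
        for rrow in index.get(key, []):
            out = {k: lrow[k] for k in keys}
            out.update({c: v for c, v in lrow.items() if c not in keys})
            out.update({c: v for c, v in rrow.items() if c not in keys})
            joined.append(out)
    return joined


# Same values as dataA and dataB in the comments above.
data_a = [{"column1": 1, "column2": 2}, {"column1": 2, "column2": 4}, {"column1": 3, "column2": 6}]
data_b = [{"column1": 1, "column2": 2}, {"column1": 2, "column2": 5}, {"column1": 3, "column2": 4}]

result = inner_join_on_keys(data_a, data_b, ["column1", "column2"])
print(result)  # [{'column1': 1, 'column2': 2}]
```

Rows such as (2, 4) and (2, 5) agree on column1 only, so they drop out once both keys must match.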
juadaves
1 year, 1 month ago
I tried all of the options and got two errors:
B: [AMBIGUOUS_REFERENCE] Reference `Category` is ambiguous, could be: [`Category`, `Category`]
C: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `df_1`.`Category` cannot be resolved. Did you mean one of the following? [`Category`, `Category`, `Truth`, `Truth`, `Value`].;
upvoted 2 times
Ahmadkt
1 year, 1 month ago
It's B. It seems you didn't alias the DataFrames:

    a = df1.alias("a")
    b = df2.alias("b")
upvoted 1 times
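The two errors quoted in this exchange can be illustrated with a toy name resolver. This is a simplified model written for this thread, not Spark's actual analyzer: `resolve`, `AmbiguousReference`, and `UnresolvedColumn` are hypothetical names. It assumes that a qualified name such as "a.column1" is looked up under one alias only, while an unqualified name is searched in every DataFrame taking part in the join.

```python
class AmbiguousReference(Exception):
    """Toy stand-in for the AMBIGUOUS_REFERENCE error."""


class UnresolvedColumn(Exception):
    """Toy stand-in for the UNRESOLVED_COLUMN error."""


def resolve(name, frames):
    """Resolve a column name against `frames` (alias -> list of column names).

    Hypothetical model for illustration only.
    """
    if "." in name:
        # Qualified name: only the named alias is consulted.
        alias, column = name.split(".", 1)
        if alias in frames and column in frames[alias]:
            return (alias, column)
        raise UnresolvedColumn(name)

    # Unqualified name: every frame in the join is a candidate.
    matches = [(alias, name) for alias, cols in frames.items() if name in cols]
    if len(matches) > 1:
        raise AmbiguousReference(f"{name} could be: {matches}")
    if not matches:
        raise UnresolvedColumn(name)
    return matches[0]


aliased = {"a": ["column1", "column2"], "b": ["column1", "column2"]}

print(resolve("a.column1", aliased))  # ('a', 'column1')

try:
    resolve("column1", aliased)  # option B style: no qualifier
except AmbiguousReference as exc:
    print("ambiguous:", exc)

try:
    # Option C style, but the alias "df_1" was never registered.
    resolve("df_1.column1", aliased)
except UnresolvedColumn as exc:
    print("unresolved:", exc)
```

In this model, option C only resolves when the qualifier matches an alias that was actually applied, which is consistent with the reply above.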
Singh_Sumit
1 year, 1 month ago
    from pyspark.sql.functions import col

    df2.alias('a').join(
        df3.alias('b'),
        [col("a.name") == col("b.name"), col("a.name") == col("b.name")],
        'full_outer',
    ).select(df2['name'], 'height', 'age').show()

It worked, so every answer is correct.
upvoted 1 times
cookiemonster42
1 year, 3 months ago
Selected Answer: C
Should be C, since in col() we specify only a column name as a string, not a DataFrame.
upvoted 3 times
Jtic
1 year, 6 months ago
Selected Answer: A
A. on = [a.column1 == b.column1, a.column2 == b.column2] This option is valid and can be used to perform an inner join on two key columns. It specifies the key columns using the syntax a.column1 == b.column1 and a.column2 == b.column2.
upvoted 2 times
ZSun
1 year, 5 months ago
I think the question, "which one cannot be used to perform an inner join", is confusing, because only A works and the rest of the answers are incorrect. The question should be "which one can be used".
upvoted 2 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other