Which of the following pairs of arguments cannot be used in DataFrame.join() to perform an inner join on two DataFrames, named and aliased with "a" and "b" respectively, to specify two key columns?
A. on = [a.column1 == b.column1, a.column2 == b.column2]
B. on = [col("column1"), col("column2")]
C. on = [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")]
D. All of these options can be used to perform an inner join with two key columns.
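According to the following code, only option B returns an error. The key point is that the DataFrames must be both "named" (assigned to the variables a and b) AND "aliased" (via .alias('a') and .alias('b')), so that both a.column1 and col("a.column1") resolve.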
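from pyspark.sql.functions import col

# spark and display() are provided by the Databricks notebook environment
a = spark.createDataFrame([(1, 2), (3, 4)], ['column1', 'column2'])
b = spark.createDataFrame([(1, 2), (5, 6)], ['column1', 'column2'])
a = a.alias('a')
b = b.alias('b')
# Option A: conditions built from the DataFrame variables (works)
df = a.join(b, on = [a.column1 == b.column1, a.column2 == b.column2])
display(df)
# Option B: bare columns with no comparison (raises an error, so it is left commented out)
# df = a.join(b, on = [col("column1"), col("column2")])
# Option C: conditions built from the aliases (works)
df = a.join(b, on = [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")])
display(df)
# Not one of the options: a list of column-name strings also works
df = a.join(b, on = ["column1", "column2"])
display(df)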
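For anyone without a cluster at hand, here is a minimal plain-Python sketch of what an inner join on two key columns should return for the same sample rows. It is not Spark code and says nothing about how Spark executes the join; the names rows_a, rows_b and inner_join_two_keys are made up for illustration.

```python
# Plain-Python model of an inner join on two key columns (not Spark code).
rows_a = [(1, 2), (3, 4)]  # (column1, column2) rows of DataFrame a
rows_b = [(1, 2), (5, 6)]  # (column1, column2) rows of DataFrame b


def inner_join_two_keys(left, right):
    """Pair up rows whose column1 AND column2 values are both equal."""
    return [
        l + r                      # keep the columns of both sides
        for l in left
        for r in right
        if l[0] == r[0] and l[1] == r[1]
    ]


joined = inner_join_two_keys(rows_a, rows_b)
print(joined)  # [(1, 2, 1, 2)]
```

Only the row (1, 2) matches on both keys. The sketch keeps both copies of each key column, which is how the condition-based forms (options A and C) behave; the list-of-names form in the last join returns each key column once.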
I tried all of the options and got two errors:
B:
[AMBIGUOUS_REFERENCE] Reference `Category` is ambiguous, could be: [`Category`, `Category`]
C:
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `df_1`.`Category` cannot be resolved.
Did you mean one of the following? [`Category`, `Category`, `Truth`, `Truth`, `Value`].;
from pyspark.sql.functions import col
df2.alias('a').join(df3.alias('b'),
[col("a.name") == col("b.name"), col("a.name") == col("b.name")],
'full_outer').select(df2['name'],'height','age').show()
It worked, so every answer is correct.
A. on = [a.column1 == b.column1, a.column2 == b.column2]
This option is valid and can be used to perform an inner join on two key columns. It specifies the key columns using the syntax a.column1 == b.column1 and a.column2 == b.column2.
I think the question "which one cannot be used to perform an inner join" is confusing, because only A works and the rest of the answers are incorrect.
The question should be "which one can be used".