Exam Certified Associate Developer for Apache Spark topic 1 question 55 discussion

The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a cross join between DataFrame storesDF and DataFrame employeesDF. Identify the error.
Code block:
storesDF.join(employeesDF, "cross")

  • A. A cross join is not implemented by the DataFrame.join() operations – the standalone CrossJoin() operation should be used instead.
  • B. There is no direct cross join in Spark, but it can be implemented by performing an outer join on all columns of both DataFrames.
  • C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
  • D. There is no key column specified – the key column "storeId" should be the second argument.
  • E. A cross join is not implemented by the DataFrame.join() operations – the standalone join() operation should be used instead.
Suggested Answer: C

Comments

oussa_ama
3 months, 1 week ago
Selected Answer: C
Cross Join in PySpark: A cross join (also known as a Cartesian product) returns the Cartesian product of the two DataFrames, meaning every row from the first DataFrame is paired with every row from the second DataFrame. In PySpark, the crossJoin() method is used specifically for this type of join.
upvoted 1 times
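To make the Cartesian-product description above concrete, here is a minimal plain-Python sketch, not Spark code. The lists `stores` and `employees` and their rows are made up for illustration, and `itertools.product` only stands in for the row pairing that a cross join performs.

```python
from itertools import product

# Hypothetical rows standing in for storesDF and employeesDF.
stores = [("s1",), ("s2",)]
employees = [("alice",), ("bob",), ("carol",)]

# Pair every row of the first list with every row of the second.
cross = [s + e for s, e in product(stores, employees)]

for row in cross:
    print(row)
```

With 2 rows on one side and 3 on the other, the result has 2 * 3 = 6 rows, and no key column is involved.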
...
65bd33e
3 months, 1 week ago
Selected Answer: C
The correct identification of the error is: C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead. Explanation: In Spark, to perform a cross join between two DataFrames, you should use the crossJoin() method, not the join() method with the "cross" argument.
upvoted 1 times
...
Ahlo
9 months ago
Correct answer: C
from pyspark.sql import Row
df = spark.createDataFrame(
    [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
df2 = spark.createDataFrame(
    [Row(height=80, name="Tom"), Row(height=85, name="Bob")])
df.crossJoin(df2.select("height")).select("age", "name", "height").show()
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.crossJoin.html
upvoted 2 times
...
azure_bimonster
9 months, 2 weeks ago
Selected Answer: D
D is the answer here, as the key is missing. As per the syntax, a key is needed.
upvoted 2 times
...
juliom6
1 year ago
Selected Answer: C
C is correct.
# https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.crossJoin.html
a = spark.createDataFrame([(1, 2), (3, 4)], ['column1', 'column2'])
b = spark.createDataFrame([(5, 6), (7, 8)], ['column3', 'column4'])
df = a.crossJoin(b)
display(df)
upvoted 3 times
...
newusername
1 year ago
Selected Answer: D
I know it looks confusing to have a key column for a cross join, but that is the join method syntax: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.join.html
See the example below:
from pyspark.sql import Row
# Sample data for DataFrame 'a'
dataA = [Row(column1=1, column2=2), Row(column1=2, column2=4), Row(column1=3, column2=6)]
dfA = spark.createDataFrame(dataA)
# Sample data for DataFrame 'b'
dataB = [Row(column1=1, column2=2), Row(column1=2, column2=5), Row(column1=3, column2=4)]
dfB = spark.createDataFrame(dataB)
joinedDF = dfA.join(dfB, on=None, how="cross")
joinedDF.show()
It is also possible to do a cross join with DataFrame.crossJoin(), but answer C states that df.join() doesn't do a cross join, which is wrong.
upvoted 3 times
...
4be8126
1 year, 6 months ago
Selected Answer: C
C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
upvoted 2 times
...
peekaboo15
1 year, 7 months ago
cross join doesn't need a key. Answer is C
upvoted 2 times
4be8126
1 year, 6 months ago
No, the issue is not that the key column is missing. In a cross join, there is no key column to join on. The correct answer is C: a cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
upvoted 1 times
...
...
ronfun
1 year, 7 months ago
Key is missing. Answer is D.
upvoted 4 times
4be8126
1 year, 6 months ago
No, the issue is not that the key column is missing. In a cross join, there is no key column to join on. The correct answer is C: a cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
upvoted 1 times
ZSun
1 year, 5 months ago
Completely wrong.
join(other, on=None, how=None)
Joins with another DataFrame, using the given join expression.
Parameters:
other – Right side of the join
on – a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.
how – str, default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti.
upvoted 2 times
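As a rough illustration of the docstring quoted above, the sketch below checks a join type name against that list. `check_how` is a hypothetical helper written for this note, not part of PySpark, and it only mirrors the quoted default and the quoted names.

```python
# Join type names copied from the docstring quoted above.
ALLOWED_HOW = {
    "inner", "cross", "outer", "full", "fullouter", "full_outer",
    "left", "leftouter", "left_outer", "right", "rightouter", "right_outer",
    "semi", "leftsemi", "left_semi", "anti", "leftanti", "left_anti",
}

def check_how(how=None):
    """Hypothetical helper: apply the documented default and reject unknown names."""
    how = "inner" if how is None else how
    if how not in ALLOWED_HOW:
        raise ValueError(f"unsupported join type: {how!r}")
    return how

print(check_how())         # falls back to the documented default
print(check_how("cross"))  # "cross" is one of the listed join types
```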
ZSun
1 year, 5 months ago
You can specify cross in DataFrame.join(how='cross'). The reason this code block doesn't work is that the second parameter is on. You need to specify the key column and then use how='cross'; otherwise, the function will treat 'cross' as the value of on instead of how.
upvoted 2 times
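The positional-argument point in the comment above can be checked without Spark. The sketch below defines a hypothetical stand-in function with the same parameter order as the quoted signature, join(other, on=None, how=None), and uses Python's `inspect.signature` to show which parameter receives "cross". It does not call PySpark, and the string "employeesDF" is just a placeholder for the other DataFrame.

```python
import inspect

def join(other, on=None, how=None):
    """Hypothetical stand-in with the same parameter order as the quoted signature."""
    return {"other": other, "on": on, "how": how}

sig = inspect.signature(join)

# Second positional argument, as in storesDF.join(employeesDF, "cross").
positional = dict(sig.bind("employeesDF", "cross").arguments)

# Join type passed by keyword instead.
keyword = dict(sig.bind("employeesDF", how="cross").arguments)

print(positional)  # {'other': 'employeesDF', 'on': 'cross'}
print(keyword)     # {'other': 'employeesDF', 'how': 'cross'}
```

In the positional call, "cross" lands in on, so it would be read as a join column name. Passing the join type by keyword, as in newusername's example with how="cross", avoids that binding.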
newusername
1 year ago
ZSun is right, as always. 4be8126: it is not a problem to use GPT, but check its answers. Otherwise, do not post them anywhere.
upvoted 1 times