Exam Certified Associate Developer for Apache Spark topic 1 question 55 discussion

The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a cross join between DataFrame storesDF and DataFrame employeesDF. Identify the error.
Code block:
storesDF.join(employeesDF, "cross")

  • A. A cross join is not implemented by the DataFrame.join() operation – the standalone CrossJoin() operation should be used instead.
  • B. There is no direct cross join in Spark, but it can be implemented by performing an outer join on all columns of both DataFrames.
  • C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
  • D. There is no key column specified – the key column "storeId" should be the second argument.
  • E. A cross join is not implemented by the DataFrame.join() operation – the standalone join() operation should be used instead.
Suggested Answer: C
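Why the code block fails: PySpark's DataFrame.join has the signature join(other, on=None, how=None), so a bare second positional argument is bound to on (the join column), not to how (the join type). In the code block, Spark therefore looks for a column named "cross" on both DataFrames, and unless both happen to have one, the call typically fails with an analysis error. The sketch below does not run Spark; it uses a hypothetical join_args function that copies only that parameter order to show where each argument lands.

```python
# Hypothetical stand-in (not Spark): it only mirrors the parameter order of
# PySpark's DataFrame.join(other, on=None, how=None) to show argument binding.
def join_args(other, on=None, how=None):
    """Report how the arguments bound; a given `on` with no `how` means an inner join."""
    if on is not None and how is None:
        how = "inner"
    return {"on": on, "how": how}


# The question's call: "cross" is the second positional argument, so it binds
# to `on` and is treated as a join column name, not as the join type.
print(join_args("employeesDF", "cross"))  # {'on': 'cross', 'how': 'inner'}

# Passing the join type by keyword binds it to `how` instead.
print(join_args("employeesDF", how="cross"))  # {'on': None, 'how': 'cross'}
```

In real PySpark, a Cartesian product is usually requested with storesDF.crossJoin(employeesDF); several commenters below also report that storesDF.join(employeesDF, how="cross") works.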

Comments

mineoolee
1 month, 3 weeks ago
Selected Answer: D
It is working:

    data = [(0, 2, 1100746394), (1, 2, 1474410343)]
    df = spark.createDataFrame(data, ['storeId', 'a', 'openDate'])
    _data = [('a', 2, 4444444444), ('c', 2, None), ('b', None, 2222222222)]
    _df = spark.createDataFrame(_data, ['storeId', 'a', 'openDate'])
    df.join(_df, 'a', "cross").show()
upvoted 1 times
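What the keyed call above returns: once a key column is passed, rows are matched on that key, so the output is not a full Cartesian product. A pure-Python sketch with mineoolee's sample rows, assuming Spark applies the key 'a' as an equality condition in which nulls never match (the usual SQL rule), compares the two row counts. No Spark is involved.

```python
from itertools import product

# mineoolee's sample rows as plain tuples: (storeId, a, openDate)
left = [(0, 2, 1100746394), (1, 2, 1474410343)]
right = [("a", 2, 4444444444), ("c", 2, None), ("b", None, 2222222222)]

# Full Cartesian product: every left row paired with every right row.
cartesian = list(product(left, right))

# Assumed keyed behaviour: keep only pairs whose "a" values (index 1) are equal
# and non-null, as an SQL equality condition would (None never matches).
keyed = [(l, r) for l, r in cartesian if l[1] is not None and l[1] == r[1]]

print(len(cartesian))  # 6
print(len(keyed))      # 4
```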
mineoolee
1 month, 3 weeks ago
Also, df.join(_df, "cross").show() is working.
upvoted 1 times
Kalipe
3 weeks, 6 days ago
It's wrong; it doesn't work, or you obviously haven't tried it.
upvoted 1 times
...
...
...
oussa_ama
5 months, 2 weeks ago
Selected Answer: C
Cross Join in PySpark: A cross join (also known as a Cartesian product) returns the Cartesian product of the two DataFrames, meaning every row from the first DataFrame is paired with every row from the second DataFrame. In PySpark, the crossJoin() method is used specifically for this type of join.
upvoted 2 times
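The pairing oussa_ama describes can be sketched in plain Python with itertools.product. The store and employee rows here are hypothetical stand-ins for storesDF and employeesDF, and no Spark is involved.

```python
from itertools import product

# Hypothetical rows standing in for storesDF and employeesDF (no Spark here).
stores = [{"storeId": 1}, {"storeId": 2}]
employees = [{"employeeId": 10}, {"employeeId": 11}, {"employeeId": 12}]

# Cartesian product: merge every store row with every employee row,
# with no join key and no filtering.
cross_joined = [{**s, **e} for s, e in product(stores, employees)]

for row in cross_joined:
    print(row)
```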
...
65bd33e
5 months, 3 weeks ago
Selected Answer: C
The correct identification of the error is: C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead. Explanation: In Spark, to perform a cross join between two DataFrames, you should use the crossJoin() method, not the join() method with the "cross" argument.
upvoted 1 times
...
Ahlo
11 months, 2 weeks ago
Correct answer: C

    from pyspark.sql import Row
    df = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
    df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
    df.crossJoin(df2.select("height")).select("age", "name", "height").show()

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.crossJoin.html
upvoted 2 times
...
azure_bimonster
12 months ago
Selected Answer: D
D is the answer here, as the key is missing. As per the syntax, a key is needed.
upvoted 2 times
...
juliom6
1 year, 2 months ago
Selected Answer: C
C is correct.

    # https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.crossJoin.html
    a = spark.createDataFrame([(1, 2), (3, 4)], ['column1', 'column2'])
    b = spark.createDataFrame([(5, 6), (7, 8)], ['column3', 'column4'])
    df = a.crossJoin(b)
    display(df)  # display() is Databricks-specific; elsewhere use df.show()
upvoted 3 times
...
newusername
1 year, 3 months ago
Selected Answer: D
I know it looks confusing to have a key column for a cross join, but that is the join method's syntax: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.join.html See the example below:

    from pyspark.sql import Row
    dataA = [Row(column1=1, column2=2), Row(column1=2, column2=4), Row(column1=3, column2=6)]
    dfA = spark.createDataFrame(dataA)
    # Sample data for DataFrame 'b'
    dataB = [Row(column1=1, column2=2), Row(column1=2, column2=5), Row(column1=3, column2=4)]
    dfB = spark.createDataFrame(dataB)
    joinedDF = dfA.join(dfB, on=None, how="cross")
    joinedDF.show()

It is also possible to do a cross join with DataFrame.crossJoin(), but answer C states that df.join() doesn't do a cross join, which is wrong.
upvoted 4 times
tmz1
1 week, 2 days ago
Totally agree. The statement in answer C, "A cross join is not implemented by the DataFrame.join() operation", is incorrect. It is implemented, and I have tested it. Results below:

    products_df = spark.table('products')
    orders_df = spark.table('orders')
    print(products_df.count())  # -> 200
    print(orders_df.count())  # -> 2140
    cross_joined_df = products_df.join(orders_df, None, "cross")
    print(cross_joined_df.count())  # -> 428000
upvoted 1 times
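The 428000 in tmz1's result is exactly what a Cartesian product predicts: one output row per (left, right) pair, so the output size is the product of the input sizes. A small hypothetical helper makes the arithmetic explicit and can serve as a sanity check when a join unexpectedly explodes in size.

```python
from math import prod


def expected_cross_join_rows(*row_counts: int) -> int:
    """Hypothetical helper: a cross join yields one row per combination of input rows."""
    return prod(row_counts)


# tmz1's reported inputs: 200 products and 2140 orders
print(expected_cross_join_rows(200, 2140))  # 428000
```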
...
...
4be8126
1 year, 9 months ago
Selected Answer: C
C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
upvoted 2 times
...
peekaboo15
1 year, 9 months ago
A cross join doesn't need a key. The answer is C.
upvoted 2 times
4be8126
1 year, 9 months ago
No, the issue is not that the key column is missing. In a cross join, there is no key column to join on. The correct answer is C: a cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
upvoted 1 times
...
...
ronfun
1 year, 10 months ago
The key is missing. The answer is D.
upvoted 4 times
4be8126
1 year, 9 months ago
No, the issue is not that the key column is missing. In a cross join, there is no key column to join on. The correct answer is C: a cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.
upvoted 1 times
ZSun
1 year, 8 months ago
Completely wrong.
join(other, on=None, how=None)
Joins with another DataFrame, using the given join expression.
Parameters:
  • other – Right side of the join.
  • on – a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.
  • how – str, default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti.
upvoted 2 times
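The how list in the quoted docs is what makes "cross" a join type rather than a column name. A hypothetical helper that checks a string against exactly those documented spellings (Spark itself may also accept other capitalizations) shows that "cross" passes while a column name such as "storeId" does not.

```python
# Join types accepted by the how parameter, copied from the docs quoted above.
VALID_HOW = {
    "inner", "cross", "outer", "full", "fullouter", "full_outer",
    "left", "leftouter", "left_outer", "right", "rightouter", "right_outer",
    "semi", "leftsemi", "left_semi", "anti", "leftanti", "left_anti",
}


def check_how(how=None):
    """Hypothetical helper: return the effective join type, defaulting to "inner"."""
    if how is None:
        return "inner"
    if how not in VALID_HOW:
        raise ValueError(f"unsupported join type: {how!r}")
    return how


print(check_how("cross"))  # cross
print(check_how())         # inner
try:
    check_how("storeId")
except ValueError as err:
    print(err)             # unsupported join type: 'storeId'
```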
ZSun
1 year, 8 months ago
You can specify a cross join with DataFrame.join(how='cross'). The reason this code block doesn't work is that the second parameter is on. You need to specify the key column and then use how='cross'; otherwise, the function treats 'cross' as on instead of how.
upvoted 2 times
newusername
1 year, 3 months ago
ZSun is right, as always. 4be8126: it is not a problem to use GPT, but check its answers; otherwise, don't post them anywhere.
upvoted 1 times
...
...
...
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other