Exam Certified Associate Developer for Apache Spark topic 1 question 27 discussion

Which of the following code blocks returns a new DataFrame with column storeDescription where the pattern "Description: " has been removed from the beginning of column storeDescription in DataFrame storesDF?
A sample of DataFrame storesDF is below:

  • A. storesDF.withColumn("storeDescription", regexp_replace(col("storeDescription"), "^Description: "))
  • B. storesDF.withColumn("storeDescription", col("storeDescription").regexp_replace("^Description: ", ""))
  • C. storesDF.withColumn("storeDescription", regexp_extract(col("storeDescription"), "^Description: ", ""))
  • D. storesDF.withColumn("storeDescription", regexp_replace("storeDescription", "^Description: ", ""))
  • E. storesDF.withColumn("storeDescription", regexp_replace(col("storeDescription"), "^Description: ", ""))
Suggested Answer: E

Comments

jds0
1 month, 2 weeks ago
Selected Answer: E
Both D and E work with Spark 3.5.1, but E is better for backward compatibility. See code below:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, regexp_replace

    spark = SparkSession.builder.appName("MyApp").getOrCreate()

    data = [
        (0, "Description: Store 0"),
        (1, "Description: Store 1"),
        (2, "Description: Store 2"),
    ]
    storesDF = spark.createDataFrame(data, ["storeID", "StoreDescription"])

    # Note: the pattern here omits the ^ anchor used in the question's options.
    # Option E style: column passed as a Column object
    storesDF.withColumn("storeDescription", regexp_replace(col("StoreDescription"), "Description: ", "")).show()
    # Option D style: column passed by name
    storesDF.withColumn("storeDescription", regexp_replace("StoreDescription", "Description: ", "")).show()
upvoted 1 times
...
arturffsi
6 months ago
Both D and E are correct according to the new version
upvoted 1 times
...
azure_bimonster
7 months ago
Selected Answer: E
E is most likely correct in this scenario
upvoted 1 times
...
newusername
12 months ago
Both work:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import regexp_replace, regexp_extract, col

    spark = SparkSession.builder.appName("test").getOrCreate()

    data = [
        (1, "Description: This is a tech store. Description: This"),
        (2, "Description: This is a grocery store."),
        (3, "Description: This is a book store."),
    ]
    storesDF = spark.createDataFrame(data, ["storeID", "storeDescription"])
    storesDF.show(truncate=False)

    # Each case starts from the original storesDF so the two results are independent.

    # Case D: column passed by name
    print("Case D")
    caseD = storesDF.withColumn("storeDescription", regexp_replace("storeDescription", "^Description: ", ""))
    caseD.show(truncate=False)

    # Case E: column passed as a Column object
    print("Case E")
    caseE = storesDF.withColumn("storeDescription", regexp_replace(col("storeDescription"), "^Description: ", ""))
    caseE.show(truncate=False)
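To see why the `^` anchor matters for the first row of the test data above, here is a minimal plain-Python sketch. It uses `re.sub` from the standard library as a stand-in for Spark's `regexp_replace`. That substitution is an assumption made for illustration: Spark uses Java regular expressions, but a pattern this simple should behave the same way in both.

```python
import re

# Stand-in for a single storeDescription value; re.sub is used in place of
# Spark's regexp_replace purely for illustration.
value = "Description: This is a tech store. Description: This"

# Anchored: only a match at the very start of the string is removed.
anchored = re.sub(r"^Description: ", "", value)

# Unanchored: every occurrence of the pattern is removed.
unanchored = re.sub(r"Description: ", "", value)

print(anchored)    # This is a tech store. Description: This
print(unanchored)  # This is a tech store. This
```

Without the anchor, the second "Description: " in the middle of the value is removed as well, which is not what the question asks for.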
upvoted 3 times
...
Dgohel
1 year, 1 month ago
regexp_replace(str, regexp, rep [, position]) is what the Databricks documentation says. You can debate between D and E, but the question clearly says to remove the pattern from the beginning of the string. If you take answer D, it treats "storeDescription" as a single constant string to match the pattern against, and it will return an empty string after Description for each row. So if the debate is between D and E, then E is the correct answer.
upvoted 2 times
...
zozoshanky
1 year, 1 month ago
E is the answer tested
upvoted 2 times
...
NickWerbung
1 year, 2 months ago
Both D and E are correct.
upvoted 1 times
...
SonicBoom10C9
1 year, 3 months ago
Selected Answer: E
It's between D and E, and D is wrong as there is no replacement string expression (which is a required argument/parameter). Thus, E wins as the correct option.
upvoted 1 times
ZSun
1 year, 3 months ago
This explanation is completely wrong. Both D and E have a replacement expression; the only difference is how they reference the column being replaced. Both D and E are correct, but D works for PySpark 2.0. D and E both work in PySpark 3.0+. Period!
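A toy sketch of the "name or column object" idea, in plain Python. Everything here (`ColumnRef`, `to_column`, `replace_in_rows`) is hypothetical and invented for illustration; it is not how PySpark is implemented. It only models the behavior reported in this thread, where a string first argument is resolved as a column name.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a column object such as col("...")."""
    name: str


def to_column(col_or_name):
    # Accept either a ColumnRef or a plain column-name string.
    if isinstance(col_or_name, ColumnRef):
        return col_or_name
    return ColumnRef(col_or_name)


def replace_in_rows(rows, col_or_name, pattern, replacement):
    # Apply the regex replacement to the named field of every row.
    column = to_column(col_or_name)
    return [
        {**row, column.name: re.sub(pattern, replacement, row[column.name])}
        for row in rows
    ]


rows = [
    {"storeID": 0, "storeDescription": "Description: Store 0"},
    {"storeID": 1, "storeDescription": "Description: Store 1"},
]

# Shape of option D: column given by name.
result_d = replace_in_rows(rows, "storeDescription", r"^Description: ", "")
# Shape of option E: column given as an object.
result_e = replace_in_rows(rows, ColumnRef("storeDescription"), r"^Description: ", "")

print(result_d == result_e)  # True
```

In this model the string is never treated as the text to search; it is only used to look up the column, so both call shapes give the same rows.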
upvoted 7 times
...
ZSun
1 year, 3 months ago
I think your point, "there is no replacement string expression", really applies to option A. The only difference between A and E is whether the replacement string argument is present.
upvoted 1 times
...
...
sly75
1 year, 4 months ago
Selected Answer: E
Correct answer is E indeed - According to the pyspark doc, the syntax is regexp_replace(str, pattern, replacement) -> it means that it's not a function of the column object - storeDescription is a String field https://spark.apache.org/docs/3.0.0/api/python/pyspark.sql.html#pyspark.sql.functions.regexp_replace
upvoted 2 times
...
pierre_grns
1 year, 4 months ago
Selected Answer: D
Correct answer is D. First, regexp_replace/regexp_extract are from sql.functions. They cannot be called directly on a Column object => B is incorrect. Second, regexp_replace/regexp_extract accept a STRING object as the first argument to specify the column. Check the documentation here: https://spark.apache.org/docs/3.0.0/api/python/pyspark.sql.html#module-pyspark.sql.functions => A, C, E are incorrect.
upvoted 2 times
sly75
1 year, 4 months ago
Almost right but it's not about "String object" but "String value". So the correct answer is indeed the answer E ;)
upvoted 2 times
...
...
4be8126
1 year, 4 months ago
Selected Answer: E
The correct answer is option E: storesDF.withColumn("storeDescription", regexp_replace(col("storeDescription"), "^Description: ", "")). This code block uses the withColumn() function to create a new column called storeDescription. It uses the regexp_replace() function to replace the pattern "^Description: " at the beginning of the string in the storeDescription column with an empty string. This effectively removes the pattern from the beginning of the string in each row of the column.
upvoted 4 times
4be8126
1 year, 4 months ago
The correct code block that returns a new DataFrame with column storeDescription where the pattern "Description: " has been removed from the beginning of column storeDescription in DataFrame storesDF is: A. storesDF.withColumn("storeDescription", regexp_replace(col("storeDescription"), "^Description: ")) This code uses the regexp_replace function to replace the pattern "^Description: " (which matches the string "Description: " at the beginning of the string) with an empty string in the column storeDescription. The resulting DataFrame will have the modified storeDescription column. Option B has a syntax error because the regexp_replace function should be called on the column using the dot notation instead of passing it as the second argument. Option C uses the regexp_extract function, which extracts a substring matching a regular expression pattern. It doesn't remove the pattern from the string. Option D has a syntax error because the column name is not wrapped in the col function. Option E is the same as option A, except that it uses the col function unnecessarily.
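On the point about option C: extracting and replacing are different operations. A rough plain-Python sketch, using `re.search` and `re.sub` as stand-ins for `regexp_extract` and `regexp_replace` (an assumption for illustration, not the Spark functions themselves):

```python
import re

value = "Description: Store 0"
pattern = r"^Description: "

# Extract: returns the text that matched the pattern (the prefix itself).
# An empty string is used when there is no match.
match = re.search(pattern, value)
extracted = match.group(0) if match else ""

# Replace: returns the original text with the matched prefix removed.
replaced = re.sub(pattern, "", value)

print(repr(extracted))  # 'Description: '
print(repr(replaced))   # 'Store 0'
```

So an extract-style call hands back the prefix the question wants removed, not the remaining description.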
upvoted 1 times
...
...
4be8126
1 year, 4 months ago
Selected Answer: A
Option A is correct: storesDF.withColumn("productCategories", explode(col("productCategories"))). Explanation: The explode function is used to transform a column of arrays or maps into multiple rows, one for each element in the array or map. In this case, productCategories is a column with arrays of strings. The withColumn function is used to add a new column or update an existing column. The first argument is the name of the new or existing column, and the second argument is the expression that defines the values for the column.
upvoted 1 times
sly75
1 year, 4 months ago
You got the wrong question :°
upvoted 2 times
...
...
ronfun
1 year, 5 months ago
Both D and E are correct answer.
upvoted 2 times
...
TC007
1 year, 5 months ago
Selected Answer: D
This should actually be D; sorry for the wrong answer. Refer to this: https://sparkbyexamples.com/pyspark/pyspark-replace-column-values/
upvoted 3 times
...
TC007
1 year, 5 months ago
Selected Answer: A
The regexp_replace function is used to remove the pattern "Description: " from the beginning of the column storeDescription. The ^ symbol indicates the beginning of the string, and the pattern "Description: " is replaced with an empty string. This results in a new DataFrame with column storeDescription where the pattern "Description: " has been removed from the beginning of each cell in that column.
upvoted 1 times
4be8126
1 year, 4 months ago
Option A is incorrect because the regexp_replace function requires two arguments: the column to be transformed and the regular expression pattern to be replaced. In the given code block, only the regular expression pattern is provided, but not the column to be transformed. The correct syntax to use regexp_replace on a DataFrame column is regexp_replace(col(column_name), pattern, replacement), where col(column_name) specifies the DataFrame column to be transformed, pattern specifies the regular expression pattern to be replaced, and replacement specifies the new string to replace the matched pattern. Therefore, the correct code block to remove the pattern "Description: " from the beginning of the storeDescription column in DataFrame storesDF is: storesDF.withColumn("storeDescription", regexp_replace(col("storeDescription"), "^Description: ", ""))
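A small plain-Python sketch of why the missing replacement argument in option A is a problem. `regexp_replace_sketch` is a hypothetical stand-in that only mirrors the three-argument shape regexp_replace(column, pattern, replacement) described above; it is not the PySpark function.

```python
import re


def regexp_replace_sketch(value: str, pattern: str, replacement: str) -> str:
    """Hypothetical stand-in with three required arguments."""
    return re.sub(pattern, replacement, value)


# Shape of option A: the replacement argument is missing.
try:
    regexp_replace_sketch("Description: Store 0", "^Description: ")
    option_a_error = None
except TypeError as exc:
    option_a_error = exc

# Shape of option E: all three arguments are supplied.
option_e_result = regexp_replace_sketch("Description: Store 0", "^Description: ", "")

print(type(option_a_error).__name__)  # TypeError
print(option_e_result)                # Store 0
```

With three required parameters, a two-argument call fails before any matching happens, which is the difference between A and E.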
upvoted 2 times
...
...