The code block shown below contains an error. The code block is intended to read a Parquet file at the file path filePath into a DataFrame. Identify the error. Code block: spark.read.load(filePath, source = "parquet")
A.
There is no source parameter to the load() operation – the schema parameter should be used instead.
B.
There is no load() operation – it should be parquet() instead.
C.
The spark.read operation should be followed by parentheses to return a DataFrameReader object.
D.
The filePath argument to the load() operation should be quoted.
E.
There is no source parameter to the load() operation – it can be removed.
E is correct. The "format" parameter should be used instead of "source" (default "parquet"):
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html
format: str, optional
optional string for format of the data source. Default to ‘parquet’.
The parameters for load() function are: path, format, schema, **options
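To see what that parameter list implies for a call like load(filePath, source = "parquet"), here is a minimal stand-in in plain Python. load_stub and file_path are hypothetical; the function only mirrors the documented parameter names (path, format, schema, **options) and is not PySpark's implementation. Under that assumption, an unrecognized keyword such as source would land in **options and would not be treated as the format.

```python
def load_stub(path=None, format=None, schema=None, **options):
    """Hypothetical stand-in that mirrors the documented load() parameters."""
    return {
        "path": path,
        # Assumption: mimic the documented default of 'parquet' when format is omitted.
        "format": format if format is not None else "parquet",
        "schema": schema,
        # Any keyword that is not path/format/schema is collected here.
        "options": options,
    }


file_path = "/tmp/example.parquet"  # hypothetical path, for illustration only

with_source = load_stub(file_path, source="parquet")
with_format = load_stub(file_path, format="parquet")
without_either = load_stub(file_path)

print(with_source["options"])         # the stray keyword ends up in options
print(with_format == without_either)  # format="parquet" matches the default
```

In this sketch, dropping source changes nothing about the format that is used, which is consistent with answer E.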
A. Overall it makes sense, but do we really need to use schema?
B. There is a load() operation, so this is FALSE.
C. read is used without parentheses, so this is FALSE.
D. It should indeed, but there's no source parameter, so FALSE.
E. That's true, but we need to put quotes around filePath, so it's FALSE.
That makes it A, but the question is really strange and unclear.
The answer should be E: source can be removed, and the default format is 'parquet' anyway. However, using load() is not ideal here; the dedicated parquet() method is preferable.
https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameReader.load.html?highlight=dataframereader%20load#pyspark.sql.DataFrameReader.load
1. pyspark.sql.SparkSession.read returns a DataFrameReader.
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.read.html#pyspark.sql.SparkSession.read
2. Checking this DataFrameReader, it contains both "load" and "parquet" methods.
2.1. For load, the signature is load(path, format, schema).
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html#pyspark.sql.DataFrameReader.load
Therefore, the answer is A or E.
Parquet files typically contain schema information.
I do not like this question: to read a Parquet file, you would normally call spark.read.parquet() directly.
The correct code block to read a parquet file would be
spark.read.parquet(filePath).
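For comparison, here is a minimal sketch of three ways to read the same Parquet file through DataFrameReader. The helper name read_parquet_variants is hypothetical, and spark is assumed to be an already created SparkSession; the calls used are parquet(), load() with the format keyword, and format() followed by load().

```python
def read_parquet_variants(spark, file_path):
    """Read one Parquet path three ways; each call should yield a DataFrame.

    `spark` is assumed to be an existing SparkSession (not created here).
    """
    # Dedicated reader method for Parquet.
    via_parquet = spark.read.parquet(file_path)

    # Generic load() with the format passed through the `format` keyword.
    via_load = spark.read.load(file_path, format="parquet")

    # Builder style: set the format first, then load the path.
    via_format = spark.read.format("parquet").load(file_path)

    return via_parquet, via_load, via_format
```

With a real session, read_parquet_variants(spark, filePath) would be expected to return three DataFrames with the same contents. Note that filePath is passed as a variable, so it is not quoted.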