Exam Certified Data Engineer Professional topic 1 question 9 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 9
Topic #: 1

A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.
Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales.

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

  • A. Both commands will succeed. Executing show tables will show that countries_af and sales_af have been registered as views.
  • B. Cmd 1 will succeed. Cmd 2 will search all accessible databases for a table or view named countries_af: if this entity exists, Cmd 2 will succeed.
  • C. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable representing a PySpark DataFrame.
  • D. Both commands will fail. No new variables, tables, or views will be created.
  • E. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.
Suggested Answer: E

Comments

aragorn_brego
Highly Voted 1 year ago
Selected Answer: E
Cmd 1 is a PySpark command that collects the list of countries from the 'geo_lookup' table where the continent is Africa ('AF'). This command will execute successfully, resulting in countries_af being a list of country names (strings) in Python's local memory. Cmd 2 is an SQL command intended to create a view named 'sales_af' from the 'sales' table, filtered by the cities in the countries_af list. However, this will fail because the countries_af variable exists in the Python environment and is not recognized in the SQL context. SQL does not have access to Python variables directly; they are two separate execution contexts within a Databricks notebook. There is no table or view named countries_af that SQL can reference; it is merely a Python list variable. The other options are incorrect because they either assume cross-contextual operation between Python and SQL within a Databricks notebook (which is not possible in the way described in the commands), or they do not correctly interpret the outcome of running the commands.
upvoted 8 times
...
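A minimal sketch of one way to get the intended view, assuming the list is inlined into the SQL text from the Python side rather than referenced by name from a %sql cell. The helper name build_sales_af_sql is hypothetical, the table and column names follow the test data posted further down this thread, and the final spark.sql call is shown only as a comment because it needs a running Databricks or Spark session.

```python
def build_sales_af_sql(countries):
    """Return a CREATE VIEW statement with the Python list inlined as SQL literals."""
    if not countries:
        # An empty IN () list is not valid SQL, so fail early.
        raise ValueError("countries must not be empty")
    for name in countries:
        # Keep the sketch simple: refuse values that would need escaping.
        if "'" in name or "\\" in name:
            raise ValueError(f"unsupported character in {name!r}")
    in_list = ", ".join(f"'{name}'" for name in countries)
    return (
        "CREATE OR REPLACE VIEW sales_af AS "
        f"SELECT * FROM sales WHERE city IN ({in_list}) AND continent = 'AF'"
    )


# Stand-in for the list that Cmd 1 produces.
countries_af = ["Nigeria", "Kenya"]
statement = build_sales_af_sql(countries_af)
print(statement)
# In a notebook Python cell one would then run: spark.sql(statement)
```

The point of the sketch is that the Python list only reaches SQL as literal text; the SQL parser never sees the Python variable itself.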
benni_ale
Most Recent 1 month, 3 weeks ago
Selected Answer: E
E, the collect method outputs strings here, so the Python variable will be a list of strings, which cannot be referenced as a Spark table the way Cmd 2 does.
upvoted 1 times
...
imatheushenrique
5 months, 3 weeks ago
E. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.
upvoted 1 times
...
juliom6
7 months, 2 weeks ago
Selected Answer: E
E is correct.

%sql
create table geo_lookup (continent varchar(2), country varchar(15));
insert into geo_lookup (continent, country) values ('AF','Nigeria'), ('AF','Kenya');
create table sales (city varchar(15), continent varchar(2));
insert into sales (city, continent) values ('Nigeria','AF'), ('Kenya','AF');

%python
countries_af = [x[0] for x in spark.table('geo_lookup').filter("continent='AF'").select('country').collect()]

%sql
create view sales_af as
select *
from sales
where city in countries_af and continent = "AF";

ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'in'.(line 4, pos 11)

i.e. countries_af is a Python list of strings and can't be used inside a SQL statement.
upvoted 3 times
AndreFR
3 months, 1 week ago
%python
print(countries_af)
type(countries_af)
upvoted 1 times
...
...
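The type check suggested in this thread can be illustrated without a cluster. In the sketch below a namedtuple is a hypothetical stand-in for pyspark.sql.Row (which is also a tuple subclass), and the two rows mirror the test data above; nothing here touches Spark.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, which is also a tuple subclass.
Row = namedtuple("Row", ["country"])

# Roughly what .collect() would hand back for the two 'AF' rows in the test data.
collected = [Row("Nigeria"), Row("Kenya")]

# Same comprehension as Cmd 1: take the first field of every row.
countries_af = [x[0] for x in collected]

print(countries_af)
print(type(countries_af).__name__, type(countries_af[0]).__name__)
```

The result is an ordinary Python list whose elements are plain strings, which is why answer C (a PySpark DataFrame) does not fit.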
leopedroso1
9 months, 2 weeks ago
By simulating this code in Databricks we can see an error being thrown in the SQL statement:

ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'.(line 1, pos 38)
== SQL ==
SELECT * FROM backup.sales WHERE CITY IN countries_af AND CONTINENT = "AF"
upvoted 1 times
...
RiktRikt007
9 months, 2 weeks ago
Selected Answer: B
B describes how Spark SQL actually resolves the name, whereas E fits the context of the question. The question does state that the database has no other tables, but does that mean Databricks will not look for that particular table? It will, right? I was also confused by the "database contains only two tables" statement. E and B both look right to me, but B is conditional ("if this entity exists, Cmd 2 will succeed"), while the question is about language interoperability, so most of us selected E.
upvoted 1 times
benni_ale
2 months ago
How could it succeed when everyone who tested it got a SQL parse syntax error?
upvoted 1 times
...
...
PrashantTiwari
9 months, 2 weeks ago
E is correct
upvoted 2 times
...
Jay_98_11
10 months, 2 weeks ago
Selected Answer: E
vote for E
upvoted 2 times
...
kz_data
10 months, 2 weeks ago
Selected Answer: E
E is correct answer
upvoted 1 times
...
ismoshkov
1 year ago
Selected Answer: B
https://docs.databricks.com/en/notebooks/notebooks-code.html#mix-languages Variables defined in one language (and hence in the REPL for that language) are not available in the REPL of another language
upvoted 1 times
Naveenkm
12 months ago
It is mentioned there exists only 2 objects in database. so B is not an option
upvoted 1 times
...
Karen1232123
1 year ago
even if it exists, a table or a view won't work in cmd 2
upvoted 2 times
...
...
sturcu
1 year, 1 month ago
Selected Answer: E
correct
upvoted 1 times
...
lucasasterio
1 year, 2 months ago
Selected Answer: E
correct
upvoted 2 times
...
Eertyy
1 year, 3 months ago
E is the right answer
upvoted 2 times
...