An organization wants to build a data pipeline to transform its data so it can be reconciled in a data warehouse. The solution must be scalable and require little or no management. Which Google product or service should the organization choose?
Correct answer: D. Dataflow
Dataflow provides unified stream and batch data processing at scale. Use it to create data pipelines that read from one or more sources, transform the data, and write the data to a destination.
Typical use cases for Dataflow:
- Data movement: Ingesting data or replicating data across subsystems.
- ETL (extract-transform-load) workflows that ingest data into a data warehouse such as BigQuery.
- Powering BI dashboards.
- Applying ML in real time to streaming data.
- Processing sensor data or log data at scale.
Dataflow uses the same programming model for both batch and stream analytics. You can ingest, process, and analyze fluctuating volumes of real-time data.
https://cloud.google.com/dataflow/docs/overview
Incorrect options:
A. Cloud Bigtable: Cloud Bigtable is a NoSQL database service designed for storing large amounts of data with low-latency access. While it is useful for certain types of data storage and analysis, it is not specifically designed for building and managing data transformation pipelines.
B. Cloud Storage: Cloud Storage is a scalable object storage service, ideal for storing large volumes of unstructured data. However, it does not offer direct functionality for transforming data or building data pipelines; additional tools or services like Dataflow would be required to transform data stored in Cloud Storage.
C. Pub/Sub: Pub/Sub is a messaging service used for building event-driven systems and real-time messaging. While it is useful for ingesting data into a pipeline, it does not provide the data transformation or reconciliation capabilities needed for this use case. It is often used in combination with other services like Dataflow.
D. Dataflow: Google Cloud Dataflow is a fully managed service designed to simplify processing large amounts of data in both batch and stream modes. Dataflow is especially suited for data integration and ETL (extract, transform, load) tasks, making it ideal for preparing data for a data warehouse. It is built on Apache Beam, which provides a unified programming model for defining data processing pipelines. Dataflow scales by adjusting resource allocation dynamically based on the workload, and because it is a managed service, the overhead of managing server infrastructure is minimal.
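To make the ETL idea concrete, here is a minimal sketch in plain Python of the extract-and-transform logic such a pipeline would apply. The field names (`user_id`, `amount_cents`) are hypothetical; in a real Dataflow job these steps would be expressed as Apache Beam transforms (e.g. `beam.Map`) and the results written to a BigQuery table.

```python
# Sketch of the ETL logic a Dataflow (Apache Beam) pipeline would run.
# Field names are made up for illustration; a real pipeline would
# express these steps as Beam transforms and write to BigQuery.

def extract(line):
    """Parse one raw CSV record into a dict."""
    user_id, amount = line.split(",")
    return {"user_id": user_id.strip(), "amount_cents": int(amount)}

def transform(record):
    """Convert cents to dollars, the shape the warehouse table expects."""
    return {"user_id": record["user_id"],
            "amount_usd": record["amount_cents"] / 100}

def run_pipeline(raw_lines):
    """Extract then transform each record; Dataflow would distribute
    and parallelize these steps across workers automatically."""
    return [transform(extract(line)) for line in raw_lines]

rows = run_pipeline(["alice, 1250", "bob, 300"])
# rows == [{"user_id": "alice", "amount_usd": 12.5},
#          {"user_id": "bob", "amount_usd": 3.0}]
```

The point of the sketch is the shape of the workload: per-record, embarrassingly parallel transforms with no servers for you to manage, which is exactly what Dataflow's managed, autoscaling runtime handles.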