TerramEarth plans to connect all 20 million vehicles in the field to the cloud. This increases the volume to 20 million 600-byte records per second, or roughly 40 TB per hour. How should you design the data ingestion?
A. Vehicles write data directly to GCS
B. Vehicles write data directly to Google Cloud Pub/Sub
C. Vehicles stream data directly to Google BigQuery
D. Vehicles continue to write data using the existing system (FTP)
Thanks for sharing the link, but it seems Pub/Sub can handle more streaming throughput than BigQuery: Pub/Sub allows 120,000,000 kB per minute (2 GB/s) in large regions, while BigQuery streaming is capped at 1 GB/s.
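For context, here is a quick back-of-the-envelope check of the numbers in the question, using the quota figures quoted above (which change over time and should be verified against current documentation):

```python
# Back-of-the-envelope throughput check for the figures in the question.
vehicles = 20_000_000   # connected vehicles
record_bytes = 600      # bytes per record, one record per vehicle per second

ingest_bps = vehicles * record_bytes
print(f"{ingest_bps / 1e9:.1f} GB/s")          # 12.0 GB/s
print(f"{ingest_bps * 3600 / 1e12:.1f} TB/h")  # 43.2 TB/h (the ~40 TB/hour in the question)

# Quota figures quoted in the comment above (subject to change):
pubsub_limit_bps = 2e9      # ~2 GB/s per large region
bq_stream_limit_bps = 1e9   # ~1 GB/s streaming inserts
print(ingest_bps > pubsub_limit_bps)  # True: even Pub/Sub's per-region quota is exceeded
```

Either way, 12 GB/s exceeds both per-region figures, which is why so much of this thread argues about quotas.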
Wow, it's almost like GCP shouldn't have dropped its IoT Core product; you can't just "write directly to Pub/Sub" from a device.
It's the correct answer, but it's overly simplified.
Writing directly to GCS would cost a fortune to retrieve via GET requests, etc.
Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table.
Instead of using a job to load data into BigQuery, you can choose to stream your data into BigQuery one record at a time by using the tabledata().insertAll() method. This approach enables querying data without the delay of running a load job.
References: https://cloud.google.com/bigquery/streaming-data-into-bigquery
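For illustration, a minimal sketch of a streaming insert using the Python client library; the table name and row fields here are hypothetical, and insert_rows_json is the client-library wrapper over the tabledata().insertAll() API mentioned above:

```python
from google.cloud import bigquery

client = bigquery.Client()
# Hypothetical table; replace with your own project.dataset.table.
table_id = "my-project.telemetry.vehicle_records"

rows = [
    {"vehicle_id": "v-001", "ts": "2024-01-01T00:00:00Z", "fuel_pct": 71.5},
]

# insert_rows_json performs a streaming insert; it returns a list of
# per-row errors, which is empty on success.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print(f"Streaming insert failed: {errors}")
```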
They are sending files through FTP; why is everyone missing this point? The max message size in Pub/Sub is 10 MB, as I recall. I would keep the file-based solution and try to roll out updates to direct the uploads to GCS (a sketch of that change follows below).
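If you did keep the file-based flow and pointed the uploads at GCS instead of FTP, the client-side change would be small. A minimal sketch with the Python storage client, using hypothetical bucket and file names:

```python
from google.cloud import storage

client = storage.Client()
# Hypothetical bucket and object names.
bucket = client.bucket("terramearth-telemetry")
blob = bucket.blob("raw/vehicle-v-001/2024-01-01.csv")

# Upload the same file the vehicle would previously have sent over FTP.
blob.upload_from_filename("/var/spool/telemetry/2024-01-01.csv")
```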
So many people point out that this breaks the BigQuery quota limit, but very few point out that it also breaks the Pub/Sub quota limit. So either the answer is not bound by quota limits (in which case, why not BigQuery?), or both are wrong and we stick with FTP.
I know it's B; however, the sensors are probably legacy systems that cannot communicate with a Pub/Sub topic.
Ignoring how huge a task it is to change or adapt 20 million devices is a mistake.
To handle the volume of data that TerramEarth plans to ingest, it is recommended to use a scalable and reliable data ingestion solution such as Google Cloud Pub/Sub. With Cloud Pub/Sub, the vehicles can stream data directly to the service, which can handle the high volume of data and provide a buffer to absorb sudden spikes in traffic. The data can then be processed and stored in a data warehouse such as BigQuery for analysis (a publish sketch follows the option breakdown below).
Option A (writing data directly to GCS) may not be suitable for handling high volumes of data in real-time and may result in data loss if the volume exceeds the capacity of GCS.
Option C (streaming data directly to BigQuery) may not be suitable for handling high volumes of data in real-time as it may result in data loss or ingestion delays.
Option D (continuing to write data using the existing system) may not be suitable as the current system may not be able to handle the increased volume of data and may result in data loss or ingestion delays.
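As referenced above, here is a minimal sketch of the vehicle-side publish with the Python Pub/Sub client, assuming hypothetical project, topic, and payload names:

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic names.
topic_path = publisher.topic_path("terramearth-prod", "vehicle-telemetry")

# One ~600-byte record, encoded as bytes as Pub/Sub requires.
record = b'{"vehicle_id": "v-001", "fuel_pct": 71.5}'
future = publisher.publish(topic_path, data=record)
print(future.result())  # message ID once the publish is acknowledged
```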
We need to buffer: the default BigQuery limit is 100 API calls per second, and so far this cannot be changed. Hence we should smooth the load using Pub/Sub, so B.
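On the buffering point: the Pub/Sub publisher client can also batch many small records into a single API call, which is exactly the kind of smoothing this comment asks for. A minimal sketch, with illustrative (not recommended) thresholds:

```python
from google.cloud import pubsub_v1

# Batch up to 500 messages, 1 MB, or 50 ms of latency per publish call,
# so thousands of 600-byte records share far fewer API requests.
batch_settings = pubsub_v1.types.BatchSettings(
    max_messages=500,
    max_bytes=1024 * 1024,
    max_latency=0.05,
)
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)
```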