Guaranteed Professional-Data-Engineer free question materials and exam dumps for the Google certification for IT engineers. Real success guaranteed with updated Professional-Data-Engineer PDF and VCE dump materials. 100% pass the Google Professional Data Engineer exam today!

We also have these Professional-Data-Engineer free dump questions for you:

NEW QUESTION 1

What is the HBase Shell for Cloud Bigtable?

  • A. The HBase shell is a GUI based interface that performs administrative tasks, such as creating and deleting tables.
  • B. The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables.
  • C. The HBase shell is a hypervisor based shell that performs administrative tasks, such as creating and deleting new virtualized instances.
  • D. The HBase shell is a command-line tool that performs only user account management functions to grant access to Cloud Bigtable instances.

Answer: B

Explanation:
The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables. The Cloud Bigtable HBase client for Java makes it possible to use the HBase shell to connect to Cloud Bigtable.
Reference: https://cloud.google.com/bigtable/docs/installing-hbase-shell
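
For context, here is a minimal sketch (not part of the exam material) of performing the same kind of administrative task programmatically with the Cloud Bigtable HBase client for Java; the project, instance, table, and column family names are placeholders:

import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

public class CreateBigtableTable {
  public static void main(String[] args) throws Exception {
    // Placeholder project and instance IDs.
    try (Connection connection = BigtableConfiguration.connect("my-project", "my-instance");
         Admin admin = connection.getAdmin()) {
      // Create a table with one column family, the same administrative task
      // the HBase shell performs with its create command.
      HTableDescriptor table = new HTableDescriptor(TableName.valueOf("my-table"));
      table.addFamily(new HColumnDescriptor("cf1"));
      admin.createTable(table);
    }
  }
}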

NEW QUESTION 2

You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second ingests data from on-premises servers and uploads it to Cloud Storage. The third is a Cloud Dataflow pipeline that gets information from third-party data providers and uploads the information to Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed. What should you do?

  • A. Create a Directed Acyclic Graph (DAG) in Cloud Composer to schedule and monitor the jobs.
  • B. Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.
  • C. Develop an App Engine application to schedule and request the status of the jobs using GCP API calls.
  • D. Set up cron jobs in a Compute Engine instance to schedule and monitor the pipelines using GCP API calls.

Answer: D

NEW QUESTION 3

You launched a new gaming app almost three years ago. You have been uploading log files from the previous day to a separate Google BigQuery table with the table name format LOGS_yyyymmdd. You have been using table wildcard functions to generate daily and monthly reports for all time ranges. Recently, you discovered that some queries that cover long date ranges are exceeding the limit of 1,000 tables and failing. How can you resolve this issue?

  • A. Convert all daily log tables into date-partitioned tables
  • B. Convert the sharded tables into a single partitioned table
  • C. Enable query caching so you can cache data from previous months
  • D. Create separate views to cover each month, and query from these views

Answer: A

NEW QUESTION 4

You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL.
What should you do?

  • A. Use Cloud Dataflow with Beam to detect errors and perform transformations.
  • B. Use Cloud Dataprep with recipes to detect errors and perform transformations.
  • C. Use Cloud Dataproc with a Hadoop job to detect errors and perform transformations.
  • D. Use federated tables in BigQuery with queries to detect errors and perform transformations.

Answer: A

NEW QUESTION 5

You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as the moving average over 1 hour drops below 4000 messages per second. What should you do?

  • A. Consume the stream of data in Cloud Dataflow using Kafka IO. Set a sliding time window of 1 hour every 5 minutes. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
  • B. Consume the stream of data in Cloud Dataflow using Kafka IO. Set a fixed time window of 1 hour. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
  • C. Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to Cloud Bigtable. Use Cloud Scheduler to run a script every hour that counts the number of rows created in Cloud Bigtable in the last hour. If that number falls below 4000, send an alert.
  • D. Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to BigQuery. Use Cloud Scheduler to run a script every five minutes that counts the number of rows created in BigQuery in the last hour. If that number falls below 4000, send an alert.

Answer: A
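
As a rough illustration of the sliding-window approach (consume from Kafka in Dataflow, apply a 1-hour sliding window that advances every 5 minutes, and compute the average rate), here is a hedged Apache Beam Java sketch; the broker address, topic name, and the alerting step are placeholders, not part of the exam material:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.SlidingWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class KafkaMovingAverageAlert {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply("ReadFromKafka", KafkaIO.<Long, String>read()
            .withBootstrapServers("broker:9092")            // placeholder broker
            .withTopic("iot-messages")                      // placeholder topic
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
     // Sliding window of 1 hour that advances every 5 minutes.
     .apply(Window.<KV<Long, String>>into(
            SlidingWindows.of(Duration.standardHours(1)).every(Duration.standardMinutes(5))))
     // Count the messages that fall into each 1-hour window.
     .apply("CountPerWindow", Combine.globally(Count.<KV<Long, String>>combineFn()).withoutDefaults())
     // Convert the count to an average rate and flag windows below 4000 messages/second.
     .apply("CheckRate", ParDo.of(new DoFn<Long, Void>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          double messagesPerSecond = c.element() / 3600.0;
          if (messagesPerSecond < 4000) {
            // Send the alert here, for example by publishing to a notification topic.
          }
        }
      }));
    p.run();
  }
}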

NEW QUESTION 6

Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.
The data scientists have written the following code to read the data for new key features in the logs:
BigQueryIO.Read
.named("ReadLogData")
.from("clouddataflow-readonly:samples.log_data")
You want to improve the performance of this data read. What should you do?

  • A. Specify the TableReference object in the code.
  • B. Use .fromQuery operation to read specific fields from the table.
  • C. Use of both the Google BigQuery TableSchema and TableFieldSchema classes.
  • D. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.

Answer: D
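
For reference, a minimal sketch of how the read above fits into a pipeline written against the older Dataflow Java SDK that the snippet appears to use; BigQueryIO.Read yields a PCollection of TableRow objects, where each element represents a single row of the table:

import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class ReadLogData {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Each element of the resulting PCollection is one row of the source table.
    PCollection<TableRow> rows = pipeline.apply(
        BigQueryIO.Read
            .named("ReadLogData")
            .from("clouddataflow-readonly:samples.log_data"));
    pipeline.run();
  }
}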

NEW QUESTION 7

Which of the following is NOT one of the three main types of triggers that Dataflow supports?

  • A. Trigger based on element size in bytes
  • B. Trigger that is a combination of other triggers
  • C. Trigger based on element count
  • D. Trigger based on time

Answer: A

Explanation:
There are three major kinds of triggers that Dataflow supports: 1. Time-based triggers. 2. Data-driven triggers: you can set a trigger to emit results from a window when that window has received a certain number of data elements. 3. Composite triggers: these combine multiple time-based or data-driven triggers in some logical way.
Reference: https://cloud.google.com/dataflow/model/triggers
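
To make the trigger types concrete, here is a short Beam Java sketch (not from the exam material); the window size, element count, and delays are arbitrary placeholders:

import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class TriggerExamples {
  // Data-driven trigger: emit a pane each time the window has received at least 100 elements.
  static PCollection<String> elementCountTrigger(PCollection<String> input) {
    return input.apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(10)))
        .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(100)))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());
  }

  // Composite trigger: a time-based (watermark) trigger combined with early processing-time firings.
  static PCollection<String> compositeTrigger(PCollection<String> input) {
    return input.apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(10)))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(30))))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());
  }
}

Note that there is no trigger based on element size in bytes, which is why option A is the one Dataflow does not support.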

NEW QUESTION 8

You are operating a Cloud Dataflow streaming pipeline. The pipeline aggregates events from a Cloud Pub/Sub subscription source, within a window, and sinks the resulting aggregation to a Cloud Storage bucket. The source has consistent throughput. You want to monitor and alert on the behavior of the pipeline with Cloud Stackdriver to ensure that it is processing data. Which Stackdriver alerts should you create?

  • A. An alert based on a decrease of subscription/num_undelivered_messages for the source and a rate of change increase of instance/storage/used_bytes for the destination
  • B. An alert based on an increase of subscription/num_undelivered_messages for the source and a rate of change decrease of instance/storage/used_bytes for the destination
  • C. An alert based on a decrease of instance/storage/used_bytes for the source and a rate of change increase of subscription/num_undelivered_messages for the destination
  • D. An alert based on an increase of instance/storage/used_bytes for the source and a rate of change decrease of subscription/num_undelivered_messages for the destination

Answer: B

NEW QUESTION 9

Which of the following are examples of hyperparameters? (Select 2 answers.)

  • A. Number of hidden layers
  • B. Number of nodes in each hidden layer
  • C. Biases
  • D. Weights

Answer: AB

Explanation:
If model parameters are variables that get adjusted by training with existing data, your hyperparameters are the variables about the training process itself. For example, part of setting up a deep neural network is deciding how many "hidden" layers of nodes to use between the input layer and the output layer, as well as how many nodes each layer should use. These variables are not directly related to the training data at all. They are configuration variables. Another difference is that parameters change during a training job, while the hyperparameters are usually constant during a job.
Weights and biases are variables that get adjusted during the training process, so they are not hyperparameters.
Reference: https://cloud.google.com/ml-engine/docs/hyperparameter-tuning-overview

NEW QUESTION 10

You work for a bank. You have a labelled dataset that contains information on already-granted loan applications and whether these applications have defaulted. You have been asked to train a model to predict default rates for credit applicants.
What should you do?

  • A. Increase the size of the dataset by collecting additional data.
  • B. Train a linear regression to predict a credit default risk score.
  • C. Remove the bias from the data and collect applications that have been declined loans.
  • D. Match loan applicants with their social profiles to enable feature engineering.

Answer: B

NEW QUESTION 11

You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?

  • A. Store and process the entire dataset in BigQuery.
  • B. Store and process the entire dataset in Cloud Bigtable.
  • C. Store the full dataset in BigQuery, and store a compressed copy of the data in a Cloud Storage bucket.
  • D. Store the warm data as files in Cloud Storage, and store the active data in BigQuery. Keep this ratio as 80% warm and 20% active.

Answer: D

NEW QUESTION 12

You have data pipelines running on BigQuery, Cloud Dataflow, and Cloud Dataproc. You need to perform health checks and monitor their behavior, and then notify the team managing the pipelines if they fail. You also need to be able to work across multiple projects. Your preference is to use managed products or features of the platform. What should you do?

  • A. Export the information to Cloud Stackdriver, and set up an Alerting policy
  • B. Run a Virtual Machine in Compute Engine with Airflow, and export the information to Stackdriver
  • C. Export the logs to BigQuery, and set up App Engine to read that information and send emails if you find a failure in the logs
  • D. Develop an App Engine application to consume logs using GCP API calls, and send emails if you find a failure in the logs

Answer: B

NEW QUESTION 13

In order to securely transfer web traffic data from your computer's web browser to the Cloud Dataproc cluster you should use a(n) ____________.

  • A. VPN connection
  • B. Special browser
  • C. SSH tunnel
  • D. FTP connection

Answer: C

Explanation:
To connect to the web interfaces, it is recommended to use an SSH tunnel to create a secure connection to the master node.
Reference:
https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#connecting_to_the_web_interfaces

NEW QUESTION 14

You are developing a software application using Google's Dataflow SDK, and want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?

  • A. PCollection
  • B. Transform
  • C. Pipeline
  • D. Sink API

Answer: B

Explanation:
In Google Cloud, the Dataflow SDK provides a transform component. It is responsible for the data processing operation. You can use conditionals, for loops, and other complex programming structures to create a branching pipeline.
Reference: https://cloud.google.com/dataflow/model/programming-model
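
As an illustration, here is a minimal Beam Java sketch of such a transform (a ParDo); the element type and filtering condition are placeholders. Ordinary Java conditionals and loops inside the DoFn decide what each element contributes to the output:

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class FilterAndTag {
  // A transform that applies arbitrary Java logic (conditionals, loops, etc.) to each element.
  static PCollection<String> keepErrors(PCollection<String> lines) {
    return lines.apply("KeepErrors", ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        String line = c.element();
        if (line.contains("ERROR")) {   // conditional logic inside the transform
          c.output(line.trim());
        }
      }
    }));
  }
}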

NEW QUESTION 15

Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?

  • A. An hourly watermark
  • B. An event time trigger
  • C. The withAllowedLateness method
  • D. A processing time trigger

Answer: D

Explanation:
When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window.
Processing time triggers: these triggers operate on the processing time, the time when the data element is processed at any given stage in the pipeline.
Event time triggers: these triggers operate on the event time, as indicated by the timestamp on each data element. Beam's default trigger is event time-based.
Reference: https://beam.apache.org/documentation/programming-guide/#triggers
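
A hedged Beam Java sketch of a processing-time trigger that emits an aggregate roughly every hour of pipeline (processing) time; the global window, the counting combiner, and the names are illustrative choices, not the only way to express this:

import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class HourlyProcessingTimeAggregate {
  // Emit an aggregate about once per hour of processing time, regardless of event timestamps.
  static PCollection<Long> hourlyCounts(PCollection<String> events) {
    return events
        .apply(Window.<String>into(new GlobalWindows())
            .triggering(Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardHours(1))))
            .withAllowedLateness(Duration.ZERO)
            .discardingFiredPanes())
        .apply(Combine.globally(Count.<String>combineFn()).withoutDefaults());
  }
}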

NEW QUESTION 16

You work for an economic consulting firm that helps companies identify economic trends as they happen. As part of your analysis, you use Google BigQuery to correlate customer data with the average prices of the 100 most common goods sold, including bread, gasoline, milk, and others. The average prices of these goods are updated every 30 minutes. You want to make sure this data stays up to date so you can combine it with other data in BigQuery as cheaply as possible. What should you do?

  • A. Load the data every 30 minutes into a new partitioned table in BigQuery.
  • B. Store and update the data in a regional Google Cloud Storage bucket and create a federated data source in BigQuery.
  • C. Store the data in Google Cloud Datastore. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Cloud Datastore.
  • D. Store the data in a file in a regional Google Cloud Storage bucket. Use Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Google Cloud Storage.

Answer: A

NEW QUESTION 17

Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters and received an area under the curve (AUC) of 0.87 on the validation set. You want to increase the AUC of the model. What should you do?

  • A. Perform hyperparameter tuning
  • B. Train a classifier with deep neural networks, because neural networks would always beat SVMs
  • C. Deploy the model and measure the real-world AUC; it’s always higher because of generalization
  • D. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC

Answer: D

NEW QUESTION 18

Which methods can be used to reduce the number of rows processed by BigQuery?

  • A. Splitting tables into multiple tables; putting data in partitions
  • B. Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause
  • C. Putting data in partitions; using the LIMIT clause
  • D. Splitting tables into multiple tables; using the LIMIT clause

Answer: A

Explanation:
If you split a table into multiple tables (such as one table for each day), then you can limit your query to the data in specific tables (such as for particular days). A better method is to use a partitioned table, as long as your data can be separated by the day.
If you use the LIMIT clause, BigQuery will still process the entire table.
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables
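
As an illustration of partition pruning, here is a hedged sketch using the BigQuery client library for Java; the project, dataset, table, and date range are placeholders:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class PartitionedQuery {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // Filtering on the _PARTITIONTIME pseudo column of an ingestion-time partitioned table
    // prunes partitions, so only the selected days' rows are processed; a LIMIT clause
    // alone would not reduce the amount of data scanned.
    String sql = "SELECT * FROM `my-project.my_dataset.logs` "
        + "WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2019-01-01') AND TIMESTAMP('2019-01-07')";
    TableResult result = bigquery.query(QueryJobConfiguration.newBuilder(sql).setUseLegacySql(false).build());
    result.iterateAll().forEach(row -> System.out.println(row));
  }
}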

NEW QUESTION 19

You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?

  • A. cron
  • B. Cloud Composer
  • C. Cloud Scheduler
  • D. Workflow Templates on Cloud Dataproc

Answer: D

NEW QUESTION 20

You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? Choose 2 answers.

  • A. Denormalize the data as much as possible.
  • B. Preserve the structure of the data as much as possible.
  • C. Use BigQuery UPDATE to further reduce the size of the dataset.
  • D. Develop a data pipeline where status updates are appended to BigQuery instead of updated.
  • E. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery's support for external data sources to query.

Answer: DE

NEW QUESTION 21

Which Java SDK class can you use to run your Dataflow programs locally?

  • A. LocalRunner
  • B. DirectPipelineRunner
  • C. MachineRunner
  • D. LocalPipelineRunner

Answer: B

Explanation:
DirectPipelineRunner allows you to execute operations in the pipeline directly, without any optimization. It is useful for small local executions and tests.
Reference:
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRun
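
A minimal sketch, assuming the Dataflow Java SDK 1.x that this question refers to; the input and output paths are placeholders:

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner;

public class LocalRun {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.create();
    options.setRunner(DirectPipelineRunner.class);   // execute locally instead of on the Dataflow service
    Pipeline p = Pipeline.create(options);
    p.apply(TextIO.Read.from("/tmp/input.txt"))      // placeholder local input
     .apply(TextIO.Write.to("/tmp/output"));         // placeholder local output
    p.run();
  }
}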

NEW QUESTION 22

Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?

  • A. Cloud Dataflow
  • B. Cloud Composer
  • C. Cloud Dataprep
  • D. Cloud Dataproc

Answer: D

NEW QUESTION 23

You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need to process, store and analyze these very large datasets in real time. What should you do?

  • A. Send the data to Google Cloud Datastore and then export to BigQuery.
  • B. Send the data to Google Cloud Pub/Sub, stream Cloud Pub/Sub to Google Cloud Dataflow, and store the data in Google BigQuery.
  • C. Send the data to Cloud Storage and then spin up an Apache Hadoop cluster as needed in Google Cloud Dataproc whenever analysis is required.
  • D. Export logs in batch to Google Cloud Storage and then spin up a Google Cloud SQL instance, import the data from Cloud Storage, and run an analysis as needed.

Answer: B
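
A hedged Beam Java sketch of the Pub/Sub to Dataflow to BigQuery path in option B; the subscription, destination table, field names, and message format are placeholder assumptions:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class TemperatureIngest {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply("ReadFromPubSub", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/temperature"))   // placeholder
     .apply("ToTableRow", ParDo.of(new DoFn<String, TableRow>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          // Parsing is simplified; assume each message is "deviceId,temperature".
          String[] parts = c.element().split(",");
          c.output(new TableRow().set("device_id", parts[0]).set("temperature", parts[1]));
        }
      }))
     .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
            .to("my-project:warehouse.temperature_readings")                      // placeholder, table assumed to exist
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));
    p.run();
  }
}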

NEW QUESTION 24
......

Thanks for reading the newest Professional-Data-Engineer exam dumps! We recommend that you try the PREMIUM 2passeasy Professional-Data-Engineer dumps in VCE and PDF here: https://www.2passeasy.com/dumps/Professional-Data-Engineer/ (239 Q&As Dumps)