
A company has an encrypted Amazon Redshift cluster. The company recently enabled Amazon Redshift audit logs and needs to ensure that the audit logs are also encrypted at rest. The logs are retained for 1 year. The auditor queries the logs once a month. What is the MOST cost-effective way to meet these requirements?


A) Encrypt the Amazon S3 bucket where the logs are stored by using AWS Key Management Service (AWS KMS). Copy the data into the Amazon Redshift cluster from Amazon S3 on a daily basis. Query the data as required.
B) Disable encryption on the Amazon Redshift cluster, configure audit logging, and encrypt the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query the data as required.
C) Enable default encryption on the Amazon S3 bucket where the logs are stored by using AES-256 encryption. Copy the data into the Amazon Redshift cluster from Amazon S3 on a daily basis. Query the data as required.
D) Enable default encryption on the Amazon S3 bucket where the logs are stored by using AES-256 encryption. Use Amazon Redshift Spectrum to query the data as required.

Correct Answer
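
For context on options C and D, default bucket encryption (SSE-S3, AES-256) is a single API call. A minimal boto3 sketch; the bucket name is illustrative:

    import boto3

    s3 = boto3.client("s3")
    # Turn on default encryption (SSE-S3, AES-256) so every new audit-log
    # object is encrypted at rest; the Redshift logging setup is unchanged.
    s3.put_bucket_encryption(
        Bucket="redshift-audit-logs-example",  # illustrative name
        ServerSideEncryptionConfiguration={
            "Rules": [
                {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
            ]
        },
    )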

A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour. A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead. Which combination of steps should the data analyst take to meet these requirements? (Choose three.)


A) Convert the log files to Apache Avro format.
B) Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
C) Convert the log files to Apache Parquet format.
D) Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.
E) Drop and recreate the table with the PARTITIONED BY clause. Run the ALTER TABLE ADD PARTITION statement.
F) Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.

Correct Answer
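
For context on the partitioning and Parquet options, a hedged boto3 sketch of the rebuilt table plus the MSCK REPAIR TABLE step; the database, table, and S3 locations are illustrative:

    import boto3

    athena = boto3.client("athena")

    # Recreate the table partitioned on the Hive-style date= prefix and stored
    # as Parquet, so queries scan only the columns and partitions they need.
    ddl = """
    CREATE EXTERNAL TABLE logs (message string)
    PARTITIONED BY (`date` string)
    STORED AS PARQUET
    LOCATION 's3://example-log-bucket/'
    """

    for query in (ddl, "MSCK REPAIR TABLE logs"):
        athena.start_query_execution(
            QueryString=query,
            QueryExecutionContext={"Database": "default"},
            ResultConfiguration={"OutputLocation": "s3://example-query-results/"},
        )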

A media analytics company consumes a stream of social media posts. The posts are sent to an Amazon Kinesis data stream partitioned on user_id. An AWS Lambda function retrieves the records and validates the content before loading the posts into an Amazon Elasticsearch Service (Amazon ES) cluster. The validation process needs to receive the posts for a given user in the order they were received by the Kinesis data stream. During peak hours, the social media posts take more than an hour to appear in the Amazon ES cluster. A data analytics specialist must implement a solution that reduces this latency with the least possible operational overhead. Which solution meets these requirements?


A) Migrate the validation process from Lambda to AWS Glue.
B) Migrate the Lambda consumers from standard data stream iterators to an HTTP/2 stream consumer.
C) Increase the number of shards in the Kinesis data stream.
D) Send the posts stream to Amazon Managed Streaming for Apache Kafka instead of the Kinesis data stream.

Correct Answer
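
For context on option B, HTTP/2 push delivery comes from registering an enhanced fan-out consumer, which gives the Lambda consumer a dedicated 2 MB/s per shard instead of the shared polling quota. A minimal boto3 sketch; the stream ARN and consumer name are illustrative:

    import boto3

    kinesis = boto3.client("kinesis")
    # Registers an enhanced fan-out consumer; records are then pushed over
    # HTTP/2 via SubscribeToShard rather than pulled with GetRecords.
    resp = kinesis.register_stream_consumer(
        StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/social-posts",
        ConsumerName="validation-consumer",
    )
    # A Lambda event source mapping can then reference this consumer ARN.
    print(resp["Consumer"]["ConsumerARN"])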

A power utility company is deploying thousands of smart meters to obtain real-time updates about power consumption. The company is using Amazon Kinesis Data Streams to collect the data streams from smart meters. The consumer application uses the Kinesis Client Library (KCL) to retrieve the stream data. The company has only one consumer application. The company observes an average of 1 second of latency from the moment that a record is written to the stream until the record is read by a consumer application. The company must reduce this latency to 500 milliseconds. Which solution meets these requirements?


A) Use enhanced fan-out in Kinesis Data Streams.
B) Increase the number of shards for the Kinesis data stream.
C) Reduce the propagation delay by overriding the KCL default settings.
D) Develop consumers by using Amazon Kinesis Data Firehose.

Correct Answer
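
For context on option C, the KCL's default polling interval is what produces roughly 1 second of end-to-end latency. A sketch of a KCL MultiLangDaemon .properties override, assuming a polling (non-fan-out) consumer; the value shown is illustrative:

    # KCL consumer .properties (MultiLangDaemon). The default
    # idleTimeBetweenReadsInMillis of 1000 ms lines up with the observed
    # ~1 second delay; lowering it calls GetRecords more often, trading read
    # throughput per shard for lower propagation delay.
    idleTimeBetweenReadsInMillis = 250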

A retail company is building its data warehouse solution using Amazon Redshift. As a part of that effort, the company is loading hundreds of files into the fact table created in its Amazon Redshift cluster. The company wants the solution to achieve the highest throughput and optimally use cluster resources when loading data into the company's fact table. How should the company meet these requirements?


A) Use multiple COPY commands to load the data into the Amazon Redshift cluster.
B) Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS connector to ingest the data into the Amazon Redshift cluster.
C) Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in parallel into each node.
D) Use a single COPY command to load the data into the Amazon Redshift cluster.

Correct Answer: B
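
For context on option D, a single COPY pointed at a shared prefix (or a manifest) lets Redshift itself split the file set across every node slice in parallel, rather than serializing separate loads. A hedged sketch using the Redshift Data API; all identifiers are illustrative:

    import boto3

    rsd = boto3.client("redshift-data")
    # One COPY for the whole file set; Redshift parallelizes it across slices.
    rsd.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql="COPY sales_fact "
            "FROM 's3://example-bucket/fact-files/' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
            "GZIP",
    )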

A human resources company maintains a 10-node Amazon Redshift cluster to run analytics queries on the company's data. The Amazon Redshift cluster contains a product table and a transactions table, and both tables have a product_sku column. The tables are over 100 GB in size. The majority of queries run on both tables. Which distribution style should the company use for the two tables to achieve optimal query performance?


A) An EVEN distribution style for both tables
B) A KEY distribution style for both tables
C) An ALL distribution style for the product table and an EVEN distribution style for the transactions table
D) An EVEN distribution style for the product table and a KEY distribution style for the transactions table

Correct Answer: B
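
A KEY distribution on product_sku for both tables co-locates joining rows on the same slices, so the frequent joins avoid network redistribution. A hedged DDL sketch issued through the Redshift Data API; the column lists and identifiers are illustrative:

    import boto3

    rsd = boto3.client("redshift-data")
    for ddl in (
        # Matching DISTKEYs mean rows with the same product_sku land on the
        # same slice in both tables.
        "CREATE TABLE product (product_sku varchar(32), product_name varchar(128)) "
        "DISTSTYLE KEY DISTKEY (product_sku)",
        "CREATE TABLE transactions (txn_id bigint, product_sku varchar(32), "
        "amount decimal(12,2)) DISTSTYLE KEY DISTKEY (product_sku)",
    ):
        rsd.execute_statement(
            ClusterIdentifier="hr-cluster", Database="dev", DbUser="awsuser", Sql=ddl
        )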

A technology company is creating a dashboard that will visualize and analyze time-sensitive data. The data will come in through Amazon Kinesis Data Firehose with the buffer interval set to 60 seconds. The dashboard must support near-real-time data. Which visualization solution will meet these requirements?


A) Select Amazon Elasticsearch Service (Amazon ES) as the endpoint for Kinesis Data Firehose. Set up a Kibana dashboard using the data in Amazon ES with the desired analyses and visualizations.
B) Select Amazon S3 as the endpoint for Kinesis Data Firehose. Read data into an Amazon SageMaker Jupyter notebook and carry out the desired analyses and visualizations.
C) Select Amazon Redshift as the endpoint for Kinesis Data Firehose. Connect Amazon QuickSight with SPICE to Amazon Redshift to create the desired analyses and visualizations.
D) Select Amazon S3 as the endpoint for Kinesis Data Firehose. Use AWS Glue to catalog the data and Amazon Athena to query it. Connect Amazon QuickSight with SPICE to Athena to create the desired analyses and visualizations.

Correct Answer
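
For context on option A, Kinesis Data Firehose delivers straight into an Amazon ES domain with the 60-second buffer the question describes, and Kibana reads the index directly. A hedged boto3 sketch; the ARNs, index, and backup bucket are illustrative placeholders:

    import boto3

    firehose = boto3.client("firehose")
    firehose.create_delivery_stream(
        DeliveryStreamName="dashboard-metrics",
        DeliveryStreamType="DirectPut",
        ElasticsearchDestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseEsRole",
            "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/metrics",
            "IndexName": "metrics",
            # 60-second buffer, matching the near-real-time requirement.
            "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 1},
            "S3BackupMode": "FailedDocumentsOnly",
            "S3Configuration": {
                "RoleARN": "arn:aws:iam::123456789012:role/FirehoseEsRole",
                "BucketARN": "arn:aws:s3:::example-firehose-backup",
            },
        },
    )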

A company wants to use an automatic machine learning (ML) Random Cut Forest (RCF) algorithm to visualize complex real-world scenarios, such as detecting seasonality and trends, excluding outliers, and imputing missing values. The team working on this project is non-technical and is looking for an out-of-the-box solution that will require the LEAST amount of management overhead. Which solution will meet these requirements?


A) Use an AWS Glue ML transform to create a forecast and then use Amazon QuickSight to visualize the data.
B) Use Amazon QuickSight to visualize the data and then use ML-powered forecasting to forecast the key business metrics.
C) Use a pre-built ML AMI from the AWS Marketplace to create forecasts and then use Amazon QuickSight to visualize the data.
D) Use calculated fields to create a new forecast and then use Amazon QuickSight to visualize the data.

Correct Answer

A global company has different sub-organizations, and each sub-organization sells its products and services in various countries. The company's senior leadership wants to quickly identify which sub-organization is the strongest performer in each country. All sales data is stored in Amazon S3 in Parquet format. Which approach can provide the visuals that senior leadership requested with the least amount of effort?


A) Use Amazon QuickSight with Amazon Athena as the data source. Use heat maps as the visual type.
B) Use Amazon QuickSight with Amazon S3 as the data source. Use heat maps as the visual type.
C) Use Amazon QuickSight with Amazon Athena as the data source. Use pivot tables as the visual type.
D) Use Amazon QuickSight with Amazon S3 as the data source. Use pivot tables as the visual type.

Correct Answer: C

A company stores its sales and marketing data that includes personally identifiable information (PII) in Amazon S3. The company allows its analysts to launch their own Amazon EMR cluster and run analytics reports with the data. To meet compliance requirements, the company must ensure the data is not publicly accessible throughout this process. A data engineer has secured Amazon S3 but must ensure the individual EMR clusters created by the analysts are not exposed to the public internet. Which solution should the data engineer use to meet this compliance requirement with the LEAST amount of effort?


A) Create an EMR security configuration and ensure the security configuration is associated with the EMR clusters when they are created.
B) Check the security group of the EMR clusters regularly to ensure it does not allow inbound traffic from IPv4 0.0.0.0/0 or IPv6 ::/0.
C) Enable the block public access setting for Amazon EMR at the account level before any EMR cluster is created.
D) Use AWS WAF to block public internet access to the EMR clusters across the board.

Correct Answer
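
For context on option C, block public access for Amazon EMR is one account-level call that rejects any cluster whose security groups allow inbound traffic from 0.0.0.0/0 or ::/0. A minimal boto3 sketch; the port 22 exemption mirrors the console default:

    import boto3

    emr = boto3.client("emr")
    # Applies account-wide to clusters created after the setting is enabled.
    emr.put_block_public_access_configuration(
        BlockPublicAccessConfiguration={
            "BlockPublicSecurityGroupRules": True,
            # Optionally keep SSH reachable; remove for a blanket block.
            "PermittedPublicSecurityGroupRuleRanges": [
                {"MinRange": 22, "MaxRange": 22}
            ],
        }
    )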

A large telecommunications company is planning to set up a data catalog and metadata management for multiple data sources running on AWS. The catalog will be used to maintain the metadata of all the objects stored in the data stores. The data stores are composed of structured sources like Amazon RDS and Amazon Redshift, and semistructured sources like JSON and XML files stored in Amazon S3. The catalog must be updated on a regular basis, be able to detect the changes to object metadata, and require the least possible administration. Which solution meets these requirements?


A) Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect and gather the metadata information from multiple sources and update the data catalog in Aurora. Schedule the Lambda functions periodically.
B) Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and update the Data Catalog with metadata changes. Schedule the crawlers periodically to update the metadata catalog.
C) Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will connect and gather the metadata information from multiple sources and update the DynamoDB catalog. Schedule the Lambda functions periodically.
D) Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema for RDS and Amazon Redshift sources and build the Data Catalog. Use AWS Glue crawlers for data stored in Amazon S3 to infer the schema and automatically update the Data Catalog.

Correct Answer
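
For context on option B, one scheduled Glue crawler definition can cover both the JDBC sources and the S3 files and refresh the Data Catalog as object metadata changes. A hedged boto3 sketch; the names, connection, and cron expression are illustrative:

    import boto3

    glue = boto3.client("glue")
    glue.create_crawler(
        Name="nightly-catalog-crawler",
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
        DatabaseName="enterprise_catalog",
        Targets={
            # Semistructured JSON/XML files in S3.
            "S3Targets": [{"Path": "s3://example-semistructured-data/"}],
            # Structured sources reached through a Glue connection.
            "JdbcTargets": [{"ConnectionName": "redshift-conn", "Path": "dev/%"}],
        },
        Schedule="cron(0 2 * * ? *)",  # run nightly at 02:00 UTC
    )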

A regional energy company collects voltage data from sensors attached to buildings. To address any known dangerous conditions, the company wants to be alerted when a sequence of two voltage drops is detected within 10 minutes of a voltage spike at the same building. It is important to ensure that all messages are delivered as quickly as possible. The system must be fully managed and highly available. The company also needs a solution that will automatically scale up as it covers additional cities with this monitoring feature. The alerting system is subscribed to an Amazon SNS topic for remediation. Which solution meets these requirements?


A) Create an Amazon Managed Streaming for Apache Kafka cluster to ingest the data, and use Apache Spark Streaming with the Apache Kafka consumer API in an automatically scaled Amazon EMR cluster to process the incoming data. Use the Spark Streaming application to detect the known event sequence and send the SNS message.
B) Create a REST-based web service using Amazon API Gateway in front of an AWS Lambda function. Create an Amazon RDS for PostgreSQL database with sufficient Provisioned IOPS (PIOPS). In the Lambda function, store incoming events in the RDS database and query the latest data to detect the known event sequence and send the SNS message.
C) Create an Amazon Kinesis Data Firehose delivery stream to capture the incoming sensor data. Use an AWS Lambda transformation function to detect the known event sequence and send the SNS message.
D) Create an Amazon Kinesis data stream to capture the incoming sensor data and create another stream for alert messages. Set up AWS Application Auto Scaling on both. Create a Kinesis Data Analytics for Java application to detect the known event sequence, and add a message to the message stream. Configure an AWS Lambda function to poll the message stream and publish to the SNS topic.

Correct Answer

A company has an application that ingests streaming data. The company needs to analyze this stream over a 5-minute timeframe to evaluate the stream for anomalies with Random Cut Forest (RCF) and summarize the current count of status codes. The source and summarized data should be persisted for future use. Which approach would enable the desired outcome while keeping data persistence costs low?


A) Ingest the data stream with Amazon Kinesis Data Streams. Have an AWS Lambda consumer evaluate the stream, collect the number of status codes, and evaluate the data against a previously trained RCF model. Persist the source and results as a time series to Amazon DynamoDB.
B) Ingest the data stream with Amazon Kinesis Data Streams. Have a Kinesis Data Analytics application evaluate the stream over a 5-minute window using the RCF function and summarize the count of status codes. Persist the source and results to Amazon S3 through output delivery to Kinesis Data Firehose.
C) Ingest the data stream with Amazon Kinesis Data Firehose with a delivery frequency of 1 minute or 1 MB into Amazon S3. Ensure Amazon S3 triggers an event to invoke an AWS Lambda consumer that evaluates the batch data, collects the number of status codes, and evaluates the data against a previously trained RCF model. Persist the source and results as a time series to Amazon DynamoDB.
D) Ingest the data stream with Amazon Kinesis Data Firehose with a delivery frequency of 5 minutes or 1 MB into Amazon S3. Have a Kinesis Data Analytics application evaluate the stream over a 1-minute window using the RCF function and summarize the count of status codes. Persist the results to Amazon S3 through a Kinesis Data Analytics output to an AWS Lambda integration.

Correct Answer
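
For context on option B, a Kinesis Data Analytics (SQL) application pairs the built-in RANDOM_CUT_FOREST function with a tumbling window. A hedged sketch of such application code, kept in a Python string for reference (it would be deployed through the kinesisanalytics create_application API); the stream and column names are illustrative:

    # Kinesis Data Analytics SQL application code (sketch only).
    APPLICATION_CODE = """
    -- Summarize the count of status codes over a 5-minute tumbling window.
    CREATE OR REPLACE STREAM "DEST_STREAM" (status_code INTEGER, status_count INTEGER);
    CREATE OR REPLACE PUMP "COUNT_PUMP" AS
      INSERT INTO "DEST_STREAM"
        SELECT STREAM "status_code", COUNT(*)
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY "status_code",
                 STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '5' MINUTE);

    -- Score records for anomalies with the built-in Random Cut Forest.
    CREATE OR REPLACE STREAM "ANOMALY_STREAM" (response_time DOUBLE, anomaly_score DOUBLE);
    CREATE OR REPLACE PUMP "RCF_PUMP" AS
      INSERT INTO "ANOMALY_STREAM"
        SELECT STREAM "response_time", "ANOMALY_SCORE"
        FROM TABLE(RANDOM_CUT_FOREST(
            CURSOR(SELECT STREAM "response_time" FROM "SOURCE_SQL_STREAM_001")));
    """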

A company leverages Amazon Athena for ad-hoc queries against data stored in Amazon S3. The company wants to implement additional controls to separate query execution and query history among users, teams, or applications running in the same AWS account to comply with internal security policies. Which solution meets these requirements?


A) Create an S3 bucket for each given use case, create an S3 bucket policy that grants permissions to appropriate individual IAM users, and apply the S3 bucket policy to the S3 bucket.
B) Create an Athena workgroup for each given use case, apply tags to the workgroup, and create an IAM policy using the tags to apply appropriate permissions to the workgroup.
C) Create an IAM role for each given use case, assign appropriate permissions to the role for the given use case, and associate the role with Athena.
D) Create an AWS Glue Data Catalog resource policy for each given use case that grants permissions to appropriate individual IAM users, and apply the resource policy to the specific tables used by Athena.

Correct Answer
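
For context on option B, a workgroup isolates query execution and history per team, and tagging it lets an IAM policy scope access with an aws:ResourceTag condition. A minimal boto3 sketch; the names and output location are illustrative:

    import boto3

    athena = boto3.client("athena")
    athena.create_work_group(
        Name="marketing-adhoc",
        Configuration={
            # Separate results location and metrics per workgroup.
            "ResultConfiguration": {
                "OutputLocation": "s3://example-athena-results/marketing/"
            },
            "PublishCloudWatchMetricsEnabled": True,
        },
        # An IAM policy can match this tag to grant query permissions on only
        # this workgroup.
        Tags=[{"Key": "team", "Value": "marketing"}],
    )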

A large company receives files from external parties in Amazon EC2 throughout the day. At the end of the day, the files are combined into a single file, compressed into a gzip file, and uploaded to Amazon S3. The total size of all the files is close to 100 GB daily. Once the files are uploaded to Amazon S3, an AWS Batch program executes a COPY command to load the files into an Amazon Redshift cluster. Which program modification will accelerate the COPY process?


A) Upload the individual files to Amazon S3 and run the COPY command as soon as the files become available.
B) Split the number of files so they are equal to a multiple of the number of slices in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
C) Split the number of files so they are equal to a multiple of the number of compute nodes in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
D) Apply sharding by breaking up the files so the distkey columns with the same values go to the same file. Gzip and upload the sharded files to Amazon S3. Run the COPY command on the files.

Correct Answer
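
For context on option B, the COPY parallelizes best when the number of input files is a multiple of the cluster's slice count (not the node count). A hypothetical helper that round-robins the combined daily file into gzipped parts:

    import gzip

    def split_and_gzip(path: str, parts: int) -> None:
        """Round-robin source lines into `parts` gzipped chunks of similar
        size so a single COPY spreads work evenly across Redshift slices."""
        outs = [gzip.open(f"{path}.part{i:03d}.gz", "wt") for i in range(parts)]
        try:
            with open(path) as src:
                for i, line in enumerate(src):
                    outs[i % parts].write(line)
        finally:
            for out in outs:
                out.close()

    # e.g., a cluster with 8 slices: 16 parts is a clean multiple.
    split_and_gzip("daily_combined.log", parts=16)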

A banking company is currently using an Amazon Redshift cluster with dense storage (DS) nodes to store sensitive data. An audit found that the cluster is unencrypted. Compliance requirements state that a database with sensitive data must be encrypted through a hardware security module (HSM) with automated key rotation. Which combination of steps is required to achieve compliance? (Choose two.)


A) Set up a trusted connection with HSM using a client and server certificate with automatic key rotation.
B) Modify the cluster with an HSM encryption option and automatic key rotation.
C) Create a new HSM-encrypted Amazon Redshift cluster and migrate the data to the new cluster.
D) Enable HSM with key rotation through the AWS CLI.
E) Enable Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) encryption in the HSM.

Correct Answer
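
For context, an existing cluster cannot be modified in place to use HSM encryption; the documented path is a new HSM-encrypted cluster plus a data migration, after which keys can be rotated. A hedged boto3 sketch that assumes the HSM client certificate and HSM configuration already exist; all identifiers are illustrative:

    import boto3

    redshift = boto3.client("redshift")
    redshift.create_cluster(
        ClusterIdentifier="secure-cluster",
        NodeType="ds2.xlarge",
        NumberOfNodes=2,
        MasterUsername="awsuser",
        MasterUserPassword="Example-Passw0rd1",
        Encrypted=True,
        # Trusted connection to the HSM, established via client/server certs.
        HsmClientCertificateIdentifier="example-hsm-client-cert",
        HsmConfigurationIdentifier="example-hsm-config",
    )
    # Once the new cluster is available and data is migrated, rotate the keys.
    redshift.rotate_encryption_key(ClusterIdentifier="secure-cluster")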

A company is streaming its high-volume billing data (100 MBps) to Amazon Kinesis Data Streams. A data analyst partitioned the data on account_id to ensure that all records belonging to an account go to the same Kinesis shard and order is maintained. While building a custom consumer using the Kinesis Java SDK, the data analyst notices that, sometimes, the messages arrive out of order for account_id. Upon further investigation, the data analyst discovers the messages that are out of order seem to be arriving from different shards for the same account_id and are seen when a stream resize runs. What is an explanation for this behavior and what is the solution?


A) There are multiple shards in a stream and order needs to be maintained in the shard. The data analyst needs to make sure there is only a single shard in the stream and no stream resize runs.
B) The hash key generation process for the records is not working correctly. The data analyst should generate an explicit hash key on the producer side so the records are directed to the appropriate shard accurately.
C) The records are not being received by Kinesis Data Streams in order. The producer should use the PutRecords API call instead of the PutRecord API call with the SequenceNumberForOrdering parameter.
D) The consumer is not processing the parent shard completely before processing the child shards after a stream resize. The data analyst should process the parent shard completely first before processing the child shards.

Correct Answer
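
For context on option D, child shards created by a resize carry a ParentShardId, and per-key order survives a reshard only if the consumer finishes each parent before reading its children (the KCL does this automatically; a custom SDK consumer must do it itself). A minimal boto3 sketch that prints the shard lineage; the stream name is illustrative:

    import boto3

    kinesis = boto3.client("kinesis")
    resp = kinesis.list_shards(StreamName="billing-stream")
    for shard in resp["Shards"]:
        # Drain a parent shard to its end before reading its children.
        print(shard["ShardId"], "parent:", shard.get("ParentShardId"))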

A manufacturing company wants to create an operational analytics dashboard to visualize metrics from equipment in near-real time. The company uses Amazon Kinesis Data Streams to stream the data to other applications. The dashboard must automatically refresh every 5 seconds. A data analytics specialist must design a solution that requires the least possible implementation effort. Which solution meets these requirements?


A) Use Amazon Kinesis Data Firehose to store the data in Amazon S3. Use Amazon QuickSight to build the dashboard.
B) Use Apache Spark Streaming on Amazon EMR to read the data in near-real time. Develop a custom application for the dashboard by using D3.js.
C) Use Amazon Kinesis Data Firehose to push the data into an Amazon Elasticsearch Service (Amazon ES) cluster. Visualize the data by using a Kibana dashboard.
D) Use AWS Glue streaming ETL to store the data in Amazon S3. Use Amazon QuickSight to build the dashboard.

Correct Answer

A media company is using Amazon QuickSight dashboards to visualize its national sales data. The dashboard is using a dataset with these fields: ID, date, time_zone, city, state, country, longitude, latitude, sales_volume, and number_of_items. To modify ongoing campaigns, the company wants an interactive and intuitive visualization of which states across the country recorded a significantly lower sales volume compared to the national average. Which addition to the company's QuickSight dashboard will meet this requirement?


A) A geospatial color-coded chart of sales volume data across the country.
B) A pivot table of sales volume data summed up at the state level.
C) A drill-down layer for state-level sales volume data.
D) A drill through to other dashboards containing state-level sales volume data.

Correct Answer

A company uses the Amazon Kinesis SDK to write data to Kinesis Data Streams. Compliance requirements state that the data must be encrypted at rest using a key that can be rotated. The company wants to meet this encryption requirement with minimal coding effort. How can these requirements be met?


A) Create a customer master key (CMK) in AWS KMS. Assign the CMK an alias. Use the AWS Encryption SDK, providing it with the key alias to encrypt and decrypt the data.
B) Create a customer master key (CMK) in AWS KMS. Assign the CMK an alias. Enable server-side encryption on the Kinesis data stream using the CMK alias as the KMS master key.
C) Create a customer master key (CMK) in AWS KMS. Create an AWS Lambda function to encrypt and decrypt the data. Set the KMS key ID in the function's environment variables.
D) Enable server-side encryption on the Kinesis data stream using the default KMS key for Kinesis Data Streams.

Correct Answer
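
For context on option B, this is three KMS calls plus one Kinesis call, with key rotation handled by KMS rather than by application code. A minimal boto3 sketch; the alias and stream name are illustrative:

    import boto3

    kms = boto3.client("kms")
    kinesis = boto3.client("kinesis")

    # Create a CMK, enable automatic rotation, and give it an alias.
    key_id = kms.create_key(Description="Kinesis SSE key")["KeyMetadata"]["KeyId"]
    kms.enable_key_rotation(KeyId=key_id)
    kms.create_alias(AliasName="alias/kinesis-sse", TargetKeyId=key_id)

    # Server-side encryption: Kinesis encrypts records at rest with the CMK,
    # so producers using the SDK need no code changes.
    kinesis.start_stream_encryption(
        StreamName="billing-stream",
        EncryptionType="KMS",
        KeyId="alias/kinesis-sse",
    )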
