Snowflake SnowPro Advanced Data Scientist Exam - Questions and Answers

Question 1

Which characteristic applies to Snowpark Python stored procedures?

A. They must return a value.
B. They need to have a session as an input.
C. They have to contain a process method.
D. They cannot contain Snowpark API calls.

Answer : B

Question 2

Which statement will add a column based on the following condition:
A Data Scientist wants to change Product 1 to P1, Product 1A to P1A, and Product 1B to P1B. If none of these conditions is met, the value should be Not P1.

A.
B.
C.
D.

Answer : D
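
The requested logic is a multi-branch mapping with a default, which in SQL is typically written as a CASE expression ending in ELSE 'Not P1'. The sketch below mirrors that logic in plain Python. The function name map_product and the lookup table are hypothetical, and exact string matching on the product name is an assumption.

```python
from typing import Optional

# Hypothetical lookup mirroring the three WHEN branches described in the question.
PRODUCT_CODES = {
    "Product 1": "P1",
    "Product 1A": "P1A",
    "Product 1B": "P1B",
}


def map_product(name: Optional[str]) -> str:
    """Return the short code for a product name, or 'Not P1' if no branch matches.

    Like CASE ... ELSE 'Not P1' END, any value that matches none of the
    branches (including a missing value, None) falls through to the default.
    """
    return PRODUCT_CODES.get(name, "Not P1")


print([map_product(p) for p in ["Product 1", "Product 1B", "Product 2"]])
```

The default branch matters: without it, unmatched rows would end up NULL instead of Not P1.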

Question 3

For which Snowflake Cortex LLM functions would both input and output tokens be counted? (Choose three.)

A. SUMMARIZE
B. SENTIMENT
C. TRANSLATE
D. EMBED_TEXT_768
E. EMBED_TEXT_1024
F. CLASSIFY_TEXT

Answer : ACF
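
Per the answer above, the functions that generate new text (SUMMARIZE, TRANSLATE, CLASSIFY_TEXT) count both input and output tokens, while SENTIMENT and the EMBED_TEXT functions count input tokens only. The sketch below encodes that rule. The helper billable_tokens is hypothetical and only counts tokens; it does not compute credits.

```python
# Functions whose billable tokens include both the input and the generated output.
INPUT_AND_OUTPUT = {"SUMMARIZE", "TRANSLATE", "CLASSIFY_TEXT"}
# Functions whose billable tokens are the input tokens only.
INPUT_ONLY = {"SENTIMENT", "EMBED_TEXT_768", "EMBED_TEXT_1024"}


def billable_tokens(function_name: str, input_tokens: int, output_tokens: int) -> int:
    """Return the number of tokens counted for one call (hypothetical helper)."""
    name = function_name.upper()
    if name in INPUT_AND_OUTPUT:
        return input_tokens + output_tokens
    if name in INPUT_ONLY:
        return input_tokens
    raise ValueError(f"function not covered by this sketch: {function_name}")
```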

Question 4

A company’s platform team wants to integrate their existing data lake with Snowflake. The data lake is hundreds of TBs in size and the team does not want to duplicate most of the data into Snowflake. A Data Scientist at the company wants to be able to query and access the data lake’s metadata. There is already an external stage in Snowflake referencing the data lake’s location.
What is the MOST efficient way to integrate the existing data lake into the Snowflake environment?

A. Move the data lake files to an internal stage in Snowflake to allow for access to the metadata and data from the data lake.
B. Using the existing external stage, create SELECT statements that can be run on-demand, referencing the files in the data lake directly.
C. Create a PIPE for each data lake table which will allow for on-demand querying of the data lake files by leveraging the Snowpipe service.
D. Create an external table for each corresponding data lake table to enable querying data stored in files in the data lake as if the data lake table was inside a database.

Answer : D

Question 5

This chart in Snowsight for a New York City ride-share bike service shows the number of trips taken to each destination borough:

A Data Scientist wants to build a classifier that predicts which borough will be the most likely destination when a trip is initiated.
Which techniques should be used to handle the class imbalance depicted in the Snowsight chart? (Choose two.)

A. Introduce regularization parameters.
B. Collapse all minority classes into a single class.
C. Undersample the Manhattan borough for training.
D. Utilize bootstrapping to synthesize additional data for the non-Manhattan boroughs.
E. Utilize Synthetic Minority Oversampling Technique (SMOTE) to oversample the non-Manhattan boroughs for training.

Answer : CE
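
A minimal sketch of the two chosen techniques in plain Python, assuming each trip is a numeric feature vector paired with its destination borough (the features and row layout are hypothetical). Random undersampling trims the Manhattan class; the SMOTE function is a bare-bones version of the technique that creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. Both should be applied to the training split only. Libraries such as imbalanced-learn ship a complete SMOTE class for real use.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[Tuple[float, ...], str]  # (feature vector, destination borough)


def undersample_majority(rows: Sequence[Row], majority: str, seed: int = 0) -> List[Row]:
    """Randomly keep only as many majority-class rows as the largest minority class has."""
    rng = random.Random(seed)
    majority_rows = [r for r in rows if r[1] == majority]
    minority_rows = [r for r in rows if r[1] != majority]
    target = max(Counter(label for _, label in minority_rows).values())
    return minority_rows + rng.sample(majority_rows, min(target, len(majority_rows)))


def smote(samples: Sequence[Tuple[float, ...]], n_new: int, k: int = 2,
          seed: int = 0) -> List[Tuple[float, ...]]:
    """Minimal SMOTE for one minority class: each synthetic point lies on the
    segment between a random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Rank the other samples of the same class by squared Euclidean distance.
        others = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, samples[j])),
        )
        neighbour = samples[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```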

Question 6

A remote weather sensor malfunctions and produces temperature readings higher than its normal range of around 69.8°F (21°C).
Ignoring units, what is the correct order of magnitude of the mean, median, and skewness of these readings?

A. Mean > Median > Skewness > 0
(A right skew and a mean greater than the median)
B. Median > Mean > Skewness > 0
(A right skew and a median greater than the mean)
C. Mean > Median > 0 > Skewness
(A left skew and a mean greater than the median)
D. Median > Mean > 0 > Skewness
(A left skew and a median greater than the mean)

Answer : A
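
A quick numeric check, using illustrative readings in °C with two hypothetical malfunction spikes (the values are made up). The high outliers pull the mean above the median and make the skewness positive; because skewness is unitless and, for data like this, a small number, it ends up below both.

```python
import statistics
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Population (Fisher-Pearson) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative readings: mostly near 21 °C, plus two hypothetical malfunction spikes.
readings = [20.0, 21.0, 21.0, 21.0, 22.0, 21.0, 20.0, 22.0, 35.0, 40.0]

mean = statistics.mean(readings)
median = statistics.median(readings)
skew = skewness(readings)
print(f"mean={mean:.2f} median={median:.2f} skewness={skew:.2f}")
```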

Question 7

Which step of the machine learning lifecycle does hyperparameter tuning fall under?

A. Model training
B. Model deployment
C. Model validation
D. Feature engineering

Answer : A
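
A minimal grid-search sketch showing why tuning sits in the training step: each hyperparameter candidate triggers a fresh fit, and the candidates are compared on held-out validation data. The one-dimensional ridge model, the alpha grid, and the data are illustrative assumptions, not anything specified by the question.

```python
from typing import Optional, Sequence, Tuple

Data = Tuple[Sequence[float], Sequence[float]]  # (x values, y values)


def fit_ridge_1d(data: Data, alpha: float) -> float:
    """Train: closed-form ridge slope for y ~ w * x (no intercept)."""
    xs, ys = data
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(w: float, data: Data) -> float:
    """Mean squared error of the slope w on the given data."""
    xs, ys = data
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train: Data, valid: Data,
                alphas: Sequence[float]) -> Tuple[Optional[float], float]:
    """Try each regularisation strength and keep the one with the lowest validation error."""
    best_alpha: Optional[float] = None
    best_loss = float("inf")
    for alpha in alphas:
        w = fit_ridge_1d(train, alpha)  # fit model parameters for this hyperparameter
        loss = mse(w, valid)            # score the candidate on held-out data
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    return best_alpha, best_loss
```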

Question 8

A Data Scientist wants to train a supervised machine learning model on a data set containing multiple numeric continuous features. During the exploration phase, the Data Scientist observed that the features are roughly normally distributed with mean and variance being different between features. The Data Scientist built the pipeline leveraging Snowpark ML.
Which class from snowflake.ml.modeling.preprocessing should the Data Scientist use to obtain features with mean zero and unit variance?

A. StandardScaler
B. MinMaxScaler
C. MaxAbsScaler
D. Normalizer

Answer : A
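
For a single column, standardization subtracts the mean and divides by the standard deviation, which is the per-column transformation StandardScaler applies. The plain-Python sketch below (the function name standard_scale is hypothetical) uses the population standard deviation and maps a constant column to zeros.

```python
import statistics
from typing import List, Sequence


def standard_scale(values: Sequence[float]) -> List[float]:
    """Shift to mean 0 and scale to unit variance using the population std (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread to scale; every value equals the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

MinMaxScaler and MaxAbsScaler, by contrast, rescale to a fixed range rather than to zero mean and unit variance.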

Question 9

A binary JAR file has been produced to score data within Snowflake.
What steps are necessary to get the scoring code functioning in Snowflake? (Choose two.)

A. Create and call a stored procedure.
B. Load the JAR file into a Snowflake stage.
C. Load the JAR file into a table as a VARIANT column.
D. Use an external function to call the JAR file.
E. Create and call a User-Defined Function (UDF).

Answer : BE

Question 10

A Data Scientist needs to build a data set using columns in multiple tables and keep it automatically updated in an incremental fashion.
How can this be done without writing INSERT statements or checking the required tables for changes?

A. Materialized views
B. Views
C. Streams and tasks
D. Dynamic tables

Answer : D

Question 11

A Data Scientist is developing a real-time detection model for a call center. The data is the audio transcript of the live calls between customers and agents.
The model needs to identify whether a call is abnormal so the system can alert the supervisor immediately. Only a negligible percentage of calls have been reviewed and flagged.
Which method should be used FIRST to separate abnormal calls?

A. Audio signal processing
B. Unsupervised learning: Clustering
C. Supervised learning: Customer segmentation
D. Supervised learning: Call agent segmentation

Answer : B
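
With almost no labels, clustering can separate calls without supervision. A minimal sketch, assuming each call has already been turned into two numeric, scaled features (hypothetical here): plain k-means with deterministic farthest-first initialisation, then treating points in very small clusters as abnormal. The small-cluster rule and the 0.2 threshold are illustrative assumptions; a distance-to-centroid threshold is another common choice.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # two hypothetical, already-scaled per-call features


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic initialisation: start at the first point, then repeatedly
    add the point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; returns the cluster index of each point."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


def flag_small_clusters(labels: List[int], max_fraction: float = 0.2) -> List[int]:
    """Indices of points whose cluster holds less than max_fraction of all points."""
    n = len(labels)
    sizes = {lab: labels.count(lab) for lab in set(labels)}
    return [i for i, lab in enumerate(labels) if sizes[lab] / n < max_fraction]


calls = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),  # typical calls, group 1
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.0, 5.1),  # typical calls, group 2
    (20.0, 1.0),                                     # unusual call
]
print(flag_small_clusters(kmeans(calls, k=3)))  # [8]
```

The flagged calls can then be reviewed and labelled, which later makes a supervised model possible.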

Question 12

A Data Scientist is using the snowflake.cortex.complete function to generate a response for a company’s knowledge base model.
What parameters are required? (Choose two.)

A. deadline
B. model
C. prompt
D. session
E. options

Answer : BC

Question 13

A Data Scientist passes a SQL NULL as a STRING argument to a Python User-Defined Function (UDF).
What Python value will the argument be translated to?

A. 1
B. False
C. None
D. An empty string

Answer : C
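
Because a SQL NULL reaches the handler as None rather than as an empty string, a Python UDF handler for a STRING column should check for None before calling string methods. A minimal sketch of such a handler; the name clean_name and its trimming/upper-casing behaviour are hypothetical.

```python
from typing import Optional


def clean_name(value: Optional[str]) -> Optional[str]:
    """Hypothetical Python UDF handler for a STRING argument.

    A SQL NULL argument arrives as None, so it is handled explicitly;
    calling value.strip() on None would raise AttributeError.
    """
    if value is None:
        return None  # returning None yields SQL NULL
    return value.strip().upper()
```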

Question 14

A Data Scientist is building a data pipeline for a customer churn model. To enable efficient processing of the model, they add a stream to the customer table.
Which function should be used to check if the stream has new or updated data?

A. SYSTEM$STREAM_STATUS('MYSTREAM')
B. SYSTEM$STREAM_HAS_DATA('MYSTREAM')
C. SYSTEM$STREAM_GET_UPDATES('MYSTREAM')
D. SYSTEM$STREAM_GET_TABLE_TIMESTAMP('MYSTREAM')

Answer : B

Question 15

This correlation matrix was created when performing feature engineering:

Which combination of variables is the MOST correlated and could possibly help with feature reduction?

A. DIS and CRIM
B. DIS and TAX
C. RAD and TAX
D. INDUS and CRIM

Answer : C
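
A minimal sketch of scanning a correlation matrix for the most strongly correlated pair of features. Only the column names are borrowed from the question; the values are made-up toy numbers, not the data behind the exam's matrix. Each pair is checked once, which corresponds to reading only the upper triangle of the matrix.

```python
import itertools
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def most_correlated_pair(columns: Dict[str, List[float]]) -> Tuple[str, str, float]:
    """Return the pair of columns with the largest absolute correlation, and its r."""
    a, b = max(
        itertools.combinations(columns, 2),
        key=lambda pair: abs(pearson(columns[pair[0]], columns[pair[1]])),
    )
    return a, b, pearson(columns[a], columns[b])


# Toy values for illustration only.
columns = {
    "CRIM": [0.1, 0.5, 9.0, 0.3, 4.0],
    "RAD": [1.0, 24.0, 2.0, 24.0, 3.0],
    "TAX": [296.0, 666.0, 311.0, 664.0, 330.0],
}
print(most_correlated_pair(columns))
```

When two features are this strongly correlated they carry largely redundant information, so dropping one of them is a simple form of feature reduction.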

SnowPro Advanced Data Scientist DSA-C03 v1.0
