an HCL GUVI product

Data Science Multiple Choice Questions (MCQs) and Answers

Master Data Science with Practice MCQs. Explore our curated collection of Multiple Choice Questions. Ideal for placement and interview preparation, our questions range from basic to advanced, ensuring comprehensive coverage of Data Science concepts. Begin your placement preparation journey now!

Q121 A pie chart in Matplotlib displays incorrect proportions. What could be the issue?

A. Wrong data labels
B. Missing data
C. Incorrect sum of values
D. Invalid chart type
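Matplotlib's `pie` draws each wedge as its value divided by the sum of all values, so anything that distorts that sum distorts every slice. The pure-Python sketch below mirrors that arithmetic. The sales figures are hypothetical, and the extra "total" entry is an assumed example of how a wrong sum can creep in.

```python
# Fraction of the circle each wedge gets: value / sum(values).
def wedge_shares(values):
    """Return each value's fraction of the total, as a pie chart draws it."""
    total = sum(values)
    return [v / total for v in values]

# Hypothetical regional sales.
sales = {"North": 30, "South": 50, "East": 20}
correct = wedge_shares(list(sales.values()))

# Mistake: a precomputed total row is left in the data, doubling the denominator
# and adding a spurious wedge that takes half the circle.
with_total = wedge_shares(list(sales.values()) + [sum(sales.values())])

print([round(s, 2) for s in correct])     # [0.3, 0.5, 0.2]
print([round(s, 2) for s in with_total])  # [0.15, 0.25, 0.1, 0.5]
```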

Q122 A scatter plot shows overlapping points, making it hard to interpret. What can improve its readability?

A. Increase marker size
B. Add jitter
C. Use smaller axes
D. Change chart type
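Jitter adds a small random offset to every point so that points sharing the same coordinates no longer sit exactly on top of each other. A minimal NumPy sketch, assuming hypothetical integer-valued data (such as 1 to 5 ratings); the noise width of 0.15 is an arbitrary choice kept below half the gap between neighbouring values. Transparency (`alpha`) and smaller markers are common companions to jitter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings on a 1-5 scale: only 25 distinct (x, y) positions
# for 500 points, so most points are hidden behind others.
x = rng.integers(1, 6, size=500)
y = rng.integers(1, 6, size=500)

# Add uniform noise in [-0.15, 0.15) to each coordinate.
jitter = 0.15
x_j = x + rng.uniform(-jitter, jitter, size=x.size)
y_j = y + rng.uniform(-jitter, jitter, size=y.size)

# Plotting step (requires Matplotlib); alpha makes dense regions look darker:
# import matplotlib.pyplot as plt
# plt.scatter(x_j, y_j, s=10, alpha=0.3)
# plt.show()
```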

Q123 A line chart is difficult to interpret due to too many data points. What is the best approach to simplify it?

A. Aggregate data
B. Remove the chart
C. Use larger axes
D. Switch to bar chart
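Aggregating means replacing many fine-grained points with one summary value per interval, for example a weekly mean instead of daily readings. A minimal pandas sketch with hypothetical daily data; the weekly frequency and the 7-day rolling window are illustrative choices, not fixed rules.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings for four weeks (2024-01-01 is a Monday).
idx = pd.date_range("2024-01-01", periods=28, freq="D")
daily = pd.Series(np.arange(28, dtype=float), index=idx)

# Aggregate to weekly means: 28 points become 4.
# "W" bins end on Sunday and are labelled by that Sunday.
weekly = daily.resample("W").mean()
print(weekly)

# Alternative: keep daily resolution but smooth it with a 7-day rolling mean
# (the first 6 values are NaN because the window is not yet full).
smoothed = daily.rolling(window=7).mean()
```

Plotting `weekly` instead of `daily` (for example with `weekly.plot()`) gives a line with far fewer points to read.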

Q124 What is the primary objective of machine learning?

A. To clean data
B. To make predictions based on data
C. To create databases
D. To improve hardware

Q125 Which of the following is a supervised learning algorithm?

A. K-Means
B. Decision Trees
C. DBSCAN
D. Principal Component Analysis

Q126 What is overfitting in machine learning?

A. Model performs poorly on training data
B. Model performs well on training data but poorly on new data
C. Model is too simple
D. Model has no bias

Q127 What is the purpose of a loss function in machine learning?

A. To evaluate model predictions
B. To split datasets
C. To improve visualization
D. To standardize data
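A loss function turns the gap between predictions and true targets into a single number, which training then tries to minimise. As one illustration, the sketch below computes binary cross-entropy (log loss) by hand on hypothetical predicted probabilities and checks it against scikit-learn's `log_loss`. Other tasks use other losses, such as mean squared error for regression.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical binary labels and predicted probabilities of class 1.
y_true = np.array([1, 0, 1, 1, 0])
p_good = np.array([0.9, 0.1, 0.8, 0.7, 0.2])  # confident and mostly right
p_bad = np.array([0.4, 0.6, 0.3, 0.5, 0.7])   # leans the wrong way

def binary_cross_entropy(y, p):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; lower means better predictions."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

loss_good = binary_cross_entropy(y_true, p_good)
loss_bad = binary_cross_entropy(y_true, p_bad)
print(loss_good, loss_bad)  # the better predictions get the smaller loss

# scikit-learn computes the same quantity.
print(log_loss(y_true, p_good))
```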

Q128 Why is it important to split data into training and testing datasets?

A. To increase dataset size
B. To evaluate model performance on unseen data
C. To clean data
D. To preprocess features

Q129 Which Python library provides the train_test_split function?

A. NumPy
B. Pandas
C. scikit-learn
D. Matplotlib
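`train_test_split` lives in `sklearn.model_selection`. A minimal sketch on hypothetical data, holding out 20% of the rows as unseen test data; `stratify=y` keeps the class ratio the same in both parts and `random_state` makes the split reproducible. Both are optional settings chosen here for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 3 features, balanced binary labels.
X = np.arange(300).reshape(100, 3)
y = np.array([0, 1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```

The model is then fitted on `X_train, y_train` only, and `X_test, y_test` are used once at the end to estimate performance on data it has never seen.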

Q130 How do you train a linear regression model using scikit-learn?

A. model.fit(X, y)
B. model.train(X, y)
C. model.learn(X, y)
D. model.predict(X, y)
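scikit-learn estimators are trained with `fit(X, y)` and used with `predict(X)`. There is no `train` or `learn` method, and `predict` takes only the features. A minimal sketch on hypothetical, noise-free data following y = 2x + 1; the same `fit` call trains other estimators, such as `DecisionTreeClassifier`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data following y = 2x + 1.
# X must be 2-D: shape (n_samples, n_features).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2 * X.ravel() + 1

model = LinearRegression()
model.fit(X, y)                        # learns coef_ and intercept_
print(model.coef_, model.intercept_)   # approximately [2.] and 1.0
print(model.predict([[5.0]]))          # approximately [11.]
```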

Q131 Which scikit-learn function is used to calculate the accuracy of a classification model?

A. classification_report
B. accuracy_score
C. score
D. confusion_matrix
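Accuracy is the fraction of predictions that match the true labels; `sklearn.metrics.accuracy_score` computes it directly. A small sketch with hypothetical labels, shown next to `confusion_matrix`, which breaks the same predictions down by class. (A fitted classifier's `.score(X, y)` method also returns mean accuracy, but it is an estimator method rather than a standalone function.)

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)

print(acc)  # 0.8 (4 of 5 correct)
print(cm)   # rows: true class, columns: predicted class
```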

Q132 A model's predictions have high bias. What could be the likely issue?

A. Overfitting
B. Underfitting
C. Feature scaling
D. Incorrect testing data
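High bias means the model is too simple to capture the real relationship, so it scores poorly even on the data it was trained on. A minimal sketch, assuming hypothetical data with a quadratic relationship: a straight line underfits it, while adding an x² feature (one possible fix) gives the model enough capacity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: y = x^2 on points symmetric around x = 0.
x = np.linspace(-3, 3, 61).reshape(-1, 1)
y = x.ravel() ** 2

# A straight line cannot bend to follow the curve: high bias (underfitting).
linear = LinearRegression().fit(x, y)

# Adding polynomial features up to degree 2 lets the same linear model fit the curve.
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

print(linear.score(x, y))     # R^2 near 0, even on the training data
print(quadratic.score(x, y))  # R^2 near 1
```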

Q133 A classification model achieves 99% accuracy on the training set but only 60% on the test set. What is the issue?

A. Overfitting
B. Underfitting
C. Data imbalance
D. Feature scaling
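A large gap between training and test accuracy is the usual signature of overfitting: the model has memorised the training data, noise included. A minimal sketch on hypothetical synthetic data with deliberately noisy labels (`flip_y=0.2`). An unrestricted decision tree reaches perfect training accuracy; limiting `max_depth` is shown as one common way to rein it in, though the exact scores depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical noisy data: flip_y randomly reassigns about 20% of the labels.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unrestricted tree keeps splitting until every training point is classified.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A depth limit reduces the model's capacity to memorise noise.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, m in [("deep", deep), ("shallow", shallow)]:
    print(name, "train:", m.score(X_train, y_train), "test:", m.score(X_test, y_test))
```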

Q134 After training a regression model, the residuals show a clear pattern. What does this imply?

A. Model is accurate
B. Model assumptions are violated
C. Feature scaling is wrong
D. Data is balanced

Q135 What is the key difference between supervised and unsupervised learning?

A. Supervised uses labeled data, unsupervised does not
B. Both use labeled data
C. Both use unlabeled data
D. Unsupervised requires labels
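In scikit-learn the difference shows up directly in the `fit` call: a supervised estimator is fitted on features and labels, while an unsupervised one such as `KMeans` is fitted on features alone and discovers groups itself. A minimal sketch on hypothetical one-dimensional data; logistic regression and k-means are just representative choices of each kind.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-D data with two well-separated groups.
X = np.array([[1.0], [1.2], [0.8], [8.0], [8.3], [7.9]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels: only the supervised model sees these

# Supervised: learn a mapping from X to the known labels y.
clf = LogisticRegression().fit(X, y)

# Unsupervised: no y at all; KMeans groups points by distance.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(clf.predict([[1.1], [8.1]]))  # predicted class labels
print(km.labels_)                   # cluster ids; their numbering is arbitrary
```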

Q136 Which of the following is an example of a supervised learning algorithm?

A. K-Means
B. Linear Regression
C. Hierarchical Clustering
D. PCA

Q137 Which task is best suited for unsupervised learning?

A. Predicting house prices
B. Identifying customer segments
C. Spam classification
D. Stock price prediction

Q138 What metric is commonly used to evaluate a regression model in supervised learning?

A. Accuracy
B. Mean Squared Error (MSE)
C. Precision
D. Silhouette score
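Mean squared error is the average of the squared differences between true and predicted values, so larger errors are penalised disproportionately. Accuracy and precision apply to classification, and the silhouette score to clustering. A small sketch with hypothetical values, computing MSE by hand and with `sklearn.metrics.mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse_manual = np.mean((y_true - y_pred) ** 2)  # (0.25 + 0.25 + 0 + 1) / 4
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)  # 0.375 0.375
```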

Q139 Why is clustering considered an unsupervised learning technique?

A. It requires labeled data
B. It uses supervised models
C. It finds patterns in unlabeled data
D. It predicts outcomes

Q140 Which Python library provides the KMeans class for clustering?

A. NumPy
B. Pandas
C. scikit-learn
D. Matplotlib

Q141 How do you fit a decision tree classifier in scikit-learn?

A. model.train(X, y)
B. model.fit(X, y)
C. model.learn(X, y)
D. model.split(X, y)

Q142 Which function in scikit-learn is used to calculate the silhouette score for a clustering model?

A. silhouette_score()
B. cluster_score()
C. clustering_score()
D. silhouette_metric()

Q143 How do you specify the number of clusters in the KMeans algorithm using scikit-learn?

A. KMeans(n_clusters=n)
B. KMeans(clusters=n)
C. KMeans(n=n)
D. KMeans(n_cluster=n)
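The number of clusters is passed as `n_clusters`, and `sklearn.metrics.silhouette_score` rates how compact and well separated the resulting clusters are (values near 1 are better). A minimal sketch on hypothetical, clearly separated blobs that tries several values of k, picks the one with the highest silhouette score, and counts points per cluster with `np.bincount`, which is a quick way to spot a suspiciously small cluster. Using silhouette to choose k is one heuristic among several.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical data: three compact, well-separated blobs of 100 points each.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)

# Try several k values; n_init and random_state make each run repeatable.
scores = {}
for k in range(2, 6):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels_k)

best_k = max(scores, key=scores.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

print(scores)
print(best_k, np.bincount(labels))  # chosen k and points per cluster
```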

Q144 A supervised model performs poorly on unseen data. What is the likely issue?

A. Data leakage
B. Underfitting
C. Incorrect loss function
D. Missing labels

Q145 A clustering model produces inconsistent results. What could be the likely cause?

A. Wrong feature scaling
B. Labeled data
C. High accuracy
D. Balanced dataset
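Distance-based clustering such as KMeans is sensitive to feature scale: a feature measured in tens of thousands swamps one measured in tens. Random centroid initialisation is another source of run-to-run differences. A minimal sketch, assuming hypothetical income and age features, that standardises the features inside a pipeline and fixes `random_state` so repeated runs agree.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical customer features on very different scales.
income = rng.normal(50_000, 15_000, size=200)  # tens of thousands
age = rng.normal(40, 12, size=200)             # tens
X = np.column_stack([income, age])

# StandardScaler gives each feature mean 0 and unit variance, so age is not
# drowned out by income in the Euclidean distances KMeans uses.
# A fixed random_state makes centroid initialisation repeatable.
model = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))

labels_a = model.fit_predict(X)
labels_b = model.fit_predict(X)
print(np.array_equal(labels_a, labels_b))  # True: same data, same seed
```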

Q146 After applying KMeans, one cluster has very few data points. What should you consider next?

A. Increase cluster count
B. Decrease cluster count
C. Visualize clusters
D. Change the algorithm

Q147 What is the primary goal of feature engineering in machine learning?

A. Improve model interpretability
B. Reduce dataset size
C. Enhance model performance
D. Avoid overfitting

Q148 Which technique is commonly used to handle categorical data in feature engineering?

A. Normalization
B. One-hot encoding
C. PCA
D. Standardization
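One-hot encoding replaces a categorical column with one 0/1 column per category, so the model does not read a false order into arbitrary codes (as it might with red=0, green=1, blue=2). A minimal sketch using pandas `get_dummies` on hypothetical data; inside scikit-learn pipelines, `OneHotEncoder` plays the same role.

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"], "size": [1, 2, 3, 4]})

# One new column per category; dtype=int gives 0/1 instead of True/False.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```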

Q149 Why is feature scaling important in machine learning?

A. Reduces model size
B. Improves convergence during training
C. Handles missing values
D. Reduces overfitting
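Gradient-based optimisers converge more smoothly when features share a similar range, and distance-based methods treat features more evenly. A minimal sketch with hypothetical house features: the scaler is fitted on the training data only and then reused on the test data, so no test-set statistics leak into preprocessing. Standardisation and min-max scaling are shown as two common options.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features: [floor area in square metres, number of rooms].
X_train = np.array([[50.0, 1], [80.0, 2], [120.0, 3], [200.0, 4]])
X_test = np.array([[100.0, 2]])

# Standardisation: learn each column's mean and std from the training data,
# then apply that same transform to the test data.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)  # each column: mean 0, std 1
X_test_std = scaler.transform(X_test)

# Min-max scaling maps each training column to the range [0, 1] instead.
X_train_mm = MinMaxScaler().fit_transform(X_train)

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
print(X_train_mm.min(axis=0), X_train_mm.max(axis=0))
```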

Q150 What is feature selection?

A. Adding new features
B. Choosing the best features
C. Removing outliers
D. Scaling data
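Feature selection keeps a subset of the existing features instead of creating new ones. One simple filter-style approach in scikit-learn is `SelectKBest`, which ranks features by a statistical score against the target. A minimal sketch on the bundled Iris dataset using the ANOVA F-score and k=2; wrapper methods (such as recursive feature elimination) and model-based methods are common alternatives.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

data = load_iris()
X, y, names = data.data, data.target, data.feature_names

# Keep the 2 features with the highest ANOVA F-score against the class label.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

kept = [name for name, keep in zip(names, selector.get_support()) if keep]
print(X.shape, "->", X_selected.shape)  # (150, 4) -> (150, 2)
print(kept)
```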