Data Science Multiple Choice Questions (MCQs) and Answers

Master Data Science with Practice MCQs. Explore our curated collection of Multiple Choice Questions. Ideal for placement and interview preparation, our questions range from basic to advanced, ensuring comprehensive coverage of Data Science concepts. Begin your placement preparation journey now!

Q91 What is the primary goal of Exploratory Data Analysis?

A

Predict outcomes

B

Summarize data characteristics

C

Visualize predictions

D

Build models

Q92 Which of the following is a common technique used during EDA?

A

Clustering

B

PCA

C

Descriptive statistics

D

Feature selection

Q93 What is the significance of identifying skewness in data during EDA?

A

It helps in feature scaling

B

It determines model type

C

It affects data distribution assumptions

D

It improves visualization

Q94 Which visualization is best suited for analyzing the relationship between two numerical variables?

A

Histogram

B

Boxplot

C

Scatter plot

D

Bar chart

Q95 Why is it critical to detect multicollinearity during EDA?

A

It improves model accuracy

B

It ensures independence among predictors

C

It removes missing values

D

It selects important features

Q96 Which Python library is used for creating basic visualizations such as line and bar charts?

A

NumPy

B

Pandas

C

Matplotlib

D

Seaborn

Q97 How do you compute the correlation matrix for a DataFrame in Python?

A

df.corr()

B

df.describe()

C

df.cov()

D

df.plot()
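
As an illustration of Q97, here is a minimal sketch of computing a correlation matrix with pandas. The DataFrame, its column names, and its values are made up for this example.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
    "absences": [9, 7, 6, 4, 2],
})

# Pairwise Pearson correlation between all numeric columns.
corr_matrix = df.corr()
print(corr_matrix.round(2))
```

Each cell holds the correlation between the row and column variables, so the diagonal is always 1.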

Q98 Which visualization technique is useful for identifying clusters in data during EDA?

A

Scatter plot

B

Heatmap

C

Boxplot

D

Pairplot

Q99 If a dataset contains missing values in a column, what is the simplest way to visualize its impact?

A

Use a scatter plot

B

Use a heatmap

C

Drop the column

D

Fill missing values
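
To make the idea in Q99 concrete, the sketch below builds the boolean "is missing" mask that a missing-value heatmap would display. The data is hypothetical, and the plotting step is left out so that the example depends only on pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with gaps in several columns.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50000, 62000, np.nan, np.nan],
    "city": ["Chennai", "Pune", None, "Delhi"],
})

# True where a value is missing; this mask is what a heatmap would colour.
missing_mask = df.isna()

# Count of missing values per column, a quick numeric summary of the same mask.
missing_per_column = missing_mask.sum()
print(missing_per_column)
```

Passing `missing_mask` to a heatmap function (for example `seaborn.heatmap`, if Seaborn is installed) would show at a glance which rows and columns are affected.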

Q100 A dataset shows a perfect correlation of +1 between two variables. What is the likely issue?

A

Multicollinearity

B

Outliers

C

No issue

D

Wrong visualization
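
A small sketch of the situation in Q100, assuming one column is an exact linear function of another. The temperature columns are hypothetical.

```python
import pandas as pd

# Hypothetical example: the same temperature recorded in two units.
df = pd.DataFrame({"temp_c": [0.0, 10.0, 20.0, 30.0, 40.0]})
df["temp_f"] = df["temp_c"] * 9 / 5 + 32  # exact linear function of temp_c

# Pearson correlation between the two columns.
r = df["temp_c"].corr(df["temp_f"])
print(round(r, 6))
```

Because `temp_f` adds no information beyond `temp_c`, using both as predictors in a linear model would make their coefficients unstable, which is why one of the two is usually dropped.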

Q101 During EDA, an outlier is identified in a boxplot. What is the best course of action?

A

Remove the outlier

B

Keep the outlier

C

Investigate the outlier

D

Ignore the outlier
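
Boxplots conventionally mark points beyond 1.5 times the interquartile range (IQR) from the quartiles. The sketch below applies that rule to made-up values so the flagged points can be inspected before deciding what to do with them.

```python
import numpy as np

# Hypothetical measurements with one suspicious value.
values = np.array([10, 12, 11, 13, 12, 11, 95])

# Quartiles and interquartile range.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# The conventional 1.5 * IQR fences used by boxplots.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are candidates for investigation, not automatic removal.
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers)
```

Whether a flagged value is a data-entry error or a genuine extreme observation has to be decided from domain knowledge.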

Q102 What is the primary purpose of hypothesis testing in statistics?

A

To clean data

B

To test assumptions

C

To visualize trends

D

To encode features

Q103 Which statistical measure represents the spread of data values around the mean?

A

Variance

B

Mean

C

Median

D

Skewness
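
As a worked example of measuring spread around the mean, the sketch below uses Python's standard `statistics` module on a small made-up sample. It uses the population formulas (dividing by n) and also shows that the standard deviation is the square root of the variance.

```python
import statistics

# Hypothetical sample chosen so the results are round numbers.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
# Population variance: average squared distance from the mean.
variance = statistics.pvariance(data)
# Population standard deviation: square root of the variance, in the data's own units.
std_dev = statistics.pstdev(data)

print(mean, variance, std_dev)
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.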

Q104 At the conventional 5% significance level, when is a p-value considered statistically significant?

A

When p > 0.05

B

When p < 0.05

C

When p = 0.1

D

When p > 1

Q105 What does the standard deviation indicate in a dataset?

A

The central tendency

B

The variability

C

The skewness

D

The correlation

Q106 What type of statistical analysis helps identify relationships between variables?

A

Correlation analysis

B

Variance analysis

C

Skewness analysis

D

Descriptive statistics

Q107 What assumption is made about data in a parametric statistical test?

A

Data is categorical

B

Data follows a normal distribution

C

Data has no missing values

D

Data is continuous

Q108 Which Python library provides the ttest_ind function for hypothesis testing?

A

Pandas

B

NumPy

C

SciPy

D

Matplotlib
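
For reference, a minimal sketch of an independent two-sample t-test with `ttest_ind`. The two groups are invented measurements, and the 0.05 threshold is the conventional significance level rather than a fixed rule.

```python
from scipy import stats

# Hypothetical measurements from two independent groups.
group_a = [5.1, 4.9, 5.0, 5.2, 5.1, 4.8, 5.0, 5.3]
group_b = [5.9, 6.1, 6.0, 5.8, 6.2, 6.0, 5.9, 6.1]

# Null hypothesis: both groups have the same mean.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05  # conventional significance level (an assumption of this example)
reject_null = p_value < alpha
print(t_stat, p_value, reject_null)
```

A p-value below `alpha` means data this extreme would be unlikely if the null hypothesis were true, so the null hypothesis is rejected.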

Q109 How can you calculate the mean of a column in a Pandas DataFrame?

A

df.column.mean()

B

df.mean(column)

C

mean(df.column)

D

df.column.calc_mean()
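
A short sketch of computing a column mean in pandas, using a hypothetical `salary` column. Attribute access (`df.salary`) and bracket access (`df["salary"]`) select the same column; the bracket form also works for names that contain spaces or clash with DataFrame methods.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"salary": [40000, 50000, 60000]})

mean_attr = df.salary.mean()        # attribute-style column access
mean_bracket = df["salary"].mean()  # bracket-style column access

print(mean_attr, mean_bracket)
```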

Q110 A dataset has a column with right-skewed numerical data. What is a common approach to reduce the skew?

A

Use log transformation

B

Drop outliers

C

Encode values

D

Use boxplot
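
To illustrate the effect of a log transformation on skewed data, the sketch below compares skewness before and after applying `np.log1p` to a made-up, right-skewed income column. `log1p` computes log(1 + x) and is used here so that zeros would not cause problems; that choice is an assumption of the example.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values: most are small, a few are very large.
income = pd.Series([20, 22, 25, 27, 30, 35, 40, 60, 120, 900], dtype=float)

# log1p compresses large values much more than small ones.
log_income = np.log1p(income)

skew_before = income.skew()
skew_after = log_income.skew()
print(round(skew_before, 2), round(skew_after, 2))
```

The transformation reduces the skew but does not necessarily remove it, so it is worth rechecking the distribution afterwards.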

Q111 A statistical test run on a dataset returns a p-value of 0.01. What does this imply?

A

Strong evidence against the null hypothesis

B

No evidence against the null hypothesis

C

Data is normally distributed

D

Data has no variance

Q112 After standardizing data, the z-scores of a column are very high. What could be the issue?

A

Incorrect scaling

B

Outliers

C

Data is normalized

D

No issue
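
The sketch below standardizes a made-up column and shows how a single extreme value produces a large z-score. The cut-off of 2.5 is illustrative only; with just ten points, a population z-score cannot exceed 3.

```python
import numpy as np

# Hypothetical values with one extreme observation.
values = np.array([50, 52, 49, 51, 48, 50, 53, 47, 51, 120], dtype=float)

# Standardize: subtract the mean, divide by the (population) standard deviation.
z_scores = (values - values.mean()) / values.std()

# Flag values whose z-score magnitude exceeds an illustrative threshold.
threshold = 2.5
flagged = values[np.abs(z_scores) > threshold]
print(flagged)
```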

Q113 What is the primary purpose of data visualization?

A

To analyze data

B

To predict outcomes

C

To represent data visually

D

To encode data

Q114 Which visualization is best suited for showing data distribution?

A

Line chart

B

Scatter plot

C

Histogram

D

Pie chart

Q115 Which chart is most effective for comparing parts of a whole?

A

Scatter plot

B

Pie chart

C

Boxplot

D

Line chart

Q116 What does a boxplot help identify in a dataset?

A

Outliers

B

Correlations

C

Clusters

D

Trends

Q117 Which of the following is a common mistake in data visualization?

A

Using appropriate scales

B

Choosing the right chart type

C

Overloading charts with data

D

Labeling axes

Q118 Which Matplotlib function is used to create a simple line chart?

A

plt.scatter()

B

plt.line()

C

plt.plot()

D

plt.bar()
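
A minimal line-chart sketch with Matplotlib, using made-up data. The `Agg` backend is selected only so the script runs without a display; in an interactive session that line can be dropped and `plt.show()` called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 6, 8]

plt.figure()
# plt.plot draws a line through the (x, y) points and returns the line objects.
lines = plt.plot(x, y, marker="o")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Simple line chart")
```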

Q119 How do you create a bar chart in Matplotlib?

A

plt.bar(x, y)

B

plt.plot(x, y)

C

plt.hist(x)

D

plt.scatter(x, y)
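
A minimal bar-chart sketch with Matplotlib. The categories and counts are hypothetical, and the `Agg` backend is used only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# Hypothetical categories and their counts.
categories = ["A", "B", "C"]
counts = [12, 7, 15]

plt.figure()
# plt.bar draws one bar per category with height given by counts.
bars = plt.bar(categories, counts)
plt.xlabel("Category")
plt.ylabel("Count")
plt.title("Simple bar chart")
```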

Q120 Which Python library allows for creating highly interactive visualizations with minimal coding?

A

Seaborn

B

Matplotlib

C

Plotly

D

Pandas