
Q61 What is Data Science primarily focused on?
Data storage
Data visualization
Insight extraction
App development
Q62 Which of the following is a key aspect of data science?
Building dashboards
Cleaning and analyzing data
Developing web pages
Writing blogs
Q63 What type of data does Data Science primarily handle?
Only structured
Only unstructured
Both structured and unstructured
None of the above
Q64 Which of these domains does Data Science NOT directly involve?
Machine learning
Database optimization
Statistics
Data visualization
Q65 What is a key challenge faced in Data Science projects?
Lack of storage
Model overfitting
Manual calculations
System downtime
Q66 What role does domain expertise play in Data Science?
It is optional
It provides data storage solutions
It helps understand data context
It prevents coding errors
Q67 Which of the following is a critical component of a Data Science pipeline?
Web hosting
Feature selection
Presentation design
Software installation
Q68 In Python, which library is commonly used for numerical computations in Data Science?
NumPy
Matplotlib
Flask
Pandas
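
As a brief illustration of numerical computation in Python, the sketch below uses NumPy on a small made-up array. The values and variable names are hypothetical and chosen only for the example.

```python
import numpy as np

# Hypothetical measurements: 3 samples (rows) x 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

col_means = data.mean(axis=0)  # mean of each column
col_stds = data.std(axis=0)    # population standard deviation of each column
doubled = data * 2             # vectorized arithmetic, no explicit loop
```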
Q69 A Data Scientist receives a dataset with duplicate entries. What is the simplest way to handle this in Pandas?
drop_duplicates()
remove_duplicates()
dropna()
fillna()
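
A minimal sketch of duplicate handling in Pandas, assuming a small hypothetical DataFrame with the columns `id` and `city`. It shows `drop_duplicates()` on whole rows and on a chosen subset of columns.

```python
import pandas as pd

# Hypothetical data: the second and third rows are identical.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["Pune", "Delhi", "Delhi", "Pune"],
})

# Drop rows that are identical in every column, keeping the first occurrence.
deduped = df.drop_duplicates()

# Consider only the 'city' column when deciding what counts as a duplicate.
by_city = df.drop_duplicates(subset=["city"])
```

By default the first occurrence is kept; the `keep` parameter changes that behavior.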
Q70 What is the first step in the Data Science Life Cycle?
Model Building
Data Cleaning
Problem Definition
Evaluation
Q71 Which phase in the Data Science Life Cycle involves cleaning and preparing data for analysis?
Model Evaluation
Data Cleaning
Data Analysis
Visualization
Q72 Which step in the Data Science Life Cycle involves determining if the model meets project objectives?
Data Collection
Model Deployment
Evaluation
Visualization
Q73 What happens during the Data Collection phase of the Data Science Life Cycle?
Data is stored in a database
Data is gathered from multiple sources
Data is split into training and test sets
Data is discarded
Q74 Which step in the Data Science Life Cycle involves feature engineering and transformation?
Problem Definition
Data Cleaning
Data Preparation
Evaluation
Q75 Why is the deployment phase critical in the Data Science Life Cycle?
It ensures the model is trained
It makes the model accessible for users
It removes irrelevant data
It generates reports
Q76 What is a major challenge during the evaluation phase of the Data Science Life Cycle?
Selecting the right metric
Collecting data
Training models
Understanding business goals
Q77 In Python, which library is commonly used for splitting datasets during the Data Preparation phase?
scikit-learn
NumPy
Pandas
Matplotlib
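
For reference, one common way to split a dataset into training and test sets is `train_test_split` from scikit-learn. The sketch below assumes a tiny synthetic feature matrix `X` and label vector `y`; the 80/20 split and the seed are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 10 samples with 2 features each, and balanced binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows for testing; random_state makes the split repeatable,
# and stratify=y keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```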
Q78 A Data Scientist’s model performs poorly in production compared to testing. What could be the most likely cause?
Overfitting
Clean data
Balanced dataset
Simple model
Q79 What is the primary goal of data cleaning in Data Science?
To remove duplicates
To visualize data
To identify and fix data quality issues
To split data
Q80 Why is handling missing values important during data preprocessing?
It ensures model interpretability
It improves model accuracy
It increases data storage
It simplifies code
Q81 Which technique can be used to handle outliers in numerical data?
Removing them
Normalizing data
Imputation
All of the above
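
As an illustration of outlier handling, the sketch below flags outliers with the interquartile-range (IQR) rule and then either removes or caps them. The IQR rule and the 1.5 multiplier are one conventional choice, not the only one, and the sample values are made up.

```python
import pandas as pd

# Hypothetical readings: 95 is far from the rest.
values = pd.Series([10, 12, 11, 13, 12, 95])

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = (values < lower) | (values > upper)

removed = values[~is_outlier]                   # option 1: drop the outliers
capped = values.clip(lower=lower, upper=upper)  # option 2: cap them at the bounds
```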
Q82 What is the effect of standardization in data preprocessing?
It removes duplicates
It rescales values to have zero mean and unit variance
It improves data cleaning
It removes missing values
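
To make the effect of standardization concrete, the sketch below applies scikit-learn's `StandardScaler` to a single made-up feature and compares it with the manual formula (x - mean) / std.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# StandardScaler subtracts the column mean and divides by the column's
# population standard deviation.
scaled = StandardScaler().fit_transform(X)

# The same transformation written out by hand.
manual = (X - X.mean(axis=0)) / X.std(axis=0)
```

After the transformation the column has mean 0 and standard deviation 1.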
Q83 Which preprocessing step ensures categorical variables are suitable for numerical models?
Scaling
One-hot encoding
Outlier detection
Normalization
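
A minimal sketch of one-hot encoding with `pandas.get_dummies`, assuming a hypothetical DataFrame with a categorical `color` column and a numeric `price` column.

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column.
df = pd.DataFrame({
    "color": ["red", "green", "red"],
    "price": [1.0, 2.0, 3.0],
})

# Replace 'color' with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
```

The numeric column is left unchanged, and each category becomes its own indicator column.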
Q84 When dealing with a dataset containing multiple irrelevant features, which method is most effective?
Data cleaning
Feature selection
One-hot encoding
Standardization
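
As a sketch of how irrelevant features can be filtered out, the example below uses scikit-learn's `SelectKBest` with the ANOVA F-score on synthetic data. The data, the choice of scoring function, and `k=1` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Synthetic labels: 50 samples of each class.
y = np.array([0] * 50 + [1] * 50)

# Column 0 tracks the label closely; columns 1-3 are pure noise.
informative = y + rng.normal(scale=0.1, size=100)
noise = rng.normal(size=(100, 3))
X = np.column_stack([informative, noise])

# Keep the single feature with the highest F-score against the label.
selector = SelectKBest(score_func=f_classif, k=1)
X_selected = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)  # indices of the retained columns
```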
Q85 In Python, which Pandas method removes rows with missing values?
drop_duplicates()
dropna()
fillna()
replace()
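
For illustration, the sketch below shows how `dropna()` behaves on a small hypothetical DataFrame, both across all columns and restricted to one column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
})

# Drop every row that has at least one missing value.
complete_rows = df.dropna()

# Drop only the rows where column 'a' is missing.
a_present = df.dropna(subset=["a"])
```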
Q86 How do you replace missing values in a numeric Pandas DataFrame with the mean of each column?
df.fillna(df.mean())
df.mean().replace()
df.replace_mean()
df.fill(df.mean())
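
A minimal sketch of mean imputation, assuming a hypothetical DataFrame with a numeric `age` column and a text `city` column. It shows the single-column form and the whole-frame form.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in each column.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "city": ["Pune", None, "Delhi"],
})

# Single column: fill missing 'age' values with that column's mean.
one_col = df["age"].fillna(df["age"].mean())

# Whole frame: numeric_only=True skips the text column when computing means,
# so only numeric columns are filled.
filled = df.fillna(df.mean(numeric_only=True))
```

Non-numeric columns such as `city` keep their missing values and need a different strategy.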
Q87 Which Python library is best suited for outlier detection using clustering techniques?
scikit-learn
NumPy
Pandas
Matplotlib
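
One clustering-based approach to outlier detection is DBSCAN, which labels points that belong to no dense region as noise. The sketch below uses scikit-learn's `DBSCAN` on made-up 2-D points; the `eps` and `min_samples` values are assumptions tuned to this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Four points close together and one point far away.
X = np.array([
    [1.0, 1.0],
    [1.1, 1.0],
    [0.9, 1.1],
    [1.0, 0.9],
    [8.0, 8.0],
])

# Points without enough neighbors within eps receive the label -1 (noise).
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
outliers = X[labels == -1]
```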
Q88 A dataset has duplicate rows causing issues in analysis. Which Pandas method will you use to fix this?
drop_duplicates()
dropna()
fillna()
groupby()
Q89 A column contains both numerical and non-numerical values. How should you preprocess it for numerical analysis?
Drop the column
Impute missing values
Use encoding techniques
Normalize data
Q90 After standardizing a dataset, a model performs poorly. What could be a possible issue?
Data leakage
Overfitting
Outliers
Incorrect scaling
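
One way scaling can introduce data leakage is by computing the scaling statistics on the full dataset before splitting. The sketch below shows the usual safeguard, assuming synthetic data: fit the scaler on the training rows only, then reuse it for the test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix: 100 samples, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 2))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Learn the mean and standard deviation from the training rows only.
scaler = StandardScaler().fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuses the training statistics
```

With this ordering, no information from the test rows influences the transformation applied during training.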

