
Q1 When should feature extraction be used instead of feature selection?
When raw features are sufficient
When features need transformation
When data is balanced
When model accuracy is high
Q2 Which scikit-learn function is used to normalize data?
normalize()
standardize()
scale()
transform()
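A minimal sketch of what normalize() from sklearn.preprocessing does, using made-up values: it rescales each sample (row) to unit length. By contrast, scale() from the same module standardizes each feature (column) to zero mean and unit variance, so which one counts as "normalizing" depends on the intended meaning.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Illustrative data only: two samples with two features each.
X = np.array([[3.0, 4.0], [1.0, 0.0]])

# Divide every row by its Euclidean (L2) length.
X_unit = normalize(X, norm="l2")
print(X_unit)
```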
Q3 How do you perform one-hot encoding in Pandas?
pd.one_hot()
pd.get_dummies()
pd.categorical()
pd.encoding()
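For reference, one-hot encoding in pandas is done with pd.get_dummies(). The sketch below uses a hypothetical "color" column to show the resulting indicator columns.

```python
import pandas as pd

# Hypothetical categorical column, for illustration only.
df = pd.DataFrame({"color": ["red", "green", "red"]})

# Each distinct category becomes its own indicator column.
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())
```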
Q4 Which scikit-learn class is used for dimensionality reduction?
PCA()
StandardScaler()
KMeans()
OneHotEncoder()
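A short sketch of dimensionality reduction with PCA from sklearn.decomposition. The data is random and the choice of two components is an arbitrary assumption for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 samples with 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Project the data onto its first two principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)
```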
Q5 A dataset has highly correlated features. How should you handle this issue?
Normalize features
Drop one of the correlated features
Encode features
Use PCA
Q6 A numerical feature has a skewed distribution. What transformation can address this?
Log transformation
Drop the feature
One-hot encoding
Normalize values
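To illustrate how a log transformation compresses a long right tail, the sketch below applies np.log1p to a made-up income column with one extreme value. log1p (log of 1 + x) is used here as one common choice because it also handles zeros; a plain logarithm works for strictly positive data.

```python
import numpy as np

# Made-up right-skewed values with a single large outlier.
incomes = np.array([20_000, 25_000, 30_000, 45_000, 1_000_000], dtype=float)

# log1p computes log(1 + x), pulling in the long right tail.
log_incomes = np.log1p(incomes)

# Compare how far the maximum sits from the median before and after.
ratio_before = incomes.max() / np.median(incomes)
ratio_after = log_incomes.max() / np.median(log_incomes)
print(ratio_before, ratio_after)
```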
Q7 A dataset has missing values for important features. What is the best approach to address this?
Remove the rows
Impute values
Drop the feature
Ignore the missing data
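One simple form of imputation is to fill missing numeric values with the column median. The sketch below assumes a hypothetical "age" column; the best strategy (mean, median, model-based) depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "city": ["Oslo", "Lima", "Pune"],
})

# Replace the missing value with the median of the observed ages.
df["age"] = df["age"].fillna(df["age"].median())
print(df)
```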
Q8 What is a key characteristic of time series data?
Random observations
Data without timestamps
Sequential observations over time
Categorical data
Q9 Which of the following is commonly used to detect seasonality in time series data?
Histogram
Autocorrelation
Scatter plot
PCA
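As a rough illustration of how autocorrelation exposes seasonality, the sketch below builds a synthetic series that repeats every four steps and checks its correlation with lagged copies of itself using pandas Series.autocorr(). A peak at a lag equal to the seasonal period is the signal to look for.

```python
import pandas as pd

# Synthetic series with an exact period of 4.
series = pd.Series([1.0, 2.0, 3.0, 4.0] * 6)

# Correlation of the series with itself shifted by the seasonal period.
lag4 = series.autocorr(lag=4)
# Correlation at half the period, where the pattern is out of phase.
lag2 = series.autocorr(lag=2)
print(lag4, lag2)
```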
Q10 Why is stationarity important in time series analysis?
It ensures data completeness
It stabilizes variance
It allows for accurate forecasting
It reduces data size
Q11 What is the purpose of differencing in time series preprocessing?
To detect seasonality
To remove trend and make data stationary
To visualize data
To encode features
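First-order differencing replaces each value with its change from the previous value. A minimal sketch with pandas Series.diff() on a made-up upward-trending series:

```python
import pandas as pd

# Made-up series with a clear upward trend.
trend = pd.Series([10, 13, 15, 18, 20])

# diff() subtracts the previous value; the first result is NaN and is dropped.
differenced = trend.diff().dropna()
print(differenced.tolist())
```

The differenced values fluctuate around a constant level instead of rising, which is the property stationarity-based models rely on.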
Q12 Which metric is commonly used to evaluate the accuracy of a time series model?
Precision
Mean Absolute Error (MAE)
Silhouette Score
Log Loss
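Mean Absolute Error is the average of the absolute differences between actual and forecast values. The helper below is a plain-Python sketch with made-up numbers, not a library function.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between paired actual and forecast values."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Made-up observations and forecasts.
actual = [100, 110, 120]
forecast = [98, 113, 119]

mae = mean_absolute_error(actual, forecast)
print(mae)
```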
Q13 Which Python library provides the seasonal_decompose function for analyzing time series components?
Pandas
NumPy
statsmodels
Matplotlib
Q14 How do you plot a time series in Pandas?
plt.plot(time_series)
time_series.plot()
pd.plot(time_series)
plot(time_series)
Q15 Which method is used in statsmodels to fit an ARIMA model for time series forecasting?
fit_arima()
arima_fit()
ARIMA().fit()
forecast_arima()
Q16 A time series dataset shows an upward trend. What preprocessing step is necessary before modeling?
One-hot encoding
Differencing
Scaling
Normalizing
Q17 A time series forecast consistently underestimates values during high seasons. What could be the issue?
Incorrect seasonality handling
Overfitting
Underfitting
Missing timestamps
Q18 What is the main goal of Natural Language Processing?
Analyzing numerical data
Understanding and processing human language
Creating images
Performing clustering
Q19 Which of the following tasks is NOT part of Natural Language Processing?
Sentiment analysis
Speech recognition
Image classification
Text summarization
Q20 What is tokenization in NLP?
Dividing text into words or subwords
Encoding numerical data
Creating embeddings
Reducing noise in data
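To make tokenization concrete, here is a deliberately simple regex-based word tokenizer. The function name simple_tokenize is hypothetical, and real tokenizers (for example in NLTK or spaCy) handle contractions, abbreviations, and subwords far more carefully.

```python
import re


def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenize("Time series data is sequential.")
print(tokens)
```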
Q21 What is the purpose of stopword removal in text preprocessing?
To normalize text
To reduce dimensionality
To remove common but insignificant words
To correct spelling
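A minimal sketch of stopword removal. The STOPWORDS set here is a tiny hand-picked assumption for illustration; libraries such as NLTK ship much larger curated lists.

```python
# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "is", "a", "of", "and"}


def remove_stopwords(tokens):
    """Keep only tokens whose lowercase form is not a stopword."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


filtered = remove_stopwords(["The", "price", "of", "the", "stock", "is", "rising"])
print(filtered)
```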
Q22 What is a bag-of-words representation in NLP?
A numerical representation of text
A method to remove stopwords
A type of neural network
A clustering algorithm
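A bag-of-words model turns each document into a vector of word counts over a shared vocabulary, ignoring word order. The plain-Python sketch below uses two made-up sentences and whitespace splitting as a simplifying assumption.

```python
from collections import Counter

# Two made-up documents.
docs = ["the cat saw the dog", "the dog barked"]

# Shared vocabulary: every distinct word, in sorted order.
vocabulary = sorted({word for doc in docs for word in doc.split()})

# One count vector per document, aligned with the vocabulary.
vectors = []
for doc in docs:
    counts = Counter(doc.split())
    vectors.append([counts[word] for word in vocabulary])

print(vocabulary)
print(vectors)
```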
Q23 Which library provides the word_tokenize function for tokenization in Python?
NumPy
NLTK
Pandas
scikit-learn
Q24 How do you create a term frequency-inverse document frequency (TF-IDF) matrix in scikit-learn?
TfidfVectorizer().fit_transform()
CountVectorizer().fit_transform()
TfidfTransformer().fit()
transform_TF()
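A short sketch of building a TF-IDF matrix with scikit-learn's TfidfVectorizer on two made-up documents. Default settings are assumed; note that fit_transform() is called on an instance of the class.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two made-up documents.
docs = ["the cat sat", "the dog barked"]

# Learn the vocabulary and IDF weights, then return a sparse document-term matrix.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

vocab = vectorizer.vocabulary_   # maps each term to its column index
first_doc = tfidf.toarray()[0]   # dense weights for the first document
print(tfidf.shape)
```

Because "the" appears in both documents, it receives a lower weight than "cat", which appears in only one.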
Q25 Which Python library provides pre-trained word embeddings like Word2Vec?
NLTK
Gensim
Pandas
spaCy
Q26 A text classification model performs poorly due to high-dimensional feature space. What preprocessing step can help?
Normalization
Dimensionality reduction
Feature extraction
Stopword removal
Q27 A sentiment analysis model misclassifies reviews with negations (e.g., "not good"). What could address this?
Using n-grams
Stopword removal
Bag-of-words
TF-IDF
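To show how n-grams can preserve a negation, the sketch below uses scikit-learn's CountVectorizer with ngram_range=(1, 2) so that "not good" becomes a feature of its own alongside the single words. The two reviews are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up reviews: one negated, one not.
reviews = ["not good", "good"]

# Extract both unigrams and bigrams.
vectorizer = CountVectorizer(ngram_range=(1, 2))
counts = vectorizer.fit_transform(reviews)

features = sorted(vectorizer.vocabulary_)
print(features)
```

With unigrams alone, both reviews would share the feature "good"; the bigram "not good" gives a classifier something that distinguishes them.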
Q28 Which tool is primarily used for creating interactive and shareable notebooks for data analysis?
RStudio
Jupyter Notebook
PyCharm
Tableau
Q29 Which library in Python is most commonly used for data manipulation and analysis?
Matplotlib
Pandas
SciPy
NumPy
Q30 What is the main use of R in Data Science?
Data visualization and statistical analysis
Deep learning
Web development
API creation

