
Q31 Which of the following is a disadvantage of Jupyter Notebooks?
Lack of interactivity
No real-time collaboration
Limited coding features
Requires high computational resources
Q32 Which Python library is primarily used for numerical computations?
NumPy
Pandas
Matplotlib
Seaborn
Q33 How do you load a CSV file into a Pandas DataFrame?
pd.load_csv()
pd.read_csv()
pd.import_csv()
pd.csv()
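The real pandas API here is pd.read_csv(), which accepts a file path or any file-like object. A minimal sketch, using an in-memory buffer in place of a CSV file on disk:

```python
import io

import pandas as pd

# A small in-memory CSV stands in for a file on disk.
csv_text = "name,score\nAda,91\nGrace,88\n"

# pd.read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)
```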
Q34 Which function in Jupyter Notebook allows you to create a new cell?
Shift + Enter
Ctrl + Enter
Alt + Enter
Esc + B
Q35 How do you install a new Python library using Jupyter Notebook?
pip.install(library)
!install library
install(library)
!pip install library
Q36 A Pandas DataFrame lookup raises "KeyError: 'column_name'". What could be the issue?
Column name mismatch
Empty DataFrame
Incorrect library
Non-numeric data
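A KeyError from a DataFrame lookup almost always means the label does not match exactly (case, whitespace, or spelling). A short illustrative sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ada", "Grace"], "Age": [36, 45]})

# A KeyError usually means the label doesn't match exactly
# (case, whitespace, or spelling).
try:
    df["name"]          # wrong case: the column is "Name"
except KeyError:
    print("column not found")

ages = df["Age"]        # exact match succeeds
```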
Q37 While using Jupyter Notebook, the kernel frequently crashes during computation. What could be the cause?
Unsupported library
Insufficient memory
Incorrect syntax
No internet connection
Q38 What is the primary challenge addressed by distributed computing?
Storage optimization
Real-time collaboration
Processing large-scale data
Building small-scale applications
Q39 Which of the following is an example of a distributed computing framework?
Hadoop
Tableau
MySQL
Excel
Q40 What is the role of a job tracker in Hadoop’s architecture?
Managing storage
Assigning and monitoring tasks
Optimizing visualization
Analyzing datasets
Q41 Why is fault tolerance important in distributed computing?
It reduces redundancy
It ensures high availability
It speeds up computation
It optimizes resource usage
Q42 How do you initialize a Spark session in PySpark?
spark = SparkSession.start()
spark = SparkSession.builder.getOrCreate()
spark = Spark.start()
spark = SparkContext.start()
Q43 Which PySpark method is used to read a CSV file into a DataFrame?
read.csv()
spark.read.csv()
pd.read_csv()
load_csv()
Q44 How do you write a PySpark DataFrame to a Parquet file?
df.write.csv()
df.write.json()
df.write.parquet()
df.write.format("csv")
Q45 A Hadoop job fails midway due to a node failure. What ensures task completion in such cases?
Data replication
Parallel computing
Data visualization
Fault detection
Q46 A PySpark job runs slower than expected. What could be a possible issue?
Incorrect function syntax
Resource underutilization
Balanced partitions
Optimized transformations
Q47 Why is data privacy important in Data Science?
To increase storage
To protect user rights
To speed up processing
To improve data formats
Q48 Which of the following is a common ethical concern in AI systems?
Transparency
Data visualization
Efficient computation
Hardware optimization
Q49 What is data bias in Data Science?
Errors due to missing values
Unrepresentative data causing unfair outcomes
Overfitting
Incomplete preprocessing
Q50 Which Python library helps ensure secure handling of sensitive data during analysis?
NumPy
PyCrypto
Matplotlib
Pandas
Q51 How do you anonymize sensitive columns in a Pandas DataFrame?
df.anonymize()
hashlib.hash(df)
df['column'].apply(hashlib.sha256)
df.remove('column')
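In practice the apply pattern needs a small wrapper, because hashlib.sha256 expects bytes and returns a hash object rather than a string. A hedged sketch with a made-up "email" column:

```python
import hashlib

import pandas as pd

df = pd.DataFrame({"email": ["ada@example.com", "grace@example.com"]})

# hashlib.sha256 needs bytes, so encode each value and keep the hex digest.
df["email"] = df["email"].apply(
    lambda v: hashlib.sha256(v.encode("utf-8")).hexdigest()
)
print(df["email"].iloc[0][:8])  # first characters of the anonymized value
```

Note that a plain hash is pseudonymization, not full anonymization; salted hashing or outright removal is stronger when re-identification is a concern.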
Q52 A dataset contains personally identifiable information (PII). What is the recommended practice before analysis?
Encrypt the dataset
Share the data
Ignore PII
Remove or anonymize PII
Q53 An AI model shows biased outcomes in predictions. What could be the issue?
Data preprocessing error
Biased training data
Correct loss function
Adequate testing
Q54 What is the primary benefit of case studies in Data Science?
They improve storage efficiency
They provide real-world problem-solving examples
They optimize algorithms
They test software
Q55 In predictive modeling, which case study metric is most relevant for evaluating accuracy?
Silhouette Score
Mean Absolute Error (MAE)
Execution Time
Data Redundancy
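MAE is simple to compute by hand: the mean of the absolute differences between actual and predicted values. A dependency-free sketch with made-up numbers:

```python
# Mean Absolute Error: average of |actual - predicted|.
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.0, 2.5]
predicted = [2.5, 5.0, 3.0]
mae = mean_absolute_error(actual, predicted)
print(mae)  # errors 0.5, 0.0, 0.5 -> mean = 1/3
```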
Q56 Which challenge is commonly highlighted in Data Science case studies involving healthcare?
Lack of computational resources
Data privacy and security
Limited statistical methods
Excessive labeled data
Q57 Which Python library is commonly used in case studies for creating visualizations to summarize results?
Seaborn
NumPy
PyTorch
Scikit-learn
Q58 How do you save the results of a machine learning model in Python for later use?
pickle.dump(model, file)
save_model(model)
model.save('file')
file.save(model)
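pickle.dump(model, file) is the standard-library route. A minimal sketch using an in-memory buffer in place of an opened file, with a plain dict as a placeholder for a real fitted model:

```python
import io
import pickle

# A dict keeps the sketch dependency-free; the "model" below is a
# placeholder, not a real estimator.
model = {"weights": [0.4, 0.6], "bias": 0.1}

buffer = io.BytesIO()          # stands in for open("model.pkl", "wb")
pickle.dump(model, buffer)

buffer.seek(0)
restored = pickle.load(buffer)
print(restored == model)  # True
```

With a real file, the same calls work inside `with open("model.pkl", "wb") as f:` for dumping and `"rb"` for loading.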
Q59 During a case study analysis, a DataFrame contains missing values. What is the simplest method to handle this?
Drop rows with missing values
Save the DataFrame
Optimize DataFrame size
Export the DataFrame
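Dropping incomplete rows is the one-line option: df.dropna() removes every row containing at least one missing value. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"age": [25, None, 31], "city": ["Pune", "Delhi", None]})

# dropna() keeps only rows with no missing values.
cleaned = df.dropna()
print(len(cleaned))  # 1 complete row survives
```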
Q60 A Data Science case study involves unbalanced classes in a classification dataset. What preprocessing step can address this?
Normalization
Data augmentation
PCA
Dimensionality reduction
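One common remedy for unbalanced classes is random oversampling of the minority class, i.e. duplicating minority samples until the class sizes match. A standard-library sketch with a toy dataset (all names and values here are illustrative):

```python
import random

# Toy imbalanced dataset: label 1 is the minority class.
samples = [(x, 0) for x in range(9)] + [(9, 1)]

random.seed(0)
minority = [s for s in samples if s[1] == 1]
majority = [s for s in samples if s[1] == 0]

# Duplicate random minority samples until the classes match in size
# (random oversampling; libraries such as imbalanced-learn automate this,
# and SMOTE generates synthetic samples instead of copies).
oversampled = minority + [random.choice(minority)
                          for _ in range(len(majority) - len(minority))]
balanced = majority + oversampled
print(sum(1 for _, y in balanced if y == 1))  # now equals the majority count
```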

