This set of multiple choice questions (MCQs) covers the **Python for Data Science NPTEL Week 4 Answers**.

You should practice these questions to improve the Data Science fundamentals needed for various interviews (company interviews, campus interviews, walk-in interviews), entrance exams, placements and other competitive exams. All the questions in this section are based only on “**Python for Data Science NPTEL Week 4 Answers**”.

### Course layout

**Week 1:** Basics of Python Spyder

**Week 2:** Sequence data types & associated operations

**Week 3:** Data frames

**Week 4:** Case study

**NOTE:** You can check your answer immediately by clicking the show answer button. Moreover, this set of “Python for Data Science NPTEL Week 4 Answers” contains 10 questions.

Now, start attempting the quiz.

### Python for Data Science NPTEL Week 4 Answers

**Q1.** Which of the following are regression problems? Assume that appropriate data is given.

a) Predicting the house price.

b) Predicting whether it will rain or not on a given day.

c) Predicting the maximum temperature on a given day.

d) Predicting the sales of the ice-creams.

Answer: a), c), d)

**Q2.** Which of the following are binary classification problems?

a) Predicting whether a patient is diagnosed with cancer or not.

b) Predicting whether a team will win a tournament or not.

c) Predicting the price of a second-hand car.

d) Classify web text into one of the following categories: Sports, Entertainment, or Technology.

Answer: a), b)

**Q3.** If a linear regression model achieves zero training error, can we say that all the data points lie on a hyperplane in the (d+1)-dimensional space? Here, d is the number of features.

a) Yes

b) No

Answer: a) Yes
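To see why zero training error places every point on the fitted hyperplane, here is a minimal sketch for the simplest case d = 1, where the hyperplane is a line in 2-dimensional space. The data values and the helper name `fit_line` are illustrative assumptions and are not part of the quiz.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated exactly from y = 3 + 2x (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 + 2.0 * x for x in xs]

intercept, slope = fit_line(xs, ys)

# Residual = observed value minus the value on the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
training_error = sum(r ** 2 for r in residuals)
```

The sum of squared residuals is zero only when every residual is zero, which means each point (x, y) satisfies the fitted equation and therefore lies on the line. The same argument carries over to d features and a hyperplane in (d+1) dimensions.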

**Q4.** Which of the following machine learning techniques would NOT be appropriate to solve the problem given in the problem statement?

a) kNN

b) Random Forest

c) Logistic Regression

d) Linear regression

Answer: d) Linear regression

**Q5.** After applying logistic regression, what is/are the correct observations from the resultant confusion matrix?

a) True Positive = 29, True Negative = 94

b) True Positive = 94, True Negative = 29

c) False Positive = 5, True Negative = 94

d) None of the above

Answer: a), c)
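As a reminder of how the entries of a confusion matrix are counted, the sketch below tallies true/false positives and negatives and derives accuracy from them. The labels are hypothetical and are not the quiz dataset; `confusion_counts` and `accuracy` are illustrative helper names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for binary labels."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


def accuracy(counts):
    """Fraction of correct predictions: (TP + TN) / total."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total


# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(counts)
```

If you use scikit-learn instead, `sklearn.metrics.confusion_matrix` puts actual classes in rows and predicted classes in columns, so for labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`.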

**Q6.** The logistic regression model built between the input and output variables is checked for its prediction accuracy of the test data. What is the accuracy range (in %) of the predictions made over test data?

a) 60 – 79

b) 90 – 95

c) 30 – 59

d) 80 – 89

Answer: b) 90 – 95

**Q7.** How are categorical variables preprocessed before model building?

a) Standardization

b) Dummy variables

c) Correlation

d) None of the above

Answer: b) Dummy variables
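The idea behind dummy variables can be sketched in a few lines: each category becomes its own 0/1 column. This is a hand-rolled illustration with a hypothetical feature, and `make_dummies` is an assumed helper name rather than a library function.

```python
def make_dummies(values, drop_first=False):
    """One-hot encode a list of category labels into 0/1 columns."""
    categories = sorted(set(values))
    if drop_first:
        # Dropping one category avoids a redundant column.
        categories = categories[1:]
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }


# Hypothetical categorical feature.
fuel_type = ["Petrol", "Diesel", "CNG", "Petrol"]
dummies = make_dummies(fuel_type)
```

In practice, `pandas.get_dummies` performs this encoding on a DataFrame and also accepts a `drop_first` argument.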

**Q8.** A multiple linear regression model is built on the Global Happiness Index dataset ‘GHI_Report.csv’. What is the RMSE of the baseline model?

a) 2.00

b) 0.50

c) 1.06

d) 0.75

Answer: c) 1.06
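For reference, a baseline RMSE can be computed as sketched below. This assumes the baseline model predicts the mean of the target for every sample, which is a common convention but is not spelled out in the question. The target values are hypothetical and are not taken from `GHI_Report.csv`.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical target values.
y_test = [4.0, 5.0, 6.0, 7.0, 8.0]

# Assumed baseline: predict the mean of the target for every sample.
baseline_value = sum(y_test) / len(y_test)
baseline_pred = [baseline_value] * len(y_test)
baseline_rmse = rmse(y_test, baseline_pred)
```

A fitted regression model is usually judged against this number: it should achieve a lower RMSE than the constant-mean baseline.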

**Q9.** A regression model with the following function y = 60 + 5.2x was built to understand the impact of humidity (x) on rainfall (y). The humidity this week is 30 more than the previous week. What is the predicted difference in rainfall?

a) 156 mm

b) 15.6 mm

c) -156 mm

d) None of the above

Answer: a) 156 mm
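The arithmetic can be checked directly: the intercept cancels when two predictions are subtracted, so the difference is 5.2 × 30 = 156. A minimal sketch follows; the previous week's humidity value is a hypothetical choice, and any other value gives the same difference.

```python
def predicted_rainfall(humidity):
    """Model from the question: y = 60 + 5.2x."""
    return 60 + 5.2 * humidity


# Hypothetical humidity for the previous week.
previous_humidity = 50
current_humidity = previous_humidity + 30

# The intercept (60) cancels, leaving slope * change in humidity.
difference = predicted_rainfall(current_humidity) - predicted_rainfall(previous_humidity)
```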

**Q10.** X and Y are two variables that have a strong linear relationship. Which of the following statements are incorrect?

a) There cannot be a negative relationship between the two variables.

b) The relationship between the two variables is purely causal.

c) One variable may or may not cause a change in the other variable.

d) The variables can be positively or negatively correlated with each other.

Answer: a), b)
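Statement a) is incorrect because a strong linear relationship can just as well be negative. The sketch below computes the Pearson correlation coefficient for hypothetical data that falls exactly on a downward-sloping line; `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data on the line y = 10 - 2x: perfectly linear, but decreasing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0 - 2.0 * x for x in xs]

r = pearson_r(xs, ys)
```

Here the coefficient comes out at -1, the strongest possible negative linear relationship. Note that a correlation of this kind says nothing about whether one variable causes the other.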

### Python for Data Science NPTEL Week 4 Answers

**Q1.** How many unique values are present in the Sbal feature; also, what is the most frequent value within Sbal?

a) 5, Rs. >= 10,000

b) 4, Rs. < 1000

c) 5, Rs. < 1000

d) 4, ‘1000 <= Rs. < 5,000’

**Answer:** c)
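Counting unique values and finding the most frequent one can be sketched with the standard library as follows. The list below is hypothetical and only reuses category labels from the answer options; it is not the actual Sbal column.

```python
from collections import Counter

# Hypothetical values for a categorical column.
sbal = [
    "Rs. < 1000",
    "1000 <= Rs. < 5,000",
    "Rs. < 1000",
    "Rs. >= 10,000",
    "Rs. < 1000",
]

value_counts = Counter(sbal)

# Number of distinct categories in the column.
n_unique = len(value_counts)

# most_common(1) returns a list holding the single (value, count) pair with the highest count.
most_frequent, frequency = value_counts.most_common(1)[0]
```

With the data loaded into a pandas DataFrame, `Series.nunique()` and `Series.value_counts()` give the same information.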

**Q2.** Find the average age of those customers who have a credit history [Chist] wherein the dues are not paid earlier.

a) 35.54

b) 38.44

c) 33.00

d) None of the above

**Answer:** b)

**Q3.** A Logistic Regression model is built in which none of the features used are standardized. The train to test proportion is 75:25 and the random state is set to 1. The accuracy of the model is ________.

a) Less than 50%

b) Between 50% and 60%

c) Greater than 70%

d) None of the above

**Answer:** c)

**Q4.** Import StandardScaler() from the sklearn.preprocessing package to standardize the features. Use the same train-test proportion and set the random state to 1. After rebuilding the logistic regression model on the standardized features, by what percentage has the number of misclassified samples changed?

a) 11.11%

b) 3.7%

c) 20%

d) 39.2%

**Answer:** a)
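The two calculations involved here, standardizing a feature and expressing a change as a percentage, can be sketched without any libraries. All numbers below are hypothetical, including the misclassification counts, and the helper names are illustrative assumptions.

```python
import math


def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100


# Hypothetical feature values.
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
scaled = standardize(ages)

# Hypothetical misclassification counts before and after standardizing.
change = percent_change(before=20, after=15)
```

`StandardScaler` applies the same z-score formula per feature, using the population standard deviation. A negative `change` means fewer samples were misclassified after standardizing.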

**Q5.** When KNN classification is applied on the same standardized data at the optimal value for k nearest neighbours, the accuracy achieved is ______.

a) 64%

b) 78%

c) 76.4%

d) None of the above

**Answer:** b)

**Q6.** A multiple linear regression model is built on the Global Happiness Index dataset “GHI_Report.csv”. What is the RMSE of the baseline model?

a) 1.99

b) 1.06

c) 0.85

d) 0.33

**Answer:** c)

**Q7.** From the multiple linear regression model built on the GHI index, we get an R-squared value of _______ on the test data subset.

a) 55.63

b) 45.81

c) 75.59

d) 81.46

**Answer:** d)

**Q8.** Which of the following statement/s about Linear Regression is / are true?

a) Linear Regression assumes that there exists a linear relationship between the independent variable and dependent variable.

b) The error terms are assumed to be independent and normally distributed.

c) The percentage of variation in the dependent variable as explained by the independent variable/variables is expressed by R-squared value.

d) Residuals are the product of the predicted value and the actual observed value.

**Answer:** a), b), c)
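Option d) is false because a residual is the difference between the observed and predicted values, not their product. The sketch below computes residuals and R-squared for hypothetical values; the helper names are illustrative assumptions.

```python
def residuals(y_true, y_pred):
    """Residual = observed value minus predicted value."""
    return [t - p for t, p in zip(y_true, y_pred)]


def r_squared(y_true, y_pred):
    """R-squared = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum(r ** 2 for r in residuals(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)          # total variation
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

res = residuals(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

Multiplying `r2` by 100 gives the percentage of variation in the dependent variable that the model explains.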

**Q9.** Which of the following statements is inaccurate about Logistic Regression?

a) Logistic Regression doesn’t require a linear relationship between the dependent and independent variables.

b) The value of the logistic function being a probability will range between 0 and 1.

c) The cost function of Logistic Regression is also called the Log Loss function.

d) The dependent variable can be either numerical or categorical, just like the independent variables.

**Answer:** d)
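Options b) and c) can be illustrated with a short sketch of the logistic function and the log loss. The scores and labels are hypothetical, and the function names are illustrative rather than taken from any library.

```python
import math


def sigmoid(z):
    """Logistic function: mathematically, its output lies strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))


def log_loss(y_true, probabilities):
    """Average negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, probabilities):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical linear scores and binary labels.
scores = [-4.0, -1.0, 0.0, 1.0, 4.0]
labels = [0, 0, 1, 1, 1]

probabilities = [sigmoid(z) for z in scores]
loss = log_loss(labels, probabilities)
```

Because the output is interpreted as the probability of one class, the dependent variable in logistic regression is categorical, which is why option d) is the inaccurate statement.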

**Q10.** In a KNN model, by which means do we handle categorical variables?

a) Standardization

b) Dummy variables

c) Correlation

d) None of the above

**Answer:** b)


DISCLAIMER: Use these answers for reference purposes only. Quizermania doesn't claim these answers to be 100% correct, so make sure you submit your assignments on the basis of your own knowledge.
