Introduction to Machine Learning NPTEL Week 2 Solutions NPTEL 2023

This set of MCQ(multiple choice questions) focuses on the Introduction to Machine Learning NPTEL 2023 Week 2 Solutions.

With the increased availability of data from varied sources there has been increasing attention paid to the various data driven disciplines such as analytics and machine learning. In this course we intend to introduce some of the basic concepts of machine learning from a mathematically well motivated perspective. We will cover the different learning paradigms and some of the more popular algorithms and architectures used in each of these paradigms.

Course layout

Answers COMING SOON! Kindly Wait!

Week 1: Assignment answers
Week 2: Assignment answers
Week 3: Assignment answers
Week 4: Assignment answers
Week 5: Assignment answers
Week 6: Assignment answers
Week 7: Assignment answers
Week 8: Assignment answers
Week 9: Assignment answers
Week 10: Assignment answers
Week 11: Assignment answers
Week 12: Assignment answers

NOTE: You can check your answer immediately by clicking show answer button. Introduction to Machine learning NPTEL 2023 Week 2 Assignment Solution” contains 8 questions.

Now, start attempting the quiz.

Introduction to Machine learning NPTEL 2023 Week 2 Solutions

Q1. Given a training data set of 10,000 instances, with each input instance having 17 dimensions and each output instance having 2 dimensions, the dimensions of the design matrix used in applying linear regression to this data is

a) 10000 × 17
b) 10002 × 17
c) 10000 × 18
d) 10000 × 19

Answer: c

Q2.

Suppose we want to add a regularizer to the linear regression loss function, to control the magnitudes of the weights ββ. We have a choice between Ω1(β)=Σpi=1|β|Ω1()=Σ=1|| and Ω2(β)=Σpi=1β2Ω2()=Σ=12 Which one is more likely to result in sparse weights?

a) Ω1
b) Ω2
c) Both Ω1Ω1 and Ω2Ω2 will result in sparse weights
d) Neither of Ω1Ω1 or Ω2Ω2 can result in sparse weights.

Answer: a

Q3. The model obtained by applying linear regression on the identified subset of features may differ from the model obtained at the end of the process of identifying the subset during

a) Forward stepwise selection
b) Backward stepwise selection
c) Forward stagewise selection
d) All of the above

Answer: c

Q4. Consider forward selection, backward selection and best subset selection with respect to the same data set. Which of the following is true?

a) Best subset selection can be computationally more expensive than forward selection
b) Forward selection and backward selection always lead to the same result
c) Best subset selection can be computationally less expensive than backward selection
d) Best subset selection and forward selection are computationally equally expensive
e) Both (b) and (d)

Answer: a

Q5. In the lecture on Multivariate Regression, you learn about using orthogonalization iteratively to obtain regression co-effecients. This method is generally referred to as Multiple Regression using Successive Orthogonalization. In the formulation of the method, we observe that in iteration k, we regress the entire dataset on z0,z1,…zk−10,1,…−1 . It seems like a waste of computation to recompute the coefficients for z00 a total of p times, z11 a total of p−1−1 times and so on. Can we re-use the coefficients computed in iteration j for iteration j+1+1 for zj−1−1 ?

a) No. Doing so will result in the wrong γγ matrix. and hence, the wrong βiβ ’s.
b) Yes. Since zj−1−1 is orthogonal to zj−l∀l≤j1−∀≤1, the multiple regression in each iteration is essentially a univariate regression on each of the previous residuals. Since the regression coefficients for the previous residuals don’t change over iterations, we can re-use the coefficients for further iterations.

Answer: b

Q6. Principal Component Regression (PCR) is an approach to find an orthogonal set of basis vectors which can then be used to reduce the dimension of the input. Which of the following matrices contains the principal component directions as its columns (follow notation from the lecture video)

a) X
b) S
c) Xc
d) V
e) U

Answer: d

Q7. Consider the following five training examples

We want to learn a function f(x)() of the form f(x)=ax+b()=+ which is parameterised by (a,b)(,). Using squared error as the loss function, which of the following parameters would you use to model this function to get a solution with the minimum loss

a) (4,3)
b) (1,4)
c) (4,1)
d) (3,4)

Answer: d

Q8. Here is a data set of words in two languages.

Let us build a nearest neighbours classifier that will predict which language a word belongs to. Say we represent each word using the following features.• Length of the word• Number of consonants in the word• Whether it ends with the letter ’o’ (1 if it does, 0 if it doesn’t)For example, the representation of the word ‘waffle’ would be [6, 2, 0]. For a distance function, use the Manhattan distance. d(a,b)=Σni=1|ai−bi|(,)=Σ=1|−| where a,b∈Rn,∈

Take the input word ‘keto’. With k = 1, the predicted language for the word is?

a) English
b) Vinglish
c) None of the above

Answer: a

<< Previous- Introduction to Machine Learning Week 1 Assignment Solutions

>> Next- Introduction to Machine Learning Week 3 Assignment Solutions

DISCLAIMER: Use these answers only for the reference purpose. Quizermania doesn't claim these answers to be 100% correct. So, make sure you submit your assignments on the basis of your knowledge.

For discussion about any question, join the below comment section. And get the solution of your query. Also, try to share your thoughts about the topics covered in this particular quiz.

Checkout for more NPTEL Courses: Click Here!