1. You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.【D】
A. Introducing regularization to the model always results in equal or better performance on the training set.
【Explanation】If the regularization parameter λ is too large, the model will underfit, and the final result will be worse.
B. Adding many new features to the model helps prevent overfitting on the training set.
【Explanation】Adding many new features lets the model fit the training data more closely, but with too many features the model may overfit, fail to generalize to new data, and therefore predict less accurately.
C. Adding a new feature to the model always results in equal or better performance on examples not in the training set.
【Explanation】Adding a new feature may cause overfitting, which leads to worse predictions on unseen examples even as the fit to the training set improves.
D. Adding a new feature to the model always results in equal or better performance on the training set.
【Explanation】By adding a new feature, the model becomes more (or at least as) expressive, allowing it to learn more complex hypotheses that fit the training set at least as well.
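As a quick numerical illustration of why D holds (a minimal numpy sketch using ordinary least squares in place of logistic regression, on synthetic data): adding a column to the design matrix can only widen the space of fittable hypotheses, so the best achievable training error never gets worse.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))   # 50 examples, 2 original features
y = rng.normal(size=50)        # synthetic targets

def train_sse(X, y):
    # Least-squares fit; returns the sum of squared errors on the training set
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((X @ theta - y) ** 2))

# Add one new (random) feature column
X_more = np.hstack([X, rng.normal(size=(50, 1))])

sse_before = train_sse(X, y)
sse_after = train_sse(X_more, y)
# The training fit can only improve (or tie) when a feature is added
assert sse_after <= sse_before + 1e-9
```

The same logic carries over to logistic regression: the old hypothesis is still reachable by setting the new feature's weight to zero, so training performance cannot degrade.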
2. Which of the following statements are true? Check all that apply.【BD】
A. Suppose you have a multi-class classification problem with three classes, trained with a 3-layer network. Let a^(3)_1 = (h_Θ(x))_1 be the activation of the first output unit, and similarly a^(3)_2 = (h_Θ(x))_2 and a^(3)_3 = (h_Θ(x))_3. Then for any input x, it must be the case that a^(3)_1 + a^(3)_2 + a^(3)_3 = 1.
B. In a neural network with many layers, we think of each successive layer as being able to use the earlier layers as features, so as to be able to compute increasingly complex functions.
C. If a neural network is overfitting the data, one solution would be to decrease the regularization parameter λ.
D. If a neural network is overfitting the data, one solution would be to increase the regularization parameter λ.
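For A, the outputs of a sigmoid-activated output layer are computed independently, so nothing forces them to sum to 1 (that property belongs to a softmax layer). A minimal sketch with arbitrary, hypothetical final-layer inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three independent sigmoid output units (no softmax); the logits are arbitrary
z_out = np.array([2.0, -1.0, 0.5])
a3 = sigmoid(z_out)

# Each activation lies in (0, 1), but the sum is generally not 1
print(a3, a3.sum())
```

Here the activations sum to roughly 1.77, so the constraint in A does not hold for a plain sigmoid output layer.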
3. You are using the neural network pictured below and have learned the parameters Θ^(1) = [1 −1.5 3.7; 1 5.1 2.3] (used to compute a^(2)) and Θ^(2) = [1 0.6 −0.8] (used to compute a^(3) as a function of a^(2)). Suppose you swap the parameters for the first hidden layer between its two units, so Θ^(1) = [1 5.1 2.3; 1 −1.5 3.7], and also swap the output layer, so Θ^(2) = [1 −0.8 0.6]. How will this change the value of the output hΘ(x)?【A】
A. It will stay the same.
B. It will increase.
C. It will decrease.
D. Insufficient information to tell: it may increase or decrease.
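The answer can be checked numerically: swapping the two hidden units (the rows of Θ^(1)) while also swapping the corresponding output weights (the last two entries of Θ^(2)) merely relabels the hidden units, so the network computes the same function. A minimal sketch (sigmoid activations assumed; the input x is an arbitrary choice for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Theta1, Theta2, x):
    # Prepend the bias unit, compute hidden activations, then the output
    a2 = sigmoid(Theta1 @ np.concatenate(([1.0], x)))
    a3 = sigmoid(Theta2 @ np.concatenate(([1.0], a2)))
    return a3

Theta1 = np.array([[1.0, -1.5, 3.7],
                   [1.0,  5.1, 2.3]])
Theta2 = np.array([1.0, 0.6, -0.8])

# Swap the two hidden units (rows of Theta1) and the matching output weights
Theta1_swapped = Theta1[[1, 0], :]
Theta2_swapped = np.array([1.0, -0.8, 0.6])

x = np.array([0.4, -0.7])  # arbitrary test input (assumption)
out_original = forward(Theta1, Theta2, x)
out_swapped = forward(Theta1_swapped, Theta2_swapped, x)
print(out_original, out_swapped)  # identical up to floating point
```

Relabeling hidden units this way is a symmetry of the network, which is also why neural-network parameters need random (not identical) initialization.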
4. Which of the following statements are true? Check all that apply.【BD】
A. Suppose you are training a logistic regression classifier using polynomial features and want to select what degree polynomial (denoted d in the lecture videos) to use. After training the classifier on the entire training set, you decide to use a subset of the training examples as a validation set. This will work just as well as having a validation set that is separate (disjoint) from the training set.
B. Suppose you are using linear regression to predict housing prices, and your dataset comes sorted in order of increasing sizes of houses. It is then important to randomly shuffle the dataset before splitting it into training, validation and test sets, so that we don't have all the smallest houses going into the training set and all the largest houses going into the test set.
C. It is okay to use data from the test set to choose the regularization parameter λ, but not the model parameters (θ).
D. A typical split of a dataset into training, validation and test sets might be 60% training set, 20% validation set, and 20% test set.
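Statements B and D can be combined into a short sketch (synthetic house-size data with hypothetical numbers): shuffle first, then take a 60/20/20 split, so each split sees the full range of house sizes.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
sizes = np.sort(rng.uniform(50, 400, size=m))        # dataset arrives sorted by size
prices = 100 * sizes + rng.normal(0, 1000, size=m)   # synthetic prices

# Shuffle before splitting so no split gets only the smallest or largest houses
perm = rng.permutation(m)
sizes, prices = sizes[perm], prices[perm]

# 60% training / 20% validation / 20% test
n_train, n_val = int(0.6 * m), int(0.2 * m)
X_train, X_val, X_test = np.split(sizes, [n_train, n_train + n_val])
print(len(X_train), len(X_val), len(X_test))
```

Without the shuffle, `np.split` on the sorted array would put all small houses in training and all large ones in test, exactly the failure mode B warns about.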
5. Suppose you have a dataset with n = 10 features and m = 5000 examples. After training your logistic regression classifier with gradient descent, you find that it has underfit the training set and does not achieve the desired performance on the training or cross validation sets. Which of the following might be promising steps to take? Check all that apply.【AC】
A. Use an SVM with a Gaussian Kernel.
【Explanation】An SVM with a Gaussian kernel can fit more complex decision boundaries, which can correct the underfitting to some extent.
B. Use a different optimization method since using gradient descent to train logistic regression might result in a local minimum.
C. Create / add new polynomial features.
D. Increase λ.
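For C, a minimal sketch of how adding a polynomial feature repairs an underfit (synthetic quadratic data, with least squares standing in for logistic regression):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=80)
y = x ** 2 + 0.1 * rng.normal(size=80)   # quadratic ground truth plus noise

def sse(X, y):
    # Least-squares fit; returns sum of squared errors on the training set
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((X @ theta - y) ** 2))

X_lin = np.column_stack([np.ones_like(x), x])            # linear model: underfits
X_poly = np.column_stack([np.ones_like(x), x, x ** 2])   # add a polynomial feature

print(sse(X_lin, y), sse(X_poly, y))  # the polynomial model fits far better
```

Increasing λ (option D) would push the model toward an even simpler hypothesis and make the underfitting worse, which is why it is not among the correct answers.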