These are notes taken while working through Andrew Ng's Machine Learning course. Given the gaps in my own knowledge and the fact that the course is taught in English, some things may be misunderstood; corrections and discussion are welcome.

Course materials: https://www.coursera.org/learn/machine-learning

Machine learning falls into two main categories:

Supervised Learning: regression problems and classification problems

Examples: estimating house prices, classifying tumors as benign or malignant

Supervised learning: "right answers" given

  • regression: predict a continuous-valued output (e.g., price)
  • classification: predict a discrete-valued output (e.g., 0 or 1)

Unsupervised Learning: clustering algorithms

Examples: Google News, gene analysis, organizing computing clusters, social network analysis, market segmentation, astronomical data analysis, and the cocktail party problem (separating voices in a recording)

Programming assignment tools: Octave/MATLAB


Week 1 quiz: I kept getting this question wrong.

A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. What would be a reasonable choice for P?

Here is the wiki for this course: https://share.coursera.org/wiki/index.php/ML:Main.

I found the answer there:

Two definitions of Machine Learning are offered. Arthur Samuel described it as: “the field of study that gives computers the ability to learn without being explicitly programmed.” This is an older, informal definition.

Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Example: playing checkers.

  • E = the experience of playing many games of checkers.
  • T = the task of playing checkers.
  • P = the probability that the program will win the next game.

Linear Regression

Regression problem: use a training set to train the model.

Linear regression model:
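In the course's notation, with a single input feature x, the model is a straight line:

$$h_\theta(x) = \theta_0 + \theta_1 x$$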

The idea for finding h: choose the parameters θ so that h_θ(x) is close to y for the training examples.

The function J is called the cost function; it is used to determine the parameters θ in h.
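For m training examples (x^(i), y^(i)), the cost function from the lectures is the (halved) mean squared error:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$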

The relationship between h and J when there is only θ1.

The relationship between h and J with both θ0 and θ1; the plot on the right is called a contour plot.

An algorithm: gradient descent

For minimizing the cost function J.

Gradient descent is not only for minimizing J; it is a very general algorithm.

You can picture the algorithm as standing on a hill: to get down, you repeatedly step in the direction of steepest descent.
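Each step updates every parameter simultaneously, with the learning rate α controlling the step size:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{(simultaneously for } j = 0, 1\text{)}$$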

Multivariate linear regression (linear regression with multiple variables)

Now there are multiple features, rather than just the single feature of floor area.

Note the meaning of the superscripts and subscripts on x. Superscript: which training example; subscript: which feature. That is, x_j^(i) is the value of feature j in the i-th training example.

The hypothesis is now:
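$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$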

Simplifying it with linear algebra:
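Defining x_0 = 1 so that both x and θ are (n+1)-dimensional vectors, the hypothesis collapses to an inner product:

$$h_\theta(x) = \theta^T x$$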

Gradient descent with multiple variables:
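Below is a minimal Octave sketch of batch gradient descent for this case, in the spirit of the programming assignments; the variable names are my own choice, not taken from the assignment starter code. It assumes X is the m-by-(n+1) design matrix whose first column is all ones, y is the m-by-1 vector of targets, and theta is the (n+1)-by-1 parameter vector.

```matlab
% gradientDescent.m -- batch gradient descent for linear regression (a sketch)
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);                    % number of training examples
  J_history = zeros(num_iters, 1);  % J after each iteration, for monitoring
  for iter = 1:num_iters
    h = X * theta;                                 % predictions for all m examples
    theta = theta - (alpha / m) * (X' * (h - y)); % simultaneous update of all theta_j
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);  % current cost
  end
end
```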

Practical tips for using gradient descent:

1) Feature scaling

If the value ranges of different features differ greatly (suppose there are only two features), the resulting contour plot is made of very elongated ellipses; gradient descent then progresses slowly and θ converges slowly. After scaling, the contours are closer to circles and descent is much faster.

There are generally two approaches to feature scaling (see the sketch after this list):

  • One is to divide by the maximum absolute value of the range, so that each feature falls roughly between -1 and +1. (This need not hold strictly; -3 to 3 or -0.3 to 0.3 are fine, as long as the ranges do not differ too much and all features end up on a similar scale.)

  • The other method is called mean normalization: subtract the feature's mean μ and divide by its range or standard deviation s, i.e., replace x with (x - μ)/s.
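A minimal Octave sketch of both approaches, assuming X is an m-by-n matrix of raw feature values (no column of ones yet); the variable names are mine:

```matlab
% Approach 1: divide each feature by its maximum absolute value,
% so every column lands roughly in [-1, 1].
X_scaled = X ./ max(abs(X));

% Approach 2: mean normalization, (x - mu) / s, here with s = std deviation.
mu = mean(X);                 % 1-by-n row of per-feature means
sigma = std(X);               % 1-by-n row of per-feature standard deviations
X_norm = (X - mu) ./ sigma;   % relies on Octave's automatic broadcasting
```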

2) Choosing the learning rate α

For gradient descent:

  • "Debugging": how to make sure gradient descent is working correctly.
  • How to choose the learning rate α.

On every iteration, the value of J should decrease.

You can use an automatic convergence test, e.g., declaring convergence when J decreases by less than 1e-3 in one iteration, but this is not recommended because choosing the threshold is hard.

Choose a smaller α in the problem cases: J increasing with the number of iterations, or J repeatedly going up and down.

Summary: if α is too small, convergence is slow; if α is too large, J may not decrease on every iteration and may fail to converge.

Andrew Ng's way of choosing α: try candidates spaced about 3x apart, e.g., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.
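A hypothetical way to run that comparison, reusing the gradientDescent sketch from above and plotting J against the iteration count for each candidate α:

```matlab
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1];  % candidates ~3x apart
num_iters = 50;
hold on;
for k = 1:length(alphas)
  theta0 = zeros(size(X, 2), 1);                   % start from all-zero parameters
  [~, J_history] = gradientDescent(X, y, theta0, alphas(k), num_iters);
  plot(1:num_iters, J_history);                    % a good alpha slopes down fast
end
xlabel('number of iterations'); ylabel('J(\theta)');
```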

Features and polynomial regression

Choosing features sensibly can effectively reduce the complexity of the hypothesis. For example, two features, the length and width of a house plot, can be combined into a single feature, its area.

Sometimes you can also choose polynomial regression to make the model fit the data better.
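For instance, a cubic model in house size can be fit with plain linear regression by treating the powers of x as separate features; `house_size` below is a hypothetical variable name:

```matlab
x = house_size(:);              % column vector of sizes (hypothetical data)
X_poly = [x, x .^ 2, x .^ 3];   % h = th0 + th1*x + th2*x^2 + th3*x^3
% Feature scaling matters here: x.^3 spans a vastly larger range than x.
```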

How to choose among these options is covered later in the course.

Normal equation:

The linear regression algorithms covered so far:

  1. Gradient descent: iterate many times, gradually converging to the global minimum of J.
  2. Normal equation: a method to solve for θ analytically, obtaining the optimal θ in a single step (see the sketch below).
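The normal equation from the course is

$$\theta = (X^T X)^{-1} X^T y$$

which in Octave is a one-liner (X is the design matrix with a leading column of ones, y the target vector, as above):

```matlab
theta = pinv(X' * X) * X' * y;  % pinv also copes with a non-invertible X'X
```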

Vectorization:

Vectorizing your code makes the computation faster and more efficient, as in the comparison below.
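As a small illustration, here is the hypothesis for one example x computed both ways; the vectorized form is shorter and runs faster:

```matlab
% Unvectorized: accumulate the terms of theta' * x one by one.
h = 0;
for j = 1:length(theta)
  h = h + theta(j) * x(j);
end

% Vectorized: a single inner product does the same work.
h = theta' * x;
```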