Machine Learning Part 1 - Linear Regression
These are notes taken while working through Andrew Ng's Machine Learning course. Given the limits of my own knowledge and the course being taught in English, some of my understanding may be off; discussion and corrections are welcome.
Course materials: https://www.coursera.org/learn/machine-learning
Machine learning falls into two main categories:
Supervised learning: regression problems and classification problems
Examples: estimating house prices, judging whether a tumor is benign or malignant
Supervised learning: "right answers" given
- Regression: predict continuous valued output (price)
- Classification: discrete valued output (0 or 1)
Unsupervised learning: clustering algorithms
Examples: Google News story grouping, gene clustering, organizing computing clusters, social network analysis, market segmentation, astronomical data analysis, the cocktail party problem (separating voices in a recording)
Programming assignment tools: Octave/MATLAB
Week 1 quiz: I kept getting this question wrong:
A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. What would be a reasonable choice for P?
The course wiki is at https://share.coursera.org/wiki/index.php/ML:Main. I found the answer there.
Two definitions of Machine Learning are offered. Arthur Samuel described it as: “the field of study that gives computers the ability to learn without being explicitly programmed.” This is an older, informal definition.
Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example: playing checkers.
- E = the experience of playing many games of checkers
- T = the task of playing checkers.
- P = the probability that the program will win the next game.
Linear Regression
Regression problem: the hypothesis is trained on a training set.
Linear regression model:
The idea for finding h: choose the parameters θ so that h(x) is close to y on the training examples.
The function J is called the cost function; it is used to determine the θ in h.
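For reference, the two formulas this part of the course builds on (single-variable case):

$$h_\theta(x) = \theta_0 + \theta_1 x$$

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where m is the number of training examples; the θ that minimizes J gives the best-fit line.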
(Figure: the relationship between h and J when there is only θ1.)
(Figure: the relationship between h and J with both θ0 and θ1; the plot on the right is called a contour plot.)
An algorithm: gradient descent
For minimizing the cost function J.
Gradient descent is not limited to minimizing this particular J; it is a very general algorithm.
You can picture it as standing on a hill and repeatedly taking a step in whichever direction descends fastest.
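The update rule, repeated until convergence (all θ_j must be updated simultaneously):

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1)$$

where α is the learning rate controlling the step size.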
Multivariate linear regression (linear regression with multiple variables)
Now there are multiple features rather than just the single feature of area.
Note what the superscripts and subscripts on x mean: the superscript indexes the training example; the subscript indexes the feature.
The hypothesis now becomes a weighted sum over all the features, which linear algebra lets us write compactly:
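With the course's convention x_0 = 1, so that θ and x are both (n+1)-dimensional vectors:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^T x$$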
Gradient descent for multiple variables
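The update rule has the same shape as before, now for each of the n+1 parameters (again all updated simultaneously):

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad (j = 0, 1, \ldots, n)$$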
Practical tips for gradient descent
1) Feature scaling
If the features (the input variables) have ranges of very different sizes, then (taking the two-feature case) the contour plot of J becomes very elongated, gradient descent zig-zags, and θ converges slowly, as the lecture's side-by-side contour plots show.
There are generally two ways to scale features (see the sketch after this list):
- Divide each feature by the maximum absolute value of its range, so that it lies roughly in [-1, +1]. (This need not be exact: -3 to 3 or -0.3 to 0.3 are also fine, as long as no feature's range differs too much from the others'.)
- The other method is mean normalization: replace each x_i with (x_i - μ_i)/s_i, where μ_i is the feature's mean and s_i its range (or standard deviation).
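A minimal NumPy sketch of the two approaches (the function names are my own, not from the course):

```python
import numpy as np

def max_abs_scale(X):
    """Divide each feature (column) by its maximum absolute value -> roughly [-1, 1]."""
    return X / np.max(np.abs(X), axis=0)

def mean_normalize(X):
    """Mean normalization: subtract the column mean, divide by the range (max - min)."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)  # the standard deviation also works as s
    return (X - mu) / s
```

Either way, remember to apply the same scaling parameters to any later inputs you predict on.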
2) Choosing the learning rate α
For gradient descent:
- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate α.
J should decrease on every single iteration.
An automatic convergence test is possible, e.g., declare convergence once J decreases by less than 1e-3 in an iteration, but it is not recommended because picking that threshold is hard; plotting J against the iteration count works better.
Whenever J increases or oscillates from iteration to iteration (the failure plots shown in the lecture), choose a smaller α.
Summary: if α is too small, convergence is slow; if α is too large, J may not decrease on every iteration and may never converge.
Andrew Ng's way of choosing α: try values spaced roughly 3× apart, e.g., ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
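A small sketch, assuming NumPy and a design matrix X whose first column is all ones, of a gradient descent loop that records J so it can be plotted against the iteration count (names are my own):

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent for linear regression.

    X: (m, n+1) design matrix with a leading column of ones (x0 = 1)
    y: (m,) target vector
    Returns the learned theta and the history of J, one value per iteration.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y                        # h_theta(x^(i)) - y^(i) for all i
        J_history.append((error @ error) / (2 * m))  # J(theta) before this update
        theta = theta - (alpha / m) * (X.T @ error)  # simultaneous update of all theta_j
    return theta, J_history
```

Running this for each candidate α and plotting J_history makes diverging or slowly converging learning rates easy to spot.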
Features and polynomial regression
Choosing features sensibly can reduce the complexity of the hypothesis. For example, two features such as the length and width of a house lot can be combined into a single feature: the area.
Sometimes polynomial regression makes the model fit the data better.
How to choose among these models is covered later in the course.
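For instance, with a single input x (house size), a cubic model can be fit with the same linear regression machinery by defining new features:

$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3, \qquad x_1 = x,\ x_2 = x^2,\ x_3 = x^3$$

Feature scaling then matters even more, since x, x², and x³ have wildly different ranges.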
Normal equation
The linear regression algorithms so far:
- Gradient descent: iterate many times, gradually converging to the global minimum of J.
- Normal equation: a method to solve for θ analytically, producing the optimal θ in a single step.
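With X the m × (n+1) design matrix (one training example per row, with a leading 1) and y the vector of targets, the normal equation is:

$$\theta = (X^T X)^{-1} X^T y$$

There is no learning rate to choose and no iteration, but inverting XᵀX costs roughly O(n³), so gradient descent wins when the number of features n is very large.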
Vectorization
Vectorizing your code makes the computation faster and more efficient.
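A minimal sketch, assuming NumPy, contrasting an unvectorized loop with the vectorized inner product for computing h_θ(x):

```python
import numpy as np

# Unvectorized: accumulate theta_j * x_j in an explicit loop
def predict_loop(theta, x):
    prediction = 0.0
    for j in range(len(theta)):
        prediction += theta[j] * x[j]
    return prediction

# Vectorized: a single inner product, which the library executes far faster
def predict_vectorized(theta, x):
    return theta @ x

theta = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 5.0, 2.0])        # x[0] = 1 is the intercept term
print(predict_loop(theta, x))        # 17.0
print(predict_vectorized(theta, x))  # 17.0
```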