These are notes taken while working through Andrew Ng's Machine Learning course. Given the gaps in my own knowledge and the fact that the course is taught in English, some things may be misunderstood; corrections and discussion are welcome.

Course materials: https://www.coursera.org/learn/machine-learning

Machine learning falls into two main categories:

Supervised Learning: regression problems and classification problems

Examples: estimating house prices, classifying tumors as benign or malignant

Supervised learning: "right answers" given

  • regression: predict a continuous-valued output (e.g., price)
  • classification: predict a discrete-valued output (e.g., 0 or 1)

Unsupervised Learning: clustering algorithms

Examples: Google News, gene analysis, organizing computing clusters, social network analysis, market segmentation, astronomical data analysis, and the cocktail party problem (separating voices in a recording)

Programming assignment tools: Octave/MATLAB


Week 1 quiz: I kept getting this question wrong.

A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. What would be a reasonable choice for P?

Here is the wiki for this course: https://share.coursera.org/wiki/index.php/ML:Main.

I found the answer there:

Two definitions of Machine Learning are offered. Arthur Samuel described it as: “the field of study that gives computers the ability to learn without being explicitly programmed.” This is an older, informal definition.

Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Example: playing checkers.

  • E = the experience of playing many games of checkers.
  • T = the task of playing checkers.
  • P = the probability that the program will win the next game.

Linear Regression

Regression problem: use a training set to train the model.

Linear regression model:
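In the course's notation, with a single input feature x, the model is a straight line:

$$h_\theta(x) = \theta_0 + \theta_1 x$$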

The idea for finding h: choose the parameters θ so that h_θ(x) is close to y for the training examples.

The function J is called the cost function; it is used to determine the parameters θ in h.
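For m training examples (x^(i), y^(i)), the cost function from the lectures is the (halved) mean squared error:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$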

The relationship between h and J when there is only θ1.

The relationship between h and J with both θ0 and θ1; the plot on the right is called a contour plot.

An algorithm: gradient descent

For minimizing the cost function J.

Gradient descent is not only for minimizing J; it is a very general algorithm.

You can picture the algorithm as standing on a hill: to get down, you repeatedly step in the direction of steepest descent.
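Each step updates every parameter simultaneously, with the learning rate α controlling the step size:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{(simultaneously for } j = 0, 1\text{)}$$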

Multivariate linear regression (linear regression with multiple variables)

Now there are multiple features, rather than just the single feature of floor area.

Note the meaning of the superscripts and subscripts on x. Superscript: which training example; subscript: which feature. That is, x_j^(i) is the value of feature j in the i-th training example.

The hypothesis is now:
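$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$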

Simplifying it with linear algebra:
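Defining x_0 = 1 so that both x and θ are (n+1)-dimensional vectors, the hypothesis collapses to an inner product:

$$h_\theta(x) = \theta^T x$$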

Gradient descent with multiple variables:
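Below is a minimal Octave sketch of batch gradient descent for this case, in the spirit of the programming assignments; the variable names are my own choice, not taken from the assignment starter code. It assumes X is the m-by-(n+1) design matrix whose first column is all ones, y is the m-by-1 vector of targets, and theta is the (n+1)-by-1 parameter vector.

```matlab
% gradientDescent.m -- batch gradient descent for linear regression (a sketch)
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);                    % number of training examples
  J_history = zeros(num_iters, 1);  % J after each iteration, for monitoring
  for iter = 1:num_iters
    h = X * theta;                                 % predictions for all m examples
    theta = theta - (alpha / m) * (X' * (h - y)); % simultaneous update of all theta_j
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);  % current cost
  end
end
```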

Practical tips for using gradient descent:

1) Feature scaling

If the value ranges of different features differ greatly (suppose there are only two features), the resulting contour plot is made of very elongated ellipses; gradient descent then progresses slowly and θ converges slowly. After scaling, the contours are closer to circles and descent is much faster.

There are generally two approaches to feature scaling (see the sketch after this list):

  • One is to divide by the maximum absolute value of the range, so that each feature falls roughly between -1 and +1. (This need not hold strictly; -3 to 3 or -0.3 to 0.3 are fine, as long as the ranges do not differ too much and all features end up on a similar scale.)

  • The other method is called mean normalization: subtract the feature's mean μ and divide by its range or standard deviation s, i.e., replace x with (x - μ)/s.
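A minimal Octave sketch of both approaches, assuming X is an m-by-n matrix of raw feature values (no column of ones yet); the variable names are mine:

```matlab
% Approach 1: divide each feature by its maximum absolute value,
% so every column lands roughly in [-1, 1].
X_scaled = X ./ max(abs(X));

% Approach 2: mean normalization, (x - mu) / s, here with s = std deviation.
mu = mean(X);                 % 1-by-n row of per-feature means
sigma = std(X);               % 1-by-n row of per-feature standard deviations
X_norm = (X - mu) ./ sigma;   % relies on Octave's automatic broadcasting
```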

2) Choosing the learning rate α

For gradient descent:

  • "Debugging": how to make sure gradient descent is working correctly.
  • How to choose the learning rate α.

On every iteration, the value of J should decrease.

You can use an automatic convergence test, e.g., declaring convergence when J decreases by less than 1e-3 in one iteration, but this is not recommended because choosing the threshold is hard.

Choose a smaller α in the problem cases: J increasing with the number of iterations, or J repeatedly going up and down.

Summary: if α is too small, convergence is slow; if α is too large, J may not decrease on every iteration and may fail to converge.

Andrew Ng's way of choosing α: try candidates spaced about 3x apart, e.g., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.
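A hypothetical way to run that comparison, reusing the gradientDescent sketch from above and plotting J against the iteration count for each candidate α:

```matlab
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1];  % candidates ~3x apart
num_iters = 50;
hold on;
for k = 1:length(alphas)
  theta0 = zeros(size(X, 2), 1);                   % start from all-zero parameters
  [~, J_history] = gradientDescent(X, y, theta0, alphas(k), num_iters);
  plot(1:num_iters, J_history);                    % a good alpha slopes down fast
end
xlabel('number of iterations'); ylabel('J(\theta)');
```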

Features and polynomial regression

Choosing features sensibly can effectively reduce the complexity of the hypothesis. For example, two features, the length and width of a house plot, can be combined into a single feature, its area.

Sometimes you can also choose polynomial regression to make the model fit the data better.
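For instance, a cubic model in house size can be fit with plain linear regression by treating the powers of x as separate features; `house_size` below is a hypothetical variable name:

```matlab
x = house_size(:);              % column vector of sizes (hypothetical data)
X_poly = [x, x .^ 2, x .^ 3];   % h = th0 + th1*x + th2*x^2 + th3*x^3
% Feature scaling matters here: x.^3 spans a vastly larger range than x.
```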

How to choose among these options is covered later in the course.

Normal equation:

The linear regression algorithms covered so far:

  1. Gradient descent: iterate many times, gradually converging to the global minimum of J.
  2. Normal equation: a method to solve for θ analytically, obtaining the optimal θ in a single step (see the sketch below).
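The normal equation from the course is

$$\theta = (X^T X)^{-1} X^T y$$

which in Octave is a one-liner (X is the design matrix with a leading column of ones, y the target vector, as above):

```matlab
theta = pinv(X' * X) * X' * y;  % pinv also copes with a non-invertible X'X
```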

Vectorization:

Vectorizing your code makes the computation faster and more efficient, as in the comparison below.
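As a small illustration, here is the hypothesis for one example x computed both ways; the vectorized form is shorter and runs faster:

```matlab
% Unvectorized: accumulate the terms of theta' * x one by one.
h = 0;
for j = 1:length(theta)
  h = h + theta(j) * x(j);
end

% Vectorized: a single inner product does the same work.
h = theta' * x;
```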