Multivariate Linear Regression Cost Too High

Dandy Cheng
·Jan 2, 2020

I am working on price prediction with the data set provided in this link, imports-85.data.

Using horsepower, curb-weight, engine-size and highway-mpg as features, I tried to normalize the data (because the cost was very high) and then ran gradient descent with the following implementation:

Initialization

import numpy as np

# df (the loaded DataFrame) and attrs (the feature column names) are defined earlier
data = df[attrs].copy()  # copy, so the in-place edits below do not write into a slice of df
m = len(data) # m training examples
f = len(attrs) # f features

# Normalization (done before X is built, so that X holds the scaled values)
data['curb-weight'] = (data['curb-weight'] * 0.453592) / 1000    # lb to kg, then in thousands of kg
data['highway-mpg'] = data['highway-mpg'] * 0.425144    # mpg to km per litre (kml)
data['engine-size'] = data['engine-size'] / 100     # In hundreds
data['horsepower'] = data['horsepower'] / 100   # In hundreds

col_rename = {
    'curb-weight':'curb-weight-kg(e-1000)',
    'highway-mpg':'highway-kml',
    'engine-size':'engine-size(e-100)',
    'horsepower':'horsepower(e-100)'
}
data.rename(columns=col_rename,inplace=True)

X = np.hstack((np.ones(shape=(m,1)),np.array(data)))  # Prepend the bias column x(0) = 1
T = np.zeros(f + 1) # Coefficients of x(0),x(1),...x(f)
norm_price = df.price / 1000  # Price in thousands
Y = np.array(norm_price)
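For comparison, a common alternative to dividing each column by a hand-picked constant is z-score standardization, where each feature is shifted to zero mean and scaled to unit standard deviation. The sketch below is only an illustration of that technique, not what the code above does: the helper `standardize` is a hypothetical name and the four rows are made-up values in the order horsepower, curb-weight, engine-size, highway-mpg.

```python
import numpy as np

def standardize(features):
    """Return z-scored columns plus the per-column mean and standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / std, mean, std

# Made-up rows: horsepower, curb-weight, engine-size, highway-mpg
raw = np.array([
    [111.0, 2548.0, 130.0, 27.0],
    [154.0, 2823.0, 152.0, 26.0],
    [102.0, 2337.0, 109.0, 30.0],
    [115.0, 2824.0, 136.0, 22.0],
])

scaled, mean, std = standardize(raw)

# The bias column is added after scaling so that it stays at 1
X_scaled = np.hstack((np.ones((raw.shape[0], 1)), scaled))
```

The returned `mean` and `std` would need to be kept, since any new example has to be scaled with the same values before a prediction is made.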

Cost calculation

def calculateCost():
    global m,T,X,Y
    # Squared-error cost: (XT - Y)'(XT - Y) / (2m)
    residual = X.dot(T) - Y
    return residual.dot(residual) / (2 * m)

Gradient descent

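def gradDescent(threshold,iter = 10000,alpha = 3e-8):
    global T,X,Y,m
    i = 0
    cost = calculateCost()
    cost_hist = [cost]
    while i < iter:
        # Batch update: T <- T - (alpha / m) * X'(XT - Y)
        T = T - (alpha / m) * X.transpose().dot(X.dot(T) - Y)
        cost = calculateCost()
        cost_hist.append(cost)
        i += 1
        if cost <= threshold:
            break  # Cost dropped to the threshold or below
    # Return the history even when the threshold is never reached
    return cost_hist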
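For reference, the two functions above correspond to the usual least-squares cost and its batch gradient step. Written with the same names as the code ($X$, $Y$, $T$, $m$, and $\alpha$ for `alpha`), they would read roughly as follows:

```latex
J(T) = \frac{1}{2m}\,(XT - Y)^{\top}(XT - Y)

\nabla_T J(T) = \frac{1}{m}\,X^{\top}(XT - Y)

T \leftarrow T - \alpha\,\nabla_T J(T) = T - \frac{\alpha}{m}\,X^{\top}(XT - Y)
```

Because $J$ is quadratic in $XT - Y$, rescaling the target or the features changes the magnitude of the cost, which is why an absolute threshold depends on the units used.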

I ran the gradient descent with this implementation:

[Image: adzkdc-1.png]

Without normalization, the cost is 118634960.460199. With normalization, the cost is 118.634960460199.
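The two figures differ by a factor of exactly 10^6, which is consistent with the effect of dividing the price by 1000 alone: the cost is quadratic in the target, so at T = 0 a factor of 1/1000 on Y becomes 1/1000^2 on the cost, whatever the feature scaling. A minimal sketch of that arithmetic, using made-up prices and a single made-up feature (the `cost` helper is a stand-in that takes its inputs as arguments instead of globals):

```python
import numpy as np

def cost(X, Y, T):
    """Squared-error cost (XT - Y)'(XT - Y) / (2m)."""
    residual = X.dot(T) - Y
    return residual.dot(residual) / (2 * len(Y))

# Made-up prices and one made-up feature column, plus the bias column
prices = np.array([13495.0, 16500.0, 13950.0, 17450.0])
feature = np.array([[1.11], [1.54], [1.02], [1.15]])
X = np.hstack((np.ones((4, 1)), feature))
T = np.zeros(X.shape[1])  # all coefficients zero, as at the start of training

cost_raw = cost(X, prices, T)
cost_scaled = cost(X, prices / 1000, T)
ratio = cost_raw / cost_scaled  # approximately 1e6
```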

As a result, I have a few questions:

  1. Is my normalization technique correct?
  2. Normalization changes the scale of the cost. How should I choose the cost threshold after normalizing?
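On the second question, one possible approach (a sketch, not the only answer) is to stop on the relative change in cost between iterations instead of an absolute cost value, since a relative test is not affected by the units of the price. The function `grad_descent_rel_tol`, its parameters `rel_tol` and `max_iter`, and the data below are all hypothetical and chosen only for illustration.

```python
import numpy as np

def grad_descent_rel_tol(X, Y, alpha=0.1, rel_tol=1e-9, max_iter=100000):
    """Batch gradient descent that stops when the cost stops improving relatively."""
    m = len(Y)
    T = np.zeros(X.shape[1])
    residual = X.dot(T) - Y
    cost_hist = [residual.dot(residual) / (2 * m)]
    for _ in range(max_iter):
        T = T - (alpha / m) * X.T.dot(residual)
        residual = X.dot(T) - Y
        cost_hist.append(residual.dot(residual) / (2 * m))
        prev, curr = cost_hist[-2], cost_hist[-1]
        # Relative test: independent of the scale of Y
        if abs(prev - curr) <= rel_tol * max(prev, 1e-300):
            break
    return T, cost_hist

# Made-up data: a line plus a small deterministic wiggle
x = np.linspace(-1.0, 1.0, 20)
X = np.column_stack((np.ones_like(x), x))
Y = 2.0 + 3.0 * x + 0.1 * np.sin(7.0 * x)

T, hist = grad_descent_rel_tol(X, Y)
T_scaled, hist_scaled = grad_descent_rel_tol(X, Y / 1000)
```

In this sketch the run on `Y / 1000` should take essentially the same number of iterations as the run on `Y`, while an absolute threshold would have to be rescaled by hand.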