Weather predictions

Published

November 4, 2022

For the first project, Cam wants to work with the weather dataset he found online.

Cam’s Machine Learning class has been a ton of fun. The professor let the students choose a problem to solve so they can showcase what they’ve learned so far.

Cam wants to predict the probability of snow based on four factors: the date, the air temperature, the location, and the air pressure.

Which of the following is the most appropriate algorithm that Cam should use to solve this problem?

  1. Cam should use linear regression.

  2. Cam should use logistic regression.

  3. Cam should use K-means.

  4. Cam should use DBSCAN.

Attempt

Logistic regression is used when the dependent variable (the target) is categorical. For example:

  1. To predict whether an email is spam (1) or not (0)
  2. To predict whether a tumor is malignant (1) or not (0)

Consider a scenario where we need to classify whether a tumor is malignant or not. If we use linear regression for this problem, we need to set a threshold on its output to turn the prediction into a class. Say the actual class is malignant, the predicted continuous value is 0.4, and the threshold is 0.5: the data point will be classified as not malignant, which can have serious consequences in practice.

From this example, it can be inferred that linear regression is not suitable for classification problems. Its output is unbounded, so it can fall below 0 or above 1 and cannot be read as a probability. This is where logistic regression comes into the picture.
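The contrast between an unbounded linear output and a bounded probability can be sketched in a few lines. This is only an illustration: the weight `w`, the bias `b`, and the input values are made up, and `linear_score` and `sigmoid` are hypothetical helper names.

```python
import math

def linear_score(x, w=0.08, b=0.1):
    """Raw linear output w*x + b; nothing keeps it inside [0, 1]."""
    return w * x + b

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

inputs = [-20, 0, 5, 20]                  # made-up feature values
raw = [linear_score(x) for x in inputs]   # some values fall outside [0, 1]
squashed = [sigmoid(z) for z in raw]      # every value lies strictly in (0, 1)

for x, r, s in zip(inputs, raw, squashed):
    print(f"x={x:>4}  linear={r:6.2f}  sigmoid={s:.3f}")
```

Logistic regression applies this kind of squashing to a linear combination of the inputs, which is why its output can be interpreted as a probability.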

When to use logistic regression?

Logistic regression is applied to predict a categorical dependent variable. In other words, it’s used when the prediction is categorical, for example, yes or no, true or false, 0 or 1. The model outputs a probability between 0 and 1, and the predicted class is one of the two categories, with no middle ground.
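As a small sketch of how a predicted probability becomes a categorical answer, the snippet below applies a cutoff to a few made-up probabilities. The function name `classify` and the 0.5 threshold are assumptions chosen for illustration; a real application might pick a different threshold.

```python
def classify(probability, threshold=0.5):
    """Turn a predicted probability into a class label (1 = yes, 0 = no)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1 if probability >= threshold else 0

predicted = [0.05, 0.4, 0.5, 0.93]          # made-up model outputs
labels = [classify(p) for p in predicted]   # each label is either 0 or 1
print(labels)
```

The probability itself can take any value between 0 and 1; only the final label is restricted to the two categories.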

In essence, logistic regression solves classification problems and problems that ask for the probability of a class. Its output is always a probability or a class label derived from one.

While logistic regression predicts a categorical variable from one or more independent variables, linear regression predicts a continuous variable. In other words, logistic regression provides a discrete output, whereas linear regression offers a continuous output.

Since the outcome is continuous in linear regression, there are infinitely many possible values for the outcome. But for logistic regression, the number of possible outcome classes is limited.

Cam’s problem has two possible outcomes: it will snow or it won’t, and Cam wants his model to return the probability of snow. Logistic regression is a natural fit for problems with a binary outcome.

Logistic regression estimates the probability of an event occurring based on a dataset of independent variables. Linear regression, on the other hand, predicts the continuous dependent variable using a dataset of independent variables.
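To make this concrete, here is a minimal from-scratch sketch of logistic regression trained with batch gradient descent. Everything in it is an assumption for illustration: the data is synthetic (it is not Cam’s dataset), only two of the four factors are used (a scaled temperature and a pressure anomaly), and the labeling rule "snow when the temperature is below 0 °C" is invented. In practice a library implementation, such as scikit-learn’s `LogisticRegression`, would normally be used instead.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Estimated probability of snow for one feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi   # prediction minus label
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic data (an assumption): label 1 = snow when temperature < 0 C.
random.seed(0)
X, y = [], []
for _ in range(120):
    temp_c = random.uniform(-10.0, 15.0)
    pressure_anomaly = random.uniform(-2.0, 2.0)   # carries no signal here
    X.append([temp_c / 10.0, pressure_anomaly])    # crude feature scaling
    y.append(1 if temp_c < 0 else 0)

w, b = train_logistic(X, y)
p_cold = predict_proba([-0.8, 0.0], w, b)   # -8 C, average pressure
p_warm = predict_proba([1.2, 0.0], w, b)    # 12 C, average pressure
print(f"P(snow | -8 C) = {p_cold:.3f}")
print(f"P(snow | 12 C) = {p_warm:.3f}")
```

The model returns a probability for each input rather than a bare yes or no, which is exactly what Cam asked for. Real features such as the date and the location would need to be encoded numerically before they could be used this way.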

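K-means and DBSCAN are clustering algorithms. They group unlabeled data points by similarity and do not predict the probability of a labeled outcome, so they are not a good approach for this problem.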
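A toy example shows why clustering does not answer Cam’s question. The sketch below is a deliberately tiny one-dimensional version of K-means with two clusters; the function name `kmeans_two_clusters`, the starting centroids, and the temperature readings are all assumptions made for illustration.

```python
def kmeans_two_clusters(values, iterations=10):
    """Tiny 1-D K-means with k=2: returns cluster indices and centroids."""
    # Deterministic start (an assumption): use the smallest and largest value.
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

temperatures = [-8.0, -5.5, -3.0, 9.0, 12.5, 14.0]   # made-up readings in C
labels, centroids = kmeans_two_clusters(temperatures)
print(labels)
print(centroids)
```

The algorithm never sees whether it snowed, and its output is a group index for each reading rather than a probability of snow.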