09/20 About Crab Molt model and T-test.

In today’s class, we discussed two important topics: the Crab Molt Model and the T-Test.

The Crab Molt Model is a way to make predictions when we have data that doesn’t follow a normal pattern. Imagine you have information about crabs, and sometimes their sizes don’t follow the usual pattern. This model helps us predict how big a crab was before it shed its old shell (its pre-molt size) by looking at its size after the molt.

Post-molt data is what we collect when crabs have just shed their old shells and are growing new ones. Pre-molt data is information we gather from crabs just before they shed their old shells, which can show us how they are changing.
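As a rough sketch, the crab molt model can be fit as a simple linear regression of pre-molt size on post-molt size. The sizes below are made-up illustrative numbers, not the actual class dataset:

```python
import numpy as np

# Hypothetical post-molt and pre-molt carapace sizes (mm), for illustration only
post_molt = np.array([140.2, 145.1, 150.3, 155.0, 160.4, 165.2])
pre_molt = np.array([127.0, 131.5, 136.8, 141.2, 146.0, 150.9])

# Fit a simple linear model: pre_molt ≈ slope * post_molt + intercept
slope, intercept = np.polyfit(post_molt, pre_molt, 1)

# Predict the pre-molt size of a crab measured at 152 mm after molting
predicted = slope * 152 + intercept
```

Once fitted, the same line can be applied to any new post-molt measurement to estimate what the crab’s size was before the molt.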

The T-Test is a statistical tool used to figure out if the differences we see between two groups are real or just random. This is really useful in research and data analysis when we want to compare two things to make sure our conclusions are reliable.
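A minimal example of comparing two groups with a t-test, using `scipy.stats.ttest_ind`. The two groups here are hypothetical samples drawn from distributions with different means:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two hypothetical groups drawn from normal distributions with different means
group_a = rng.normal(loc=10.0, scale=2.0, size=100)
group_b = rng.normal(loc=12.0, scale=2.0, size=100)

# ttest_ind tests whether the difference between the two group means
# is larger than random chance would explain
t_stat, p_value = stats.ttest_ind(group_a, group_b)
```

A small p-value suggests the difference between the groups is real rather than random noise.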

9/18 - Quadratic model and Overfitting

Quadratic model:

A quadratic model is a type of mathematical model used in statistics and many other fields to describe the relationship between a dependent and an independent variable by fitting a quadratic equation. It is a form of polynomial regression in which the relationship between the variables is modeled as a quadratic function.

The general form of a quadratic model is as follows:

y = ax² + bx + c

In this equation:

  • y represents the dependent variable
  • x represents the independent variable
  • a, b, and c are constants, where a must not equal 0 (zero)

Quadratic models are applied when the relationship between the dependent and independent variables is not linear but instead follows a curved, U-shaped, or parabolic pattern. Once the model is fitted, it can be used for making predictions or for understanding the relationship between the variables.
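A quick sketch of fitting a quadratic model with `numpy.polyfit`. The data points are hypothetical, generated to roughly follow y = 2x² − 3x + 1 with a little noise added:

```python
import numpy as np

# Hypothetical data roughly following y = 2x² - 3x + 1, with small noise added
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
noise = np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.15])
y = 2 * x**2 - 3 * x + 1 + noise

# Fit the quadratic model y = ax² + bx + c; polyfit returns [a, b, c]
a, b, c = np.polyfit(x, y, 2)

# Use the fitted curve to predict y at a new x value
y_pred = a * 2.5**2 + b * 2.5 + c
```

The recovered coefficients should land close to the true values (a ≈ 2, b ≈ −3, c ≈ 1), and the fitted curve can then be evaluated at any new x.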

Overfitting:

Overfitting occurs when a model learns the noise in the training data, leading to poor performance on unseen data. It often happens when the model is too complex or the training data is insufficient. Preventing overfitting includes simplifying the model, collecting more data, selecting relevant features, using regularization, and using cross-validation. An overfitted model generalizes poorly to new and unseen data.
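Overfitting can be demonstrated by fitting polynomials of different degrees to the same small, noisy dataset. The data below is hypothetical (a linear trend plus noise); the degree-9 polynomial has enough flexibility to pass through every training point:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: a simple linear trend plus noise
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2 * x_train + rng.normal(scale=0.1, size=10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + rng.normal(scale=0.1, size=10)

def train_test_mse(degree):
    # Fit a polynomial of the given degree on the training data,
    # then measure mean squared error on both training and test points
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

simple_train, simple_test = train_test_mse(1)    # simple model
complex_train, complex_test = train_test_mse(9)  # interpolates every training point
```

The degree-9 fit drives training error to near zero by memorizing the noise, which is the hallmark of overfitting; its behavior between the training points tends to carry over badly to unseen data.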

Heteroscedasticity and Calculating the P-Value

Heteroscedasticity

  • Heteroscedasticity is a statistical term used in regression analysis.
  • It describes a situation where the variability of the errors (residuals) in a regression model is not constant across all levels of the independent variable(s).
Heteroscedasticity can take different forms or types:


1) Increasing Heteroscedasticity: In this type, the variance of the residuals increases as the values of the independent variable increase. This means that as we move along the predictor variable, the spread of the residuals becomes wider.

2) Decreasing Heteroscedasticity: In contrast to increasing heteroscedasticity, this type involves the variance of the residuals decreasing as the values of the independent variable increase. The spread of the residuals narrows as you move along the predictor variable.

3) U-shaped Heteroscedasticity: U-shaped heteroscedasticity occurs when the spread of the residuals forms a U shape as you move along the independent variable. The variance of the residuals is not constant and changes in a systematic manner.


The Breusch-Pagan test

  • It is a statistical test used in regression analysis to check for the presence of heteroscedasticity in a regression model.
  • The test determines whether the variance of the residuals is constant across all levels of the independent variables.

Null Hypothesis: There is no heteroscedasticity; the variance of the residuals remains constant.

Alternative Hypothesis: There is heteroscedasticity; the variance of the residuals is not constant and may vary across observations.

  • If the p-value is greater than the given significance level, we fail to reject the null hypothesis.
  • On the other hand, if the p-value is less than the given significance level, we reject the null hypothesis.

9/11 Linear regression update

Linear regression is basically finding the best-fitting line for the data points provided.

  • The equation for linear regression is Y = β0 + β1X1 + β2X2 + … + βnXn + ε
  • Where:
    • Y is the dependent variable.
    • X1, X2, …, Xn are the independent variables.
    • β0 is the intercept.
    • β1, β2, …, βn are the coefficients.
    • ε represents the noise.
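The equation above can be fitted by ordinary least squares. The sketch below uses hypothetical predictors (loosely imagined as obesity and inactivity rates) and a response generated from known coefficients, then recovers β0, β1, and β2:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical predictors (say, obesity and inactivity rates) and a response
X1 = rng.uniform(20.0, 40.0, size=50)
X2 = rng.uniform(15.0, 35.0, size=50)
Y = 1.0 + 0.2 * X1 + 0.1 * X2 + rng.normal(scale=0.5, size=50)

# Build the design matrix [1, X1, X2] so that Y = β0 + β1·X1 + β2·X2 + ε,
# and solve for the coefficients by least squares
A = np.column_stack([np.ones_like(X1), X1, X2])
(b0, b1, b2), *_ = np.linalg.lstsq(A, Y, rcond=None)
```

The estimated coefficients should come out close to the values used to generate the data (β1 ≈ 0.2, β2 ≈ 0.1), with the leftover scatter absorbed by the noise term ε.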

Observation from the Datasheet:

The dataset contains factors such as Diabetes, Obesity, and Inactivity for all the states in the USA for a particular year, 2018. However, the numbers of samples for diabetes, obesity, and inactivity are not the same.