Scatter Plots and Linear Correlation ( Read ) | Statistics | CK Foundation
Informally assess the fit of a function by plotting and analyzing the residual plot. c. Fit a linear of y when x is zero) of a linear model in the context of the data. The two-way frequency table below shows the favorite .. The table below shows the height and the weight of five starters on a high school basketball team. The equation of a proportional relationship has the form y = kx, where k is the constant ratio of y to x (the slope of the line). of representing the relationship between the quantities (i.e., diagram, table, a comparison with one graph of a relationship that is linear but not proportional. A medical researcher is studying the relationship between age (x years) and the volume of blood (y ml) pumped by each contraction of the heart. The table below shows the distances (to the nearest km) travelled to work by the 50 (b) Use linear interpolation to estimate the median distance travelled to work.
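The formula for the correlation coefficient is
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
The value of r varies between -1.00 and +1.00. A value near ±1.00 indicates a strong linear relationship between the variables, and a value of r near zero implies that there is little or no linear relationship between them. We can determine the correlation coefficient for the linear regression equation determined in the earlier example. This value for the correlation coefficient is very close to 1. Another measure of the strength of the relationship between the variables in a linear regression equation is the coefficient of determination.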
It is computed by squaring the value of r.
It indicates the percentage of the variation in the dependent variable that is a result of the behavior of the independent variable. A value of 1.
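The correlation coefficient and the coefficient of determination described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard raw-sum formula for r; the function name `correlation` and the data values are hypothetical and do not come from the example in the text.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r computed from raw sums of x, y, xy, x^2, y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_squared = r ** 2  # coefficient of determination: the square of r
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")  # r = 0.7746, r^2 = 0.6000
```

With these values, about 60 percent of the variation in y is associated with x, which is how the coefficient of determination is read.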
Regression Analysis with Excel
The development of the simple linear regression equation and the correlation coefficient for our example was not too difficult because the amount of data was relatively small.
However, manual computation of the components of simple linear regression equations can become very time-consuming and cumbersome as the amount of data increases. ... A12" entered in cell D7 and shown on the formula bar at the top of the spreadsheet. A linear regression forecast can also be developed directly with Excel using the "Data Analysis" option from the Tools menu we accessed previously to develop an exponentially smoothed forecast.
We first enter the cells from the exhibit. Next, enter the x value cells, A5: ... The output range is the location on the spreadsheet where you want to put the output results. This range needs to be large (18 cells by 9 cells) and must not overlap with anything else on the spreadsheet.
Clicking on "OK" will result in the spreadsheet shown in the exhibit. Note that the "Summary Output" has been slightly moved around so that all the results could be included on the screen. The essential items that we are interested in are the intercept and slope (labeled "X Variable 1") in the "Coefficients" column at the bottom of the spreadsheet, and the "Multiple R" or correlation coefficient value shown under "Regression Statistics."
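For readers without Excel at hand, the intercept and slope that the regression output reports can be reproduced with ordinary least squares. The sketch below assumes the usual closed-form formulas for a single x variable; the function name `fit_line` and the wins and attendance figures are hypothetical and are not the data from the exhibit.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: wins (x) and attendance in thousands (y).
wins = [2, 4, 6, 8]
attendance = [30, 38, 50, 54]

intercept, slope = fit_line(wins, attendance)
forecast = intercept + slope * 7  # forecast for a season with 7 wins
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, forecast = {forecast:.2f}")
```

The two returned values play the roles of the "Intercept" and "X Variable 1" rows in the "Coefficients" column.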
Multiple Regression
Another causal method of forecasting is multiple regression, a more powerful extension of linear regression. Linear regression relates demand to one other independent variable, whereas multiple regression reflects the relationship between a dependent variable and two or more independent variables. A multiple regression model has the following general form:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
where \beta_0 is the intercept, \beta_1, ..., \beta_k are the parameters representing the contributions of the independent variables, and x_1, ..., x_k are the independent variables. For example, the demand for new housing (y) in a region might be a function of several independent variables, including interest rates, population, housing prices, and personal income.
Development and computation of the multiple regression equation, including the compilation of data, is more complex than linear regression.
The only means for forecasting using multiple regression is with a computer. To demonstrate the capability to solve multiple regression problems with Excel spreadsheets we will expand our State University athletic department example for forecasting attendance at football games that we used to demonstrate linear regression.
Instead of attempting to predict attendance based on only one variable, wins, we will include a second variable for advertising and promotional expenditures as follows: We will use the "Data Analysis" option add-in from the Tools menu at the top of the spreadsheet that we used in the previous section to develop our linear regression equation, and then the "Regression" option from the "Data Analysis" menu.
The resulting spreadsheet with the multiple regression statistics is shown in the exhibit. We enter the "Input X Range" as A4:B12, as shown in the exhibit. The regression coefficients for our x variables, wins and promotion, are shown in cells B27 and B... The multiple regression equation formulated from these coefficients can now be used to forecast attendance based on both projected football wins and promotional expenditure.
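The same kind of two-variable fit can be sketched outside a spreadsheet by solving the least-squares normal equations directly. This is an illustrative sketch only: the helper names `solve_linear_system` and `fit_multiple` are hypothetical, and the data are made up so that they lie exactly on the plane y = 10 + 3(wins) + 0.5(promotion); they are not the figures from the State University example.

```python
def solve_linear_system(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_multiple(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0, *row] for row in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical (wins, promotion) pairs and attendance values.
rows = [(4, 20), (6, 25), (6, 30), (8, 40), (5, 22)]
ys = [32.0, 40.5, 43.0, 54.0, 36.0]

coefficients = fit_multiple(rows, ys)
b0, b1, b2 = coefficients
forecast = b0 + b1 * 7 + b2 * 35  # 7 wins and 35 units of promotion
print([round(c, 4) for c in coefficients], round(forecast, 4))
```

For larger problems a numerical library would normally replace the hand-written solver; it is written out here only to keep the example self-contained.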
This would seem to suggest that the number of wins has a more significant impact on attendance than promotional expenditures. However, as we have already noted, the number of wins probably accounts for a larger part of the variation in attendance.
Demand forecasts are a critical part of Vermont Gas Systems' supply chain that stretches across Canada. Gas is transported from suppliers in western Canada to storage facilities along the Trans-Canada pipeline to Vermont Gas Systems' pipeline.
Gas orders must be specified to suppliers at least 24 hours in advance. Enough gas must be ordered to meet customer needs, especially in the winter, but too much will needlessly and expensively tax Trans-Canada Pipelines' facilities.
Vermont Gas Systems has storage capacity available for a buffer inventory of only one hour of gas use so an accurate daily forecast of gas demand is essential. Vermont Gas Systems uses regression to forecast daily gas demand. In its forecast models, gas demand is the dependent variable, and factors such as weather information, industrial customer demand, and changing end-use consumer demand are independent variables.
Avoid using a model that discards some data unless you know that the data being filtered out is invalid. The trend line description reports how many marks were filtered before model estimation.
Exponential
With the exponential model type, the formula is:
$$Y = e^{b_0} \cdot e^{b_1 X}$$
What you see is the exponential model in the following form:
$$Y = b_2 \, e^{b_1 X}, \quad \text{where } b_2 = e^{b_0}$$
Because a logarithm is not defined for numbers less than zero, any marks for which the response variable is negative are filtered before model estimation.
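The filtering step described for the exponential model can be illustrated with a small sketch. It assumes the usual approach of fitting a straight line to ln(Y) and the model form Y = exp(b0) * exp(b1 * X); it is not Tableau's implementation. The function name `fit_exponential` and the data are hypothetical, and the sketch drops zero values as well as negative ones because ln(0) is also undefined.

```python
import math

def fit_exponential(xs, ys):
    """Fit ln(y) = b0 + b1 * x by least squares, skipping marks with y <= 0."""
    kept = [(x, math.log(y)) for x, y in zip(xs, ys) if y > 0]
    n_filtered = len(xs) - len(kept)  # marks excluded before estimation
    n = len(kept)
    mean_x = sum(x for x, _ in kept) / n
    mean_ly = sum(ly for _, ly in kept) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in kept)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in kept)
    b1 = sxy / sxx
    b0 = mean_ly - b1 * mean_x
    return b0, b1, n_filtered

# Hypothetical marks: five points on y = 2 * exp(0.5 * x) plus one negative value.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * math.exp(0.5 * x) for x in xs[:5]] + [-1.0]

b0, b1, n_filtered = fit_exponential(xs, ys)
print(f"Y = {math.exp(b0):.3f} * exp({b1:.3f} * X), filtered marks: {n_filtered}")
```

The negative mark never reaches the estimation step, which is why a model of this type can silently use fewer observations than the view displays.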
Power
With the power model type, the formula is:
$$Y = b_0 \cdot X^{b_1}$$
Because a logarithm is not defined for numbers less than zero, any marks for which the response variable or explanatory variable is negative are filtered before model estimation.
Polynomial
With the polynomial model type, the explanatory variable is expanded into a polynomial series of the specified degree.
The higher polynomial degrees exaggerate the differences between the values of your data. If your data increases very rapidly, the lower order terms may have almost no variation compared to the higher order terms, rendering the model impossible to estimate accurately. Also, more complicated higher order polynomial models require more data to estimate.
Check the model description of the individual trend lines for a red warning message indicating that an accurate model of this type is not possible.
Trend Line Model Terms
When you view the description for a trend line model, there are several values listed. This section discusses what each of these values means.
Model formula
This is the formula for the full trend line model. The formula reflects whether you have specified to exclude factors from the model.
Number of modeled observations
The number of rows used in the view.
Number of filtered observations
The number of observations excluded from the model.
Model degrees of freedom
The number of parameters needed to completely specify the model. Linear, logarithmic, and exponential trends have model degrees of freedom of 2. Polynomial trends have model degrees of freedom of 1 plus the degree of the polynomial. For example, a cubic trend has model degrees of freedom of 4, since we need parameters for the cubed, squared, linear, and constant terms.
Residual degrees of freedom (DF)
For a fixed model, this value is defined as the number of observations minus the number of parameters estimated in the model.
SSE (sum squared error)
The errors are the differences between the observed values and the values predicted by the model. In the Analysis of Variance table, this column is actually the difference between the SSE of the simpler model in that particular row and the full model, which uses all the factors. This SSE also corresponds to the sum of the differences squared of the predicted values from the smaller model and the full model.
R-Squared
R-squared is a measure of how well the data fits the linear model. It is one minus the ratio of the variance of the model's error, or unexplained variance, to the total variance of the data. When the y-intercept is determined by the model, R-squared is derived using the following equation:
$$R^2 = 1 - \frac{SSE}{\sum_i (y_i - \bar{y})^2}$$
When the y-intercept is forced to 0, R-squared is derived using this equation instead:
$$R^2 = 1 - \frac{SSE}{\sum_i y_i^2}$$
In the latter case, the equation will not necessarily match Excel. This is because R-squared is not well defined in this case, and Tableau's behavior matches that of R instead of that of Excel.
Standard error
The square root of the MSE of the full model. An estimate of the standard deviation (variability) of the "random errors" in the model formula.
The values are a comparison of the model without the factor in question to the entire model, which includes all factors.
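The model terms above can be tied together in a short sketch. It assumes the common definitions: SSE is the sum of squared residuals, MSE is SSE divided by the residual degrees of freedom, and R-squared is 1 - SSE / (total sum of squares), where the total is taken about the mean when the intercept is fitted and about zero when the intercept is forced to 0 (the convention R uses). The function name `trend_summary`, the data, and the two fitted lines are hypothetical.

```python
import math

def trend_summary(observed, predicted, n_params, intercept_fitted=True):
    """Residual DF, SSE, R-squared and standard error for one fitted trend line."""
    n = len(observed)
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if intercept_fitted:
        mean_y = sum(observed) / n
        total = sum((y - mean_y) ** 2 for y in observed)  # variation about the mean
    else:
        total = sum(y ** 2 for y in observed)  # variation about zero
    residual_df = n - n_params  # observations minus estimated parameters
    mse = sse / residual_df
    return {
        "residual_df": residual_df,
        "sse": sse,
        "r_squared": 1 - sse / total,
        "standard_error": math.sqrt(mse),
    }

# Hypothetical data with two least-squares fits worked out by hand:
# y = 2.2 + 0.6x (intercept fitted) and y = 1.2x (intercept forced to 0).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

with_intercept = trend_summary(ys, [2.2 + 0.6 * x for x in xs], n_params=2)
through_origin = trend_summary(ys, [1.2 * x for x in xs], n_params=1,
                               intercept_fitted=False)
print(with_intercept)
print(through_origin)
```

The through-origin line has the larger SSE yet reports the larger R-squared, because its total is measured about zero rather than about the mean. This is one way to see why the two cases are not directly comparable and why different tools can disagree.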
Individual trend lines
This table provides information about each trend line in the view. Looking at the list you can see which, if any, are the most statistically significant. This table also lists coefficient statistics for each trend line. A row describes each coefficient in each trend line model. For example, a linear model with an intercept requires two rows for each trend line. In the Line column, the p-value and the DF for each line span all the coefficient rows. The DF column under the Line column shows the residual degrees of freedom available during the estimation of each line.
Terms
The name of the independent term.
Value
The estimated value of the coefficient for the independent term.
StdErr
A measure of the spread of the sampling distribution of the coefficient estimate. This error shrinks as the quality and quantity of the information used in the estimate grows.
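To make the Value and StdErr columns concrete, the sketch below computes both for a simple linear trend. It assumes the textbook standard-error formulas for ordinary least squares with one explanatory variable; the function name `coefficient_table` and the data are hypothetical, and this is not a description of how Tableau computes its table.

```python
import math

def coefficient_table(xs, ys):
    """Rows of (term, value, std_err) for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)  # two estimated parameters: intercept and slope
    se_slope = math.sqrt(mse / sxx)
    se_intercept = math.sqrt(mse * (1 / n + mean_x ** 2 / sxx))
    return [("intercept", intercept, se_intercept), ("x", slope, se_slope)]

# Hypothetical data; the second call repeats every point to double the sample.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

small_sample = coefficient_table(xs, ys)
large_sample = coefficient_table(xs * 2, ys * 2)
for term, value, std_err in small_sample:
    print(f"{term}: value = {value:.4f}, StdErr = {std_err:.4f}")
```

The coefficient values are the same for both samples, while the StdErr of each term is smaller for the doubled sample, which matches the remark that the error shrinks as the quantity of information grows.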
So, a p-value of.
Assess Trend Line Significance
To see relevant information for any trend line in the view, hover the cursor over it. The first line in the tooltip shows the equation used to compute a value of Profit from a value of Year of Order Date.
The second line, the R-Squared value, shows the ratio of variance in the data, as explained by the model, to the total variance in the data.