Description of Two or More Variables
Independent and dependent variables and the question of causation
In many cases we are interested in how the value of one variable changes when the values of one or more other variables change. For example: how do the sales of a product change when advertising and price change?
In this example there is one variable that we are trying to understand; this is commonly called the dependent variable. The other variables, whose changes accompany changes in the dependent variable, are called independent variables.
Note that in a standard x-y plot we plot the independent variable (the one whose value we know) on the x axis and the dependent variable on the y axis.
Statistical observations can be misinterpreted when there is no experimentation. For example: in cities where there are more policemen, the crime rate is higher; therefore police create more crime. This argument is invalid, because an observed association between two variables does not by itself show that one causes the other. Statistics is important because much of what surrounds us was developed through experimentation that relied on statistical analysis.

Scatter Diagram

One possible explanation for some of the scatter is that the relationship of weight to height may be different for women than for men. Suppose women of a given height weigh less, on the average, than men of the same height. Then the overall relationship between women's weights and heights might be represented by a line or curve below the corresponding curve for men, and this in itself could account for some of the scatter. To explore this hypothesis, we could look at separate scatter diagrams for men and for women, but all of the information can be conveyed on a single diagram in which points for men are represented by a different symbol than points for women. Although there is much overlap, the diagram shows that women tend to cluster to the lower left (shorter and lighter), and for any given height, women tend to weigh less than men. Using an identifier to represent values, or brackets of values, of a second independent variable in a scatter diagram is often a useful diagnostic tool.
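The idea of marking each point with an identifier can be sketched in a few lines of Python. The example below is only an illustration: the eight (height, weight, symbol) triples are made-up values, not the data behind the figure, and the helper name ascii_scatter is hypothetical. It draws a rough text scatter diagram in which men appear as M and women as W.

```python
def ascii_scatter(points, width=40, height=12):
    """Return a text scatter diagram for (x, y, symbol) triples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_max = min(xs), max(ys)
    # Guard against a zero span when all values on an axis are equal.
    x_span = (max(xs) - x_min) or 1
    y_span = (y_max - min(ys)) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y, symbol in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # row 0 is the top
        # If two different symbols land on the same cell, mark it with '*'.
        grid[row][col] = symbol if grid[row][col] in (" ", symbol) else "*"
    return "\n".join("".join(line) for line in grid)


# Made-up illustrative data: (height in inches, weight in pounds, identifier).
people = [
    (62, 115, "W"), (64, 125, "W"), (66, 135, "W"), (68, 150, "W"),
    (66, 155, "M"), (68, 165, "M"), (70, 175, "M"), (72, 190, "M"),
]

plot = ascii_scatter(people)
print(plot)
```

With these invented values the W points fall toward the lower left and, at a shared height, below the M points, which is the pattern the text describes.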

Scatter Diagram with Identifier

Of course, considerable scatter remains. Even if we had the data on additional potential explanatory variables, however, we would have trouble graphing their joint effects: we must wait for more powerful analytic tools.

Correlation
If high values of one variable accompany low values of another (and vice versa), the variables are negatively correlated. For example, low inflation tends to accompany high unemployment, and high inflation tends to accompany low unemployment; therefore, inflation and unemployment are negatively correlated.
If most of the observations have x and y values that are jointly above or jointly below their means, the covariance will tend to be positive. If, on the contrary, whenever x is above its mean y tends to be below its mean, and vice versa, the covariance will tend to be negative.
To find the correlation coefficient, we next divide the covariance by the product of the standard deviations of the two variables. A correlation coefficient of +1 represents a case where the points in a scatter diagram all lie exactly on an upward-sloping straight line; a coefficient of -1 represents a case where the points all lie on a downward-sloping straight line. The correlation coefficient for height vs. weight is +0.745.
The correlation coefficient measures the extent to which two variables are linearly related. It is a pairwise measure: if you have three variables, x, y, and z, you can compute the correlation between x and y, between x and z, and between y and z. These pairwise correlations are often presented in a tabular form that is called a correlation matrix. Table 1.4 gives a sample correlation matrix of height vs. weight.
Since the correlation of x vs. y is the same as that of y vs. x, a correlation matrix typically gives the correlation between a pair only once.
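The calculation described above (covariance divided by the product of the two standard deviations) can be sketched as follows. This is a minimal illustration, not the text's own procedure: the function names are hypothetical, the sample values are made up, and the averages divide by n (an assumption; dividing by n - 1 throughout would give the same correlation coefficient).

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Average product of the deviations of x and y from their means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def std_dev(values):
    # The variance is the covariance of a variable with itself.
    return math.sqrt(covariance(values, values))


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping name -> list of values."""
    names = list(columns)
    return {a: {b: correlation(columns[a], columns[b]) for b in names}
            for a in names}


# Made-up values: y rises exactly with x, z falls exactly as x rises.
data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [8, 6, 4, 2]}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Because correlation(x, y) equals correlation(y, x), the full matrix is symmetric, which is why a printed correlation matrix usually shows each pair only once.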

Simple Description of Effects

We observed in Figure 1.6 that height and weight are positively correlated and that this relationship holds true for women as well as men. If we want to specify these relationships in a quantitative way, we could list the average weight of men in one-inch height increments, and do the same for women. At the opposite extreme, we could assert that each additional inch of height is accompanied by an average of four additional pounds for both men and women, but that women of any given height weigh 25 pounds less, on the average, than men of the same height.
If the relationships among height, weight, and gender were adequately summarized by the preceding assertion, we would call the relationship linear and additive. Because each additional inch of a man's or woman's height is accompanied by the same average increment in weight, the relationship between height and weight for either sex is graphed as a straight line: it is linear. Furthermore, for a man and a woman of the same height, the man's weight, on the average, is asserted to be 25 pounds greater than the woman's at every common height, for example when both are 72 inches tall: the effect of gender is additive.
While linear and additive descriptions of relationships are simple and intuitive, they are not necessarily right. Looking at the same data, one could instead assert that weight should increase with the cube of height, since height is a linear measure and weight a volume measure. Careful analysis of the data might reveal that this nonlinear relationship provides a better description.
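The linear and additive description, and the cubic alternative, can be written as two small functions. This is a sketch under stated assumptions: the four pounds per inch and the 25-pound difference come from the text, but the intercept of -100 and the cubic constant k are hypothetical values chosen only so that the functions return plausible numbers.

```python
def additive_weight(height_in, is_woman, intercept=-100.0):
    """Average weight in pounds under the linear, additive description.

    The intercept is a hypothetical value, not taken from the text.
    """
    gender_effect = -25.0 if is_woman else 0.0  # additive: same at all heights
    return intercept + 4.0 * height_in + gender_effect  # linear: 4 lb per inch


def cubic_weight(height_in, k=0.0005):
    """Alternative description: weight proportional to the cube of height.

    The constant k is hypothetical and for illustration only.
    """
    return k * height_in ** 3


# One extra inch always adds four pounds, at any height and for either sex.
print(additive_weight(71, False) - additive_weight(70, False))
# The gap between men and women is 25 pounds at 60 inches and at 72 inches.
print(additive_weight(60, False) - additive_weight(60, True))
print(additive_weight(72, False) - additive_weight(72, True))
# Under the cubic description the increment per inch grows with height.
print(cubic_weight(61) - cubic_weight(60), cubic_weight(71) - cubic_weight(70))
```

The contrast in the last line is the point: a constant increment per inch is what makes the first description linear, while the cubic description gives a larger increment for taller people.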

Time Series

When time is the only independent variable, the time series reveals the values of some dependent variable over time.

The graph reveals two characteristics of this time series:
1- There is a pronounced seasonal pattern: sales shoot up in December and then decline precipitously in January and February, before returning to a more normal level for the rest of the year.

2- There is a steady upward trend: the December peaks are higher each year, and even the January-February troughs are generally higher; in between, the level also tends to increase with time.

Trends and Seasonals. How can we capture the trend and seasonal effects graphically and display them more vividly? One simple device is to lay out a twelve-month scale on the horizontal axis and display individual line graphs for each year. This is done in Figure 1.8, which shows that each year's data moved higher on the graph, and that there was a pronounced seasonal pattern within each year. Would this display help you forecast retail sales for December 1988? Why or why not?
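Laying the data out on a twelve-month scale amounts to rearranging one long monthly series into one row per year. The sketch below shows that rearrangement; the helper name by_year is hypothetical, and the sales figures are synthetic (a level that rises by 10 each year plus a December jump of 80), not the retail data in the figure.

```python
def by_year(monthly):
    """Rearrange {(year, month): value} into {year: [Jan, ..., Dec]}."""
    table = {}
    for (year, month), value in sorted(monthly.items()):
        table.setdefault(year, [None] * 12)[month - 1] = value
    return table


# Synthetic series for three unnamed years, for illustration only.
monthly = {}
for year in (1, 2, 3):
    level = 100 + 10 * (year - 1)          # steady upward trend
    for month in range(1, 13):
        seasonal = 80 if month == 12 else 0  # December peak
        monthly[(year, month)] = level + seasonal

table = by_year(monthly)
for year, row in table.items():
    print(year, row)
```

Each printed row corresponds to one line on the twelve-month graph: later rows sit higher (trend), and every row peaks in its last position (seasonal pattern).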

Multiple Time Series. Often, when we are trying to understand the behavior of one time series, we introduce another time series to serve as an independent variable. In the retail sales example, we could introduce advertising expenditures over time as an independent variable to help explain how retail sales change over time. We might hypothesize that the stores' advertising expenditures generate sales. Figure 1.9 shows both retail sales and advertising as time series, with different scales for the two series.

This graph shows that:
- Advertising expenditures constituted roughly 1% of retail sales dollars for the entire period.
- The seasonal pattern of advertising expenditures closely mirrors that of retail sales, except that advertising expenditures build up to the December peak more gradually, increasing more in October and November and less in December, than sales do.
Deseasonalization. Because so much of the month-to-month fluctuation in retail sales is due to purely seasonal effects, it is common to report (and think about) sales on a deseasonalized basis. The fact that January sales are below the previous December's is not in itself cause for despair; what you would really like to know is the trend in deseasonalized sales. We shall learn later how to take seasonality into account in forecasting; for now, all you need to know is that retail sales and many other time series that exhibit strong seasonal effects are reported both in natural and in deseasonalized form. Figure 1.10 is a graph of deseasonalized sales. From that graph, you can more easily detect trends in the data.
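The text does not say how the deseasonalized figures are computed. One simple possibility, shown here purely as an assumption and not as the method behind Figure 1.10, is to divide each month's value by a seasonal index equal to that calendar month's average relative to the overall average. The function names and the sales values are hypothetical.

```python
def seasonal_indices(series):
    """Seasonal index per calendar month for a series that starts in January.

    Index = average for that calendar month / average of the whole series.
    """
    if len(series) % 12 != 0:
        raise ValueError("series must cover whole years")
    overall = sum(series) / len(series)
    indices = []
    for m in range(12):
        month_values = series[m::12]  # every value for calendar month m
        indices.append((sum(month_values) / len(month_values)) / overall)
    return indices


def deseasonalize(series):
    """Divide each value by the seasonal index of its calendar month."""
    indices = seasonal_indices(series)
    return [value / indices[i % 12] for i, value in enumerate(series)]


# Made-up pattern with low January-February and a December peak, repeated
# for two years with no trend.
pattern = [80, 85, 100, 100, 100, 100, 100, 100, 100, 105, 120, 200]
sales = pattern * 2
adjusted = deseasonalize(sales)
print([round(v, 1) for v in adjusted])
```

Because this made-up series has no trend, the adjusted values all come out equal to the overall average; on a trending series this crude index would absorb part of the trend, which is one reason more careful methods are used in practice.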