
Application of European Union Agriculture Policy Estimation of Agriculture Wages

Date post: 03-Dec-2023

Application of European Union Agriculture Policy

Estimation of Agriculture Wages

Asmina Simota M2015364

Kristen Scott M20015382

Mara Reis M2014158


I. Introduction

In recent decades the European Union has recognized the need to support agricultural wages. This sector is important for two main reasons. First, it produces goods that are necessary for survival; second, from a historical point of view, it is the first sector of human activity. The purpose of this project is to identify the attributes that most affect agricultural income and to derive some policy proposals. To make the problem easier to understand, we include a section that briefly explains the special characteristics of this sector and the European Union's agricultural policy.

A.1 Special characteristics of the agriculture sector

Even though agricultural production is increasing, unemployment in the sector is increasing as well. Moreover, the sector's share of total production appears to be decreasing. The tables below, taken from the 2013 annual Eurostat report, illustrate this.

The first table shows that the percentage of people working in this sector has decreased over the years. *


The second table, which is part of an Excel file, shows the decreasing share of the agriculture sector in total GDP. **

As mentioned, economists have already identified the problem, and they highlight the following characteristics:

Inelastic demand in price terms

This means that as output increases, prices tend to fall by a proportionally larger amount so that the market can absorb the additional product. As a result, the average income of the sector decreases. For instance, machinery and technological development, which push production up, ultimately reduce incomes.
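As a rough numerical illustration of this argument, the sketch below assumes a constant-elasticity demand curve with a price elasticity of -0.5. Both the functional form and the elasticity value are assumptions chosen for illustration; they are not estimated anywhere in this project.

```python
def price_ratio_for_output_ratio(output_ratio, elasticity):
    """Price ratio needed for the market to absorb a change in output.

    Assumes constant-elasticity demand Q = A * P**elasticity, so that
    P_new / P_old = (Q_new / Q_old) ** (1 / elasticity).
    """
    return output_ratio ** (1.0 / elasticity)


# Hypothetical values, for illustration only.
elasticity = -0.5      # inelastic demand: |elasticity| < 1
output_ratio = 1.10    # production rises by 10%

price_ratio = price_ratio_for_output_ratio(output_ratio, elasticity)
revenue_ratio = output_ratio * price_ratio

print(f"price change:   {100 * (price_ratio - 1):.1f}%")
print(f"revenue change: {100 * (revenue_ratio - 1):.1f}%")
```

Under these assumptions the price falls by more than output rises, so total revenue falls even though more is produced.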

Inelastic demand in terms of income

Additionally, as household income increases, families tend to spend a smaller percentage of their money on agricultural products. The more developed the country, the bigger this effect.

The increasing demand for food marketing services

As countries develop, there is a growing preference for products that have undergone more processing, such as semi-cooked meals, frozen fruits and salads. As a result, the secondary sector is boosted.

Land


The total supply of land is limited. First, the growing need for built-up areas reduces the amount of land available; parks, streets, hospitals, schools and apartments are only some of the categories of built-up land that keep expanding. Second, because all of these processes are time-consuming, the total supply of land has low elasticity.

Employment

Note that agricultural production belongs mainly to family units. This makes it hard to identify workers with the required skills. In most cases children work or help in the unit without a salary or for a small one.

Another important issue is the low mobility within agricultural activity. In simple terms, someone who owns a farm finds it very difficult to change location, because he would need to move the whole farm.

Capital (landed, exploitation, financial)

Landed: irrigation systems and available land.

Exploitation: tractors, machines and animals. Animals are counted as capital here.

Financial: financing for investments and development. In the majority of countries there are banks that undertake this responsibility.

We believe this information is enough to understand the problem and the next steps. A deeper economic analysis can follow at a later stage.

A.2 Common Agricultural Policy in the EU***

The Common Agricultural Policy (CAP) is the agricultural policy of the EU. It implements a system of agricultural subsidies and other programs.

The main objectives of this policy are:

1. To increase productivity and ensure the optimal use of the production factors.

2. To ensure a fair standard of living for the agricultural community.

3. To stabilize markets.

4. To secure the availability of supplies.

5. To provide consumers with food at reasonable prices.


The CAP was born in 1962. Several policies have been applied since then. Some of the most important were:

1970s - Supply management. Farms are so productive that they produce more food than is needed. The surpluses are stored and lead to 'food mountains'. Specific measures are put in place to align production with market needs.

1992 - The CAP shifts from market support to producer support. Price support is scaled down and replaced with direct aid payments to farmers, who are encouraged to be more environmentally friendly.

2000 - The CAP centers on rural development. It puts more focus on the economic, social and cultural development of rural Europe.

2003 - A CAP reform cuts the link between subsidies and production. Farmers are more market oriented and, in view of the specific constraints on European agriculture, they receive an income aid. In exchange, they have to respect strict food safety, environmental and animal welfare standards.

2011 - A new CAP reform seeks to strengthen the economic and ecological competitiveness of the agricultural sector, to promote innovation, to combat climate change and to support employment and growth in rural areas.

* http://ec.europa.eu/agriculture/statistics/agricultural/2013/pdf/c5-1-351_en.pdf

** http://www.gapminder.org/data/

*** http://ec.europa.eu/agriculture/50-years-of-cap/files/history/history_book_lr_en.pdf


II. Analysis

(Descriptive) Multivariate Linear Regression - OLS

The purpose of this section is to describe how the components of agriculture affect wages. A good analysis here not only gives a clear view of the situation but can also be a useful tool for policy design. The base year for this analysis is 2007. This choice is based on two main factors. First, the tables for this year are very complete: the fewer missing values we have to handle, the better the model. Second, and most important, this year is interesting from an economic point of view, as it is when Europe began to show the first symptoms of the economic crisis.

The countries that we will use are the EU-27 (those that had joined the Union by 2007):

 1. Austria          7. Estonia    13. Italy        19. Netherlands   25. Sweden
 2. Belgium          8. France     14. Ireland      20. Poland        26. Romania
 3. Bulgaria         9. Finland    15. Latvia       21. Portugal      27. United Kingdom
 4. Cyprus          10. Germany    16. Lithuania    22. Slovenia
 5. Czech Republic  11. Greece     17. Luxembourg   23. Slovakia
 6. Denmark         12. Hungary    18. Malta        24. Spain

Variables Selection****

After reviewing the economic background, the results of which were described briefly in the introduction, we identified the following variables as important for the analysis. We will then check goodness of fit and decide which of them to keep.

Crop Production Cereal annual

Crop Production Pulses annual

Root crop Production annual

Raw milk production in 1000 lt annual

These variables are enough for our purpose. Production in agriculture can be described by three categories of crops (cereals, pulses and roots), which make up the majority of production. In addition, we included raw milk production, which we converted from liters to kilograms for the analysis. Regarding slaughtering, we believe it is important, but we could not locate a table broken down by species. Because meat production covers many species, even a positive regression coefficient would not allow us to recommend a specific policy.

Wage-Income (target variable) in millions


Average income is not an ideal variable in this case, because we do not know the number of workers in the sector and therefore cannot relate it to the other variables. The reason, again, is that the individual farm is typically family-run and the workers are often children. We do have employment percentages, but these represent only the adults, who probably own the farm or pay some insurance coverage. We therefore found it necessary to use a different measure, the Gross Value of Agriculture: the total quantity produced multiplied by the basic producer price, less taxes and all intermediate consumption, plus subsidies. We chose this measure because we want to know the real value a producer receives and then use regression to identify the reasons for its variation. We kept subsidies in the value because Europe already applies this subsidy policy, so they affect the current situation. As for the question of how we know that the producer sells all of this production, the answer is that governments usually absorb the remaining supply. Major political events, such as the Russian strawberry embargo of 7/8/2014, cannot be predicted.
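To make the definition of the target variable concrete, here is a minimal sketch of the gross value calculation as we describe it. The function name and all the figures are hypothetical; the real values come directly from the published tables.

```python
def gross_value(quantity, basic_price, taxes, intermediate_consumption, subsidies):
    """Gross value as described in the text: output valued at the basic
    producer price, less taxes and intermediate consumption, plus subsidies."""
    return quantity * basic_price - taxes - intermediate_consumption + subsidies


# Hypothetical figures for one country (monetary amounts in millions of euros).
value = gross_value(
    quantity=100.0,                  # units produced
    basic_price=2.0,                 # millions of euros per unit
    taxes=15.0,
    intermediate_consumption=90.0,
    subsidies=25.0,
)
print(value)
```

Keeping subsidies inside the measure means that the regression explains what the producer actually receives under the current policy, not a hypothetical pre-subsidy income.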

Livestock Cattle

Livestock Pig

Livestock sheep

Livestock goats

Agriculture land in %

The livestock variables, as mentioned initially, belong to exploitation capital, and agricultural land belongs to landed capital. They can also describe, to a satisfactory degree, the slaughtering variable that we did not include. Agricultural land was converted to square kilometers.

Total R&D in Euros

R&D is useful for two main reasons. First, it is one of the CAP's targets, especially for the coming years, and its effect can be very important. Second, as we do not have a variable for mechanical equipment, we assume that technology is captured by this variable; technology is, after all, always a product of research.

% Female agriculture workers

% Male agriculture workers

It is useful to separate women and men and compare the regression results.

Price of diesel oil per 100lt

These prices are significant because fuel is the main cost of agricultural production, and diesel in particular can greatly affect income. As mentioned, income from agriculture is very sensitive to changes in cost and is small relative to the price that consumers pay for the products. For example, fruit can pass through three different intermediaries before it appears on supermarket shelves. Changes in attributes such as fuel can therefore have a serious impact. We use the price per 100 lt to keep it on a scale comparable with the other prices.

Consumption of manufactured fertilizer, potassium (tonnes -> kg)

Consumption of manufactured fertilizer, phosphorus (tonnes -> kg)

Consumption of manufactured fertilizer, nitrogen (tonnes -> kg)

Fertilizers are very interesting in this case study, because they can affect wages in two different ways. First, they may be the most important component of productivity: a huge amount of production comes from their use. Second, their cost to the producer is very high. The impact therefore runs in opposite directions: positive from usage, negative from cost. It is thus reasonable to use the square of the values in order to describe the decreasing positive effect of these variables. Tonnes were converted to kg for uniformity across the dataset.

Selling price for soft wheat Rice per 100kg

Selling price per 100kg potatoes

Selling prices per 100kg raw milk

Selling price per 100kg cattle

Selling prices per 100kg sheep

Selling price per 100kg pig

Selling prices are, of course, our main interest. In the animal category, sheep and goats tend to have the same price, so we treat both in the same way. For pigs and cattle the differences are large, so we cannot ignore their individual prices. For crops it is very hard to find data on the production and the price of each plant, so we use some prices as representatives of our categories. We noticed that, with small deviations, all tend toward the same price. Moreover, our purpose is to analyze annual wages, not demand across products. Potatoes can therefore represent root production and wheat can represent general crops.

Direct Taxes on income %of GDP

Expenditure in Education % of GDP

It is very hard to define the taxes that a producer actually pays, as they are split into direct and indirect. We therefore use the obvious one, the direct income tax that everybody pays. Similarly, for education we could not find specific figures for agriculture, but we believe that total education expenditure as a percentage of GDP is a valuable variable for our research. We convert both to absolute amounts by multiplying the percentage by GDP.


One more thing we would like to mention is that we did not include organic production or luxury products at all. We believe that both deserve to be analyzed as special categories. For example, goji berry is today a well-known superfood with a supermarket price of more than 100 euros per kilo. Demand here is not inelastic at all; it is a special category of "pharmaceutical" food and therefore outside our interest. Organic products are likewise a category driven by health interests and are consumed mainly by wealthier people because of their higher cost. In summary, in this project we focus on the basic agricultural products: those which serve as industrial inputs but are also ready for consumption, and which cover the majority of agricultural GDP.

Furthermore, we note that, at least in this part, we do not use GDP, either total or for the agriculture sector. After careful consideration, we believe that an indicator such as GDP is already represented by our variables; it is an index of wealth and a summary of the overall situation. We want to explore each of these attributes on its own, so we excluded it. We also want to avoid multicollinearity and redundancy in the equation.

Consumption was also considered very important, but we rejected it. Even though consumption is the main target of every producer, European Union policies are unlikely to affect it, and we know that governments simply absorb the rest of production. The main goal of this project is to identify policy levers that push wages up.

Additionally, the price index of rent and land would have been very useful, but more than 50% of its values were missing. With so few observations it was too risky to keep it. We also know that most land is used by the families that own it, so rents are of limited interest. The variable could still have been informative: for example, a very high correlation with the rent and land index after regression could point to a subsidy per hectare. But the risk of inconsistency with the current data is too high.

Finally, we note that in this project we focus only on crop and animal production, so fishery and forestry variables are excluded.

The set of variables could of course be richer, but these were the most important ones found in the available data. After our research we feel we have captured the most influential ones.

Variables that we would have liked to include but did not find are:

Industrial equipment in use

Poultry livestock or production


Final variables for Part 1 (values converted to consistent metric units to improve the model)

Category        Variable (unit)                                 Name
Production      Crop production, cereals (per 100 kg harvest)   CereCrop
                Crop production, pulses (per 100 kg harvest)    PulsCrop
                Root crop production (per 100 kg harvest)       RootCrop
                Raw milk production (100 kg)                    Rawmilk
Livestock       Cattle (thousand heads)                         LVcattle
                Pigs (thousand heads)                           LVpig
                Goats (thousand heads)                          LVgoat
                Sheep (thousand heads)                          LVsheep
Income          Gross value (millions)                          Inc
Stock           Agriculture land (km^2)                         ALand
R&D             Total R&D (euros)                               RD
Fertilizers     Potassium (per 100 kg, squared)                 FPot
                Phosphorus (kg, squared)                        Fphos
                Nitrogen (kg, squared)                          Fnitro
Selling prices  Soft wheat (per 100 kg)                         PrWheat
                Potatoes (per 100 kg)                           PrPotat
                Cattle (per 100 kg)                             PrCattle
                Sheep (per 100 kg)                              PrSheep
                Pigs (per 100 kg)                               PrPig
                Raw milk (per 100 kg)                           PrMilk
Others          % female agriculture workers                    FemalePr
                % male agriculture workers                      MalePr
                Price of diesel oil (per 100 lt)                Prdiesel
                Taxes (euros)                                   Tax
                Education expenditure (euros)                   Educ
                Agriculture land (km^2)                         Land

**** http://www.gapminder.org

http://ec.europa.eu/eurostat

http://data.worldbank.org


1. Dealing with missing values

For this section SAS Miner will be used. The data sets we use come mainly from Eurostat, but they contain some missing values that we have to deal with. This is a very important step: because we do not have many observations, the decision on each missing value can be crucial. There are several methods for replacing values. One of them is manual replacement, based mainly on the user's experience. We use this method because, even though we examine only 2007, we have several tables from other years that can help us estimate the 2007 value. For countries with no values at all, we manually replace the value with that of a similar country, using agricultural GDP as the main comparison tool. After all the changes we check the statistics again; large changes, especially in the mean and standard deviation, probably indicate a mistake or uncertainty.
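In this project the replacement was done by hand. The sketch below only illustrates how the two ideas could be mechanized: estimating a missing 2007 value from neighboring years, and borrowing a value from the country with the closest agricultural GDP. The function names, the linear interpolation rule and all numbers are assumptions made for illustration.

```python
def impute_from_other_years(series, target_year):
    """Estimate a missing value from the nearest years that have data.

    `series` maps year -> value (or None when missing). Linear interpolation
    is used when data exist on both sides of the target year.
    """
    known = {year: v for year, v in series.items() if v is not None}
    if target_year in known:
        return known[target_year]
    earlier = [year for year in known if year < target_year]
    later = [year for year in known if year > target_year]
    if earlier and later:
        y0, y1 = max(earlier), min(later)
        weight = (target_year - y0) / (y1 - y0)
        return known[y0] + weight * (known[y1] - known[y0])
    if earlier:
        return known[max(earlier)]   # carry the last known value forward
    if later:
        return known[min(later)]     # carry the next known value backward
    return None                      # no data at all: use a donor country


def impute_from_similar_country(target_agri_gdp, donors):
    """Borrow the value of the donor whose agricultural GDP is closest.

    `donors` is a list of (agricultural_gdp, value) pairs.
    """
    closest = min(donors, key=lambda d: abs(d[0] - target_agri_gdp))
    return closest[1]


# Hypothetical numbers, for illustration only.
pig_price = {2005: 110.0, 2006: 118.0, 2007: None, 2008: 130.0}
estimate_2007 = impute_from_other_years(pig_price, 2007)

donors = [(1200.0, 31.5), (5400.0, 29.0), (900.0, 33.2)]
borrowed = impute_from_similar_country(1000.0, donors)
print(estimate_2007, borrowed)
```

With only 27 observations, any such rule should still be reviewed value by value, which is why we preferred the manual approach.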

Missing Values

Statistics –with missing values


The table above is a good starting point for getting a view of our variables. It shows the mean, standard deviation, skewness and number of missing values.

Statistics-without missing values

Comparison Statistics Table

             Mean                        Standard Deviation
Variable     NonMissing    Missing       NonMissing    Missing
CereCrop     2116.87       2116.87       2658.79       2658.79
Educ         23876.82      24429.11      35474.34      36058.29
FemalePr     5.248         5.9           6.48          6.8
Fnitro       4279093       4556125       5723445       6023222
Fphos        549074.1      557000        741330.7      785764.9
Fpot         1234444       1230292       1708561       1799359
Inc          5736.35       8507.17       5736.35       8507.17
LVpig        5909.981      5909.98       7731.72       7731.72
LVsheep      3522.884      3653.82       6318.57       6406.23
Land         72867.96      72867.96      92117.39      92117.39
LVcattle     3312.278      3312.78       4526.89       4526.89
LVgoat       494.24        570.13        1074.79       1150.76
MalePr       7.38          7.36          5.45          5.69
Prcattle     222.35        211.89        75.43         77.12
PrMilk       31.86         32.01         4.41          4.14
PrPig        121.44        115.28        39.43         32.76
PrPotat      24.88         24.88         9.18          9.18
Prsheep      115.99        122.01        114.69        120.26
PrWheat      19.14         19.14         3.56          3.56
Prdiesel     95.10         95.11         21.05         21.05
PulsCrop     47.58         47.58         72.86         72.86
RD           8490.17       8490.17       14736.32      14736.32
Rawmilk      5299.44       58350.77      72092.21      72711.88
RootCrop     154.3         154.3         215.10        215.10
Taxes        63178.63      63178.63      96843.31      96843.31

We can see that the basic statistics (mean and standard deviation) remain close to their original values for most variables. We take this to mean that the replacement of missing values was successful, so we can continue with the analysis.
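The comparison above can also be expressed as a simple check. The sketch below compares the mean and standard deviation of one variable before and after replacement and flags it when either statistic moves by more than a chosen tolerance. The 20% tolerance and the sample values are assumptions, not figures from our tables.

```python
from statistics import mean, stdev


def relative_change(before, after):
    """Absolute relative change between two statistics."""
    if before == 0:
        return float("inf") if after != 0 else 0.0
    return abs(after - before) / abs(before)


def imputation_report(values_with_gaps, values_imputed, tolerance=0.20):
    """Compare mean and standard deviation before and after imputation."""
    observed = [v for v in values_with_gaps if v is not None]
    report = {
        "mean_before": mean(observed),
        "mean_after": mean(values_imputed),
        "sd_before": stdev(observed),
        "sd_after": stdev(values_imputed),
    }
    report["flag"] = (
        relative_change(report["mean_before"], report["mean_after"]) > tolerance
        or relative_change(report["sd_before"], report["sd_after"]) > tolerance
    )
    return report


# Hypothetical milk prices with one missing country.
raw = [31.0, 29.5, None, 33.0, 30.5]
filled = [31.0, 29.5, 31.0, 33.0, 30.5]
report = imputation_report(raw, filled)
print(report)
```

A flagged variable is not necessarily wrong, but it is the first place to look for a mistaken replacement or a data-entry error.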

2. Estimation Method

Before we continue to the main section, we should introduce our method of analysis. We want to identify a linear relationship that connects all the important components of agricultural income. We have cross-sectional data with no special disturbances, so we are comfortable assuming that this is feasible. As estimation method we choose OLS. If the OLS assumptions are not violated, the estimator is BLUE (best linear unbiased estimator), meaning that it has the smallest variance among all linear unbiased estimators. If the errors are also normally distributed, it is the best among all unbiased estimators, not only the linear ones.
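For readers who want to see what "least squares" means in practice, the following is a minimal sketch with a single regressor. Our actual model has several predictors and was estimated in SAS; the data below are hypothetical and the one-variable closed form is used only for brevity.

```python
def ols_simple(x, y):
    """OLS for y = b0 + b1 * x, minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical observations (one predictor, one response).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, sum(residuals))
```

The residuals sum to zero (up to rounding) and are uncorrelated with the regressor, which is what the least squares conditions impose.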

3. Model building

Now that we have dealt with the missing values we can continue to our main analysis. To repeat, the purpose of this project is to identify the best linear relationship between agricultural income and a set of predictors. We cannot keep all of the variables that we introduced. We introduced more than we need in order to keep the model flexible, in case one of them suddenly proves necessary. In addition, we do not know the explanatory power of each variable in advance, so we need the freedom to choose the best among them.

The basis for this part will be the OLS regression assumptions, which briefly are:

1. Linear in parameters

2. No perfect collinearity

3. Zero conditional mean

4. Homoscedasticity

5. Uncorrelated errors

6. Normality of errors

Under assumptions 1 to 5 the estimator is BLUE; with assumption 6 added it is BLUE and equal to the ML estimator.

Correlation matrix (dropping inaccurate and non-meaningful variables)

We start with the second assumption, because it helps us drop "bad" variables from the model and make it more effective. Collinearity can arise in two ways: first, through high pairwise correlation between independent variables; second, through a near-linear relationship among several independent variables. Either is a problem when we try to explain the relationship of each independent variable with the dependent variable, because it becomes hard to separate their effects. In this part we focus on the relationships between independent variables by creating the Pearson correlation matrix in SAS Miner.
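The screening step can be sketched as follows. We produced the matrix in SAS Miner; the code below is only an illustration of the same idea, with hypothetical data and the 0.8 threshold that we use as our cut-off.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated_pairs(data, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical values for five countries.
data = {
    "LVcattle": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Rawmilk": [12.0, 19.0, 33.0, 41.0, 48.0],
    "Prdiesel": [95.0, 80.0, 99.0, 85.0, 90.0],
}
flagged = highly_correlated_pairs(data)
print(flagged)
```

Each flagged pair is then judged on its meaning as well as its statistics before one of the two variables is dropped.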

Pearson Correlation Matrix


Generally, the highest acceptable correlation is 0.8. In the table above several values exceed this level. Specifically, the problem comes mainly from the production quantities, education, agricultural land and research. Prices are uncorrelated to a satisfactory degree. Fertilizers have some small issues. In regression analysis this usually happens when many variables measure the same thing.

Production

Let us set Land aside for the moment, as it will ultimately be excluded from the model, and focus only on the production variables. We see high positive correlation: if the production of cereal crops increases, the production of the others increases too. This is a very useful conclusion for deciding which method to use. One solution would be to use the sum of all the variables, but this is not appropriate because each product has a different price, so no suitable common weighting exists. We therefore keep the cereal crop and, because of the high correlation, assume that the others have the same effect.

Taxes


Taxes is one of the variables that we reject. It is highly correlated with many of the others and, after consideration, it is not a meaningful variable. When we choose variables for a regression we should consider not only the statistics but also their importance from a research point of view. Suppose the taxes coefficient shows a negative relationship with income. Then the EU should presumably give a subsidy as a percentage of total taxes. This is not acceptable: taxes are the government's "income", so the subsidy would help the government and not the worker. Moreover, if taxes were subsidized, a government could easily raise them just to take advantage of the situation.

Conversely, suppose the coefficient shows a positive relationship: the more taxes are paid, the higher the income. This is probably the effect of a tax per amount of production. Even so, this is a matter for individual governments and not for the EU.

Livestock

We can see very high correlation between cattle livestock and several variables. It is hard to identify the reasons for that, especially for someone who does not participate in the business. For example, there is a 0.91 correlation between education and cattle livestock, which has no obvious explanation. On the other hand, some effects are obvious, such as the 0.93 correlation between cattle and raw milk: the more cattle you have, the more milk you produce. The positive correlation with taxes is probably the effect of some restrictive policy. The European Union frequently stresses the environmental protection policy that it wants to apply, and we know that production is one major component.

For sheep and goats we did not notice any important high correlation, but we know that they do not have a large share in the European agricultural economy. We can therefore keep only one of them, goats, which are more independent of the other variables, and give more room to pigs, whose production has a huge impact in total terms.

Finally, as with crops, the different prices make it impossible to combine these elements. We keep goats and pigs.

Fertilizers

Here the situation is different. We do not have price indexes, so we can try the sum, possibly followed by a log transformation to fit the values to the model.

The high correlation with taxes again probably comes from some environmental policy.

Agriculture employments

Obviously we cannot keep both of these variables, as they are mutually exclusive and thus collinear. If we wanted, we could continue with males, who are the main workers in agriculture. But, again, the main purpose here is not to explore male or female components; it is to identify support policy. So, at least for the basic linear regression, we exclude both of them.

Education and Research

Education and research are highly correlated. This is expected, because one affects the other: the more education there is, the more research is produced. People with more education, such as master's degrees and PhDs, are probably more willing to discover new patterns that make the economy more productive. However, we also considered that higher education is not a very important component in agriculture. The skills this work requires are specific, such as feeding the animals and harvesting the land; it is not a business where a master's degree makes you a director with an increasing wage. We therefore believe that research, meaning the approaches and technologies that improve production, is the key point. For this reason we decided to use only Research.

Land


After consideration we decided to exclude land from the analysis. First, it is highly correlated with important variables. This is expected, as we know that the shortage of land is one of the major components of poverty. But let us assume that we solve this problem and include land in the regression. If the coefficient is high and positive, then wages increase when land increases. This project aims to identify good policies for supporting agricultural income, so in this case the EU would have to find a way to increase the share of land. This is impossible for several reasons. 1) Land is not easily available; it is also used for housing, manufacturing, forests and museums. 2) It is already allocated, and property rights are respected. 3) Workers cannot move easily. Agricultural workers usually work in a specific location, where they have their land, farms and so on, so it is very hard to give them land to work in another place.

Conversely, a negative coefficient would point to demand for cultivating fallow land. We believe that the coefficient will be positive, but we should check this in practice before dropping the variable. We will run a first regression with the variables remaining after this section and examine the effect of land. If it is small and positive, there is no reason to keep it given its high correlation.

Regression results


For the moment we do not go deeper into the decision on the other variables; we do that in the next section. Here we focus only on Land. As expected, the estimated coefficient is positive, 0.0774: if the area of land increases by 1 km^2, Income (gross value) increases by 0.0774, in the units of the dependent variable (about 7.7% if Income is entered in logs). To be more precise we also include the test of significance.

Land histogram

The histogram shows an approximately normal distribution with a small skew to the right. Let us test the hypothesis Ho: b3 = 0 at the 5% level of significance.

Ho: b3 = 0    tcr = 1.77093

H1: b3 ≠ 0    t = 3.55

|t| > tcr, so Ho is rejected and b3 is significant. The value 1.77093 is the one-sided 5% critical value for 13 degrees of freedom; the two-sided value, 2.1603, leads to the same conclusion. In simple terms, our b3 estimate can be generalized to the population.

However, given that the variable is highly correlated with other variables and that the coefficient is small and positive, as expected, we feel it is appropriate to exclude it.
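The decision rule used in the test above can be written in a few lines. The t value and the critical value are the ones reported in the text; the standard error of 0.0218 is a hypothetical figure, back-solved from the coefficient and the reported t value, because the regression output itself is not reproduced here.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t statistic for H0: coefficient == hypothesized."""
    return (estimate - hypothesized) / std_error


def reject_null(t_value, t_critical):
    """Reject H0 when the absolute t value exceeds the critical value."""
    return abs(t_value) > t_critical


# Reported in the text: t = 3.55 against tcr = 1.77093.
land_rejected = reject_null(3.55, 1.77093)

# Hypothetical standard error, chosen so that 0.0774 / se is close to 3.55.
t_land = t_statistic(0.0774, 0.0218)
print(land_rejected, round(t_land, 2))
```

Statistical significance alone does not decide whether a variable stays in the model; here the high correlation with other predictors outweighs it.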

Final variables table

Variable                  Name
Cereal production         CereCrop
Livestock pig             LVpig
Livestock goat            LVgoat
Milk price                Prmilk
Price pig                 Prpig
Price goat                Prgoat
Price wheat               Prwheat
Price diesel              Prdiesel
Raw milk                  Rawmilk
Research & Development    RD
Education                 Educ
Total fertilizers         Fert

Before continuing, we note that the reason we do not use transformations such as logs or squares to solve the correlation problem is that we want to drop some variables. At this point we have 20 variables to explain the problem, and we do not want to keep all of them because that would make the analysis confusing. We therefore used correlation checking not only to clean the model but also to identify the appropriate variables to use. Some, such as Research & Development and education, we keep for research reasons even though they are highly correlated. Likewise, excluding crop production is something the numbers allow but our knowledge of the subject does not. So we keep these variables and will try to address the correlation with transformations, for a better goodness of fit.

Scatterplot Matrix (Predicted Vs finally predictors)


At this point we include a scatterplot matrix of the variables that we finally used. We want to identify any wrong values, probably outliers, that would make our model lose explanatory power. Fortunately, the green columns show no extremely concerning cases. We should not forget that we have few observations, so excluding a value would have a huge impact. It would also be a problem for any validation tests. We did not identify or exclude any outliers.


4. Model building

In this part we start to build our model. That means trying different relationships and forms of the variables to decide which are best. Note that we are trying to choose the better model. We probably cannot avoid heteroscedasticity with transformations alone, but we can build a very good model, with correct signs on the estimates, a satisfactory R², small errors and good generalization ability. We can then improve it by adding or dropping variables and by correcting heteroscedasticity with appropriate methods.

Because this part can become chaotic, we should consider carefully which question we want this research to answer. Chaotic means that there are hundreds of combinations that could be tried. To keep that under control, we start by thinking seriously about what we want to learn from this model, and then move on to more technical factors such as R², errors, normality and the other linear regression assumptions.


When we build a model we take two things into consideration before going deeper into the analysis:

The dependent variable should be continuous and normally distributed; ideally, so should the independent variables.

The researcher's experience with the topic.

Consideration Procedure

In this model we have to deal with two problems. First, the histograms of most of our variables are right-skewed; log transformations are appropriate for this. Second, as researchers we anticipate that there are some patterns we should follow. Because the second consideration is only a guess, we give priority to the mathematical assumptions of regression. Ultimately, we cannot avoid several trials before we identify the correct model (the trials are in the Appendix of the project). The test of significance can then tell us whether our model has good generalization ability.

Final model after histogram-based transformations

Final linear relationship

logIncome = β0 + β1*logCerecrop + β2*logPrWheat + β3*logRawmilk + β4*Prmilk^3 + β5*logLVgoat + β6*logPrgoat + β7*sqrtLVpig + β8*sqrtPrpig + β9*logRD + β10*sqrtPrdiesel + β11*logFert

Transformations

Variable          Model
CereCrop          Log
Prwheat           Log
Rawmilk           Log
Prmilk            X^3
LVgoat            Log
LVpig             Sqrt
Prgoat            Log
Prpig             Sqrt
RD                Log
Prdiesel          Sqrt
Fert              Log
Income (target)   Log

All the transformations are based on the following rules:

Log transformation for strong right skewness

Square root to weaken milder right skewness

Cube form for left skewness

For all the variables we tried to achieve normality, especially for the dependent variable, because some tests in the next steps rely on it.
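The three rules above can be illustrated with a small sketch. It is written in Python rather than the SAS used for the project, and the function names, the skewness thresholds (`strong`, `mild`) and the sample values are hypothetical choices made only for illustration; they are not taken from the original analysis.

```python
import math

def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def suggest_transform(values, strong=1.0, mild=0.5):
    """Pick a transformation from the sign and size of the skewness.

    The thresholds are illustrative assumptions, not values from the study.
    """
    s = skewness(values)
    if s > strong:
        return "log"    # strong right skewness
    if s > mild:
        return "sqrt"   # milder right skewness
    if s < -mild:
        return "cube"   # left skewness
    return "none"

# Hypothetical right-skewed variable (not data from the study)
x = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
log_x = [math.log(v) for v in x]

print(suggest_transform(x))
print(skewness(x), skewness(log_x))  # the log pulls the long right tail in
```

In practice the histograms remain the main guide; a numeric rule like this one is only a quick first screen.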

Histograms


Next we check whether any of the OLS linear regression assumptions are violated.

Linear regression Assumptions

1. Linearity

We should ensure that the dependent variable is modelled as a linear combination of all the independent variables. There is no straightforward test for this, but we can check it in terms of misspecification: no nonlinear function of the independent variables should be significant when added to the model. For this purpose we use the Ramsey RESET test. The idea behind this test is to check how significant the estimators of the higher-order terms are.

Two regressions are estimated, where the second is a version of the first augmented with powers of the fitted values obtained from the first regression. The squared fitted values introduce the non-linearity into the specification.

$\text{Income} = \gamma_0 + \gamma_1 \widehat{\text{Income}} + \gamma_2 \widehat{\text{Income}}^2 + \gamma_3 \widehat{\text{Income}}^3$

SAS output: Ramsey test

We test the functional form with a t-test on γ2.

The null hypothesis is that the correct specification is linear.

The alternative hypothesis is that the correct specification is non-linear.

Hypothesis:

Ho: γ2=0 tcr=2.1603

Η1: γ2≠0 |t|=0.07

Since |t| < tcr we do not reject the null hypothesis: the estimator is insignificant and we accept the linearity of the first model.
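The mechanics of the test can be sketched as follows. This is an illustrative Python version, not the SAS procedure used for the project, and it implements the common variant that adds only the squared fitted values to the original regressors and reads off the t statistic on that term. The helper names `ols` and `reset_t_stat` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """OLS of y on X: coefficients, residuals and coefficient standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, resid, se

def reset_t_stat(X, y):
    """t statistic on the squared fitted values added to the original regression."""
    beta, _, _ = ols(X, y)
    fitted = X @ beta
    X_aug = np.column_stack([X, fitted ** 2])
    beta_aug, _, se_aug = ols(X_aug, y)
    return beta_aug[-1] / se_aug[-1]

# Synthetic illustration (not the study's data): the true relationship is quadratic,
# so a linear specification should be flagged.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, 200)
X = np.column_stack([np.ones_like(x), x])   # intercept + one regressor
y_curved = 1.0 + 2.0 * x + 1.5 * x ** 2 + rng.normal(0.0, 1.0, 200)

t_curved = reset_t_stat(X, y_curved)
print(abs(t_curved))  # compare |t| with the critical value, as in the text
```

A large |t| on the squared fitted values points to a misspecified functional form; a small one, as reported above, does not contradict linearity.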


2. No perfect collinearity

Moderate multicollinearity may not be problematic. However, it can become a problem because it increases the variance of the coefficient estimates and makes the estimates sensitive to changes. The coefficient estimators become unstable and difficult to interpret. Multicollinearity saps the power of the analysis, can cause the coefficients to switch signs, and makes it more difficult to specify the correct model.

For this section we use the variance inflation factor (VIF), which indicates the extent to which multicollinearity is present in a regression. It measures how much the variance of the regression coefficients is inflated compared to when the predictors are not linearly correlated. A VIF of 5 or greater is a reason for concern.

We can see that Rawmilk, RD and Fert are highly correlated with the other predictors, with VIFs of 18.36, 11.73 and 6.36.
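As a rough illustration of what the VIF measures, each predictor can be regressed on all the others and the resulting R^2 turned into 1 / (1 - R^2). The sketch below is in Python with synthetic data; the function name `vif` and the generated variables are hypothetical, and the project's own values come from SAS.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    X holds the predictors only (no intercept column); an intercept is added
    to every auxiliary regression.
    """
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic predictors (not the study's data): c is almost a copy of a
rng = np.random.default_rng(2)
a = rng.normal(size=100)
b = rng.normal(size=100)
c = a + 0.1 * rng.normal(size=100)

v = vif(np.column_stack([a, b, c]))
print(v)  # a and c should exceed the threshold of 5, b should not
```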


We have to recall the correlation matrix to check again the relationships.

Correlation Matrix

Solution choices:

Remove highly correlated predictors

Combine variables with ratios

Standardize the predictors

Run a different regression if nothing else works

In our case we cannot use combinations, because the correlation is probably spurious and the variables are not related. For example, Research is highly correlated with Rawmilk. A possible explanation is that strong research could develop ways to exploit all the ingredients of milk without wasting anything. For example, when you make cheese you discard the 'water' that is left, which can actually be made into soft cheese with some techniques. This is a possible explanation, but we still treat the relationship as spurious.

Consideration procedure

When we deal with a regression we should always remind ourselves of the background and of the question that we want to answer. In this project we want to create a policy to support increased agricultural income. In the case of Rawmilk, if we find a high positive estimator, the likely explanation is that production supports income and so we should press for more production. But milk produced on farms usually passes through heavy industrial processing to take its final form: whole milk, cheese, skim milk, yoghurt. So we cannot control this relationship as easily as for harvested products or animals. We cannot press industries to produce amounts they do not need. On the other hand, Research is another important variable that shows the need for development and new ways of production, which we believe is very significant in the 21st century given strong competition. So we keep that variable.

Regarding fertilizers, we tried to create a new variable, fertilizers used per unit of crop production, but unfortunately this gave a high VIF for CereCrop. So, instead, we simply exclude Rawmilk.

After running the new regression we get:

Fert is still a little high but we can ignore it.

Note: we found the standardized approach interesting, but chose not to go this way as it is not well known. * **


* https://www3.nd.edu/~rwilliam/stats1/x92.pdf

** http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them

We quickly check the linearity assumption again with the Ramsey RESET test, to ensure that everything remains the same.

Hypothesis:

Ho: γ2=0 tcr=2.1788

Η1: γ2≠0 |t|=0.28

Since |t| < tcr we do not reject the null hypothesis: the estimator is insignificant and we accept the linearity of the first model.

3. Zero conditional Mean

This assumption is required for the unbiasedness of the estimators. The error term has zero conditional mean, meaning that the average error is zero at any specific value of the independent variables. Simply put, the error does not depend linearly or nonlinearly on x. This assumption is perhaps the most serious one in cross-sectional data, but the problem is that there is no way to test it. A violation of this assumption means that we have a systematic error relative to the real population when we collect the data. This can make our estimators biased and our model unable to predict. Because OLS is based on this assumption, we simply have to accept that it is true.

In simple terms:

$E(u \mid x_1, x_2, \ldots, x_k) = 0$

4. Normality of error

This assumption is very important in our case: we do not have a lot of observations, so we cannot rely on the central limit theorem. Non-normality of the errors will have some impact on the precise p-values of the tests on the coefficients, etc. But if the distribution is not too grossly non-normal, the tests will still provide good approximations.

Because we do not have a lot of observations, we use the Shapiro-Wilk test, and the Q-Q plot to get a visual impression of the results.

Shapiro-Wilk test:

The basic idea behind the Shapiro-Wilk test is to estimate the variance of the sample in two ways: (1) the regression line in the Q-Q plot allows us to estimate the variance, and (2) the variance of the sample can also be regarded as an estimator of the population variance. Both estimated values should be approximately equal in the case of a normal distribution.

We want to check if $r \sim N(\mu_0, \sigma^2)$

Hypothesis:

Ho: Wo>Wa, the residuals follow a normal distribution

H1: Wo<Wa, the residuals do not follow a normal distribution

P-value=0.585 > 0.05, so we cannot reject the null hypothesis and we can assume normality.

QQ-plot

The Q-Q plot suggests quite clearly that the distribution is normal. Even though the points do not lie on a perfect line, we cannot identify any skewness, heavy or light tails, or a bimodal distribution. We accept normality.
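To make the Q-Q idea concrete, the sketch below pairs sorted residuals with standard normal quantiles and summarizes how straight the resulting line is with a correlation coefficient. Note that this is a simpler stand-in (a probability-plot correlation) and not the Shapiro-Wilk statistic itself; the plotting positions (i - 0.5) / n, the function names and the sample of 27 values are assumptions chosen for the illustration.

```python
import math
from statistics import NormalDist, mean

def qq_points(residuals):
    """Pair each sorted residual with the matching standard normal quantile."""
    n = len(residuals)
    observed = sorted(residuals)
    # Plotting positions (i - 0.5) / n are one common convention, assumed here
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed

def qq_correlation(residuals):
    """Pearson correlation between theoretical and observed quantiles."""
    theo, obs = qq_points(residuals)
    mt, mo = mean(theo), mean(obs)
    cov = sum((t - mt) * (o - mo) for t, o in zip(theo, obs))
    var_t = sum((t - mt) ** 2 for t in theo)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_t * var_o)

# Hypothetical residuals: one sample that is normal by construction,
# and a right-skewed version obtained by exponentiating it.
n = 27
normal_like = [0.15 * NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [math.exp(10 * r) for r in normal_like]

print(qq_correlation(normal_like), qq_correlation(skewed))
```

A value close to 1 corresponds to points lying near the straight line of the Q-Q plot; skewness or heavy tails pull it down.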

Distribution of residuals

We can also see in the distribution plot that the residuals have a very small left skewness, which we can safely ignore. Normality is accepted again.

5. Homoscedasticity

The assumption of homoscedasticity (same variance of the residuals) is central. It describes the situation in which the variance of the error term is the same across all values of the independent variables. The violation of this assumption is called heteroscedasticity and is crucial. In simple terms, OLS minimizes the errors giving equal weight to all observations; if heteroscedasticity is present, the minimized error does not come evenly from all of them, so it is very hard to identify where the error comes from. Ideally the variance of the errors should be constant and equal for all observations. For this section we use plots (residuals versus predicted values) for a first view, and then the Breusch-Pagan test and the White test (XLSTAT package).

What we want: Var(ut)=σ^2

Residuals vs. fitted (predicted) values

We see that the pattern of the data points gets a little narrower towards the left end, which is an indication of mild heteroscedasticity. We cannot identify a specific circular or cone-shaped pattern. It seems a little thinner in the middle, but the general picture is that the points are spread over the whole diagram. We will also conduct some tests.

White test: it creates all the squares of the independent variables and all their cross products, and runs a regression of the squared residuals on them. The problem with the White test is that it can reject the null hypothesis (homoscedasticity) not only because of non-constant variance but also because of misspecification. This is especially so when the regression includes a lot of variables. So we also check the Breusch-Pagan test.

Breusch-Pagan: it tries to identify linear forms of heteroscedasticity. White's test is actually a special, more relaxed case of it.

In the test output tables we can see that both our tests do not reject the null hypothesis, meaning that our model is homoscedastic. So we can continue with the other assumptions.

If homoscedasticity were violated, we would use one of the following methods:

Weighted least squares regression

Generalized linear regression

Pagan Test:

Run the auxiliary regression:

$\hat{\varepsilon}_i^2 = \gamma_0 + \gamma_1 \widehat{\text{Income}}$

Hypothesis:

Ho: Var(εi)=σ^2, homoscedasticity

H1: Var(εi)=σi^2, for at least one i

χcr=19.6751 χ-val=nR^2=12.744

χcr > χ-val, so we do not reject Ho and we accept homoscedasticity.
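The nR^2 statistic used above can be sketched as follows. This Python version regresses the squared OLS residuals on the original regressors, which is one common form of the Breusch-Pagan test and may differ in detail from the XLSTAT output; the function name and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """LM statistic n * R^2 from regressing squared OLS residuals on X.

    X must already contain the intercept column.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                      # squared residuals
    gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
    aux_resid = u2 - X @ gamma
    r2 = 1.0 - aux_resid @ aux_resid / np.sum((u2 - u2.mean()) ** 2)
    return X.shape[0] * r2

# Synthetic illustration (not the study's data): the error spread grows with x
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 5.0, 300)
X = np.column_stack([np.ones_like(x), x])
y_hetero = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, 300) * x ** 2

lm = breusch_pagan_lm(X, y_hetero)
CHI2_CRIT_1DF = 3.841  # 5% critical value of the chi-square with 1 degree of freedom
print(lm, lm > CHI2_CRIT_1DF)
```

The decision rule is the same as in the text: the statistic is compared with a chi-square critical value, and homoscedasticity is not rejected when the statistic is smaller.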

Output for Pagan test:

White Test:

Our initial linear equation:

logIncome = β0 + β1*logCerecrop + β2*logPrWheat + β3*logRawmilk + β4*Prmilk^3 + β5*logLVgoat + β6*logPrgoat + β7*sqrtLVpig + β8*sqrtPrpig + β9*logRD + β10*sqrtPrdiesel + β11*logFert

$\hat{\varepsilon}_i^2 = (\text{regressors of the logIncome equation}) + (\text{all squared variables}) + (\text{all cross products})$

Hypothesis:

Ho: σi=σ, for all i=1,…,n

H1: σi≠σ, for at least one residual, i=1,…,n

p-value=1, a=0.05

p-value > a, so we do not reject Ho and homoscedasticity holds.
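To see why the White test becomes delicate when the regression includes a lot of variables, it helps to list the terms of its auxiliary regression. The sketch below only builds the names of those terms for the regressors of the initial equation; the function name `white_terms` is hypothetical and no estimation is performed.

```python
from itertools import combinations_with_replacement

def white_terms(names):
    """Names of the auxiliary regressors: levels, squares and cross products."""
    levels = list(names)
    second_order = [
        f"({a})^2" if a == b else f"({a})*({b})"
        for a, b in combinations_with_replacement(names, 2)
    ]
    return levels + second_order

# Regressors of the initial equation quoted above
regressors = [
    "logCerecrop", "logPrWheat", "logRawmilk", "Prmilk^3", "logLVgoat",
    "logPrgoat", "sqrtLVpig", "sqrtPrpig", "logRD", "sqrtPrdiesel", "logFert",
]
terms = white_terms(regressors)
print(len(terms))  # 11 levels + 11 squares + 55 cross products
```

With eleven regressors the auxiliary regression already has 77 terms, so with a small sample the test has little information left to work with.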

Output White test:

6. Uncorrelated errors (no Autocorrelation)

Autocorrelation is mainly a problem in time series data; it comes from systematic errors in measurement or from misspecification. For cross-sectional data there are several opinions. The first is that the observations are i.i.d., therefore the errors are i.i.d. as well and we have no issue. The second is that autocorrelation can arise from misspecification, usually as spatial correlation. For example, in our case the livestock of goats may be related to the livestock of pigs within each country, and then their errors would not be independent. Because we want to be as sure as possible, we assume that the second opinion is the more correct one. All the assumptions of regression are tied to misspecification. So, arguing by contradiction, if all the other assumptions hold, then the model is not misspecified and the assumption of uncorrelated errors holds too.

In general we want: $\mathrm{Cor}(u_i, u_j) = 0$ for $i \neq j$

Consequences if the assumption does not hold:

The OLS estimates are still unbiased and consistent.

The OLS estimates are inefficient, so no longer BLUE.

The estimated variances of the regression coefficients will be biased and inconsistent, and therefore hypothesis testing is no longer valid.

In most cases, the R^2 will be overestimated and the t-statistics will tend to be higher.

7. Residual SAS output plots

IV. Interpretation of results

Having checked all the assumptions, we conclude that the model we recommend is correctly specified (meaning that the OLS assumptions hold, so we are assured of the quality of the estimation), and we can pass to its interpretation. We want to go deeper into the model: check the overall statistics (R-squared, overall F-test and mean square error), analyse each estimator's relationship with the dependent variable, and check their significance. If an estimator is not significant, meaning we cannot reject the null hypothesis, we can drop it from the model if we choose.

SAS output

R^2

The R^2, or coefficient of determination, is a number that indicates the proportion of variance explained by a regression model. In simple terms, it shows how well our model fits the data. The higher this number the better (except for the extremes: zero means the model explains nothing, and exactly one points to a problem such as multicollinearity). In our case R^2 is 97.2%, which is a very satisfying fit. Of course R^2 alone is not enough to establish a good model, which is why we first did the analysis above.

R^2 adjusted

Adjusted R^2, in contrast with R^2, accounts only for the variation explained by those independent variables that actually affect the dependent variable. Whereas R^2 can only increase when explanatory variables are added, this coefficient can decrease when a predictor improves the model less than would be expected by chance. As in the first case, the higher the value the better (but not zero or one). In our case it is 0.96 (96%), which is again a great number.

Overall F-test

The F-test evaluates the null hypothesis that all regression coefficients are equal to zero against the alternative that at least one is not. A significant F-test indicates that the observed R-squared is reliable and is not a spurious result of oddities in the data set. Thus, the F-test determines whether the proposed relationship between the response variable and the set of predictors is statistically reliable.

Hypothesis:

H0: β1 = β2 = ... = βp-1 = 0 (the fit of the intercept-only model and of our model are equal)

H1: βj ≠ 0, for at least one value of j

$F = \dfrac{(RSS_H - RSS)/(p-1)}{RSS/(n-p)} \sim F_{p-1,\,n-p}$

F value=56.65

Fcr=2.494291

F value > Fcr, so we reject the null hypothesis: our model provides a better fit than the intercept-only model.
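Because R^2 = 1 - RSS/RSS_H, the F statistic and the adjusted R^2 can both be recomputed from R^2 and the degrees of freedom, which gives a quick consistency check on the reported output. In the sketch below the values p - 1 = 10 and n - p = 16 are an assumption inferred from the critical values quoted in the text (Fcr = 2.494291 and tcr = 2.119905); they are not stated explicitly in the source. The function names are illustrative.

```python
def f_from_r2(r2, n, p):
    """Overall F statistic from R^2, with p coefficients including the intercept."""
    return (r2 / (p - 1)) / ((1.0 - r2) / (n - p))

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p coefficients including the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)

# Assumed degrees of freedom (inferred, see the note above): n - p = 16, p - 1 = 10
n, p = 27, 11
r2 = 0.972  # R^2 as reported in the text, rounded to three decimals

print(f_from_r2(r2, n, p))    # close to the reported F value of 56.65
print(adjusted_r2(r2, n, p))  # close to the reported adjusted R^2 of 0.96
```

Under these assumptions the small gap between the recomputed F and the reported 56.65 is of the size one would expect from rounding R^2 to three decimals.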

Root MSE

The RMSE is the square root of the variance of the residuals. It indicates the absolute fit of the model to the data, that is, how close the observed data points are to the model's predicted values. We can think of it as the standard deviation of the errors. Lower values of RMSE indicate a better fit. RMSE is a good measure of how accurately the model predicts the response variable. What counts as a good value is up to the researcher. In our case it is 0.15179, which is a good number and we can accept it.
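For reference, the root MSE of a regression is usually computed from the residual sum of squares divided by the residual degrees of freedom. The following is a minimal sketch of that formula; the function name and the four data points are hypothetical, and the value 0.15179 above comes from the SAS output, not from this code.

```python
import math

def root_mse(observed, predicted, n_params):
    """Square root of SSE / (n - p), where p counts all estimated coefficients."""
    sse = sum((o - f) ** 2 for o, f in zip(observed, predicted))
    return math.sqrt(sse / (len(observed) - n_params))

# Hypothetical observed and fitted values (not the study's data)
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
print(root_mse(observed, fitted, n_params=2))
```

Since the dependent variable here is logIncome, the RMSE is expressed in log units of income rather than in the original income units.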

Tests of significance

After the general idea of fit, we now focus on the individual estimators. In this part we check which of our estimators generalize well to the population.

CereCrop

Hypothesis:

Ho: b2 =0 tcr = 2.119905

H1: b2 ≠0 t=2.4

|t| > tcr, so Ho is rejected and b2 is significant. In simple terms, our b2 estimate can be generalized to the population.

PrWheat

Hypothesis:

Ho: b3 =0 tcr = 2.119905

H1: b3 ≠0 t=0.59

|t| < tcr, so Ho is not rejected. In simple terms, the b3 estimator has little power to explain agricultural income.

RD

Hypothesis:

Ho: b4 =0 tcr = 2.119905

H1: b4 ≠0 t=5.67

|t| > tcr, so Ho is rejected and b4 is significant.

PrMilk

Hypothesis:

Ho: b5 =0 tcr = 2.12

H1: b5 ≠0 t= -1.16

|t| < tcr, so Ho is not rejected.

Lvgoat

Hypothesis:

Ho: b6 =0 tcr = 2.12

H1: b6≠0 t= -5.23

|t| > tcr, so Ho is rejected and the b6 estimator is significant.

Prgoat

Hypothesis:

Ho: b7 =0 tcr = 2.12

H1: b7≠0 t= 1.84

|t| < tcr, so Ho is not rejected.

LVpig

Hypothesis:

Ho: b8 =0 tcr = 2.12

H1: b8≠0 t= 1.10

|t| < tcr, so Ho is not rejected.

Prpig

Hypothesis:

Ho: b9 =0 tcr = 2.12

H1: b9≠0 t= -0.65

|t| < tcr, so Ho is not rejected.

Prdiesel

Hypothesis:

Ho: b10=0 tcr = 2.12

H1: b10≠0 t= -0.11

|t| < tcr, so Ho is not rejected.

Fert

Hypothesis:

Ho: b11 =0 tcr = 2.12

H1: b11≠0 t= -0.11

|t| < tcr, so Ho is not rejected.

We could not reject the null hypothesis for many variables, but this does not concern us much, because the variables with a great impact on Income turned out to be significant.
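The individual tests above all apply the same two-sided rule: a coefficient is significant when |t| exceeds the critical value. The short sketch below applies that rule to the t values quoted in this section, using the rounded critical value 2.12; the dictionary and function names are illustrative.

```python
T_CRIT = 2.12  # two-sided 5% critical value used in the text (rounded)

# t statistics as quoted in the tests above
t_values = {
    "CereCrop": 2.40, "PrWheat": 0.59, "RD": 5.67, "PrMilk": -1.16,
    "LVgoat": -5.23, "Prgoat": 1.84, "LVpig": 1.10, "Prpig": -0.65,
    "Prdiesel": -0.11, "Fert": -0.11,
}

def is_significant(t, t_crit=T_CRIT):
    """Two-sided test: reject Ho when the absolute t value exceeds the critical value."""
    return abs(t) > t_crit

significant = [name for name, t in t_values.items() if is_significant(t)]
print(significant)
```

Taking the absolute value matters for estimators with a negative t statistic, such as LVgoat.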

Interpretation of Significant variables

In this part we close our research with the final step. We will try to recommend a policy

plan based on the conclusions of our model.

We recall that our target variable Income is under log transformation.

CereCrop: log-transformed to address skewness. Coefficient = 0.21

As both variables are log-transformed, we can say that a 1% increase in crop production will increase income in the EU by about 0.21%. Recall that our Income is expressed in terms of profit, net of taxes and costs and including subsidies. This does not affect our results, because our research focuses on how to increase the total wealth in all the countries, so we are free to express it in terms of profits. It is also consistent with what we already mentioned about the specific policy for agricultural products.
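In a log-log specification the coefficient is an elasticity, so a coefficient β translates a 1% change in the regressor into roughly a β% change in income. The sketch below computes the exact percentage change implied by the three coefficients reported for the significant log-transformed variables (0.21, 0.31 and 0.43); the function name is illustrative.

```python
def pct_change_in_y(beta, pct_change_in_x=1.0):
    """Exact % change in Y for a given % change in X in a log-log model."""
    return ((1.0 + pct_change_in_x / 100.0) ** beta - 1.0) * 100.0

# Coefficients as reported in the text for the log-transformed regressors
coefficients = {"CereCrop": 0.21, "LVgoat": 0.31, "RD": 0.43}

for name, beta in coefficients.items():
    print(name, round(pct_change_in_y(beta), 3))
```

For small changes the exact figure is almost identical to the coefficient itself, which is why the coefficient can be read directly as a percentage effect.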

Proposal Cereal Crops

As this variable is very important to agricultural wages, the EU should try to support this sector. We realize that pushing up production is not an easy idea, because we cannot ensure consumption by the customer. Thus, the EU could try to increase consumption with a specific policy, such as supplying these kinds of products to the military or to schools. Alternatively, it could give a support subsidy per amount of production to those engaged in this kind of activity.

Lvgoat: log-transformed to address skewness. Coefficient = 0.31

For the same reasons as before, we can say that a 1% increase in the livestock of goats will increase Income by about 0.31%. These animals participate in two kinds of markets: their milk goes to dairy product factories, and their meat goes to supermarkets. As a recommendation here we propose a direct subsidy to these farmers and a restriction on the slaughtering of goats for meat. The subsidy could then also go to meat sellers, to support them through the decreasing demand. Of course this is speculation, because we do not know the price elasticity of this product.

RD: log-transformed to address skewness. Coefficient = 0.43

Finally, a 1% increase in Research & Development will increase Income by about 0.43%. What we take from this is that the EU should spend more time considering ways to improve productivity through development, such as supporting farms with machinery or new kinds of fertilizers for increased efficiency. Research also gives the flexibility to act on costs: new technologies or techniques can be discovered that substantially decrease the producer's costs.

Conclusions

Under the model that we built, we found that our initial opinion that prices are very important is not true. We understand, of course, that the behaviour of prices probably comes from the special characteristics of this sector and some of its inelasticities. Thus the EU should be production oriented. And, after all, we see that Research and Development is the most important component and deserves strong consideration.

II. Appendix - Model Trials

1. Basic Linear Model

Income = β0 + β1*CereCrop + β2*Prwheat + β3*Rawmilk + β4*Prmilk + β5*LVgoat + β6*Prgoat + β7*LVpig + β8*Prpig + β9*RD + β10*Prdiesel + β11*Fert

Residuals scatterplots (basic regression line)

2. Log-linear Model (except Prmilk, Prdiesel and Square Fert)

Income = β0 + β1*logCerecrop + β2*logPrWheat + β3*logRawmilk + β4*Prmilk + β5*logLVgoat + β6*logPrgoat + β7*logLVpig + β8*logPrpig + β9*logRD + β10*Prdiesel + β11*sqFert

Residuals Table

3. Log-linear Model (except Prmilk, Prdiesel)

Income = β0 + β1*logCerecrop + β2*logPrWheat + β3*logRawmilk + β4*Prmilk + β5*logLVgoat + β6*logPrgoat + β7*logLVpig + β8*logPrpig + β9*logRD + β10*Prdiesel + β11*logFert

4. Log-linear Model (except Prmilk, Prdiesel and Square Prwheat)

Income = β0 + β1*logCerecrop + β2*sqPrWheat + β3*logRawmilk + β4*Prmilk + β5*logLVgoat + β6*logPrgoat + β7*logLVpig + β8*logPrpig + β9*logRD + β10*Prdiesel + β11*logFert

This makes the model worse, so we reject this transformation.

5. Square Root-linear Model

Income = β0 + β1*sqrCereCrop + β2*sqrPrWheat + β3*sqrRawmilk + β4*sqrPrmilk + β5*sqrLVgoat + β6*sqrPrgoat + β7*sqrLVpig + β8*sqrPrpig + β9*sqrRD + β10*sqrPrdiesel + β11*sqrFert

This model is problematic in terms of some estimators. We again see a positive coefficient on the diesel price. But, we should notice the great in terms of fertilizers.

We again meet the problematic residual table for Prwheat, but now also for LVgoat and PrGoat.

6. Square Root-linear Model (except LVgoat, PrGoat, Prwheat, Prdiesel)

Income = β0 + β1*CereCrop + β2*Prwheat + β3*sqRawmilk + β4*Prmilk + β5*LVgoat + β6*Prgoat + β7*sqLVpig + β8*sqPrpig + β9*sqRD + β10*Prdiesel + β11*sqFert

This model is not a good approach. We can see good variables losing their explanatory ability, and heteroscedasticity has increased.

7. Square-linear Model

Income = β0 + β1*CereCrop + β2*sqPrWheat + β3*sqRawmilk + β4*sqPrmilk + β5*sqLVgoat + β6*sqPrgoat + β7*sqLVpig + β8*sqPrpig + β9*sqRD + β10*sqPrdiesel + β11*sqFert

In the same way as the previous examples, this model is not appropriate.

8. X^3-linear Model

Income = β0 + β1*CereCrop^3 + β2*sqPrWheat^3 + β3*sqRawmilk^3 + β4*sqPrmilk^3 + β5*sqLVgoat^3 + β6*sqPrgoat^3 + β7*sqLVpig^3 + β8*sqPrpig^3 + β9*sqRD^3 + β10*sqPrdiesel^3 + β11*sqFert^3

We can see that this model works well, though not best, for the PrMilk variable. Generally, it is not a good model.

The models in the next section focus on transformations of the dependent variable.

9. logY-linear Model

logIncome = β0 + β1*CereCrop + β2*Prwheat + β3*Rawmilk + β4*Prmilk + β5*LVgoat + β6*Prgoat + β7*LVpig + β8*Prpig + β9*RD + β10*Prdiesel + β11*Fert

10. Y^2-linear Model

Income^2 = β0 + β1*CereCrop + β2*Prwheat + β3*Rawmilk + β4*Prmilk + β5*LVgoat + β6*Prgoat + β7*LVpig + β8*Prpig + β9*RD + β10*Prdiesel + β11*Fert

It is a little bit worse than the previous one.

In the next section we start to test different combinations of transformations for each variable.

Mix Models

The models examined are the following:

Transformations

Variable          Model 11   Model 12   Model 13   Model 14   Model 15
CereCrop          Log        Sqrt       Log        Sqrt       Sqrt
Prwheat           Log        Log        Log        Log        Log
Rawmilk           Log        Sqrt       Log        Sqrt       Sqrt
Prmilk            None       Log        X^3        Log        Log
LVgoat            Log        Log        Log        Log        Log
LVpig             Log        Log        Log        Log        Log
Prgoat            Log        Log        Log        Log        Log
Prpig             Log        Log        Log        Log        Log
RD                Log        Sqrt       Sqrt       Sqrt       Sqrt
Prdiesel          Log        Sqrt       Log        Sqrt       Sqrt
Fert              Log        Log        Log        Log        Log
Income (target)   None       None       Log        Log        Sqrt

11. Mix 11-linear Model

Income = β0 + β1*logCereCrop + β2*logPrWheat + β3*logRawmilk + β4*Prmilk^3 + β5*logLVgoat + β6*logPrgoat + β7*logLVpig + β8*logPrpig + β9*sqrRD + β10*logPrdiesel + β11*logFert

12. Mix 12-linear Model

Income = β0 + β1*sqrCereCrop + β2*logPrWheat + β3*sqrRawmilk + sqr*Prmilk^3 + β5*logLVgoat + β6*logPrgoat + β7*logLVpig + β8*logPrpig + β9*sqrRD + β10*sqrPrdiesel + β11*logFert

13. Mix 13-linear Model

logIncome = β0 + β1*logCereCrop + β2*logPrWheat + β3*logRawmilk + β4*Prmilk^3 + β5*logLVgoat + β6*logPrgoat + β7*logLVpig + β8*logPrpig + β9*sqrRD + β10*logPrdiesel + β11*logFert

14. Mix 14-linear Model

logIncome = β0 + β1*sqrCereCrop + β2*logPrWheat + β3*sqrRawmilk + sqr*Prmilk^3 + β5*logLVgoat + β6*logPrgoat + β7*logLVpig + β8*logPrpig + β9*sqrRD + β10*sqrPrdiesel + β11*logFert

15. Mix 15-linear Model

sqrIncome = β0 + β1*sqrCereCrop + β2*logPrWheat + β3*sqrRawmilk + sqr*Prmilk^3 + β5*logLVgoat + β6*logPrgoat + β7*logLVpig + β8*logPrpig + β9*sqrRD + β10*sqrPrdiesel + β11*logFert

4. References sites that we trusted (opinions and theory)

Tools

Sas Miner

Sas 9.3

Xlstat

Data collection

http://www.worldbank.org

http://ec.europa.eu/eurostat

http://www.gapminder.org/data/

Assumptions of OLS

https://www.ecu.edu/cs-dhs/bios/upload/SAS_Regression-2012.pdf

http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter2/sasreg2.htm

https://www.uvm.edu/~wgibson/Classes/200f09/Technical_notes/Hausman.pdf

http://nationalekonomi.hannes.se/regression-analysis/assumptions

http://www.statisticssolutions.com/homoscedasticity/

https://www3.nd.edu/~rwilliam/stats2/l25.pdf

http://www.lexjansen.com/wuss/2006/posters/POS-Ayyangar.pdf

https://onlinecourses.science.psu.edu/stat501/node/347

http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/#x-unbalanced-header

https://en.wikipedia.org/wiki/Linearity

https://www.ine.pt/revstat/pdf/rs160105.pdf

http://stats.stackexchange.com/questions/55888/zero-conditional-mean-assumption

http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/

https://jrvargas.files.wordpress.com/2011/01/wooldridge_j-_2002_econometric_analysis_of_cross_section_and_panel_data.pdf

https://www3.nd.edu/~rwilliam/stats1/x92.pdf

http://www.statistics4u.info/fundstat_eng/ee_shapiro_wilk_test.html

http://www.stat.purdue.edu/~tqin/system101/method/QQplot_sas.htm

http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them

Interpretation

http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients

http://blogs.sas.com/content/iml/2013/06/12/interpret-residual-fit-spread-plot.html

http://blog.minitab.com/blog/adventures-in-statistics/what-is-the-f-test-of-overall-significance-in-regression-analysis

http://www.geosci-model-dev.net/7/1247/2014/gmd-7-1247-2014.pdf

http://muscle.ucsd.edu/More_HTML/papers/pdf/Lieber_JOR_1990.pdf

http://www.reed.edu/economics/course_pages/red_spots/testing_hypotheses.htm

Agriculture Policy

http://ec.europa.eu/agriculture/statistics/agricultural/2013/pdf/c5-1-351_en.pdf

http://www.gapminder.org/data/

http://ec.europa.eu/agriculture/50-years-of-cap/files/history/history_book_lr_en.pdf