The traditional econometric approach of linking fuel usage t

2018-10-29

The traditional econometric approach of linking fuel usage to factors that could influence it employs economic theory to derive a model for statistical regression. However, the econometric approach may fail to capture all factors that influence rural households' choices of cooking fuel. Although the approach in this paper may not fully describe the decision process behind rural households' fuel choices either, it may complement previous methods. Many household surveys collect an extensive number of variables that can be used to predict household fuel choices. Although the inclusion of these variables in models of households' fuel use would not necessarily increase the accuracy of interpretation, heavy dependence on factors previously not included may lead to alternative interpretations and hypotheses. Furthermore, a household's willingness to adopt LPG may depend on a number of factors and on complex interplay between them, possibly leading to non-linear relations.

For these reasons, previous approaches may benefit from being complemented by alternative methods from the field of machine learning. Instead of assuming certain forms of relationship, such methods detect properties in the data, which are then generalized and used for prediction. A relatively new algorithm from the field of machine learning, Random Forests (Breiman, 2001), has been successfully used in a variety of fields including ecology (Cutler et al., 2007), gene selection (Díaz-Uriarte and De Andres, 2006) and criminology (Berk et al., 2009).

The Random Forest algorithm, as proposed by Breiman (2001), displays many properties that make it suitable for exploratory data analysis. In addition to being one of the most efficient classifiers across many different data types and being able to handle complex relationships, including unknown interactions, it also provides measurements of variable importance, i.e. the impact that these variables have on the classification. This will be further explained in Section 3.1. Random Forest is a non-parametric method, in the sense that no assumptions are needed on the form of the relationship between the response, in the present case fuel usage, and the explanatory variables, e.g. income and education. The results can then be used as guidance for constructing parametric models, which may be compared to Random Forest as a benchmark (Strobl et al., 2009a), or to suggest further research.
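As a rough illustration of how a Random Forest yields variable importance without a pre-specified functional form, the following minimal sketch fits a forest to synthetic data. Everything in it is an assumption made for illustration: the use of scikit-learn, the made-up variables `income`, `education` and `noise`, and the hypothetical rule that generates `uses_lpg`. It is not the software, data or importance measure used in this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic, made-up households (not the survey data described in this paper).
rng = np.random.default_rng(0)
n = 600
income = rng.uniform(0, 1, n)
education = rng.uniform(0, 1, n)
noise = rng.uniform(0, 1, n)  # a variable with no influence on the response

# Hypothetical non-linear rule with an interaction between income and education.
uses_lpg = (income * education > 0.25).astype(int)

X = np.column_stack([income, education, noise])
names = ["income", "education", "noise"]

# No functional form is specified; the forest learns the relationship from the data.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, uses_lpg)

# Impurity-based importances as reported by scikit-learn; they sum to 1.
importance = dict(zip(names, forest.feature_importances_))
for name, value in importance.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the irrelevant `noise` variable should receive a clearly lower importance than `income` and `education`, even though the true relationship is an interaction that a linear specification without an interaction term would miss.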
Data

The paper is based on data originally collected for the evaluation of the Rural Electrification Program in Vietnam. The evaluation was undertaken jointly by the World Bank and the Institute of Sociology (IOS) of the Vietnam Institute of Social Sciences. The evaluation was initiated in 2001 and household-level data was collected from 42 rural communes in seven provinces at three time points: 2002, 2005 and 2008. Provinces included in the study were drawn from six out of eight regions of Vietnam, from the southern tip to the northernmost mountains. Both the IOS (2009) and the World Bank (Khandker et al., 2009) have released reports based on this data. Neither of these reports focused on household fuel usage for cooking.

Of the 42 participating communes, 22 were in the process of being electrified as part of the electrification program, 13 were not part of the electrification program and seven were already electrified (IOS, 2009). This arrangement enabled the original studies based on this data to compare development across the different kinds of communes (Khandker et al., 2009; IOS, 2009).

From Vietnam's eight regions, six were considered for sampling; the remaining two were not part of the studied electrification program. In a stratified sampling approach, communes were chosen in the following way: one commune that had already been electrified, three communes that were part of the electrification program and two communes that were assumed to receive no electrification by the end of the study (Khandker et al., 2009). In each commune 30 households were sampled, stratified according to income category: ten of the poorest, ten middle-income and ten of the richest households. The sampling can thus be considered representative for most of Vietnam, except the excluded regions: the Red River Delta and the North East region.
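The household-level step of the sampling design can be sketched as follows. This is a minimal illustration under stated assumptions, not the survey's actual procedure: the source does not say how the income categories were defined, so the sketch assumes income terciles, and the function `sample_commune`, the household identifiers and the incomes are all hypothetical.

```python
import random


def sample_commune(households, per_stratum=10, seed=0):
    """Draw per_stratum households at random from each income stratum.

    households: list of (household_id, income) tuples.
    Strata are assumed to be income terciles (an assumption, see lead-in).
    """
    rng = random.Random(seed)
    ranked = sorted(households, key=lambda h: h[1])  # ascending income
    third = len(ranked) // 3
    strata = {
        "poorest": ranked[:third],
        "middle": ranked[third:2 * third],
        "richest": ranked[2 * third:],
    }
    return {
        name: [hid for hid, _ in rng.sample(group, per_stratum)]
        for name, group in strata.items()
    }


# Hypothetical commune of 90 households with made-up incomes.
income_rng = random.Random(42)
commune = [(f"hh{i:03d}", income_rng.uniform(100, 1000)) for i in range(90)]

sample = sample_commune(commune)
total = sum(len(ids) for ids in sample.values())
print(total)  # 30 households: 10 per income stratum
```

If every one of the 42 communes were sampled in full in this way, the design would imply 42 × 30 = 1,260 households per survey round.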