

ARCL0103: Spatial Statistics, Network Analysis and Human History

Practical Assignment 1a: Kvamme's Gain Statistic

This assessment is relatively short (less than 1000 words, excluding captions, code and bibliography) and asks you to calculate Kvamme's gain statistic for the predictive model that you constructed in practical 4.

1 Background

You will recall that in practical 4 you used multivariate logistic regression to create a predictive model for a synthetic data set. Visual inspection suggested that the model predictions were quite accurate, but your task for this assessment is to explore the validity of that model further.

One way of assessing the validity of a model is to use a testing sample that was withheld from the model-building process. The basic idea is to establish how many of the observed sites in the testing sample fall within the area where sites are predicted to be found. For example, if 16 out of 25 observed sites fall in the area where sites are predicted, then the model can be said to predict site location correctly 64% of the time (16/25 = 0.64). In reality, however, matters are not quite so simple, for two main reasons:

Prediction is probabilistic: very few, if any, models predict site occurrence with absolute certainty of presence or absence. Consequently, it usually only makes sense to talk about the model correctly predicting site presence at some specified probability, p, between 0.0 and 1.0. Models tend to capture a higher proportion of observed sites when p is low and a lower proportion when p is high.

Non-sites matter: it is often possible to specify a probability of site occurrence so low that all observed sites do actually fall within the area where sites are predicted; in other words, the model is 100% accurate for sites. However, the corollary is usually that a large number of non-sites also fall in the area where sites are predicted, so the model is very inaccurate at predicting the absence of archaeological sites. This would clearly be very undesirable if the purpose of the model was, for example, to identify a route for a new road that minimised the damage to archaeological sites.
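The two points above can be made concrete with a minimal R sketch. It assumes made-up probabilities at five observed sites and five non-sites (site_prob, nonsite_prob and pct_correct_at are hypothetical names, not part of the practical), counts a site as correct when its probability is at or above the threshold p, and counts a non-site as correct when its probability is below p.

```r
# Hypothetical predicted probabilities at observed sites and at non-sites
site_prob    <- c(0.95, 0.85, 0.70, 0.60, 0.40)
nonsite_prob <- c(0.80, 0.45, 0.30, 0.20, 0.10)

# At threshold p, a site counts as correctly predicted if its probability is
# p or above, and a non-site if its probability is below p
pct_correct_at <- function(p) {
  c(threshold = p,
    sites     = 100 * mean(site_prob >= p),
    nonsites  = 100 * mean(nonsite_prob < p))
}

# One row per threshold: % of sites and % of non-sites correctly predicted
curves <- t(sapply(c(0.25, 0.50, 0.75), pct_correct_at))
curves
```

With these toy values the site percentages are 100, 80 and 40 while the non-site percentages are 40, 80 and 80: at p = 0.25 every site is captured, but only 40% of non-sites are correctly excluded.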

Clearly, then, it is important to consider the accuracy of a model with reference to the problem at hand. One method that facilitates this is the production of cumulative percent correct prediction curves for both sites and non-sites (Kvamme 1988; Conolly and Lake 2006). Figure 1 shows just such a graph, in which the percentage of sites falling in areas where they are predicted decreases as the probability of site occurrence increases, while the percentage of correctly predicted non-sites increases as the probability of site occurrence increases. In this case, if it was important to avoid damaging sites, then one might choose to avoid areas with even a relatively low probability of site occurrence. However, a further complication arises at this point: the relevant area is likely to be so large (since these are cumulative probabilities, the area in question includes all locations with a low probability or greater) as to render the prediction virtually worthless. There are at least two solutions to this dilemma. One is to pay attention to the trade-off between correctly predicting site and non-site locations; another is to examine the predictive gain offered by the model. Kvamme (1988: 329) defines the gain, G, as


$$G = 1 - \frac{S}{O} \tag{1}$$

where S is the percentage of the total area where sites are predicted and O is the percentage of observed sites that fall within the area where they are predicted. G, which is calculated for a specified probability of site occurrence, approaches 1 when the model has high predictive utility, is 0 when it has no predictive utility (S = O), and becomes negative when the model predicts the reverse of what it is supposed to (S > O).
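Equation (1) translates directly into a one-line R function. The sketch below is only an illustration: kvamme_gain is a hypothetical name, and the three calls use made-up values of S and O chosen to show a positive, a zero and a negative gain.

```r
# Kvamme's gain, G = 1 - S/O, where
#   S = % of the total area where sites are predicted
#   O = % of observed sites falling within that area
kvamme_gain <- function(S, O) {
  stopifnot(S >= 0, S <= 100, O > 0, O <= 100)
  1 - S / O
}

kvamme_gain(S = 10, O = 80)  # 0.875: most sites in a small area, useful model
kvamme_gain(S = 50, O = 50)  # 0: area share equals site share, no utility
kvamme_gain(S = 60, O = 30)  # -1: sites under-represented, reverse of intended
```

The guard on O reflects the fact that G is undefined when no observed sites fall in the predicted area.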

Kvamme developed this measure for archaeological site prediction; other approaches to evaluating binary classification models are used in other subject areas and occasionally in archaeology (e.g. receiver operating characteristic curves, F1 scores or Matthews correlation coefficients). The most important property of Kvamme's measure is that it can distinguish a correct but relatively worthless model from an ostensibly less correct but more useful one. For example, a model that correctly predicts 80% of sites but predicts site occurrence over 70% of the landscape is probably not very useful, which is reflected in its low gain of 0.13 (1 - 70/80). On the other hand, a model that correctly predicts 70% of sites and predicts site occurrence over a mere 5% of the landscape would provide a better basis for many decisions, which is reflected in its gain of 0.93 (1 - 5/70). Further suggestions for testing predictive models can be found in Kvamme (1988).
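As a quick check, the two worked examples can be reproduced in R. This is a sketch only; kvamme_gain is a hypothetical helper that simply encodes G = 1 - S/O.

```r
# Kvamme's gain, G = 1 - S/O
# (S = % of area where sites are predicted, O = % of observed sites in that area)
kvamme_gain <- function(S, O) 1 - S / O

# First model: 80% of sites correct, sites predicted over 70% of the landscape
kvamme_gain(S = 70, O = 80)  # 0.125, i.e. about 0.13

# Second model: 70% of sites correct, sites predicted over 5% of the landscape
kvamme_gain(S = 5, O = 70)   # 0.9285714, i.e. about 0.93
```

The raw values are printed rather than rounded, because R's round() applies round-half-to-even and would show 0.125 as 0.12.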

2 Tasks

The tasks you must complete for this assignment are as follows:

1. Calculate Kvamme's gain statistic for the model you created in week 4, for at least four relative probability thresholds. This will require that you:

   • Use raster map algebra in R (see the end of practical 4 for an example or two) to calculate four new binary maps, each showing where sites are predicted at a given probability threshold and above in the map relprob. This can be achieved using logical map algebra similar to that used to create the dummy variables.

   • Use the extract() and hist() functions from the R 'terra' package to obtain tabular data that will enable you to calculate, at the relevant probability threshold and above, the % of observed sites that fall where sites are predicted (extract()) and the % of the total area where sites are predicted (hist()). If you are struggling with this or the previous step then I am happy to offer some hints.

2. Describe how you calculated the gain statistic, including details of any R commands and statistical operations, and report the results. You should provide appropriate maps and other graphical plots to illustrate the results.

3. Briefly explain how Kvamme's gain statistic may help evaluate the utility of a predictive model, and comment on your results in the light of this discussion.
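The whole of task 1 can be sketched on plain numeric vectors in base R. This is a minimal sketch under stated assumptions, not the required method. It assumes you already have (a) the relative probability of every raster cell, which with terra might come from values(relprob), and (b) the observed 0/1 value and the relative probability at each test location, which might come from locations$value and extract(relprob, locations). Check both in your own session. gain_table, cell_prob, obs and loc_prob are hypothetical names, and the toy values are made up.

```r
# Minimal sketch: Kvamme's gain at several thresholds from plain numeric vectors.
# Hypothetical inputs (not part of the practical's data):
#   cell_prob : relative probability of every raster cell (NA = no data)
#   obs       : 1 for an observed site, 0 for a non-site, at each test location
#   loc_prob  : relative probability at each test location
gain_table <- function(cell_prob, obs, loc_prob, thresholds) {
  cell_prob <- cell_prob[!is.na(cell_prob)]
  site_prob <- loc_prob[obs == 1]  # only observed sites enter O
  rows <- lapply(thresholds, function(p) {
    S <- 100 * mean(cell_prob >= p)  # % of area where sites are predicted
    O <- 100 * mean(site_prob >= p)  # % of observed sites in that area
    G <- if (O > 0) 1 - S / O else NA_real_  # gain undefined if no sites captured
    data.frame(threshold = p, S = S, O = O, gain = G)
  })
  do.call(rbind, rows)
}

# Toy example with made-up values
cell_prob <- c(0.05, 0.15, 0.35, 0.45, 0.55, 0.65, 0.72, 0.85, 0.92, 0.97, NA, NA)
obs       <- c(1, 1, 1, 1, 0, 0, 0, 0)
loc_prob  <- c(0.95, 0.88, 0.62, 0.30, 0.50, 0.20, 0.10, 0.05)
gain_table(cell_prob, obs, loc_prob, thresholds = c(0.6, 0.7, 0.8, 0.9))
```

Comparing cell values with ">= p" gives the same counts as building the binary map first (relprob >= p) and then counting its 1s, so this sketch and the map-algebra route should agree.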

Note that you are not expected to build a new model (except as extra bonus work, if you have the spare words and inclination), but rather to work with the one you created in practical 4 and saved as 'relprob.asc'. So, when you start the practical assignment, load the locations data and the relprob data into a new R session with the vect() and rast() commands respectively. Then consider how to create a new map from relprob that shows only those site probabilities of 0.9 or above, for example with the following map algebra in R:


# Binary map: 1 (TRUE) where relprob is 0.9 or above, 0 (FALSE) elsewhere
prob09 <- relprob >= 0.9

Note also that there are several ways to get, from this map, the number of cells at or above 0.9 and the number below it, such as by looking at the 'count' section of the output produced by the following:

hist(prob09, plot=FALSE)
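Once the counts are available, S is simply the share of cells coded 1. The minimal base-R sketch below assumes the binary map's cell values have already been turned into a plain numeric vector (prob09_cells is a hypothetical name holding made-up values). It sets explicit breaks so that the 0s and 1s land in separate bins. When terra's hist() is applied directly to the SpatRaster, the bins may be chosen differently, so check which bins hold the 0 and 1 cells before reading off the counts.

```r
# Hypothetical 0/1 cell values from a binary map such as prob09
# (1 = site predicted, 0 = not predicted, NA = outside the study area)
prob09_cells <- c(0, 0, 1, 0, 1, 1, 0, 0, NA, 0, 1)

# Breaks at -0.5, 0.5 and 1.5 put the 0s and the 1s into separate bins;
# hist() drops the NA value before counting
h <- hist(prob09_cells, breaks = c(-0.5, 0.5, 1.5), plot = FALSE)
counts <- setNames(h$counts, c("0", "1"))
counts  # 6 cells below 0.9, 4 cells at or above 0.9

# S: % of the study area where sites are predicted at the 0.9 threshold
S09 <- 100 * counts[["1"]] / sum(counts)
S09     # 40
```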

To find how many sites and non-sites fall on probabilities of 0.9 or above (or not), you can extract the prob09 value at each location as follows:

extract(prob09, locations)

The resulting values in the 'slope' column are 1 where a site has been predicted by the prob09 surface and 0 where it has not. However, remember that your 'locations' data include both sites and non-sites, so you might get a clearer picture by binding the column of observed sites and non-sites from locations to your extracted predictions as follows:

# Column 1: observed site (1) or non-site (0); column 2: predicted by prob09 (1) or not (0)
ObsandPredicted09 <- cbind(locations$value, extract(prob09, locations)$slope)
colnames(ObsandPredicted09) <- c("Observed","Predicted")
ObsandPredicted09
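A cross-tabulation turns this two-column output into O, and also into the percentage of non-sites correctly excluded, which is the other side of the trade-off discussed in section 1. The sketch below uses a small made-up matrix in place of your real ObsandPredicted09, and tab, O09 and nonsite_correct09 are hypothetical names. It assumes both columns are coded 0/1.

```r
# Hypothetical stand-in for ObsandPredicted09 (1 = site / predicted, 0 = not)
ObsandPredicted09 <- cbind(Observed  = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
                           Predicted = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0))

# Cross-tabulate; fixed factor levels keep a 2 x 2 table even if a class is absent
tab <- table(Observed  = factor(ObsandPredicted09[, "Observed"],  levels = c(0, 1)),
             Predicted = factor(ObsandPredicted09[, "Predicted"], levels = c(0, 1)))
tab

# O: % of observed sites that fall where sites are predicted
O09 <- 100 * tab["1", "1"] / sum(tab["1", ])
# % of non-sites that fall where sites are not predicted
nonsite_correct09 <- 100 * tab["0", "0"] / sum(tab["0", ])
O09                # 75
nonsite_correct09  # 83.33333
```

Combined with S from the hist() counts for the same binary map, equation (1) then gives the gain at this threshold as 1 - S/O.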

A combination of this kind of output and the hist() output from your four binary maps should be enough to complete the practical.

3 Allocation of marks

Marks will be allocated as follows:

30% for correct completion of task 1;
35% for your answer to task 2;
35% for your answer to task 3.

4 References

Conolly, J. and M. Lake 2006. Geographical Information Systems in Archaeology. Cambridge: Cambridge University Press.

Kvamme, K.L. 1988. Development and testing of quantitative models. In W.J. Judge and L. Sebastian (eds.) Quantifying the Present and Predicting the Past: Theory, Method, and Application of Archaeological Predictive Modeling, 325-428. Denver: U.S. Department of the Interior, Bureau of Land Management.


Figure 1: Cumulative percent correct predictions for model sites and non-sites for all probabilities of occurrence. Reproduced from Kvamme 1988: fig. 8.11B.

