Spatial analysis can be used to address questions of site and activity structure. A number of quantitative methods are commonly used to investigate spatial patterning at archaeological sites (e.g., Hodder and Orton 1976; Kintigh and Ammerman 1982; Whallon 1984; also see Kintigh 1990 for a comparison of the most common methods). Data requirements vary with the method used. Nearest neighbor (e.g., Hodder and Orton 1976) and K-means (e.g., Kintigh and Ammerman 1982) analyses require point provenience data, while methods such as unconstrained clustering (e.g., Whallon 1984) can use grid count data (note that unconstrained clustering can also be conducted on point provenience data). Spatial analysis in archaeological applications is best conducted using several techniques in combination (e.g., Rigaud and Simek 1991; Simek 1987). These include visual inspection of artifact and feature distributions, quantitative cluster analysis, and determination of whether statistically significant differences in assemblage content are evident between clusters.

Artifact distribution maps can be constructed using Surfer or similar software. Artifact class frequencies are tabulated by excavation unit and level, and distribution maps are then constructed from these data. Such maps can be viewed as heuristic devices that allow one to visually examine the distribution of artifacts across a site (e.g., Jermann and Dunnell 1979). The maps can be constructed in two dimensions (i.e., northing and easting coordinates) or in three dimensions to examine point proveniences (i.e., adding the vertical dimension).
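The tabulation step described above can be illustrated with a minimal sketch. The catalog records, unit labels, and artifact classes below are hypothetical and serve only to show how class frequencies might be tallied by excavation unit and level before mapping.

```python
from collections import Counter

# Hypothetical catalog records: (excavation unit, level, artifact class).
# All values are illustrative and do not come from any actual site.
catalog = [
    ("N100E100", 1, "debitage"),
    ("N100E100", 1, "debitage"),
    ("N100E100", 2, "ceramics"),
    ("N100E101", 1, "tools"),
]

# Tabulate artifact class frequencies by excavation unit and level.
frequencies = Counter(catalog)

# Print one row per unit/level/class combination that was observed.
for (unit, level, artifact_class), n in sorted(frequencies.items()):
    print(f"{unit}  level {level}  {artifact_class:<10} {n}")
```

A table of this kind could then be exported to mapping software; the export step is not shown here.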

In most applications of spatial analysis for CRM-based projects, the use of grid count data to examine spatial relationships is the most appropriate approach. Perhaps the best-known method is unconstrained cluster analysis, in which one seeks to group areas of a site with respect to proportional artifact class composition (Whallon 1984). While Whallon originally developed the method for use with point provenience data, he notes (Whallon 1984:245) that it can be applied to grid count data. Kintigh (1990:194-196) also demonstrates unconstrained clustering with grid count data and suggests that, in many cases, grid count data actually produce better (i.e., more easily interpretable) results than point provenience data. An in-depth discussion of the specifics of the method is presented in Whallon (1984; also see Gregg et al. 1991 and Kintigh 1990 for additional discussion and applications). A brief summary is presented here. This discussion assumes that the data are collected as raw counts (i.e., artifact counts per level per excavation unit). Individual levels can be analyzed separately with this method, allowing temporal differences in site use to be examined.

Counts of artifact classes are calculated for each excavation unit and then transformed into relative percentages. These data are then used in a cluster analysis that combines the units into groups that tend to be homogeneous with respect to relative densities. A variety of clustering algorithms could potentially be used for this analysis. Once the clusters have been identified, a map can be constructed that shows each unit and its cluster assignment. The mapped clusters can then be inspected for spatial integrity, and the clusters and their spatial relationships can be summarized by size, shape, and composition. Unconstrained clustering is “well suited for the examination of activity based and many other behavioral/depositional models” (Kintigh 1990:197). For this type of analysis to be appropriate, the excavation of contiguous units (i.e., block excavation) is required.
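The transformation from raw counts to relative percentages can be sketched as follows. The unit labels, artifact classes, and counts are hypothetical, and the decision to skip units with no artifacts is an assumption made for this illustration, not a rule drawn from the cited sources.

```python
from typing import Dict

# Hypothetical raw counts per excavation unit (illustrative values only).
counts: Dict[str, Dict[str, int]] = {
    "N100E100": {"debitage": 40, "tools": 5, "ceramics": 5},
    "N100E101": {"debitage": 10, "tools": 10, "ceramics": 20},
    "N101E100": {"debitage": 0, "tools": 0, "ceramics": 0},
}


def to_percentages(unit_counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Convert raw artifact class counts to relative percentages per unit."""
    # Collect every artifact class that appears in any unit.
    classes = sorted({c for row in unit_counts.values() for c in row})
    result: Dict[str, Dict[str, float]] = {}
    for unit, row in unit_counts.items():
        total = sum(row.get(c, 0) for c in classes)
        if total == 0:
            # Assumption: empty units have no composition and are left out.
            continue
        result[unit] = {c: 100.0 * row.get(c, 0) / total for c in classes}
    return result


percentages = to_percentages(counts)
for unit, row in percentages.items():
    print(unit, {c: round(v, 1) for c, v in row.items()})
```

Each unit's percentages sum to 100, so units with very different total counts become comparable in terms of composition.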

As conducted by Cultural Resource Analysts, Inc., unconstrained cluster analysis relies on the K-means clustering algorithm for the determination of clusters (for a more detailed treatment of the K-means method see Doran and Hodson 1975; Kintigh 1990; Kintigh and Ammerman 1982; Simek 1987). K-means is a nonhierarchical, divisive clustering technique that attempts to minimize the intra-cluster variances while maximizing the inter-cluster distances (Kintigh and Ammerman 1982:39). Unlike hierarchical methods (e.g., nearest neighbor, average linkage, Ward’s method), nonhierarchical methods avoid problems of “chaining” and artificial boundaries and work on the original input data rather than on a similarity matrix (Doran and Hodson 1975:180-184; Kintigh and Ammerman 1982:39,48). The K-means method works by searching for cluster formations that minimize the global Sum of the Squared Error (SSE), where SSE is defined as the total of the squared Euclidean distances between each cluster’s centroid and each of its members. The basic process is as follows:

- Partition the data into K initial clusters.
- Proceed through the data, assigning each item to the cluster whose centroid is nearest. Then recalculate the centroid of each cluster and determine whether any item is now closer to the centroid of another cluster; if so, reassign the item to that cluster and recalculate the centroids.
- Repeat the second step until no further reassignments take place (Kintigh and Ammerman 1982:39-41; Johnson and Wichern 1992:597).
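The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in the cited studies. It makes two assumptions: the initial partition is formed by seeding the centroids with the first K items, and items are reassigned in batches on each pass rather than one at a time. The percentage values are hypothetical.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]], float]:
    """Return cluster labels, centroids, and the global SSE."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of items")
    # Step 1 (assumption): seed the initial clusters with the first k items.
    centroids = [list(p) for p in data[:k]]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every item to the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in data]
        # Step 3: stop once a full pass produces no reassignments.
        if new_labels == labels:
            break
        labels = new_labels
        # Recalculate each centroid from its current members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return labels, centroids, sse


# Hypothetical relative percentages (debitage, tools, ceramics) for four units.
unit_percentages = [
    [80.0, 10.0, 10.0],
    [78.0, 12.0, 10.0],
    [20.0, 30.0, 50.0],
    [22.0, 28.0, 50.0],
]
labels, centroids, sse = kmeans(unit_percentages, k=2)
print(labels, centroids, sse)
```

In this made-up example the first two units group together and the last two group together. Because K-means can converge on a local optimum, results may depend on the initial partition, so several starting configurations are often worth trying.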

A plot of the SSE versus the number of clusters can be used to determine where optimal clustering occurs. A marked drop in the SSE curve at a specific number of clusters, compared to the surrounding solutions, identifies an optimal clustering solution. It should be noted that more than one solution might be seen as “optimal”; in such cases, there is an indication of spatial patterning at several levels of detail. Once “optimal” solutions are found, more in-depth analyses can be conducted to examine them further.
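The SSE curve can be tabulated as in the sketch below, which runs a simplified K-means for one through four clusters on hypothetical percentage data. The proportional drop reported for each solution is one possible way of comparing a solution to its neighbors; it is an assumption of this sketch rather than a criterion taken from the cited sources. As a further assumption, the first K items are used as seeds.

```python
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(data: List[Sequence[float]], k: int, max_iter: int = 100) -> float:
    """Run a simple batch K-means and return the global SSE."""
    centroids = [list(p) for p in data[:k]]  # assumption: first k items as seeds
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in data]
        if new_labels == labels:  # no reassignments, so the solution is stable
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))


# Hypothetical relative percentages (debitage, tools, ceramics) for six units.
unit_percentages = [
    [80.0, 10.0, 10.0], [20.0, 30.0, 50.0], [10.0, 80.0, 10.0],
    [78.0, 12.0, 10.0], [22.0, 28.0, 50.0], [12.0, 78.0, 10.0],
]

# SSE for each candidate number of clusters.
sse_by_k: Dict[int, float] = {k: kmeans_sse(unit_percentages, k) for k in range(1, 5)}

# Proportional reduction in SSE relative to the preceding solution.
pct_drop: Dict[int, float] = {
    k: (sse_by_k[k - 1] - sse_by_k[k]) / sse_by_k[k - 1] for k in range(2, 5)
}

for k in sorted(sse_by_k):
    drop = f"{100 * pct_drop[k]:.1f}%" if k in pct_drop else "n/a"
    print(f"k={k}  SSE={sse_by_k[k]:.1f}  drop from previous={drop}")
```

With these made-up values the proportional drop is sharpest at three clusters, which matches the three compositional groups built into the example data. Real data will rarely be this clear-cut.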