Assigning clients to nearest warehouse (in R)

In this post I provide a coding example of how each customer in a group can be assigned to exactly one warehouse, given a fixed set of warehouses with unlimited capacity. The underlying assumption is that there are no fixed costs and that costs depend only on the Euclidean distance between customer and warehouse. Furthermore, no lead time requirements or other service-level constraints are considered in this problem.

The algorithm is very simple and is reminiscent of clustering algorithms. It loops through all customers and assigns each customer to the closest warehouse, using the Euclidean distance computed directly on latitude-longitude coordinates. Below I define this algorithm as a function:

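# function for calculating euclidean distances between one point and each row of a dataframe
# vc: a single customer (lat, long); df: warehouses with lat in column 1 and long in column 2
euclidean_distance <- function(vc,df){
  sqrt((vc[[1]]-df[,1])^2 + (vc[[2]]-df[,2])^2)
}

# function for assigning customers to warehouses
assignment_algorithm <- function(customers,warehouses){
  return_df <- as.data.frame(matrix(nrow=nrow(customers),ncol=3))
  colnames(return_df) <- c("lat","long","warehouses")
  for(i in 1:nrow(customers)){
    return_df[i,] <- c(customers[i,1],customers[i,2],which.min(euclidean_distance(customers[i,],warehouses)))
  }
  return(return_df)
}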
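Before running this on random data, it can help to check the assignment logic on a case small enough to verify by hand. The following is only a sketch: `nearest_warehouse` is a hypothetical helper name (not part of the code above) and the toy coordinates are made up for illustration.

```r
# Hypothetical helper (not from the original post): returns, for each customer,
# the row index of the warehouse with the smallest Euclidean distance
nearest_warehouse <- function(customers, warehouses) {
  sapply(seq_len(nrow(customers)), function(i) {
    d <- sqrt((customers[i, 1] - warehouses[, 1])^2 +
              (customers[i, 2] - warehouses[, 2])^2)
    which.min(d)
  })
}

# Toy data (made up): two warehouses and three customers
toy_warehouses <- data.frame(lat = c(0, 10), long = c(0, 10))
toy_customers  <- data.frame(lat = c(1, 9, 4), long = c(1, 9, 4))

toy_assignment <- nearest_warehouse(toy_customers, toy_warehouses)
print(toy_assignment)  # 1 2 1
```

The third customer at (4, 4) lies about 5.7 units from the first warehouse and about 8.5 units from the second, so it is assigned to warehouse 1.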

To test the function, I first build two datasets, one with randomly located customers and one with randomly located warehouses.

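# empty dataframes for 1000 customers and 4 warehouses
customer_df <- as.data.frame(matrix(nrow=1000,ncol=2))
colnames(customer_df) <- c("lat","long")
warehouse_df <- as.data.frame(matrix(nrow=4,ncol=2))
colnames(warehouse_df) <- c("lat","long")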

customer_df[,c(1,2)] <- cbind(runif(n=1000,min=-90,max=90),runif(n=1000,min=-180,max=180))
warehouse_df[,c(1,2)] <- cbind(runif(n=4,min=-90,max=90),runif(n=4,min=-180,max=180))
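Because runif() draws new numbers on every run, the locations differ each time the script is executed. A minimal sketch of a reproducible setup, assuming an arbitrary seed value of 42 (this seed is an assumption and will not reproduce the numbers shown in this post):

```r
# Fix the random number generator state so repeated runs give the same locations
# (the seed value 42 is arbitrary and does not reproduce the tables in this post)
set.seed(42)

# Build both dataframes directly from the random draws
customer_df <- data.frame(lat  = runif(n = 1000, min = -90,  max = 90),
                          long = runif(n = 1000, min = -180, max = 180))
warehouse_df <- data.frame(lat  = runif(n = 4, min = -90,  max = 90),
                           long = runif(n = 4, min = -180, max = 180))

# Check dimensions and value ranges
dim(customer_df)
range(customer_df$lat)
range(customer_df$long)
```

Building the dataframes with data.frame() also avoids the separate step of creating empty dataframes and naming the columns afterwards.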

Below are the first rows of the customer location dataframe:

##         lat      long
## 1 -35.42042 -33.68156
## 2 -50.63025 -64.52526
## 3  43.71663 -36.22302
## 4 -53.30511 135.56315
## 5 -46.32125  84.83210
## 6  83.85849 -60.70374

Below is the warehouse location dataframe:

##          lat      long
## 1 -41.007642  118.5673
## 2  81.968627  116.1495
## 3  11.971601  103.5034
## 4  -6.619224 -103.6206

Now I assign customers to warehouses:

# apply function
results_df <- assignment_algorithm(customer_df,warehouse_df)
# display first rows of result
head(results_df)
##         lat      long warehouses
## 1 -35.42042 -33.68156          4
## 2 -50.63025 -64.52526          4
## 3  43.71663 -36.22302          4
## 4 -53.30511 135.56315          1
## 5 -46.32125  84.83210          1
## 6  83.85849 -60.70374          4

In addition, I visualize the results with ggplot2:

library(ggplot2)

# one point per customer, colored by assigned warehouse
ggplot(data = results_df) + 
  geom_point(mapping = aes(x=lat,y=long,color=as.character(warehouses))) +
  scale_color_manual(values=c("darkblue","darkgreen","darkred","yellow")) +
  xlim(-90,90) + ylim(-180,180)

The warehouses are located as follows:

ggplot(data = warehouse_df) + geom_point(mapping = aes(x=lat,y=long)) + xlim(-90,90) + ylim(-180,180)

In another post I show how to locate a warehouse at the center of mass, i.e. at the center of customer demand: Single warehouse problem – Locating warehouse at center of mass (center of mass calculation in R)

I have also written posts on how to divide a group of customers into several smaller clusters, based on spatial proximity. This approach can e.g. be used for locating multiple warehouses in R, each at the center of mass of its own cluster.

Sadat says:

Fantastic blog!
I tried using distHaversine() from ‘geosphere’ package to use non-Euclidean distance. It worked great.
Thank you very much.

Hi Sadat. That is great! And thanks for the feedback.

However, I also think it is important to keep in mind the nature of this analysis. For most applications that I have seen, at least when it comes to facility location decision making, I believe it does not matter whether Euclidean or haversine distance is used.
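To check whether the choice matters for a particular dataset, both measures can be compared directly. The following is a minimal base-R sketch of the haversine formula, not a replacement for distHaversine() from geosphere. `haversine_distance` is a hypothetical helper name, the mean Earth radius of 6371 km is an assumption, and the example coordinates are made up. The two measures mainly diverge near the poles and across the ±180° meridian.

```r
# Hypothetical helper: great-circle distance in km between one point (lat, long)
# and every row of a dataframe with lat in column 1 and long in column 2
haversine_distance <- function(vc, df, radius_km = 6371) {
  to_rad <- pi / 180
  lat1 <- vc[[1]] * to_rad
  lon1 <- vc[[2]] * to_rad
  lat2 <- df[, 1] * to_rad
  lon2 <- df[, 2] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
       cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  # pmin guards against rounding errors pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Made-up example where the two measures disagree: a customer near the date line
customer   <- c(lat = 0, long = 179)
warehouses <- data.frame(lat = c(0, 0), long = c(-179, 100))

euclid <- sqrt((customer[[1]] - warehouses[, 1])^2 +
               (customer[[2]] - warehouses[, 2])^2)
which.min(euclid)                                    # picks warehouse 2
which.min(haversine_distance(customer, warehouses))  # picks warehouse 1
```

If both measures return the same warehouse for every customer in a dataset, the simpler Euclidean version is sufficient for that dataset.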

Another user pointed out to me that instead of calculating the center of mass by the weighted Euclidean distance mean, I should calculate the geometric mean. Here my response is the same: both methods are used to deliver a rough ballpark estimate of where my warehouse should be located. My final decision will have to depend on many other factors:
– traffic
– routings
– intermodal transport?
– LTL, parcel delivery, FTL? what is the mix?
– which carriers and forwarders are used and how is their pricing implemented? E.g. zone based pricing by FedEx
– inbound or outbound delivery by port (sea)? in that case the port, its service and service fees as well as the freight rates from there will have a huge impact
