Thursday, December 17, 2009

Repository

This post is intended as a repository for code and bulky materials. Please don't comment here, as your remarks will be hard to find.

If you have problems, my email name is nastokes and is found at westnet and com and au.

I've heard of occasional problems with downloading R code. I think cut and paste should generally work, but I'm now trying to post copies of the files on a document store.

Contents:
Code for GHCN Adjustments Histogram
Code for GHCN stations mean temp per year
Code for KNMI GMST model trends
Code for GHCN v2.mean update
Code for GISS UHI Adjustment
Code for GHCN Temperature Global Trend
Code for GHCN Temperature V1.11
Code for GHCN Temperature V1.4.1


Code for GHCN Adjustments Histogram
#### A program written by Nick Stokes, 13 Dec 2009, to calculate the changes to regression
# slopes caused by adjustments to the GHCN temperatures (v2.mean_adj - v2.mean)

# A function to calculate regression slope. I hope it is faster than lm()
slope<-function(v,jj){ m=jj-mean(jj); s=(v %*% m)/(m %*% m); s }
#####################
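# A quick optional check: slope() should agree with lm() on arbitrary data, e.g.
#   v=rnorm(50); t=1:50; c(slope(v,t), lm(v~t)$coefficients[2])   # the two numbers should match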
# read data from v2.mean and v2.mean_adj, downloaded from http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/
# I edited (emacs) to put a blank between the station number and year, and to change -9999 to NA (add .txt)
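# (That edit can be scripted; the shell command in the v2.mean update section below does it:
#   sed -e "s/............/& /" -e "s/-9999/ NA /g" v2.mean > v2.mean.txt )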

# Read in data from the files in matrix form
if(T){ # change T to F after you have read in the files once
vmean <- matrix(scan("v2.mean.txt", 0, skip=0, na.strings="NA"), ncol=14, byrow=TRUE)
vmean_adj <- matrix(scan("v2.mean_adj.txt", 0, skip=0, na.strings="NA"), ncol=14, byrow=TRUE)
# Now, to save time, move to annual averages
vmean_ann=vmean[,1:3]
vmean_ann[,3]=rowMeans(vmean[,3:14], na.rm = T)
vmean_ann_adj=vmean_adj[,1:3]
vmean_ann_adj[,3]=rowMeans(vmean_adj[,3:14], na.rm = T)
}

# Initialise
vv=rep(0.,200) # regression y vector
jj=rep(0,200) # regression x vector
grad=rep(0.,9999) # gradients (the output result)

len=length(vmean_ann[,1])
jmax=length(vmean_ann_adj[,1])

j=1
k=0
kk=0
m=0
# counters. j is row of adjusted file. m is station counter
# k,kk are local row (year) counter (for station m). k skips NA's, kk doesn't

# loop over all rows in v2.mean
for(i in 1:(len-1)){
kk=kk+1
# to find matching rows, first check diff between stat nos and years
u=vmean_ann_adj[j,]-vmean_ann[i,]
# If the adjusted counter has got ahead of the unadj, wait
if(u[1]<0){ if(j<jmax) j=j+1; u=vmean_ann_adj[j,]-vmean_ann[i,] }
# If we have a match, add to regression vec vv[]
if(u[1]==0 & u[2]==0 ){

if(!is.na(u[3])){ # don't add to regression if NA
k=k+1 # local adjusted counter
jj[k]=kk # x for regression
vv[k]=u[3] # discrepancies for regression
}
if(j<jmax) j=j+1 # advance the adjusted-file pointer
# at the last row of a station's record, compute the slope and reset counters
if(vmean_ann[i+1,1]!=vmean_ann[i,1] & k>0){
m=m+1 # m is station counter
grad[m]=slope(vv[1:k],jj[1:k]) # compute regression slope
k=0 # zero local counters
kk=0
}
}
# Now prepare histogram. Comment out jpeg and dev.off() to get screen graphics
jpeg("GHCNAdjustments.jpg")
hist(grad[1:m],nclass=200,xlab="degrees C/decade",main="GHCN adjustment change to trend") # draw histogram
a=c(mean(grad[1:m])); a # Mean slope change
dev.off()


Code for GHCN stations mean temp per year
#### A program written by Nick Stokes, 27/1/10 to plot GHCN temps
## Calculates global ave temp in C for each year


#####################
# read data from v2.mean, downloaded from http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/
# I edited (emacs) to put a blank between the station number and year, and to change -9999 to NA (add .txt)


#  Read in data from the files in matrix form
if(T){   #change T to F after you have read in the files once
vmean <- matrix(scan("v2.mean.txt", 0, skip=0,na.strings = "NA"), ncol=14, byrow=TRUE)
# Now move to annual averages
  vmean_ann=vmean[,1:3]
  vmean_ann[,3]=rowMeans(vmean[,3:14], na.rm = T)/10. 
}

# Initialise
  vv=rep(0.,300)  # Year total differences
  num=rep(0.0000001,300)  # counter for averaging (tiny initial value avoids division by zero for empty years)
  yr0=1890   # Start year
  len=length(vmean_ann[,1])

# loop over all rows in v2.mean
  k=0;
  for(i in 1:len){
    u=vmean_ann[i,]
    j=u[2]-yr0
# The next line corrects a bug in the earlier program where j<0 values changed the results
    if(j>0){vv[j]=vv[j]+u[3];num[j]=num[j]+1; }    # Add value to vv
  }
  vv=vv/num  # Averaging
  
# Plot
  jpeg("GHCNcolder.jpg")
  j=1950:2009-yr0
  x=j+yr0
  y=vv[j]
  g=lm(y~x)$coefficients
  plot(x,y,type="l",ylab="C", xlab="Year", col="black", main="Ave GHCN temp")
  lines(x,g[2]*x+g[1],col="red")
  dev.off()


Code for KNMI GMST model trends
#### A program written by Nick Stokes, 7/2/10 to collect KNMI model results
#### compute trends, compare with GIStemp and make plots
urls=c(
"http://climexp.knmi.nl/data/itas_cccma_cgcm3_1_sresa1b_0-360E_-90-90N_n_00.dat",
"http://climexp.knmi.nl/data/itas_cccma_cgcm3_1_sresa1b_0-360E_-90-90N_n_01.dat",
"http://climexp.knmi.nl/data/itas_cccma_cgcm3_1_sresa1b_0-360E_-90-90N_n_02.dat",
"http://climexp.knmi.nl/data/itas_cccma_cgcm3_1_sresa1b_0-360E_-90-90N_n_03.dat",
"http://climexp.knmi.nl/data/itas_cccma_cgcm3_1_sresa1b_0-360E_-90-90N_n_04.dat",
"http://climexp.knmi.nl/data/itas_cccma_cgcm3_1_t63_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_cnrm_cm3_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_csiro_mk3_0_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_csiro_mk3_5_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_gfdl_cm2_0_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_giss_model_e_h_sresa1b_0-360E_-90-90N_n_00.dat",
"http://climexp.knmi.nl/data/itas_giss_model_e_h_sresa1b_0-360E_-90-90N_n_01.dat",
"http://climexp.knmi.nl/data/itas_giss_model_e_h_sresa1b_0-360E_-90-90N_n_02.dat",
"http://climexp.knmi.nl/data/itas_giss_model_e_r_sresa1b_0-360E_-90-90N_n_00.dat",
"http://climexp.knmi.nl/data/itas_giss_model_e_r_sresa1b_0-360E_-90-90N_n_01.dat",
"http://climexp.knmi.nl/data/itas_giss_model_e_r_sresa1b_0-360E_-90-90N_n_02.dat",
"http://climexp.knmi.nl/data/itas_giss_model_e_r_sresa1b_0-360E_-90-90N_n_03.dat",
"http://climexp.knmi.nl/data/itas_giss_model_e_r_sresa1b_0-360E_-90-90N_n_04.dat",
"http://climexp.knmi.nl/data/itas_ingv_echam4_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_inmcm3_0_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_ipsl_cm4_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_miroc3_2_hires_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_miub_echo_g_sresa1b_0-360E_-90-90N_n_00.dat",
"http://climexp.knmi.nl/data/itas_miub_echo_g_sresa1b_0-360E_-90-90N_n_01.dat",
"http://climexp.knmi.nl/data/itas_miub_echo_g_sresa1b_0-360E_-90-90N_n_02.dat",
"http://climexp.knmi.nl/data/itas_mpi_echam5_sresa1b_0-360E_-90-90N_n_00.dat",
"http://climexp.knmi.nl/data/itas_mpi_echam5_sresa1b_0-360E_-90-90N_n_01.dat",
"http://climexp.knmi.nl/data/itas_mpi_echam5_sresa1b_0-360E_-90-90N_n_02.dat",
"http://climexp.knmi.nl/data/itas_mpi_echam5_sresa1b_0-360E_-90-90N_n_03.dat",
"http://climexp.knmi.nl/data/itas_mri_cgcm2_3_2a_sresa1b_0-360E_-90-90N_n_00.dat",
"http://climexp.knmi.nl/data/itas_mri_cgcm2_3_2a_sresa1b_0-360E_-90-90N_n_01.dat",
"http://climexp.knmi.nl/data/itas_mri_cgcm2_3_2a_sresa1b_0-360E_-90-90N_n_02.dat",
"http://climexp.knmi.nl/data/itas_mri_cgcm2_3_2a_sresa1b_0-360E_-90-90N_n_03.dat",
"http://climexp.knmi.nl/data/itas_mri_cgcm2_3_2a_sresa1b_0-360E_-90-90N_n_04.dat",
"http://climexp.knmi.nl/data/itas_ncar_ccsm3_0_sresa1b_0-360E_-90-90N_n_00.dat",
"http://climexp.knmi.nl/data/itas_ncar_ccsm3_0_sresa1b_0-360E_-90-90N_n_01.dat",
"http://climexp.knmi.nl/data/itas_ncar_ccsm3_0_sresa1b_0-360E_-90-90N_n_02.dat",
"http://climexp.knmi.nl/data/itas_ncar_ccsm3_0_sresa1b_0-360E_-90-90N_n_03.dat",
"http://climexp.knmi.nl/data/itas_ncar_ccsm3_0_sresa1b_0-360E_-90-90N_n_04.dat",
"http://climexp.knmi.nl/data/itas_ncar_ccsm3_0_sresa1b_0-360E_-90-90N_n_05.dat",
"http://climexp.knmi.nl/data/itas_ncar_ccsm3_0_sresa1b_0-360E_-90-90N_n_06.dat",
"http://climexp.knmi.nl/data/itas_ncar_pcm1_sresa1b_0-360E_-90-90N_n_00.dat",
"http://climexp.knmi.nl/data/itas_ncar_pcm1_sresa1b_0-360E_-90-90N_n_01.dat",
"http://climexp.knmi.nl/data/itas_ukmo_hadcm3_sresa1b_0-360E_-90-90N_n.dat",
"http://climexp.knmi.nl/data/itas_ukmo_hadgem1_sresa1b_0-360E_-90-90N_n.dat")

# These are the names of the files you'll make in the current dir

files=c("cccma_cgcm3_1_00",
"cccma_cgcm3_1_01",
"cccma_cgcm3_1_02",
"cccma_cgcm3_1_03",
"cccma_cgcm3_1_04",
"cccma_cgcm3_1_t63",
"cnrm_cm3",
"csiro_mk3_0",
"csiro_mk3_5",
"gfdl_cm2_0",
"giss_model_e_h_00",
"giss_model_e_h_01",
"giss_model_e_h_02",
"giss_model_e_r_00",
"giss_model_e_r_01",
"giss_model_e_r_02",
"giss_model_e_r_03",
"giss_model_e_r_04",
"ingv_echam4",
"inmcm3_0",
"ipsl_cm4",
"miroc3_2_hires",
"miub_echo_g_00",
"miub_echo_g_01",
"miub_echo_g_02",
"mpi_echam5_00",
"mpi_echam5_01",
"mpi_echam5_02",
"mpi_echam5_03",
"mri_cgcm2_3_2a_00",
"mri_cgcm2_3_2a_01",
"mri_cgcm2_3_2a_02",
"mri_cgcm2_3_2a_03",
"mri_cgcm2_3_2a_04",
"ncar_ccsm3_0_00",
"ncar_ccsm3_0_01",
"ncar_ccsm3_0_02",
"ncar_ccsm3_0_03",
"ncar_ccsm3_0_04",
"ncar_ccsm3_0_05",
"ncar_ccsm3_0_06",
"ncar_pcm1_00",
"ncar_pcm1_01",
"ukmo_hadcm3",
"ukmo_hadgem1")
nums=c(0,1,2,3,4,0,0,0,0,0,0,1,2,0,1,2,3,4,0,0,0,0,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5,6,0,1,0,0,0) #run numbers
nn=c(5,1,1,1,1,1,3,5,1,1,1,1,3,4,5,7,2,1,1,1) #runs per model

n=length(files)
# This is the download step. Note the if(F). Change F to T to download, but then change back to F
for(i in 1:n){
if(F)download.file(urls[i],files[i])
}

# Loop over filenames, reading each file and forming annual means per model
if(F){   # Again, you probably only want to do this once. Change to T, then back
nk=109   # years 1901-2009
ki=NULL  # row indices for 1901-2009; set once the first file is read
s=rep(0,nk); w=numeric(nk); k=0
for(i in 1:n){
 k=k+12
 v <- matrix(scan(files[i], 1, skip=3), ncol=13, byrow=TRUE) # Read the table
 if(is.null(ki)) ki=1901-v[1,1]+1:nk  # offset from the first file's start year
 s=s+rowSums(v[ki,2:13])  # Sum over months, selecting 1901-2009 rows

 if(i==n || nums[i+1]==0){  # last run of this model
   w=cbind(w,s/k); nn=c(nn,k); k=0; s=s*0 # make matrix w of annual means per model, averaging
 }
}
v <- matrix(scan("../gistemp.txt", 0, skip=30, nlines=109), ncol=20, byrow=TRUE) # This is a slightly processed gistemp file, with internal headings removed
w=cbind(w,v[,14]*0.01) # GIStemp column, converted to Celsius
}

#Lots of initializations
yt=c(0,4)/100.
yy=c(-1,1); y2=c(1,1);
y8=c(-2,-2,2,2,0,1,1,0)
m=dim(w)[2]   # number models
x=c(0,m)   # number models for x-axis plotting
# yrs are the years you choose to plot. Should include 1980
yrs=c(0:25*2,51)+1950
ky=length(yrs)
# res1 and results collect results over years
result=rep(0,6*ky)
dim(result)=c(ky,6)
res1=result

for(io in 1:ky){
k=yrs[io]:2009-1900

va=rep(0,5)  # variances
dof=va

for(i in 2:m){
z=lm(w[k,i] ~ k)   # Do model regressions
va[1]=vcov(z)[2,2]      # var 1 observs regressions
b=c(z$coefficients[2],2*sqrt(va[1]))  # slope and 2*se
if(i==2)a=b
else a=cbind(a,b) # collect the model slope and se
}

m1=m-1; m2=m-2
mn=mean(a[1,1:m2])*y2
va[2]=mean(a[2,1:m2]^2)/m2   # var 2 model regressions
va[3]=var(a[1,1:m2])/m2   # var 3 mean scatter
va[4]=va[2]+va[3]
va[5]=va[1]+va[4]
se=sqrt(va)
# Welch t-test
va2=va^2
tstat = (mn[1]-b[1])/se
result[io,1:6]=c(yrs[io],2*pnorm(tstat)-1)  # collect t-test results
res1[io,1:5]=c(b[1]-mn[1],2*se[c(1,3,4,5)])  # collect  2*std error


dof[1]=length(k)-1
dof[3]=m2-1
dof[2]=dof[1]*dof[3]
dof[5]=va2[5]/(va2[1]/dof[1]+va2[2]/dof[2]+va2[3]/dof[3])
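# dof[5] is the Welch-Satterthwaite estimate for the combined variance:
#   nu = (v1+v2+v3)^2 / (v1^2/nu1 + v2^2/nu2 + v3^2/nu3), with vi the component variances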
# end t-test

# Do figs 1-7 for 1980
if(yrs[io]==1980){
# Fig 1 Models + mean with error bars
jpeg("fig1.jpg")
plot(x,yt,xlab="Model type", ylab="trend C/yr", main="Mean trends of 20 models, 1980-2009")
for(i in 1:m2){
lines(c(i,i),a[1,i]+yy*b[2])
lines(i+yy*nn[i]*.1,y2*a[1,i],col="green",lwd=3)
}
lines(x,mn,col="violet",lwd=2)
dev.off()

# Fig 2 Models + mean with error bars and GISS bars
jpeg("fig2.jpg")
plot(x,yt,xlab="Model type", ylab="trend C/yr", main="Trends of 20 models and mean vs GIStemp, 1980-2009")
for(i in 1:m2){
lines(c(i,i),a[1,i]+yy*b[2])
lines(i+yy*nn[i]*.1,y2*a[1,i],col="green",lwd=3)
}
lines(x,mn,col="violet",lwd=2)
polygon(y8[5:8]*m,b[1]+y8[1:4]*se[1],col="pink",density=10)
lines(x,b[1]+y2*b[2],col="red")
lines(x,b[1]-y2*b[2],col="red")
dev.off()

# Fig 3 Model mean vs GIStemp (Lucia's falsification)
jpeg("fig3.jpg")
plot(x,yt,xlab="Model type", ylab="trend C/yr", main="Simple model mean vs GIStemp, 1980-2009")
polygon(y8[5:8]*m,b[1]+y8[1:4]*se[1],col="pink",density=40)
lines(x,mn,col="violet",lwd=2)
lines(x,b[1]+y2*b[2],col="red")
lines(x,b[1]-y2*b[2],col="red")
dev.off()

#Fig 4 Model scatter se vs GIStemp (Douglass)
jpeg("fig4.jpg")
plot(x,yt,xlab="Model type", ylab="trend C/yr", main="Mean trend of 20 models vs GIStemp mean only, 1980-2009")
polygon(y8[5:8]*m,mn+y8[1:4]*se[3],col="green",density=20)
for(i in 1:m2){
points(i,a[1,i],col="green",lwd=3)
}
lines(x,mn,col="violet",lwd=2)
lines(x,b[1]*y2,col="red")
dev.off()

# Fig 5 Model weather noise (whiskers) vs GIStemp
jpeg("fig5.jpg")
plot(x,yt,xlab="Model type", ylab="trend C/yr", main="Model trend errors vs GIStemp, 1980-2009")
polygon(y8[5:8]*m,b[1]+y8[1:4]*se[1],col="pink",density=40)
lines(x,mn,col="violet",lwd=2)
lines(x,b[1]+y2*b[2],col="red")
lines(x,b[1]+c(0,0),col="red")
lines(x,b[1]-y2*b[2],col="red")
for(i in 1:m2){
lines(c(i,i),mn+yy*a[2,i])
}
dev.off()

# Fig 6  Model weather noise band vs GIStemp
jpeg("fig6.jpg")
plot(x,yt,xlab="Model type", ylab="trend C/yr", main="Model trend errors  vs GIStemp, 1980-2009")
polygon(y8[5:8]*m,mn[1]+y8[1:4]*se[2],col="violet",angle=-45,density=20)
polygon(y8[5:8]*m,b[1]+y8[1:4]*se[1],col="pink",density=20)
lines(x,mn,col="violet",lwd=2)
lines(x,b[1]+y2*b[2],col="red")
lines(x,b[1]+c(0,0),col="red")
lines(x,b[1]-y2*b[2],col="red")
for(i in 1:m2){
points(i,a[1,i],col="yellow",lwd=3)
lines(c(i,i),mn+yy*a[2,i])
}
dev.off()


jpeg("fig7.jpg") # Model noise + scatter vs GISTEMP
plot(x,yt,xlab="Model type", ylab="trend C/yr", main="Trends of 20 models vs GIStemp, 1980-2009")
polygon(y8[5:8]*m,mn[1]+y8[1:4]*se[4],col="violet",angle=-45,density=20)
polygon(y8[5:8]*m,b[1]+y8[1:4]*b[2],col="pink",density=40)
lines(x,mn,col="violet",lwd=2)
lines(x,b[1]+y2*b[2],col="red")
lines(x,b[1]+c(0,0),col="red")
lines(x,b[1]-y2*b[2],col="red")
for(i in 1:m2){
lines(c(i,i),a[1,i]+yy*a[2,i])
points(i,a[1,i],col="green",lwd=3)
}
dev.off()
}

} # end io loop over years

# make Fig 8 of various se[1:5]

jpeg("fig8.jpg")
x=yrs[c(1,ky)]
yl=.05*yy
plot(x,yl,xlab="Year", ylab="trend difference C/yr", main="95% error limit of model-GISS trend diff")

g=c("red","navy","sienna","skyblue","green","violet")
gg=c("Mean","Weather noise","Model noise","Model scatter","All noise","All var")

lines(x,yy*0,col="black",lwd=2)
lines(yrs,res1[,1],col=g[1],lwd=2)
for(i in 2:5){
lines(yrs,res1[,1]+res1[,i],col=g[i],lwd=2)
lines(yrs,res1[,1]-res1[,i],col=g[i],lwd=2)
}
x[2]=x[1]+5
for(i in 1:5){
yo=yl[2]*y2*(1-i*.08)
lines(x,yo,col=g[i],lwd=3); text(x[2]+2,yo[1],gg[i],adj=c(0,0.5))
}
dev.off()


Code for GHCN v2.mean update
The linux shell file:
# batch file for reading v2.mean and associated GHCN files from NOAA
R CMD BATCH GHCNget.r

# Some file processing: uncompress, then insert a space after the 12-character
# station id and replace -9999 with NA
gunzip v2.mean.Z
sed -e "s/............/& /" -e "s/-9999/ NA /g" v2.mean>v2.mean.txt

#The main program for producing out.txt, the list of stations with last reporting date
R CMD BATCH GHCNcount.r
# end of shell script


The first R file GHCNget.r
#### A program written by Nick Stokes to download GHCN data

download.file("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z","v2.mean.Z")
download.file("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.temperature.inv","v2.temperature.inv")
download.file("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.country.codes","v2.country.codes")
### End of R file


The second R file GHCNcount.r

#### A program written by Nick Stokes, 14/2/10
## Reads v2.mean and outputs out.txt, which lists stations in order of most recent reporting


#####################
# read data from v2.mean, downloaded from http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/
#  Read in data from the files in matrix form
if(T){       #change T to F after you have read in the files once
  vmean <- matrix(scan("v2.mean.txt", 0, skip=0, na.strings="NA"), ncol=14, byrow=TRUE)
  vinv <- scan("v2.temperature.inv.txt", list(num=0,name="",dat=""), sep=";", quote=NULL, skip=0, na.strings="NA")
  vcode <- scan("v2.country.codes", list(num=0,name=""), sep=";", quote=NULL, skip=0)
}


# Initialise

  len=length(vmean[,1])
  v=array(0, c(length(vinv$num),3))
  vc=rep(1,801)
  for(i in 1:length(vcode$num)){
    vc[as.numeric(vcode$num[i])]=i
  }
  i=1; ii=0; jo=-9

# loop over all rows in v2.mean
  for(kk in 1:(len-1)){
    if(sum(is.na(vmean[kk,]))<12){
      if(vmean[kk,1]!=vmean[i,1]){
    for(j in 3:14){
       if(!is.na(vmean[i,j]))  jj=j-2
    }
    jj=vmean[i,2]*1000+jj
    j=floor(vmean[i,1]/10)
    if(ii>0)jo=v[ii,1]   
    if(j==jo) v[ii,2]=max(v[ii,2],jj)
    else {
    ii=ii+1;v[ii,]=c(j,jj,floor(j/100000000))
    }
      }
      i=kk
    }
  }

# Now output to file "out.txt"

iu=order(-v[1:ii,2],v[1:ii,1],decreasing=FALSE)
h=t(cbind(v[iu,2],vinv$name[iu],vinv$num[iu],vcode$name[vc[v[iu,3]]]))
write(h,file="out.txt",ncolumns=4)


 ###################################################
 Code for GISS UHI Adjustment
#### A program written by Nick Stokes, 23 Feb 2010, to emulate GISS UHI corrections for individual urban stations
## Plots the correction curve


#####################
# v2.mean.txt contains  data from v2.mean, downloaded from http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/
# it has been lightly edited, replacing -9999 by NA and putting a space between station number and year
#v2.inv comes direct from GISS at http://data.giss.nasa.gov/gistemp/sources/GISTEMP_sources.tar.gz

# These are big data sets so change T to F once they are in the workspace

if(F)v2mean <- readLines("../GHCN/v2.mean.txt")
if(F)stat <- scan("v2.inv", "", sep=";", quote=NULL, skip=0)

#Initialising
len=length(stat)
pi2=pi/180
statnum=c(42570273000, 42570274001)
statname=c("Anchorage","Matanuskas")
city=1  #  Choose a city here
labt=c(2.4,3.6)
target=statnum[city]
Rad=1000 # Rad of rural network in km
small=.1^12

name=rep(NA,len); ur=name
num=rep(0,len); lat=num; lon=num
mav=array(0,c(60,12)); mac=mav+small; km=0:1*1000
av=rep(0,150); urb=av; cv=av; cnt=av+small
jj=numeric(); dj=jj; mn=dj; ia = dj; ib = dj; dg=dj

# Loop thru v2.inv to format station data

for(i in 1:len){
name[i]=substr(stat[i],13,44)
ur[i]=substr(stat[i],101,101)
num[i]=as.numeric(substr(stat[i],1,11))
lat[i]=as.numeric(substr(stat[i],45,50))
lon[i]=as.numeric(substr(stat[i],51,58))
if(num[i]==target)kv=i
}
# get the urban coords
lu=c(lat[kv],lon[kv])*pi2
su=c(sin(lu),cos(lu))

for(i in 1:len){
  if(i==kv){jj=c(jj,i);dj=c(dj,Rad);ktarg=length(jj)}
# Restrict to rural stations in Alaska (USA, lat>52) or Canada west of 130W
ix=floor(num[i]/100000000)       # Country code 425 USA 403 Canada
  if(  ((lat[i]>52 & ix==425) | (lon[i]+130.<0 & ix==403)) & ur[i]=="A"){
   lr=c(lat[i],lon[i])*pi2
   sr=c(sin(lr),cos(lr))
# Spherical distance calc
   cd=su[1]*sr[1]+sr[3]*su[3]*(su[2]*sr[2]+sr[4]*su[4])
   d=round(6371*acos(cd))          # distance on sphere surface
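#  i.e. cos(d/R) = sin(lat1)sin(lat2)+cos(lat1)cos(lat2)cos(lon1-lon2),
#  the standard great-circle formula with R = 6371 km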
   if(d<Rad){ jj=c(jj,i); dj=c(dj,d) }   # keep rural stations within Rad km
  }
}
ck=jj*0+1880
kk=1; k=jj[kk]; ku=num[k]; dk=1-dj[kk]/Rad
lh=length(v2mean)


#  here we scan the whole v2.mean, but skipping all except the target and rurals within range
for(i in 1:lh){      # scan once for averages
  ii=as.numeric(substr(v2mean[i],1,11))  # station number
  if(ii<ku)next       # not yet up to the next wanted station
  if(ii>ku){          #  we've finished with rural kk
    kk=kk+1; if(kk>length(jj))break
    k=jj[kk]; ku=num[k];  dk=1-dj[kk]/Rad
    if(ii!=ku)next      # The next station wasn't in range
  }
  ho=as.numeric(unlist(strsplit(v2mean[i],"  *")))
  if(ho[2]<ck[kk])next   # skip out-of-order years (enforce monotonic over duplicates)
  ck[kk]=ho[2]      # counter to enforce monotonic
  ia=c(ia,i); ib=c(ib,kk);     #  OK, this row we need. Keep details for next loop

  for(iy in 1:12){      # make monthly averages for deseasonalising
    if(!is.na(ho[iy+2])) { mav[kk,iy]=mav[kk,iy]+ho[iy+2]; mac[kk,iy]=mac[kk,iy]+1;}
  }   
}
mav=mav/mac   # monthly averages

ial=length(ia)
# now loop over known yearly data in the neighborhood
for(iw in 1:ial){
  i=ia[iw]  # row in v2.mean
  kk=ib[iw]; dk=1-dj[kk]/Rad
# read a year as numbers from v2.mean
  ho=as.numeric(unlist(strsplit(v2mean[i],"  *")))  # row vec of numbers
  if(ho[2]<1880)next
  ih=ho[2]-1879  # year index
#  deseasonalise (monthly anomalies)
  hp=ho[3:14]-mav[kk,]
#  now replace NA with zero anomaly
  for(iy in 1:12){if(is.na(hp[iy])) hp[iy]=0.}
# make annual average anomaly, convert to Celsius
  hk=mean(hp)/10.
  if(!is.na(hk)){
# Set the target and count upper and lower bounds
    if(ho[1]==target*10) { urb[ih]=hk; mn=cbind(mn,v2mean[i]);km[1]=max(km[1],ih);km[2]=min(km[2],ih) }
    else{
      av[ih]=av[ih]+hk*dk  # Distance (dk) weighted ih~year
      cv[ih]=cv[ih]+1      #  counter for how many rurals in each year
      cnt[ih]=cnt[ih]+dk   #  sum weights
     }
     if(is.na(av[ih]))break  # Shouldn't happen now
  }
}

av=av/cnt
km[1]=min(km[1],120)
ko=km[2]:km[1]
urb[ko]=urb[ko]-mean(urb[ko])
av[ko]=av[ko]-mean(av[ko])
corr=av-urb

dh=100000.
for(i in (km[2]-5):(km[1]+5)){
h=lm(corr[ko]~ko+abs(ko-i))   # two-leg (broken-line) fit with knee at year index i
de=deviance(h)
if(de<dh){dh=de; h0=h}        # keep the knee that fits best
}
v=fitted(h0)
vr=round(v*10)/10

jpeg(paste(statname[city],".jpg",sep=""))
plot(ko+1879,urb[ko],type="l",col="red",xlab="Year",ylab="Temp anom",main=statname[city])
lines(ko+1879,corr[ko],lwd=3)
lines(ko+1879,av[ko],col="green")
lines(ko+1879,v)
lines(ko+1879,vr,col="pink")
yt=labt[city]-1:3*0.4
text(1970,yt[1],statname[city],col="red")
text(1970,yt[2],"Rural avg",col="green")
text(1970,yt[3],"Difference",col="black")
dev.off()

 


Code for GHCN Temperature Global Trend

# A program written by Nick Stokes 30 Mar 2010 to calculate
# global temperatures by LSS method. See www.moyhu.blogspot.com

# Read data from adapted v2.mean and v2.temperature.inv.
# See http://moyhu.blogspot.com/2009/12/repository.html#GHCNupdate for scripts
# v2.mean just has a space inserted before year, and -9999 replaced by NA
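# The least squares model: each reading x[station,month,year] is fitted as
# L[station,month] + G[year] with area-based weights, G being the global temperature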

  v2 <- matrix(scan("../GHCN/v2.mean.txt", 0, skip=0,na.strings = "NA",nlines= -4999), ncol=14, byrow=TRUE)
  #v2 <- matrix(scan("synth.txt", 0, skip=0,na.strings = "NA",nlines= -4999), ncol=14, byrow=TRUE)
  nn=length(v2[,1]);
  v2[,1]=floor(v2[,1]/10); v2=rbind(v2,1:14*0);
  wd=c(11,1,30,8,7,4,6,1,10,3,2,16,1);
  tinv=read.fwf("../GHCN/v2.temperature.inv",widths=wd,comment.char="");

#initialising
 m=v2[1,1]; yr0=1880; di=c(2009-yr0,12);
 wi = array(NA,di);  ww = wi;
 wj = array(0,di);  wn = wj;
 w1=  1/(1:20); w0 = 1-w1; w1=.1*w1;
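# w0[k],w1[k] implement a running mean over k duplicate records: avg = avg*(k-1)/k + value/(10*k);
# the factor .1 converts GHCN tenths of a degree to degrees C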
 nk=length(levels(factor(v2[,1])))-1;
 x = array(NA,c(nk*12,di[1])); w=x; ink=0:11*nk; s=0;

# This loop just reads the big array v2, averages duplicates, and fills the reorganised data array x[,]
for (i in 1:nn){
  j = v2[i,2]-yr0;
  id=(1:12)[!is.na(v2[i,3:14])];
  if(j>=0){
    k = wn[j,id]+1;
    wn[j,id] = k;
    ww[j,id][is.na(ww[j,id])] = 0.;
    ww[j,id] = ww[j,id]*w0[k] +v2[i,id+2]*w1[k];
  }
  if( m != v2[i+1,1]){  #  end station sequence
    s=s+1;   # station counter
    x[s+ink,]=t(ww);  # consolidated readings;
    ww[,] = wi[,]; wn=wj;
    m=v2[i+1,1];  # checking end of station sequence
  }
 }


nt=length(tinv[,1]);
wtt = sin((1:36-0.5)*pi/36);    wt=rep(wtt,72)  # 36x72 area weight;
d=floor(tinv[,4:5]/5);    # lat and long;
cellnum=d[,1]+19+36*(d[,2]+36);  # 5 deg x 5 deg cells numbered 1 to 648 (18*36);


cellcount=array(0,c(di,36*72));

#count monthly readings over stations for each cell;
for(i in 1:s){
  d=cellnum[i];
  for(k in 1:12){
    mask = !is.na(x[i+ink[k],]);  # count readings for each month in cell;
    cellcount[mask,k,d] = cellcount[mask,k,d]+1;
  } 
 } 

#make weights (cell area/count)
for(i in 1:s){
  d=cellnum[i];
  for(k in 1:12){
    k2=i+ink[k];     mask=!is.na(x[k2,]);
    w[k2,mask] = wt[d]/cellcount[mask,k,d];  # big weight matrix;
  } 
 }

# Now create design matrix, RHS, and solve by 2x2 block factorisation;
# taking advantage of big top left block being diagonal;
# r1, r2 are RHS blocks, a1,a2 diagonals. S is Schur complement;
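# In matrix form the normal equations are [A B; B' C][L; G] = [r1; r2] with A diagonal;
# eliminating L gives S G = r2 - B' inv(A) r1, where S = C - B' inv(A) B is the Schur
# complement, and then L = inv(A) (r1 - B G);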
d=dim(x);

a1=rowSums(w, na.rm=T);
a2=colSums(w, na.rm=T);

r1=rep(0,d[1]);L=r1;b=r1;
r2=rep(0,d[2]);b2=r2;
S = array(0,d[c(2,2)]);

for(i in 1:d[1]){
  r1[i]=sum(w[i,]*x[i,], na.rm=T);
 }
#  Create Schur complement, and r2 (RHS)
for(i in 1:d[2]){
  b=w[,i]/a1;   # off-diagonal column;
  r2[i]=sum( w[,i] * x[,i], na.rm=T);
  b2[i] = r2[i] - sum( b*r1, na.rm=T);
  for(j in i:d[2]){
    S[i,j] = - sum( b*w[,j], na.rm=T);
    S[j,i] = S[i,j];
  }
  S[i,i]=S[i,i]+a2[i];
 }
S[1,1]=S[1,1]+10000.;  # enforce G[1]=0 (else S is singular)
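# (the fit x ~ L + G is unchanged by adding a constant to G and subtracting it from
#  every L, so one value of G must be pinned; the large diagonal entry forces G[1]~0)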
G = solve(S,b2);  # Global temp;

#  Finally get station local (seasonal) offsets L is nkx12 array;
for(i in 1:d[1]){ 
  L[i] = r1[i]-sum(w[i,]*G, na.rm=T);
 }
L=L/a1; L = array(L,c(nk,12));

sm=rep(1,7);sm=sm/sum(sm);
#  Plots
jpeg("Global1978.jpeg");
 ky=1978:2009; k=ky-yr0;  # change k to any range (k+yr0) you prefer;
plot(ky,G[k],type="l", main=paste("Global temperature (offset) ",ky[1]," to ",max(ky)), xlab="Year", ylab="Temp C");
h=lm(G[k]~k);
lines(ky,h$fitted,col="red");
lines(ky,filter(G[k],sm),col="blue");
text(ky[4],G[d[2]],paste("slope ",.00001*floor(1000000*h$coefficients[2])," C/dec"),pos=4);
dev.off()

jpeg("Global1900.jpeg");
ky=1900:2009; k=ky-yr0;  # change k to any range (k+yr0) you prefer;
plot(ky,G[k],type="l", main=paste("Global temperature (offset) ",ky[1]," to ",max(ky)), xlab="Year", ylab="Temp C");
h=lm(G[k]~k);
lines(ky,h$fitted,col="red");
lines(ky,filter(G[k],sm),col="blue");
text(ky[4],G[d[2]],paste("slope ",.00001*floor(1000000*h$coefficients[2])," C/dec"),pos=4);
dev.off()

Test Routine: 

# Making synthetic GHCN v2.mean test data
v=array(0,c(9000,14));
u=1970+0:39;
w=rep(u,225);
v[,2]=w;
v[,1]=floor(1:9000/40.00001)*50+50
seas= 10.*cos(0.5:11.5*pi/6);
for(i in 1:9000){
j=v[i,2]-1970;
id=1:12;
v[i,id+2]=floor(100*(j*0.01+seas[id]));
}
for(i in 3:14){
j=floor(runif(999)*8999)+1;
v[j,i]=NA
}
write(t(v),"synth.txt",ncolumns=14);

Code for GHCN Temperature V1.11

First, here is the preprocessor code 

########  Start of preprocessorv1.1.r  ############ 


 # A program written by Nick Stokes 3 April 2010 to preprocess
# the GHCN v2.mean file

# Read data from v2.mean.
# To avoid memory hogging, this code reads and writes in blocks

####   input files and paths - change this to reflect your arrangement

input1="../GHCN/v2.mean";  

####   Utilities
# prints data to the screen and also to a records file.
pr = function(s){
  su=paste(s,collapse=" "); 
  print(su); write(su,"diary.txt",ncolumns=length(s),append=T);
}
#  To watch progress;
timer = function(time0=0){
  tm=as.numeric(format(Sys.time(),"%s"));
  if(time0>0){tm=tm-time0;}
  tm;
}

  ### PROGRAM STARTS HERE;

time0=timer(0);
pr(c("Preprocessing v2.mean ",date()));
loc1="tmp3712.txt";  # temp file will appear in  your dir
con<-file(input1,"r");
#initialising
m=0; yr0=1670;  #  Start year
di=c(2010-yr0,12); dy=yr0:2020+1;
ww = array(0,di[2:1]); wn = ww; ow = rep(F,di[1]);
w1=  1/(1:20); w0 = 1-w1; w1=.1*w1;
s=0; y=rep(0,12); tms=rep(0,6); jm = 0; i12=rep(10,12);
wd=c(12,5,5,5,5,5,5,5,5,5,5,5,5,5);  # for read,fwf(v2.mean) spacings;

# This loop just reads the big array v2, averages duplicates, and fills the reorganised data array x[,]
ii=40000; chunk=ii;   #### chunk is read block size (lines);
x = array(0,c(14,chunk+800));
i=0; jc=0; oy=F;
unlink("v2mean.txt"); write(c("Preprocessing v2.mean",date()),"comment.txt",append=F);

for (ix in 1:772222){
    i=i+1;
    ii=ii+1;
    if(ii>chunk){     # If we're out of data, read a chunk;
      time1=Sys.time();
      pr(c("Times",round(tms[1:3],4)));
      pr(c("Next block time",timer(time0)));
      a=readLines(con,n=chunk);
      substring(a,1)="    ";
      substring(a,12)=" ";
      substring(a,17)=" ";
      substring(a,22)=" ";
      substring(a,27)=" ";
      substring(a,32)=" ";
      substring(a,37)=" ";
      substring(a,42)=" ";
      substring(a,47)=" ";
      substring(a,52)=" ";
      substring(a,57)=" ";
      substring(a,62)=" ";
      substring(a,67)=" ";
      substring(a,72)=" ";
    
      writeLines(a,loc1);   # write a block to a temp file
      pr(c("Start read.fwf at time",timer(time0)));
      v2 <- matrix(scan(loc1, 0, skip=0), ncol=14, byrow=TRUE);  # read back from tmp
      v2[v2==9999]=NA;
      i=1; ii=1; jam = length(v2[,1]); jc=jc+jam; 
      pr(c("Have read",jc,"lines at time",timer(time0)));
      tms[1]=difftime(Sys.time(),time1);tms[3]=tms[3]+tms[1];
    }
    if(i<=jam){
      y[] = v2[i,3:14];
      mm=m; m=v2[i,1];;
    }else{
      m=-1; jj=chunk+100;
    }
  
    if(m != mm){  #  end station sequence - write data for that station;
      time1=Sys.time();
      jj=sum(ow);
      if(jj>0){
    jk=1:jj+jm; jm=jm+jj;
    if(jm>30000 || i>jam){
      write(signif(x[,1:(jm-jj)],4),"v2mean.txt",ncolumns=14,append=TRUE);
      jm=0;jk=1:jj;
      write(paste(c(m,jj,s,i,ii,ix),collapse=" "),"comment.txt",ncolumns=14,append=TRUE);
      if(m==-1)break;
    }
    wn[wn==0]=NA;
    x[3:14,jk] = ww[,ow]/wn[,ow];  # consolidated readings;      
    x[2,jk] = dy[ow];  # consolidated readings;      
    x[1,jk] = rep(s,jj);  # consolidated readings;  
      }  
      if(i> jam)break;
      s=s+1;   # station counter;
      ww[,] = 0; wn[,] = 0; ow[] = F; ih = numeric();
      mm=m;  # checking end of station sequence;
      tms[1]=difftime(Sys.time(),time1);tms[2]=tms[2]+tms[1];
    }
    j = v2[i,2]-yr0;
    if(j>=0){   #  You can cut early years
      id=!is.na(y);
      wn[id,j] = wn[id,j]+i12[id];
      ww[id,j] = ww[id,j]+y[id];
      ow[j]=T;
    }
 }
########  Endof preprocessorv1.1.r  ############ 

  And then the main regional/global averaging code

########  Start of GlobTempLSv1.11.r  ############
# Version 1.11
# A program written by Nick Stokes 3 Apr 2010 to calculate
# global temperatures by LSS method. See www.moyhu.blogspot.com

# Changed slightly 5 April for clarity - shouldn't affect function
# See http://moyhu.blogspot.com/2009/12/repository.html#GHCNupdate for scripts
# v2.mean.txt just has a space inserted before year, and -9999 replaced by NA

# A bug fix on line 89 is the difference between v1.1 and v1.11

#### input files and paths - change this to reflect your arrangement
input1 = "v2mean.txt";  # you have made this file with preprocessorv1.1.r
input2 = "../GHCN/v2.temperature.inv";

####  Function definitions

#  Output plots
plotyr = function(ky = 1900:2009){
  if(ky[1]>yr0){
    jpeg(paste(c(title[choice],ky[1],".jpeg"),collapse=""));
    k = ky-yr0;  # change k to any range (k+yr0) you prefer;
    h = lm(G[k]~k);     trend=    signif(10*h$coef[2],5);
    plot(ky,G[k],type = "l", main = paste("Global temperature (offset) ",title[choice]," ",ky[1]," to ",max(ky)), xlab = paste(c("Trend",trend,"C/Decade"),collapse=" "), ylab = "Temp C");
    lines(ky,h$fitted,col = "red");
    lines(ky,filter(G[k],rep(1./7,7)),col = "blue");    # smoothed
#    text(ky[4],G[max(ky)-2],paste("Trend ",trend," C/dec"),pos = 4);
    dev.off();
    trend;
  }
}
# prints data to the screen and also to a records file.
pr = function(s){
  print(s); write(s,"diary.txt",ncolumns=length(s),append=T);
}
#  To watch progress;
  timer = function(time0=0){
    tm=as.numeric(format(Sys.time(),"%s"));
    if(time0>0){tm=tm-time0;}
    tm;
  }
   
######  Start of program;
time0=timer(0);
if(T){   # Change this to F if you have read in already

v2 <- matrix(scan(input1, 0, skip = 0,na.strings = "NA",nlines = -4999), ncol = 14, byrow = TRUE)
#v2 <- matrix(scan("synth.txt", 0, skip = 0,na.strings = "NA",nlines = -4999), ncol = 14, byrow = TRUE)
nn = length(v2[,1])-1;

wd = c(3,5,3,31,7, 8,5,5,1,5, 2,2,2,2,1,2,16,1 );
colnames=c("country","wmo","mod","name", "lat", "lon","alt","ialt","pop","ipop",
  "topo","sveg","loc","iloc","airpt", "town", "veg","urban");
tv = read.fwf(input2,widths = wd,col.names=colnames,comment.char="");
  } #FF

######  Allocate stations to grid cells
nt = length(tv[,1]);
yr0 = 1880; yr1 = 2009; yrn = rep(yr0,nt);
di=c(12,yr1 - yr0); hs=1:2;

wtt = sin((1:36-0.5)*pi/36);    wt = rep(wtt,each=72)  # 36x72 area weight;
d = floor(tv[,5:6]/5);    # station lat and long;
cellnums = unlist(d[1]*72+d[2]+1333);  # 5 deg x 5 deg cells numbered 1 to 72*36=2592;

#                            Define choice names (make your own)
title = c("Globe","NH","SH","ConUS","Airport","Rural","Urban");
choices = 1:7  #c(4,5,6,7);
restext=paste(c("Output",format(Sys.Date(),"%F:%k:%M"),title[choices],".txt"),collapse="_");
pr(restext);             ic=numeric(0); results=ic;

for(choice in choices){
 
  i01 = 1:nt; i10=rep(NA,nt); orn=rep(F,nt);
  tc = title[choice]; pr(" ");
  pr(paste(tc,"  time (s)",timer(time0)));
#  Here you select the stations to analyse. tv is the table from the .inv file - see colnames above
  if(tc == "NH") i01 = i01[cellnums>1296];
  if(tc == "SH") i01 = i01[cellnums<1297];
  if(tc == "ConUS") i01 = i01[tv$country == 425 && tv$lat<51 && tv$lon > -130];
  if(tc == "Airport") i01 = i01[tv$airpt == "A"];
  if(tc == "Rural") i01 = i01[tv$urban == "A"];
  if(tc == "Urban") i01 = i01[tv$urban != "A"];
  orn[i01] = T;   nk=length(i01);  
  i10[i01] = 1:nk;

#  Now sort v2 monthly data into x[] array, ox[] tracks NA's;
  ip = floor(v2[,1]);
  dix = c(12*nk,di[2]); x=array(0,dix);
  ox=array(F,dix);
  ink=0:11*nk;
  for (i in 1:nn){
    if(orn[ip[i]]){
      j = v2[i,2]-yr0; ii=i10[ip[i]]+ink;
      if( j>0 ){   # This line (only) changed in v1.11
    y=v2[i,3:14];
    ok=!is.na(y);
    x[ii[ok],j]=y[ok];
    ox[ii[ok],j]=T;
      }
    }
  }
 
  pr(paste("Number stations  ",nk,"  time ",timer(time0)));
  if(nk==0) next;
 
  cellcount = array(0,c(di[2],12,36*72));
  dix = dim(x);  ink = 0:11*nk;
 
#count monthly readings over stations for each cell;
  for(i in 1:nk){
    d = cellnums[i01[i]];
    for(k in 1:12){
      mask = ox[i+ink[k],];  # count readings for each month in cell;
      cellcount[mask,k,d] = cellcount[mask,k,d]+1;
    } 
  } 
  w = array(0,dix);
 
# Make weights (cell area/count)
  for(i in 1:nk){
    d = cellnums[i01[i]];
    for(k in 1:12){
      k2 = i+ink[k];     mask = ox[k2,];
      w[k2,mask] = wt[d]/cellcount[mask,k,d];  # big weight matrix;
    } 
  }
# Now create design matrix, RHS, and solve by 2x2 block factorisation;
# taking advantage of big top left block being diagonal;
# r1, r2 are RHS blocks, a1,a2 diagonals. S is Schur complement;
 
  pr(paste("Now solve design matrix ","  time",timer(time0)));
  a1 = rowSums(w, na.rm = T)+1.0e-9;
  a2 = colSums(w, na.rm = T);
 
 
  r1 = rep(0,dix[1]); L = r1; b = r1;
  r2 = rep(0,dix[2]); b2 = r2;
  S = array(0,dix[c(2,2)]);
 
  for(i in 1:dix[1]){
    r1[i] = sum(w[i,]*x[i,], na.rm = T);
  }
#  Create Schur complement S, and r2 (RHS)
  for(i in 1:dix[2]){
    b = w[,i]/a1;   # off-diagonal column;
    r2[i] = sum( w[,i] * x[,i], na.rm = T);
    b2[i] = r2[i] - b %*% r1;
    for(j in i:dix[2]){
      S[i,j] = - sum( b * w[,j]);
      S[j,i] = S[i,j];
    }
    S[i,i] = S[i,i]+a2[i];
  }
  S[1,1] = S[1,1]+10000.;  # enforce G[1] = 0 (else S is singular);
  G = solve(S,b2); G = G - mean(G); # Global temp;
 
#  Finally get station local (seasonal) offsets L is nkx12 array;
  for(i in 1:dix[1]){ 
    L[i] = r1[i]-sum(w[i,]*G, na.rm = T);
  }
  L = L/a1; L = array(L,c(nk,12));
 
#  Plots
  hs=1:2; 
  hs[1]=plotyr(1900:yr1); # Trends in deg/decade;
  hs[2]=plotyr(1978:yr1);
  results=rbind(results,c(hs,signif(G,5)));  ic=c(ic,choice);
  pr(paste(c("Trend ",format(hs,digits=4), "  time (s)",timer(time0)),collapse=" "));
 }

i=dim(results)[1];
write(format(title[ic],width=9), restext, ncolumns=i, append=F);
write(format(I(results),width=9,scientific=F),restext,ncolumns=i,append=T);

########  End of GlobTempLSv1.11.r  ############



Code for GHCN Temperature V1.4.1


First again the preprocessor


########  Start of preprocessorv1.4.r  ############

 # A program written by Nick Stokes 18 April 2010 to preprocess
# the GHCN files v2.mean file and v2.temperature.inv
# from http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/

# To avoid memory hogging, this code reads and writes in blocks

####   input files and paths - change this to reflect your arrangement

 input1="./v2.mean"; 
 input2="./v2.temperature.inv"; 
 Sea=T; if(Sea)input3="./HadSST2_SST_1850on.txt";

####   Utilities
# prints data to the screen and also to a records file.
  pr = function(s){
  su=paste(s,collapse=" ");
  print(su); write(su,"diary.txt",ncolumns=length(s),append=T);
}


  ### PROGRAM STARTS HERE;

  output1="modified.data";
  output2="modified.inv";
  loc1="tmp3712.txt";  # temp file will appear in  your dir;

  pr(c("Preprocessing v2.mean ",date()));
  con<-file(input1,"r");
#initialising
  m=0; yr0=1670;  #  Start year;
  di=c(2010-yr0,12); dy=yr0:2020+1;
  ww = array(0,di[2:1]); wn = ww; ow = rep(F,di[1]);
  w1=  1/(1:20); w0 = 1-w1; w1=.1*w1;
  s=1; y=rep(0,12); tms=rep(0,6); jm = 0; i12=rep(10,12);
  statyrs = sr = numeric();

  ainv=readLines(input2);
  stnums=as.numeric(substring(ainv,4,11));

# This loop just reads the big array v2, averages duplicates, and fills the reorganised data array x[,]
  chunk=40000; ii=chunk; chunk1 = ii-500;   #### chunk is read block size (lines);
  x = array(0,c(14,chunk+800));
  i=0; jc=0; oy=F;
  unlink("modified.data"); write(c("Preprocessing v2.mean",date()),"comment.txt",append=F);

for (ix in 1:999999){   # All lines in v2.mean
    i=i+1;
    ii=ii+1;
    if(ii>chunk){     # If we're out of data, read a chunk;

      a=readLines(con,n=chunk);
      substring(a,1)="   ";
      iaa = seq(12,72,5);
      for(ia in iaa){      substring(a,ia)=" ";}  # turn -9999 to 9999;   
      writeLines(a,loc1);   # write a block to a temp file;

      v2 <- matrix(scan(loc1, 0, skip=0, quiet=T), ncol=14, byrow=TRUE);  # read back from tmp;
      v2[v2==9999]=NA;
      i=1; ii=1; jam = length(v2[,1]); jc=jc+jam;
      pr(c("Have now read",jc,"lines from v2.mean"));
    }
    if(i<=jam){
      y[] = v2[i,3:14];
      mm=m; m=v2[i,1];
    }else{
      m=-1; jj=chunk+100;
    }
 
    if(m != mm){  #  end station sequence - write data for that station;
      jj=sum(ow);
      if(jj>0){
    jk=1:jj+jm; jm=jm+jj;
    if(jm>chunk1 || i>jam){
      write(signif(x[,1:(jm-jj)],4),output1,ncolumns=14,append=TRUE);
      jm=0;jk=1:jj;
      write(paste(c(m,jj,s,i,ii,ix),collapse=" "),"comment.txt",ncolumns=14,append=TRUE);
      if(m==-1)break;
    }
    wn[wn==0]=NA;
    x[3:14,jk] = ww[,ow]/wn[,ow];  # Convert sum of dups to average;
    x[2,jk] = dy[ow];  # years;
    x[1,jk] = rep(s,jj);  # s=stat number (in order);
    statyrs=cbind(statyrs,c(x[2,jk[1]],x[2,jk[jj]],jj));
      } 
      if(i> jam)break;
      sr=c(sr,m);
      while(m != stnums[s] & s<7281){s=s+1};   # station counter;
      ww[,] = 0; wn[,] = 0; ow[] = F;
      mm=m;  # checking end of station sequence;
    }
    j = v2[i,2]-yr0;
    if(j>=0){   #  You can cut early years
      id=!is.na(y);
      wn[id,j] = wn[id,j]+i12[id];
      ww[id,j] = ww[id,j]+y[id];
      ow[j]=T;
    }
 }
statyrs=cbind(statyrs,c(x[2,jk[1]],x[2,jk[jj]],jj));
pr(date()); close(con);


if(Sea){
con=file(input3,"r");
yr0=1879;
for(i in 1:360){  readLines(con,n=37); }  #skip 1850-1879
ny=131; ng=36*72;
v2 = array(NA,c(14,ny,ng));
w0=ou=rep(F,ng); iyr = jyr = rep(NA,ng);
sy=matrix(0,3,ng);

for(i in 1:ny){
  v2[2,i,]=i+yr0;
  v2[1,i,]=1:ng;
  w1=w0;
  for(j in 3:14){
    a=readLines(con,n=37);
    if(length(a)<2)next;
    writeLines(a,loc1);  # roundabout way of using scan on blocks;
    v=scan(loc1,0,skip=1,quiet=T);
    w2 = v == -99.99;
    v[ w2 ] = NA; w1 = w1 | !w2;  # w1 is T if there was a reading that year
    v2[j,i,]=c(v);
  }
  ou=ou|w1;
  sy[1,w1 & sy[1,]==0] = i;  # begin year
  sy[2,w1] = i;  # end year
 }
 pr(date());   close(con);
 sy=sy+yr0; sy[3,]=sy[2,]-sy[1,]+1; sy[3, sy[1,]==0]=0;
 iou=cumsum(ou); nou=sum(ou);
 statyrs=cbind(statyrs,sy[,ou]);

 for(i in 1:ny)v2[1,i,]=iou[v2[1,i,]]+7280;

#ov=colSums(!is.na(v2[3:14,,]), dims=2)>80;  # rubbing out sea cells with too few readings;
 write(v2,output1,ncolumns=14,append=T);   # Write the modified sea data file;


# Make modified inventory file;

 ll=matrix(0,ng,3);  # lat lon;   
 k=0;
 for(j in 1:36){
   for(i in 1:72){
     k=k+1; ll[k,1]=-5*j+92.5;  ll[k,2]=5*i-182.5; ll[k,3]=k+9980000;  # writing madeup station numbers for sea ;
   }
 }
 pr(date());

 wd = c(3,5,3,31,7, 8,5,5,1,5, 2,2,2,2,1, 2,16,1 );
 a2 = paste(rep(" ",101),collapse="");
 a0 = array(a2,2592);   m = c(45,52,1);
 for(i in 1:3){substring(a0,m[i]) = as.character(ll[,i]);}  # writing lats, lons and nums for sea stations;

 ainv=c(ainv,a0[ou]);  # The big text block
 }

substring(ainv,85)="                ";
for(i in 1:3){substring(ainv,81+5*i)=as.character(statyrs[i,]);}  # writing start/end yrs
write(ainv,output2,ncolumns=1);
print("Finished. Output files modified.data and modified.inv");


########  End of preprocessorv1.4.r  ############
Now the main analysis program
 
############# Start GlobTempLSv1.4.1.r  ##################
# Version 1.4
# A program written by Nick Stokes 18 Apr 2010 to calculate
# global temperatures by LSS method. See www.moyhu.blogspot.com
# Input files are output by the program preprocessorv1.4.r

#### Parameter definitions - these vars are ones that you might want to change
plotdir = ".";  outputdir = "."

# subset selection - title names appear in graph and text output - you need to write a matching condition below;
#title = c("Land","LandSea","SST","NHLandSea","SHLandSea");
#title = c("SST","ConUS","Airport","Rural","Urban","Cut92","Kept","BC","Africa");
title = c("Boliv","Boliv1","Boliv2","Boliv3","GtLakes");
choices = 1:4 #c(1,4,8,9,10,11); # select items from "title" for this run;

UseMaps=T;if(UseMaps)require("maps");

# plot selection;
yr0 = 1880;  # Start of the analysis period - don't ask for trends or plots before this date;
yr1=2009; plotyrs = matrix( c(1901,yr1, 1979,yr1),nrow=2) #list year pairs, as many plots as you want;

# trend selection start and finish;
trendyrs = matrix( c(1901,2005, 1979,2005),nrow=2) #list year pairs, as many trends as you want;

#### Functions you may want to change
chooser = function(tc){
# The components of tv come from modified.inv;
#   tv$("country","wmo","mod","name","lat", "lon","alt","ialt","pop","ipop","topo","veg","loc","iloc",  ;
  #     "airpt", "town", "startyr","endyr","length","urban");  # see readme.txt for info;

isLand = tv$urban != " ";   # note next change to switch in V1.4 - now you just write the logical expression
switch( tc,      # format of switch: LHS is your choice, RHS is vector returned by chooser()
  "Land" = isLand, # urban is A (rural),B or C for land, " " for sea
  "SST" = !isLand,
  "NHland" = tv$lat>0 & isLand,
  "NHsea" = tv$lat>0 & !isLand,
  "SHLandSea" = tv$lat<0,
  "NHLandSea" = tv$lat>0,
  "SHland" = tv$lat<0 & isLand,
  "SHsea" = tv$lat<0 & !isLand,
  "ConUS" = tv$country == 425 & tv$lat<51 & tv$lon > -130,
  "Airport" = tv$airpt == "A",
  "Rural" = tv$urban == "A",
  "Urban" = tv$urban == "C",
  "Cut92" = tv$endyr<1992 & tv$length>50,
  "Kept" = tv$endyr>2005 & tv$length>60,
  "Africa" = tv$country <190,
  "BC" = grepl(",BC",tv$name) | grepl(",B.C",tv$name),
  "MedCity" = tv$ipop>500 &tv$urban == "C",
  "BigCity" = tv$ipop>2000 &tv$urban == "C",
  "KC" = getdist(39,-95)<1000 &tv$urban == "A",
  "GtLakes" = tv$urban == " " & getdist(42, -88)<1000,
  "Boliv" = getdist(-16,-68)<1200 & isLand, # coords of La Paz
  "Boliv1" = getdist(-16,-68)<1200 & isLand & tv$alt>400,
  "Boliv2" = getdist(-16,-68)<1200 & isLand & tv$endyr>2008,
  "Boliv3" = getdist(-16,-68)<1200 & isLand & tv$endyr>2008 & tv$loc!="CO",
  "LandSea" = tv$lat<100
    )
}
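# To add your own subset (hypothetical example): put a new name, say "Oz", in the title
# vector above and a matching line in the switch, e.g.
#   "Oz" = tv$country == 501,   # Australian stations, assuming 501 is the GHCN country code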

####  Other function definitions;

#  Output plots;

ipcc=c(1,6,19,42,71,96,106,96,71,42,19,6,1); ipcc=ipcc/sum(ipcc);
plotyr = function(ky = 1900:2009){
  if(ky[1]>yr0){
    jpeg(paste(c(plotdir,"/",title[choice],ky[1],".jpeg"),collapse=""));
    k = ky-yr0;  # change k to any range (k+yr0) you prefer;
    h = summary(lm(G[k]~k));     trend=    signif(10*h$coef[2,1:2],3);
    plot(ky,G[k],type = "l", main = paste("Temperature (offset) ",title[choice]," ",ky[1]," to ",max(ky)), xlab = paste(c("Trend",trend[1],"±",trend[2],"C/Decade"),collapse=" "), ylab = "Temp C");
    lines(ky[1:length(h$fitted)],  h$fitted,  col = "red");
    lines(ky,filter(G[k],ipcc),col = "blue");    # smoothed ;
    dev.off();
    trend;
  }
}
  plotstat = function(){
    jpeg(paste(c(plotdir,"/",title[choice],"Stats",".jpeg"),collapse=""));
    plot(1:nyr+yr0,statcount,type = "l", main = paste("Station count",title[choice]),
     ylab="Number Stations");
    dev.off();
  }
  mapstat=function(){
    jpeg(paste(c(plotdir,"/",title[choice],"_map",".jpeg"),collapse=""));
    d=tv[i01,5:6]; h=c(min(d[,2]),max(d[,2]),min(d[,1]),max(d[,1]));
    map("world",xlim=h[1:2],ylim=h[3:4],col="red");
    points(d[,2],d[,1],pch=20,cex=0.5);
    dev.off();
  }
# prints data to the screen and also to a records file;
  pr = function(s){
  print(s); write(s,"diary.txt",ncolumns=length(s),append=T);
}
#  A time function to avoid problems with systems that return NULL (some Win)
  time_g = function(s="%s"){
    a=Sys.time();b="0";if(!is.null(a)){b=format(a,s);}
    b
  }
#  Elapsed time utility
  timer = function(time0=0){
    tm=as.numeric(format(time_g("%s")));
    if(is.null(tm))return(0);
    if(time0>0){tm=tm-time0;}
    tm;
  }
#$ Returns a vector of distances of all stations from (lat,lon) place
getdist = function(lat,lon){
   lu=c(lat, lon)*(pi/180);
   su=c(sin(lu),cos(lu));
   lr=cbind(tv$lat,tv$lon)*(pi/180);
   sr=cbind(sin(lr),cos(lr));
# Spherical distance calc
   cd=su[1]*sr[,1]+sr[,3]*su[3]*(su[2]*sr[,2]+sr[,4]*su[4]);
   round(6371*acos(cd));          # distance on sphere surface;
}
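# e.g. getdist(-16,-68) gives the distance in km of every station from La Paz,
# as used in the "Boliv" choices above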
   
######  START OF PROGRAM;

time0=timer(0);
#  Input files
input1 = "modified.data";   # Made by preprocessor - should be in current dir     
input2 = "modified.inv";
if(T){   # Change this to F if you have read data in already

  v2 <- matrix(scan(input1, 0, skip = 0,na.strings = "NA",nlines = -4999), ncol = 14, byrow = TRUE);
  nn = length(v2[,1])-1;
  pr(c("v2 done",timer(time0)));
  wd = c(3,5,3,31,7, 8,5,5,1,5, 2,2,2,2,1, 2,5,5,6,1 );
  colnames=c("country","wmo","mod","name","lat", "lon","alt","ialt","pop","ipop",
         "topo","veg","loc","iloc","airpt", "town", "startyr","endyr","length","urban");
  tv = read.fwf(input2,widths = wd,col.names=colnames,comment.char="");
 }

######  Initialisations
ntv = length(tv[,1]);
yrn = rep(yr0,ntv); nyr=yr1 - yr0;
di=c(12,nyr);       hs=1:2;

wtt = sin((1:36-0.5)*pi/36);    wts = rep(wtt,each=72)  # 36x72 area weight;
d = floor(tv[,5:6]/5);    # station lat and long;
cellnums = unlist(d[1]*72+d[2]+1333);  # 5 deg x 5 deg cells numbered 1 to 72*36=2592;

#Title of output file
restext=paste(c(outputdir,"/Output_",time_g("%F_%H:%M_"),title[choices],".txt"),collapse="");
pr(restext);             ic=numeric(0);
tmp=ic; numtrends=dim(trendyrs)[2];
# now make first col of output file
for(i in 1:numtrends){tmp = c(tmp,paste(c(trendyrs[1,i],"-",trendyrs[2,i]),collapse=""),"Trend_se")}
results=c("Trend","Number",tmp,"Temp",(yr0+1):yr1); res1=results;

######### The main loop for your run
for(choice in choices){
  tc=title[choice]; weight=wts;
  orn = chooser(tc);
 
  pr(" ");    pr(paste(tc,"  time (s)",timer(time0)));
  if(length(orn)<1){pr("This choice yielded no stations. Check criterion.");next}
  i=1:ntv; i01=i[orn];
  nst = length(i01);    i10=rep(NA,ntv);   i10[i01] = 1:nst;

#  Now sort v2 monthly data into x[] array, ox[] tracks NA's;
  ip = floor(v2[,1]);
  dix = c(12*nst,nyr); x=array(0,dix);
  ox=array(F,dix); statcount=rep(0,nyr);
  ist=0:11*nst;
  for (i in 1:nn){
    if(orn[ip[i]]){
      j = v2[i,2]-yr0; ii=i10[ip[i]]+ist;
      if( j>0 && j<=dix[2]){
    y=v2[i,3:14];
    ok=!is.na(y);
    x[ii[ok],j]=y[ok];
    ox[ii[ok],j]=T;
    if(sum(ok)>0)statcount[j]=statcount[j]+1;
      }
    }
  }
 
  pr(paste("Number stations  ",nst,"  time ",timer(time0)));
  if(nst==0) next;
  plotstat();
 
  cellcount = array(0,c(nyr,12,36*72));
  dix = dim(x);
 
#count monthly readings over stations for each cell;
  for(i in 1:nst){
    d = cellnums[i01[i]];
    for(k in 1:12){
      mask = ox[i+ist[k],];  # count readings for each month in cell;
      cellcount[mask,k,d] = cellcount[mask,k,d]+1;
    } 
  } 
  w = array(0,dix);
 
# Make weights (cell area/count)
  for(i in 1:nst){
    jq=i01[i]; d = cellnums[jq]; wgt=weight[d]; if(i>jq)wgt=wgt*5;
    for(k in 1:12){
      k2 = i+ist[k];     mask = ox[k2,];
      w[k2,mask] = wgt/cellcount[mask,k,d];  # big weight matrix;
    } 
  }
# Now create design matrix, RHS, and solve by 2x2 block factorisation;
# taking advantage of big top left block being diagonal;
# r1, r2 are RHS blocks, a1,a2 diagonals. S is Schur complement;
 
  x = x * w;
  pr(paste("Now solve design matrix ","  time",timer(time0)));
  a1 = rowSums(w, na.rm = T)+1.0e-9;

  a2 = colSums(w, na.rm = T)+1.0e-9; a2[1]=a2[1]+0.001;
  r1 = rep(0,dix[1]); L = r1; b = r1;
  r2 = rep(0,dix[2]); b2 = r2;

P=function(a,b){drop(a %*% b);}  # matrix-vector product, returned as a plain vector
# Apply the Schur complement S = diag(a2) - w' inv(diag(a1)) w to a vector r, without forming S
  doS=function(r){
    r1=P(w,r); r2=r1/a1;
    (r*a2) - P(r2,w);
  }

  cg = function(b){   # conjugate gradient iteration for S x = b, with S applied via doS()
    p=r=b;x=b*0;h=0;
  for(i in 1:99){
    p=r+h*p;               # new search direction
    Ap= doS(p);
    a=sum(r*r)/sum(p*Ap);  # step length
    x=x+a*p;
    h=sum(r*r); r=r-a*Ap;  # update residual
    hh=sum(r*r); h=(hh/h); # ratio for the next direction
    if(h>1)break;          # stop if the residual grows
  }
    print(paste(c("CG steps",i,"Residual reduction",signif(hh/sum(b*b),4)),collapse=" "));
    x;

}
r1 = rowSums(x);  r2 = colSums(x);
b = r2 - P((r1/a1), w);
G = cg(b);

  base = mean(G); i=c(1961,1990)-yr0;
  if(i[1]>0 && i[2]<length(G)) base = mean(G[i[1]:i[2]]);  # use the 1961-90 mean where available
  g=G = G - base;  # Global temp normalised to base period;
  G[r2==0] = NA;
#  Finally get station local (seasonal) offsets L is nstx12 array;
  for(i in 1:dix[1]){ 
    L[i] = r1[i]-sum(w[i,]*G, na.rm = T);
  }
  L = L/a1; L = array(L,c(nst,12));
 
#  Plots
  if(UseMaps)mapstat();
  j=dim(plotyrs)[2];
  for(i in 1:j ){
    plotyr(plotyrs[1,i]:plotyrs[2,i]); # Plots in deg/decade;
  }
# calc trends
  for(i in 1:numtrends){
    ii=2*i;      k=trendyrs[,i];  kt=k[1]:k[2]-yr0;
    h=signif(summary(lm(G[kt] ~ kt))$coef[2,1:2]*10,4)
    res1[ii+1:2]=paste(h[1:2]);
  }
# Compile results array
  res1[c(1,2,ii+6)]=c(tc,nst,tc);  res1[ii+3+1:nyr]=signif(G,4);
  results=rbind(results,res1);  ic=c(ic,choice);  # record the choice for the output header
  pr(paste(c("Trend ",res1[0:1*2+3], "  time (s)",timer(time0)),collapse=" "));
 }
####  End of Main Loop.  Print Output file
i=dim(results)[1];
write(format(title[ic],width=10), restext, ncolumns=i, append=F);
write(format(I(results),width=10,scientific=F),restext,ncolumns=i,append=T);

############# End GlobTempLSv1.4.1.r  ##################




 

15 comments:

  1. My email name is nastokes and I'm at westnet - plus a com and an au

  2. Hi Nick, I only have your "spam" email address above so there are a couple of emails to you on that! Feel free to delete this post when you have read it! Peter D

  3. Nick I'd be happy to post you the data I have - send me your email address to john at johnlsayers.com

  4. Hi Nick, I've sent you an email to the nastokes westnet address given above. Hope it reaches you.
    Nic Lewis

  5. Nick, nice work, the NOAA link is broken though.

    And does anyone have temp data that is not already processed, i.e. not anomaly data, just the raw data?

    Ideally never adjusted or manipulated in any way.

    stock@hawaii.rr.com

  6. Nick it looks like you aren't updating the JAXA sea ice extent which has moved and is now available at https://ads.nipr.ac.jp/vishop/vishop-extent.html
    and https://ads.nipr.ac.jp/vishop/data/graph/plot_extent_n_v2.csv

    thanks

    1. David,
      Thanks for that. I've updated - should be showing in a few minutes.

    2. Alas, not so easy. I've updated manually, but they seem to have blocked automatic download. Investigating...

    3. Use wget -U … with the appropriate text that follows (email me if you need a string that works after the -U). Sorry about being a bit cryptic but I don't want to make life easier for trolls.

    4. Thanks, Carrick,
      Actually, it just needed a cURL update to support https. But I'll keep in mind the -U flag.

  7. Question: Is the Trend Viewer source code available for general consumption? fjmlinar@gmail.com

    1. Frank,
      Most of the Trendviewer action is in the Javascript. You can always read that - it's a browser requirement. Just look at the HTML (Ctrl-U) and download any .js files (src="..."). I don't "minify" the code, so it is quite readable.

      The rest is done in R. That just makes the triangle plot and the time series plot, and packs the data for JS. Each triangle dot is a regression, but it has to be done in an efficient way, because it is quite time consuming. The various significance parameters are calculated there too.

      I'll email the R code and some data.
