Validating a Reconstruction


This notebook compares our paleoclimate reconstruction to both:

The original LMRv2.1 (offline) reconstruction
Multiple instrumental datasets (HadCRUT5, GISTEMP, Berkeley Earth)

We assess the quality of the reconstruction using:

Global mean surface temperature (GMST) comparisons (median and spread)
Climate field skill metrics (correlation, coefficient of efficiency)
Ensemble similarity via the plume distance framework of Emile-Geay et al. (2025)

%load_ext autoreload
%autoreload 2

import cfr
import numpy as np
import pandas as pd
import pens
import xarray as xr
import matplotlib.pyplot as plt

# Record the cfr version used for this notebook
print(cfr.__version__)

2025.5.7

Load Recon Job

Load our reconstruction. We only need two variables: 'tas_gm', the global mean surface temperature (an ensemble time series), and 'tas', the surface temperature climate field.

Note: Here we validate the reconstruction built with the proxy database from Step 2b, filtered by archive type and resolution. Results will vary slightly depending on which variant of Step 2 is used.

res = cfr.ReconRes('./recons/lmr_reproduce_pda_ptype_res/')
res.load(['tas', 'tas_gm'])

Load GMST from our Recon Job

Here we extract just the 'tas_gm' component of our reconstruction.

res_ts = res.recons['tas_gm']

Plot both versions of LMR to see comparison

Since we are trying to reproduce LMRv2.1 as closely as possible, we first check whether the general shape of the two reconstructions matches. In particular, we want to see whether the variability is similar between the two reconstructions and whether both identify the same climate extremes.
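Beyond visual inspection, these two questions can be given rough numbers. The sketch below is not part of cfr or of the original workflow; it assumes each ensemble is a NumPy array of shape (n_time, n_members) on a shared time axis, summarizes variability as the standard deviation of the ensemble median, and treats the k coldest years of each median as its "extremes". The helper names are hypothetical.

```python
import numpy as np


def median_series(ens):
    """Ensemble median at each time step; `ens` has shape (n_time, n_members)."""
    return np.median(ens, axis=1)


def coldest_years(time, series, k=5):
    """Return the k years with the lowest values, sorted chronologically."""
    return np.sort(time[np.argsort(series)[:k]])


def compare_variability(time, ens_a, ens_b, k=5):
    """Compare the spread of two ensemble medians and the overlap of their k coldest years."""
    med_a, med_b = median_series(ens_a), median_series(ens_b)
    shared = np.intersect1d(coldest_years(time, med_a, k), coldest_years(time, med_b, k))
    return {
        'std_a': float(np.std(med_a)),
        'std_b': float(np.std(med_b)),
        'shared_coldest_years': shared.tolist(),
    }


# Synthetic example: two noisy ensembles scattered around the same signal
rng = np.random.default_rng(0)
time = np.arange(1000, 2001)
signal = 0.2 * np.sin(2 * np.pi * time / 200.0)
ens_a = signal[:, None] + 0.1 * rng.standard_normal((time.size, 100))
ens_b = signal[:, None] + 0.1 * rng.standard_normal((time.size, 100))
print(compare_variability(time, ens_a, ens_b))
```

Similar standard deviations and a large overlap in coldest years would support what the plots suggest visually; both ensembles must first be restricted to a common period.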

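fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# Top panel: original LMRv2.1 GMST ensemble (median and quantile spread)
lmr_ens.plot_qs(ax=ax1, ylim=[-0.8, 0.6])
ax1.set_title('LMRv2.1 Global Mean Temperature')

# Bottom panel: cfr reproduction, on the same y-axis range for a fair comparison
res_ts.plot_qs(ax=ax2, ylim=[-0.8, 0.6])
ax2.set_title('LMR Reproduced CFR Global Mean Temperature')

plt.tight_layout()
plt.show()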

Figure: LMRv2.1 Global Mean Temperature (top) and LMR Reproduced CFR Global Mean Temperature (bottom).

The ensemble distributions, as plotted here, look very similar, which is encouraging. Let us quantify that using the plume distance framework.

Validate Using PENS

To quantitatively compare the original LMRv2.1 ensemble to the version reproduced using cfr, we apply the plume distance framework introduced by Emile-Geay et al. (2025) using the pens package.

This framework compares collections of climate trajectories (plumes) by evaluating:

Intra-ensemble distances, which describe the spread of trajectories within an ensemble.
Inter-ensemble distances, which measure differences between two ensembles.
The plume distance, a summary metric (°C) that captures the degree of similarity (or dissimilarity) between two plumes while accounting for internal variability.
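To make the intra/inter distinction concrete, here is a minimal NumPy sketch. It assumes each ensemble is an array of shape (n_time, n_members) and uses the mean absolute difference between two trajectories as the distance; that metric is an illustrative choice, not necessarily the one pens implements, and the plume distance itself is not reproduced here.

```python
import itertools

import numpy as np


def trajectory_distance(x, y):
    """Illustrative metric: mean absolute difference between two trajectories (°C)."""
    return float(np.mean(np.abs(x - y)))


def intra_distances(ens):
    """Distances between every unordered pair of members of one ensemble."""
    n = ens.shape[1]
    return np.array([trajectory_distance(ens[:, i], ens[:, j])
                     for i, j in itertools.combinations(range(n), 2)])


def inter_distances(ens_a, ens_b):
    """Distances between every member of ens_a and every member of ens_b."""
    return np.array([trajectory_distance(ens_a[:, i], ens_b[:, j])
                     for i in range(ens_a.shape[1])
                     for j in range(ens_b.shape[1])])


rng = np.random.default_rng(42)
ens_a = rng.normal(0.0, 0.2, size=(50, 20))   # 50 time steps, 20 members
ens_b = rng.normal(0.0, 0.2, size=(50, 30))   # same time axis, 30 members
print(intra_distances(ens_a).size, intra_distances(ens_b).size,
      inter_distances(ens_a, ens_b).size)
# → 190 435 600
```

If two ensembles are statistically alike, the inter-ensemble distribution should overlap the two intra-ensemble distributions, which is what the distance-distribution figure in this section examines.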

We compute these distances using the GMST ensembles from the original LMRv2.1 and our cfr reproduction. Both are converted into pens.EnsembleTS objects for ease of comparison.

The closer the plume distance is to zero, the more statistically similar the ensembles are.

The code below computes the intra-ensemble distances (within LMRv2.1 and within the reproduction) and the inter-ensemble distances, then plots their distributions and highlights the plume distance.

import pens
import seaborn as sns

plt.style.use('default')
pens.set_style()

# Convert cfr EnsTS to pens EnsTS
glob_pens = pens.EnsembleTS(time=lmr_ens.time, value=lmr_ens.value)
glob_pens.label = 'Original LMR'
glob_pens.time_unit = 'years'
glob_pens.value_name = 'GMST'
glob_pens.value_unit = '\N{DEGREE SIGN}C'

res_pens = pens.EnsembleTS(time=res_ts.time, value=res_ts.value)
res_pens.label = 'Reproduced LMR'
res_pens.time_unit = 'years'
res_pens.value_name = 'GMST'
res_pens.value_unit = '\N{DEGREE SIGN}C'

# Align both ensembles to their common time period
# (np.asarray works whether .time is a NumPy array or an xarray coordinate)
glob_time = np.asarray(lmr_ens.time)
res_time = np.asarray(res_ts.time)

common_start = max(glob_time.min(), res_time.min())
common_end = min(glob_time.max(), res_time.max())

# Time range array [start, end] used for slicing
timespan = np.array([common_start, common_end])

# Slice both ensembles to the common period
glob_pens_aligned = glob_pens.slice(timespan)
res_pens_aligned = res_pens.slice(timespan)

# Intra-ensemble distances (pairs of members within each ensemble)
orig_intra = glob_pens_aligned.distance()
repro_intra = res_pens_aligned.distance()

# Inter-ensemble distances (original members vs. reproduced members)
inter_dist = glob_pens_aligned.distance(res_pens_aligned.value)

# Plume distance summarizing the similarity of the two ensembles
plume_dist = glob_pens_aligned.plume_distance(res_pens_aligned.value, max_dist=1.0)

Computing intra-ensemble distance among possible pairs: 100%|██████████| 1999000/1999000 [00:41<00:00, 48146.60it/s]
Computing intra-ensemble distance among possible pairs: 100%|██████████| 12497500/12497500 [03:56<00:00, 52881.11it/s]
Computing inter-ensemble distance: 100%|██████████| 2000/2000 [03:28<00:00,  9.58it/s]
Computing inter-ensemble distance: 100%|██████████| 2000/2000 [03:29<00:00,  9.56it/s]

print("\nDistances between ensembles:")
print(f"Original intra-ensemble distance : {orig_intra}, len={len(orig_intra)}")
print(f"Reproduced intra-ensemble distance: {repro_intra}, len={len(repro_intra)}")
print(f"Inter-ensemble distance: {inter_dist}")
print(f"Plume distance: {plume_dist}")

Distances between ensembles:
Original intra-ensemble distance : [0.40367679 0.21343417 0.04743079 ... 0.26221506 0.05118074 0.29547632], len=1999000
Reproduced intra-ensemble distance: [0.90063806 0.13543413 0.26447127 ... 0.20780622 0.14287555 0.06493068], len=12497500
Inter-ensemble distance: [0.31020584 0.59043221 0.17531498 ... 0.22892024 0.43669859 0.37176791]
Plume distance: 0.14388837320933334
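The lengths of the intra-ensemble arrays are simply the number of unordered member pairs, n(n - 1)/2, which also explains why the second computation runs several times longer than the first. Inverting that formula recovers the ensemble sizes: 2,000 members for the original LMRv2.1 (consistent with the 2000 iterations of the inter-ensemble progress bar) and 5,000 for the reproduction. The helper below is only a worked check of that arithmetic; its name is hypothetical.

```python
import math


def members_from_pairs(n_pairs):
    """Invert n_pairs = n(n - 1)/2 to recover the ensemble size n."""
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    if n * (n - 1) // 2 != n_pairs:
        raise ValueError("n_pairs is not a triangular number")
    return n


print(members_from_pairs(1_999_000))   # → 2000
print(members_from_pairs(12_497_500))  # → 5000
```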

# Create figure and plot
fig, ax = plt.subplots(figsize=(10, 6))

# Plot KDE for individual ensembles with explicit labels
sns.kdeplot(data=orig_intra, fill=False, ax=ax, common_norm=False, label='Original LMRv2.1')
sns.kdeplot(data=repro_intra, fill=False, ax=ax, common_norm=False, label='Reproduced LMRv2.1')

# Add inter-ensemble distribution
sns.kdeplot(data=inter_dist, fill=True, ax=ax, common_norm=False, color='silver', 
            label='inter-ensemble')

# Add plume distance line
ax.axvline(x=plume_dist, color="black", linestyle="--", label='plume distance')

# Add labels
ax.set_xlabel('Distance')
ax.set_ylabel('Density')
ax.set_title('Distance Distributions')
ax.legend()

plt.tight_layout()
plt.show()

Figure: Distance Distributions (x-axis: Distance, y-axis: Density).

The plume distance between our reproduced LMRv2.1 and the original (about 0.14 °C) falls within the spread of intra-ensemble distances, and the inter-ensemble distribution tracks the original intra-ensemble distribution almost exactly. This shows that the two reconstructions are very similar and within expected error of each other.

Validating GMST Against Instrumental Datasets

We now validate our reconstruction’s GMST against instrumental datasets (HadCRUT5, BEST, GISTEMP), complementing the comparison with the original LMRv2.1 (offline) above. Since data were not available for every instrumental dataset, we use those for which we could find an exact match; HadCRUT5 is loaded from the Met Office website, and the others come from the initial data folder.

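cfr’s EnsTS.compare() aligns our ensemble time series with a target (validation) dataset over the chosen timespan and computes validation statistics, including the correlation (r) and the coefficient of efficiency (CE), stored in valid_stats. Calling plot_qs() on the result then plots the median and spread of our ensemble against the target dataset.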
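For reference, the two validation statistics can be written out directly. The sketch assumes the standard definitions, Pearson correlation r and the coefficient of efficiency CE = 1 - Σ(obs - rec)² / Σ(obs - mean(obs))², applied to a single reconstructed series (for example the ensemble median). It is not cfr's implementation, which may differ in details such as which ensemble statistic is compared; the function names and synthetic data are illustrative.

```python
import numpy as np


def pearson_r(obs, rec):
    """Pearson correlation between an observed and a reconstructed series."""
    return float(np.corrcoef(obs, rec)[0, 1])


def coefficient_of_efficiency(obs, rec):
    """CE = 1 - sum((obs - rec)**2) / sum((obs - mean(obs))**2).

    CE = 1 is a perfect match; CE <= 0 means the reconstruction is no better
    than the observed mean over the validation period.
    """
    obs = np.asarray(obs, dtype=float)
    rec = np.asarray(rec, dtype=float)
    return float(1.0 - np.sum((obs - rec) ** 2) / np.sum((obs - obs.mean()) ** 2))


# Synthetic example over the 1880-2000 validation window
years = np.arange(1880, 2001)
obs = 0.005 * (years - 1880) + 0.1 * np.sin(years / 5.0)   # stand-in "observations"
rec = 0.8 * obs + 0.05 * np.cos(years / 3.0)               # stand-in "reconstruction"
r = pearson_r(obs, rec)
ce = coefficient_of_efficiency(obs, rec)
print(f"r={r:.2f}, CE={ce:.2f}, r^2={r**2:.2f}")
```

Unlike r, CE also penalizes errors in mean and amplitude, so CE can never exceed r². For example, r=0.91 allows at most CE ≈ 0.83, which is why the CE values reported in this notebook are lower than the corresponding correlations.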

Note: Validation is restricted to 1880–2000, the common period across reconstructions and observational datasets.

Validate against HadCRUT5

We follow the same process as before, this time using HadCRUT5. We load the annual global series directly from the Met Office URL below, then convert the DataFrame to a cfr EnsTS (ensemble time series) object.

Note: Tardif et al. (2019) used HadCRUT4 to validate, but since we are running this experiment six years later, it is best to use more up-to-date data.

url = 'https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.0.2.0/analysis/diagnostics/HadCRUT.5.0.2.0.analysis.summary_series.global.annual.csv'
df = pd.read_csv(url)
df.head()

	Time	Anomaly (deg C)	Lower confidence limit (2.5%)	Upper confidence limit (97.5%)
0	1850	-0.417711	-0.589256	-0.246166
1	1851	-0.233350	-0.411868	-0.054832
2	1852	-0.229399	-0.409382	-0.049416
3	1853	-0.270354	-0.430009	-0.110700
4	1854	-0.291521	-0.432712	-0.150330

# Convert the DataFrame to an EnsTS and compare over the 1880-2000 validation window
had_ts_annual = cfr.EnsTS().from_df(df=df, time_column='Time', value_columns='Anomaly (deg C)')
had_ts_compared = res_ts.compare(
    had_ts_annual,
    ref_name='HadCRUT5',
    timespan=(1880, 2000)
)

fig, ax = had_ts_compared.plot_qs(figsize=[12, 4], xlim=[1850, 2000])

Figure: reproduced GMST ensemble compared with HadCRUT5.

The reconstruction agrees well with HadCRUT5 (r=0.91, CE=0.64). This is encouraging because HadCRUT5 was not used in calibration (GISTEMPv4 was the temperature calibration target), so it provides an independent check. Together with the plume distance comparison above, this indicates that the reproduction closely emulates both LMRv2.1 and the instrumental record.

Plot Consensus

In addition to the individual validation analyses, we plot all validation targets together along with a ‘consensus’ time series. This consensus validation target is the mean of the HadCRUT5, GISTEMP, and BEST time series.

# Combine all reference series to calculate the consensus mean
all_refs = np.array([
    had_ts_compared.ref_value[:121],   # 121 annual values = 1880-2000
    gis_ts_compared.ref_value[:121],
    best_ts_compared.ref_value[:121]
])
mean_ref = np.mean(all_refs, axis=0)

mean_ts_annual = cfr.EnsTS(
    time=best_annual.year.values[30:-14],  # manually slice years to match mean_ref
    value=mean_ref,
    value_name='Temperature Anomaly'
)

mean_ts_compared = res_ts.compare(
    mean_ts_annual,
    ref_name='Consensus',
    timespan=(1880, 2000)
)

fig, ax = mean_ts_compared.plot_qs(figsize=[12, 4], xlim=[1850, 2000])

Figure: reproduced GMST ensemble compared with the consensus series.
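The consensus above is built with positional slices ([:121], [30:-14]), which silently assume that each dataset starts and ends in particular years. A more defensive alternative, sketched below with pandas, aligns the series on their year labels before averaging. The helper consensus_series and the synthetic inputs are hypothetical; the resulting Series could then be wrapped with cfr.EnsTS(time=..., value=...) as in the consensus cell.

```python
import numpy as np
import pandas as pd


def consensus_series(series_by_name, start=1880, end=2000):
    """Average annual anomaly series on the years they all share.

    `series_by_name` maps a dataset name to a pandas Series indexed by integer year.
    Years outside [start, end], or missing from any dataset, are dropped, so each
    consensus value averages the same calendar year across datasets.
    """
    df = pd.DataFrame(series_by_name).sort_index()   # align on the union of years
    df = df.loc[start:end].dropna(how='any')         # keep years present in every dataset
    return df.mean(axis=1)


# Synthetic example: two "datasets" with different start years
years = np.arange(1850, 2025)
a = pd.Series(np.linspace(-0.4, 1.2, years.size), index=years)
b = pd.Series(np.linspace(-0.3, 1.1, years.size - 30), index=years[30:])  # starts in 1880
consensus = consensus_series({'A': a, 'B': b})
print(consensus.index.min(), consensus.index.max(), len(consensus))  # → 1880 2000 121
```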

# Base plot: reproduced ensemble median and spread, without the validation curve
fig, ax = had_ts_compared.plot_qs(
    figsize=[16, 8],
    xlim=(1880, 2000),
    color='indianred',
    plot_valid=False
)

# Overlay each validation target, labelled with its r and CE against the reconstruction
for compared, color, label in [
    (had_ts_compared, 'black', 'HadCRUT5'),
    (gis_ts_compared, 'blue', 'GISTEMP'),
    (best_ts_compared, 'green', 'Berkeley Earth'),
    (mean_ts_compared, 'purple', 'Consensus'),
]:
    stats = compared.valid_stats
    label = f'{label} (r={stats["corr"]:.2f}, CE={stats["CE"]:.2f})'
    ax.plot(compared.ref_time, compared.ref_value[:121], color=color, label=label)

plt.legend()
plt.title('CFR Reproduced vs Observational Data 1880-2000')
plt.show()

Figure: CFR Reproduced vs Observational Data 1880-2000.

The reconstruction aligns closely with the instrumental consensus series (r=0.92, CE=0.73). Because the consensus averages across multiple observational datasets, strong agreement here reinforces that the reconstruction captures the core features of the instrumental record. All targets (HadCRUT5, GISTEMPv4, BEST, and the consensus) fall within the ensemble spread, indicating that this agreement does not depend on the choice of observational dataset.
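Whether a target "falls within the ensemble spread" can also be summarized as a single number: the fraction of validation years in which the target lies inside a chosen quantile band of the ensemble. The sketch below assumes an ensemble array of shape (n_time, n_members) and a 2.5 to 97.5% band; that band is an assumption and should be matched to the quantiles actually drawn by plot_qs. The function name is hypothetical.

```python
import numpy as np


def coverage_fraction(ens, target, lower_q=2.5, upper_q=97.5):
    """Fraction of time steps at which `target` lies within the ensemble's
    [lower_q, upper_q] percentile band. `ens` has shape (n_time, n_members)."""
    lower = np.percentile(ens, lower_q, axis=1)
    upper = np.percentile(ens, upper_q, axis=1)
    inside = (target >= lower) & (target <= upper)
    return float(np.mean(inside))


rng = np.random.default_rng(1)
ens = rng.normal(0.0, 0.15, size=(121, 500))              # synthetic ensemble, 1880-2000
print(coverage_fraction(ens, np.median(ens, axis=1)))     # → 1.0 (median is always inside)
print(coverage_fraction(ens, np.full(121, 5.0)))          # → 0.0 (far outside the band)
```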

Summary

We reproduced the LMRv2.1 global temperature reconstruction using the PAGES2k proxy database from the cfr package.

To get a close match, we filtered the proxies by both archive type and resolution (dt ≤ 1, i.e. annual or finer), which brought the number of records closer to the number used in the original LMRv2.1.
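The archive-and-resolution filter can be illustrated in plain Python. This sketch does not use cfr's ProxyDatabase API; it assumes the resolution dt is the median spacing (in years) between consecutive samples, so dt ≤ 1 keeps annual and sub-annual records. The record ids, archive names, and helper functions are hypothetical.

```python
import numpy as np


def median_dt(time):
    """Median spacing (in years) between consecutive samples of a record."""
    return float(np.median(np.diff(np.sort(np.asarray(time, dtype=float)))))


def select_records(records, archives, max_dt=1.0):
    """Return ids of records whose archive is in `archives` and whose median dt <= max_dt."""
    return [r['pid'] for r in records
            if r['archive'] in archives and median_dt(r['time']) <= max_dt]


# Hypothetical records: ids, archive names and time axes are made up
records = [
    {'pid': 'coral_1', 'archive': 'coral', 'time': np.arange(1800, 2000, 1.0)},
    {'pid': 'tree_1', 'archive': 'tree', 'time': np.arange(1500, 2000, 1.0)},
    {'pid': 'lake_1', 'archive': 'lake sediment', 'time': np.arange(0, 2000, 10.0)},
    {'pid': 'tree_2', 'archive': 'tree', 'time': np.arange(1000, 2000, 5.0)},
    {'pid': 'coral_2', 'archive': 'coral', 'time': np.arange(1700, 2000, 0.25)},
]

print(select_records(records, archives={'coral', 'tree'}))
# → ['coral_1', 'tree_1', 'coral_2']
```

Here lake_1 is dropped by the archive filter and tree_2 by the resolution filter, which mirrors how the two criteria jointly shrink the proxy network.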

Validation shows that the reproduced reconstruction matches the original very closely:

Global mean temperature time series align well with instrumental datasets and with the original LMRv2.1
Spatial correlation and CE maps show nearly identical patterns against instrumental data and the original LMRv2.1
There is strong agreement with observational targets, both those independent of calibration (HadCRUT5, BEST) and the calibration target GISTEMPv4

In future work, we plan to:

More thoroughly explore how different filtering choices in the proxy database affect skill
Include additional calibration targets, such as precipitation
Include marine sediments in the proxy network
