Muon isolation dataset

The data.h5 file contains the images, isolation cones and per-sample weights for signal and background events. The data is already split into training, validation and test sets.

The dataset contains 499970 images of 32 by 32 pixels (249979 signal and 249991 background) and 18 isolation cones. The data is split approximately into 83% training, 8.5% validation and 8.5% testing.
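To check the split sizes quoted above against the file, you can count rows per split without loading any images. This is a minimal sketch. It assumes the layout used by the access snippet in this README, where each key holds 'train', 'valid' and 'test' datasets. `split_fractions` is a hypothetical helper, not something shipped with the dataset.

```python
SPLITS = ("train", "valid", "test")


def split_fractions(hf, classes=("images_signal", "images_bg")):
    """Count samples per split (signal + background) and each split's share.

    `hf` is an open h5py.File or any mapping that accepts "<key>/<split>"
    lookups. Only len() is used: on an h5py Dataset it returns the size of
    the first axis without reading the data.
    """
    counts = {s: sum(len(hf[f"{c}/{s}"]) for c in classes) for s in SPLITS}
    total = sum(counts.values())
    fractions = {s: n / total for s, n in counts.items()}
    return counts, fractions


# Hypothetical usage (requires h5py and data.h5 in the working directory):
#   import h5py
#   with h5py.File("data.h5", "r") as hf:
#       counts, fractions = split_fractions(hf)
```

Using the numbers given above, the counts should add up to 499970. The fractions should come out near 0.83, 0.085 and 0.085.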

You can access the datasets with the following commands:

import h5py

with h5py.File("data.h5", "r") as hf:
    print(list(hf.keys()))
    for key in hf.keys():
        for dset in ['train', 'valid', 'test']:
            # Read the whole split into memory as a NumPy array
            data = hf[key + '/' + dset][:]
            print(key, dset, data.shape)

The top-level keys are ['images_bg', 'images_signal', 'iso_bg', 'iso_signal', 'weights_bg', 'weights_signal']. Each key holds 'train', 'valid' and 'test' datasets.
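To train a classifier, it is convenient to merge the signal and background arrays of one split into a single input array, a label vector and a weight vector. This sketch assumes the labeling convention 1 = signal, 0 = background. It also assumes that each weights array holds one weight per sample, stored either flat or as a column. `load_split` is a hypothetical helper name.

```python
import numpy as np


def load_split(hf, split="train", features="images"):
    """Return (X, y, w) for one split, with signal rows before background rows.

    `features` is "images" or "iso", matching the key prefixes listed above.
    Labels use an assumed convention: 1 = signal, 0 = background.
    `hf` is an open h5py.File or any mapping supporting "<key>/<split>" lookups.
    """
    x_sig = np.asarray(hf[f"{features}_signal/{split}"][:])
    x_bg = np.asarray(hf[f"{features}_bg/{split}"][:])
    # Flatten weights in case they are stored as (N, 1) columns.
    w_sig = np.asarray(hf[f"weights_signal/{split}"][:]).reshape(-1)
    w_bg = np.asarray(hf[f"weights_bg/{split}"][:]).reshape(-1)

    X = np.concatenate([x_sig, x_bg], axis=0)
    y = np.concatenate([np.ones(len(x_sig), dtype=np.int64),
                        np.zeros(len(x_bg), dtype=np.int64)])
    w = np.concatenate([w_sig, w_bg])
    # Sanity check: exactly one weight and one label per sample.
    if not (len(X) == len(y) == len(w)):
        raise ValueError("weights do not match the number of samples")
    return X, y, w
```

The returned arrays are ordered with all signal samples first, then all background samples. Shuffle them with a single shared permutation before mini-batch training. You can then pass the weights to any training routine that accepts per-sample weights.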

Energy Flow Polynomials (EFPs)

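The EFPs are stored in a separate h5 file with an 'efps' group and a 'targets' group. The 'efps' group contains 18072 individual EFPs. Each one is named after its graph label (n, d, k) and its (k, b) parameters. In the graph label, n is the number of vertices, d the number of edges and k an index distinguishing graphs with the same n and d. In the parameters, k and b are the energy and angular weighting exponents (κ and β). For example, graph (8,7,9) with k=2, b=2 is stored as 'efp_8_7_9_k_2_b_2'. Each EFP is a NumPy array of shape (49970, 1), i.e. a single column of values.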
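The EFP dataset names can be parsed back into their graph label and (k, b) values. This makes it easy to pick a subset of the 18072 polynomials and stack them into one feature matrix. The sketch below assumes the naming scheme described above. The README does not say how non-integer k or b values are written in the names, so the parser keeps them as strings. The helper names and the EFP file name in the usage comment are hypothetical.

```python
import re

import numpy as np

# Names look like "efp_8_7_9_k_2_b_2": graph (n, d, k) = (8, 7, 9), k=2, b=2.
# The k/b fields are kept as text because their encoding for non-integer
# values is not documented.
EFP_NAME = re.compile(r"^efp_(\d+)_(\d+)_(\d+)_k_(.+?)_b_(.+)$")


def parse_efp_name(name):
    """Split an EFP dataset name into its graph label and (k, b) fields."""
    match = EFP_NAME.match(name)
    if match is None:
        raise ValueError(f"not an EFP name: {name!r}")
    n, d, index, k, b = match.groups()
    return {"graph": (int(n), int(d), int(index)), "k": k, "b": b}


def select_names(names, k, b):
    """Return the sorted names whose k and b fields equal the given strings."""
    selected = []
    for name in names:
        parsed = parse_efp_name(name)
        if parsed["k"] == k and parsed["b"] == b:
            selected.append(name)
    return sorted(selected)


def stack_efps(efps, names):
    """Stack the chosen (N, 1) EFP arrays into one (N, len(names)) matrix.

    `efps` is the 'efps' group of an open h5py.File, or any mapping from
    dataset name to array; column j of the result is names[j].
    """
    columns = [np.asarray(efps[name][:]).reshape(-1, 1) for name in names]
    return np.hstack(columns)


# Hypothetical usage (the EFP file name is not given in this README):
#   import h5py
#   with h5py.File("<efp file>.h5", "r") as f:
#       names = select_names(f["efps"], k="1", b="1")
#       X_efp = stack_efps(f["efps"], names)
```

Selecting a subset first matters for memory. Stacking all 18072 columns of 49970 rows would take roughly 7 GB if the values are float64.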


For more information, see:

Learning to Isolate Muons
Julian Collado, Kevin Bauer, Edmund Witkowski, Taylor Faucett, Daniel Whiteson, Pierre Baldi
https://arxiv.org/abs/2102.02278