Finding Ghost Particles

Advancing energy regression in neutrino research through machine learning-driven waveform analysis.



Introduction

Our Applied Data Science project in NPML (Neutrino Physics Data Science) applies machine learning to time-series waveform data from the Majorana Demonstrator, extracting key features and building ML/DL models to estimate event energies in the search for neutrinoless double-beta decay. Observing this decay could help explain the matter-antimatter asymmetry of the universe and the unique mass properties of neutrinos, but background interference complicates signal identification. By optimizing feature extraction and model selection, our research improves the precision of neutrino energy reconstruction, advancing our understanding of fundamental particle physics.

This website focuses on the regression subgroup of the NPML project. For details on classification, please visit the classification group’s website.

The Neutrino

Neutrinos are subatomic particles with no electric charge and very little mass. They interact only through gravity and the weak nuclear force, which makes them hard to probe directly.

[Animation: example waveform]

Neutrinos are everywhere; they are produced naturally by nuclear reactions in the Sun, supernova explosions, radioactive decays, cosmic rays, and more.

[Image: 0νββ (neutrinoless double-beta decay)]


Overview of data and parameters

This section gives our viewers a quick look at the data, the extracted parameters, and our goal for this project. Please click the links here or use the sidebar on the left for more details.

Interactive plot of parameters

Raw Waveform

The image below shows example waveforms. There are millions of waveforms in our datasets, and we extract 12 unique features from each waveform to use in our models. See the parameters section under the data section in the sidebar for more details.

[Image: example raw waveforms]
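
To make the idea concrete, here is a minimal sketch of what extracting a few features from a single digitized waveform could look like. The three features below are illustrative assumptions, not our actual 12 parameters; those are defined in the parameter-functions folder and driven by Master.py (see Getting started).

import numpy as np

def extract_features(waveform, sample_period_ns=10.0):
    # Illustrative stand-ins for the project's 12 extracted parameters.
    # waveform: 1-D numpy array of ADC samples.
    baseline = waveform[:100].mean()          # estimate baseline from early samples
    pulse = waveform - baseline               # baseline-subtracted pulse
    amplitude = pulse.max()                   # peak height above baseline

    # Rise time: first crossings of 10% and 90% of the peak amplitude.
    t10 = np.argmax(pulse >= 0.1 * amplitude)
    t90 = np.argmax(pulse >= 0.9 * amplitude)
    rise_time_ns = (t90 - t10) * sample_period_ns

    # Tail slope: linear fit to the samples after the peak.
    tail = pulse[int(pulse.argmax()):]
    tail_slope = np.polyfit(np.arange(tail.size), tail, 1)[0]

    return {"amplitude": amplitude,
            "rise_time_ns": rise_time_ns,
            "tail_slope": tail_slope}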


Interactive plot for parameters

The interactive plot below lets you explore the different extracted parameters visually. Use the dropdown menu to switch between plots and observe how each parameter behaves in the waveform analysis. This visualization builds intuition about the key features that drive our machine learning models in the NPML project. (A sketch of how such a dropdown plot can be built follows the image below.)

Not all the parameters can be visualized.

[Selected parameter plot]
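
For the curious, the snippet below is a minimal sketch of how a dropdown plot like this can be built with plotly. The parameter names and the randomly generated values are placeholders, not our real extracted parameters.

import numpy as np
import plotly.graph_objects as go

rng = np.random.default_rng(0)
# Placeholder distributions standing in for real extracted parameters.
params = {
    "max_amplitude": rng.normal(1.0, 0.2, 1000),
    "rise_time": rng.gamma(2.0, 50.0, 1000),
    "tail_slope": rng.normal(-0.01, 0.002, 1000),
}

fig = go.Figure()
for i, (name, values) in enumerate(params.items()):
    fig.add_trace(go.Histogram(x=values, name=name, visible=(i == 0)))

# One dropdown button per parameter, toggling trace visibility.
buttons = [
    dict(label=name, method="update",
         args=[{"visible": [j == i for j in range(len(params))]},
               {"title": {"text": name}}])
    for i, name in enumerate(params)
]
fig.update_layout(updatemenus=[{"buttons": buttons}],
                  title={"text": "max_amplitude"})
fig.show()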

Getting started

Step 1: Installation Instructions

How to clone the repository:

git clone https://github.com/axie0927/FindingGhostParticles-RegressionSubgroup.git
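
Then move into the repository directory created by the clone:

cd FindingGhostParticles-RegressionSubgroup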

Make sure you have Anaconda installed for the next step.

Step 2: Anaconda Environment Instructions

1. Replace name_of_environment with a name you like:

conda env create -f environment.yml --name name_of_environment

2. Activate the environment:

conda activate name_of_environment

Step 3: Download the Preprocessed Dataset or Preprocess Your Own Raw Data:

Option 1: Download the preprocessed dataset (recommended):

  1. Download the preprocessed data from this link, then place all the CSV files in the ‘Data’ folder under src/Models before running the .py files. (A quick check of the download is sketched right after this list.)
  2. (Optional) If you would like to use the notebooks located at src/Models/Notebooks, also place the files from the link above in the ‘Data’ folder under src/Models/Notebooks.
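
As a quick sanity check of the download, you can load one of the CSVs with pandas. The file name is taken from the Step 4 warning below, and the path assumes you run from the repository root:

import pandas as pd

# Load one of the preprocessed files and inspect its shape and columns.
df = pd.read_csv("src/Models/Data/MJD_Train_PCOCESSED.csv")
print(df.shape)              # rows (events) x columns (features)
print(df.columns.tolist())   # the extracted parameter names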

Option 2: Preprocess your own data:

There are 25 different data files, and this data is unprocessed. To extract parameters from it, download the raw data and run the Master.py script located in the src folder of the repository. The src folder also contains a parameter-functions folder in which each parameter extraction function is defined separately. Because the data files are large, the processed data is not kept in this repository. (A conceptual sketch of this pipeline follows the steps below.)

  1. Download the raw data at this link.
  2. Create a directory named ‘data’ at src/Parameter Extraction, and place all the raw data in it.
  3. Run the code below in your terminal:
    cd src/Parameter\ Extraction
    
    python3 Master.py
    
  4. Place all the generated CSV files in the ‘Data’ folder under src/Models before running the .py files.
  5. (Optional): If you would like to use the notebooks located at src/Models/Notebooks, also place the files from the link above in the ‘Data’ folder under src/Models/Notebooks.
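
Conceptually, the extraction driver loops over the raw files, applies each parameter function to every waveform, and writes one CSV per input file. The sketch below illustrates that shape only; the function names, raw-file format, and paths are assumptions, so consult Master.py for the real pipeline.

import glob
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the functions in the parameter-functions folder;
# the real project defines 12 of these.
PARAMETER_FUNCTIONS = {
    "max_amplitude": lambda wf: float(np.max(wf) - np.mean(wf[:100])),
    "peak_index": lambda wf: int(np.argmax(wf)),
}

for path in sorted(glob.glob("data/*.npy")):   # assumed raw-file format
    waveforms = np.load(path)                  # assumed: one 2-D array per file
    rows = [{name: fn(wf) for name, fn in PARAMETER_FUNCTIONS.items()}
            for wf in waveforms]
    out_path = path.replace(".npy", "_params.csv")
    pd.DataFrame(rows).to_csv(out_path, index=False)
    print("wrote", out_path)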

Step 4: Apply the models to the processed dataset:

⚠️ Warning: Make sure there are 4 files in the data folder: MJD_NPML_PCOCESSED.csv, MJD_TEST_PCOCESSED.csv, MJD_Train_PCOCESSED.csv, and npml_cut.csv, where npml_cut.csv contains the predictions from the classification group.
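
If you want to verify this programmatically, a small illustrative check (run from src/Models) might look like:

from pathlib import Path

# The four files named in the warning above.
required = ["MJD_NPML_PCOCESSED.csv", "MJD_TEST_PCOCESSED.csv",
            "MJD_Train_PCOCESSED.csv", "npml_cut.csv"]
missing = [f for f in required if not (Path("Data") / f).exists()]
if missing:
    raise FileNotFoundError("Missing from Data/: " + ", ".join(missing))
print("All 4 required data files found.")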

❗ Important: DeepLearning_NN.py is our best final model: it is not only evaluated on the test set but also generates predictions on the NPML dataset, which is real-world data with no known true values. The other models are applied only to the test set, for reference.

1. Move to the Models directory:

cd src/Models

2. Replace the_model_you_like.py with the actual model file name:

python3 the_model_you_like.py
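
For example, to run our best final model described above:

python3 DeepLearning_NN.py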

After finishing the step above, please check your terminal for the results and further guidance 😉