Finding Ghost Particles
Advancing energy regression in neutrino research through machine learning-driven waveform analysis.
Get started now · View it on GitHub · Our report
The purple button above links to our project repo; click here or use the link at the top right corner of this page to see the repo for this website.
Introduction
Our Applied Data Science project, NPML (Neutrino Physics Data Science), applies machine learning to time-series waveform data from the Majorana Demonstrator, extracting key features and building ML/DL models to estimate event energies in the search for neutrinoless double beta decay. Observing this decay could help explain the matter-antimatter asymmetry of the universe and the unique mass properties of neutrinos, but background interference complicates signal identification. By optimizing feature extraction and model selection, our research improves the precision of neutrino detection, advancing our understanding of fundamental particle physics.
This website focuses on the regression portion of the NPML project. For details on classification, please visit the classification group’s website.
The Neutrino
Neutrinos are subatomic particles with no electric charge and very little mass. They interact only through gravity and the weak nuclear force, which makes them hard to probe directly.

Neutrinos are everywhere, produced naturally by nuclear reactions in the Sun, supernova explosions, radioactive decays, cosmic rays, and more.

Overview of data and parameters
This section gives our viewers a quick look at the data, the extracted parameters, and our goal for this project. Please follow the links or use the sidebar on the left for more details.
Raw Waveform
The image below shows an example waveform. There are millions of waveforms in our datasets, and we extract 12 unique features from each waveform for use in our models. See the Parameters section under Data in the sidebar for more details.

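To make the idea of feature extraction concrete, here is a minimal sketch of how two such features might be computed from a single digitized waveform. This is not the project’s actual code: the function name, the 100-sample baseline window, the 10%–90% rise-time definition, and the sampling period are all illustrative assumptions.

```python
import numpy as np

def extract_basic_features(waveform, sample_period_ns=10.0):
    """Toy feature extraction from one digitized waveform (illustrative only)."""
    baseline = waveform[:100].mean()       # estimate baseline from the first 100 samples
    pulse = waveform - baseline            # baseline-subtracted pulse
    peak = pulse.max()                     # peak amplitude, a rough energy proxy
    t10 = np.argmax(pulse >= 0.10 * peak)  # first sample above 10% of the peak
    t90 = np.argmax(pulse >= 0.90 * peak)  # first sample above 90% of the peak
    rise_time_ns = (t90 - t10) * sample_period_ns
    return {"peak_amplitude": float(peak), "rise_time_ns": float(rise_time_ns)}
```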
Interactive plot for parameters
The interactive plot below allows you to explore the different extracted parameters visually. Use the dropdown menu to switch between plots and observe how each parameter behaves in the waveform analysis. This visualization provides intuition about the key features that drive our machine learning models in the NPML project.
Note that not all parameters can be visualized.
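For readers who want to build a similar view themselves, the sketch below shows one generic way to get a dropdown-driven parameter plot in a Jupyter notebook using ipywidgets. It is not the code behind the plot on this page, and the file name extracted_parameters.csv is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact

# Hypothetical file: any CSV of extracted parameters, one column per parameter.
params = pd.read_csv("extracted_parameters.csv")

@interact(parameter=list(params.columns))  # dropdown over parameter names
def plot_parameter(parameter):
    """Histogram of the chosen extracted parameter."""
    plt.figure(figsize=(6, 4))
    plt.hist(params[parameter].dropna(), bins=100)
    plt.xlabel(parameter)
    plt.ylabel("Count")
    plt.title(f"Distribution of {parameter}")
    plt.show()
```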
Getting started
Step 1: Installation Instructions
How to clone the repository:
git clone https://github.com/axie0927/FindingGhostParticles-RegressionSubgroup.git
Make sure you have Anaconda installed for the next step.
Step 2: Anaconda Environment Instructions
1. Replace name_of_environment with a name you like:
conda env create -f environment.yml --name name_of_environment
2. Activate the environment:
conda activate name_of_environment
Step 3: Download the Preprocessed Dataset or Preprocess Your Own Raw Data
Option 1: Download the preprocessed dataset (recommended):
- Download the preprocessed data from this link and place all the CSV files in the ‘Data’ folder under src/Models before running the .py files.
- (Optional) If you would like to use the notebooks located at src/Models/Notebooks, also place the files from the link above in the ‘Data’ folder under src/Models/Notebooks.
Option 2: Preprocess your own data:
There are 25 raw data files, and they are not processed. To extract parameters from the data, download the raw data and run the Master.py script located in the src folder of the repository. The src folder also contains a parameter-functions folder in which each parameter extraction function is defined separately (see the sketch after the steps below). Due to the large size of the data files, the processed data is not kept in this repository.
- Download the raw data at this link.
- Create a directory named ‘data’ inside src/Parameter Extraction and place all the raw data in it.
- Run the code below in your terminal:
cd src/Parameter\ Extraction
python3 Master.py
- Place all the generated CSV files in the ‘Data’ folder under src/Models before running the .py files.
- (Optional) If you would like to use the notebooks located at src/Models/Notebooks, also place the files from the link above in the ‘Data’ folder under src/Models/Notebooks.
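As a rough sketch of the pattern Master.py follows (the real script and its extraction functions live in the repo and will differ), the driver loops over the raw files, applies each extraction function to every waveform, and writes the results to CSV. Everything below, from the toy peak_amplitude definition to the assumption of one waveform per row, is illustrative.

```python
import glob
import pandas as pd

def peak_amplitude(samples):
    """Toy feature: max of the baseline-subtracted pulse."""
    return float((samples - samples[:100].mean()).max())

rows = []
for path in sorted(glob.glob("data/*")):   # the raw files placed in 'data'
    raw = pd.read_csv(path)                # assumes one waveform per row
    for _, waveform in raw.iterrows():
        samples = waveform.to_numpy(dtype=float)
        rows.append({"peak_amplitude": peak_amplitude(samples)})

pd.DataFrame(rows).to_csv("extracted_parameters.csv", index=False)
```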
Step 4: Apply the models to the processed dataset:
⚠️ Warning: Make sure there are 4 files in the data folder: MJD_NPML_PCOCESSED.csv, MJD_TEST_PCOCESSED.csv, MJD_Train_PCOCESSED.csv, and npml_cut.csv, where npml_cut.csv contains the predictions of the classification group.

❗ Important: DeepLearning_NN.py is our best final model. Not only is it applied to the test set, it also generates predictions on the NPML dataset, which is real-world data without known true values; the other models are applied to the test set only, for reference.

1. Move to the Models directory:
cd src/Models
2. Replace the_model_you_like.py with the actual model name:
python3 the_model_you_like.py
After finishing the steps above, please see your terminal for the results and guidance 😉
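For orientation only, here is a minimal stand-in for the modeling step, not DeepLearning_NN.py itself: it loads the processed train/test CSVs named in the warning above and fits a small neural-network regressor. The target column name "energy" is an assumption; check the actual files for the real column names.

```python
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

# File names come from the warning above; the "energy" target column is assumed.
train = pd.read_csv("Data/MJD_Train_PCOCESSED.csv")
test = pd.read_csv("Data/MJD_TEST_PCOCESSED.csv")

X_train, y_train = train.drop(columns=["energy"]), train["energy"]
X_test, y_test = test.drop(columns=["energy"]), test["energy"]

# Small fully connected network as a rough proxy for a deep-learning regressor.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X_train, y_train)
print("Test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```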