Tales From the Data

Data Pre-processing

This notebook takes raw photometric data (energy measured at different wavelengths of the visible spectrum) from stellar objects and prepares it for analysis with different Gaussian mixture clustering models.

- raw data is crossmatched against a catalog of standard (non-variable), stacked stars to find the objects common to both
- the data is handled in Spark dataframes, then converted to numpy arrays and saved for analysis in the GMM_plots notebook
- a plot of the color distributions for the catalog stars is generated at the end
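
The crossmatch and array-conversion steps above can be illustrated with a small sketch. It uses plain NumPy instead of Spark so that it runs on its own, and it is not the notebook's actual code: the function name `crossmatch_colors`, the object IDs, the three-band magnitude layout, and the definition of a color as the difference between adjacent bands are all assumptions made for illustration.

```python
import numpy as np

def crossmatch_colors(raw_ids, raw_mags, catalog_ids):
    """Keep raw objects that also appear in the catalog and compute their colors.

    Hypothetical helper: raw_mags is assumed to hold one row per object and
    one column per photometric band, ordered from blue to red.
    """
    # intersect1d returns the sorted common IDs plus their positions in each input
    common_ids, raw_idx, _ = np.intersect1d(
        raw_ids, catalog_ids, return_indices=True
    )
    matched = raw_mags[raw_idx]
    # Assumed color definition: each band minus the next redder band
    colors = matched[:, :-1] - matched[:, 1:]
    return common_ids, colors

# Made-up example data: four raw objects, three bands each
raw_ids = np.array([101, 102, 103, 104])
raw_mags = np.array([
    [18.0, 17.0, 16.5],
    [20.0, 19.5, 19.25],
    [15.0, 14.0, 13.0],
    [21.0, 20.0, 19.5],
])
# Catalog of standard stars; 999 has no counterpart in the raw data
catalog_ids = np.array([103, 101, 999])

ids, colors = crossmatch_colors(raw_ids, raw_mags, catalog_ids)
print(ids)     # IDs present in both sets
print(colors)  # one row of colors per matched object
```

In a Spark workflow the same matching would more likely be an inner join on the ID column, with the result collected into arrays afterwards; the resulting array could then be written out with `np.save` for a later notebook to load.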

~an informal portfolio~

Astronomical Data in Spark- PreProcessing