Automatic Miscalibration Diagnosis: Interpreting Probability Integral Transform (PIT) Histograms
Authors
Podsztavek, O.; Jordan, A.I.; Tvrdík, P.; Polsterer, K.L.
Year
2024
Published in
ESANN 2024 proceedings. Louvain la Neuve: Ciaco - i6doc.com, 2024. p. 137-142. ISBN 978-2-87587-090-2.
Type
Proceedings paper
Department
Abstract
Quantifying the predictive uncertainty of a model is essential for risk assessment. We address the proper calibration of the predictive uncertainty in regression tasks by employing the probability integral transform (PIT) histogram to diagnose miscalibration. PIT histograms are often difficult to interpret, and therefore we present an approach to an automatic interpretation of PIT histograms based on an interpreter trained with a synthetic data set. Given a PIT histogram of a model and a data set, the interpreter can estimate the data-generating distribution of the data set with the main purpose of identifying the cause of miscalibration.
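As a rough illustration of the diagnostic itself (not of the trained interpreter the abstract describes), the sketch below computes PIT values and a PIT histogram. It assumes Gaussian predictive distributions and synthetic standard-normal observations; all function names are hypothetical. A calibrated model gives a roughly flat histogram, while a model whose predicted spread is too small gives a U-shaped one.

```python
import math
import random


def normal_cdf(y, mean, std):
    """CDF of the normal distribution N(mean, std^2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))


def pit_values(observations, means, stds):
    """PIT value of each observation under its (assumed Gaussian) prediction."""
    return [normal_cdf(y, m, s) for y, m, s in zip(observations, means, stds)]


def pit_histogram(pits, bins=10):
    """Relative frequency of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pits:
        counts[min(int(p * bins), bins - 1)] += 1
    return [c / len(pits) for c in counts]


# Synthetic data: observations drawn from a standard normal distribution.
rng = random.Random(0)
observations = [rng.gauss(0.0, 1.0) for _ in range(20000)]
n = len(observations)

# Calibrated model: predicts N(0, 1) for every observation.
calibrated = pit_histogram(pit_values(observations, [0.0] * n, [1.0] * n))
# Overconfident model: predicts a standard deviation that is too small.
overconfident = pit_histogram(pit_values(observations, [0.0] * n, [0.5] * n))
```

Comparing `calibrated` with `overconfident` shows the kind of shape difference that a human, or an automatic interpreter, would have to read from the histogram.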
Consistency check of automatic pipeline measurements of quasar redshifts with Bayesian convolutional networks
Authors
Škoda, P.; Podsztavek, O.
Year
2023
Published in
Astronomical Data Analysis Software and Systems XXXII. San Francisco: Astronomical Society of the Pacific, 2023.
Type
Proceedings paper
Department
Abstract
Spectroscopic redshifts of quasars are important inputs for constructing many cosmological models. Redshift measurement is generally considered to be a straightforward task performed by automatic pipelines based on template matching.
Due to the millions of spectra delivered by surveys of SDSS or LAMOST telescopes, it is impossible to verify all redshift measurements of automatic pipelines by a human visual inspection. However, the pipeline results are still taken as the "ground truth" for further statistical inferences.
Nevertheless, because of the similarity of patterns of quasar emission lines in different spectral ranges, an optimal match may be found for a completely different template position, causing severe errors in the measured redshift. For example, it may easily happen that a faint emission star with a noisy spectrum is identified as a high redshift quasar and vice versa.
We show such examples discovered by the consistency check of redshift measurements of the SDSS pipeline and redshift predictions of a regression Bayesian convolutional network. The network is trained on a large amount of human-inspected redshifts and predicts redshifts together with their predictive uncertainties. Therefore, it can also identify cases where predictions are uncertain and thus require human visual inspection.
Lessons Learned from Ariel Data Challenge 2022 - Inferring Physical Properties of Exoplanets From Next-Generation Telescopes
Authors
Yip, K.H.; Changeat, Q.; Waldmann, I.; Unlu, E.B.; Forestano, R.T.; Roman, A.; Matcheva, K.; Matchev, K.T.; Stefanov, S.; Podsztavek, O.; Morvan, M.; Nikolaou, N.; Al-Refaie, A.; Jenner, C.; Johnson, C.; Tsiaras, A.; Edwards, B.; Alves de Oliveira, C.; Thiyagalingam, J.; Lagage, P.-O.; Cho, J.; Tinetti, G.
Year
2023
Published in
Proceedings of the NeurIPS 2022 Competitions Track. Proceedings of Machine Learning Research, 2023. p. 1-17. Proceedings of Machine Learning Research. vol. 220. ISSN 2640-3498.
Type
Proceedings paper
Department
Abstract
Exo-atmospheric studies, i.e. the study of exoplanetary atmospheres, is an emerging frontier in Planetary Science. To understand the physical properties of hundreds of exoplanets, astronomers have traditionally relied on sampling-based methods. However, with the growing number of exoplanet detections (i.e. increased data quantity) and advancements in technology from telescopes such as JWST and Ariel (i.e. improved data quality), there is a need for more scalable data analysis techniques. The Ariel Data Challenge 2022 aims to find interdisciplinary solutions from the NeurIPS community. Results from the challenge indicate that machine learning (ML) models have the potential to provide quick insights for thousands of planets and millions of atmospheric models. However, the machine learning models are not immune to data drifts, and future research should investigate ways to quantify and mitigate their negative impact.
Prototype of Interactive Visualisation Tool for Bayesian Active Deep Learning
Authors
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Year
2023
Published in
Astronomical Data Analysis Software and Systems XXXI. San Francisco: Astronomical Society of the Pacific, 2023. p. 91-94. ISBN 978-1-58381-957-9.
Type
Proceedings paper
Department
Abstract
In the era of big data in astronomy, we need to develop methods to analyse the data. One such method is Bayesian active deep learning (synergy of Bayesian convolutional neural networks and active learning). To improve the method’s performance, we have developed a prototype of an interactive visualisation tool for a selection of an informative (contains data with high predictive uncertainty, is diverse, but not redundant) data subsample for labelling by a human expert. The tool takes as input a sample of data with the highest predictive uncertainty. These data are projected to 2-D with a dimensionality reduction technique. We visualise the projected data in an interactive scatter plot and allow a human expert to label a selected subsample of data. With this tool, she or he can select a correct subsample with all the previously mentioned characteristics. This should lower the total amount of data labelled because the Bayesian model’s performance will improve faster than when the data are selected automatically.
Spectroscopic redshift determination with Bayesian convolutional networks
Authors
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Year
2022
Published in
Astronomy and Computing. 2022, 40 ISSN 2213-1337.
Type
Article
Department
Abstract
Astronomy is facing large amounts of data, so astronomers have to rely on automated methods to analyse them. However, automated methods might produce incorrect values. Therefore, we need to develop different automated methods and perform a consistency check to identify them. If there is a lot of labelled data, convolutional neural networks are a powerful method for any task. We illustrate the consistency check on spectroscopic redshift determination with a method based on a Bayesian convolutional neural network inspired by VGG networks. The method provides predictive uncertainties that enable us to (1.) determine unusual or problematic spectra for visual inspection; (2.) do thresholding that allows us to balance between the error of redshift predictions and coverage. We used the 12th Sloan Digital Sky Survey quasar superset as the training set for the method. We evaluated its generalisation capability on about three-quarters of a million spectra from the 16th quasar superset of the same survey. On the 16th quasar superset, the method performs better in terms of the root-mean-squared error than the most used template fitting method. Using redshift predictions of the proposed method, we identified spectra with incorrectly determined redshifts that are unrecognised quasars or were misclassified as them.
Transfer Learning in Large Spectroscopic Surveys
Authors
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Year
2021
Published in
Astronomical Data Analysis Software and Systems XXX. San Francisco: Astronomical Society of the Pacific, 2021. p. 235-238. Astronomical Society of the Pacific Conference Series. vol. 532. ISBN 978-1-58381-951-7.
Type
Proceedings paper
Department
Abstract
Transfer learning is a machine learning method that can reuse knowledge across spectroscopic archives with different distributions of observations. We applied transfer learning based on a convolutional neural network to spectra from Large Sky Area Multi-Object Fiber Spectroscopic Telescope and Sloan Digital Sky Survey archives. Taking advantage of known quasars in LAMOST DR5 version 3, we wanted to discover yet unseen quasars in SDSS DR14. Our transfer learning approach reaches 99.6% precision and 98.9% recall. We found examples of quasars previously classified as stars.
Active deep learning method for the discovery of objects of interest in large spectroscopic surveys
Authors
Škoda, P.; Podsztavek, O.; Tvrdík, P.
Year
2020
Published in
Astronomy & Astrophysics. 2020, 643 ISSN 1432-0746.
Type
Article
Department
Abstract
Context. Current archives of the LAMOST telescope contain millions of pipeline-processed spectra that have probably never been seen by human eyes. Most of the rare objects with interesting physical properties, however, can only be identified by visual analysis of their characteristic spectral features. A proper combination of interactive visualisation with modern machine learning techniques opens new ways to discover such objects.
Aims. We apply active learning classification methods supported by deep convolutional neural networks to automatically identify complex emission-line shapes in multi-million spectra archives.
Methods. We used the pool-based uncertainty sampling active learning method driven by a custom-designed deep convolutional neural network with 12 layers. The architecture of the network was inspired by VGGNet, AlexNet, and ZFNet, but it was adapted for operating on one-dimensional feature vectors. The unlabelled pool set is represented by 4.1 million spectra from the LAMOST data release 2 survey. The initial training of the network was performed on a labelled set of about 13 000 spectra obtained in the 400 Å wide region around Hα by the 2 m Perek telescope of the Ondřejov observatory, which mostly contains spectra of Be and related early-type stars. The differences between the Ondřejov intermediate-resolution and the LAMOST low-resolution spectrographs were compensated for by Gaussian blurring and wavelength conversion.
Results. After several iterations, the network was able to successfully identify emission-line stars with an error smaller than 6.5%. Using the technology of the Virtual Observatory to visualise the results, we discovered 1 013 spectra of 948 new candidates of emission-line objects in addition to 664 spectra of 549 objects that are listed in SIMBAD and 2 644 spectra of 2 291 objects identified in an earlier paper of a Chinese group led by Wen Hou. The most interesting objects with unusual spectral properties are discussed in detail.
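The selection step of pool-based uncertainty sampling mentioned in the Methods can be sketched as follows. The abstract does not state which uncertainty measure was used, so ranking by predictive entropy is an assumption here, and the function names and toy class probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a vector of predicted class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, k):
    """Indices of the k pool items with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy pool: per-spectrum class probabilities from a hypothetical classifier.
pool_probs = [[0.98, 0.02], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]
selected = select_most_uncertain(pool_probs, 2)
```

In an active learning loop, the items in `selected` would be passed to the human expert for labelling, added to the training set, and the network retrained before the next selection round.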
VO-supported Active Deep Learning as a New Methodology for the Discovery of Objects of Interest in Big Surveys
Authors
Škoda, P.; Podsztavek, O.; Tvrdík, P.
Year
2020
Published in
Astronomical Data Analysis Software and Systems XXIX. San Francisco: Astronomical Society of the Pacific, 2020. p. 163-166. Astronomical Society of the Pacific Conference Series. vol. 527. ISBN 978-1-58381-941-8.
Type
Proceedings paper
Department
Abstract
Deep neural networks have been proved a very successful method of supervised learning in several research fields. To perform well, they require a massive amount of labelled data, which is challenging to get from most astronomical surveys. To overcome this limitation, we have developed a novel active deep learning method. It is based on an iterative training of a deep network followed by relabelling of a small sample according to a qualified decision of an oracle (usually a human expert). To maximise the scientific return, the oracle brings to the decision the domain knowledge not limited only to the data learned by the network. By combining some external resources to extract the key information by an expert in a field, much more relevant labels are assigned. Setup of an active deep learning platform thus requires incorporation of a Virtual Observatory (VO) client infrastructure as an integral part of a machine learning experiment, which is quite different from current practices. As proof of concept, we demonstrate the efficiency of our method for discovery of new emission-line stars in a multimillion spectra archive of the LAMOST DR2 survey.
Anomaly Detection in Open Data on Particulate Matter Air Pollution [Detekce anomálií v otevřených datech o znečištění ovzduší polétavým prachem]
Authors
Year
2019
Published in
DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. p. 66-71. ISBN 978-80-553-3354-0.
Type
Proceedings paper
Department
Abstract
The sensor network of public lighting at Karlínské náměstí in Prague provides measurements of PM10 particulate matter air pollution as open data. In this work, we detect anomalies in these data using machine learning algorithms for time series prediction and thresholding. We want the machine learning algorithm to learn the regularities in the data so that, when something unexpected happens, thresholding reveals it. We experimented with linear regression and an LSTM recurrent neural network, which we compared using the mean squared error. Linear regression predicting from the last two measurements turned out to achieve better results. We detected anomalies from the differences between predicted and actual values. We computed the anomaly detection threshold from the histogram of differences between predictions and actually measured values. Testing showed that the proposed method can reveal some anomalies in PM10 particulate matter measurements, but it fails to detect many anomalies (for example, gradually building ones).
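The general idea of predicting from the last two measurements and thresholding the prediction error can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the regression has no intercept, the threshold is the mean plus three standard deviations of the training errors (the paper derives its threshold from a histogram of differences), the series is a synthetic sinusoid rather than PM10 data, and all names are hypothetical.

```python
import math
import statistics


def fit_two_lag_regression(series):
    """Least-squares fit of y[t] ~ a*y[t-1] + b*y[t-2] (no intercept: an assumption)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(series)):
        x1, x2, y = series[t - 1], series[t - 2], series[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


def abs_residuals(series, a, b):
    """Absolute one-step prediction errors for t = 2 .. len(series) - 1."""
    return [abs(series[t] - (a * series[t - 1] + b * series[t - 2]))
            for t in range(2, len(series))]


def detect_anomalies(series, a, b, threshold):
    """Time indices whose absolute prediction error exceeds the threshold."""
    errors = abs_residuals(series, a, b)
    return [t for t, e in enumerate(errors, start=2) if e > threshold]


# Synthetic "regular" signal with a 24-step period (not real PM10 data).
train = [20.0 + 5.0 * math.sin(2.0 * math.pi * t / 24.0) for t in range(240)]
a, b = fit_two_lag_regression(train)
train_errors = abs_residuals(train, a, b)
# Assumed threshold rule: mean + 3 standard deviations of training errors.
threshold = statistics.mean(train_errors) + 3.0 * statistics.stdev(train_errors)

# Inject a sudden spike and look for it.
observed = list(train)
observed[100] += 30.0
anomalies = detect_anomalies(observed, a, b, threshold)
```

A sudden spike like the one injected here produces a large prediction error and is flagged. A slowly drifting anomaly changes the lagged inputs along with the target, which is consistent with the abstract's remark that gradually building anomalies tend to be missed.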
Comparing Offline and Online Evaluation Results of Recommender Systems
Authors
Kordík, P.; Řehořek, T.; Bíža, O.; Bartyzal, R.; Podsztavek, O.; Povalyev, I.P.
Year
2018
Published in
REVEAL RecSyS 2018 workshop proceedings. New York: ACM, 2018.
Type
Proceedings paper
Abstract
Recommender systems are usually trained and evaluated on historical data. Offline evaluation is, however, tricky and offline performance can be an inaccurate predictor of the online performance measured in production due to several reasons. In this paper, we experiment with two offline evaluation strategies and show that even a reasonable and popular strategy can produce results that are not just biased, but also in direct conflict with the true performance obtained in the online evaluation. We investigate offline policy evaluation techniques adapted from reinforcement learning and explain why such techniques fail to produce an unbiased estimate of the online performance in the “watch next” scenario of a large-scale movie recommender system. Finally, we introduce a new evaluation technique based on Jaccard Index and show that it correlates with the online performance.
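For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The abstract does not say which sets the evaluation technique compares, so the recommended and watched item lists below are purely hypothetical, as is the function name.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty sets count as identical
    return len(set_a & set_b) / len(union)


# Hypothetical example: items recommended vs. items actually watched.
recommended = ["m1", "m2", "m3"]
watched = ["m2", "m3", "m4"]
score = jaccard_index(recommended, watched)
```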