Selected filter: Chenzhou Cui
  • Photometric redshift estimation of galaxies in the DESI Legacy Imaging Surveys

    Category: Astronomy >> Astronomy    Submitted: 2023-02-19

    Abstract: The accurate estimation of photometric redshifts plays a crucial role in accomplishing the science objectives of large survey projects. Template fitting and machine learning are the two main types of methods currently applied. Based on the training set obtained by cross-correlating the DESI Legacy Imaging Surveys DR9 galaxy catalogue and the SDSS DR16 galaxy catalogue, both kinds of methods are applied and optimized: EAZY for the template-fitting approach and CATBOOST for machine learning. The created models are then tested with the cross-matched samples of the DESI Legacy Imaging Surveys DR9 galaxy catalogue and the LAMOST DR7, GAMA DR3 and WiggleZ galaxy catalogues. Moreover, three machine learning methods (CATBOOST, Multi-Layer Perceptron and Random Forest) are compared, and CATBOOST proves superior for our case. Through feature selection and optimization of model parameters, CATBOOST obtains higher accuracy with optical and infrared photometric information; the best performance ($MSE=0.0032$, $\sigma_{NMAD}=0.0156$ and $O=0.88$ per cent) is achieved with $g \le 24.0$, $r \le 23.4$ and $z \le 22.5$. However, EAZY provides more accurate photometric redshift estimation for high-redshift galaxies, especially beyond the redshift range of the training sample. Finally, we estimate the redshifts of all DESI DR9 galaxies with CATBOOST and EAZY, which will contribute to further study of galaxies and their properties.
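
    A hedged illustration of the kind of pipeline this abstract describes: a minimal CatBoost regression sketch for photometric redshifts together with the commonly used $\sigma_{NMAD}$ and outlier-fraction statistics. The column names, file names and hyperparameters below are assumptions for illustration only, not the authors' actual configuration.

```python
# Minimal sketch of a CatBoost photo-z regression and the sigma_NMAD metric.
# Column names, file names and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor

def sigma_nmad(z_spec, z_phot):
    """Normalized median absolute deviation of dz = (z_phot - z_spec) / (1 + z_spec)."""
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    return 1.48 * np.median(np.abs(dz - np.median(dz)))

def outlier_fraction(z_spec, z_phot, cut=0.15):
    """Fraction of sources with |z_phot - z_spec| / (1 + z_spec) above the cut."""
    dz = np.abs(z_phot - z_spec) / (1.0 + z_spec)
    return np.mean(dz > cut)

# Assumed training table from a cross-match of photometry with spectroscopic redshifts.
train = pd.read_csv("desi_sdss_crossmatch.csv")             # hypothetical file
features = ["g_mag", "r_mag", "z_mag", "w1_mag", "w2_mag"]  # optical + infrared
model = CatBoostRegressor(iterations=1000, learning_rate=0.05, depth=8,
                          loss_function="RMSE", verbose=False)
model.fit(train[features], train["z_spec"])

# Assumed external test set, e.g. a cross-match with another spectroscopic survey.
test = pd.read_csv("external_test_sample.csv")              # hypothetical file
z_phot = model.predict(test[features])
print("sigma_NMAD:", sigma_nmad(test["z_spec"].values, z_phot))
print("outlier fraction:", outlier_fraction(test["z_spec"].values, z_phot))
```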

  • HLC2: a highly efficient cross-matching framework for large astronomical catalogues on heterogeneous computing environments

    Category: Astronomy >> Astronomy    Submitted: 2023-02-19

    Abstract: The cross-matching operation, which finds corresponding data for the same celestial object or region across multiple catalogues, is indispensable to astronomical data analysis and research. Due to the large number of astronomical catalogues generated by the ongoing and next-generation large-scale sky surveys, the time cost of cross-matching is increasing dramatically. Heterogeneous computing environments provide a theoretical possibility to accelerate cross-matching, but the performance advantages of heterogeneous computing resources have not been fully utilized. To meet the challenge of cross-matching the substantially increasing amount of astronomical observation data, this paper proposes the Heterogeneous-computing-enabled Large Catalogue Cross-matcher (HLC2), a high-performance cross-matching framework based on spherical position deviation on CPU-GPU heterogeneous computing platforms. It supports scalable and flexible cross-matching and can be directly applied to the fusion of large astronomical catalogues from survey missions and astronomical data centres. A performance estimation model is proposed to locate the performance bottlenecks and guide the optimizations. A two-level partitioning strategy is designed to generate an optimized data placement according to the positions of celestial objects and thus increase throughput. To make HLC2 a more adaptive solution, architecture-aware task splitting, thread parallelization, and concurrent scheduling strategies are designed and integrated. Moreover, a novel quad-direction strategy is proposed for the boundary problem to effectively balance performance and completeness. We have experimentally evaluated HLC2 using publicly released catalogue data. Experiments demonstrate that HLC2 scales well on catalogues of different sizes and that its cross-matching speed is significantly improved compared to state-of-the-art cross-matchers.
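
    As a rough, hedged illustration of the core operation HLC2 accelerates, the sketch below performs position-based cross-matching with a simple declination-band pruning step. It is conceptually related to, but far simpler than, the two-level partitioning and CPU-GPU scheduling the paper describes; all names and the match radius are assumptions.

```python
# Sketch of catalogue cross-matching by angular separation, pruned with
# declination bands. Illustrative only; it does not reproduce HLC2's
# two-level partitioning or heterogeneous scheduling.
import numpy as np

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    d = np.sin((dec2 - dec1) / 2) ** 2 + \
        np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2
    return np.degrees(2 * np.arcsin(np.sqrt(d)))

def crossmatch(cat_a, cat_b, radius_deg=1.0 / 3600, band_deg=0.1):
    """Match each source in cat_a to its nearest cat_b source within radius_deg,
    restricting candidates to a narrow declination band around the source."""
    order = np.argsort(cat_b["dec"])
    dec_sorted = cat_b["dec"][order]
    matches = []
    for i, (ra, dec) in enumerate(zip(cat_a["ra"], cat_a["dec"])):
        lo = np.searchsorted(dec_sorted, dec - band_deg)
        hi = np.searchsorted(dec_sorted, dec + band_deg)
        cand = order[lo:hi]
        if cand.size == 0:
            continue
        sep = angular_sep_deg(ra, dec, cat_b["ra"][cand], cat_b["dec"][cand])
        j = int(np.argmin(sep))
        if sep[j] <= radius_deg:
            matches.append((i, int(cand[j]), float(sep[j])))
    return matches

# Usage with small synthetic catalogues (dictionaries of NumPy arrays):
cat_a = {"ra": np.array([10.0, 20.0]), "dec": np.array([-5.0, 30.0])}
cat_b = {"ra": np.array([10.0001, 50.0]), "dec": np.array([-5.0001, 30.0])}
print(crossmatch(cat_a, cat_b))
```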

  • Photometric Redshift Estimation of BASS DR3 Quasars by Machine Learning

    Category: Astronomy >> Astronomy    Submitted: 2023-02-19

    Abstract: By correlating the BASS DR3 catalogue with the ALLWISE database, optical and infrared information is obtained. Quasars from SDSS are taken as the training and test samples, while those from LAMOST serve as an external test sample. We propose two schemes to construct redshift estimation models with XGBoost, CatBoost and Random Forest. One scheme (the one-step model) predicts photometric redshifts directly with the optimal models created by these three algorithms; the other (the two-step model) first classifies the data into low- and high-redshift datasets and then predicts photometric redshifts of the two datasets separately. For the one-step model, the performance of the three algorithms on photometric redshift estimation is compared with different training samples, and CatBoost is superior to XGBoost and Random Forest. For the two-step model, the performance of the three algorithms on the classification of low- and high-redshift subsamples is compared, and CatBoost still shows the best performance. Therefore, CatBoost is adopted as the core algorithm for classification and regression in the two-step model. Compared with the one-step model, the two-step model is better at predicting photometric redshifts of quasars, especially high-redshift quasars. Finally, the two models are applied to predict photometric redshifts of all quasar candidates of BASS DR3. The two-step model yields 3938 high-redshift quasar candidates with redshift $\ge 3.5$ and 121 with redshift $\ge 4.5$. The predicted results will be helpful for quasar research and follow-up observation of high-redshift quasars.
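
    To make the two-step scheme concrete, here is a minimal sketch under assumed column names and an assumed redshift boundary: a CatBoost classifier first separates low- and high-redshift quasars, then a separate CatBoost regressor per subset predicts the redshift. None of the specific values below are claimed to be the paper's actual choices.

```python
# Minimal sketch of the two-step scheme: classify low/high redshift first,
# then regress within each class. Boundary, files and columns are assumptions.
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier, CatBoostRegressor

Z_SPLIT = 2.2   # assumed low/high-redshift boundary, for illustration only
FEATURES = ["g_mag", "r_mag", "z_mag", "w1_mag", "w2_mag"]

train = pd.read_csv("sdss_quasar_train.csv")          # hypothetical training table
train["is_high_z"] = (train["z_spec"] >= Z_SPLIT).astype(int)

# Step 1: low- vs high-redshift classifier.
clf = CatBoostClassifier(iterations=500, verbose=False)
clf.fit(train[FEATURES], train["is_high_z"])

# Step 2: one regressor per redshift regime.
regressors = {}
for label, subset in train.groupby("is_high_z"):
    reg = CatBoostRegressor(iterations=500, verbose=False)
    reg.fit(subset[FEATURES], subset["z_spec"])
    regressors[label] = reg

# Prediction: route each candidate through the classifier, then take the
# estimate from the matching regressor.
cand = pd.read_csv("bass_dr3_quasar_candidates.csv")  # hypothetical candidate table
labels = clf.predict(cand[FEATURES]).ravel().astype(int)
z_low = regressors[0].predict(cand[FEATURES])
z_high = regressors[1].predict(cand[FEATURES])
cand["z_phot"] = np.where(labels == 1, z_high, z_low)
```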

  • Identification of BASS DR3 Sources as Stars, Galaxies and Quasars by XGBoost

    Category: Astronomy >> Astronomy    Submitted: 2023-02-19

    Abstract: The Beijing-Arizona Sky Survey (BASS) Data Release 3 (DR3) catalogue was released in 2019. It contains the data from all BASS and Mosaic z-band Legacy Survey (MzLS) observations between 2015 January and 2019 March, about 200 million sources. We cross-match BASS DR3 with spectral databases from the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-object Fiber Spectroscopic Telescope (LAMOST) to obtain the spectroscopic classes of known samples. The samples are then cross-matched with the ALLWISE database. Based on the optical and infrared information of the samples, we use the XGBoost algorithm to construct different classifiers, including binary and multiclass classifiers. The accuracy of these classifiers with the best input pattern exceeds 90.0 per cent. Finally, all selected sources in the BASS DR3 catalogue are classified by these classifiers, which assign a classification label and probabilities to each source. When the predicted results of the binary classification agree with those of the multiclass classification using optical and infrared information, the numbers of star, galaxy and quasar candidates are 12 375 838 ($P_S>0.95$), 18 606 073 ($P_G>0.95$) and 798 928 ($P_Q>0.95$), respectively. For sources without infrared information, the predicted results can serve as a reference. These candidates may be taken as input catalogues for follow-up observation by LAMOST, DESI or other projects. The classification results will be a helpful reference for future research on the BASS DR3 sources.
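
    A hedged sketch of the kind of multiclass classification and probability thresholding described above, using the scikit-learn interface of xgboost. The three class labels and the 0.95 probability cut follow the abstract; the column names, file names and hyperparameters are assumptions.

```python
# Sketch of star/galaxy/quasar classification with XGBoost plus a probability cut.
# Column names, files and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

FEATURES = ["g_mag", "r_mag", "z_mag", "w1_mag", "w2_mag"]   # optical + infrared
CLASSES = np.array(["STAR", "GALAXY", "QSO"])

known = pd.read_csv("bass_known_spectral_classes.csv")       # hypothetical labelled sample
y = known["class"].map({"STAR": 0, "GALAXY": 1, "QSO": 2})

clf = XGBClassifier(n_estimators=500, max_depth=7, learning_rate=0.1,
                    objective="multi:softprob", eval_metric="mlogloss")
clf.fit(known[FEATURES], y)

targets = pd.read_csv("bass_dr3_sources.csv")                # hypothetical full catalogue
proba = clf.predict_proba(targets[FEATURES])                 # shape (n_sources, 3)
targets["label"] = CLASSES[np.argmax(proba, axis=1)]
targets["prob"] = proba.max(axis=1)

# Keep only confident candidates, mirroring the P > 0.95 cut quoted in the abstract.
confident = targets[targets["prob"] > 0.95]
print(confident["label"].value_counts())
```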

  • Update of the China-VO AstroCloud

    Category: Astronomy >> Astronomy    Submitted: 2016-11-16

    Abstract: As the cyber-infrastructure for astronomical research from the Chinese Virtual Observatory (China-VO) project, AstroCloud has achieved solid progress during the last year. The proposal management system and data access system have been redesigned. Several new sub-systems have been developed, including China-VO PaperData, AstroCloud Statics and the Public channel. More data sets and application environments have been integrated into the platform. LAMOST DR1, the largest astronomical spectrum archive, was released to the public through the platform. The latest progress is introduced.

  • The LAMOST Data Archive and Data Release

    Category: Astronomy >> Astronomy    Submitted: 2016-11-16

    Abstract: The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) is the largest optical telescope in China. In the last four years, the LAMOST telescope has published four editions of data (the pilot data release, data release 1, data release 2 and data release 3). To archive and release these data (raw data, catalogues, spectra, etc.), we have set up a data cycle management system covering data transfer, archiving and backup. Through the evolution of four software versions, a mature data release system has been established.

  • Efficient Catalog Matching with Dropout Detection

    Category: Astronomy >> Astronomy    Submitted: 2016-11-15

    Abstract: Source catalogs are not the only products extracted from astronomical observations: their sky coverage is also carefully recorded and used in statistical analyses, such as correlation and luminosity function studies. Here we present a novel method for catalog matching, which inherently builds on the coverage information for better performance and completeness. A modified version of the Zones Algorithm is introduced for matching partially overlapping observations, where irrelevant parts of the data are excluded up front for efficiency. Our design enables searches to focus on specific areas on the sky to further speed up the process. Another important advantage of the new method over traditional techniques is its ability to quickly detect dropouts, i.e., the missing components that are in the observed regions of the celestial sphere but did not reach the detection limit in some observations. These often provide invaluable insight into the spectral energy distribution of the matched sources but are rarely available in traditional associations.
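
    To illustrate the two ideas the abstract combines, here is a small sketch, under assumed inputs, of zone-based matching plus dropout flagging: sources that fall inside the other survey's footprint but find no counterpart are reported as dropouts. The footprint test is reduced to a simple declination range purely for illustration; the actual method works with real coverage information.

```python
# Sketch of Zones-style matching with dropout flagging. A declination-range
# footprint stands in for real coverage maps; everything here is illustrative.
import numpy as np

ZONE_HEIGHT = 0.05  # degrees; zone index = floor((dec + 90) / ZONE_HEIGHT)

def zone_of(dec):
    return int(np.floor((dec + 90.0) / ZONE_HEIGHT))

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    c = (np.sin(dec1) * np.sin(dec2) +
         np.cos(dec1) * np.cos(dec2) * np.cos(ra1 - ra2))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def match_with_dropouts(cat_a, cat_b, footprint_b, radius_deg=1.0 / 3600):
    """Return (matches, dropouts): matches are (i, j) index pairs; dropouts are
    indices of cat_a sources inside footprint_b that have no counterpart."""
    zones_b = np.array([zone_of(d) for d in cat_b["dec"]])
    matches, dropouts = [], []
    for i, (ra, dec) in enumerate(zip(cat_a["ra"], cat_a["dec"])):
        # Candidates come from the source's zone and its two neighbours.
        cand = np.where(np.abs(zones_b - zone_of(dec)) <= 1)[0]
        sep = angular_sep_deg(ra, dec, cat_b["ra"][cand], cat_b["dec"][cand]) \
            if cand.size else np.array([])
        if sep.size and sep.min() <= radius_deg:
            matches.append((i, int(cand[np.argmin(sep)])))
        elif footprint_b["dec_min"] <= dec <= footprint_b["dec_max"]:
            dropouts.append(i)  # covered by the other survey, yet unmatched
    return matches, dropouts
```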

  • Enhanced management of personal astronomical data with FITSManager

    Category: Astronomy >> Astronomical Instrumentation and Techniques    Submitted: 2016-05-05

    Abstract: Although the roles of data centers and computing centers are becoming more and more important, and on-line research is becoming the mainstream for astronomy, individual research based on locally hosted data is still very common. With the increase of personal storage capacity, it is easy to find hundreds to thousands of FITS files on the personal computer of an astrophysicist. Because the Flexible Image Transport System (FITS) is a professional data format initiated by astronomers and used mainly within a small community, data management toolkits for FITS files are scarce. Astronomers need a powerful tool to help them manage their local astronomical data. Although the Virtual Observatory (VO) is a network-oriented astronomical research environment, its applications and related technologies provide useful solutions to enhance the management and utilization of astronomical data hosted on an astronomer's personal computer. FITSManager is such a tool, providing astronomers with efficient management and utilization of their local data and bringing the VO to astronomers in a seamless and transparent way. FITSManager provides rich functions for FITS file management, such as thumbnails, previews, type-dependent icons, header keyword indexing and search, and collaboration with other tools and on-line services. The development of FITSManager is an effort to fill the gap between the management and analysis of astronomical data.
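
    As a small sketch of the header-indexing idea mentioned above, under assumed paths and keywords: scan a directory for FITS files with astropy and store a few header keywords in a SQLite table so they can be searched locally. This only illustrates the concept; it is not FITSManager's actual implementation or schema.

```python
# Sketch of indexing FITS header keywords into SQLite for local search.
# Keywords, paths and schema are assumptions; not FITSManager's real code.
import sqlite3
from pathlib import Path
from astropy.io import fits

KEYWORDS = ["OBJECT", "DATE-OBS", "TELESCOP", "INSTRUME", "EXPTIME"]  # assumed keys

def build_index(data_dir, db_path="fits_index.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS fits_files "
                "(path TEXT PRIMARY KEY, object TEXT, date_obs TEXT, "
                "telescop TEXT, instrume TEXT, exptime REAL)")
    for path in Path(data_dir).rglob("*.fits"):
        try:
            header = fits.getheader(path)       # primary HDU header
        except OSError:
            continue                            # skip unreadable or corrupted files
        raw = [header.get(key) for key in KEYWORDS]
        # Keep only types SQLite can store; anything else becomes NULL.
        values = [v if isinstance(v, (int, float, str)) else None for v in raw]
        con.execute("INSERT OR REPLACE INTO fits_files VALUES (?, ?, ?, ?, ?, ?)",
                    [str(path)] + values)
    con.commit()
    con.close()

def search_object(name, db_path="fits_index.db"):
    """Return (path, exptime) rows for a given OBJECT value, longest exposure first."""
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT path, exptime FROM fits_files "
                       "WHERE object = ? ORDER BY exptime DESC", (name,)).fetchall()
    con.close()
    return rows
```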

  • AstroCloud, a Cyber-Infrastructure for Astronomy Research: Overview

    Category: Astronomy >> Astronomical Instrumentation and Techniques    Submitted: 2016-04-27

    Abstract: AstroCloud is a cyber-infrastructure for astronomy research initiated by the Chinese Virtual Observatory (China-VO) under funding support from the NDRC (National Development and Reform Commission) and CAS (Chinese Academy of Sciences). Tasks such as proposal submission, proposal peer review, data archiving, data quality control, data release and open access, and cloud-based data processing and analysis will all be supported on the platform. It will act as a full lifecycle management system for astronomical data and telescopes. Achievements from international Virtual Observatories and cloud computing are heavily adopted. In this paper, the background of the project, key features of the system, and the latest progress are introduced.

  • AstroCloud, a Cyber-Infrastructure for Astronomy Research: Data Archiving and Quality Control

    Category: Astronomy >> Astronomical Instrumentation and Techniques    Submitted: 2016-04-27

    Abstract: AstroCloud is a cyber-infrastructure for astronomy research initiated by the Chinese Virtual Observatory (China-VO) under funding support from the NDRC (National Development and Reform Commission) and CAS (Chinese Academy of Sciences) (Cui et al. 2014). To archive astronomical data in China, we present the implementation of the astronomical data archiving system (ADAS). Data archiving and quality control are the infrastructure of AstroCloud. Throughout the entire data life cycle, the data archiving system standardizes data, transfers data, logs observational data, archives ambient data, and stores these data and metadata in a database. Quality control covers the whole process and all aspects of data archiving.