Abstract:
Time-series reconstruction is a crucial data processing step in time-domain astronomy and serves as the foundation for fitting light curves and conducting time-domain analysis. For many wide-field time-domain surveys, this computation must be completed within a single exposure cycle. With the rapid growth of astronomical data, existing processing methods struggle to meet the accuracy and efficiency requirements of time-series reconstruction simultaneously. Spark, a general-purpose in-memory distributed computing framework, has the potential to improve the efficiency of this process, but applying it directly runs into difficulties. MapReduce-style distributed models such as Hadoop and Spark require that tasks on different cluster nodes be relatively independent and that little data be transferred between nodes during execution; otherwise, frequent communication becomes the efficiency bottleneck. Because of the boundary problem in cross-matching, however, newly added data at block boundaries must inevitably be transmitted between nodes, which severely restricts concurrency and reduces the speedup achieved in practical parallel applications. We therefore propose a non-blocking asynchronous execution flow in which each distributed process works continuously and exclusively on its own independent sky region. Additional identification tasks arising from newly added celestial objects at the block edges of other nodes are appended later in batches, at times determined by the progress of each process. This ensures that no identification calculation is omitted, improving concurrent efficiency while maintaining the accuracy of the algorithm. In addition, we study different strategies for joining two tables from both theoretical and experimental perspectives, and we propose a join-free strategy.
Finally, we design an efficient time-series reconstruction system based on the Spark distributed framework to validate this research. Experimental results demonstrate that the proposed time-series reconstruction algorithm is significantly more efficient than those of previous studies, laying a solid foundation for the analysis of astronomical time-series data in time-domain astronomy.
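
To illustrate the boundary problem and the deferred batch appending, the following minimal Python sketch may help. It is not the system described here: it runs in a single process without Spark, treats the sky as a one-dimensional coordinate in degrees, and ignores spherical geometry and wrap-around. The names `Block`, `process_exposure`, `BLOCK_WIDTH`, and `MATCH_RADIUS` are hypothetical, and the numeric values are chosen only to make the example readable.

```python
from dataclasses import dataclass, field

BLOCK_WIDTH = 10.0   # block size in degrees (illustrative assumption)
MATCH_RADIUS = 0.5   # matching tolerance in degrees (illustrative assumption)


@dataclass
class Block:
    """One independent sky region, owned by one worker (hypothetical type)."""
    index: int
    catalog: list = field(default_factory=list)  # positions of known objects
    pending: list = field(default_factory=list)  # edge objects deferred by neighbours


def process_exposure(block, detections, blocks):
    """Cross-match one exposure for one block; return newly added positions."""
    # Absorb deferred edge objects in one batch, at a point this block chooses,
    # so neighbouring blocks never wait on each other.
    block.catalog.extend(block.pending)
    block.pending.clear()

    added = []
    lo = block.index * BLOCK_WIDTH
    hi = lo + BLOCK_WIDTH
    for pos in detections:
        if any(abs(pos - known) <= MATCH_RADIUS for known in block.catalog):
            continue  # matches a known object, so nothing new to record
        block.catalog.append(pos)
        added.append(pos)
        # A new object near an edge may also appear in the adjacent block's
        # detections, so queue it there instead of sending it synchronously.
        if pos - lo <= MATCH_RADIUS and block.index - 1 in blocks:
            blocks[block.index - 1].pending.append(pos)
        if hi - pos <= MATCH_RADIUS and block.index + 1 in blocks:
            blocks[block.index + 1].pending.append(pos)
    return added


# Two adjacent blocks covering [0, 10) and [10, 20).
blocks = {i: Block(i) for i in range(2)}
first = process_exposure(blocks[0], [9.9], blocks)    # new object near the shared edge
second = process_exposure(blocks[1], [10.1], blocks)  # same source seen across the edge
```

In this toy run, block 0 records the object at 9.9 and queues it for block 1. When block 1 next processes an exposure, it first absorbs the queued object, so the detection at 10.1 is matched rather than added as a duplicate. In a real deployment the queue would sit between distributed workers rather than inside one process.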