Keywords:
Dependent functional data
Mean test
Minimum sample size
Chi-squared distribution
Abstract:
Advances in science and technology have made it possible to collect and store functional data. High-frequency stock data from financial markets, temperature data from meteorology, and airborne PM2.5 data are all naturally functional. Such observations are also dependent on one another and no longer satisfy the independent and identically distributed assumption, so they are referred to as dependent functional data. When functional data are dependent, the sample covariance function is no longer a consistent estimator of the population covariance function, so the functional principal components are computed inaccurately, which in turn affects subsequent statistical inference. This paper uses the long-run covariance function to obtain more accurate functional principal components, proves that the test statistic converges to a chi-squared distribution, and proposes a measure of effect size from which the minimum sample size can be calculated. Finally, simulation studies and an application to the Air Quality Index (AQI) and concentration data for six major air pollutants (PM2.5, PM10, SO2, NO2, O3, and CO) demonstrate the effectiveness of the method.