Keywords:
Graph Convolutional Network (GCN)
Graph Attention Network (GAT)
GAT module
Self-supervised training
Abstract:
To address the inadequate depiction of complex scene geometry in monocular self-supervised depth estimation, where accurate ground-truth depth is unavailable, this paper introduces a Graph Attention Network (GAT) mechanism into an existing monocular depth estimation framework based on Graph Convolutional Networks (GCN) and proposes the GATDepth model. By adopting graph attention modules in the decoder stage, the model adaptively assigns different weights to adjacent nodes, thereby preserving the geometric topology and discontinuities of the scene more precisely. The DepthNet encoder extracts multi-level visual features with CNNs, while the decoder combines transposed-convolution upsampling with GAT modules to fuse node features. The model is trained in a self-supervised manner with multiple losses computed between the target image and the reconstructed image, including photometric, reprojection, and smoothness terms. It achieves strong depth estimation performance on the KITTI dataset, particularly in critical regions such as distant objects and object edges. Experimental results show that the proposed method not only captures key geometric information of the scene more effectively while maintaining network efficiency, but also yields reliable and fine-grained depth predictions in the absence of high-quality ground-truth depth.