Abstract: Aquaculture increasingly demands continuous, low-disturbance, and automated monitoring of fish growth status to reduce feeding costs, assess health and yield, and support modern intelligent farming. However, in real production settings, growth-state assessment still relies largely on prolonged manual observation, which is labor-intensive, subjective, and difficult to standardize across operators and over time, and which therefore limits the continuous, automated acquisition of fish growth morphometric indicators. Computer-vision-based pose estimation, which locates keypoints on the fish body, is a crucial method for obtaining morphometric indicators (e.g., total length, standard length, and caudal peduncle width). In practical underwater farming, visual cues are frequently corrupted by turbidity, low illumination, specular reflections, water-surface flicker, and high stocking density, while overlapping individuals and fast maneuvers further induce partial occlusion and motion blur. These factors make local textures and edges ambiguous, destabilize keypoint localization, and cause errors to accumulate when morphometrics are computed from noisy keypoint pairs. Therefore, a pipeline for estimating growth morphometric indicators must be both robust to underwater degradations and efficient enough for real-time deployment. To meet these requirements, a vision-based method for estimating fish morphometric indicators in aquaculture is developed, built on a Vision Mamba-based robust underwater fish pose estimation framework that outputs reliable keypoint geometry as the basis for deriving growth morphometric indicators.
First, a state-space-model feature pyramid backbone is introduced. It aggregates global context at near-linear complexity while preserving high-resolution representations, and it fuses multi-scale features to capture both the global body layout and local keypoint details, thereby improving keypoint separability in cluttered underwater backgrounds. Compared with purely local operators, the backbone explicitly strengthens long-range dependency modeling along the fish body axis, which is essential for slender, deformable objects whose distant parts must remain geometrically consistent. Multi-scale fusion also improves resilience to scale changes caused by depth variation and camera placement, allowing keypoints to be represented in a unified feature hierarchy. Second, a pose-conditioned deformable feature fusion module is developed. To explicitly guide feature alignment, coarse heatmaps and an uncertainty map are generated, and these pose cues are used as priors to predict anisotropic sampling offsets and modulation factors for deformable resampling. This design strengthens fine-grained feature extraction along the body axis and stabilizes subpixel-level localization, yielding more discriminative features for the subsequent prediction heads. The uncertainty cue provides an adaptive mechanism that down-weights unreliable regions and reallocates sampling capacity to informative neighborhoods, thereby reducing heatmap peak ambiguity and improving keypoint repeatability across frames. The anisotropic sampling pattern is tailored to elongated morphology, enabling more faithful aggregation of features along the principal body direction under bending and posture changes. Third, a joint loss is formulated by integrating heatmap supervision with object keypoint similarity (OKS)-guided coordinate regression.
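For readers unfamiliar with OKS, the quantity named in the joint loss can be illustrated with the standard COCO-style definition, in which each labeled keypoint contributes exp(-d^2 / (2 s^2 k^2)), where d is the pixel error, s^2 is the object area, and k is a per-keypoint constant. The sketch below is a minimal illustration under stated assumptions, not the loss used in this work: the function names `oks` and `oks_loss`, the per-keypoint constants, and the choice of 1 - OKS as the regression term are all hypothetical.

```python
import math


def oks(pred, gt, visible, area, k):
    """COCO-style object keypoint similarity for a single fish instance.

    pred, gt : sequences of (x, y) pixel coordinates, one per keypoint
    visible  : sequence of booleans, True where the keypoint is labeled
    area     : object area in square pixels (plays the role of s^2), must be > 0
    k        : per-keypoint constants (hypothetical values, each > 0)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visible, k):
        if not v:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
        count += 1
    return total / count if count else 0.0


def oks_loss(pred, gt, visible, area, k):
    """One possible OKS-guided regression term (an assumption): 1 - OKS."""
    return 1.0 - oks(pred, gt, visible, area, k)


if __name__ == "__main__":
    # Eleven keypoints laid out along a synthetic body axis (illustrative data).
    gt = [(10.0 * i, 5.0) for i in range(11)]
    pred = [(x + 3.0, y + 3.0) for x, y in gt]
    visible = [True] * 11
    k = [0.1] * 11  # placeholder constants, not taken from the paper
    print(oks(pred, gt, visible, area=5000.0, k=k))
    print(oks_loss(pred, gt, visible, area=5000.0, k=k))
```

Because the error is normalized by object area, the same pixel offset is penalized more on a small fish than on a large one, which is the scale-normalized behavior that pixel-wise heatmap losses do not capture on their own.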
Accordingly, OKS guidance is incorporated during training to better align optimization with evaluation, and a lightweight, equivariance-inspired (shape-deformation-adaptive) consistency constraint under small geometric perturbations is imposed to enhance robustness to shape deformations. This objective is designed to alleviate the common train-test mismatch in keypoint learning, where pixel-wise heatmap errors may not fully reflect geometric accuracy under scale-normalized evaluation. The consistency regularization further encourages stable predictions under mild rotations and scaling, which are frequent in real underwater environments because fish turn and change depth. To support quantitative morphometric indicator estimation, eleven anatomically meaningful keypoints are defined, covering the head, trunk, caudal peduncle, and caudal fin, and the corresponding keypoint connections are mapped directly to interpretable morphometric metrics. This formulation yields stable geometric correspondences between pose outputs and traditional morphometrics, enabling pixel-distance computation followed by optional conversion to physical size when a calibrated scale factor is available. The selected morphometrics include both globally stable indicators (e.g., total length and standard length) and locally sensitive indicators (e.g., head length and caudal peduncle width) that can reflect condition changes, developmental differences, or potential abnormalities. To enhance field usability, confidence-aware quality control is applied to suppress low-confidence keypoints, and temporal smoothing is employed to reduce transient jitter, so that downstream morphometric indicators change in a physically plausible way and are suitable for continuous monitoring.
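The mapping from keypoints to morphometric indicators described above can be sketched in a few lines. The example is illustrative only and rests on several assumptions that are not given in the text: the keypoint names, the keypoint pairs behind each metric, the confidence threshold of 0.5, and the use of an exponential moving average as the temporal smoother are all hypothetical choices.

```python
import math

# Hypothetical keypoint pairs; the eleven keypoints are not enumerated in the text.
METRIC_PAIRS = {
    "total_length": ("snout", "tail_tip"),
    "standard_length": ("snout", "caudal_base"),
    "caudal_peduncle_width": ("peduncle_dorsal", "peduncle_ventral"),
}


def measure(keypoints, min_conf=0.5, mm_per_px=None):
    """Compute morphometrics from one frame.

    keypoints : dict mapping name -> (x, y, confidence)
    min_conf  : confidence gate (assumed value); a metric is dropped when
                either endpoint falls below it
    mm_per_px : optional calibrated scale factor; pixels are returned if None
    """
    out = {}
    for metric, (a, b) in METRIC_PAIRS.items():
        ka, kb = keypoints.get(a), keypoints.get(b)
        if ka is None or kb is None or min(ka[2], kb[2]) < min_conf:
            out[metric] = None  # suppressed by quality control
            continue
        dist = math.hypot(ka[0] - kb[0], ka[1] - kb[1])
        out[metric] = dist * mm_per_px if mm_per_px is not None else dist
    return out


class EmaSmoother:
    """Exponential moving average per metric (one possible smoothing choice)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.state = {}

    def update(self, metrics):
        for name, value in metrics.items():
            if value is None:
                continue  # keep the previous estimate when a frame is rejected
            prev = self.state.get(name)
            if prev is None:
                self.state[name] = value
            else:
                self.state[name] = self.alpha * value + (1 - self.alpha) * prev
        return dict(self.state)


if __name__ == "__main__":
    frame = {
        "snout": (0.0, 0.0, 0.9),
        "tail_tip": (300.0, 400.0, 0.9),
        "caudal_base": (240.0, 320.0, 0.8),
        "peduncle_dorsal": (230.0, 300.0, 0.2),  # low confidence
        "peduncle_ventral": (230.0, 340.0, 0.9),
    }
    smoother = EmaSmoother(alpha=0.5)
    print(smoother.update(measure(frame, mm_per_px=0.5)))
```

Gating on the weaker of the two endpoints means that a single unreliable keypoint removes only the metrics that depend on it, so a partially occluded fish can still contribute the measurements that remain trustworthy.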
Comprehensive experiments are conducted on FishPose, a large self-constructed dataset of 13,468 images in which five common underwater challenges (orientation rotation, shape deformation, motion blur, individual occlusion, and image degradation) are explicitly covered to enable robustness evaluation. The results demonstrate that a pose-estimation accuracy of 74.5% is achieved and that the error of growth morphometric indicator estimation is reduced to 3.5%; overall, the method outperforms state-of-the-art pose estimation methods while satisfying real-time requirements. In conclusion, this work presents a vision-based method for estimating fish growth morphometric indicators in aquaculture, enabled by robust underwater fish pose estimation and reliable keypoint geometry extraction. By enabling high-precision, stable extraction of pose-derived morphometrics in underwater environments, the proposed method can serve as a core visual perception module in modern intelligent aquaculture monitoring systems, offering reliable quantitative inputs for growth assessment, abnormality screening, and management decisions. Collectively, the proposed method is expected to advance aquaculture monitoring and management, supporting modernized, sustainable fisheries production.