HCX Redeployment Job Stuck
I believe this problem has been resolved in later releases, such as HCX 4.5. If you have tried to resync the service mesh in HCX 4.3.3 and been left with a half-redeployed Network Extension (NE) appliance and a stuck task, I feel your pain. I have experienced this on a couple of occasions and have a few fixes that should help. In my case it happened because the underlying network had routing issues while the network team was implementing a new uplink network.
Firstly, please log a call with VMware support, as you may be experiencing a different issue from mine. Meanwhile, see if you can redeploy the NE appliance using the /force option. If you see a message saying "Another workflow is in progress Job_ID", it's not going to work because of the stuck job.
VMware support diagnosed this as a job stuck in the PostgreSQL database and had to run a script to remove the stale entry. After that, I was able to carry on redeploying my appliances.
One potential fix is to restart the application service in the appliance management portal on port 9443. You need to stop and start it at both ends, i.e. on the HCX Manager at each site.
Another option that worked for me was to edit the service mesh and remove the Network Extension service. This effectively removed the appliances and allowed me to edit the service mesh again and add the NE service back. This time round the appliances were redeployed. The old task will probably still be stuck, but that doesn't stop you from getting the appliances reinstalled. Please note, if you have any stretched VLANs on the Network Extension appliances, you should make a note of them and remove them first; otherwise, the attempt to remove the appliances from vSphere will fail. Once the service mesh was updated with the NE service again, I was able to restretch the VLANs on the new appliances.
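If you have more than a handful of stretched VLANs, it can help to keep that note in a file rather than in your head. Below is a minimal Python sketch of how I would do it. It does not talk to HCX at all: you type in the extensions yourself, and the field names (network, vlan, gateway), the helper names and the file name are my own assumptions for illustration, not anything HCX provides.

```python
import json
from pathlib import Path


def save_extensions(extensions, path):
    """Write the hand-noted network extensions to a JSON file.

    Each entry is a dict with the hypothetical keys
    'network', 'vlan' and 'gateway'.
    """
    ordered = sorted(extensions, key=lambda e: e["vlan"])
    Path(path).write_text(json.dumps(ordered, indent=2))


def missing_extensions(path, current):
    """Return the recorded extensions that are not in `current`.

    `current` is the list you note down after restretching,
    in the same format as the recorded entries.
    """
    recorded = json.loads(Path(path).read_text())
    current_keys = {(e["network"], e["vlan"]) for e in current}
    return [e for e in recorded
            if (e["network"], e["vlan"]) not in current_keys]
```

Before removing the NE service, call save_extensions() with the list you noted from the HCX UI (for example to a file called extensions.json). After the appliances are redeployed and you have restretched, call missing_extensions() with what is now extended; an empty list means nothing was forgotten.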
This is a known issue and has been resolved in the latest release. Please use my suggestions with caution, as these are simply the methods that worked for me. You should get official guidance from VMware before you make any major changes to your environment, especially if you have production workloads running there.