Access Gateway Unable to Check-in to Orchestrator

About

Description: After deploying AGW and Orchestrator, it is time to make AGW accessible from Orchestrator. After following github Magma AGW configuration guide, it was observed that AGW is not able to check-in to Orchestrator.

Environment: AGW and Orc8r deployed.

Affected components: AGW, Orchestrator

Resolution

Diagnose AGW and Orchestrator setup with script checkin_cli.py. If the test is not successful, the script would provide potential root cause for a problem. A successful script will look like below
```
AGW$ sudo checkin_cli.py

1. -- Testing TCP connection to controller-staging.magma.etagecom.io:443 --
2. -- Testing Certificate --
3. -- Testing SSL --
4. -- Creating direct cloud checkin --
5. -- Creating proxy cloud checkin --
Success!
```
If the output is not successful, the script will recommend some steps to resolve the problem. After following the steps the problem has not been resolved, follow below steps.

Make sure that the hostnames and ports specified in control_proxy.yml file in AGW are properly set. Sample control_proxy.yml file

cloud_address: controller.yourdomain.com
cloud_port: 443
bootstrap_address: bootstrapper-controller.yourdomain.com
bootstrap_port: 443

rootca_cert: /var/opt/magma/tmp/certs/rootCA.pem

Verify the certificate rootCA.pem is in the correct location defined in rootca_cert (specified in control_proxy.yml)
Make sure the certificates have not expired. Note: To obtain certificate information you can use openSSL x509 -in certificate -noout -text
- In AGW: rootCA.pem
- In Orc8r: rootCA.pem, controller.cert
Verify the domain is consistent across AGW and Orc8r and the CN matches with the domain
- CN in rootCA.pem AGW
- CN in Orc8r for root and controller certificates.
- The domain in main.tf
Verify connectivity between AGW and Orc8r. Choose the port and domain obtained in control_proxy.yml. You can use telnet, example below: telnet bootstrapper-controller.yourdomain.com 443
Verify the DNS resolution of the bootstrap and controller domain.
- In AGW: You can ping or telnet to your bootstrap and controller domain from AGW to verify which AWS address is being resolved.
- In Orc8r: Verify which external-IP your cluster is assigned. You can use the command: kubectl get services
The address resolved in AGW should be the same defined in Orc8r. If not, verify your DNS resolution.
Verify that there are no errors in AGW magmad service.

AGW$ sudo tail -f /var/log/syslog | grep -i "magmad"
From Orchestrator, get all pods and verify attempts from AGW are reaching Orc8r in nginx and look for any bootstrapping erors in the bootstrapper.
- First, you can use below command to get all pods from orc8r
```
kubectl -n orc8r get pods
```
  For example, boostrapper and nginx Orc8r pods should look something like below:
```
orc8r-bootstrapper-775b5b8f6d-89spq              1/1     Running   0          37d
orc8r-bootstrapper-775b5b8f6d-gfmrp              1/1     Running   0          37d
orc8r-nginx-5f599dd8d5-rz4gm                     1/1     Running   0          37d
orc8r-nginx-5f599dd8d5-sxpzf                     1/1     Running   0          37d
```
- Next, using the pod name get the logs from the pod with below command. Check if there is any problematic log for the related pod
```
kubectl -n orc8r logs -f <nginx podname>
kubectl -n orc8r logs -f <bootstrapper podname>
```
  For example: kubectl -n orc8r get logs orc8r-bootstrapper-775b5b8f6d-89spq
Try restarting magmad services.
```
AGW$ sudo service magma@magmad restart
```
If issue still persists, please file github issues or ask in our support channels https://magmacore.org/join-the-open-source-community/