Technical tidbits from the sysadmin world...
 

Azure DevOps Self-Hosted Agent in Kubernetes with Workload Identity or Application Registration

There are more than a few blog posts about how to run the Azure DevOps agent in a container (and by extension Kubernetes). There are even a handful of examples that use an Entra Application Registration or Workload Identity, but they miss a critical step: the cleanup on exit needs a new token, because the access tokens are not long-lived!

Both the examples below use the Dockerfile and start.sh script from the official docs as a base.
Example 1 - Use AKS Workload Identity to authenticate and register the build agent
- Create a managed identity and add it to Azure DevOps as a 'Basic' user. Give it "Read" permissions to all build agents and "Admin" to the pool it will use.
- Set up the standard Workload Identity config on the managed identity and deploy the required service account in AKS.
- Configure the start.sh script as below. The main difference is that the script reads the Workload Identity token file and uses it to get an access token.
- To ensure the container can clean up on exit, it needs to get a new access token for the removal step. This is missed in all the other examples I've seen. If you don't do this, the token acquired at startup has very likely expired by then, and the agent can't exit gracefully.
#!/bin/bash
          set -e
          clientAssertionType="urn%3Aietf%3Aparams%3Aoauth%3Aclient-assertion-type%3Ajwt-bearer"
          # Resource ID of Azure DevOps; the token is requested for this scope
          scope="499b84ac-1321-427f-aa17-267ca6975798/.default"
          # Federated (service account) token projected into the pod by Workload Identity
          jwt=$(cat "${AZURE_FEDERATED_TOKEN_FILE}")
          # Exchange the federated token for an Entra access token; jq -r prints the raw string without quotes
          access_token=$(curl -X POST -d "scope=${scope}&grant_type=client_credentials&client_id=${AZURE_CLIENT_ID}&client_assertion_type=${clientAssertionType}&client_assertion=${jwt}" "https://login.microsoftonline.com/${AZURE_TENANT_ID}/oauth2/v2.0/token" --http1.1 | jq -r '.access_token')
          echo -n "$access_token" > /azp/.token
        
          if [ -z "${AZP_URL}" ]; then
          echo 1>&2 "error: missing AZP_URL environment variable"
          exit 1
          fi
        
          if [ -n "${AZP_WORK}" ]; then
          mkdir -p "${AZP_WORK}"
          fi
        
          cleanup() {
          trap "" EXIT
        
          if [ -e ./config.sh ]; then
          print_header "Cleanup. Removing Azure Pipelines agent..."
          # Get a new access token; the one acquired at startup has very likely expired
          jwt_exit=$(cat "${AZURE_FEDERATED_TOKEN_FILE}")
          access_token_exit=$(curl -X POST -d "scope=${scope}&grant_type=client_credentials&client_id=${AZURE_CLIENT_ID}&client_assertion_type=${clientAssertionType}&client_assertion=${jwt_exit}" "https://login.microsoftonline.com/${AZURE_TENANT_ID}/oauth2/v2.0/token" --http1.1 | jq -r '.access_token')
          echo -n "$access_token_exit" > /azp/.token_exit
          # If the agent has some running jobs, the configuration removal process will fail.
          # So, give it some time to finish the job.
          while true; do
          ./config.sh remove --unattended --auth "PAT" --token "$(cat "/azp/.token_exit")" && break
        
                echo "Retrying in 30 seconds..."
                sleep 30
              done
          fi
          }
        
          print_header() {
          lightcyan="\033[1;36m"
          nocolor="\033[0m"
          echo -e "\n${lightcyan}$1${nocolor}\n"
          }
        
          # Let the agent ignore the token env variables
          export VSO_AGENT_IGNORE="AZP_TOKEN,AZP_TOKEN_FILE"
        
          print_header "1. Determining matching Azure Pipelines agent..."
        
          AZP_AGENT_PACKAGES=$(curl -LsS \
          -u user:$(cat "/azp/.token") \
          -H "Accept:application/json" \
          "${AZP_URL}/_apis/distributedtask/packages/agent?platform=${TARGETARCH}&top=1")
        
          AZP_AGENT_PACKAGE_LATEST_URL=$(echo "${AZP_AGENT_PACKAGES}" | jq -r ".value[0].downloadUrl")
        
          if [ -z "${AZP_AGENT_PACKAGE_LATEST_URL}" -o "${AZP_AGENT_PACKAGE_LATEST_URL}" == "null" ]; then
          echo 1>&2 "error: could not determine a matching Azure Pipelines agent"
          echo 1>&2 "check that account '${AZP_URL}' is correct and the token is valid for that account"
          exit 1
          fi
        
          print_header "2. Downloading and extracting Azure Pipelines agent..."
        
          curl -LsS "${AZP_AGENT_PACKAGE_LATEST_URL}" | tar -xz & wait $!
        
          source ./env.sh
        
          trap "cleanup; exit 0" EXIT
          trap "cleanup; exit 130" INT
          trap "cleanup; exit 143" TERM
        
          print_header "3. Configuring Azure Pipelines agent..."
        
          ./config.sh --unattended \
          --agent "${AZP_AGENT_NAME:-$(hostname)}" \
          --url "${AZP_URL}" \
          --auth "PAT" \
          --token $(cat "/azp/.token") \
          --pool "${AZP_POOL:-Default}" \
          --work "${AZP_WORK:-_work}" \
          --replace \
          --acceptTeeEula & wait $!
        
          print_header "4. Running Azure Pipelines agent..."
        
          chmod +x ./run.sh
        
          # To be aware of TERM and INT signals call ./run.sh
          # Running it with the --once flag at the end will shut down the agent after the build is executed
          ./run.sh --once "$@" & wait $!
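
If you want to see for yourself why the startup token can't be reused at exit, you can check how long a token has left. This is only a rough sketch for debugging, not part of the official script. It assumes the access token is a JWT with an "exp" claim (Entra access tokens generally are, and they typically last around an hour), and the function names jwt_payload and token_seconds_left are made up for this example. It only needs base64 and sed.

```shell
#!/bin/bash
# Decode the payload (second dot-separated segment) of a JWT.
# Assumption: the token is a JWT; this is for inspection only, not validation.
jwt_payload() {
  local payload
  # JWTs use base64url: map '_' -> '/' and '-' -> '+' to get standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Print the seconds remaining until the token's "exp" claim (negative if expired).
# Arguments: the token, and optionally the current Unix time (defaults to now).
token_seconds_left() {
  local exp now
  exp=$(jwt_payload "$1" | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p')
  now=${2:-$(date +%s)}
  echo $(( exp - now ))
}
```

Running token_seconds_left "$(cat /azp/.token)" inside the container prints a negative number once the startup token has expired, which is exactly the situation the cleanup function has to deal with.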
Example 2 - Use an Application Registration to authenticate and register the build agent

- Create an Entra Application Registration and add it to Azure DevOps as a 'Basic' user. Give it "Read" permissions to all build agents and "Admin" to the pool it will use.
- Generate a client secret for it.
- Configure the start.sh script as below. The main difference is that the script takes the App Registration details and uses them to get an access token.
- To ensure the container can clean up on exit, it needs to get a **new** access token for the removal step. This is missed in all the other examples I've seen. If you don't do this, the token acquired at startup has very likely expired by then, and the agent can't exit gracefully.
#!/bin/bash
              set -e
              resource="499b84ac-1321-427f-aa17-267ca6975798/.default"
              if [ -z "${AZP_URL}" ]; then
              echo 1>&2 "error: missing AZP_URL environment variable"
              exit 1
              fi
          
              if [ -n "$AZP_CLIENTID" ]; then          
              AZP_TOKEN=$(curl -X POST -d "grant_type=client_credentials&client_id=$AZP_CLIENTID&client_secret=$AZP_CLIENTSECRET&resource=$resource" https://login.microsoftonline.com/$AZP_TENANTID/oauth2/token | jq -r '.access_token')
              fi
          
              if [ -z "$AZP_TOKEN_FILE" ]; then
              if [ -z "$AZP_TOKEN" ]; then
              echo 1>&2 "error: missing AZP_TOKEN environment variable"
              exit 1
              fi
          
              AZP_TOKEN_FILE=/azp/.token
              echo -n $AZP_TOKEN > "$AZP_TOKEN_FILE"
              fi
          
              unset AZP_TOKEN
          
              if [ -n "${AZP_WORK}" ]; then
              mkdir -p "${AZP_WORK}"
              fi
          
              cleanup() {
              trap "" EXIT
          
              if [ -e ./config.sh ]; then
              print_header "Cleanup. Removing Azure Pipelines agent..."
          
                  # If the agent has some running jobs, the configuration removal process will fail.
                  # So, give it some time to finish the job.
                  while true; do
                    #Generate a new Access Token
                    AZP_TOKEN_EXIT=$(curl -X POST -d "grant_type=client_credentials&client_id=$AZP_CLIENTID&client_secret=$AZP_CLIENTSECRET&resource=$resource" https://login.microsoftonline.com/$AZP_TENANTID/oauth2/token | jq -r '.access_token')
                    echo -n $AZP_TOKEN_EXIT > /azp/.token_exit
                    ./config.sh remove --unattended --auth "PAT" --token $(cat "/azp/.token_exit") && break
          
                    echo "Retrying in 30 seconds..."
                    sleep 30
                  done
              fi
              }
          
              print_header() {
              lightcyan="\033[1;36m"
              nocolor="\033[0m"
              echo -e "\n${lightcyan}$1${nocolor}\n"
              }
          
              # Let the agent ignore the token env variables
              export VSO_AGENT_IGNORE="AZP_TOKEN,AZP_TOKEN_FILE"
          
              print_header "1. Determining matching Azure Pipelines agent..."
          
              AZP_AGENT_PACKAGES=$(curl -LsS \
              -u user:$(cat "/azp/.token") \
              -H "Accept:application/json" \
              "${AZP_URL}/_apis/distributedtask/packages/agent?platform=${TARGETARCH}&top=1")
          
              AZP_AGENT_PACKAGE_LATEST_URL=$(echo "${AZP_AGENT_PACKAGES}" | jq -r ".value[0].downloadUrl")
          
              if [ -z "${AZP_AGENT_PACKAGE_LATEST_URL}" -o "${AZP_AGENT_PACKAGE_LATEST_URL}" == "null" ]; then
              echo 1>&2 "error: could not determine a matching Azure Pipelines agent"
              echo 1>&2 "check that account '${AZP_URL}' is correct and the token is valid for that account"
              exit 1
              fi
          
              print_header "2. Downloading and extracting Azure Pipelines agent..."
          
              curl -LsS "${AZP_AGENT_PACKAGE_LATEST_URL}" | tar -xz & wait $!
          
              source ./env.sh
          
              trap "cleanup; exit 0" EXIT
              trap "cleanup; exit 130" INT
              trap "cleanup; exit 143" TERM
          
              print_header "3. Configuring Azure Pipelines agent..."
          
              ./config.sh --unattended \
              --agent "${AZP_AGENT_NAME:-$(hostname)}" \
              --url "${AZP_URL}" \
              --auth "PAT" \
              --token $(cat "/azp/.token") \
              --pool "${AZP_POOL:-Default}" \
              --work "${AZP_WORK:-_work}" \
              --replace \
              --acceptTeeEula & wait $!
          
              print_header "4. Running Azure Pipelines agent..."
          
              chmod +x ./run.sh
          
              # To be aware of TERM and INT signals call ./run.sh
              # Running it with the --once flag at the end will shut down the agent after the build is executed
              ./run.sh --once "$@" & wait $!
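
One thing to watch for in both scripts: if the token request fails, jq -r '.access_token' prints the string null (or nothing at all), and that is what ends up in the token file. The agent then fails later with a much less obvious error. A small guard along these lines fails fast instead. It is a sketch only, and the function names token_is_usable and save_token are hypothetical, not something from the official docs.

```shell
#!/bin/bash
# Return 0 only if the value is non-empty and not the literal "null" that
# jq -r '.access_token' prints when the response has no access_token field.
token_is_usable() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

# Write the token to a file only the current user can read, or fail loudly.
# Arguments: the token value and the destination file.
save_token() {
  local token="$1" file="$2"
  if ! token_is_usable "$token"; then
    echo 1>&2 "error: token endpoint did not return an access token"
    return 1
  fi
  # The subshell keeps the restrictive umask from leaking into the rest of the script.
  ( umask 077 && printf '%s' "$token" > "$file" )
}
```

At startup, calling save_token "$AZP_TOKEN" /azp/.token in place of the plain echo means that, with set -e in effect, the script stops before it tries to register the agent with a bad token.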
Running in Kubernetes
The best way to run this in Kubernetes is with a StatefulSet, so that the pods (and therefore the agents, which register under the hostname by default) keep static names. When the container exits after a build, Kubernetes will replace it.
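
As a starting point, a StatefulSet for this could look roughly like the manifest printed by the function below. Treat it as a sketch under assumptions: the function name render_agent_statefulset, the resource names, the image, and the grace period are all placeholders, and the service account is assumed to be the one federated with the managed identity from Example 1. For Example 2 you would drop the Workload Identity label and pass AZP_CLIENTID, AZP_CLIENTSECRET and AZP_TENANTID in from a Secret instead.

```shell
#!/bin/bash
# Print a StatefulSet manifest for the agent; pipe the output to `kubectl apply -f -`.
# Arguments: name, image, Azure DevOps URL, pool name, replicas (optional, default 2).
# Every value passed in is a placeholder for illustration.
render_agent_statefulset() {
  local name="$1" image="$2" azp_url="$3" azp_pool="$4" replicas="${5:-2}"
  cat <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ${name}
spec:
  serviceName: ${name}                        # governing Service name (placeholder)
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
        azure.workload.identity/use: "true"   # only needed for Example 1
    spec:
      serviceAccountName: ${name}             # assumed: federated with the managed identity
      terminationGracePeriodSeconds: 300      # assumed value; leaves time for the cleanup trap
      containers:
        - name: agent
          image: ${image}
          env:
            - name: AZP_URL
              value: "${azp_url}"
            - name: AZP_POOL
              value: "${azp_pool}"
EOF
}
```

For example, render_agent_statefulset azp-agent myregistry.example/azp-agent:latest https://dev.azure.com/myorg Default 3 | kubectl apply -f - would create pods named azp-agent-0 to azp-agent-2. Because the script registers the agent under the hostname and passes --replace, a restarted pod takes over its old agent entry rather than creating a new one.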