Infrastructure / INFRA-1778

pkg.jenkins.io DNS issues

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: www
    • Labels:
      None

      Description

      The issue as described by the networking team:

      On the initial DNS lookup for pkg.jenkins.io, our DNS server uses the glue records from the .org TLD servers to resolve the IP addresses of the jenkins.io name servers (which are actually hosts in the jenkins-ci.org domain):

      >dig @b0.org.afilias-nst.org ns1.jenkins-ci.org A

      ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @b0.org.afilias-nst.org ns1.jenkins-ci.org A 
      ; (2 servers found) 
      ;; global options: +cmd 
      ;; Got answer: 
      ;; >>HEADER<< opcode: QUERY, status: NOERROR, id: 29088 
      ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 4 
      ;; WARNING: recursion requested but not available

      ;; OPT PSEUDOSECTION: 
      ; EDNS: version: 0, flags:; udp: 4096 
      ;; QUESTION SECTION: 
      ;ns1.jenkins-ci.org. IN A

      ;; AUTHORITY SECTION: 
      jenkins-ci.org. 86400 IN NS ns3.jenkins-ci.org. 
      jenkins-ci.org. 86400 IN NS ns2.jenkins-ci.org. 
      jenkins-ci.org. 86400 IN NS ns1.jenkins-ci.org.

      ;; ADDITIONAL SECTION: 
      ns1.jenkins-ci.org. 86400 IN A 140.211.9.2 
      ns2.jenkins-ci.org. 86400 IN A 173.203.60.151 
      ns3.jenkins-ci.org. 86400 IN A 162.209.106.32

      ;; Query time: 12 msec 
      ;; SERVER: 199.19.54.1#53(199.19.54.1) 
      ;; WHEN: Wed Sep 12 08:35:26 PDT 2018 
      ;; MSG SIZE rcvd: 145
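      To make the comparisons in this report repeatable, the record lines in dig's AUTHORITY and ADDITIONAL sections can be parsed mechanically. The following is a minimal Python sketch using only the standard library; `RR`, `parse_rr_lines`, and `glue_addresses` are hypothetical helper names, and the parser assumes dig's default one-record-per-line output.

```python
from typing import NamedTuple


class RR(NamedTuple):
    """One resource record as printed by dig: owner name, TTL, class, type, data."""
    name: str
    ttl: int
    rclass: str
    rtype: str
    data: str


def parse_rr_lines(text: str) -> list[RR]:
    """Parse dig-style record lines, skipping comments (';'), blanks, and non-record lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # dig prints headers and the QUESTION section as comments
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # wrapped or truncated line, or a command echo such as ">dig ..."
        name, ttl, rclass, rtype, data = fields
        try:
            ttl_value = int(ttl)
        except ValueError:
            continue  # second field is not a TTL, so this is not a record line
        records.append(RR(name, ttl_value, rclass, rtype, data))
    return records


def glue_addresses(records: list[RR]) -> dict[str, str]:
    """Map each host name to its IPv4 address, using only the A records."""
    return {r.name: r.data for r in records if r.rtype == "A"}
```

      Feeding it the ADDITIONAL section above yields the .org glue map for ns1, ns2, and ns3.jenkins-ci.org.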

      At this point, what appears to be happening is that the cached A records expire and our DNS servers try to refresh them. The problem is that only one of the jenkins-ci.org name servers (ns3) is reachable, and our DNS servers refuse to query a name server for its own address (ns3.jenkins-ci.org cannot be used to look up ns3.jenkins-ci.org, because its IP address is no longer cached and resolving it through itself would be a loop). Since ns1 and ns2 are unreachable, the lookup fails (SERVFAIL). Our DNS servers then flush all of the records for jenkins.io and jenkins-ci.org, and the next lookup succeeds because the glue records are back in play at that point.
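      The failure mode described above can be illustrated with a deliberately simplified model. This is not how BIND, Unbound, or any real resolver is implemented; `refresh_ns_address` is a hypothetical function that only encodes the two rules from the networking team's explanation: a name server is never asked for its own address, and only reachable servers can answer.

```python
def refresh_ns_address(target: str, known_addr: dict[str, str], reachable: set[str]) -> str:
    """Toy model of refreshing an expired name server A record.

    target     -- name server whose A record expired (e.g. ns3.jenkins-ci.org.)
    known_addr -- name server -> address the resolver would try
    reachable  -- addresses that actually answer queries
    Returns the name server that would be queried, or "SERVFAIL" if none can be.
    """
    for ns, addr in sorted(known_addr.items()):
        if ns == target:
            continue  # asking a server for its own address is the loop the resolver avoids
        if addr in reachable:
            return ns
    return "SERVFAIL"  # every other name server for the zone is unreachable
```

      With the glue addresses from the output above and only ns3 (162.209.106.32) reachable, refreshing ns3 returns "SERVFAIL"; as soon as ns1 also answers, the model picks ns1 instead.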

      These are the name servers for jenkins.io. Note that there are no glue records, so the A records for the name servers have to be looked up through the normal DNS resolution process.
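      Whether a parent zone returns glue depends on whether the name server's host name lies inside the zone being delegated. The Python sketch below (helper names are hypothetical) encodes that rule: the jenkins.io name servers live under jenkins-ci.org, outside jenkins.io, so the .io servers do not return glue for them, whereas ns1, ns2, and ns3.jenkins-ci.org are inside jenkins-ci.org, so the .org servers do.

```python
def is_in_zone(name: str, zone: str) -> bool:
    """True if `name` equals `zone` or is a subdomain of it (case-insensitive, trailing dot optional)."""
    name = name.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    return name == zone or name.endswith("." + zone)


def needs_glue(delegated_zone: str, ns_name: str) -> bool:
    """A parent only has to publish glue for name servers inside the delegated zone."""
    return is_in_zone(ns_name, delegated_zone)
```

      The explicit "." + zone suffix check keeps a name such as xjenkins.io from being treated as part of jenkins.io.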

      >dig @a0.nic.io jenkins.io A

      ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @a0.nic.io jenkins.io A 
      ; (2 servers found) 
      ;; global options: +cmd 
      ;; Got answer: 
      ;; >>HEADER<< opcode: QUERY, status: NOERROR, id: 29528 
      ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 1 
      ;; WARNING: recursion requested but not available

      ;; OPT PSEUDOSECTION: 
      ; EDNS: version: 0, flags:; udp: 4096 
      ;; QUESTION SECTION: 
      ;jenkins.io. IN A

      ;; AUTHORITY SECTION: 
      jenkins.io. 86400 IN NS ns2.jenkins-ci.org. 
      jenkins.io. 86400 IN NS ns1.jenkins-ci.org. 
      jenkins.io. 86400 IN NS ns3.jenkins-ci.org.

      ;; Query time: 114 msec 
      ;; SERVER: 65.22.160.17#53(65.22.160.17) 
      ;; WHEN: Wed Sep 12 08:21:05 PDT 2018 
      ;; MSG SIZE rcvd: 107 

      ns1 and ns2 time out:

      >dig @ns1.jenkins-ci.org pkg.jenkins.io A 
      ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @ns1.jenkins-ci.org pkg.jenkins.io A
      ; (1 server found) 
      ;; global options: +cmd 
      ;; connection timed out; no servers could be reached

      >dig @ns2.jenkins-ci.org pkg.jenkins.io A 
      ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @ns2.jenkins-ci.org pkg.jenkins.io A
      ; (2 servers found) 
      ;; global options: +cmd 
      ;; connection timed out; no servers could be reached 
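      The timeouts can also be reproduced without dig. This Python sketch (standard library only; `build_query` and `probe` are hypothetical names) builds one RFC 1035 query, sends it over UDP port 53, and reports whether any reply with the matching query ID arrives before the timeout. It does not parse the answer, inspect the RCODE, retry, or fall back to TCP, so it only checks reachability.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal RFC 1035 query for `name` (qtype 1 = A, class 1 = IN)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags = 0 (standard query, RD clear since authoritative servers do not recurse),
    # QDCOUNT = 1, ANCOUNT = NSCOUNT = ARCOUNT = 0
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by the zero-length root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server_ip: str, name: str, timeout: float = 2.0) -> str:
    """Send one UDP query to server_ip:53 and report 'answered' or 'timeout'."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server_ip, 53))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
    # Only the echoed query ID is checked; the answer records are not parsed.
    return "answered" if reply[:2] == query[:2] else "unexpected reply"
```

      For example, probe("140.211.9.2", "pkg.jenkins.io") targets ns1's glue address from the .org output above; the result reflects the server's state at the time it is run, not at the time of this report.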

      >dig @ns3.jenkins-ci.org pkg.jenkins.io A 
      ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @ns3.jenkins-ci.org pkg.jenkins.io A
      ; (1 server found) 
      ;; global options: +cmd 
      ;; Got answer: 
      ;; >>HEADER<< opcode: QUERY, status: NOERROR, id: 50814 
      ;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 3, ADDITIONAL: 5 
      ;; WARNING: recursion requested but not available 

      ;; OPT PSEUDOSECTION: 
      ; EDNS: version: 0, flags:; udp: 4096 
      ;; QUESTION SECTION: 
      ;pkg.jenkins.io. IN A 

      ;; ANSWER SECTION: 
      pkg.jenkins.io. 3600 IN CNAME mirrors.jenkins.io. 
      mirrors.jenkins.io. 3600 IN A 52.202.51.185 

      ;; AUTHORITY SECTION: 
      jenkins.io. 3600 IN NS ns1.jenkins-ci.org. 
      jenkins.io. 3600 IN NS ns3.jenkins-ci.org. 
      jenkins.io. 3600 IN NS ns2.jenkins-ci.org. 

      ;; ADDITIONAL SECTION: 
      ns1.jenkins-ci.org. 3600 IN A 140.211.9.2 
      ns2.jenkins-ci.org. 3600 IN A 162.209.124.149 <- This address differs from the glue record at the .org TLD.
      ns2.jenkins-ci.org. 3600 IN AAAA 2001:4802:7801:101:be76:4eff:fe20:b252
      ns3.jenkins-ci.org. 3600 IN A 162.209.106.32 

      ;; Query time: 80 msec 
      ;; SERVER: 162.209.106.32#53(162.209.106.32) 
      ;; WHEN: Wed Sep 12 09:14:47 PDT 2018 
      ;; MSG SIZE rcvd: 225 

      NOTE: The .org TLD glue for ns2.jenkins-ci.org (173.203.60.151) differs from the address served by ns3.jenkins-ci.org (162.209.124.149). Since ns2 is not responding anyway, this doesn't really matter at this point.

      >dig @d0.org.afilias-nst.org ns2.jenkins-ci.org A 
      ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @d0.org.afilias-nst.org ns2.jenkins-ci.org A
      ; (2 servers found) 
      ;; global options: +cmd 
      ;; Got answer: 
      ;; >>HEADER<< opcode: QUERY, status: NOERROR, id: 25531 
      ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 4 
      ;; WARNING: recursion requested but not available 

      ;; OPT PSEUDOSECTION: 
      ; EDNS: version: 0, flags:; udp: 4096 
      ;; QUESTION SECTION: 
      ;ns2.jenkins-ci.org. IN A 

      ;; AUTHORITY SECTION: 
      jenkins-ci.org. 86400 IN NS ns2.jenkins-ci.org. 
      jenkins-ci.org. 86400 IN NS ns1.jenkins-ci.org. 
      jenkins-ci.org. 86400 IN NS ns3.jenkins-ci.org. 

      ;; ADDITIONAL SECTION: 
      ns2.jenkins-ci.org. 86400 IN A 173.203.60.151 
      ns1.jenkins-ci.org. 86400 IN A 140.211.9.2 
      ns3.jenkins-ci.org. 86400 IN A 162.209.106.32 

      ;; Query time: 12 msec 
      ;; SERVER: 199.19.57.1#53(199.19.57.1) 
      ;; WHEN: Wed Sep 12 09:15:38 PDT 2018 
      ;; MSG SIZE rcvd: 145 
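      The ns2 mismatch can be detected by comparing the A records the .org TLD hands out as glue with those the jenkins-ci.org zone serves itself. A minimal Python sketch, with values copied from the d0.org.afilias-nst.org and ns3.jenkins-ci.org outputs in this report; `glue_mismatches` is a hypothetical helper.

```python
def glue_mismatches(parent_glue: dict[str, str], child_records: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {ns_name: (parent_addr, child_addr)} for every name server whose
    TLD glue address differs from the address served by the zone itself."""
    return {
        name: (parent_glue[name], child_records[name])
        for name in sorted(parent_glue.keys() & child_records.keys())
        if parent_glue[name] != child_records[name]
    }


# Glue from d0.org.afilias-nst.org (values from this report)
TLD_GLUE = {
    "ns1.jenkins-ci.org.": "140.211.9.2",
    "ns2.jenkins-ci.org.": "173.203.60.151",
    "ns3.jenkins-ci.org.": "162.209.106.32",
}
# ADDITIONAL section returned by ns3.jenkins-ci.org (values from this report)
ZONE_RECORDS = {
    "ns1.jenkins-ci.org.": "140.211.9.2",
    "ns2.jenkins-ci.org.": "162.209.124.149",
    "ns3.jenkins-ci.org.": "162.209.106.32",
}

print(glue_mismatches(TLD_GLUE, ZONE_RECORDS))
# {'ns2.jenkins-ci.org.': ('173.203.60.151', '162.209.124.149')}
```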

      Once at least one other DNS server for jenkins-ci.org is back online, the lookup failures should stop. If ns2 is brought back online, its IP address must be corrected so that the glue record at the TLD and the A record in the jenkins-ci.org zone match; otherwise, lookups will fail intermittently. Best practice for DNS is to always have at least two DNS servers online, and this incident is one example of why.
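      The "at least two servers online" rule is straightforward to monitor once per-address reachability results are available (for example from dig or a UDP probe). A Python sketch; `redundancy_report` is a hypothetical helper, its `minimum=2` default is an assumption mirroring the best practice stated above, and the caller supplies the set of responsive addresses.

```python
def redundancy_report(ns_addresses: dict[str, str], responsive: set[str], minimum: int = 2) -> list[str]:
    """List problems: each name server that is not answering, plus a summary
    line if fewer than `minimum` name servers respond."""
    problems = [
        f"{ns} ({addr}) is not responding"
        for ns, addr in sorted(ns_addresses.items())
        if addr not in responsive
    ]
    up = len(ns_addresses) - len(problems)
    if up < minimum:
        problems.append(
            f"only {up} of {len(ns_addresses)} name servers respond; at least {minimum} recommended"
        )
    return problems
```

      With the state described in this report (only ns3 at 162.209.106.32 answering), the report flags ns1 and ns2 and adds the "only 1 of 3" summary line.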

            Activity

            Denys Digtiar (duemir) added a comment:

            INFRA-1463 looks related

            Olivier Vernin (olblak) added a comment:

            Pham Vu Tuan is looking at it


              People

              Assignee:
              pvtuan10 Pham Vu Tuan
              Reporter:
              duemir Denys Digtiar
              Votes:
              0
              Watchers:
              2
