
User:Effie Mouzeli (WMF)/Howtos/New LVS Kubernetes Service

From Wikitech

Before adding a new LVS service, make sure the service is:

  • running and responding correctly to its healthcheck
  • serving TLS (deployed with tls.enabled set)
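The healthcheck prerequisite can be checked with a minimal Python sketch. It assumes the service answers HTTP 200 on /healthz over TLS, which is what the check_https_port_status!8082!200!/healthz Icinga check configured later in service.yaml expects. The function name is illustrative. The opener parameter is a hypothetical hook added only so the function can be tested without network access.

```python
import ssl
import urllib.request
from typing import Callable, Optional


def is_healthy(url: str, timeout: float = 5.0, cafile: Optional[str] = None,
               opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if GET <url> answers HTTP 200 over a verified TLS connection."""
    # Certificate and hostname verification stay on; pass cafile if the
    # issuing CA is not in the system trust store.
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with opener(url, timeout=timeout, context=ctx) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (raised for non-2xx), ssl.SSLError and timeouts
        # are all OSError subclasses, so any of them counts as unhealthy.
        return False
```

With verification on, the hostname in the URL has to match a name in the service certificate. A mismatch therefore reports unhealthy even if the service itself is fine.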

Puppet Private - Certs

Certificates

Follow the process in Kubernetes/Enabling_TLS#Create_and_place_certificates

DNS and Netbox Patch #1

Make an allocation in Netbox for each site (eqiad and codfw), following [DNS/Netbox#How_to_manually_allocate_a_special_purpose_IP_address_in_Netbox].

Note: change the netmask to /32.

Add the svc records (forward and reverse) in operations/dns:

  • templates/10.in-addr.arpa
  • templates/wmnet
  • Review and merge
  • Log in to ns0.wikimedia.org and run sudo authdns-update. This pulls from operations/dns and generates the zone files and gdnsd configs on each nameserver.
  • Verify (replace my-changed-record.wikimedia.org with the record you added): for i in 0 1 2 ; do dig @ns${i}.wikimedia.org -t any my-changed-record.wikimedia.org ; done
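Each VIP needs a forward record in templates/wmnet and a matching PTR record in templates/10.in-addr.arpa. The pairing is shown in this minimal Python sketch, which uses the standard library's IPv4Address.reverse_pointer to derive the reverse name. It assumes plain BIND-style syntax with absolute names and a 1H TTL, and the helper name is hypothetical. The real templates may use $ORIGIN-relative names and different spacing or TTLs, so copy the style of neighbouring entries.

```python
import ipaddress


def svc_records(service: str, site: str, ip: str, ttl: str = "1H") -> tuple[str, str]:
    """Build a forward A line and a reverse PTR line for <service>.svc.<site>.wmnet.

    Assumes BIND-style syntax with absolute names; not the templates' exact format.
    """
    addr = ipaddress.IPv4Address(ip)
    # The reverse record only belongs in templates/10.in-addr.arpa if the VIP is in 10/8.
    if addr not in ipaddress.IPv4Network("10.0.0.0/8"):
        raise ValueError(f"{addr} is not in 10.0.0.0/8")
    fqdn = f"{service}.svc.{site}.wmnet."
    forward = f"{fqdn} {ttl} IN A {addr}"
    # reverse_pointer gives e.g. '49.2.2.10.in-addr.arpa' (no trailing dot).
    reverse = f"{addr.reverse_pointer}. {ttl} IN PTR {fqdn}"
    return forward, reverse
```

For example, svc_records("echostore", "eqiad", "10.2.2.49") pairs echostore.svc.eqiad.wmnet with 49.2.2.10.in-addr.arpa.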

Finish up DNS

  • cumin1001:~$ sudo cookbook sre.dns.netbox "Add VIPs for X services"

Puppet Patch #1 - LVS prep

'''hieradata/common/service.yaml:'''
service::catalog:
  echostore:
    description: Echo store, echostore.svc.%{::site}.wmnet  # <-- '''change'''
    encryption: true 
    ip: 
      codfw:
        default: 10.2.1.49                                  # <-- '''change'''
      eqiad:
        default: 10.2.2.49                                  # <-- '''change'''
    lvs: # Properties that are related to LVS setup.
      class: low-traffic
      conftool: 
        cluster: kubernetes
        service: kubesvc
      depool_threshold: '.5'
      enabled: true
      monitors:
        IdleConnection:
          max-delay: 300
          timeout-clean-reconnect: 3
      scheduler: wrr
      protocol: tcp
    monitoring: 
      check_command: check_https_port_status!8082!200!/healthz # <-- '''change''' the port; this is the check Icinga runs
      critical: false                                          # set to true in production
      sites: 
        codfw:
          hostname: echostore.svc.codfw.wmnet                  # <-- '''change'''
        eqiad:
          hostname: echostore.svc.eqiad.wmnet                  # <-- '''change'''
    port: 8082                                                 # <-- '''change''' https://wikitech.wikimedia.org/wiki/Service_ports
    sites:
    - eqiad
    - codfw
    state: service_setup
    discovery:                                                 
    - dnsdisc: echostore                                        # <-- '''change'''
      active_active: true
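Several of the fields marked "change" in the entry above have to agree with each other. A minimal Python sketch of a consistency check follows. It assumes the entry has already been parsed into a dict (parsing is left out), and the function name is hypothetical. It also assumes that the first argument of check_https_port_status is the port, as in this example, and it accepts only the three states this page uses.

```python
import re
from typing import Any


def check_service_entry(name: str, entry: dict[str, Any]) -> list[str]:
    """Return consistency problems in one service::catalog entry (empty list = OK)."""
    problems = []
    for site in entry.get("sites", []):
        # Every listed site needs a VIP and a matching monitoring hostname.
        if "default" not in entry.get("ip", {}).get(site, {}):
            problems.append(f"ip.{site}.default missing")
        expected = f"{name}.svc.{site}.wmnet"
        got = entry.get("monitoring", {}).get("sites", {}).get(site, {}).get("hostname")
        if got != expected:
            problems.append(f"monitoring hostname for {site} is {got!r}, expected {expected!r}")
    # Assumed shape: '<command>!<port>!<status>!<path>'; the port must match 'port'.
    check = entry.get("monitoring", {}).get("check_command", "")
    m = re.match(r"[^!]+!(\d+)!", check)
    if m is None or int(m.group(1)) != entry.get("port"):
        problems.append(f"check_command port does not match port {entry.get('port')}")
    if entry.get("state") not in ("service_setup", "lvs_setup", "production"):
        problems.append(f"unexpected state {entry.get('state')!r}")
    return problems
```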


'''hieradata/role/common/kubernetes/worker.yaml:'''
profile::lvs::realserver::pools:
  echostore: {}


'''conftool-data/discovery/services.yaml'''
echostore: [eqiad, codfw]

  • After merging, run Puppet on the LVS load balancers: sudo cumin 'O:lvs::balancer' 'run-puppet-agent'
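This patch touches three files, and the service name has to line up across all of them. A minimal Python sketch of a cross-file check follows. It assumes the YAML has already been parsed into plain dicts and lists, and that the dnsdisc name equals the service name, as it does in this example. The function name is hypothetical.

```python
def check_pool_and_discovery(name: str, catalog_sites: list[str],
                             realserver_pools: dict, discovery: dict) -> list[str]:
    """Cross-check worker.yaml pools and conftool discovery data against service.yaml."""
    problems = []
    # hieradata/role/common/kubernetes/worker.yaml: profile::lvs::realserver::pools
    if name not in realserver_pools:
        problems.append(f"{name} missing from profile::lvs::realserver::pools")
    # conftool-data/discovery/services.yaml: <dnsdisc>: [sites]
    disc_sites = discovery.get(name)
    if disc_sites is None:
        problems.append(f"{name} missing from conftool-data/discovery/services.yaml")
    elif sorted(disc_sites) != sorted(catalog_sites):
        problems.append(f"discovery sites {sorted(disc_sites)} != catalog sites {sorted(catalog_sites)}")
    return problems
```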

Puppet Patch #2 - LVS setup

'''hieradata/common/service.yaml:'''
[...]
    sites:
    - eqiad
    - codfw
    state: lvs_setup                                           # <-- '''change''' (was service_setup)
    discovery:
    - dnsdisc: echostore
      active_active: true
  • Run Puppet on the load balancers: sudo cumin 'O:lvs::balancer' 'run-puppet-agent'
  • Acknowledge the PyBal diff checks in Icinga.
  • Find the primary and secondary low-traffic LVS hosts in modules/lvs/manifests/configuration.pp.
  • Log the restart on SAL, then restart PyBal on each host, secondaries first: sudo systemctl restart pybal
  • Check: sudo ipvsadm -L -n should list the new VIP and its real servers, and curl -v -k https://echostore.svc.eqiad.wmnet:8082/healthz should succeed.
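The ipvsadm check means looking for the new VIP in the listing and confirming it has real servers behind it. A minimal Python sketch that parses the listing into a map from virtual service to its real servers and weights follows. It assumes the usual ipvsadm -L -n text layout: a "TCP VIP:PORT scheduler" line, followed by "-> RIP:PORT Forward Weight ..." lines. The parser name is hypothetical, and the addresses in the test sample are invented for illustration. IPVS sends no new connections to weight-0 servers, so a VIP with no real servers, or only weight-0 ones, is worth a closer look.

```python
def real_servers(ipvsadm_output: str) -> dict[str, list[tuple[str, int]]]:
    """Map 'PROTO VIP:PORT' to a list of (real server, weight) pairs."""
    services: dict[str, list[tuple[str, int]]] = {}
    current = None
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP") and len(fields) >= 2:
            # Start of a virtual service block, e.g. 'TCP  10.2.2.49:8082 wrr'.
            current = f"{fields[0]} {fields[1]}"
            services[current] = []
        elif fields[0] == "->" and current is not None and len(fields) >= 4:
            # Real server line: '-> RIP:PORT Forward Weight ActiveConn InActConn'.
            services[current].append((fields[1], int(fields[3])))
        # Banner and column-header lines fall through and are ignored.
    return services
```

On a load balancer it can be fed the stdout of subprocess.run(["ipvsadm", "-L", "-n"], capture_output=True, text=True), run with sufficient privileges.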

Puppet Patch #3 - LVS production

'''hieradata/common/service.yaml:'''
[...]
    sites:
    - eqiad
    - codfw
    state: production                                          # <-- '''change''' (was lvs_setup)
    discovery:
    - dnsdisc: echostore
      active_active: true
  • After merging, run Puppet on the Icinga and authoritative DNS hosts: sudo cumin 'A:icinga or A:dns-auth' run-puppet-agent
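The three Puppet patches move the state field forward one step at a time, and each step is followed by a Puppet run on a specific set of hosts. A minimal Python sketch encodes that sequence so a reviewer could catch a patch that skips a step. The ordering and cumin selectors come from this page, not from Puppet itself, and the names STATES, FOLLOW_UP, next_state and is_valid_step are hypothetical.

```python
from typing import Optional

# Order of the state field across the three Puppet patches on this page.
STATES = ("service_setup", "lvs_setup", "production")

# cumin selectors this page runs run-puppet-agent on after each state is merged.
FOLLOW_UP = {
    "service_setup": "O:lvs::balancer",
    "lvs_setup": "O:lvs::balancer",          # followed by PyBal restarts, secondaries first
    "production": "A:icinga or A:dns-auth",
}


def next_state(current: str) -> Optional[str]:
    """Return the state the next patch should set, or None after production."""
    i = STATES.index(current)  # raises ValueError for an unknown state
    return STATES[i + 1] if i + 1 < len(STATES) else None


def is_valid_step(old: str, new: str) -> bool:
    """True if a patch moves state exactly one step forward."""
    return next_state(old) == new
```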

DNS Patch #2

Add the discovery (DYNA) records in operations/dns:

  • templates/wmnet
  • utils/mock_etc/discovery-geo-resources
  • Review and merge
  • Log in to ns0.wikimedia.org and run sudo authdns-update. This pulls from operations/dns and generates the zone files and gdnsd configs on each nameserver.

Pool!

$ confctl --object-type discovery select 'dnsdisc=echostore' set/pooled=true
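Once the service is pooled, the discovery name should answer with one of the svc VIPs from service.yaml. With active_active: true, which site's VIP you get depends on where the query comes from. A minimal Python sketch of that check follows. It assumes the discovery record is echostore.discovery.wmnet, following the usual <name>.discovery.wmnet pattern, and uses the example VIPs from the service.yaml snippet. The function name is hypothetical, and the resolver parameter is a hook added only for testing.

```python
import socket
from typing import Callable, Iterable


def resolves_to_vip(name: str, vips: Iterable[str],
                    resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if name resolves, from this host's point of view, to one of the VIPs."""
    try:
        return resolver(name) in set(vips)
    except OSError:
        # socket.gaierror (e.g. NXDOMAIN) is an OSError subclass.
        return False
```

For example, resolves_to_vip("echostore.discovery.wmnet", ["10.2.1.49", "10.2.2.49"]) run from a host in either site.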