Some details about what the implementation will look like.
Risks
Given the differences in hardware between codfw and eqiad, some of the changes and experiments will have to be deployed with only limited testing outside of eqiad.
eqiad
This section covers the eqiad datacenter, specifically the eqiad1 openstack deployment.
specs for eqiad1
On the cloudgw side, each server:
- Hardware: misc box
- CPU: 16 cores
- RAM: 32 GB
- Disk: 500 GB
- NICs: 2 x 10Gbps, bonded/teamed/aggregated for redundancy
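As a sketch, the bonded NIC setup described above could be expressed with Debian-style ifupdown configuration. The interface names (eno1/eno2), bond name, and bond options below are illustrative assumptions, not the actual cloudgw configuration:

```
# /etc/network/interfaces (sketch) -- interface and bond names are
# assumptions for illustration, not the real cloudgw config
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode 802.3ad      # LACP link aggregation for redundancy
    bond-miimon 100        # link monitoring interval, in ms
    bond-lacp-rate fast
```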
FIXME: for reference, include here some bits about the starting setup of the network?
| VLAN | Switched on | L2 Members | L3 Gateway ("to internet") |
|---|---|---|---|
| cloud-hosts1-eqiad | asw2-b | cr1/2, all cloudvirt eth0, all Ceph OSD eth0 | cr1/2 (via asw2-b) |
| cloud-instances2-eqiad | asw2-b | all cloud VPS, all cloudvirt eth1, cloudnet1003/1004 eth1 | cloudnet1003/1004 eth1 |
| cloud-instances-transport1-eqiad | asw2-b | cloudnet1003/1004 eth0 | cr1/2 |
| cloud-storage1-eqiad | asw2-b | all cloudcephosd eth1 | (none) |
stage 1: Route cloud-hosts vlan through cloudsw
The cloud-hosts vlan, which is part of the production realm, is currently routed on cr1/2-eqiad:ae2.1118, the interfaces facing asw2-b-eqiad.
To better separate the WMCS and production realms, that routing should be moved to cr1/2-eqiad:xe-3/0/4.1118, the interfaces facing cloudsw.
This will contribute to goals (A) and (C) of the cloudsw project.
- [!] Reconfigure cloudnet with new gateway IP (to be confirmed)
- Update static routes on cloudsw to point to new VIP
- Clean up 208.80.155.88/29 IPs and advertisement (+Netbox)
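On the router side, the move could look roughly like the following Junos sketch. The address is a placeholder, and the exact interface/unit parameters would need to be confirmed against the live cr1/2-eqiad configuration before use:

```
# Junos sketch (hypothetical address) -- move unit 1118 from the
# asw2-b-facing ae2 to the cloudsw-facing xe-3/0/4
deactivate interfaces ae2 unit 1118
set interfaces xe-3/0/4 unit 1118 vlan-id 1118
set interfaces xe-3/0/4 unit 1118 family inet address <gateway-ip>/<prefix>
commit confirmed 5
```

Using `commit confirmed` gives an automatic rollback window in case the change cuts off management connectivity.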
At this stage:
| VLAN | Switched on | L2 Members | L3 Gateway ("to internet") |
|---|---|---|---|
| cloud-hosts1-eqiad | asw2-b*, cloudsw | cr1/2, all cloudvirt eth0, all Ceph OSD eth0 | cr1/2 (via cloudsw) |
| cloud-instances2-eqiad | asw2-b*, cloudsw | all cloud VPS, all cloudvirt eth1, cloudnet1003/1004 eth1 | cloudnet1003/1004 eth1 |
| cloud-instances-transport1-eqiad | asw2-b*, cloudsw | cloudsw, cloudnet1003/1004 eth0 | cloudsw |
| cloud-transit1/2-eqiad | cloudsw | cr1/2, cloudsw | cr1/2 |
| cloud-storage1-eqiad | asw2-b*, cloudsw | all cloudcephosd eth1 | (none) |
* To be removed when hosts are moved away from that device
stage 2B: enable L3 routing on cloudgw nodes
TBD
stage 3: final status for all main network components
TBD
connectivity between cloudgw and the cloud-hosts1-b-eqiad subnet:
L3:
a single IP address, allocated by standard methods, for ssh management, puppet, monitoring, etc. The gateway for this subnet lives on the core routers, but is switched through cloudsw after stage 1.
L2:
cloudgw has 2 NICs bonded/teamed/aggregated and then trunked with 3 vlans:
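As a sketch, the trunked vlan sub-interfaces on top of the bond could look like the following ifupdown fragment. The bond name (bond0), the addresses, and all vlan IDs except 1118 (cloud-hosts, taken from this document) are placeholders, not the actual configuration:

```
# VLAN sub-interface on the bond (sketch); bond name, addresses,
# and vlan IDs other than 1118 are placeholders
auto bond0.1118
iface bond0.1118 inet static
    address <cloudgw-mgmt-ip>/<prefix>
    gateway <cloud-hosts-gw>
# additional bond0.<vlan-id> stanzas would follow for the other
# two trunked vlans
```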
connectivity between Neutron (cloudnet) and cloudgw:
L3:
cloudnet keeps the current connection to the cloud-hosts1-b-eqiad subnet for ssh management, puppet, monitoring, etc. The gateway for this subnet lives on cloudsw.