WIP - this is a draft and a work in progress

Introduction

Purpose

This design document discusses and aims to address the pain points of developing, testing/debugging, and maintaining the VR codebase, as well as the pain points around VR lifecycle management, operations and upgrades.

This work is not yet committed to any release. Tentatively, it can be developed and delivered in phases across versions, with the initial framework, architecture and a few non-critical features aimed at the next major release, 4.17.

References

Document History

Version    Author/Reviewer    Date
1.0        Rohit Yadav

Glossary

Feature Specifications

  • Solve for issues around:
    • VR codebase development and maintenance
    • Testing (incl. isolated testing and security testing)
    • VR Upgrades
    • API based programming (CLIs, RPC etc)
  • VR agent:
    • Single-binary (Go-based) or Module (python module/wheel)
    • Maybe Lua-based? (C-like speed with LuaJIT)
    • Replaces current py/shell/* scripts
    • Implement a general programming and RPC framework
    • Allows CLI and API/RPC based communication
    • RPC Comms:
      • unix domain socket secured by (multi-hop) ssh tunnel (say /var/lib/routerbus.sock)
      • tcp secured by CA framework certs (over private/link-local nic only, or via multi-hop ssh with some tcp port forwarded over 3922)
    • RPC type:
      • Cmd-Answer pattern, similar to mgmt server - agent
      • Streaming / websockets (async, event/bus based)
    • Data-Model:
      • Java → serialised JSON
      • ProtoBuf/IDL (define interface/messages in proto file, generate client and server stubs, interfaces, lib)
    • Database:
      • json/idl objects
      • sqlite3
  • Live Patching:
    • Use (multi-hop) ssh based VR agent patching
    • VR agent (or just cloud-agent) runs on all systemvms incl. ssvm/cpvm, and can facilitate things like patching, monitoring, and host info/stats/metrics gathering/reporting
  • Distro:
    • Debian based systemvmtemplate
    • Maybe OpenWrt or Alpine based, but the benefit relative to the cost is not large
  • New Multi-hop ssh based communication Utility:
    • A standard multi-hop ssh utility class/client that can allow management server to ssh into systemvms and VRs using proxy/jump:
      • VMware → (direct agent/ms) → VR private nic
      • XenServer → (direct agent/ms) → ssh into XS host on port 22 → proxy jump to VR link-local (alt. make use of XAPI vmops plugin)
      • KVM → (indirect agent/ms) → KVM host/agent → ssh/forwarding → VR link-local
    • Direct agents (VMware and XenServer) would ssh/scp from mgmt server to systemvms and VRs
    • Indirect agents (KVM) can ssh/scp from KVM hosts to systemvms and VRs
    • A hypervisor-agnostic wrapper, executed via agents, that handles Cmd/Answers and is unaware of, or at least reduces dependency on, hypervisor-specific code
    • Long term: deprecate and remove router_proxy scripts
  • Implement features in phases:
    • Implement and deliver parts in phases, across ACS versions
    • core services: iproute (nics, address, routes), firewall/acls, pf, guest networking
    • non-core services: dnsmasq/dhcp-dns, password+metadata server, haproxy/lb, vrrp/keepalived, vpn...
    • misc: health checks, monitoring, other ancillary scripts/cron/features
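
To make the Cmd-Answer idea over a unix domain socket more concrete, here is a minimal sketch in Python. It is not a proposed implementation: the command names (`ping`, `echo`), the one-JSON-object-per-line framing and the function names are all assumptions made for illustration. The demo binds the socket in a temporary directory so it runs without root; the path /var/lib/routerbus.sock mentioned above would be used the same way.

```python
import json
import os
import socket
import socketserver
import tempfile
import threading


# Hypothetical command handlers; names and payloads are illustrative only.
def handle_ping(params):
    return {"pong": True}


def handle_echo(params):
    return {"echo": params}


HANDLERS = {"ping": handle_ping, "echo": handle_echo}


def dispatch(command):
    """Turn one command dict into exactly one answer dict."""
    if not isinstance(command, dict):
        return {"cmd": None, "ok": False, "error": "command must be a JSON object"}
    name = command.get("cmd")
    handler = HANDLERS.get(name)
    if handler is None:
        return {"cmd": name, "ok": False, "error": "unknown command"}
    try:
        return {"cmd": name, "ok": True, "result": handler(command.get("params", {}))}
    except Exception as exc:  # report handler failures as an answer
        return {"cmd": name, "ok": False, "error": str(exc)}


class AgentHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Assumed framing: one JSON command per line, one JSON answer per line.
        for line in self.rfile:
            try:
                answer = dispatch(json.loads(line))
            except json.JSONDecodeError:
                answer = {"cmd": None, "ok": False, "error": "invalid JSON"}
            self.wfile.write(json.dumps(answer).encode() + b"\n")


def start_agent(sock_path):
    """Start the agent on a unix domain socket in a background thread."""
    if os.path.exists(sock_path):
        os.unlink(sock_path)
    server = socketserver.ThreadingUnixStreamServer(sock_path, AgentHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(sock_path, cmd, params=None):
    """Send one command and block until its answer arrives."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        request = {"cmd": cmd, "params": params or {}}
        client.sendall(json.dumps(request).encode() + b"\n")
        with client.makefile("rb") as reader:
            return json.loads(reader.readline())


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "routerbus.sock")
    agent = start_agent(path)
    print(send_command(path, "ping"))
    agent.shutdown()
    agent.server_close()
```

In this sketch the socket itself carries no authentication, which matches the outline above: access control would come from the ssh tunnel (or file permissions on the socket), not from the RPC layer.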

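The multi-hop ssh utility described above could, as one option, build on the OpenSSH client's `-J` (ProxyJump) flag. The sketch below only constructs the command line for the XenServer-style path (host on port 22, then the VR on port 3922) and does not execute it. The `Hop` class, the function name, the `root` user and the IP addresses are hypothetical placeholders, not existing CloudStack code.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    """One ssh endpoint; the defaults are assumptions for illustration."""
    host: str
    user: str = "root"
    port: int = 22

    def jump_spec(self) -> str:
        # Format accepted by `ssh -J`: [user@]host[:port]
        return f"{self.user}@{self.host}:{self.port}"


def build_ssh_argv(target: Hop, jumps: List[Hop], remote_cmd: List[str]) -> List[str]:
    """Build an OpenSSH argv that reaches `target` through zero or more jump hosts."""
    argv = ["ssh", "-o", "BatchMode=yes"]  # never prompt for a password
    if jumps:
        argv += ["-J", ",".join(hop.jump_spec() for hop in jumps)]
    argv += ["-p", str(target.port), f"{target.user}@{target.host}"]
    # The remote shell re-parses the command, so quote it as a single string.
    argv.append(shlex.join(remote_cmd))
    return argv


if __name__ == "__main__":
    # Placeholder addresses: a hypervisor host and a VR link-local IP.
    xs_host = Hop(host="10.0.0.5", port=22)
    vr = Hop(host="169.254.1.10", port=3922)
    print(build_ssh_argv(vr, [xs_host], ["ip", "addr", "show"]))
```

The resulting list can be handed to `subprocess.run(argv)`. With an empty `jumps` list the same function covers the direct case (for example VMware, where the management server reaches the VR private nic without a proxy).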
Architecture and Design description

API and Schema Changes

Service Layer Changes

UI Changes

None

Marvin Tests

For this feature, the following test cases or QA considerations must be made:

  • CRUD tests as user or root admin for all features listed above in the API section