VVM Orchestration Design
Overview
Orchestration in the context of clusters refers to the automated coordination, scheduling, and management of distributed workloads to optimize resource utilization, ensure reliability, and maintain desired state across multiple computing units.
Problem Statement
Design a reliable orchestration mechanism for the VVM (Voedger Virtual Machine) that ensures:
- VVM goroutines work only if leadership is acquired and held
- Clean termination of all goroutines
- Concurrent-safe error handling
- Graceful shutdown capabilities
Functional design
Actors
- VVMHost: Application that starts and manages VVM
- VVM
Use VVM
VVMHost creates a VVM instance and launches it. VVM acquires leadership and starts services. VVMHost waits for vvmProblemCtx and shuts down VVM.
// Create VVM instance
myVVM := vvm.Provide(...)
// Launch VVM
vvmProblemCtx := myVVM.Launch(leadershipDurationSeconds, leadershipAcquisitionDuration)
// Wait for `vvmProblemCtx` and optionally for other events like os.Interrupt, syscall.SIGTERM, syscall.SIGINT
...
// Shutdown VVM
// Might be called immediately after myVVM.Launch()
err := myVVM.Shutdown()
Technical design
Implementation requirements
- Clear ownership and cleanup responsibilities
- All error reporting must use VVM.updateProblem
- All algorithms must be finite
- No active goroutines should remain after VVM.Shutdown (except killerRoutine)
- No data races
- Each channel shall be closed exactly once
- Predictable error propagation
- No goroutine leaks (except the intentional killerRoutine)
Components
- pkg/vvm
  - ~VVMConfig.Orch~covrd1✅
    type NumVVM uint32
    type VVMConfig struct {
        ...
        NumVVM NumVVM // amount of VVMs in the cluster. Default 1
        IP     net.IP // current IP of the VVM. Used as the value for leadership elections
    }
- pkg/ielections
  - Purpose: describe and implement the interface to acquire and manage leadership for a given key
  - ~IElections~covrd2✅ - Interface to acquire and manage leadership for a given key
  - ~ITTLStorage~covrd3✅ - Interface with methods InsertIfNotExist(), CompareAndSwap(), CompareAndDelete(). To be injected into the IElections implementation (a possible shape is sketched just below)
  - ~elections~covrd4✅ - Implementation of IElections
  - ~ElectionsTestSuite~covrd5✅ - Single test function that runs multiple tests against IElections. It will be used from the components that provide ITTLStorage (pkg/vvm/storage)
  - ~ttlStorageMock~covrd6✅ - Mock implementation of ITTLStorage
  - ~ElectionsTest~covrd7✅ - Test that uses ElectionsTestSuite and ttlStorageMock to test elections
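A possible Go shape of ITTLStorage, inferred only from the method names above and the elections.ITTLStorage[TTLStorageImplKey, string] usage in pkg/vvm/storage; the TTL units and return values are assumptions, not the authoritative declaration:

// Illustrative sketch of ITTLStorage (not the authoritative declaration).
type ITTLStorage[K any, V any] interface {
    // InsertIfNotExist stores (key, value) with the given TTL only if the key is absent;
    // reports whether the insert happened.
    InsertIfNotExist(key K, value V, ttlSeconds int) (bool, error)
    // CompareAndSwap replaces oldValue with newValue and renews the TTL,
    // but only if the currently stored value equals oldValue.
    CompareAndSwap(key K, oldValue V, newValue V, ttlSeconds int) (bool, error)
    // CompareAndDelete deletes the key only if the currently stored value equals value.
    CompareAndDelete(key K, value V) (bool, error)
}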
- pkg/vvm/storage
  - ~ISysVvmStorage~covrd8✅ - Interface to work with the sysvvm keyspace
  - ~TTLStorageTest~covrd9✅ - Test ITTLStorage using the mem provider for IAppStorage
  - ~NewElectionsTTLStorage~covrd10✅ - Implementation of ITTLStorage
    - NewElectionsTTLStorage(ISysVvmStorage) elections.ITTLStorage[TTLStorageImplKey, string] - uses keyspace(sysvvm) and keys prefixed with pKeyPrefix_VVMLeader = 1 (see the key-layout sketch just below)
    - Encapsulates the possible values of pKeyPrefix
  - ~ElectionsByDriverTests~covrd11✅
  - ~KeyPrefix_VVMLeader~covrd16✅ - Prefix to store leadership data in keyspace(sysvvm)
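As a reading aid, one way such an implementation could lay out its keys is sketched below. Only the prefix constant, the key type, and the keyspace come from this design; the package placement, function name, and byte layout are assumptions.

package storage // illustrative placement; the real package layout may differ

import "encoding/binary"

// TTLStorageImplKey identifies a leadership slot; the design chooses the
// leadership key in the [1, NumVVM] interval.
type TTLStorageImplKey uint32

// pKeyPrefix_VVMLeader is the prefix for leadership data in keyspace(sysvvm).
const pKeyPrefix_VVMLeader = 1

// buildPKey prepends pKeyPrefix_VVMLeader so that leadership records never
// collide with other data kept in the same keyspace. The 4+4 byte layout is
// an assumption for illustration only.
func buildPKey(key TTLStorageImplKey) []byte {
    pKey := make([]byte, 8)
    binary.BigEndian.PutUint32(pKey[:4], pKeyPrefix_VVMLeader)
    binary.BigEndian.PutUint32(pKey[4:], uint32(key))
    return pKey
}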
- pkg/vvm/impl_orch.go, pkg/vvm/impl_orch_test.go
  - orchestration implementation and tests
VVM
VVM fields:
- problemCtx. Closed by VVM.updateProblem(err) (e.g. on leadership loss or service failure)
- problemErrCh. Channel that receives the error describing the problem, written only once. Its content is returned on Shutdown()
- problemErrOnce sync.Once. Ensures problemErrCh is written only once
- vvmShutCtx. Closed when the VVM should be stopped (Shutdown() is called outside)
- servicesShutCtx. Closed when VVM services (all but LeadershipMonitor) should be stopped. It is the context for services: servicesShutCtx closed -> services pipeline is stopped. Closed when Shutdown() is called outside
- monitorShutCtx. Closed after all services are stopped and LeadershipMonitor should be stopped
- shutdownedCtx. Closed after everything (services and LeadershipMonitor) is stopped
- leadershipCtx. Context for watching leadership. Closed when leadership is lost
- numVVM. Number of VVMs in the cluster
- ip. IP address of the VVM
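A possible Go shape of this state, purely as a reading aid; the field names follow this design, while the types and the *Cancel companions are assumptions, not the actual declaration:

package vvm // illustrative placement

import (
    "context"
    "net"
    "sync"
)

type NumVVM uint32 // as in VVMConfig above

// Sketch of the VVM state described above (not the actual declaration).
type VVM struct {
    problemCtx            context.Context // closed by updateProblem(err)
    problemCtxCancel      context.CancelFunc
    problemErrCh          chan error // receives the problem error, written only once
    problemErrOnce        sync.Once  // guards the single write to problemErrCh
    vvmShutCtx            context.Context // closed when Shutdown() is called outside
    vvmShutCtxCancel      context.CancelFunc
    servicesShutCtx       context.Context // closed to stop services (all but LeadershipMonitor)
    servicesShutCtxCancel context.CancelFunc
    monitorShutCtx        context.Context // closed to stop LeadershipMonitor after services stop
    monitorShutCtxCancel  context.CancelFunc
    shutdownedCtx         context.Context // closed when services and LeadershipMonitor are stopped
    shutdownedCtxCancel   context.CancelFunc
    leadershipCtx         context.Context // closed when leadership is lost
    numVVM                NumVVM          // number of VVMs in the cluster
    ip                    net.IP          // IP address of the VVM
    electionsCleanup      func()          // stored by tryToAcquireLeadership, called by Shutdowner
}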
The error propagation follows these principles:
- Single error channel (problemErrCh) for reporting critical issues
- Write-once semantics using sync.Once
- Non-blocking error reads during shutdown
- Thread-safe error updates via updateProblem()
Goroutine hierarchy:
- Main (VVMHost)
  - Launcher
    - LeadershipMonitor
      - KillerRoutine
    - ServicePipeline
    - Shutdowner
Each goroutine's lifecycle is controlled by a dedicated context cancellation (except KillerRoutine).
VVM.Provide()
~VVM.Provide~covrd17✅
- wire vvm.VVM: construct apps, interfaces, Service Pipeline. Do not launch anything
VVM.Shutdown()
~VVM.Shutdown~covrd18✅
- not launched -> panic
- close(VVM.vvmShutCtx)
- Wait for VVM.shutdownedCtx
- Return error from VVM.problemErrCh, non-blocking
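A minimal sketch of this sequence, reusing the field names from the VVM sketch above; the cancel function stands in for "close(VVM.vvmShutCtx)", and the "not launched -> panic" check is omitted:

// Sketch of Shutdown() following the steps above.
func (vvm *VVM) Shutdown() error {
    vvm.vvmShutCtxCancel()     // "close(VVM.vvmShutCtx)": asks the Shutdowner to run
    <-vvm.shutdownedCtx.Done() // wait for VVM.shutdownedCtx
    select {
    case err := <-vvm.problemErrCh: // non-blocking read of the problem error
        return err
    default:
        return nil
    }
}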
VVM.Launch() problemCtx
~VVM.LaunchVVM~covrd19✅
- launched already -> panic
- vvmProblemCtx := VVM.Launch(leadershipDurationSeconds, leadershipAcquisitionDuration)
- go Shutdowner
- err := tryToAcquireLeadership()
  - construct IElections and store electionsCleanup() in VVM
  - use IElections
- if err == nil
  - err = servicePipeline
- if err != nil
  - call VVM.updateProblem(err)
- return VVM.problemCtx
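The steps above could translate into a Launch of roughly the following shape; the parameter types and the servicePipeline helper are assumptions, and the called routines are sketched in their own sections below:

// Sketch of Launch() following the steps above; the "launched already -> panic"
// check is omitted.
func (vvm *VVM) Launch(leadershipDurationSeconds int, leadershipAcquisitionDuration time.Duration) context.Context {
    go vvm.shutdowner() // Shutdowner waits for vvmShutCtx
    err := vvm.tryToAcquireLeadership(leadershipDurationSeconds, leadershipAcquisitionDuration)
    if err == nil {
        err = vvm.servicePipeline() // start the services; hypothetical helper
    }
    if err != nil {
        vvm.updateProblem(err)
    }
    return vvm.problemCtx
}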
Shutdowner: go VVM.shutdowner()
~VVM.Shutdowner~covrd20✅
- Wait for VVM.vvmShutCtx
- Shutdown everything but LeadershipMonitor and elections
  - close VVM.servicesShutCtx and wait for services to stop
- Shutdown LeadershipMonitor (close VVM.monitorShutCtx and wait for LeadershipMonitor to stop)
- Cleanup elections: VVM.electionsCleanup()
  - // Note: all goroutines will be stopped and the leadership will be released
- Close VVM.shutdownedCtx
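A sketch of the Shutdowner goroutine along these lines; the waitForServicesStopped and waitForMonitorStopped helpers are hypothetical stand-ins for the "wait for ... to stop" steps:

// Sketch of the Shutdowner goroutine following the steps above.
func (vvm *VVM) shutdowner() {
    <-vvm.vvmShutCtx.Done() // wait for VVM.vvmShutCtx

    // shutdown everything but LeadershipMonitor and elections
    vvm.servicesShutCtxCancel()
    vvm.waitForServicesStopped() // hypothetical: wait for services to stop

    // shutdown LeadershipMonitor
    vvm.monitorShutCtxCancel()
    vvm.waitForMonitorStopped() // hypothetical: wait for LeadershipMonitor to stop

    // cleanup elections: all goroutines are stopped, the leadership is released
    vvm.electionsCleanup()

    vvm.shutdownedCtxCancel() // close VVM.shutdownedCtx
}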
LeadershipMonitor: go VVM.leadershipMonitor()
~LeadershipMonitor~covrd21✅
- wait for any of:
  - VVM.leadershipCtx (leadership loss)
    - go killerRoutine
      - ~processKillThreshold~covrd22✅: leadershipDurationSeconds/4
      - After processKillThreshold seconds it kills the process
      - // Never stopped: the process must exit and the goroutine must die
      - // Yes, this is the anti-pattern "Goroutine/Task/Thread Leak"
    - VVM.updateProblem(leadershipLostErr)
    - break
  - VVM.monitorShutCtx
    - break
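A sketch of the monitor and the deliberately leaked KillerRoutine, building on the VVM sketch above; leadershipLostErr and the use of os.Exit are assumptions about how "kills the process" could look:

// leadershipLostErr is a hypothetical sentinel error for this sketch.
var leadershipLostErr = errors.New("VVM lost its leadership")

// Sketch of the LeadershipMonitor goroutine following the steps above.
func (vvm *VVM) leadershipMonitor(leadershipDurationSeconds int) {
    select {
    case <-vvm.leadershipCtx.Done(): // leadership loss
        go killerRoutine(leadershipDurationSeconds / 4) // processKillThreshold; intentionally never stopped
        vvm.updateProblem(leadershipLostErr)
    case <-vvm.monitorShutCtx.Done():
    }
}

// killerRoutine kills the process if it is still alive after processKillThresholdSeconds.
// It is never stopped: either the process exits first, or this goroutine ends it.
func killerRoutine(processKillThresholdSeconds int) {
    time.Sleep(time.Duration(processKillThresholdSeconds) * time.Second)
    os.Exit(1)
}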
VVM.tryToAcquireLeadership(leadershipDurationSeconds, leadershipAcquisitionDuration)
~VVM.tryToAcquireLeadership~covrd23✅
- Try to acquire leadership during leadershipAcquisitionDuration
  - obtain an instance of elections.IElections: Interface to acquire and manage leadership for a given key
  - store VVM.electionsCleanup
  - Leadership key is chosen in the [1, VVM.numVVM] interval
  - Leadership value is VVM.ip
  - Do not wait for servicesShutCtx because Launch is a blocking method
- If leadership is acquired
  - Set VVM.leadershipCtx
  - go LeadershipMonitor
- Else return leadershipAcquisitionErr
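A heavily simplified sketch of the acquisition loop. The newElections constructor and the AcquireLeadership method are hypothetical stand-ins, since this section does not spell out the IElections API, and leadershipAcquisitionErr is assumed to be a sentinel error:

// leadershipAcquisitionErr is a hypothetical sentinel error for this sketch.
var leadershipAcquisitionErr = errors.New("leadership could not be acquired")

// Sketch of tryToAcquireLeadership following the steps above.
func (vvm *VVM) tryToAcquireLeadership(leadershipDurationSeconds int, leadershipAcquisitionDuration time.Duration) error {
    elections, cleanup := newElections() // hypothetical: obtain elections.IElections over ITTLStorage
    vvm.electionsCleanup = cleanup       // store VVM.electionsCleanup

    deadline := time.Now().Add(leadershipAcquisitionDuration)
    for time.Now().Before(deadline) {
        // the leadership key is chosen in the [1, VVM.numVVM] interval, the value is VVM.ip
        for key := uint32(1); key <= uint32(vvm.numVVM); key++ {
            leadershipCtx := elections.AcquireLeadership(key, vvm.ip.String(), leadershipDurationSeconds)
            if leadershipCtx != nil { // leadership acquired
                vvm.leadershipCtx = leadershipCtx
                go vvm.leadershipMonitor(leadershipDurationSeconds)
                return nil
            }
        }
        time.Sleep(time.Second) // retry until leadershipAcquisitionDuration elapses
    }
    return leadershipAcquisitionErr
}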
VVM.updateProblem(err)
~VVM.updateProblem~covrd24✅
- synchronized via VVM.problemErrOnce
- Close VVM.problemCtx
- Write error to VVM.problemErrCh using VVM.problemErrOnce
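For illustration, the write-once semantics could look like this; problemErrCh is assumed to be buffered with capacity 1 so the send never blocks:

// Sketch of updateProblem following the steps above.
func (vvm *VVM) updateProblem(err error) {
    vvm.problemErrOnce.Do(func() {
        vvm.problemCtxCancel()  // close VVM.problemCtx
        vvm.problemErrCh <- err // write the error exactly once; read non-blocking in Shutdown()
    })
}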
Experiments with LLMs
- Claude
Test design
Automatic
- Basic
  - ~VVM.test.Basic~covrd25✅
    - provide and launch VVM1
    - wait for successful VVM1 start
    - provide and launch VVM2
    - wait for VVM2 start failure
- Automatic shutdown on leadership loss
  - ~VVM.test.Shutdown~covrd26✅
    - provide and launch a VVM
    - update the ttlstorage: modify the single value
    - bump the mock time
    - expect the VVM to shut down
- Cancel the leadership on manual shutdown
  - ~VVM.test.CancelLeadership~covrd27✅
    - provide and launch a VVM
    - shut it down on the launcher side
    - expect that the leadership is canceled
Manual testing research
- airs-bp3/rsch/20250226-orch
Flow
- scylla.sh
  - Start scylla
- bp3_1.sh
  - Start the first bp3 instance, it takes the leadership
  - docker pull untillpro/airs-bp:alpha
- bp3_2.sh
  - Start the second bp3 instance, it waits for the leadership
- bp3_1_stop.sh
  - bp3_1 stops
  - bp3_2 takes the leadership
- bp3_1.sh
  - bp3_1 waits for the leadership
- bp3_2_stop.sh
  - bp3_2 stops
  - bp3_1 takes the leadership