Sequences
As of March 1, 2025, the sequence implementation has several critical limitations that impact system performance and scalability:
Unbounded Memory Growth: Sequence data for all workspaces is loaded into memory simultaneously, creating a direct correlation between memory usage and the number of workspaces. This approach becomes unsustainable as applications scale.
Prolonged Startup Times: During command processor initialization, a resource-intensive "recovery process" must read and process the entire PLog to determine the last used sequence numbers. This causes significant startup delays that worsen as event volume grows.
The proposed redesign addresses these issues through intelligent caching, background updates, and optimized storage mechanisms that maintain sequence integrity while dramatically improving resource utilization and responsiveness.
This document outlines the design for sequence number management within the Voedger platform.
A Sequence in Voedger is defined as a monotonically increasing series of numbers. The platform provides a unified mechanism for sequence generation that ensures reliable, ordered number production.
As of March 1, 2025, Voedger implements four specific sequence types using this mechanism:
PLogOffsetSequence: Tracks write positions in the PLog
Starts from 1
WLogOffsetSequence: Manages offsets in the WLog
Starts from 1
Motivation: allows all events of a workspace to be read with a SELECT query
CRecordIDSequence: Generates unique identifiers for CRecords
Starts from 322685000131072
Motivation:
Efficient CRecord caching on the DBMS side (Most CRecords reside in the same partition)
Simple iteration over CRecords
OWRecordIDSequence: Provides sequential IDs for ORecords/WRecords (OWRecords)
Starts from 322680000131072
Motivation: There are potentially a lot of such records, so it is not possible to read all of them with a SELECT query
As the Voedger platform evolves, the number of sequence types is expected to expand. Future development will enable applications to define their own custom sequence types, extending the platform's flexibility to meet diverse business requirements beyond the initially implemented system sequences.
These sequences ensure consistent ordering of operations, proper transaction management, and unique identification across the platform's distributed architecture. The design prioritizes performance and scalability by implementing an efficient caching strategy and background updates that minimize memory usage and recovery time.
[Singleton IDs](https://github.com/voedger/voedger/blob/ec85a5fed968e455eb98983cd12a0163effdc096/pkg/istructs/consts.go#L101)
Existing design

```go
const MinReservedBaseRecordID = 65535 + 1                        // 65536
const MaxReservedBaseRecordID = MinReservedBaseRecordID + 0xffff // 131071
const FirstSingletonID = MinReservedBaseRecordID                 // 65536
const MaxSingletonID = FirstSingletonID + 0x1ff                  // 66047, 512 singletons
```

`ClusterAsCRecordRegisterID`
Recovery on the first request into the workspace:

- The `istructs.IIDGenerator` instance is kept per WSID
- The `istructs.IIDGenerator` instance is tuned with the data from each event of the PLog:
  - For each CUD: `CUD.ID` is set as the current RecordID
- The event is saved after command execution
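The cost of this pass is what the redesign targets: every event in the PLog must be visited. A minimal sketch of the recovery pass, assuming simplified stand-ins for the istructs types (the real `IIDGenerator` and PLog-reading APIs are richer):

```go
package main

import "fmt"

// Simplified stand-ins for istructs types; illustrative only.
type RecordID uint64

type CUD struct{ ID RecordID }
type Event struct{ CUDs []CUD }

// idGenerator plays the role of istructs.IIDGenerator during recovery.
type idGenerator struct{ current RecordID }

// UpdateOnSync advances the generator past an ID observed in the PLog.
func (g *idGenerator) UpdateOnSync(id RecordID) {
	if id > g.current {
		g.current = id
	}
}

// recover replays every PLog event of the workspace so that the generator
// resumes after the highest RecordID ever issued — this full scan is
// exactly what makes startup slow as the event volume grows.
func (g *idGenerator) recover(plog []Event) {
	for _, event := range plog {
		for _, cud := range event.CUDs {
			g.UpdateOnSync(cud.ID)
		}
	}
}

func main() {
	gen := &idGenerator{}
	gen.recover([]Event{{CUDs: []CUD{{ID: 322685000131072}, {ID: 322685000131073}}}})
	fmt.Println("next RecordID continues after", gen.current)
}
```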
APs: Application Partitions
SequencesTrustLevel:

The `SequencesTrustLevel` setting determines how events and table records are written:

| SequencesTrustLevel | Events (PLog/WLog) | Records |
| --- | --- | --- |
| 0 | InsertIfNotExists | InsertIfNotExists |
| 1 | InsertIfNotExists | Put |
| 2 | Put | Put |
Note: `SequencesTrustLevel` is not used when `PutPlog()` is called to mark an event as corrupted; `Put()` is always used in this case.
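A small sketch of this mapping; the function names are hypothetical, only the level-to-mode table above comes from the design:

```go
package main

import "fmt"

type SequencesTrustLevel int

type WriteMode string

const (
	InsertIfNotExists WriteMode = "InsertIfNotExists"
	Put               WriteMode = "Put"
)

// eventWriteMode: PLog/WLog writes verify key absence unless the
// sequences are fully trusted (level 2).
func eventWriteMode(level SequencesTrustLevel) WriteMode {
	if level >= 2 {
		return Put
	}
	return InsertIfNotExists
}

// recordInsertMode: record inserts switch to plain Put already at level 1.
// Record updates ignore SequencesTrustLevel entirely.
func recordInsertMode(level SequencesTrustLevel) WriteMode {
	if level >= 1 {
		return Put
	}
	return InsertIfNotExists
}

func main() {
	for level := SequencesTrustLevel(0); level <= 2; level++ {
		fmt.Printf("level %d: events=%s records=%s\n",
			level, eventWriteMode(level), recordInsertMode(level))
	}
}
```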
As of March 1, 2025, record ID sequences may overlap, and only 5,000,000,000 IDs are available for OWRecords, since OWRecord IDs start from 322680000131072, while CRecord IDs start from 322685000131072.
Solutions:

- One sequence for all records:
  - Pros:
    - Clean for Voedger users
    - IDs are more human-readable
    - Simpler Command Processor
  - Cons:
    - CRecords are not cached efficiently
      - Solution: Let the State read copies of CRecords from sys.Collection, or possibly from an alternative optimized storage to handle large CRecord data
        - Cons: It raises the question of why CRecords are needed at all
        - Pros: Separation of write and read models
- Keep as is:
  - Pros:
    - Easy to implement
  - Cons:
    - No separation between write and read models
    - Only 5 billion OWRecord IDs (ClusterAsRegisterID < ClusterAsCRecordRegisterID)
      - Solution: Configure the sequencer to use multiple ranges to avoid collisions
        - Pros: Better control over sequences
Grafana snapshot: https://snapshots.raintank.io/dashboard/snapshot/zEW5AQHECtKLIcUeO2PJnmy3nkQDhp9m?orgId=0
SequencesTrustLevel = 0 was introduced to the Air performance testbench on 2025-04-29:

- Latency increased from 40 ms to 120 ms, with spikes up to 160 ms
- Testbench throughput dropped from 4,000 commands per second to 1,400 cps
- CPU usage decreased from 75% to 42%

Since CPU usage dropped along with throughput, we can normalize by the unused CPU headroom and make an educated guess that the maximum throughput would be reduced by a factor of 4000 / 1400 × 42 / 75 ≈ 1.6.
The proposed approach implements a more efficient and scalable sequence management system through the following principles:
Projection-Based Storage: Each application partition will maintain sequence data in a dedicated projection ??? (`SeqData`). SeqData is a map, which eliminates the need to load all sequence data into memory at once

Offset Tracking: SeqData will include a `SeqDataOffset` attribute that indicates the PLog partition offset for which the stored sequence data is valid, enabling precise recovery and synchronization

LRU Cache Implementation: Sequence data will be accessed through a least recently used (LRU) cache that keeps frequently accessed sequences in memory while allowing less active ones to be evicted

Background Updates: As new events are written to the PLog, sequence data will be updated in the background, ensuring that the system maintains current sequence values without blocking operations

Batched Writes: Sequence updates will be collected and written in batches to reduce I/O operations and improve throughput

Optimized Actualization: The actualization process will use the stored `SeqDataOffset` to process only events since the last known valid state, dramatically reducing startup times
This approach decouples memory usage from the total number of workspaces and transforms the recovery process from a linear operation dependent on total event count to one that only needs to process recent events since the last checkpoint.
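A data-shape sketch of the proposed projection, assuming illustrative field types (only the names `SeqData` and `SeqDataOffset` come from the principles above):

```go
package sequences

// Illustrative identifier types; the real Voedger types differ.
type (
	WSID   uint64 // workspace ID
	SeqID  uint16 // sequence kind: PLogOffset, WLogOffset, CRecordID, OWRecordID, ...
	Number uint64 // last issued number of a sequence
	Offset uint64 // PLog partition offset
)

// SeqData is the per-application-partition projection. Because it is a map,
// an LRU cache can load and evict individual workspace entries instead of
// holding sequence data for every workspace in memory at once.
type SeqData struct {
	Numbers map[WSID]map[SeqID]Number

	// SeqDataOffset is the PLog partition offset up to which Numbers are
	// valid; actualization replays only the events after this offset.
	SeqDataOffset Offset
}
```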
~tuc.VVMConfig.ConfigureSequencesTrustLevel~

VVMHost uses cmp.VVMConfig.SequencesTrustLevel.

~tuc.SequencesTrustLevelForPLog~

When the PLog is written, SequencesTrustLevel is used to determine the write mode.

Note: except for the "update corrupted" case.

~tuc.SequencesTrustLevelForWLog~

When the WLog is written, SequencesTrustLevel is used to determine the write mode.

Note: except for the case when the WLog event was already stored before (consider: PutWLog is called to re-apply the last event).

~tuc.SequencesTrustLevelForRecords~

When a record is inserted, SequencesTrustLevel is used to determine the write mode.

When a record is updated, nothing is done in connection with SequencesTrustLevel.
~tuc.StartSequencesGeneration~

When: CP starts processing a request

Flow:

- `sequencer, err := IAppPartition.Sequencer()`
- `nextPLogOffset, ok, err := sequencer.Start(wsKind, WSID)`
- If `!ok` (actualization is in progress, or the flushing queue is full): return 503 "server is busy"
~tuc.NextSequenceNumber~

When: After CP starts sequences generation

Flow:

- `sequencer.Next(sequenceId)`

~tuc.FlushSequenceNumbers~

When: After CP saves the PLog record successfully

Flow:

- `sequencer.Flush()`

~tuc.ReactualizeSequences~

When: After CP fails to save the PLog record

Flow:

- `sequencer.Actualize()`
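Taken together, the four use cases above form one per-request flow in the Command Processor. A sketch, assuming the Start/Next signatures shown in the flows; `savePLogEvent`, `errServerBusy`, and the SeqID constant are hypothetical:

```go
package commandprocessor

import "errors"

type (
	WSKind string
	WSID   uint64
	SeqID  uint16
	Number uint64
	Offset uint64
)

// ISequencer mirrors the contract used in the flows above.
type ISequencer interface {
	Start(wsKind WSKind, wsID WSID) (nextPLogOffset Offset, ok bool, err error)
	Next(seqID SeqID) (Number, error)
	Flush()
	Actualize()
}

var errServerBusy = errors.New("503: server is busy")

// processRequest shows where Start, Next, Flush and Actualize fit into the
// request lifecycle; error handling is illustrative.
func processRequest(seq ISequencer, wsKind WSKind, wsID WSID,
	savePLogEvent func(Offset) error) error {

	// tuc.StartSequencesGeneration
	plogOffset, ok, err := seq.Start(wsKind, wsID)
	if err != nil {
		return err
	}
	if !ok {
		// Actualization is in progress or the flushing queue is full.
		return errServerBusy
	}

	// tuc.NextSequenceNumber: e.g. draw the next record ID (SeqID 1 is illustrative).
	if _, err := seq.Next(SeqID(1)); err != nil {
		return err
	}

	if err := savePLogEvent(plogOffset); err != nil {
		// tuc.ReactualizeSequences: the PLog write failed, cancel and re-actualize.
		seq.Actualize()
		return err
	}

	// tuc.FlushSequenceNumbers: the event is saved, make the numbers durable.
	seq.Flush()
	return nil
}
```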
~tuc.InstantiateSequencer~

When: Partition with the `partitionID` is deployed

Flow:

- Instantiate the implementation of isequencer.ISeqStorage (appparts.internal.seqStorage, see below)
- Instantiate `sequencer := isequencer.New(*isequencer.Params)`
- Save `sequencer` so that it will be returned by IAppPartition.Sequencer()
~cmp.IAppPartition.Sequencer~

Description: Returns isequencer.ISequencer

Covers: tuc.StartSequencesGeneration

~cmp.VVMConfig.SequencesTrustLevel~

Covers: tuc.VVMConfig.ConfigureSequencesTrustLevel
Core:

- ~cmp.ISequencer~: Interface for working with sequences
- ~cmp.sequencer~: Implementation of the isequencer.ISequencer interface
- ~cmp.sequencer.Start~: Starts a Sequencing Transaction for the given WSID
- ~cmp.sequencer.Next~: Returns the next sequence number for the given SeqID
- ~cmp.sequencer.Flush~: Completes the Sequencing Transaction
- ~cmp.sequencer.Actualize~: Cancels the Sequencing Transaction and starts the Actualization process
Tests:

- ~test.isequencer.mockISeqStorage~: Mock implementation of isequencer.ISeqStorage for testing purposes
- ~test.isequencer.NewMustStartActualization~: isequencer.New() must start the Actualization process; Start() must return (0, false). Design: blocking hook in mockISeqStorage
- ~test.isequencer.Race~: If !t.Short(), run something like `go test ./... -count 50 -race`
Some edge case tests:

- ~test.isequencer.LongRecovery~:
  - Params.MaxNumUnflushedValues = 5 // Just a guess
  - For numOfEvents in [0, 10*Params.MaxNumUnflushedValues]:
    - Create a new ISequencer instance
    - Check that Next() returns correct values after recovery
- ~test.isequencer.MultipleActualizes~:
  - Repeat the { Start {Next} randomly( Flush | Actualize ) } cycle 100 times
  - Check that the system recovers well
  - Check that the sequence values increase monotonically
- ~test.isequencer.FlushPermanentlyFails~:
  - Recovery has worked, but then ISeqStorage.WriteValuesAndOffset() fails permanently
  - The first Start/Flush must be ok; some of the following Start calls must not be ok
  - Flow:
    - MaxNumUnflushedValues = 5
    - Recover
    - Mock an error on WriteValuesAndOffset
    - Start/Next/Flush must be ok
    - Loop Start/Next/Flush until Start() is not ok (around the 6th iteration, once unflushed values exceed the limit)
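A sketch of this flow as a Go test using testify's require; `newMockISeqStorage`, `failWriteValuesAndOffset`, `waitForActualization` and the `test*` constants are assumed helpers, and the Params field names follow this document but are not verified against pkg/isequencer:

```go
func TestFlushPermanentlyFails(t *testing.T) {
	storage := newMockISeqStorage() // assumed helper, see test.isequencer.mockISeqStorage
	seq := isequencer.New(&isequencer.Params{
		SeqStorage:            storage,
		MaxNumUnflushedValues: 5,
	})
	waitForActualization(seq) // recovery must complete before the mock starts failing

	storage.failWriteValuesAndOffset() // ISeqStorage.WriteValuesAndOffset now always errors

	sawBusy := false
	for i := 0; i < 10; i++ {
		_, ok, err := seq.Start(testWSKind, testWSID)
		require.NoError(t, err)
		if !ok {
			// Expected around the 6th iteration, once unflushed values exceed the limit.
			sawBusy = true
			break
		}
		_, err = seq.Next(testSeqID)
		require.NoError(t, err)
		seq.Flush()
	}
	require.True(t, sawBusy, "Start must eventually report busy")
}
```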
~cmp.ISeqStorageImplementation~: Implementation of the isequencer.ISeqStorage interface

- Package: cmp.appparts.internal.seqStorage

~cmp.ISeqStorageImplementation.New~

- Per App per Partition, by AppParts
- PartitionID is not passed to the constructor

~cmp.ISeqStorageImplementation.i688~

- If the existing number is less than ??? 322_680_000_000_000, do not send it to the batcher
- Uses the VVMSeqStorage Adapter

~cmp.VVMSeqStorageAdapter~: Adapter that reads and writes sequence data to the VVMStorage

- PLogOffset in Partition storage: `((pKeyPrefix_SeqStorage_Part, PartitionID), PLogOffsetCC(0))`
- ~cmp.VVMSeqStorageAdapter.KeyPrefixSeqStoragePart~
- ~cmp.VVMSeqStorageAdapter.KeyPrefixSeqStoragePart.test~
- ~cmp.VVMSeqStorageAdapter.PLogOffsetCC~
- ~cmp.VVMSeqStorageAdapter.PLogOffsetCC.test~

Numbers: `((pKeyPrefix_SeqStorage_WS, AppID, WSID), SeqID)`

- ~cmp.VVMSeqStorageAdapter.KeyPrefixSeqStorageWS~
- ~cmp.VVMSeqStorageAdapter.KeyPrefixSeqStorageWS.test~
Method:

Test for Record:

- Create a new VIT instance on an owned config with VVMConfig.TrustedSequences = false
- Insert a doc to get the last recordID: simply exec `c.sys.CUD` and get the ID of the new record
- Corrupt the storage: insert a conflicting key that will be used on creating the next record:
  - `VIT.IAppStorageProvider.AppStorage(test1/app1).Put()`
  - Build `pKey`, `cCols` for the record, using the just-inserted recordID+1
  - The value does not matter; let it be `[]byte{1}`
- Try to insert one more record using `c.sys.CUD`
- Expect a panic

Test for PLog and WLog offsets: the same tests, but sabotage the storage by building keys for the event.
Tests:

- ~it.SequencesTrustLevel0~: Integration test for SequencesTrustLevel = 0
- ~it.SequencesTrustLevel1~: Integration test for SequencesTrustLevel = 1
- ~it.SequencesTrustLevel2~: Integration test for SequencesTrustLevel = 2
- ~it.BuiltInSequences~: Test for initial values: PLogOffsetSequence, WLogOffsetSequence, CRecordIDSequence, OWRecordIDSequence
Design process:

History:

- CP creates a new istructs.IIDGenerator instance
- IIDGenerator.UpdateOnSync is called
- The istructs.IIDGenerator instance is provided to IEvents.PutPlog()
- istructs.IIDGenerator.Next() is called to convert rawID -> realID for the ODoc in arguments and for each resulting CUD
- partitionID is calculated using the request WSID and the number of partitions declared in the AppDeploymentDescriptor
- Handle - Initial requirements and discussion