Why and when you should use SoA
2016-03-25- What is SoA?
- Why is SoA useful?
- Isn't SoA awkward to use?
- Implementing SoA in D
- When to use SoA
- Isn't SoA premature optimization?
What is SoA?
SoA just means Structure of arrays
//AoS: Array of structures
struct Vec2{
float x;
float y;
}
Array!Vec2 vectors;
//SoA: Structure of arrays
struct Vec2{
float[] x;
float[] y;
}
Why is SoA useful?
Imagine you want to write a small udp game server
with a client server architecture
. You will have a server where clients can connect to. The server
needs to remember which clients are currently connected. The server polls messages with recvfrom
and in case you are not familiar with udp recvfrom
returns the packet that was sent to the port where to socket was bound to and the address.
When a packet comes in the first thing you probably want to know is if the packet came from a connected client. You would be inclined to write it like this:
struct Server{
struct RemoteClient{
Address address;
SysTime lastReceivedPacket;
//more data
}
Array!RemoteClient remoteClients;
void poll(){
//Address address
//recvfrom(buffer, address);
}
}
If we want to know which client has sent the package we can just use the remoteClients
array to find the correct remoteClient
. The problem is that we need to iterate over RemoteClient
but we are only really interested in the address field. That means we needlessly load all the other data like lastReceivedPacket
even if we don't need it.
And if you are curious how much data could be inside a RemoteClient
in a real world application, here is the struct of Enet Peer. It might not be the fairest comparison because it is a Peer
and not a RemoteClient
but it should illustrate the point that your data might grow fairly large.
typedef struct _ENetPeer
{
ENetListNode dispatchList;
struct _ENetHost * host;
enet_uint16 outgoingPeerID;
enet_uint16 incomingPeerID;
enet_uint32 connectID;
enet_uint8 outgoingSessionID;
enet_uint8 incomingSessionID;
ENetAddress address; /**< Internet address of the peer */
void * data; /**< Application private data, may be freely modified */
ENetPeerState state;
ENetChannel * channels;
size_t channelCount; /**< Number of channels allocated for communication with peer */
enet_uint32 incomingBandwidth; /**< Downstream bandwidth of the client in bytes/second */
enet_uint32 outgoingBandwidth; /**< Upstream bandwidth of the client in bytes/second */
enet_uint32 incomingBandwidthThrottleEpoch;
enet_uint32 outgoingBandwidthThrottleEpoch;
enet_uint32 incomingDataTotal;
enet_uint32 outgoingDataTotal;
enet_uint32 lastSendTime;
enet_uint32 lastReceiveTime;
enet_uint32 nextTimeout;
enet_uint32 earliestTimeout;
enet_uint32 packetLossEpoch;
enet_uint32 packetsSent;
enet_uint32 packetsLost;
enet_uint32 packetLoss; /**< mean packet loss of reliable packets as a ratio with respect to the constant ENET_PEER_PACKET_LOSS_SCALE */
enet_uint32 packetLossVariance;
enet_uint32 packetThrottle;
enet_uint32 packetThrottleLimit;
enet_uint32 packetThrottleCounter;
enet_uint32 packetThrottleEpoch;
enet_uint32 packetThrottleAcceleration;
enet_uint32 packetThrottleDeceleration;
enet_uint32 packetThrottleInterval;
enet_uint32 pingInterval;
enet_uint32 timeoutLimit;
enet_uint32 timeoutMinimum;
enet_uint32 timeoutMaximum;
enet_uint32 lastRoundTripTime;
enet_uint32 lowestRoundTripTime;
enet_uint32 lastRoundTripTimeVariance;
enet_uint32 highestRoundTripTimeVariance;
enet_uint32 roundTripTime;
enet_uint32 roundTripTimeVariance;
enet_uint32 mtu;
enet_uint32 windowSize;
enet_uint32 reliableDataInTransit;
enet_uint16 outgoingReliableSequenceNumber;
ENetList acknowledgements;
ENetList sentReliableCommands;
ENetList sentUnreliableCommands;
ENetList outgoingReliableCommands;
ENetList outgoingUnreliableCommands;
ENetList dispatchedCommands;
int needsDispatch;
enet_uint16 incomingUnsequencedGroup;
enet_uint16 outgoingUnsequencedGroup;
enet_uint32 unsequencedWindow [ENET_PEER_UNSEQUENCED_WINDOW_SIZE / 32];
enet_uint32 eventData;
size_t totalWaitingData;
} ENetPeer;
Now let us see how it would look with SoA
.
struct Server{
struct RemoteClients{
size_t length;
Address[] address;
SysTime[] lastReceivedPacket;
//more data
}
RemoteClients remoteClients;
void poll(){
//Address address
//recvfrom(buffer, address);
}
}
We can access all addresses with remoteClients.address
and we don't need to load unnecessary data into the cache.
Isn't SoA awkward to use?
In most languages it is.
struct RemoteClients{
size_t length;
Address[] address;
SysTime[] lastReceivedPacket;
//more data
}
The definition is simplified because we need to allocate the arrays, grow them if we want to have dynamic arrays. We also need to worry about inserting and removing elements, it shouldn't happen that we only add an address to RemoteClients
without adding lastReceivedPacket
. That is because the data is loosely coupled. Previously with AoS
we could access the RemoteClient
with remoteClients[index]
but now we are accessing a RemoteClient
by its components
remoteClients.addresses[index]
and remoteClients.lastReceivedPacket[index]
.
Implementing SoA in D
First let us start with a demonstration.
struct Vec2{
float x;
float y;
}
auto s = SOA!(Vec2)();
s.insertBack(1.0f, 2.0f);
s.insertBack(Vec2(1.0, 2.0f));
writeln(s.x); // [1, 1]
writeln(s.y); // [2, 2]
We can still create a struct with our data, SOA
will then look at the struct and create the correct arrays internally. insertBack
is now a bit different from a normal array because internally we have as many arrays as we have fields. That means insertBack
needs to be variadic. Alternatively insertBack
could also accept the struct itself.
The following code is not intended to be production ready code, it is merely a proof of concept.
struct SOA(T){
import std.experimental.allocator;
import std.experimental.allocator.mallocator;
import std.meta: staticMap;
import std.typecons: Tuple;
import std.traits: FieldNameTuple;
alias toArray(T) = T[];
alias toType(string s) = typeof(__traits(getMember, T, s));
alias MemberNames = FieldNameTuple!T;
alias Types = staticMap!(toType, MemberNames);
alias ArrayTypes = staticMap!(toArray, Types);
this(size_t _size, IAllocator _alloc = allocatorObject(Mallocator.instance)){
alloc = _alloc;
size = _size;
allocate(size);
}
ref auto opDispatch(string name)(){
import std.meta: staticIndexOf;
alias index = staticIndexOf!(name, MemberNames);
static assert(index >= 0);
return containers[index];
}
void insertBack(Types types){
if(length == size) grow;
foreach(index, ref container; containers){
container[length] = types[index];
}
length = length + 1;
}
void insertBack(T t){
if(length == size) grow;
foreach(index, _; Types){
containers[index][length] = __traits(getMember, t, MemberNames[index]);
}
length = length + 1;
}
size_t length() const @property{
return _length;
}
~this(){
if(alloc is null) return;
foreach(ref container; containers){
alloc.dispose(container);
}
}
private:
void length(size_t len)@property{
_length = len;
}
Tuple!ArrayTypes containers;
IAllocator alloc;
size_t _length = 0;
size_t size = 0;
short growFactor = 2;
void allocate(size_t size){
if(alloc is null){
alloc = allocatorObject(Mallocator.instance);
}
foreach(index, ref container; containers){
container = alloc.makeArray!(Types[index])(size);
}
}
void grow(){
import std.algorithm: max;
size_t newSize = max(1,size * growFactor);
size_t expandSize = newSize - size;
if(size is 0){
allocate(newSize);
}
else{
foreach(ref container; containers){
alloc.expandArray(container, expandSize);
}
}
size = newSize;
}
}
alias toArray(T) = T[];
alias toType(string s) = typeof(__traits(getMember, T, s));
alias MemberNames = FieldNameTuple!T;
alias Types = staticMap!(toType, MemberNames);
alias ArrayTypes = staticMap!(toArray, Types);
MemberNames
are just the names of the fields. For example struct Vec2{float x; float y}
will have the type AliasSeq!("x", "y")
. toType
takes the member name and turns it into an actual type. In the example above toType!("x")
would return the type float
.
Now we can convert the member names into actual types with the help of staticMap
. In the example above AliasSeq!("x", "y")
would be transformed into AliasSeq!(float, float)
.
We are almost there we just now need to convert the types to arrays. AliasSeq!(float, float)
to AliasSeq!(float[], float[])
. We do this with toArray
and staticMap
After that we can create a tuple of arrays
Tuple!ArrayTypes containers;
Inserting elements is fairly easy now.
void insertBack()(Types types){
if(length == size) grow;
foreach(index, ref container; containers){
container[length] = types[index];
}
length = length + 1;
}
We already now what types insertBack
should accept. It should accept the types of the fields of the struct. We then iterate over containers
at compile time, which is our tuple of arrays.
Then we just access the correct argument
with types[index]
and insert it into the array.
We can also insert the struct itself.
void insertBack(T t){
if(length == size) grow;
foreach(index, _; Types){
containers[index][length] = __traits(getMember, t, MemberNames[index]);
}
length = length + 1;
}
We iterate over the Types to get the index
. We use index
to get the correct container and to find the correct field name of the struct. This works because the order is always the same.
The code above for Vec2
would roughly look like this
void insertBack(Vec2 t){
if(length == size) grow;
containers[0][length] = t.x;
containers[1][length] = t.y;
length = length + 1;
}
We can access the arrays with the field names. In D this is very easy to do with opDispatch
.
ref auto opDispatch(string name)(){
import std.meta: staticIndexOf;
alias index = staticIndexOf!(name, MemberNames);
static assert(index >= 0);
return containers[index];
}
In the example above for Vec2
we can get to the array of all x's with s.x
or all y's with 's.y'. opDispatch
would roughly look like this at compile time if we call s.x
.
ref auto opDispatch(){
import std.meta: staticIndexOf;
alias index = staticIndexOf!("x", MemberNames);
static assert(index >= 0);
return containers[index];
}
We just get the index of opDispatch
name
in MemberNames
, if it is not inside MemberNames
opDispatch
will fail. If it is inside MemberNames
we just access the correct container with the index.
struct Server{
struct RemoteClients{
Address address;
SysTime lastReceivedPacket;
//more data
}
SOA!RemoteClient remoteClients;
void poll(){
//Address address
//recvfrom(buffer, address);
}
}
When to use SoA
Firs ot all SoA
is not a silver bullet and it doesn't mean you should replace AoS
with SoA
everywhere in your code base.
SoA
makes sense if:
- You know that you want to store your data in an array.
- You want partial access to the data.
But sometimes you still want to access all components of your data. An example would be a vector.
struct Vec3{
float x;
float y;
float z;
}
Most operations will use all components anyways like add, subtract, dot, length and many more. And even if you sometimes end up with
struct Vec3{
float x;
float y;
float z;
}
Array!Vec3 positions;
positions[].filter!(v => v.x < 10.0f);
and you want to filter all vectors where the x component
is less than 10.0f
, you will still only load two additional floats. Also a Vec3
struct won't get bigger in time, other data structures might grow and become a bottleneck in the future.
Isn't SoA premature optimization?
In my opinion it is not. The problem with AoS
is that if it becomes a performance bottleneck in the future, you will have to refactor a lot of code. For example you might want to pack your data into a struct hot and cold like this:
struct Bar{
struct Hot{
Data1 d1;
Data2 d2;
...
}
struct Cold{
Data3 d3;
Data4 d4;
...
}
Hot* hot;
Cold* cold;
}
but depending on the language you will still have to refactor a lot of code. It might save you some trouble to think about your data access early on.
Jonathan Blow has a language demonstration that covers SoA and anonymous variables.
Quick note: Jonathan Blow
's using
is very similar to alias this
in D.
SoA
isn't much worse compared to AoS
depending on the language you use.
//AoS
remoteClients[index].address;
//vs
//SoA
remoteClients.address[index];
But SoA
scales much better because you can partially access your data without needlessly loading unrelevant data into your cache.