Evolution
FlatBuffers enables a schema to evolve over time while still maintaining forward and backward compatibility with old serialized buffers.
Some rules must be followed to ensure the evolution of a schema is valid.
Rules
Adding new tables, vectors, or structs to the schema is always allowed. It is only
when you add a new field to a table that certain rules must be followed.
Addition
New fields MUST be added to the end of the table definition.
This allows older data to still be read correctly (giving you the default value of the added field if accessed).
Older code will simply ignore the new field in the flatbuffer.
You can ignore this rule if you use the id attribute on all the fields of a
table.
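For instance, here is a sketch (with hypothetical field names) of a table whose fields all carry explicit id attributes; once every field has an id, declaration order is free, since the id fixes each field's slot:

```
table Monster {
  hp:short (id: 1);
  name:string (id: 0);
}
```

Ids must form a consecutive range starting at 0, and a newly added field must take the next unused id.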
Removal
You MUST NOT remove a field from the schema, even if you don't use it anymore. Simply stop writing it to the buffer.
It is encouraged to mark the field deprecated by adding the deprecated
attribute. This skips the generation of accessors and setters in the code,
enforcing that the field is no longer used.
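As a short sketch, deprecating a field instead of removing it:

```
table T {
  a:int (deprecated);
  b:int;
}
```

The deprecated field keeps its slot in the table layout, so the ids and offsets of the remaining fields are unaffected.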
Name Changes
It is generally OK to change the names of tables and fields, as names are not serialized to the buffer. It may, however, break code that would then have to be refactored to use the updated name.
Examples
The following examples use a base schema and attempt to evolve it a few times.
The versions are tracked as V1, V2, etc., and CodeV1 means code compiled
against the V1 schema.
Table Evolution
Let's start with a simple table T with two fields.
table T {
a:int;
b:int;
}
First, let's extend the table with a new field.
table T {
a:int;
b:int;
c:int;
}
This is OK. CodeV1 reading V2 data will simply ignore the presence of the
new field c. CodeV2 reading V1 data will get a default value (0) when
reading c.
table T {
a:int (deprecated);
b:int;
c:int;
}
This is OK, removing field a via deprecation. CodeV1, CodeV2 and CodeV3
reading V3 data will now always get the default value of a, since it is not
present. CodeV3 cannot write a anymore. CodeV3 reading old data (V1 or
V2) will not be able to access the field anymore, since its accessors are no
longer generated.
Add a new field, but this time at the beginning.
table T {
c:int;
a:int;
b:int;
}
This is NOT OK, as it makes V2 incompatible. CodeV1 reading V2 data
will access a but will read c data.
CodeV2 reading V1 data will access c but will read a data.
Remove a field from the schema.
table T {
b:int;
}
This is NOT OK. CodeV1 reading V2 data will access a but read b data.
CodeV2 reading V1 data will access b but will read a data.
Let's add a new field at the beginning, but use id attributes.
table T {
c:int (id: 2);
a:int (id: 0);
b:int (id: 1);
}
This is OK. The new field is declared at the beginning, but because every
field carries an explicit id attribute, the physical layout is unchanged and
compatibility is preserved.
Let's change the types of the fields.
table T {
a:uint;
b:uint;
}
This is MAYBE OK, and only when the new type has the same width. It is
tricky if the V1 data contained any negative numbers, so this should be done
with care.
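To see why the same-width caveat matters, here is a small sketch in plain Python (independent of the FlatBuffers runtime): a scalar field is stored as fixed-width little-endian bytes, and retyping int as uint simply reinterprets those bytes.

```python
import struct

# A FlatBuffers int is 4 little-endian bytes; changing the field type to
# uint does not rewrite old buffers, so readers reinterpret the same bytes.
old_bytes = struct.pack("<i", -1)                  # V1 writer stored int32 -1
reinterpreted = struct.unpack("<I", old_bytes)[0]  # V2 reader sees a uint32
print(reinterpreted)  # 4294967295
```

Non-negative values round-trip unchanged; only negative values silently change meaning.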
Let's change the default values of the existing fields.
table T {
a:int = 1;
b:int = 2;
}
This is NOT OK. Any V1 data that did not have a value written to the
buffer relied on the generated code to provide the default value, so CodeV2
would now read different values (1 and 2) from the same old buffers.
There MAY be cases where this is OK, if you control all the producers and consumers, and you can update them in tandem.
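A minimal sketch of why (plain Python, not the FlatBuffers runtime; read_a is a hypothetical stand-in for a generated accessor): a field equal to its default is typically not written to the buffer at all, and the reader's generated code supplies its own compiled-in default when the field is absent.

```python
# Hypothetical model of a generated accessor: fields absent from the
# buffer fall back to the default compiled into the reading code.
def read_a(buffer_fields, default_a):
    return buffer_fields.get("a", default_a)

v1_buffer = {}  # V1 writer: a == 0 was the default, so it was not written

print(read_a(v1_buffer, default_a=0))  # CodeV1 reads 0, as intended
print(read_a(v1_buffer, default_a=1))  # CodeV2 reads 1: old data changed meaning
```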
Let's change the names of the fields.
table T {
aa:int;
bb:int;
}
This is generally OK. Renaming fields will break all code and JSON files that use this schema, but those can be refactored without affecting the binary data, since the binary addresses fields only by id and offset, not by name.
Union Evolution
Let's start with a simple union U with two members.
union U {
A,
B
}
Let's add another variant at the end.
union U {
A,
B,
another_a: A
}
This is OK. CodeV1 will simply not recognize the another_a variant.
Let's add another variant in the middle.
union U {
A,
another_a: A,
B
}
This is NOT OK. CodeV1 reading V2 data will interpret B as another_a.
CodeV2 reading V1 data will interpret another_a as B.
Let's add another variant in the middle, this time assigning explicit union discriminant values.
union U {
A = 1,
another_a: A = 3,
B = 2
}
This is OK. It behaves as if the variant were added to the end: the explicit discriminant values preserve the existing variants' type codes while allowing the new one to appear elsewhere lexically.
Version Control
FlatBuffers relies on new field declarations being added at the end, and on
earlier declarations not being removed but marked deprecated when needed. We
think this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the id attribute
mentioned above).
One place where this can be problematic, however, is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B already committed a new field also,
and just auto-merges the schema, the binary files are now invalid compared to
the new schema.
The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.
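As a sketch of that failure mode, suppose user A and user B each add a field taking the next free id, and the schemas are auto-merged:

```
table T {
  a:int (id: 0);
  b:int (id: 1);
  c:int (id: 2);  // added by user A
  d:int (id: 2);  // added by user B; duplicate id
}
```

flatc should reject the duplicate id, so the mistake is caught at compile time rather than silently corrupting data.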
Checking Conformity
To check that schemas have evolved properly, the flatc compiler has
an option to do just that:
--conform FILE
Where FILE is the base schema that the rest of the input schemas must evolve from.
It returns 0 if they are properly evolved; otherwise it returns a non-zero value
and reports why the schemas are not properly evolved.
As an example, the following checks whether schema_v2.fbs is properly evolved from
schema_v1.fbs.
flatc --conform schema_v1.fbs schema_v2.fbs