Alternative Data Types
Rust's primitive data types can form the basis for more complex data types. These custom data types are provided for special scenarios that the primitive types alone cannot accommodate.
In the Polkadot SDK, a great example of this is the sp_arithmetic crate, which provides minimal, redefined primitives for basic numeric types, built specifically for Substrate-based blockchain runtimes. sp_core, another crate, provides predefined types for dealing with cryptographic primitives or large numbers. Let's explore how to access and use these data types, and why they're useful in the context of runtime development.
Anytime you feel lost, make sure to read the tip right below any new concepts that are introduced!
Using Larger Data Types
Representing very large numbers, such as a hash or an elliptic-curve derived public or private key, often requires up to 256 bits of data. However, the Rust standard library only supplies integer types up to u128, or 128-bit numbers. An alternative is to use an existing data structure, such as a String, but this could be costly to store and cumbersome to parse when reading it later.
These very large numbers can be accommodated by using external types. While these may be slower to process than primitives, they allow large values to be represented numerically, which is particularly useful in cryptographic contexts.
Some Substrate libraries, such as sp_core, provide primitives for dealing with these large numbers. Let's take a look at U256, a 256-bit integer type:
pub struct U256(pub [u64; 4]);
Because Rust has no built-in integer type wide enough to hold a 256-bit number, a custom data type has to be created. This particular type, U256, actually stores four u64 numbers.
In cryptographic contexts, this is especially useful, as we can now represent 256-bit numbers numerically.
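To make the idea of four u64 numbers acting as one integer more concrete, here is a minimal sketch. Wide256 is a hypothetical type written for this illustration only; it is not the U256 from sp_core, and the limb order (least significant first) is an assumption of this sketch. It shows how addition can carry from one 64-bit limb into the next.

```rust
/// Hypothetical 256-bit unsigned integer stored as four 64-bit limbs,
/// least significant limb first. Illustration only, not the sp_core type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Wide256(pub [u64; 4]);

impl Wide256 {
    /// Widen a u128 into the two lowest limbs.
    pub fn from_u128(value: u128) -> Self {
        Wide256([value as u64, (value >> 64) as u64, 0, 0])
    }

    /// Add two values limb by limb, propagating the carry upwards.
    /// Returns None if the result does not fit in 256 bits.
    pub fn checked_add(self, other: Self) -> Option<Self> {
        let mut limbs = [0u64; 4];
        let mut carry = false;
        for i in 0..4 {
            let (sum, overflow_a) = self.0[i].overflowing_add(other.0[i]);
            let (sum, overflow_b) = sum.overflowing_add(carry as u64);
            limbs[i] = sum;
            carry = overflow_a || overflow_b;
        }
        if carry {
            None
        } else {
            Some(Wide256(limbs))
        }
    }
}
```

Adding 1 to u128::MAX, which would overflow a u128, simply carries into the third limb here. Only a carry out of the fourth limb is reported as an overflow.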
"Deterministic" Data Types - Floating Point Numbers
Floating point numbers have presented issues in traditional computing for decades. To summarize, floating point calculations are not guaranteed to be deterministic, as different architectures may round intermediate results differently. For example, a 32-bit system may calculate a floating point result differently from a 64-bit system. This video goes into more depth about the issues presented here.
While in most cases an inaccuracy in the hundredths or thousandths place is not a terrible thing, when dealing with things such as balances in a blockchain, floating point rounding errors could easily result in different nodes calculating different balances and potentially never reaching consensus!
For this reason, floating point primitives, such as f32 and f64, cannot, and should not, be used in the context of the blockchain runtime.
sp_arithmetic provides data types for dealing with fractional values, such as numbers between zero and one, and allows them to be handled safely in the runtime. Unlike floating point arithmetic, which is not deterministic, these data types use fixed point arithmetic. Fixed point arithmetic provides a uniform, deterministic result, as it operates on integer parts of a whole rather than relying on the relative precision of floating point arithmetic.
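The "parts of a whole" idea can be sketched with plain integers. PartsPerBillion below is a hypothetical type invented for this example and is not the sp_arithmetic API; it only illustrates how a fraction can be applied to a balance with integer arithmetic, which gives the same result on every machine.

```rust
/// Hypothetical fixed point fraction, stored as parts per billion.
/// Illustration only; this is not a type from sp_arithmetic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartsPerBillion(u32);

const BILLION: u128 = 1_000_000_000;

impl PartsPerBillion {
    /// Build a fraction from a whole percentage, e.g. 25 means 25%.
    /// Values above 100 are clamped to 100%.
    pub fn from_percent(percent: u32) -> Self {
        PartsPerBillion(percent.min(100) * 10_000_000)
    }

    /// Multiply a balance by the fraction using integers only, rounding down.
    pub fn mul_floor(self, balance: u128) -> u128 {
        let parts = self.0 as u128;
        // Split the balance so the intermediate products cannot overflow.
        let whole = balance / BILLION;
        let rest = balance % BILLION;
        whole * parts + rest * parts / BILLION
    }
}

/// The classic floating point surprise: this comparison is false for f64.
pub fn float_sum_is_exact() -> bool {
    0.1_f64 + 0.2_f64 == 0.3_f64
}
```

With this sketch, PartsPerBillion::from_percent(25).mul_floor(1_000) is exactly 250, and any remainder is always rounded down in the same way, so every node computes the same balance.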
Negative Numbers - Unsigned Integers vs. Signed Integers
You may notice that Substrate specifically uses unsigned integers for many data types that represent a number. An example of this is BlockNumber, which is a u32, or the type for an account balance, which is usually represented as a u128.
Unsigned types cannot be negative, meaning that these primitives can only hold zero or positive integers. This is due to a few reasons:
- The notion of a negative balance does not exist for on-chain balances. Even for a BlockNumber, a negative block number is invalid and unreasonable in any scenario.
- u32 and other unsigned types give a higher positive bound than a type like i32. With BlockNumber as an example, this allows the total number of blocks a network could generate to be much higher than with a signed type.
If we used u8 for the BlockNumber type, how long would the chain run before it overflowed?
Give it some thought, and pick an answer! What would a smaller data type imply for something like BlockNumber, which the network uses to progress?
- It wouldn't overflow; the error would be handled
- After the limit of u8::MAX blocks
Correct answer: after the limit of u8::MAX blocks. u8::MAX (255) would be the highest block number the chain could represent, and thus would limit how many blocks can be generated.
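The bounds discussed above can be checked directly in Rust. The following is a small sketch; TinyBlockNumber and both functions are hypothetical names chosen for this example, not items from Substrate.

```rust
/// Hypothetical block number alias using a deliberately small type.
pub type TinyBlockNumber = u8;

/// Advance to the next block, returning None once the type's limit is reached.
pub fn next_block(current: TinyBlockNumber) -> Option<TinyBlockNumber> {
    current.checked_add(1)
}

/// How many additional positive values u32 can hold compared to i32.
pub fn extra_range_of_unsigned() -> u64 {
    u32::MAX as u64 - i32::MAX as u64
}
```

Here next_block(u8::MAX) returns None, since block 255 is the last one a u8 can represent. The u32 type can count 2,147,483,648 more values above zero than i32.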
Context-driven types
In runtime development, data types should be chosen with particular care. Because a runtime instance is anticipated to run for a long period of time, ideally without too many breaking upgrades, fundamental primitives and their underlying types must be able to withstand different scenarios or network load.
They can have a direct impact on the chain itself, from the state and how those types are stored, to the chain's continued operation and its ability to run for a long time without interruption.
A Deeper Look at Scalar & Compound Types
Type Aliases
Type aliases are used to shorten long, generic types. For example, the following is how one may access a balance from an interface exposed by the balances pallet in Substrate:
type BalanceOf<T> = <<T as Config>::Currency as Currency<<T as frame_system::Config>::AccountId>>::Balance;
This type alias shortens a longer type which is used to access the Balances interface from one of the most widely used pallets, pallet_balances. You may notice type aliasing being used quite frequently, as it greatly aids code readability and saves the trouble of typing!
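To see the alias resolve to a concrete type, here is a self-contained sketch. SystemConfig, Currency, Config, Runtime, and Balances are simplified, hypothetical stand-ins for the traits and types a real runtime would pull in from FRAME, and SystemConfig takes the place of frame_system::Config.

```rust
// Hypothetical, simplified stand-ins for the traits a pallet would import.
pub trait SystemConfig {
    type AccountId;
}

pub trait Currency<AccountId> {
    type Balance;
}

pub trait Config: SystemConfig {
    type Currency: Currency<Self::AccountId>;
}

/// The same shape of alias as above, written against the stand-in traits.
pub type BalanceOf<T> =
    <<T as Config>::Currency as Currency<<T as SystemConfig>::AccountId>>::Balance;

// A toy runtime wiring the pieces together.
pub struct Runtime;
pub struct Balances;

impl SystemConfig for Runtime {
    type AccountId = u64;
}

impl Currency<u64> for Balances {
    type Balance = u128;
}

impl Config for Runtime {
    type Currency = Balances;
}

/// BalanceOf<Runtime> resolves to u128, so ordinary integer methods apply.
pub fn double_balance(balance: BalanceOf<Runtime>) -> BalanceOf<Runtime> {
    balance.saturating_mul(2)
}
```

Without the alias, the signature of double_balance would have to spell out the full projection twice, which is exactly the noise that aliases such as BalanceOf remove.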
Regarding "Sized" Types
In Rust, all types are either sized or unsized. Sized is a trait which is implicitly implemented for every type with a known, constant size at compile time. In Rust, the notion of a type being Sized refers to whether or not the compiler is able to determine its size at compile time. Unsized types, also known as dynamically sized types, can only be used behind a pointer, such as a reference or a Box.
As stated previously, Rust is a statically typed language, meaning the types of variables (amongst other items within the language) must be known at compile time.
However, especially in more trait-oriented code bases, you may be dealing with dynamically sized types. While these are useful for writing polymorphic and scalable code, they can introduce some extra complexity when consuming those particular APIs.
All local variables, function parameters, const items, and static items must be Sized.
Luckily, a pointer type in Rust is always Sized. This is why we are able to declare a variable of type &str, but cannot declare one of type str:
let sized_str: &str = ""; // string literals always default to &str
If we dereference a &str such as sized_str, which will give us str, the compiler will throw an error indicating that it cannot possibly know the size of str:
let unsized_str: str = *""; // dereferencing a &str yields str, which is not Sized
error[E0277]: the size for values of type `str` cannot be known at compilation time
|
| let unsized_str: str = *"";
| ^^^^^^ doesn't have a size known at compile-time
|
str on its own is actually an unsized slice of u8, or [u8], that is guaranteed to hold valid UTF-8. This does not have a defined size, which is why &str must be used. &str is a pointer to the actual slice of bytes, stored together with the length of that slice, and this pointer is Sized.
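The difference between a sized pointer and the unsized value behind it can be observed with std::mem. The sketch below uses hypothetical helper names; the two-word size of &str reflects how current Rust compilers lay out such pointers (address plus length) and is treated here as an assumption rather than a language guarantee.

```rust
use std::mem::{size_of, size_of_val};

/// Accepts both sized and unsized values, because ?Sized relaxes the
/// implicit Sized bound on T. The value is only accessed through a reference.
pub fn size_in_bytes<T: ?Sized>(value: &T) -> usize {
    size_of_val(value)
}

/// A reference to a Sized type is one machine word wide.
pub fn thin_pointer_size() -> usize {
    size_of::<&u8>()
}

/// A &str carries the address of the bytes and the length of the slice.
pub fn str_pointer_size() -> usize {
    size_of::<&str>()
}
```

Calling size_in_bytes("hello") reports 5, the number of bytes in the str itself, which is only known at runtime. The &str pointing at it has the same size no matter how long the string is.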
Why does this matter?
This section may seem out of place, but later, when dealing with more exotic and dynamic types, such as those found in Substrate, it will aid in understanding the design decisions behind the various APIs that Substrate exposes. Trait objects (and their respective virtual tables), smart pointers, and other dynamically sized types all become more commonplace in bigger projects, where more design decisions center on the types of the APIs themselves.
Rust comes with different ways to deal with unsized types, which become increasingly commonplace in more generic codebases, where not all items are completely defined.
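As one example of such a dynamically sized type, a trait object like dyn Describe has no size known at compile time, so it is placed behind a smart pointer such as Box. The trait and structs below are hypothetical and exist only for this sketch.

```rust
/// Hypothetical trait used only for this illustration.
pub trait Describe {
    fn describe(&self) -> String;
}

pub struct Block(pub u32);
pub struct Account(pub u64);

impl Describe for Block {
    fn describe(&self) -> String {
        format!("block #{}", self.0)
    }
}

impl Describe for Account {
    fn describe(&self) -> String {
        format!("account {}", self.0)
    }
}

/// dyn Describe is unsized, so each value sits behind a Box, which is Sized.
/// The describe call is dispatched at runtime through the virtual table.
pub fn describe_all(items: &[Box<dyn Describe>]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}
```

A Vec<Box<dyn Describe>> can hold a Block and an Account side by side, even though the two structs have different sizes, because every element of the vector is a Box of the same size.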