Solana Internals Part 3: The Transaction Processing Unit (TPU)

January 23, 2022

Solana recently experienced severe performance degradation due to network congestion. The TPS (number of transactions processed per second) dropped by orders of magnitude (from thousands to tens) for several hours.

Technically, the problem was caused by performance bugs in Solana, in particular in the transaction processing unit (TPU). During periods of market volatility, bots spray large volumes of duplicate spam transactions, which bogs down the TPU.

This article elaborates on the design of the TPU and highlights some intricacies.

Transactions

What’s included in a transaction?

When a user submits a transaction, it includes a precompiled representation of a sequence of instructions, called a “message”:

The message must be signed by one or more keypairs:

The resulting signatures are also included in the transaction and, together with the message content, are sent to the Solana cluster via RPCRequest:
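As a rough illustration of the structure described in this section, the sketch below models a transaction as a message plus a list of signatures. The type aliases and the has_expected_signature_count helper are simplified assumptions made for this example; the real definitions live in the Solana SDK and carry more fields (for example a message header and a recent blockhash).

```rust
// Simplified stand-ins for illustration only, not the SDK definitions.
type Pubkey = [u8; 32];
type Signature = [u8; 64];

/// One instruction: which program to call, on which accounts, with what data.
#[derive(Debug, Clone, PartialEq)]
struct CompiledInstruction {
    program_id_index: u8, // index into Message::account_keys
    accounts: Vec<u8>,    // indices into Message::account_keys
    data: Vec<u8>,        // opaque input bytes for the program
}

/// The precompiled sequence of instructions that gets signed.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    account_keys: Vec<Pubkey>,
    instructions: Vec<CompiledInstruction>,
}

/// The signatures travel together with the message they cover.
#[derive(Debug, Clone, PartialEq)]
struct Transaction {
    signatures: Vec<Signature>,
    message: Message,
}

impl Transaction {
    /// Hypothetical structural check: one signature per required signer.
    /// It does not verify the signatures cryptographically.
    fn has_expected_signature_count(&self, num_required_signers: usize) -> bool {
        self.signatures.len() == num_required_signers
    }
}
```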

The Transaction Processing Unit

Upon receiving a transaction, the TPU has three main stages to process it.

  1. fetch_stage batches input from a UDP socket and sends it to stage 2.
  2. sigverify_stage verifies whether the signature in the transaction is valid and sends the transaction to stage 3.
  3. banking_stage processes the verified transaction.

These three stages run on different threads and communicate via message passing over crossbeam_channel (a multi-producer multi-consumer channel).
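The thread-per-stage layout can be sketched with the standard library alone. This is a minimal illustration, not Solana's code: it uses std::sync::mpsc as a stand-in for crossbeam_channel, and the Packet type, its signature_ok flag, and the run_pipeline function are assumptions made for the example.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Hypothetical packet: a payload plus a flag standing in for a real signature check.
#[derive(Debug, Clone, PartialEq)]
struct Packet {
    payload: Vec<u8>,
    signature_ok: bool,
}

/// Runs three stages on separate threads and returns the payloads that reach the last one.
fn run_pipeline(input: Vec<Packet>) -> Vec<Vec<u8>> {
    let (packet_sender, packet_receiver) = channel::<Vec<Packet>>();
    let (verified_sender, verified_receiver) = channel::<Vec<Packet>>();

    // Stage 1: "fetch" groups incoming packets into small batches.
    let fetch = thread::spawn(move || {
        for batch in input.chunks(2) {
            packet_sender.send(batch.to_vec()).unwrap();
        }
        // packet_sender is dropped here, which closes the first channel.
    });

    // Stage 2: "sigverify" drops packets that fail the (fake) check.
    let sigverify = thread::spawn(move || {
        for batch in packet_receiver {
            let verified: Vec<Packet> =
                batch.into_iter().filter(|p| p.signature_ok).collect();
            if !verified.is_empty() {
                verified_sender.send(verified).unwrap();
            }
        }
    });

    // Stage 3: "banking" consumes verified batches until the channel closes.
    let banking = thread::spawn(move || {
        let mut processed = Vec::new();
        for batch in verified_receiver {
            for packet in batch {
                processed.push(packet.payload);
            }
        }
        processed
    });

    fetch.join().unwrap();
    sigverify.join().unwrap();
    banking.join().unwrap()
}
```

Each stage stops once the sender feeding it has been dropped, so shutdown propagates from the first stage to the last without an explicit exit flag.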

1. fetch_stage

The TPU creates a channel of unbounded capacity with (packet_sender, packet_receiver):

let (packet_sender, packet_receiver) = unbounded();
let fetch_stage = FetchStage::new_with_sender(
    transactions_sockets,
    tpu_forwards_sockets,
    tpu_vote_sockets,
    exit,
    &packet_sender,
    &vote_packet_sender,
    poh_recorder,
    tpu_coalesce_ms,
);

The fetch_stage reads the packets on the transaction sockets, and simply forwards them to the sigverify_stage using packet_sender.

2. sigverify_stage

The sigverify_stage receives the transaction packets from packet_receiver and uses TransactionSigVerifier to verify whether the signature in each packet is valid.

let (verified_sender, verified_receiver) = unbounded();
let sigverify_stage = {
    let verifier = TransactionSigVerifier::default();
    SigVerifyStage::new(packet_receiver, verified_sender, verifier)
};

It assumes each packet contains one transaction. The packets are verified in parallel using all available CPU cores (verification can also run on a GPU if one is available).

Note that the TPU creates another channel (verified_sender, verified_receiver), and it uses verified_sender to forward the verified transactions to the next stage (banking_stage).

The verifier is of significant interest. It not only verifies signatures but also filters out redundant packets and discards excess packets in order to improve performance. The fixes for the recent performance degradation were applied in this component.

It contains three steps:

  • deduper: the filter that removes duplicate transactions (typically sent by bots).
  • discard_excess_packets: the filter that discards excess packets from each IP address. It groups packets by IP address and allocates max_packets evenly across addresses.
  • verify_batches: uses ed25519_dalek to verify the message signatures in the packets that were not discarded in the previous steps.
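The first two filters can be sketched as follows. This is a simplified illustration under stated assumptions, not Solana's implementation: the Packet type is hypothetical, dedup uses an exact HashSet of packet bytes, and this version of discard_excess_packets takes one packet per address in turn until max_packets is reached.

```rust
use std::collections::{BTreeMap, HashSet, VecDeque};
use std::net::IpAddr;

/// Hypothetical packet: sender address plus raw bytes.
#[derive(Debug, Clone, PartialEq)]
struct Packet {
    addr: IpAddr,
    data: Vec<u8>,
}

/// Drops packets whose bytes have already been seen.
fn dedup(packets: Vec<Packet>, seen: &mut HashSet<Vec<u8>>) -> Vec<Packet> {
    packets
        .into_iter()
        .filter(|p| seen.insert(p.data.clone())) // insert returns false for repeats
        .collect()
}

/// Keeps at most `max_packets`, sharing the budget evenly across addresses
/// by taking one packet from each address per round.
fn discard_excess_packets(packets: Vec<Packet>, max_packets: usize) -> Vec<Packet> {
    let mut by_addr: BTreeMap<IpAddr, VecDeque<Packet>> = BTreeMap::new();
    for packet in packets {
        by_addr.entry(packet.addr).or_default().push_back(packet);
    }

    let mut kept = Vec::new();
    while kept.len() < max_packets {
        let mut took_any = false;
        for queue in by_addr.values_mut() {
            if kept.len() == max_packets {
                break;
            }
            if let Some(packet) = queue.pop_front() {
                kept.push(packet);
                took_any = true;
            }
        }
        if !took_any {
            break; // every queue is empty
        }
    }
    kept
}
```

With this scheme a single address that floods the validator cannot crowd out packets from quieter addresses, since each round gives every address at most one slot.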

The discard_excess_packets function is defined as:

The ed25519_dalek::PublicKey.verify function is defined as:

It takes a signature and a message as input, and verifies the signature with respect to the message using the key pair’s public key.

Note that the ed25519_dalek::PublicKey.verify function is non-trivial and subtle, and it is not audited.

3. banking_stage

The banking_stage creates a thread that runs in a loop and processes the received transactions batch by batch. The number of transactions in each batch is limited by:

const MAX_NUM_TRANSACTIONS_PER_BATCH: usize = 128;
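To make the batching concrete, here is a minimal sketch of splitting a list of verified transactions into batches of at most 128. The process_in_batches helper and its callback parameter are hypothetical names introduced for this example; the actual banking_stage loop does considerably more work per batch.

```rust
const MAX_NUM_TRANSACTIONS_PER_BATCH: usize = 128;

/// Hypothetical helper: hands the transactions to `process_batch` in slices
/// of at most MAX_NUM_TRANSACTIONS_PER_BATCH and returns the number of batches.
fn process_in_batches<T>(transactions: &[T], mut process_batch: impl FnMut(&[T])) -> usize {
    let mut batches = 0;
    for batch in transactions.chunks(MAX_NUM_TRANSACTIONS_PER_BATCH) {
        process_batch(batch);
        batches += 1;
    }
    batches
}
```

For example, 300 transactions would be processed as three batches of 128, 128, and 44.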

The banking_stage uses an important component called bank to load and execute transactions. The function is defined as:

For each transaction, the bank uses MessageProcessor to process the transaction message:

This method calls each instruction in the message over the set of loaded accounts.

For each instruction, it calls the program entrypoint and verifies that the result of the call does not violate the bank’s accounting rules.
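The call-then-verify pattern can be illustrated with a toy model. Everything in this sketch is an assumption made for the example: the Account, Instruction, and Entrypoint types, the two sample programs, and the single accounting rule checked here (the total number of lamports across the accounts must not change). The real bank enforces more rules than this.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Account {
    lamports: u64,
}

#[derive(Debug, PartialEq)]
enum InstructionError {
    ProgramFailed,
    UnbalancedInstruction,
}

/// Hypothetical program entrypoint: mutates accounts based on instruction data.
type Entrypoint = fn(&mut [Account], &[u8]) -> Result<(), ()>;

struct Instruction {
    entrypoint: Entrypoint,
    data: Vec<u8>,
}

fn total_lamports(accounts: &[Account]) -> u128 {
    accounts.iter().map(|a| u128::from(a.lamports)).sum()
}

/// Runs every instruction on a working copy and commits only if all succeed.
fn process_message(
    accounts: &mut Vec<Account>,
    instructions: &[Instruction],
) -> Result<(), InstructionError> {
    let mut working = accounts.clone();
    for ix in instructions {
        let before = total_lamports(&working);
        (ix.entrypoint)(working.as_mut_slice(), ix.data.as_slice())
            .map_err(|_| InstructionError::ProgramFailed)?;
        // Accounting rule for this sketch: a program may move lamports, not create them.
        if total_lamports(&working) != before {
            return Err(InstructionError::UnbalancedInstruction);
        }
    }
    *accounts = working;
    Ok(())
}

/// Sample program: moves data[0] lamports from account 0 to account 1.
fn transfer(accounts: &mut [Account], data: &[u8]) -> Result<(), ()> {
    let amount = u64::from(*data.first().ok_or(())?);
    if accounts.len() < 2 || accounts[0].lamports < amount {
        return Err(());
    }
    accounts[0].lamports -= amount;
    accounts[1].lamports += amount;
    Ok(())
}

/// Misbehaving sample program: creates a lamport out of thin air.
fn mint_from_nothing(accounts: &mut [Account], _data: &[u8]) -> Result<(), ()> {
    accounts.first_mut().ok_or(())?.lamports += 1;
    Ok(())
}
```

Because the changes are applied to a copy, a message containing mint_from_nothing leaves the original accounts untouched even when an earlier transfer in the same message succeeded.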

Internally, the bank creates an InvokeContext to execute each instruction:

let result = invoke_context.process_instruction(
    &instruction.data,
    &instruction_accounts,
    program_indices,
    &mut compute_units_consumed,
    timings,
);

Each transaction has a limited compute budget (by default 200_000 units), defined in ComputeBudget:

Executing an instruction involves many complications inside the bank, such as:

  • loading the specified programs
  • creating the rbpf VM to execute the BPF code
  • dealing with CPI (cross-program invocation) via syscalls
  • verifying that the called programs have not misbehaved
  • measuring the compute units consumed, etc.
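The compute-unit accounting from the last bullet can be sketched as a simple meter. This is an illustrative model, not the runtime's code: the ComputeMeter type, its consume method, and the ComputeBudgetExceeded error are names assumed for this example, while the 200_000 default comes from the budget mentioned earlier.

```rust
const DEFAULT_COMPUTE_UNITS: u64 = 200_000;

#[derive(Debug, PartialEq)]
struct ComputeBudgetExceeded;

/// Hypothetical meter tracking how many compute units a transaction has left.
struct ComputeMeter {
    remaining: u64,
}

impl ComputeMeter {
    fn new(limit: u64) -> Self {
        Self { remaining: limit }
    }

    /// Charges `units`; fails (and empties the meter) if the budget is exceeded.
    fn consume(&mut self, units: u64) -> Result<(), ComputeBudgetExceeded> {
        match self.remaining.checked_sub(units) {
            Some(left) => {
                self.remaining = left;
                Ok(())
            }
            None => {
                self.remaining = 0;
                Err(ComputeBudgetExceeded)
            }
        }
    }
}
```

In this model a transaction that has already spent 150_000 units can still afford a 50_000-unit operation but not a 60_000-unit one.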

We will elaborate on these details and the bank lifecycle in the next article.


Soteria audit

Soteria is founded by leading minds in the fields of blockchain security and software verification.

We are pleased to provide audit services to high-impact Dapps on Solana. Please visit soteria.dev or email contact@soteria.dev