Spectre and Cache
We all know that it takes much time when CPU accessing memory considering CPU caculation speed. In order to reduce the large amount of time consumption caused by CPU access,
This article will not introduce the exact structure of memory, if you get interested in it, please refer to this.
Let’s assume that our computer will do this operation:
tmp = Array[x];
CPU is trying to get
Array[x] from cache first instead of memory, and here are two situations:
Array[x]is in the cache, so CPU gets its value.
Array[x]is not in the cache, so CPU needs to access memory to get its value, which will cost more time.
Branch Prediction and Speculative Execution
Modern CPUs are multi-core and multi-threaded, that is, support parallel execution of instructions. Therefore, if there is a branch jump condition that is related to a data, but the cache does not hit the data and can only perform the memory fetch operation, the program will not serially block to wait for the end of the memory fetch to perform the branch jump, but will attempt to perform branch prediction and speculative execution. When the memory data read by the CPU comes back, the CPU then confirms according to the content of the data and the logic of the branch condition whether its speculative execution is valid, and if it is invalid, the calculation result is discarded and restored to the previous state. If effective, continue execution, which greatly improves the efficiency of the CPU.
But here is the thing: If the result of speculative execution is discarded, the code executed during speculative execution may still affect the CPU cache. When the CPU is restoring the state, the cache will not be restored, and the CPU will only restore the state of the related registers.
Let’s look at this demo:
// a1_size and a2 are not in the cache
In the demo above, CPU knows that it is a conditional judgment, and because
a1_size is not in the cache, CPU needs to access memory to get the value.
Then, CPU saves a checkpoint, and executes instructions behind speculatively, which means that CPU treats
idx as a variable that is smaller than
idx is large enough, it will cause OOB reading, and
a1[idx] will be treated as an index of
a2 and CPU will access memory based on
a1[idx]. Finally accessed data in
a2 will be submitted from the memory to the CPU cache.
a1_size is read from the memory, the CPU checks the branch condition again, and it finds that it does not meet the condition and cannot continue execution. So it will discard the current calculation result, then restore to the previous state, and continue to execute the code of another branch. But the data in
a2 has been submitted to the CPU cache.
The demo may be a little bit different from real PoC, but it’s easier for us to understand.
How does PoC work?
The content we want to leak is defined:
char* secret = "The Magic Words are Squeamish Ossifrage.";
malicious_x is the shift of
secret relative to
size_t malicious_x = (size_t)(secret - (char *)array1); /* default for malicious_x */
int score, len = strlen(secret);
valuestores the character with the highest hit (the most likely character)
valuestores the character with the second highest hit
scorestores the number of hits with the highest character
scorestores the number of hits for the second highest character
Guess the data byte by byte:
while (--len >= 0)
Let’s look inside
readMemoryByte to see what happens.
In this function, we will try 1000 times to guess the answer, and break if the confidence condition is met:
if (results[j] >= (2 * results[k] + 5) || (results[j] == 2 && results[k] == 0))
At the beginning, we need to flush
array2 from cache:
for (i = 0; i < 256; i++)
Then there is another loop of 30 times. In this loop, it can be regarded as 5 6-cycles. The first 5 cycles are used to train the CPU, and then the conditional branch prediction is triggered for the last:
/* 30 loops: 5 training runs (x=training_x) per attack run (x=malicious_x) */
After these 30 cycles, through side-channel attacks, the access time of each item in
array2 is counted, and the ASCII value of the string is deduced:
/* Time reads. Order is lightly mixed up to prevent stride prediction */
At last, we need to find the highest and second highest results:
/* Locate highest & second-highest results results tallies in j/k */
victim_function, it’s similiar to the demo mentioned above:
uint8_t temp = 0; /* Used so compiler won't optimize out victim_function() */
The first 5 cycles are essentially a kind of training for the CPU. Therefore, when the branch judgment of the sixth cycle causes speculative execution, the CPU is more inclined to choose the “no cross-border” branch, and therefore speculative execution. After the boundary check fails, the speculative execution result is discarded, but the value of
array2[array1[x] * 512] has been stored in the cache. When reading the content of
array2[mix_t * 512], if you find that the reading time is abnormally fast, you can determine that
mix_t is the answer.
CPU Spectre is a novel cache-level attack method. However, this attack method also has its limitations, that is, the attacker needs to have a greater degree of freedom in the entire memory allocation and process control, and it is difficult to find such an exploit at the application level.