[BUG] VMM/PMM State Corruption #342

**Describe the bug**
There is a bug in the VMM/PMM where it corrupts its state by invalidating a frame that is still mapped somewhere else.

**To Reproduce**
1. The testcase `test_kernel_task_func2` demonstrates the issue (#341)
2. Set `KT2_ROUNDS` to 1 and `KT2_CHUNKS` to 1000, to rule out that the issue is caused by the unmapping/deallocation
3. The testcase will either trigger an assertion in the PMM or lock up
4. (optional) Fix the `srand` seed to a value for which you repeatedly observe the assertion
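
As a rough illustration of step 4, the sketch below shows why a fixed seed makes the failing allocation pattern repeatable. It uses the hosted C library `srand`/`rand`, which may differ from the kernel's own implementation. `FIXED_SEED` and `fill_chunk_sizes` are hypothetical names, and the chunk sizes are only a stand-in for whatever the testcase actually randomizes.

```c
#include <stdlib.h>

/* Hypothetical seed value; replace with one that reproduces the assertion. */
#define FIXED_SEED 0x1234u

/* Seed the PRNG and draw n pseudo-random sizes in the range [1, 4096].
 * With the same seed, the same sequence is produced on every call. */
static void fill_chunk_sizes(unsigned int seed, int *sizes, size_t n) {
    srand(seed);
    for (size_t i = 0; i < n; i++)
        sizes[i] = rand() % 4096 + 1;
}
```

Calling `fill_chunk_sizes(FIXED_SEED, ...)` twice yields identical arrays, so a run that hits the assertion can be replayed as long as nothing else influences the allocation order.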

**Expected behavior**
The testcase should pass. Throwing more memory at the issue, by increasing `.qemu_config:QEMU_RAM`, seemingly has no impact.

**Additional context**
After a certain number of allocations, `mfn_invalid` suddenly starts to return false. This is caused by the underlying `mbi_get_memory_range` call. If you print out the `_start` and `_end` variables, you will notice that these ranges change shortly before the assertion gets triggered. However, the elements of `multiboot_mmap` are never actively modified once initialized.

<details>
<summary>Patch:</summary>

```diff
diff --git a/arch/x86/boot/multiboot.c b/arch/x86/boot/multiboot.c
--- a/arch/x86/boot/multiboot.c
+++ b/arch/x86/boot/multiboot.c
@@ -234,12 +234,15 @@ int mbi_get_avail_memory_range(unsigned index, addr_range_t *r) {
 
 int mbi_get_memory_range(paddr_t pa, addr_range_t *r) {
     paddr_t _start, _end;
+    printk("%s()\n", __func__);
 
     for (unsigned int i = 0; i < multiboot_mmap_num; i++) {
         multiboot2_memory_map_t *entry = &multiboot_mmap[i];
 
         _start = _paddr(entry->addr);
         _end = _paddr(_start + entry->len);
+        printk("start: 0x%lx\n", _start);
+        printk("end:   0x%lx\n", _end);
 
         if (pa >= _start && pa < _end)
             goto found;
```
</details>

<details>
<summary>Results:</summary>

```
mbi_get_memory_range()
start: 0x0
end:   0x9fc00
start: 0x9fc00
end:   0xa0000
start: 0xf0000
end:   0x100000
start: 0x100000
end:   0xffe0000
mbi_get_memory_range()
start: 0xffffffffff000
end:   0x1fffffffffe000
start: 0xffffffffff000
end:   0x1fffffffffe000
start: 0xffffffffff000
end:   0x1fffffffffe000
start: 0xffffffffff000
end:   0x1fffffffffe000
start: 0xffffffffff000
end:   0x1fffffffffe000
start: 0xffffffffff000
end:   0x1fffffffffe000
start: 0xffffffffff000
end:   0x1fffffffffe000
************** PANIC **************
CPU[1]: BUG in tmp_map_mfn() at line 45
***********************************
```
</details>
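
Since the printed ranges change although `multiboot_mmap` is never written on purpose, one way to catch the first moment of corruption could be to keep a copy of the map taken right after initialization and compare against it at suspicious points. The following is a minimal hosted-C sketch of that idea, not kernel code: `mmap_entry_t` is a simplified stand-in for `multiboot2_memory_map_t`, and `mmap_snapshot_take`, `mmap_snapshot_first_diff`, and `MMAP_SNAPSHOT_MAX` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for multiboot2_memory_map_t (assumption):
 * only the two fields read by mbi_get_memory_range() are modeled. */
typedef struct {
    uint64_t addr;
    uint64_t len;
} mmap_entry_t;

#define MMAP_SNAPSHOT_MAX 32 /* hypothetical upper bound on map entries */

static mmap_entry_t mmap_snapshot[MMAP_SNAPSHOT_MAX];
static unsigned int mmap_snapshot_num;

/* Copy the memory map once, right after it has been initialized. */
static void mmap_snapshot_take(const mmap_entry_t *map, unsigned int num) {
    assert(num <= MMAP_SNAPSHOT_MAX);
    memcpy(mmap_snapshot, map, num * sizeof(*map));
    mmap_snapshot_num = num;
}

/* Return the index of the first entry that differs from the snapshot,
 * or -1 if the map is intact. A changed entry count is reported as 0. */
static int mmap_snapshot_first_diff(const mmap_entry_t *map, unsigned int num) {
    if (num != mmap_snapshot_num)
        return 0;
    for (unsigned int i = 0; i < num; i++) {
        if (map[i].addr != mmap_snapshot[i].addr ||
            map[i].len != mmap_snapshot[i].len)
            return (int) i;
    }
    return -1;
}
```

In the kernel, a check of the form `BUG_ON(mmap_snapshot_first_diff(...) != -1)` placed after each allocation might narrow down which call overwrites the map, assuming the snapshot itself lives in memory that is not affected.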

To better localize the issue, we added many `BUG_ON(mfn_invalid(mfn))` checks to the pagetable code. The two interesting checks are the following: the first one passes, while the second one triggers the assertion:

```diff
diff --git a/arch/x86/pagetables.c b/arch/x86/pagetables.c
--- a/arch/x86/pagetables.c
+++ b/arch/x86/pagetables.c
@@ -250,7 +250,9 @@ static mfn_t get_pgentry_mfn(mfn_t tab_mfn, pt_index_t index, unsigned long flag
         mfn = frame->mfn;
         set_pgentry(entry, mfn, flags);
         tab = tmp_map_mfn(mfn);
+        BUG_ON(mfn_invalid(tab_mfn));
         clean_pagetable(tab);
+        BUG_ON(mfn_invalid(tab_mfn));
     }
     else {
         /* Page table already exists but its flags may conflict with our. Maybe fixup */
```

Based on this and the values that `_start` and `_end` take, it seems that `clean_pagetable` cleans a frame that is used by the memory manager itself. In other words, the PMM seems to hand out a frame that is already in use. We have not found the exact cause of this yet.
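
To test this hypothesis, one option might be to check every frame the PMM returns against the physical range that holds the memory manager's own data, such as `multiboot_mmap`. The sketch below shows only the overlap test itself in hosted C. `phys_range_t`, `frame_overlaps_range`, and `SKETCH_PAGE_SHIFT` are hypothetical names, and 4 KiB base frames are an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB base frames */

/* Half-open physical address range [start, end). */
typedef struct {
    uint64_t start;
    uint64_t end;
} phys_range_t;

/* Return true if the frame starting at mfn, spanning 2^order base pages,
 * shares at least one byte with the protected range r. */
static bool frame_overlaps_range(uint64_t mfn, unsigned int order, phys_range_t r) {
    uint64_t frame_start = mfn << SKETCH_PAGE_SHIFT;
    uint64_t frame_end = frame_start + ((uint64_t) 1 << (SKETCH_PAGE_SHIFT + order));

    return frame_start < r.end && r.start < frame_end;
}
```

If such a check fired directly after a frame allocation, it would point at the allocator's bookkeeping rather than at `clean_pagetable` itself.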


Props to Sandro Rüegge (@sparchatus) for helping me debug this.
