friffri
I thought I'd give an update on this.
A bit after that post I implemented privilege separation (previously everything ran in kernel mode) and a lot of syscalls (very boring work, by the way).
After that I started implementing a shell, but unfortunately I ran into a bug which had me stuck for a very long time and I only managed to fix it now.
It was a very weird one: assembly instructions were not working correctly. Imagine running a line of code like "x = 0" and seeing immediately afterwards that "x" was not zero.
I thought it was a bug in QEMU. I wanted to test whether this was true, and the best way seemed to be to finally get my OS working on real hardware.
So I loaded my OS onto an SD card, inserted it into my Raspberry Pi and ... it didn't even start. I couldn't even get serial port output, and without a debugger the best I could do was think really hard, compile and try. Eventually I managed to get the system to print single characters, but my code was not behaving correctly at all and I couldn't even get simple loops to work.
That sent me on a side-quest within the side-quest: I wanted to get a physical debugger, connect it to the physical Raspberry Pi and see what the hell was going on. I found a website ( https://sysprogs.com/VisualKernel/tutorials/raspberry/jtagsetup/ ) which explains how to do that for debugging the Linux kernel: not exactly my case, but the procedure is the same. I got a J-Link, hooked it up and had this monstrosity on my desk:
After I connected this thing once, I realized it was so fragile that it was only a matter of time until I threw everything in the trash. This whole setup worked exactly once and I was never able to replicate it afterwards.
I decided to learn a bit of KiCad and design a Pi HAT that would act as an adapter between the ribbon cable and the Raspberry Pi. I got it made by JLCPCB and here it is:
Much tidier now! I did make a mistake on the PCB, but luckily it was fixable with a solder bridge.
With a hardware debugger I was finally able to see what was going on: it turns out my code was getting misplaced in memory, so all jumps and memory loads were messed up. Once I figured that out it was an easy fix and my code started. Now I had 2 more problems: my SD card doesn't seem to work on real hardware. I haven't fixed that yet, mostly because it's very annoying to debug: the SD card uses the same pin as the debugger, so once you initialize the SD card hardware the debugger connection gets dropped. As a workaround I embedded the entire disk image inside the kernel itself.
So I could now test whether my problem was really a bug in QEMU or something else. Turns out ... QEMU was right: I had the same issue on real hardware too. I didn't know what to do, I had no ideas left, and I decided to do something else with my life for a month or two.
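For anyone curious what "embedding the disk image in the kernel" can look like, here is a minimal sketch in C. It is not my actual code: the names `disk_image`, `BLOCK_SIZE` and `ramdisk_read_block` are made up for the example, and the tiny two-block array stands in for a real image, which you would generate from the image file at build time (for example with `xxd -i`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u

/* Hypothetical stand-in for the real disk image. In a kernel build this
 * array would be generated from the image file instead of written by hand. */
static const uint8_t disk_image[2 * BLOCK_SIZE] = {
    [0] = 0xEB,          /* first byte of block 0 */
    [BLOCK_SIZE] = 0x42, /* first byte of block 1 */
};

/* Copy one block from the embedded image into `out`.
 * Returns 0 on success, -1 if the block number is past the end of the image. */
int ramdisk_read_block(uint32_t block, uint8_t *out)
{
    if (block >= sizeof disk_image / BLOCK_SIZE)
        return -1;
    memcpy(out, disk_image + (size_t)block * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}
```

The filesystem code can then call something like `ramdisk_read_block` wherever it would otherwise talk to the SD card driver, so nothing touches the SD hardware and the debugger connection stays up.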
That was until I asked for help on another forum and someone immediately figured out that the CPU cache was probably messed up. They were right: adding a cache flush instruction after switching processes immediately fixed the issue.
A one-line fix. I feel very dumb for having wasted so much time on something like this.
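To give an idea of where such a fix sits, here is a rough sketch in C. It is only an illustration under assumptions: `struct process`, `switch_to` and `flush_count` are invented names, and I use the GCC/Clang builtin `__builtin___clear_cache` as a portable stand-in for the real cache maintenance instruction, which depends on the CPU and is not shown here.

```c
#include <stddef.h>

/* Hypothetical process descriptor: only the code range matters here. */
struct process {
    char *code_start;
    char *code_end;
};

static struct process *current; /* process currently running */
static unsigned flush_count;    /* only here to make the sketch observable */

/* Stand-in for the real cache flush. A kernel would issue the
 * architecture's own cache maintenance instruction(s) at this point. */
static void flush_caches_for(const struct process *p)
{
    __builtin___clear_cache(p->code_start, p->code_end);
    flush_count++;
}

/* Switch to `next`, then flush so the CPU does not keep using stale
 * cached contents that belonged to the previous process. */
void switch_to(struct process *next)
{
    current = next;
    flush_caches_for(next);
}
```

The point is just the ordering: the flush happens on every switch, right after the new process becomes current and before any of its code runs.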