#linuxcnc-devel Logs
Jul 01 2020
#linuxcnc-devel Calendar
03:30 PM cerna: Is somebody working on fixing this https://github.com/LinuxCNC/linuxcnc/blob/9ad9e729976b45daa53cfea71902ebd068436daa/tests/rtapi_printf.0/test.sh#L3 test?
04:44 PM jepler: what's the problem?
04:52 PM jepler: https://github.com/LinuxCNC/linuxcnc/pull/897
04:57 PM cerna: C++11: https://pastebin.com/e11zCpf2 on older distros: http://buildbot.linuxcnc.org/buildbot/builders/1301.rip-precise-rtai-i386/builds/6149/steps/runtests/logs/stdio
04:58 PM cerna: Ah, you have found it already.
05:16 PM jepler: thanks for the mention, even if it was vague
05:29 PM andypugh: I am making progess tracking down this RTAI thing.
05:29 PM andypugh: I am at least now seeing clues.
05:29 PM andypugh: https://paste.ubuntu.com/p/pVyW7mWWvG/
05:30 PM Tom_L: yay!
05:31 PM andypugh: jepler: I am a bit concerned about Streamer. It calls switch_rtai_tasks a _lot_
05:34 PM andypugh: Sorry. I mean streamer: https://paste.ubuntu.com/p/S6WZRPzDrH/
05:35 PM andypugh: No, I mean Sampler…
05:35 PM Tom_L: :)
05:36 PM andypugh: But what bothers me is various modules get “unused” in the middle of all that.
05:36 PM jepler: the instruction 48 0f ae 0a decodes as fxrstor64 [rdx]
05:37 PM jepler: which is basically "restore SSE "floating point" registers from the buffer with address in rdx"
05:37 PM andypugh: That is what objdump has siggested too.
05:37 PM jepler: > RDX: ffff9ee4b6183f28
05:37 PM jepler: that's not an obviously bad address, so it must be a subtly wrong address
05:38 PM jepler: hmm how far off the stack pointer is it? RSP: 0000:ffff9ee4b6183eb0
05:38 PM jepler: 120 bytes greater than RSP
05:39 PM andypugh: I am confused that, whilst objdump leads me to that instruction too: https://paste.ubuntu.com/p/dRkHKJQdFb/
05:39 PM andypugh: the printk statements at the top of that module are not seen
05:40 PM jepler: > If a memory operand is not aligned on a 16-byte boundary, regardless of segment. (See alignment check exception [#AC] below.)
05:40 PM jepler: OK, this is another stack fuckery
05:40 PM jepler: the pointer value in rdx is not a multiple of 16, its hex ends ....8
05:40 PM andypugh: Yes, even I can see that
05:41 PM andypugh: Where is that quote from?
05:41 PM jepler: https://www.felixcloutier.com/x86/fxrstor
05:41 PM jepler: it is a reason for this instruction to give "#GP(0)" general protection fault
05:41 PM andypugh: W00t!
05:42 PM andypugh: (Sorry, this is the first progress in months of staring blankly at this)
05:42 PM jepler: you sound pretty excited for someone whose computer keeps crashing and rebooting
05:42 PM Tom_L: what detectives we have here!
05:42 PM andypugh: I know we don’t have a fix, but we do have an identifiable problem.
05:43 PM jepler: well, we know the kernel sets flags that lead to 8 byte stack alignment but that to get 64-bit math you need 16 byte stack alignment; we went through it with the other rtai fork
05:45 PM jepler: I forget what I ended up suggesting, except that I believed it needed to be applied in rtai and included in the CFLAGS that rtai provides for use building kernel modules. Something in the vicinity of: -mstackrealign -mincoming-stack-boundary=3 -mpreferred-stack-boundary=4
05:46 PM jepler: can I just say, it's obvious nobody is using this code!?
05:47 PM andypugh: RTAI Vulcano does have these flags.
05:47 PM andypugh: Or at leastg a sub-set, it is certainly an issue Paolo is familar with,
05:53 PM jepler: 702 save_fpenv(fpu_task->fpu_reg);
05:53 PM jepler: seems like it has to be this line, but in that case the location of the FPU environment doesn't seem like it should be related to the stack
05:54 PM jepler: base/sched/sched.c:static RT_TASK *rt_smp_fpu_task[RTAI_NR_CPUS];
05:54 PM jepler: base/sched/sched.c:#define fpu_task (rt_smp_fpu_task[cpuid])
05:54 PM jepler: fpu_task is a macro that redirects into the global static rt_smp_fpu_task according to the cpuid
05:55 PM jepler: oh I confused rdi and rdx
05:57 PM andypugh: Crashed again
05:57 PM andypugh: [10700.619602] RDX: ffff889448303f20 RSI: 0000000000000001 RDI: ffff88948fa10100
05:58 PM jepler: same instruction though?
05:58 PM andypugh: [10700.619592] Code: e0 37 87 c0 48 39 d9 74 25 4c 8d 81 00 01 00 00 31 c0 89 c2 4c 89 c7 48 0f ae 07 48 89 1c f5 e0 37 87 c0 48 8d 93 00 01 00 00 <48> 0f ae 0a 0f 31 48 c1 e2 20 89 c0 48 89 2c f5 00 d1 86 c0 48 8d
05:59 PM jepler: [ 575.335970] Code: 48 39 f2 0f 84 ea fe ff ff 4c 8d 92 00 01 00 00 31 c0 89 c1 4c 89 d7 48 0f ae 07 4a 89 34 c5 e0 67 8c c0 48 8d 96 00 01 00 00 <48> 0f ae 0a e9 c0 fe ff ff 0f 1f 40 00 48 83 ec 08 8b 35 a6 a3 02
05:59 PM jepler: similar code, it's `48 0f ae 0a` again
05:59 PM jepler: but it's not all the same
06:00 PM andypugh: 329 leads is to 329: 48 0f ae 0a fxrstor64 (%rdx)
06:00 PM jepler: but this time rdx is a multiple of 16 so .. you have two problems?
06:01 PM andypugh: I have many problems :-)
06:02 PM jepler: I'm bummed that the new_task->lnxtsk code is very similar, that means that whatever _this_ is, uspace+rtai is likely to be affected too
06:03 PM andypugh: Is the <48> in there some kind of subtle clue?
06:03 PM jepler: the angle brackets mean "this is the first byte of the faulting instruction"
06:03 PM jepler: that's all
06:03 PM andypugh: Yes, I have already shown that the problem also exists in LXRT realtime. Otherwise there would be a LiveCD using that by now.
06:04 PM jepler: yay boo
06:17 PM andypugh: Should I be concerned that that data around the faulting instruction is not a perfect match for the Objdump?
06:19 PM andypugh: Offset 368 in this:
06:19 PM andypugh: 355: 48 0f ae 07 fxsave64 (%rdi)
06:19 PM andypugh: 359: 48 89 1c f5 00 00 00 mov %rbx,0x0(,%rsi,8)
06:19 PM andypugh: 360: 00
06:19 PM andypugh: 361: 48 8d 93 00 01 00 00 lea 0x100(%rbx),%rdx
06:19 PM andypugh: 368: 48 0f ae 0a fxrstor64 (%rdx)
06:19 PM andypugh: 36c: 0f 31 rdtsc
06:19 PM andypugh: 36e: 48 c1 e2 20 shl $0x20,%rdx
06:19 PM andypugh: 372: 89 c0 mov %eax,%eax
06:19 PM andypugh: 374: 48 89 2c f5 00 00 00 mov %rbp,0x0(,%rsi,8)
06:19 PM andypugh: 37b: 00
06:22 PM jepler: Presumably 48 89 1c f5 00 00 00 becomes 48 89 1c f5 e0 37 87 due to "relocation", i.e., the address of _something_ becomes fixed when the modules are loaded
06:22 PM jepler: objdump has a "-r" flag which will at least tell you where the relocations are
06:23 PM jepler: in a different run those specific offsets will vary, too
06:23 PM andypugh: OK
06:28 PM andypugh: Now, I added a prink at the top of that function, but I am not seeing the output. But looking at the objdump, with the added printk the changes _seemed_ to be at the bottom. Is there any chance that the compiler would decide to put a printk out-of-order?
06:38 PM andypugh: Is there an objdump flag to reset the counter at the top of each function? And (better still) to see the C-code line that correspondss to it?
06:38 PM andypugh: addr2line can do that bit…
06:40 PM jepler: objdump's "-S" flag tries to add source
06:40 PM jepler: you can "gdb" a .ko file and "disas" a function by name. "disas/s" (and deprecated /m) also try to associate the source and the disassembly together
06:40 PM jepler: these both assume debug information is included, as by -g flag at compile time
06:40 PM jepler: (why -g, nobody knows:)
06:41 PM jepler: for better gebugging
06:41 PM Tom_L: a flag for the gods
06:41 PM jepler: and this is certainly a god bug, in the same way higgs ended up called the god particle
06:42 PM andypugh: I am compiling with -g (in RTAI and LinuxCNC) and using the -S but it seems not to be working. Possibly I am putting the -g in too early, or too late
06:43 PM andypugh: KCONFIG_DIR=./base/config/kconfig
06:43 PM andypugh: CC ?= gcc -g
06:43 PM andypugh: #CC ?= gcc -Wall -pedantic
06:44 PM andypugh: I don’t speak “make” so ?= is a new one to me.
06:45 PM Tom_L: set a variable if it is not already set
06:45 PM Tom_L: https://www.gnu.org/software/make/manual/html_node/Setting.html
08:10 PM Centurion-Dan2 is now known as Centurion_Dan