#linuxcnc-devel Logs

May 12 2020

05:46 AM hazzy-m: highoctane: If you want to check axis homed states rather than joints you can use stat.homed[Anum] which will return either 0 or 1
12:31 PM andypugh: jepler: Are you around? I am hoping you know more than me about RTAPI.
12:57 PM jepler: andypugh: I am now. what's up?
12:57 PM andypugh: Still struggling with RTAI.
12:58 PM jepler: this crash at module unload?
12:58 PM andypugh: With debugging on I have found that the system (nearly) always crashes at the same point, and there was a warning printed at that point in cases where it doesn’t crash
12:59 PM jepler: can you remind me what message(s) you're seeing? I wonder if they're the same as me or not. Only 1 of seb's 4 were the same as mine.
12:59 PM andypugh: https://pastebin.ubuntu.com/p/SBksz6jKSP/
12:59 PM jepler: typical one for me: https://gist.github.com/jepler/212d2093f7feb22232f6d2a432e9cacd
01:00 PM andypugh: (when it crashes, we see nothing after HAL: component 04 removed, name = 'sampler')
01:01 PM andypugh: Ah, no. Completely different.
01:01 PM jepler: I'm not sure whether this is the same or different than mine, which can be reproduced without involving linuxcnc at all
01:01 PM andypugh: I suspect multiple problems.
01:02 PM jepler: is "failed to pause" something you added locally, I don't find the string in my source?
01:02 PM andypugh: If the RTAI testsuite crashes then it seems that is something to mention to RTAI. Though they might not care.
01:02 PM andypugh: Yes, I added that.
01:03 PM andypugh: rt_task_suspend in RTAI is returning -EINVAL
01:03 PM andypugh: https://github.com/NTULINUX/RTAI/blob/master/src/sched/api.c#L335
01:04 PM andypugh: So, I am wondering if we are deleting a task that is still being polled by RTAI, with unfortunate results?
01:16 PM jepler: I don't spot what's wrong with our code, that we could get einval from rt_task_suspend.
01:16 PM andypugh: I don’t see _why_ it is returning -EINVAL. I added a printk in the RTAI code, and that says: task->magic = 9ad25f6f RT_TASK_MAGIC = 9ad25f6f
01:16 PM jepler: typedef struct rt_task_struct { ... } RT_TASK __attribute__ ((aligned (L1_CACHE_BYTES)));
01:16 PM jepler: This type is declared as being aligned to L1_CACHE_BYTES, but we do not align it
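The alignment point above can be illustrated with a minimal sketch. It assumes GCC or Clang, uses 64 as a stand-in for the kernel's L1_CACHE_BYTES, and uses a hypothetical two-field DEMO_TASK in place of the real RT_TASK. All demo_ names are invented for illustration. Whether the missing alignment has anything to do with the crash is not established in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64 stands in for L1_CACHE_BYTES, which the kernel defines. */
#define DEMO_CACHE_BYTES 64

/* Hypothetical stand-in for RT_TASK; the real struct has many more fields. */
typedef struct demo_task_struct {
    unsigned int magic;
    int suspdepth;
} DEMO_TASK __attribute__((aligned(DEMO_CACHE_BYTES)));

/* Returns 1 if p meets the alignment that the type declares, else 0. */
int demo_is_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(DEMO_TASK)) == 0;
}

/* Round up inside a raw buffer to the first address that satisfies the
 * declared alignment. Returns NULL if the task no longer fits. */
DEMO_TASK *demo_place_task(unsigned char *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf;
    uintptr_t mask = (uintptr_t)DEMO_CACHE_BYTES - 1;
    size_t skip = (size_t)(((start + mask) & ~mask) - start);

    if (skip + sizeof(DEMO_TASK) > len)
        return NULL;
    return (DEMO_TASK *)(buf + skip);
}

/* Small self-check: an arbitrary offset is not aligned, a rounded one is. */
void demo_alignment(void)
{
    size_t len = 4 * DEMO_CACHE_BYTES;
    unsigned char *raw = malloc(len);
    assert(raw != NULL);

    /* malloc returns an even address on mainstream platforms, so raw + 1
     * is odd and cannot be a multiple of 64. */
    assert(!demo_is_aligned(raw + 1));

    DEMO_TASK *task = demo_place_task(raw + 1, len - 1);
    assert(task != NULL);
    assert(demo_is_aligned(task));

    free(raw);
}
```

Calling demo_alignment() from main exercises both cases. Code that embeds such a type in its own storage has to do a rounding step like this itself (or use an aligned allocator), because a plain byte offset carries no alignment guarantee.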
01:16 PM jepler: then is there a second way it can return EINVAL besides the magic test?
01:17 PM andypugh: It’s a short routine, at the link above
01:20 PM andypugh: (RTE_UNBLKD is definitely not -22 )
01:21 PM jepler: there's no use of rt_task_suspend in their "testsuite" so perhaps it's untested
01:21 PM jepler: if you print X and Y and they print the same hex value; and yet the code that compares them enters the branch that says they're unequal, something really weird is happening
01:22 PM andypugh: I am glad it isn’t me missing something obvious. In Python you might guess one was a string.
01:23 PM andypugh: But these are both formatted %x
01:23 PM jepler: include/rtai_sched.h:#define RT_TASK_MAGIC 0x9ad25f6f // nam2num("rttask")
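The magic test being discussed follows a common pattern: a struct carries a known constant, and API calls reject any pointer whose constant does not match. A minimal sketch of that pattern follows. Only the value 0x9ad25f6f comes from the log; the struct layout and every demo_ name are hypothetical, and this is not the RTAI source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Value quoted from include/rtai_sched.h in the log above. */
#define DEMO_TASK_MAGIC 0x9ad25f6fU

/* Hypothetical task struct; the real RT_TASK is far larger. */
struct demo_task {
    unsigned int magic;
    int suspdepth;
};

/* Returns 0 for a task that looks valid, -EINVAL otherwise. */
int demo_task_check(const struct demo_task *task)
{
    if (task == NULL || task->magic != DEMO_TASK_MAGIC)
        return -EINVAL;
    return 0;
}

/* Small self-check, including the same %x debug print used in the log. */
void demo_magic(void)
{
    struct demo_task good = { DEMO_TASK_MAGIC, 0 };
    /* Assumption for illustration: a cleared magic marks a dead task. */
    struct demo_task stale = { 0U, 0 };

    printf("task->magic = %x RT_TASK_MAGIC = %x\n",
           good.magic, DEMO_TASK_MAGIC);

    assert(demo_task_check(&good) == 0);
    assert(demo_task_check(&stale) == -EINVAL);
    assert(demo_task_check(NULL) == -EINVAL);
}
```

With a check of this shape, two values that print identically under %x and have the same type should compare equal, which is why the observation above pointed away from the magic test and toward some other source of the -EINVAL.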
01:23 PM andypugh: I am going to comment-out the error return and see what we get
01:24 PM jepler: the rtai testsuite doesn't use rt_task_init, either, it uses "rt_task_init_schmod".
01:25 PM jepler: uspace_rtai uses rt_task_init (but it has a different signature than in kernel space, returning a pointer to a freshly allocated RT_TASK), and it has a lockup problem at unload as well
01:26 PM jepler: uspace doesn't print anything if the return values are unexpected
01:28 PM andypugh: It seems like something should be suspending the tasks before deleting them, rather than warning then suspending them in the delete routine.
01:29 PM andypugh: I might need to back-track, I am not sure that the -22 comes directly from rt_task_suspend. It certainly doesn’t now...
01:30 PM andypugh: Hmm, sorry, complete red herring it seems.
01:31 PM jepler: simplest assumption is, bugs in the rtai latency test, uspace-rtai and kernel-mode linuxcnc rtapi are not ALL going to be identical, so if they all 3 crash at unload it's something in the core of rtai that needs to be solved by them
01:31 PM jepler: or else there are 3 different bugs, all of which are reproduced by "try doing something 9000 times, eventually it'll crash"
01:33 PM andypugh: I have found an error in RTAPI: https://github.com/LinuxCNC/linuxcnc/blob/master/src/rtapi/rtai_rtapi.c#L906
01:33 PM andypugh: Probably not the cause of the problem, but the cause of the -EINVAL
01:34 PM jepler: I still don't see it, so tell me
01:34 PM andypugh: rt_task_suspend returns positive values on success, I think.
01:34 PM jepler: oh, I see.
01:35 PM jepler: so it has successfully suspended, and the "suspend depth" is now greater than zero
01:35 PM jepler: task->suspdepth = 1;... return task->suspdepth;
01:37 PM andypugh: Whatever that means
01:39 PM andypugh: Changing rtai_rtapi.c rtapi_task_pause() to check for <0 instead removes the “WARNING: Still running” warning too.
01:39 PM jepler: you can suspend a task 37 times; if you unsuspend it only 36 times, it's still suspended
01:39 PM andypugh: So, that looks like a fix to make anyway
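The return-value issue and the suspend-depth counting described above can be sketched with a mock. This is a simplified model, not the RTAI implementation: demo_suspend and demo_resume are hypothetical, the mock assumes suspend returns the new depth on success and a negative errno on failure, and the real rt_task_suspend and rt_task_resume have additional cases.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical task with a validity flag and a suspend depth counter. */
struct demo_task {
    int valid;
    int suspdepth;
};

/* Mock suspend: returns the new depth (> 0) on success, -EINVAL on error. */
int demo_suspend(struct demo_task *t)
{
    if (t == NULL || !t->valid)
        return -EINVAL;
    t->suspdepth += 1;
    return t->suspdepth;
}

/* Mock resume: undoes one suspend and returns the remaining depth. */
int demo_resume(struct demo_task *t)
{
    if (t == NULL || !t->valid)
        return -EINVAL;
    if (t->suspdepth > 0)
        t->suspdepth -= 1;
    return t->suspdepth;
}

int demo_is_suspended(const struct demo_task *t)
{
    return t->suspdepth > 0;
}

/* Buggy caller: treats every nonzero return as a failure. */
int demo_pause_buggy(struct demo_task *t)
{
    int retval = demo_suspend(t);
    if (retval != 0)
        return -EINVAL;
    return 0;
}

/* Fixed caller: only negative returns are failures. */
int demo_pause_fixed(struct demo_task *t)
{
    int retval = demo_suspend(t);
    if (retval < 0)
        return -EINVAL;
    return 0;
}

void demo_suspend_depth(void)
{
    struct demo_task a = { 1, 0 }, b = { 1, 0 }, c = { 1, 0 };
    int i;

    /* The suspend worked, yet the buggy caller reports an error. */
    assert(demo_pause_buggy(&a) == -EINVAL);
    assert(demo_is_suspended(&a));

    /* The fixed caller reports success for the same situation. */
    assert(demo_pause_fixed(&b) == 0);
    assert(demo_pause_fixed(NULL) == -EINVAL);

    /* 37 suspends and 36 resumes leave the task suspended. */
    for (i = 0; i < 37; i++)
        demo_suspend(&c);
    for (i = 0; i < 36; i++)
        demo_resume(&c);
    assert(demo_is_suspended(&c));
    demo_resume(&c);
    assert(!demo_is_suspended(&c));
}
```

Calling demo_suspend_depth() from main runs all three cases. The buggy variant shows how a spurious -EINVAL can be reported even though the task was suspended correctly, which matches the symptom discussed above.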
01:40 PM jepler: if you suspend it RTE_UNBLKD times, then you get a special hat
01:40 PM jepler: I agree
01:40 PM jepler: I doubt it will address the original crashing issue, but that's me being a pessimist
01:40 PM andypugh: Surely the trick is to unsuspend it EINVAL times, for real fun?
01:41 PM andypugh: I share your pessimism, but the excuse to walk away from the PC for long enough to run 10000 cycles of realtime_torture tempts me to try it anyway.
01:41 PM jepler: hehe
01:47 PM jepler: I wonder if creating 1000 tasks would be 1000x likelier to crash
02:06 PM andypugh: That’s interesting. I still got a crash (after 1580 cycles) but at a different point in the unload process.
02:07 PM andypugh: I am repeating to see if something has actually changed.
02:33 PM andypugh: jepler: Ref your Gist of a crash, Alec says “RTAI does not work in QEMU”.
02:33 PM Tom_L: andypugh, you get to a point i can test again if you want
03:14 PM dwrobel: fyi: “RTAI does not work in QEMU”: I have: $ uname -r -> 3.4-9-rtai-686-pae running on $ qemu-x86_64 --version -> qemu-x86_64 version 4.1.1
03:23 PM andypugh: Well, it runs, jepler's test looked to manage 1300 cycles of the RTAI testsuite.
08:23 PM jepler: andypugh: we don't need "rtai meets realtime deadlines in qemu", we just need "rtai doesn't crash in qemu". anyway, rtai crashes on real hardware so ..