Jan 22 2022
12:22 AM seb_kuzminsky: norias: the linuxcnc build artifact is a bunch of executables and libraries that can be used to run linuxcnc. If you want to build an ISO you need to do more stuff (see for example https://github.com/LinuxCNC/buster-live-build)
12:28 AM seb_kuzminsky: does anyone know if rtai_malloc() and rtai_kmalloc() zero the shared memory on the first allocation?
12:28 AM pere: seb_kuzminsky: hi. :)
12:28 AM seb_kuzminsky: hi pere
12:28 AM seb_kuzminsky: i think i just found & fixed the bug that's been breaking the build lately
12:28 AM pere: seb_kuzminsky: how are things? been busy elsewhere myself.
12:28 AM seb_kuzminsky: things are good here!
12:28 AM seb_kuzminsky: how about you?
12:31 AM pere: I was wrong, rtai_malloc() uses rt_shm_alloc(), but its documentation, <URL: https://www.rtai.org/userfiles/documentation/magma/html/api/group__shm.html#ga24 >, also says nothing about clearing the memory. I would thus make sure to clear the memory myself. malloc() does not clear the memory either, so it makes sense.
12:32 AM seb_kuzminsky: shmget() does clear memory the first time through
12:33 AM seb_kuzminsky: we have code (in src/rtapi/rtai_rtapi.c and /rtai_ulapi.c) that has been working for decades, despite the lack of promises in the rtai docs
12:35 AM pere: sure, the kernel is supposed to only return cleared memory, to avoid information leak out of the kernel.
12:36 AM pere: I had code that worked for 10 years and suddenly broke, when CPUs got support for read only memory segments. :)
12:36 AM pere: the code was broken, of course, trying to write to a read only variable, but that error had been ignored for a long time.
12:41 AM seb_kuzminsky: it's amazing any of this stuff works at all, ever
12:48 AM pere: well, there is calloc() if you want zeroed memory. :)
12:52 AM seb_kuzminsky: when building for uspace we get our shared memory from posix shmget(), which zeros when the memory is first allocated (and then doesn't touch the memory when later callers get shm for the same key, of course). The thing i'm not 100% certain about is what happens in our rtai builds.
12:52 AM seb_kuzminsky: they *work*, but the rtai docs don't *say* that they'll work
12:54 AM seb_kuzminsky: we do manually zero part of the memory we get from rtai
12:55 AM pere: defensive programming is perhaps a good approach here, when you are unsure? i.e. always clear to be sure.
12:57 AM seb_kuzminsky: yeah, i agree
12:57 AM seb_kuzminsky: https://github.com/LinuxCNC/linuxcnc/blob/master/src/rtapi/rtai_rtapi.c#L1018-L1031
12:57 AM seb_kuzminsky: we clear the first 4 bytes by hand
12:58 AM pere: seb_kuzminsky: btw, what is your view on <URL: https://github.com/LinuxCNC/linuxcnc/pull/1470 >? Should newer po4a be used or the adoc files be changed?
12:58 AM seb_kuzminsky: hal_data is one of these rtapi shmem blocks: https://github.com/LinuxCNC/linuxcnc/blob/master/src/hal/hal_priv.h#L155-L186
12:59 AM seb_kuzminsky: the first 4 bytes are the `version` field, which we check at startup - if it's zero we know that hal_data has not been initialized yet, so we initialize it and set `version` to a non-zero magic cookie
01:01 AM seb_kuzminsky: the punchline is that we access other fields of hal_data without explicitly initializing them, namely `mutex` which is after the first 4 bytes: https://github.com/LinuxCNC/linuxcnc/blob/master/src/hal/hal_lib.c#L2880-L2895
01:01 AM seb_kuzminsky: if `mutex` is initialized to a non-zero value, it will appear locked, and the system hangs instead of starting up
01:02 AM seb_kuzminsky: but that never happens, so mutex must be initialized to zero by whatever rtapi_shmem_new() got it from
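[Editor's sketch] The hang described above can be illustrated with a minimal value-based lock, assuming 0 means unlocked and anything else means locked. The struct and function names here (demo_block, demo_mutex_try, demo_mutex_give, first_lock_succeeds) are hypothetical and are not the real hal_data layout or the rtapi mutex API; C11 atomics stand in for whatever primitive the real code uses.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a shared block; NOT the real hal_data layout. */
struct demo_block {
    int version;         /* first field, checked at startup */
    atomic_ulong mutex;  /* 0 = unlocked, non-zero = locked (assumption) */
};

/* Try to take the lock: returns 1 on success, 0 if it already looked locked. */
int demo_mutex_try(atomic_ulong *m)
{
    /* Swap in 1; a non-zero previous value means somebody "holds" the lock. */
    return atomic_exchange(m, 1UL) == 0UL;
}

/* Release the lock by storing 0. */
void demo_mutex_give(atomic_ulong *m)
{
    atomic_store(m, 0UL);
}

/* Simulate the first access to a block whose `version` is zero but whose
 * mutex field holds `leftover`, i.e. whatever the allocator left behind. */
int first_lock_succeeds(unsigned long leftover)
{
    struct demo_block b;
    b.version = 0;
    atomic_init(&b.mutex, leftover);

    int got = demo_mutex_try(&b.mutex);
    if (got)
        demo_mutex_give(&b.mutex);
    return got;
}
```

With leftover == 0 the first try succeeds. With any non-zero leftover the lock looks taken and nobody will ever release it, so a caller that waits for it would wait forever.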
01:03 AM seb_kuzminsky: in uspace (current default) that's shmget(), which does zero the memory the first time through: https://man7.org/linux/man-pages/man2/shmget.2.html
01:03 AM seb_kuzminsky: rtai *must* behave the same way, since linuxcnc works on rtai, i just can't find any documentation that makes this promise explicit
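[Editor's sketch] The shmget() behaviour referred to above (a newly created segment starts out zeroed, per the linked man page) can be checked with a small probe. The function name new_segment_is_zeroed is hypothetical, and this only exercises the System V call used in uspace; it says nothing about what rt_shm_alloc() does under RTAI.

```c
#define _XOPEN_SOURCE 700  /* expose the XSI shared-memory API under strict -std=c11 */
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Returns 1 if a newly created segment is all zero bytes, 0 if it is not,
 * and -1 if the segment could not be created or attached. */
int new_segment_is_zeroed(size_t size)
{
    /* IPC_PRIVATE always creates a brand-new segment, so this is a "first" allocation. */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    unsigned char *p = shmat(id, NULL, 0);
    /* Mark the segment for removal right away; it goes away after the last detach. */
    shmctl(id, IPC_RMID, NULL);
    if (p == (void *)-1)
        return -1;

    int zeroed = 1;
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0) {
            zeroed = 0;
            break;
        }
    }

    shmdt(p);
    return zeroed;
}
```

A passing probe only confirms the documented behaviour on the machine it runs on; it is not a substitute for a documented guarantee on the RTAI side.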
01:03 AM pere: perhaps you get lucky, and the first use of this memory is when it is fresh from the kernel, but later calls might get random garbage?
01:04 AM seb_kuzminsky: i don't think that's the case - in our CI system we have RTAI machines that have run the test suite hundreds or thousands of times between reboots, and they never crash or lock up
01:05 AM pere: reboots are not the issue here, but process startup time. I assume each test starts a new process.
01:05 AM seb_kuzminsky: yes, each test starts a new process
01:06 AM seb_kuzminsky: each test runs several processes, plus (in RTAI) a bunch of kernel modules
01:06 AM seb_kuzminsky: they all use rtapi to access memory that they all share
01:07 AM seb_kuzminsky: one of the users is the first one, and it realizes this (because hal_data->version is zero) and initializes the data structure
01:07 AM seb_kuzminsky: subsequent users of the shared memory see that hal_data->version is non-zero, and know from this that the data structure is initialized and ready to use
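[Editor's sketch] The first-user handshake described above, combined with the "always clear to be sure" suggestion from earlier in the log, might look roughly like this. Everything here is hypothetical: the struct demo_shared, the cookie value DEMO_MAGIC and the function demo_attach are illustrative and not taken from hal_priv.h or hal_lib.c. The sketch also ignores concurrency; real code would need to serialize attaching users.

```c
#include <string.h>

#define DEMO_MAGIC 0x12345678  /* hypothetical non-zero cookie */

/* Hypothetical shared block; NOT the real hal_data layout. */
struct demo_shared {
    int version;          /* 0 means "not initialized yet" */
    unsigned long mutex;  /* must start at 0 to look unlocked */
    int counter;          /* stand-in for the rest of the shared state */
};

/* Returns 1 if this caller initialized the block, 0 if it was already
 * initialized, and -1 if `version` holds an unexpected value. */
int demo_attach(struct demo_shared *s)
{
    if (s->version == 0) {
        /* First user: clear every field instead of trusting the allocator,
         * then publish the cookie so later users skip this branch. */
        memset(s, 0, sizeof *s);
        s->version = DEMO_MAGIC;
        return 1;
    }
    if (s->version == DEMO_MAGIC)
        return 0;  /* a previous user already set things up */
    return -1;     /* garbage or a mismatched version */
}
```

Note that the scheme still depends on `version` itself reading as zero on first use, which is why the question of what the allocator guarantees matters in the first place.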
01:08 AM pere: I am talking about malloc(), not shmget()... shmget will always get the memory from the kernel.
01:09 AM seb_kuzminsky: the memory from malloc(3) is uninitialized, and that's fine. the code i'm worrying about doesn't use malloc
01:09 AM pere: The pending Debian package related changes I am talking about are in <URL: https://github.com/LinuxCNC/linuxcnc/pull/1396 >. I guess it is needed for the next upload.
01:09 AM seb_kuzminsky: i'll take a look
01:11 AM seb_kuzminsky: the debian/copyright stuff seems right, but i'm not 100% sure about the debian/rules.in changes
01:11 AM seb_kuzminsky: it looks harmless, but i thought we cleaned include/ already?
01:12 AM pere: seb_kuzminsky: I am not sure about the rules changes either.
01:12 AM pere: if you want to handle those separately, please state so in the PR, then we can split out the rules changes into a separate PR.
01:16 AM seb_kuzminsky: ok i added a comment to that effect
01:28 AM pere: as for the po4a docs migration, I have a build system proposal working with debuild, and am waiting for silopolis[m] to complete the work on the french migration. The output from that effort is a fuzzy PO file to be included in the build patch in my po4a-build branch before pushing it towards master.
01:32 AM seb_kuzminsky: that sounds great!
01:32 AM seb_kuzminsky: i'm super excited for that
01:39 AM pere: and several translations have been updated on <URL: https://hosted.weblate.org/projects/linuxcnc/#languages >. Sadly no-one is at 100% yet. Italian is closest, and no-one is working on the Italian translation on weblate yet. :)
01:48 AM pere: also, I am sorry for not following up on the plan to organize a meeting. I was hoping for feedback from the other organizers on the June proposal, but given that no-one objected, I guess we move on with it.
01:59 PM pere: smoe: around?
09:21 PM norias: hmpf
09:21 PM norias: I should start logging
09:22 PM norias: looks like someone mentioned me but it's scrolled away
09:24 PM norias: index?
09:26 PM norias: seb_kuzminsky: so, the build artifact is assumed to run on a base install of debian, then?
11:58 PM linuxcnc-build_: build #2096 of 1640.rip-buster-rtpreempt-amd64 is complete: Failure [failed compile runtests] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/1640.rip-buster-rtpreempt-amd64/builds/2096 blamelist: CMorley <email@example.com>
11:58 PM linuxcnc-build_: build #8494 of 0000.checkin is complete: Failure [failed] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/0000.checkin/builds/8494 blamelist: CMorley <firstname.lastname@example.org>