#linuxcnc-devel Logs

Nov 17 2023

11:38 AM rigid: why is the number of spindles and joints located in EMC_TRAJ_STAT and not EMC_MOTION_STAT? e.g. emcTrajSetSpindles() even sends EMCMOT_SET_NUM_SPINDLES which in turn sets motion_num_spindles [1] instead of num_spindles. This looks wrong.
11:47 AM rigid: [1] https://github.com/LinuxCNC/linuxcnc/blob/master/src/emc/motion/motion.c#L53
11:47 AM rigid: I mean, does it make sense semantically to associate the number of joints/spindles with EMC_TRAJ?
11:47 AM rigid: I can't see a reason not to move them into EMC_MOTION. That way we wouldn't need to send the state of _every_ possible joint/spindle on EMC_MOTION_STAT updates.
11:49 AM rmu: emcTrajSetSpindles sends EMCMOT_SET_NUM_SPINDLES to realtime
11:49 AM rigid: yes, the TRAJ context receives a message to set the number of spindles (why? what has traj to do with number of installed spindles anyway?). Then it sets the number of spindles in the realtime EMCMOT context.
11:50 AM rigid: yep, but why traj? not motion?
11:50 AM rmu: what do you mean by "TRAJ context"? emcTrajSetSpindles is called only once after reading the ini file
11:51 AM rmu: if SPINDLES is a parameter in the TRAJ ini file section, the naming is at least consistent
11:52 AM rigid: i mean it's in the wrong place in the code.
11:52 AM rmu: but i don't have any strong feeling how that should be named
11:52 AM rigid: of course it's only called once, spindlecount is not changeable during runtime.
11:52 AM rmu: why is it in the wrong place?
11:52 AM rmu: this call has nothing to do with EMC_STAT
11:53 AM rigid: since the number of spindles/joints seems to relate more to motion than to trajectory, wouldn't you say? what's the connection here?
11:53 AM rmu: spindle override etc.. also has "Traj" in the name
11:53 AM rigid: EMC_MOTION_STAT
11:54 AM rigid: it has to do with it since EMCMOT_MAX_JOINTS is used there and it already stores numExtraJoints
11:54 AM rmu: are you talking about spindles or about joints?
11:54 AM rigid: spindle override has to do with trajectory. spindlecount doesn't.
11:54 AM rigid: both
11:55 AM rigid: (see my initial question)
11:55 AM rmu: if I understand you correctly you mean in general that configuration data has no place in the status updates?
11:55 AM rigid: rmu: doesn't https://github.com/LinuxCNC/linuxcnc/blob/master/src/emc/motion/motion.c#L53 look wrong to you?
11:56 AM rigid: rmu: ah, no. config in status updates is fine (if it's necessary)
11:56 AM rigid: i mean that joints and spindles should be moved from EMC_TRAJ_STAT to EMC_MOTION_STAT
11:57 AM rigid: with all consequences (including moving it in the ini file)
11:57 AM rigid: currently I can't access the number of configured joints/spindles from inside EMC_MOTION_* context. that sucks.
11:58 AM rigid: quite a lot of resources are wasted like this
12:02 PM rmu: motion_num_spindles ... what is wrong with that?
12:03 PM rmu: where do you get at EMC_MOTION_STAT and not EMC_TRAJ_STAT?
12:04 PM rmu: either way the STAT stuff has nothing to do with usrmot
12:05 PM rmu: traj is part of EMC_MOTION_STAT https://github.com/LinuxCNC/linuxcnc/blob/master/src/emc/nml_intf/emc_nml.hh#L942
12:06 PM rmu: so if you have a EMC_MOTION_STAT structure TRAJ is right in there
12:12 PM rigid: rmu: motion_num_spindles and num_spindles seems redundant, don't you think?
12:13 PM rmu: static config data can probably be moved to somewhere else, yes
12:15 PM rigid: I didn't say STAT has anything to do with usrmot. just with motion.
12:16 PM rigid: I mean if this all looks right, I'm fine and leave it. Otherwise I'd just fix it without further investigation and see if tests break.
12:17 PM rmu: I'm confused :)
12:18 PM rmu: motion_num_spindles and num_spindles in motion.c indeed looks very suspicious
12:18 PM rigid: sorry, my english no good
12:19 PM rigid: rmu: take this: https://github.com/LinuxCNC/linuxcnc/blob/master/src/emc/nml_intf/emc.cc#L1851 don't you think it's bad to call update() EMCMOT_MAX_JOINTS times even if there are only say 3 joints?
12:20 PM rigid: you wouldn't update every joint linuxcnc could handle but just the ones that are actually configured.
12:21 PM rigid: but then you discover, that the amount of joints is stored in the wrong context.
12:21 PM rmu: rigid: if you would change that then the wire format of the NML message would change with number of joints, you wouldn't want that
12:22 PM rmu: cms update is the serialization / deserialization operation depending on context, so one time it is "in", one time it is "out"
12:22 PM rigid: no it wouldn't change. there's always EMCMOT_MAX_JOINTS space reserved. https://github.com/LinuxCNC/linuxcnc/blob/master/src/emc/nml_intf/emc_nml.hh#L943
12:22 PM rmu: boost serialize abuses the "&" operator in its place
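As rmu describes it, a single update() routine does both serialization and deserialization, depending on direction. The C++ sketch below illustrates that pattern with a hypothetical Codec class and MiniStat message; these are not the real CMS classes or signatures, and byte order is ignored for brevity, whereas real CMS/XDR output is host-independent.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CMS-style updater: the same update() call
// appends a field to the buffer when encoding and reads it back when decoding.
// Raw host-order bytes are used here; real XDR output is big-endian.
class Codec {
public:
    enum class Dir { Encode, Decode };

    explicit Codec(Dir d) : dir_(d) {}
    Codec(Dir d, std::vector<std::uint8_t> bytes) : dir_(d), buf_(std::move(bytes)) {}

    void update(std::int32_t &v) {
        if (dir_ == Dir::Encode) {
            const auto *p = reinterpret_cast<const std::uint8_t *>(&v);
            buf_.insert(buf_.end(), p, p + sizeof v);
        } else {
            if (pos_ + sizeof v > buf_.size())
                throw std::out_of_range("read past end of message");
            std::memcpy(&v, buf_.data() + pos_, sizeof v);
            pos_ += sizeof v;
        }
    }

    const std::vector<std::uint8_t> &bytes() const { return buf_; }

private:
    Dir dir_;
    std::vector<std::uint8_t> buf_;
    std::size_t pos_ = 0;
};

// Hypothetical status message: one update() routine describes the field
// order for both directions, so both ends walk the fields identically.
struct MiniStat {
    std::int32_t joints = 0;
    std::int32_t spindles = 0;

    void update(Codec &c) {
        c.update(joints);
        c.update(spindles);
    }
};
```

Because both directions run through the same MiniStat::update(), the field order cannot drift between sender and receiver. An extra update() call on the decoding side simply runs off the end of the message, which is the kind of mismatch rmu warns about.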
12:23 PM rigid: but update() wouldn't be triggered.
12:23 PM rigid: not sure how expensive it is in comparison... but it probably is
12:24 PM rigid: but what I didn't see is that traj is part of EMC_MOTION_STAT, so it can be accessed
12:24 PM rigid: totally missed that. thanks
12:25 PM rigid: still numExtraJoints and motion_num_spindles look fishy
12:25 PM rmu: i really don't want to dive into that code again
12:26 PM rmu: in shared memory, the structure is fixed
12:26 PM rmu: but when serializing out into a stream that happens in these update functions IIRC
12:28 PM rmu: i.e. in xup, update calls into xdr_* functions from standard unix / posix RPC lib
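For reference, the xdr_* routines produce a host-independent wire format. The sketch below shows by hand how XDR (RFC 4506) lays out a signed 32-bit integer. encode_xdr_int and decode_xdr_int are hypothetical helpers for illustration only, not part of libnml or the RPC library, where the real call is xdr_int().

```cpp
#include <array>
#include <cstdint>

// What an XDR integer looks like on the wire (RFC 4506): four bytes,
// most significant byte first, independent of the host's byte order.
// Hypothetical helpers; real code would call xdr_int() from the RPC library.
std::array<std::uint8_t, 4> encode_xdr_int(std::int32_t v) {
    const auto u = static_cast<std::uint32_t>(v);  // two's complement bit pattern
    return {static_cast<std::uint8_t>(u >> 24), static_cast<std::uint8_t>(u >> 16),
            static_cast<std::uint8_t>(u >> 8), static_cast<std::uint8_t>(u)};
}

std::int32_t decode_xdr_int(const std::array<std::uint8_t, 4> &b) {
    const std::uint32_t u = (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
                            (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
    return static_cast<std::int32_t>(u);
}
```

Since every field has a fixed encoded size and no tag, the receiver can only make sense of the stream if it issues exactly the same sequence of calls as the sender.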
12:30 PM rigid: hm, I didn't notice any difference when disabling/enabling calls to ...->update()
12:30 PM rigid: the data just isn't filled in at the receiving end when there's no update() call
12:31 PM rmu: rigid: try that over tcp
12:31 PM rigid: i do constantly
12:31 PM rigid: that's how I made it work :)
12:32 PM rmu: i guess it garbles everything that comes later
12:32 PM rigid: wdym?
12:32 PM rmu: NML / CMS doesn't really have any marking of the field that is serialized
12:32 PM rigid: only thing I see is backwards compatibility. both hosts need to run the same "protocol version" of course
12:33 PM rigid: no, there's no mark. it's just undefined
12:33 PM rmu: so if you call update for 3 joints on one end and for 9 joints on the other end --> boom
12:33 PM rigid: i.e. how it has been initialized in the NMLMsg constructor
12:33 PM rigid: why would you do that?
12:33 PM rmu: one end writes 3 joint structures into stream, the other end reads 9
12:34 PM rmu: that is what i mean with change of the wire format
12:34 PM rigid: you mean traj->joints differ?
12:34 PM rmu: you make your client dependent on config of server
12:34 PM rigid: if you write 3 joints, the other end always reads 3. never 9.
12:35 PM rigid: you wouldn't read the jointcount from the client config but from the EMC_TRAJ_STAT from the server
12:35 PM rigid: that's why it's there I guess
12:35 PM rmu: this https://github.com/LinuxCNC/linuxcnc/blob/master/src/emc/nml_intf/emc.cc#L1851 is doing MAX_JOINTS always
12:35 PM rigid: yep. that's excessive
12:36 PM rmu: you could change it to use number of configured joints but that is kinda fragile
12:36 PM rigid: should be "this->traj->joints" with enforcing a previous stat update I suppose.
12:36 PM rigid: fragile?
12:40 PM rmu: it introduces a dynamic into the wire format that is not there now ;)
12:40 PM rigid: afais it's more robust since it doesn't transfer invalid data
12:41 PM rmu: it isn't invalid, non-configured stuff should all be 0
12:41 PM rmu: in shmem it probably reduces into a memcpy and doesn't really matter if its 100 bytes or 500
12:41 PM rigid: it's data for a joint that isn't there. any access would be an access to invalid data.
12:41 PM rigid: 0 would be invalid in this case
12:42 PM rigid: i don't think any dynamic is added. NML seems to always transfer the largest possible message
12:42 PM rmu: it is a waste of bandwidth if you communicate it via network, yes
12:42 PM rigid: also a waste of cycles since update() isn't exactly free
12:42 PM rmu: NML doesn't care, it transfers the stuff you update() in that order
12:43 PM rmu: and client/server side has to call those update()s in the exact same manner, number and sequence
12:44 PM rigid: yeah, well... client/server will always run the same codebase
12:44 PM rigid: sure, it's not backwards compatible but previous changes weren't either
12:44 PM rigid: everything protocol version related in the code is purely pro forma :)
12:44 PM rmu: just saying, if you make the on-wire format depending on configuration and that somehow gets out of sync...
12:45 PM rmu: but shouldn't happen
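rigid's proposal (update only this->traj->joints entries) stays unambiguous if the joint count travels in the same message ahead of the per-joint data, so the receiver never consults its own ini file. A minimal sketch under that assumption: TrajStat, JointStat and kMaxJoints (set to 16 as a stand-in for EMCMOT_MAX_JOINTS) are hypothetical simplifications, and raw host-order bytes replace XDR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

constexpr int kMaxJoints = 16;  // assumption: stands in for EMCMOT_MAX_JOINTS

struct JointStat {              // hypothetical, trimmed to one field
    double position = 0.0;
};

struct TrajStat {               // hypothetical
    std::int32_t joints = 0;                    // number of configured joints
    std::array<JointStat, kMaxJoints> joint{};  // fixed-size storage, as in shared memory
};

template <typename T>
void put_raw(std::vector<std::uint8_t> &out, const T &v) {
    const auto *p = reinterpret_cast<const std::uint8_t *>(&v);
    out.insert(out.end(), p, p + sizeof v);
}

template <typename T>
void get_raw(const std::vector<std::uint8_t> &in, std::size_t &pos, T &v) {
    if (pos + sizeof v > in.size()) throw std::out_of_range("truncated message");
    std::memcpy(&v, in.data() + pos, sizeof v);
    pos += sizeof v;
}

// The count is written first, so the receiver learns how many joint records
// follow from the message itself rather than from its own ini file.
std::vector<std::uint8_t> encode_traj(const TrajStat &s) {
    if (s.joints < 0 || s.joints > kMaxJoints) throw std::out_of_range("bad joint count");
    std::vector<std::uint8_t> out;
    put_raw(out, s.joints);
    for (int i = 0; i < s.joints; ++i) put_raw(out, s.joint[i].position);
    return out;
}

TrajStat decode_traj(const std::vector<std::uint8_t> &in) {
    TrajStat s;
    std::size_t pos = 0;
    get_raw(in, pos, s.joints);
    // Reject counts that would overrun the fixed-size array.
    if (s.joints < 0 || s.joints > kMaxJoints) throw std::out_of_range("bad joint count");
    for (int i = 0; i < s.joints; ++i) get_raw(in, pos, s.joint[i].position);
    return s;
}
```

With 3 joints this produces 4 + 3 × 8 = 28 bytes instead of 4 + 16 × 8 = 132 (assuming a 4-byte int and an 8-byte double). rmu's caveat still applies: the wire size now depends on the data, and a corrupted or misread count shifts everything after it, which the range check only partially guards against.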
12:45 PM rmu: that stuff is much easier with flatbuffers
12:45 PM rigid: yeah, it's a problem that client/server share the same config. but it's solvable since the *Status objects hold the server config. And somehow it's elegant for the user to just copy the .ini
12:46 PM rigid: (...and do differing stuff in the .nml config)
12:46 PM rmu: there with "tables" you can serialize a field or you can skip it
12:46 PM rmu: if it is not required
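FlatBuffers tables let a field be present or absent on the wire. The sketch below does not reproduce FlatBuffers' actual vtable format; it swaps in a simpler presence bitmask to show the same idea of serializing a field or skipping it. Msg, encode_msg and decode_msg are hypothetical, and it needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical message with two optional fields.
struct Msg {
    std::optional<std::int32_t> joints;
    std::optional<double> feed_override;
};

std::vector<std::uint8_t> encode_msg(const Msg &m) {
    std::vector<std::uint8_t> out;
    std::uint8_t mask = 0;
    if (m.joints) mask |= 0x01;
    if (m.feed_override) mask |= 0x02;
    out.push_back(mask);  // leading byte says which fields follow
    auto put = [&out](const void *p, std::size_t n) {
        const auto *b = static_cast<const std::uint8_t *>(p);
        out.insert(out.end(), b, b + n);
    };
    if (m.joints) put(&*m.joints, sizeof(std::int32_t));
    if (m.feed_override) put(&*m.feed_override, sizeof(double));
    return out;
}

Msg decode_msg(const std::vector<std::uint8_t> &in) {
    if (in.empty()) throw std::out_of_range("empty message");
    std::size_t pos = 1;
    auto take = [&](void *p, std::size_t n) {
        if (pos + n > in.size()) throw std::out_of_range("truncated message");
        std::memcpy(p, in.data() + pos, n);
        pos += n;
    };
    Msg m;
    const std::uint8_t mask = in[0];
    // Absent fields are simply skipped; they stay empty on the receiving side.
    if (mask & 0x01) { std::int32_t v; take(&v, sizeof v); m.joints = v; }
    if (mask & 0x02) { double v; take(&v, sizeof v); m.feed_override = v; }
    return m;
}
```

Unlike the fixed update() sequence, the receiver here does not need to know in advance which fields the sender chose to include.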
12:46 PM rigid: yeah, you could even be backwards compatible then. or have separate codebases for stuff.
12:47 PM rmu: i know from experience with evil hacks around boost::serialize and c# that things tend to de-synchronise and cause surprises
12:47 PM rmu: s/things/code/
12:48 PM rmu: how do you handle excessive status updates over tcp?
12:48 PM rmu: with current configs you get those with 1kHz i think
12:48 PM rigid: well, xdr is rock solid. I wouldn't compare it to boost. But tests for various serialization options of libnml would really make sense.
12:49 PM rmu: no, that makes no sense
12:49 PM rigid: nope, I don't get 1kHz status updates for the xemc process
12:49 PM rmu: what is your "TRAJ" period configured to?
12:50 PM rmu: IIRC every run sends one update
12:50 PM rigid: disabling xdr and using something else in linuxcnc.nml totally makes sense. it's all there and working. breaking it unintentionally since there's no test is what doesn't make sense.
12:50 PM rmu: nobody is breaking anything
12:51 PM rmu: why would you replace xdr?
12:51 PM rigid: BASE_PERIOD = 35000, SERVO_PERIOD = 100000, anything else is default
12:51 PM rmu: that is the realtime threads
12:51 PM rigid: nobody is breaking anything <- famous last words
12:51 PM rigid: tests do make sense ;)
12:51 PM rmu: yes so write tests
12:52 PM rigid: well I did but people don't seem to like them
12:52 PM rigid: no mergy
12:52 PM rigid: when there's f**kin' code, then there's need for a f**kin' test to test it.
12:53 PM rmu: hmm.
12:53 PM rigid: 100% coverage or bust! :-P
12:53 PM rmu: i mean CYCLE_TIME in [TRAJ]
12:54 PM rigid: it's unset. what's the default?
12:54 PM rmu: don't know. the frequency of your status updates.
12:54 PM rigid: i only see CYCLE_TIME for [TASK] and [DISPLAY] in the docs
12:56 PM rmu: maybe i mean TASK
12:56 PM rmu: if that is set to 0.001 it should send status updates with 1kHz
12:57 PM rmu: or is there some limiter in CMS that limits messages over TCP?
12:57 PM rigid: [TASK] CYCLE_TIME is 1.00 but there's not going a status update every second over the network
12:57 PM rigid: oh wait, that's the wrong config
12:57 PM rmu: CYCLE_TIME 1.0 seems really long... doesn't that work? that can only send one command per second to usrmot IIRC
12:58 PM rigid: CYCLE_TIME = 0.010
12:58 PM rigid: yeah, was wondering.
12:59 PM rmu: so how many status updates do you send? how do you configure that?
01:00 PM rmu: what happens with "slow" clients?
01:00 PM rmu: or hanging tcp connection?
01:00 PM rigid: wym? linuxcnc sends the status updates. I don't do anything :)
01:01 PM rmu: so how many status updates are there? you seemed to indicate less than 1 per second?
01:01 PM rigid: my wifi laptop<->rpi conn isn't exactly fast/low latency. really slow connections probably need to suck balls (like with any connection that's too slow for the job)
01:01 PM rmu: with CYCLE_TIME = 0.01 in task it should be 100 per second
01:01 PM rigid: dunno, how to check? I didn't wireshark for specific message types. But the preview moves smoothly.
01:02 PM rigid: much more smooth than with X11 forwarding btw.
01:02 PM rmu: buffering with slow connection probably bunches updates together
01:02 PM rigid: (via the same link)
01:02 PM rmu: you are the NML expert
01:02 PM rigid: yeah, i suppose there's a lot of black magic happening in CMS and the downstream network stack
01:02 PM rmu: just add debug on client
01:02 PM rigid: well.. no, i learned a lot about it in the last weeks
01:03 PM rmu: no there is no black magic in CMS
01:03 PM rmu: it's just really convoluted
01:04 PM rmu: with your pi, just try unplugging the network for say 10s and look what happens when you plug in again
01:04 PM rmu: or try randomly dropping a tcp packet from that connection now and then
01:06 PM rigid: hm... i saw quite some interesting stuff. can you explain this: https://github.com/LinuxCNC/linuxcnc/blob/master/src/libnml/cms/tcp_srv.hh#L44 ?
01:07 PM rigid: when I unplug the cable for more than 5s, a timeout will occur
01:07 PM rigid: although 5s is not universally configured. timeout values are wild in linuxcnc + UIs
01:08 PM rigid: randomly dropping a tcp packet would result in a resend? because... well, tcp?
01:08 PM rigid: more realistic would be UDP over perfect, ultrafast, low-latency link
01:09 PM rigid: but for UI you wouldn't really need a solid link. ideally.
01:10 PM rigid: didn't test that, yet. seems my wifi is stable enough
01:11 PM rmu: that macro MAX_TCP_BUFFER_SIZE is unused
01:12 PM rmu: yes it will resend, but may delay the whole shebang
01:12 PM rigid: hehe, or IS it? :)
01:12 PM rigid: search for 16
01:13 PM rmu: search for MAX_TCP_BUFFER_SIZE is nil
01:13 PM rigid: sure, that's expected behaviour. same if you pause one of the processes when running all locally.
01:13 PM rmu: hmm. if i pause the gui locally, it will just miss status updates
01:13 PM rigid: 16 is a widely used buffersize
01:14 PM rmu: if you pause the remote gui, status updates will be buffered
01:14 PM rigid: is missing status updates not the same thing as "delay the whole shebang"?
01:14 PM rmu: i meant it delays the tcp connection
01:15 PM rigid: i can kill the UI, restart it and status will be the same as before the kill
01:15 PM rmu: dropping random packets from the tcp connection will get all kinds of logic ("traffic shaping") involved that may slow down your tcp conn to a crawl
01:16 PM rigid: in theory, i should be able to kill the UI during a running job and just resume it. didn't try tho. initialization of the UI might mess it up.
01:16 PM rmu: that edge-case stuff is supposedly all handled in libraries like 0mq
01:16 PM rmu: don't know if that would work with axis, my gui can do that
01:16 PM rigid: dropped TCP packets sounds like hardware error or random bitflips due to solar winds
01:17 PM rmu: or just not enough bandwidth
01:17 PM rigid: i currently don't care for it (or rather solve it outside linuxcnc)
01:17 PM rmu: that will always happen occasionally if you go over the real internet
01:17 PM rigid: currently, i'd be happy if linuxcnc would do its job.
01:17 PM rmu: or with WIFI
01:18 PM rigid: nah it wouldn't... you'd use dedicated tunnels if you're serious. if not, then dropped packets are just things you cannot avoid.
01:19 PM rmu: what should a tunnel achieve... if your connection is full packets will be dropped
01:19 PM rigid: i'm fine on wifi with my current patchset
01:19 PM rigid: tooltable doesn't work... I might want that at some place
01:20 PM rmu: with wifi you will get outages (for seconds) every time your friendly neighbourhood wifi wardriver comes around
01:20 PM rmu: or the SO is cooking microwave popcorn in a leaky microwave ;)
01:21 PM rmu: tooltable is a mess. talk to rene about that.
01:21 PM rigid: with a dedicated tunnel or T1 or SDSL or whatever the telco guys are choosing, you get guaranteed bandwidth
01:21 PM rigid: they do fire alarms and video surveillance that way for example
01:22 PM rmu: no harm in surveillance if you drop a frame occasionally
01:22 PM rmu: or with fire alarms
01:22 PM rigid: i'll test a faulty connection. wanna know what happens
01:23 PM rigid: framedrops must not happen. not with surveillance. not with broadcast. not with streaming. in that order.
01:23 PM rmu: mesa fpga network comms also tolerate a dropped packet now and then, and that is really a dedicated link (direct nw cable between computer and mesa card)
01:23 PM rigid: talking about fortune 500 level. not your average blockbuster video cam :-P
01:24 PM rigid: rmu: with linuxcnc you'd use UDP and backplane networking. that's better than direct nw cable.
01:24 PM rigid: like PCIE -> PCIE
01:25 PM rigid: but the hobbyist can drive her china mill over wifi just fine, i guess. will try.
01:26 PM rmu: what do you mean by "better than direct network cable"
01:26 PM rigid: better in any aspect
01:26 PM rmu: surely you need some kind of signalling in your "UDP over backplane networking"
01:27 PM rmu: ethernet has galvanic isolation, i guess you could achieve that with "PCIE over fiber" if thats a thing
01:28 PM rmu: but mesa ethernet hardware or beckhoff ethercat stuff is just a CAT6 away...
01:29 PM rmu: bandwidth of even 100mbit ethernet in the context of this simple CNC stuff is "unlimited"
01:29 PM rigid: rmu: https://docs.nvidia.com/drive/drive_os_5.1.6.1L/nvvib_docs/index.html#page/DRIVE_OS_Linux_SDK_Development_Guide/System%20Programming/sys_components_non_transparent_bridging.html
01:30 PM rigid: this will appear as NIC to the kernel
01:31 PM rigid: wouldn't make sense to use TCP here
01:31 PM rigid: that's UDP territory for ultrafast IPC
01:31 PM rmu: why would you even use UDP there
01:32 PM rmu: UDP also has overhead, it usually runs on top of IPv4 or IPv6 and has checksums
01:32 PM rigid: no overhead for unnecessary TCP safeguards
01:32 PM rigid: you think UDP has even remotely as much overhead as TCP?
01:32 PM rigid: tcp_joke_udp.jpg
01:32 PM rmu: so they seem to use SAS / SATA signalling
01:33 PM rigid: there are multiple implementations. it's not a standard.
01:33 PM rmu: UDP doesn't need ACK has no windows no slow start no nagle no retransmit no reordering
01:33 PM rmu: but it has checksums
01:34 PM rmu: some applications would prefer getting mangled packets than those packets being dropped
01:34 PM rigid: yeah, checksums are nice. PCIE has them, too.
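On the checksum point: UDP uses the Internet checksum (RFC 768, computed as described in RFC 1071), a one's-complement sum of 16-bit big-endian words. A minimal sketch over a plain buffer; a real UDP checksum additionally covers a pseudo-header containing the IP addresses, which is omitted here, and internet_checksum is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>

// Internet checksum (RFC 1071): one's-complement sum of 16-bit big-endian
// words, carries folded back in, result inverted. A 32-bit accumulator is
// ample for UDP-sized payloads (at most 65535 bytes).
std::uint16_t internet_checksum(const std::uint8_t *data, std::size_t len) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i + 1 < len; i += 2)
        sum += (static_cast<std::uint32_t>(data[i]) << 8) | data[i + 1];
    if (len % 2 != 0)
        sum += static_cast<std::uint32_t>(data[len - 1]) << 8;  // pad odd byte with zero
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);  // end-around carry
    return static_cast<std::uint16_t>(~sum);
}
```

A receiver can verify a packet by summing the data together with the transmitted checksum: an intact packet yields 0. Note that this only detects corruption; unlike TCP, UDP does nothing to recover the lost or damaged datagram.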
01:34 PM rmu: so every now and then a new VOIP or video protocol pops up that is not UDP
01:35 PM rmu: on that PCIE direct connection stuff you would probably talk something like RDMA
01:36 PM rmu: but AFAIK that is more a buzzword than a protocol
01:36 PM rigid: and develop your own kernel driver?
01:36 PM rigid: i mean it's nice to have access from userspace
01:38 PM rmu: what would make sense for linuxcnc would be an investigation into useable userspace network stacks / ethernet drivers
01:38 PM rigid: what are the use cases for such low latencies in the linuxcnc realm?
01:39 PM rigid: i mean, even the highest res quadrature encoder can only move so fast
01:39 PM rmu: talking to mesa hardware with 4kHz and more
01:39 PM rmu: doing motor control in linuxcnc ;)
01:40 PM rmu: and you could really be sure that nothing in the kernel is interfering
01:40 PM rigid: bitbanging for that matter. there's kernel support for PWM
01:40 PM rigid: (and userspace)
01:41 PM rmu: bit-banging ethernet would be fun
01:41 PM rigid: iirc they do it with some ATtiny MCU
01:41 PM rmu: you could possibly avoid a bunch of buffering
01:42 PM rmu: (with userspace ethernet driver)
01:42 PM rigid: hehe https://github.com/osnr/rpi-bitbang-ethernet
01:42 PM rigid: linux network stack is pretty much as configurable as it gets
02:30 PM rmu: rigid: what i wanted to say regarding that tcp stuff, you should check that a blocked TCP connection doesn't also block CMS writing into it, that would block TASK and that would be problematic
02:37 PM rigid: nah, there's timeouts everywhere
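On rmu's concern that a stalled TCP peer could block the writer (and with it TASK): one common POSIX approach is a non-blocking send() combined with poll() and a timeout. This is a hypothetical helper for illustration, not how CMS implements its TCP transport; MSG_NOSIGNAL is Linux-specific.

```cpp
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: try to send the whole buffer, but stop waiting once
// the peer has not accepted more data for timeout_ms.
// Returns bytes sent (less than len on timeout), or -1 on a socket error.
ssize_t send_with_timeout(int fd, const char *buf, std::size_t len, int timeout_ms) {
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) { sent += static_cast<std::size_t>(n); continue; }
        if (n < 0 && errno == EINTR) continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) return -1;
        // Socket buffer is full: wait until it becomes writable again.
        pollfd p{fd, POLLOUT, 0};
        int r = poll(&p, 1, timeout_ms);
        if (r == 0) break;                       // peer not draining: give up
        if (r < 0 && errno != EINTR) return -1;
    }
    return static_cast<ssize_t>(sent);
}
```

A caller that gets back fewer bytes than requested has to decide whether to drop the connection or discard the rest of the message; resuming mid-message later would desynchronize the stream in the same way as mismatched update() calls.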
03:14 PM -!- #linuxcnc-devel mode set to +v by ChanServ