#linuxcnc-devel Logs

Nov 13 2018

#linuxcnc-devel Calendar

04:24 AM rmu: indentation of change_prolog in 5.3 of http://linuxcnc.org/docs/html/remap/remap.html#_remapping_tool_change_related_codes_t_m6_m61 is messed up
09:43 AM rmu: dgarr: curious about what?
09:54 AM dgarr: statistics of .time pins
10:05 AM rmu: the strange thing is there is some periodicity to it
10:05 AM rmu: will have to dig in the code where these values are assigned
10:13 AM rmu: i'm seeing something close to 10hz modulating the times
10:28 AM rene_dev_: Dgarr thanks for your explanation, I get that
10:28 AM rene_dev_: but it doesn’t answer my question :D you can do exactly the same with floats
10:29 AM rene_dev_: Rmu what is the problem you are having?
10:29 AM rene_dev_: dgarr I do the same in the stmbl, the servo never moves on enable
10:31 AM rmu: rene_dev_: no real problem, just reevaluating the whole raspberry pi situation ;)
10:33 AM rmu: rene_dev_: servo thread execution usually takes around 240-300usec, but this time is modulated with about 10Hz and goes up to 600usec
10:33 AM rmu: 700 even
10:33 AM rmu: all while idle
10:34 AM rmu: classicladder alone takes 100usec which seems excessive
10:35 AM rene_dev_: I think each comp has a max time pin
10:35 AM rene_dev_: so you can see which one it is
10:37 AM rene_dev_: if it is spread across all comps, there is another irq preempting it
10:38 AM rmu: there is no tmax pin, but time show last execution time if i'm right
10:38 AM rene_dev_: try changing the servo thread period, see if that changes the 10hz
10:38 AM rmu: and it spreads, looks like whole core grinds to a halt
10:39 AM rmu: no messages about throttling in dmesg, cpu clock governor is performance
10:40 AM rmu: other IRQ preempting should not slow down all functs
10:41 AM rene_dev_: yes, if the irq fires at a different frequency, and hits a different function every time
10:41 AM rmu: i am plotting the time in halscope
10:42 AM rmu: so i see each servo period, and the symptom is, a "long" servo period has all functs being slow
10:44 AM rene_dev_: could be usb related, the PI doesn’t have a real isb host controller, so a lot of usb stuff is done in software. I recall something about usb irq issues on the pi
10:45 AM rmu: no hw irqs fire on that core according to /proc/interrupts
10:46 AM rene_dev_: Hmm, ok
10:46 AM rmu: https://paste.ubuntu.com/p/vdTF5bMbcb/
10:46 AM rmu: i suspect memory/cache issues
10:48 AM rene_dev_: Ah, so it does fire on another cpu. Sounds like a cache/Bus issue
10:48 AM rmu: the usb stuff is happening on core 0
10:51 AM rmu: will experiment with to other system
10:52 AM rene_dev_: Try without usb, but that requires using a serial console
10:59 AM rene_dev_: Maybe the rt patch doesn’t patch the usb irq properly?
11:01 AM pcw_home: There's definitely something busted with the USB IRQ on RPI (look at it with top...)
11:03 AM pcw_home: You see similar behavior on X86 I also think it may be bus contention/memory allocation/cache related
11:06 AM pcw_home: 10 hz may be axis GUI update rate
11:08 AM pcw_home: on the RPI I see the latency get much worse when you do large GUI ops like moving windows
11:10 AM pcw_home: Though I have had the RPI up and running Axis/SPI hardware for about 4 months now with no RT errors so there does seem to be an upper bound to the latency
11:25 AM rmu: pcw_home: in this case i was running "headless" with remote-display through ssh
11:32 AM rmu: somehow it is strange that execution time varies nearly 3-fold
11:39 AM pcw_home: Yes I've wondered about similar artifacts on X86 (motion command handler on this machine normally takes a < 1 usec but has peaks of ~60 usec)
11:40 AM rene_dev_: Keep in mind that ethernet on the pi puts a lot of load on the usb
11:41 AM pcw_home: Im not using Ethernet on mine and USB usage is still crazy
11:41 AM pcw_home: good portion of CPU servicing USB interrupts
11:42 AM pcw_home: I think it maybe related to a quick fix of a USB related crash on Preempt-RT
11:44 AM rmu: the real-time core is isolated and should not do anything at all except realtime stuff
11:45 AM pcw_home: but the core may have to wait for shared hardware
11:47 AM rmu: memory/cache?
11:47 AM rmu: does the realtime stuff call any syscalls?
11:48 AM rmu: logging/debug stuff? perhaps it is a kernel issue
11:56 AM pcw_home: Not sure, I do know cache size does have an effect
11:57 AM pcw_home: http://freeby.mesanet.com/mch.png motion command handler very peaky latency
11:59 AM pcw_home: err time not latency
12:23 PM rene_dev_: rmu pcw_home I guess the usb IP and some other stuff share the bus, so the cpu has to wait
12:24 PM rene_dev_: for the stm32 there is a bus matrix, where you can see which busses are connected to what
12:24 PM rene_dev_: there is some way to see such a block in linux, but cant remember how
12:25 PM rene_dev_: the real problem is that the synopsis otc core is really really stupid, so what a normal host controller does in hardware, all has to happen in the driver
12:25 PM rene_dev_: otg
12:25 PM rene_dev_: in fact, its the same core as on the stm32 ;D
12:26 PM rmu: found out something: i used nohz_full=3, that means that the isolated realtime core receives no scheduler ticks
12:26 PM rmu: but that seems to change RCU behaviour and may have effects on syscalls
12:27 PM rmu: without that parameter it looks a little bit less insane
12:52 PM pcw_home: That's my experience also (on X86) nohz seems to worsen latency vs periodic tics
01:11 PM rene_dev_: rmu did you ever test the stmbl?
01:26 PM rmu: rene_dev_: not yet
01:27 PM rmu: rene_dev_: more pressing stuff interrupted
01:27 PM rene_dev_: yeah, same here :(
01:28 PM rmu: i need to get that damn toolchanger going
01:29 PM rene_dev_: are you also desperatly waiting for my pocket number fix? :O
01:30 PM rmu: hehe, yes, among other stuff
01:31 PM rmu: the PLC stuff is insane
01:31 PM rmu: the manual toolchange button is not easy to integrate with the automatic stuff
01:32 PM rene_dev_: what plc?
01:32 PM rmu: the integrated classicladder ;)
01:32 PM rene_dev_: I never used classicladder
01:33 PM rmu: i also have some prehistoric "siemens" s200 lying around but i shoot something befor i use them
01:33 PM rmu: that was probably not a proper english sentence ;)
01:34 PM rmu: the whole concept is insane if you have more than 10 rungs
01:35 PM rmu: the classicladder editor has some warts
01:36 PM rmu: and the file it generates is not for human consumption, non-graphical maintenance of ladder diagrams is not possible
01:39 PM rene_dev_: I just code whatever I need in c as a comp
01:44 PM rmu: https://unfoo.net/~robert/lcnc/Screenshot_20181113_204124.png
01:44 PM rmu: yeah, will do that
01:44 PM rmu: more patterns ;)
01:45 PM rmu: this is with a 2khz servo period and no "real" hardware (except 7i90) on a rpi3b (without +)
01:47 PM rmu: oh forget that picture that was "ondemand" governor
01:58 PM pcw_mesa: A behavior that suggests some of the latency issues are caching related is that the average latency decreases with thread period
02:03 PM jepler: oh yes if the CPU frequency is changing that sure could explain the parts that look downright bimodal
02:04 PM jepler: I think classicladder is by default set up so that it doesn't recalculate every servo cycle, but I'm not convinced this helps in general (putting it on a different thread might, since the faster threads could preempt it then)
02:12 PM rmu: classicladder seems to update with 1khz even if called with 2 or 4 khz
04:42 PM CMorley: rmu: You are right, comment from module_hal from classicladder:
04:42 PM CMorley: // This actually does the magic of periodic refresh of pins and
04:42 PM CMorley: // calculations. This function runs at the period rate of the thread
04:42 PM CMorley: // that you added it to.
04:42 PM CMorley: // period, leftover, t0,and t1 are in nanoseconds.
04:42 PM CMorley: // This function first checks to see if at least 1 millisecond has gone by
04:42 PM CMorley: // if the period is under 1 MS then if will not refresh rungs yet but
04:42 PM CMorley: // will keep track of how many NS were left over. Does this each period
04:42 PM CMorley: // till at least 1 MS has occurred, if more then 1 MS then keeps track of
04:42 PM CMorley: // leftover NS for accuracy. Bottom line is you can run classiclader in
04:42 PM CMorley: // a thread faster than 1 millisecond but it will not refresh the rungs
04:42 PM CMorley: // any faster (it can be slower though). If your refresh is too slow and
04:42 PM CMorley: // your timer are using multiples of 100 microseconds they might not be accurate.
04:42 PM CMorley: // t0 and t1 are for keeping track of how long the refresh of sections,
04:42 PM CMorley: // and HAL pins take (it is displayed in the 'section display' GUI (in microseconds).
05:02 PM mozmck: I don't normally run the runtests, but did just now and got a number of seg faults running halrun - is that normal on a machine with no realtime kernel?
05:03 PM mozmck: bbl
05:12 PM jepler: mozmck: no that's not normal, assuming you have a uspace build.
06:02 PM mozmck: jepler: yes, it's uspace. I'm running it on master and getting lots of failures. This is a rip build, but I also have linuxcnc 2.7 installed from packages. I'm running on linuxmint 17.3
06:02 PM mozmck: Do you think having the packages installed could be causing problems?
06:05 PM seb_kuzminsky: nah that shouldn't be a problem
06:07 PM mozmck: Well, I'm probably not doing something right then
06:12 PM seb_kuzminsky: i've never tried to run linuxcnc on mint 17.3
06:13 PM mozmck: I've run it on that and xubuntu 14.04 a *lot*
06:13 PM mozmck: But I don't normally run the runtests
06:14 PM seb_kuzminsky: i'm surprised linuxcnc runs fine, but the tests segfault
06:15 PM mozmck: Here is one error from ccomp/early-exit: + linuxcnc -r test.ini
06:15 PM mozmck: can't find package Linuxcnc
06:15 PM mozmck: while executing
06:15 PM mozmck: "package require Linuxcnc "
06:15 PM seb_kuzminsky: is it the packaged 2.7 that works, and rip master that fails? try running the tests on rip 2.7?
06:15 PM mozmck: I should try that.
06:15 PM seb_kuzminsky: hmm, did you remember to run rip-environment in the shell where you run runtests?
06:16 PM mozmck: I uninstalled the packaged 2.7 and the master runtests are still failing - some of them. Anywhere from 20 to 64 each time.
06:16 PM mozmck: yes.
06:16 PM seb_kuzminsky: well that's odd
06:17 PM mozmck: That's what I be thinking!
06:17 PM seb_kuzminsky: "package require Linuxcnc" is a tcl thing
06:17 PM seb_kuzminsky: do you have all the tcl stuff installed? (as indicated by dpkg-checkbuilddeps)
06:17 PM seb_kuzminsky: did you configure & build for the tcl version that's installed?
06:18 PM mozmck: I'm sure I did.
06:18 PM mozmck: make clean; ./configure --with-realtime=uspace; make -j8; etc...
06:18 PM seb_kuzminsky: try:
06:19 PM seb_kuzminsky: git clean -fdx ..
06:19 PM seb_kuzminsky: ./autogen.sh
06:19 PM seb_kuzminsky: then what you said
06:19 PM seb_kuzminsky: err
06:19 PM seb_kuzminsky: only run git clean if you don't have anything you care about in your repo
06:19 PM seb_kuzminsky: anything uncommitted, that is
06:19 PM mozmck: hmm, I'll check that.
06:20 PM mozmck: I did ./autogen.sh once at least
06:28 PM mozmck: strange - that did not help. I can't run linuxcnc either - and I did it just the other day with master
06:28 PM mozmck: dpkg-checkbuilddeps: error: cannot read debian/control: No such file or directory
06:31 PM seb_kuzminsky: cd debian; ./configure uspace; cd ..
06:32 PM mozmck: oh - duh! git clean wiped that out didn't it
06:32 PM mozmck: yeah, builddeps are all met - I did that the other day
06:33 PM mozmck: ok, I just built and ran it in another directory - I wonder if the problem is the directory this repo is in
06:34 PM seb_kuzminsky: did it work?
06:34 PM seb_kuzminsky: after the git clean and the total rebuild?
06:34 PM mozmck: I have a repo in another directory and it works there.
06:35 PM mozmck: Did not work in the one I was using after the git clean and total rebuild though.
06:35 PM seb_kuzminsky: different commits? uncommitted changes somewhere?
06:35 PM mozmck: Nope, both tip of master
06:36 PM mozmck: I have to go eat - I'll look at this more later. Seems really odd.
06:36 PM seb_kuzminsky: yeah, something's unusual about that
06:36 PM seb_kuzminsky: ok, talk to you later
07:06 PM mozmck: seb_kuzminsky: is there a way to clean better than make clean but maybe not as thorough as git clean -fdx?
07:15 PM rene_dev_: seb_kuzminsky whats #5400 or #_current_tool on startup?
07:15 PM rene_dev_: I know what it is, but Im looking for documentation backing that up.
07:29 PM mozmck: I wonder if it could be memory errors on my computer? here is another error from near.0: rtapi_shmem_new failed due to shmget(key=0x48535430): Invalid argument
08:00 PM mozmck: hmm, 2.7 runtests all ran without errors
08:04 PM mozmck: I did find that my ulimit was only 64, and I increased that to 20480