#linuxcnc-devel Logs

Jul 06 2017

08:04 AM skunkworks: ran all night with a 35us base period
08:18 AM jepler: skunkworks: that's great
08:19 AM skunkworks: I will try your hal with mesa 7i92 next
08:19 AM jepler: my own testing with the mesa stepgen feedback makes me believe the reported times a great deal
08:20 AM skunkworks: that would be a good 25khz step frequency..
08:21 AM skunkworks: jepler, I don't understand your sentence..
08:21 AM jepler: skunkworks: I believe the latency-test numbers are related to actual real-world nanoseconds
08:22 AM jepler: I was doubting it for awhile, when you reported consistent 5us jitter
08:22 AM skunkworks: ah - ok.
08:24 AM skunkworks: I bought a server off ebay a few weeks ago and it had a 300gb server grade ssd..
08:24 AM jepler: nice
08:24 AM skunkworks: now I have a nice solid drive with linuxcnc on it for testing different machines ;)
08:25 AM skunkworks: it is dell but made by samsung.
08:27 AM skunkworks: I don't know if it is isolcpus or idle=poll or both. I will also test that today
08:28 AM skunkworks: without the kernel command-line options, latency is usually around 50-100us on a normal system
08:28 AM jepler: that is more like I'm used to seeing
08:29 AM skunkworks: right - and that is what the matsuura was running with and it is working great with mesa ethernet cards.
08:29 AM jepler: did you see how to modify the kernel commandline when booting from the live iso?
08:29 AM jepler: for testing this
08:29 AM skunkworks: funny thing. some machines don't allow you to 'e' edit at the menu screen. Do you know why that would be?
08:30 AM skunkworks: when booting from the live image.
08:30 AM archivist: editor not set?
08:30 AM skunkworks: difference between secure boot and normal?
08:31 AM skunkworks: the grub? menu on the livecd normally has an option to edit the kernel line and such. Sometimes that menu option isn't there
08:31 AM jepler: skunkworks: that could be related, there are two different loaders (grub for non-secure UEFI and syslinux for BIOS). I am more familiar with seeing the syslinux one, and the edit key is not "e", it's "ctrl-i" (and the shortcut is not displayed on the screen)
08:32 AM skunkworks: ah- cool! thanks
08:32 AM jepler: (and the screens of grub and syslinux look totally different)
08:32 AM skunkworks: yes
08:32 AM jepler: I don't know what technical limitation requires that they have two different bootloaders, since GRUB works fine with BIOS once installed..
08:33 AM jepler: I updated the README about bootloader keys https://github.com/jepler/stretch-live-build/blob/master/README.md#important-notes
08:34 AM skunkworks: cool - thanks
09:42 AM mozmck: I tested yesterday using latency-histogram on a J1800 mini-pc. I can't tell that idle=poll makes any difference on this machine.
09:42 AM mozmck: but isolcpus definitely does.
10:22 AM skunkworks: mozmck, interesting
10:23 AM skunkworks: pcw_home was saying that he thought isolcpus didn't have much effect on his hardware
10:23 AM mozmck: Yes. Now I need to figure out a way to determine which cpus to isolate on various hardware.
10:23 AM skunkworks: mozmck, did you isolate 2 cores? what latency did you get?
10:24 AM mozmck: The J1800 only has two cores, and I isolated #1
10:24 AM skunkworks: I use hwloc to visualize what cache is paired with what cores
10:24 AM jepler: $ egrep '^processor|^core id|^$' /proc/cpuinfo
10:24 AM skunkworks: oh- yah
10:24 AM jepler: isolate the other "processor" with the same "core id" as the highest numbered processor
10:25 AM pcw_home: I have only tried it on a core Duo and a G3258
10:25 AM pcw_home: The core duo was about 30 usec either way. The G3258 about 7 usec either way
10:25 AM mozmck: I'm getting max latency of 30 to 40. I ran all night last night with 6 glxgears and firefox and sheetcam running and got 43
10:25 AM mozmck: I have gotten spikes over a hundred on this same computer before isolcpus
10:26 AM mozmck: jepler, so does that work even with HT turned off? I saw something about the numbering possibly being different on some machines.
10:26 AM skunkworks: jepler, that is easy enough
10:27 AM skunkworks: processor : 0
10:27 AM skunkworks: core id : 0
10:27 AM skunkworks: processor : 1
10:27 AM skunkworks: core id : 0
10:27 AM skunkworks: processor : 2
10:27 AM skunkworks: core id : 1
10:27 AM skunkworks: processor : 3
10:27 AM skunkworks: core id : 1
10:27 AM skunkworks: so 2 and 3
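A minimal Python sketch of the rule jepler describes above: isolate the highest-numbered processor together with any other processor that shares its core id. The function names `parse_cpuinfo` and `cpus_to_isolate` are hypothetical, the sample text mirrors skunkworks' paste, and grouping by "physical id" in addition to "core id" is an added assumption for multi-socket machines. Some CPUs share resources without reflecting that in "core id", so treat the result as a starting point rather than a definitive answer.

```python
# Sample in the same shape as the grepped /proc/cpuinfo output pasted above.
SAMPLE = """\
processor : 0
core id : 0

processor : 1
core id : 0

processor : 2
core id : 1

processor : 3
core id : 1
"""


def parse_cpuinfo(text):
    """Return one dict per logical CPU with the integer fields we care about."""
    cpus = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key : value"
        key, value = key.strip(), value.strip()
        if key == "processor":
            # A "processor" line starts a new logical-CPU record.
            current = {"processor": int(value)}
            cpus.append(current)
        elif current is not None and key in ("physical id", "core id"):
            current[key] = int(value)
    return cpus


def cpus_to_isolate(cpus):
    """Highest-numbered processor plus every processor on the same core."""
    if not cpus:
        return []
    last = max(cpus, key=lambda c: c["processor"])
    if "core id" not in last:
        # No topology information reported: isolate only the last CPU.
        return [last["processor"]]
    # Assumption: "physical id" defaults to 0 when the field is absent.
    wanted = (last.get("physical id", 0), last["core id"])
    return sorted(
        c["processor"]
        for c in cpus
        if (c.get("physical id", 0), c.get("core id")) == wanted
    )


isolated = cpus_to_isolate(parse_cpuinfo(SAMPLE))
print("isolcpus=" + ",".join(str(n) for n in isolated))  # isolcpus=2,3
```

On a real machine the same functions could be fed the contents of /proc/cpuinfo instead of `SAMPLE`.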
10:28 AM jepler: I have systems where it's 3 and 7; and 5 and 11 (both are core i7)
10:29 AM skunkworks: really? weird
10:29 AM mozmck: interesting. My AMD 8-core shows processor and core id the same for all 8 cores - 0 - 7
10:30 AM jepler: mozmck: can you paste the full hwloc and /proc/cpuinfo?
10:30 AM jepler: some AMD CPUs have a technology similar to HT but by another name
10:30 AM mozmck: processor : 0
10:30 AM mozmck: core id : 0
10:30 AM mozmck: processor : 1
10:30 AM mozmck: core id : 1
10:30 AM mozmck: processor : 2
10:30 AM mozmck: core id : 2
10:30 AM mozmck: processor : 3
10:30 AM mozmck: core id : 3
10:30 AM mozmck: processor : 4
10:30 AM mozmck: core id : 4
10:30 AM mozmck: processor : 5
10:30 AM mozmck: core id : 5
10:30 AM jepler: please pastebin the whole thing, not the grepped part. there might be a different clue available in the full cpuinfo
10:31 AM mozmck: will do
10:34 AM jepler: afk
10:34 AM mozmck: https://pastebin.com/1847qRn1
10:35 AM mozmck: https://pastebin.com/WKkD7MrJ
10:56 AM jepler: OK so that's a Piledriver CPU, similar in design to Bulldozer
10:56 AM jepler: there are pairs of "modules" that share resources https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)#/media/File:AMD_Bulldozer_block_diagram_(8_core_CPU).PNG
10:56 AM jepler: but I don't see this information reflected in either output
10:57 AM mozmck: huh. well, I'm not using this one to run a machine - it's my dev machine
10:57 AM mozmck: computer that is.
10:58 AM jepler: it'll either be 3,7 or 6,7
10:58 AM mozmck: So I need to isolate cpus in pairs if they share resources then? And the realtime still just uses the highest numbered CPU?
10:58 AM jepler: what kernel are you running anyhow?
10:59 AM jepler: stretch's 4.9 or something else?
10:59 AM mozmck: 3.18.x for machines.
10:59 AM jepler: apparently there was a regression "since 4.6" discussed in 2016. maybe the fix hasn't trickled in yet. https://patchwork.kernel.org/patch/9453057/
11:00 AM mozmck: 3.19 on this piledriver computer
11:10 AM pcw_home: 4.X are not really safe on older core Duo machines where intel 915 video issues in the 4.x kernels cause random crashes
11:11 AM pcw_home: maybe once a week
11:11 AM jepler: rt-tests package has a program called hackbench
11:11 AM jepler: on my intel systems it seems good at "detecting" HT pairs, you might try it for detecting piledriver "modules"
11:12 AM jepler: back on my 4C/8T machine,
11:12 AM jepler: $ for CPUSET in 0,7 3,7 6,7; do printf "%s " $CPUSET; taskset -c $CPUSET hackbench -T 2 -l 1000 | tail -1; done
11:12 AM jepler: 0,7 Time: 2.383
11:12 AM jepler: 3,7 Time: 2.690
11:12 AM jepler: 6,7 Time: 2.397
11:12 AM jepler: this agrees with my interpretation that 3 is the HT doppelganger of 7
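A small Python sketch of how the comparison above could be read off automatically, assuming output lines shaped like the "CPUSET Time: seconds" lines jepler pasted. `slowest_pairing` is a hypothetical helper, not part of rt-tests. It applies jepler's reasoning: the pairing that runs slowest is the likeliest candidate for two logical CPUs sharing one core. Timings can differ from run to run, so a single measurement should not be taken as conclusive.

```python
# Timings copied from jepler's 4C/8T machine in the log above.
SAMPLE = """\
0,7 Time: 2.383
3,7 Time: 2.690
6,7 Time: 2.397
"""


def slowest_pairing(output):
    """Parse 'CPUSET Time: SECONDS' lines and return (slowest cpuset, all timings)."""
    timings = {}
    for line in output.splitlines():
        parts = line.split()
        # Expect exactly: "<cpuset>", "Time:", "<seconds>"
        if len(parts) == 3 and parts[1] == "Time:":
            timings[parts[0]] = float(parts[2])
    if not timings:
        raise ValueError("no 'CPUSET Time: SECONDS' lines found")
    return max(timings, key=timings.get), timings


pair, timings = slowest_pairing(SAMPLE)
print(pair)  # 3,7
```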
11:20 AM mozmck: Thanks for the information. If I get a working script for setting isolcpus for random hardware, would that be something more widely useful?
11:21 AM jepler: it's at least worth posting
11:21 AM mozmck: ok.
12:03 PM mozmck: I haven't compiled hackbench to try that yet, but lscpu -p may work. It reports cpu 0,1 with core id 0, cpu 2,3 with core id 1, and so on.
12:07 PM mozmck: The hackbench script returns the following:
12:07 PM mozmck: 0,7 Time: 3.333
12:07 PM mozmck: 3,7 Time: 3.927
12:07 PM mozmck: 6,7 Time: 4.126
12:07 PM mozmck: Which fits with the lscpu -p output
12:09 PM mozmck: hmm, each run gives different results, so I don't know about that. Oh! I'm not running a preempt-rt kernel on this box either - duh!
12:10 PM jepler: hm I'd expect the results to be close, as long as there's not much other CPU load
01:51 PM skunkworks: jepler, I don't know if this is something with the livecd.. the 2 usb wireless dongles I plugged in don't connect to any wireless routers
01:51 PM skunkworks: I get [ 8722.476293] wlx00400c002883: aborting authentication with 00:1d:7e:dd:e6:ea by local choice (Reason: 3=DEAUTH_LEAVING)
01:51 PM skunkworks: in dmesg.
01:52 PM skunkworks: I am googling but so far have not figured it out
01:52 PM skunkworks: It could be this computer - I have not tried other computer hardware yet.
02:00 PM jepler: I haven't tried any USB wireless dongles myself
02:11 PM skunkworks: it isn't my routers - it won't connect to my phone hotspot either
02:11 PM skunkworks: weird. going to try a different computer
02:18 PM jepler: it could totally be poor support for the dongle
02:19 PM skunkworks: could be.. but 2. Unless it just happens to be the same chipset.. I would be very surprised though. (although I am unlucky like that)
02:38 PM pcw_mesa: I've been happy with AX88178/179 based Ethernet dongles
02:39 PM pcw_mesa: old chip, so they work with almost any OS
02:46 PM pcw_mesa: what wifi dongles? I've had decent luck with the cheap Edimax 7811un ones (ralink?)
02:46 PM skunkworks: seems stretch has issues with the rtl
02:47 PM skunkworks_: this is pretty much exactly what issue I am having
02:47 PM skunkworks_: http://forums.debian.net/viewtopic.php?f=10&t=131621
02:54 PM skunkworks: The modprobe didn't work
02:54 PM pcw_mesa: I'll have to try Stretch, The Edimax ones I have use a RTL8192cu
02:54 PM pcw_mesa: does look like recent breakage
02:57 PM jepler: you may be able to achieve something similar to editing a blacklist file under /etc/modprobe.d but in a way that will be effective on the live CD: add modprobe.blacklist=rtl8xxxu to the kernel command line
03:02 PM skunkworks: This ended up fixing it.. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=842422
03:02 PM skunkworks: very last message
03:03 PM skunkworks: wifi.scan-rand-mac-address=no in the /etc/NetworkManager/NetworkManager.conf
03:03 PM skunkworks: (under [device] )
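Put together, the fix skunkworks describes would look roughly like this in /etc/NetworkManager/NetworkManager.conf. This is a sketch of the relevant stanza only; any other sections already present in the file are assumed to stay unchanged, and NetworkManager presumably needs to re-read the file before the setting takes effect.

```ini
[device]
wifi.scan-rand-mac-address=no
```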
03:04 PM skunkworks: ugh - I am going to have to join the debian forums...
03:07 PM skunkworks: huh - skunkworks isn't taken as a user name..
03:10 PM jepler: too long?
03:11 PM skunkworks: I mean - someone isn't already using it.
03:11 PM jepler: oh oh
03:11 PM jepler: I was imagining you running the debian installer and getting criticized
03:11 PM skunkworks: my work here is done... ;)
03:11 PM skunkworks: http://forums.debian.net/viewtopic.php?f=10&t=131621&start=15
03:13 PM jepler: I have no idea what wifi.scan-rand-mac-address is or does
03:13 PM jepler: but .. good ?
03:14 PM skunkworks: No clue either.. but it fixed it for me.
03:14 PM skunkworks: I should try the other dongle now
03:14 PM jepler: you can do that on the live CD after booting?
03:15 PM skunkworks: hmm - I don't know. Would restarting the network service restart that manager? I could try it
03:16 PM jepler: https://blogs.gnome.org/thaller/2016/08/26/mac-address-spoofing-in-networkmanager-1-4-0/
03:18 PM jepler: seems like it's an attempt to make identifying your wireless card as the one that scanned for a wifi access point a little more difficult
03:22 PM skunkworks: yes - you can get it to work on the livecd
03:22 PM skunkworks: you have to restart the network-manager
03:26 PM skunkworks: jepler, how hard would it be to add that to the image?
03:34 PM skunkworks: bbl
03:40 PM jepler: skunksleep: umm .. seven ?
03:45 PM jepler: real answer: probably not hard for live, but hard for installer (if installer is affected, I don't think it uses network-manager but icbr)
03:50 PM jepler: we should create a github issue where people can just comment on it to participate on IRC ;-)
03:51 PM mozmck: :-) The IRC notification for github comments is nice.
04:24 PM skunkworks: jepler, it affects the livecd boot and the install
04:41 PM jepler: skunkworks: ok, boo
04:41 PM jepler: unless you find out that it doesn't affect the official installer I am going to assume it's beyond my control
05:03 PM skunkworks: I am sure it is an issue with any install of stretch
05:07 PM pcw_mesa: looks like ubuntu 17.04 also
05:09 PM jepler: it's linux, I'm sure they'll just blame people with inferior hardware and move on. sigh.
05:42 PM jepler: the users list is soooo not the place for the discussion of RTOS or communication bus choice
05:59 PM pcw_mesa_ is now known as pcw_mesa
06:14 PM Tom_itx: is github.com/micges still the place to get the latest mesaflash?
06:14 PM seb_kuzminsky: yes
06:14 PM Tom_itx: thanks
06:14 PM seb_kuzminsky: or get prebuilt debs from www.linuxcnc.org
06:15 PM Tom_itx: i've built it, just making sure there wasn't a newer site
07:55 PM jepler: I've been playing with clang's "address sanitizer" at $DAY_JOB and hooeeee it's found some nice bugs already
07:56 PM jepler: but it makes me wonder, is this the equivalent of putting the armor on the planes where the ones that made it back to base were all shot up?
08:14 PM jepler: huge swathes of linuxcnc's testsuite pass as long as you export ASAN_OPTIONS=detect_odr_violation=0:detect_leaks=0
08:17 PM jepler: most of the failures are due to
08:17 PM jepler: Q: When I link my shared library with -fsanitize=address, it fails due to some undefined ASan symbols (e.g. asan_init_v4)?
08:17 PM jepler: A: Most probably you link with -Wl,-z,defs or -Wl,--no-undefined. These flags don't work with ASan unless you also use -shared-libasan (which is the default mode for GCC, but not for Clang).
08:32 PM jepler: and after mucking around with that, even more passes
08:32 PM jepler: totally different than $DAY_JOB, in which I had to fix 5 problems before initialization would even complete
08:36 PM skunkworks: I have no clue what you are saying
08:36 PM jepler: skunkworks: that's fine
08:36 PM jepler: skunkworks: "address sanitizer" is a system for finding bugs in memory use of C/C++ programs
08:37 PM jepler: skunkworks: I was recently using it at $DAY_JOB and I found severe problems
08:37 PM jepler: skunkworks: I figured LinuxCNC would also have severe problems, but for the most part it doesn't
08:42 PM Tom_itx: well that's reassuring that you write good code for us :D
08:42 PM skunkworks: cool - thanks for constantly trying to improve linuxcnc
08:43 PM jepler: there's another tool for finding bugs in memory use of C/C++ programs I've used on both $DAY_JOB and LinuxCNC, and it has found lots of bugs in both.
08:43 PM jepler: ("valgrind")
08:43 PM jepler: many of the things that "address sanitizer" finds, "valgrind" does too; but some things "address sanitizer" is better at. It just happens that we have lots of that kind of bug in $DAY_JOB, I think
08:44 PM jepler: Runtest: 181 tests run, 180 successful, 1 failed + 0 expected
08:44 PM jepler: Failed:
08:44 PM jepler: /home/jepler/src/linuxcnc/tests/build/ui
08:44 PM jepler: and that one is sorta-expected because it is trying to build a UI program but is unaware of the extra compiler flags that are needed to enable "address sanitizer"
10:35 PM jepler: ==25552==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffff8d00 at pc 0x0000064a1ca0 bp 0x7fffffff8b60 sp 0x7fffffff8320
10:35 PM jepler: READ of size 5 at 0x7fffffff8d00 thread T0
10:35 PM jepler: #0 0x64a1c9f in __interceptor_strcmp (/home/jepler/src/2017/bin/linux64-asan/sds2.real+0x64a1c9f)
10:35 PM jepler: #1 0x7ffff7be6c9c in CompareStringKeys /home/jepler/src/tcltk85-for-sds2-2016-0~13/tcl8.5.9/unix/../generic/tclHash.c:864
10:35 PM jepler: this next one will turn out to be the same thing, it's about a supposed out of bounds access, but from a function which is not instrumented
10:36 PM jepler: so the metadata that asan expects to be there about the stack, isn't..
10:36 PM jepler: oops, wrong irc ;-)