#linuxcnc-devel | Logs for 2016-06-23

[01:08:44] <seb_kuzminsky> 64 bit works too
[06:22:00] <skunkworks> zlog
[07:17:25] <jepler> whee, installing 4.1 and 4.6 kernels from backports
[07:50:42] <jepler> and my results, but the latency-tests were short in duration: https://emergent.unpythonic.net/01466682649
[07:57:11] <skunkworks_> they appeared again?
[08:00:41] <skunkworks_> I want to test the rtai 64bit version soon - but currently my only 64bit system here is building a 20TB array.
[08:12:28] <jepler> skunkworks_: afaik stuff never disappears from "snapshots" but it is inconvenient to install
[08:22:42] <jepler> https://emergent.unpythonic.net/files/sandbox/0001-latency-histogram-set-pixel-size-of-window-explicitl.patch
[08:22:54] <jepler> that's why sometimes latency-histogram comes up a weird size -- because X has a weird DPI setting.
[08:54:52] <seb_kuzminsky> jepler: i'm glad to hear they all worked so well
[08:55:37] <seb_kuzminsky> my jessie-rtai-i386 machine locked up solid during the night on the 6th loop through 'git clean; make; runtests'
[09:29:47] <jepler> seb_kuzminsky: :-/
[09:29:49] <seb_kuzminsky> and the amd64 rtai machine locked up too, boo
[09:30:48] <seb_kuzminsky> i ran latency-histogram overnight on a 32-bit rtai machine (3.16, 5.0~test1) with ok results and no lockup
[09:37:41] <cradek> have we ever seen rtai actually work on amd64?
[09:37:48] <jepler> sure, I ran a system that way
[09:37:54] <jepler> I did the initial 64-bit port of linuxcnc that way
[09:38:00] <jepler> it was a very long time ago!
[09:38:09] <cradek> ah, I thought you declared that one unstable after a while
[09:38:15] <cradek> yeah, long time
[09:38:29] <jepler> I don't really remember the details
[09:45:29] <seb_kuzminsky> we had an experimental hardy-rtai-amd64 platform, it ran fine in the buildbot
[09:46:32] <cradek> iirc, that's what jepler used. that may have been the last time we saw rtai/amd64 work reliably
[09:49:00] <jepler> yeah that would have been about the right age
[12:28:26] <CaptHindsight> Alec got rtai working on amd64 over two years ago
[12:31:03] <seb_kuzminsky> i wonder if memleak would be interested in helping debug the rtai-on-VMs problem i'm having
[12:31:13] <seb_kuzminsky> it's using the rtai.org mainline, not his fork
[12:31:27] <seb_kuzminsky> http://mail.rtai.org/pipermail/rtai/2016-June/027282.html
[12:31:54] <seb_kuzminsky> i see the problem on both 32 and 64 bit
[12:33:29] <CaptHindsight> I can ask but my prediction is that he'd say that paulo's tree is broken and he doesn't want to waste anymore time dealing with it or him
[12:38:40] <CaptHindsight> why use mainline RTAI when memleak spent the last few years creating a stable tree just for you?
[13:01:27] <lair82> Would anybody know how to fix the problem I have run across twice now? I have both of the VMC's that are running Linuxcnc connected to our network, so the programmer can drop the part programs in the nc_files folder in the respective machine, directly from his desk in the office. On both wheezy machines, if you load a program to run on the CNC, then edit the program at the control, then save it, you can no longer access that
[13:01:28] <lair82> same program from the PC in the office that you just dropped the program from.
[13:15:17] <cradek> how do the file permissions/owner/group change when you do this?
[13:18:35] <seb_kuzminsky> lair82: are you using nfs?
[13:22:42] <seb_kuzminsky> CaptHindsight: i'm worried about fragmentation in the rtai community, that the more we go off on our own the fewer people there will be that understand our rtai and can maintain and fix it
[13:23:43] <seb_kuzminsky> i respect memleak's work and all the effort he's put in, and i appreciate that it's hard to work with the rtai.org mainline
[13:31:41] <lair82> -rw-r--r-- 1 rick rick 251964 Jun 23 13:52 RICK-TEST.ngc
[13:31:49] <lair82> -rw-r--r-- 1 greenmill greenmill 251958 Jun 23 14:00 RICK-TEST.ngc
[13:31:58] <lair82> -rw------- 1 greenmill greenmill 251970 Jun 23 14:08 RICK-TEST.ngc
[13:32:27] <lair82> cradek, this is how it transferred through the two pc's
[13:33:21] <lair82> first one is on my desktop, second is right after I moved it to the cnc, third is after I loaded it to the cnc, edited it, and saved it.
[13:33:56] <lair82> seb_kuzminsky, no I just have samba set up and share it through our network.
[13:34:00] <seb_kuzminsky> lair82: on the cnc, what's the output of 'umask'?
[13:34:25] <lair82> run that while in the nc_files directory?
[13:34:35] <seb_kuzminsky> run it anywhere
[13:34:43] <seb_kuzminsky> in a terminal on the cnc
[13:34:51] <lair82> 0022
[13:35:00] <lair82> greenmill@greenmill:~/linuxcnc/nc_files$ umask
[13:35:00] <lair82> 0022
[13:35:00] <lair82> greenmill@greenmill:~/linuxcnc/nc_files$
[13:35:12] <seb_kuzminsky> huh, that should be ok
[13:35:31] <seb_kuzminsky> what editor did you use?
[13:35:43] <lair82> whatever gmoccapy is using
[13:35:56] <lair82> that probably explains it all
[13:36:35] <seb_kuzminsky> there's a simple bandaid fix to get you going until we find the root cause
[13:37:07] <seb_kuzminsky> run "chmod 644 RICK-TEST.ngc" to turn read permissions back on for everyone
[13:37:08] <lair82> Punch line?
[13:39:05] <seb_kuzminsky> http://qwantz.com/index.php?comic=815
[13:39:22] <lair82> It's not a huge concern for me right now, they don't run either machine every day, but I never know if it is just something I managed to cause, or if there is actually a problem, and now that I have two machines fully functioning, and they are showing the exact same results, I figured I would say something
[13:39:37] <seb_kuzminsky> yeah
[13:40:11] <seb_kuzminsky> it looks like the editor is changing the permissions on the files it writes, maybe because it's running with a different umask than your terminal?
[13:40:46] <seb_kuzminsky> how are you launching gmoccapy? from the CNC menu in the OS?
[13:41:59] <lair82> I did yes, and clicked the "create desktop shortcut" button before I selected my config.
[13:44:27] <seb_kuzminsky> i bet it inherits the umask of the window manager or similar
[13:45:05] <seb_kuzminsky> in the shortcut you made, can you edit the command it runs to add "umask 0022; " before whatever it's running now?
[13:45:52] <andypugh> Is it possible that gmoccapy inherits the setuid root?
[13:46:15] <seb_kuzminsky> i sure hope not
[13:47:44] <lair82> I didn't make any shortcuts
[13:48:50] <seb_kuzminsky> oh, i thought that's what you meant above, when you selected your config?
[13:50:11] <lair82> Well, I did select the option to create a desktop shortcut to my config with the radio button that is at the bottom of the config picker screen, didn't realize that was what you meant
[13:50:41] <seb_kuzminsky> here's a thing we can try to see if we're on the right track
[13:50:48] <seb_kuzminsky> start linuxcnc the way you normally do
[13:51:13] <seb_kuzminsky> then run "gdb --pid=$(pidof gmoccapy)"
[13:51:16] <lair82> It's running out there now,
[13:51:31] <seb_kuzminsky> then in gdb, "call/o umask(0)"
[13:51:41] <seb_kuzminsky> followed by "call umask($1)"
[13:51:50] <lair82> can I do that through ssh from my desktop, or do I need to be at the actual machine?
[13:51:58] <jepler> (git grep shows no indication that anything in linuxcnc is manipulating umask)
[13:52:08] <seb_kuzminsky> ssh is fine, as long as no one is using the machine
[13:52:11] <jepler> note that while following seb_kuzminsky's instructions the UI will freeze
[13:52:17] <seb_kuzminsky> it will freeze the UI while gdb is running
[13:52:17] <seb_kuzminsky> yeah
[13:52:34] <lair82> nobody is using the machine, just me right now trying to figure this out
[13:52:46] <seb_kuzminsky> and you may have to install gdb with "sudo apt-get install gdb"
[13:53:49] <jepler> and you may need pidof -x gmoccapy, or you may need to look at ps or top and find the process id number and use that instead of $(pidof...) directly
[13:54:12] <seb_kuzminsky> oh yeah
[13:54:23] <lair82> 4169 greenmil 20 0 194m 89m 45m S 56.3 0.6 27:20.57 gmoccapy
[13:55:13] <lair82> gdb --pid=4169 Look good?
[13:55:16] <seb_kuzminsky> yeah
[13:56:18] <lair82> 0xb77799e0 in __kernel_vsyscall ()
[13:56:18] <lair82> (gdb) call/o umask(0)
[13:56:18] <lair82> $1 = 022
[13:56:18] <lair82> (gdb) call umask($1)
[13:56:18] <lair82> $2 = 0
[13:56:19] <lair82> (gdb)
[13:56:37] <CaptHindsight> seb_kuzminsky: it's already happened, Paulo doesn't give him credit for his work, and acts like a code tyrant
[13:56:42] <jepler> now you can "quit" gdb and gmoccapy will become responsive again
[13:57:20] <lair82> Ok I quit,
[13:57:40] <lair82> not literally, just the debug process
[13:57:51] <CaptHindsight> seb_kuzminsky: he stopped trying to help with mainline months ago
[13:57:51] <jepler> 0022 and 022 mean the same thing, and should cause created files to be readable by all.
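The gdb calls above work because umask() can only report the mask by setting a new one and returning the old one, which is why the second call puts the original value back. A minimal Python sketch of the same read-and-restore idea, plus a check of what mode a freshly created file gets; the helper names and the probe.ngc file name are hypothetical, chosen only for this illustration:

```python
import os
import tempfile


def current_umask():
    """Return the process umask without leaving it changed."""
    old = os.umask(0)   # sets the mask to 0 and returns the previous mask
    os.umask(old)       # restore the previous mask right away
    return old


def mode_of_new_file(directory):
    """Create a file the ordinary way and return its permission bits."""
    path = os.path.join(directory, "probe.ngc")  # hypothetical file name
    # open() requests mode 0666; the kernel clears the bits set in the umask
    with open(path, "w") as f:
        f.write("M2\n")
    return os.stat(path).st_mode & 0o777


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mask = current_umask()
        print("umask %04o -> new file mode %04o" % (mask, mode_of_new_file(d)))
```

With a umask of 022, an ordinary open() yields mode 0644, which matches the second listing of RICK-TEST.ngc earlier in the log.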
[13:58:37] <CaptHindsight> seb_kuzminsky: that could change if mainline becomes less of a dictatorship and becomes more democratic
[14:00:13] <seb_kuzminsky> lair82: ok, so my theory was wrong and something else is causing the problem
[14:00:15] <seb_kuzminsky> try this:
[14:00:39] <seb_kuzminsky> cause gmoccapy to open the editor, then while the editor is running use the gdb trick to inspect the umask of the editor process itself
[14:00:52] <seb_kuzminsky> that's really what we care about
[14:01:21] <seb_kuzminsky> CaptHindsight: i can sure understand the frustration with that :-(
[14:01:38] <seb_kuzminsky> bbl, time for a run before the afternoon thunderstorm
[14:03:25] <lair82> I don't see any other process related to the editor, just use 4169 again?
[14:04:26] <lair82> 0xb77799e0 in __kernel_vsyscall ()
[14:04:26] <lair82> (gdb) call/o umask(0)
[14:04:26] <lair82> $1 = 022
[14:04:26] <lair82> (gdb) call umask($1)
[14:04:26] <lair82> $2 = 0
[14:04:27] <lair82> (gdb)
[14:14:19] <jepler> so here's the skinny
[14:14:50] <jepler> hal_sourceview.py has 'def safe_write' which is using a temporary file + a rename to try to avoid loss of data if the file is only partially written and power is lost or machine crashes or what have you
[14:14:56] <jepler> >>> fd, fn = tempfile.mkstemp()
[14:14:59] <jepler> >>> "%o" % os.stat(fn).st_mode
[14:14:59] <jepler> '100600'
[14:15:15] <jepler> but python's tempfile module creates temporary files with restrictive permissions
[14:15:37] <jepler> .. this is documented in pydoc tempfile
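One possible way to keep the temporary file + rename approach while still ending up with the permissions a normal save would give is sketched below. This is not the code in hal_sourceview.py and not necessarily the fix that was adopted; the function name safe_write_respecting_umask is hypothetical, and the sketch assumes Python 3 on a POSIX system with the temporary file created in the same directory as the target.

```python
import os
import tempfile


def safe_write_respecting_umask(path, data):
    """Atomically replace `path` with `data`, keeping sensible permissions."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp always creates the file with mode 0600, regardless of umask
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        try:
            # Existing file: carry its permission bits over to the new copy
            mode = os.stat(path).st_mode & 0o777
        except FileNotFoundError:
            # New file: use what a plain open() would have produced
            mask = os.umask(0)
            os.umask(mask)
            mode = 0o666 & ~mask
        os.chmod(tmp, mode)
        os.rename(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Copying the mode of the existing file is a design choice for this sketch: it means a file that was group- or world-readable before the edit stays that way after it, which is the behaviour the Samba share in this discussion relies on.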
[14:16:40] <lair82> What would be different between Wheezy, and ubuntu 10.04? I have all three turning centers configured the exact same way, and they are all ubuntu,
[14:17:03] <jepler> same linuxcnc version?
[14:17:42] <lair82> No the ubuntu machines are 2.6.3-2.6.4, the wheezy machines are 2.7.3-2.7.4
[14:20:18] <jepler> it appears the explanation I have provided should apply equally to all the versions you name
[14:21:07] <jepler> in other words, I'm not sure why you're seeing a difference
[14:21:40] <lair82> we can move files around all day long edited, unedited between the ubuntu machines and our other PC's, and have no issues
[14:23:03] <lair82> I always come up with the interesting problems.
[14:23:59] <jepler> "gmoccapy creates gcode files with mode 0600 instead of obeying umask" is a legitimate bug and somebody with an interest should file it as an issue on github or fix it and file a pull request
[14:24:48] <lair82> I can file it, just wanted to verify it was an actual issue first.
[14:41:30] -linuxcnc-github:#linuxcnc-devel- [linuxcnc] lair82 opened issue #82: Gmoccapy creates gcode files with mode 0600 instead of obeying umask https://github.com/LinuxCNC/linuxcnc/issues/82
[14:44:06] -linuxcnc-github:#linuxcnc-devel- [linuxcnc] jepler commented on issue #82: @gmoccapy can you please take a look at this? I believe this occurs because of hal_sourceview.py:safe_write, which uses tempfile.mkstemp, which is documented as creating a file that only the current user can read. https://github.com/LinuxCNC/linuxcnc/issues/82#issuecomment-228155426
[16:51:34] <jepler> lair82: thanks for filing that
[16:51:41] <jepler> I hope that norbert can take a look soon
[18:40:28] <dgarr> seb_kuzminsky: i found a bug that i thought was an error in my rebase of joints_axes15 but testing shows same error in recent (june) master.
[18:40:31] <dgarr> info and steps to reproduce: http://www.panix.com/~dgarrett/stuff/bug.txt
[18:41:38] <dgarr> reproducible in a sim (configs/sim/axis/axis_9axis.ini) using a reduced version of one of my files
[18:53:01] <jepler> dgarr: thank you as always for a detailed report
[18:53:27] <jepler> dgarr: the configure flags thing is due to 402c27b being a 2.6ish ref, --with-realtime=uspace isn't that old
[18:53:56] <jepler> $ git show 402c27bddf68fe58aaf4447190adc3029ce821cc:VERSION
[18:53:56] <jepler> 2.6.12
[18:55:47] <jepler> dgarr: do you think that the part program is important to reproducing the issue?
[18:57:52] <seb_kuzminsky> it would not surprise me if i broke something while trying to fix that task abort bug that zultron reported
[19:00:44] <dgarr> jepler: i'm not sure about the part program, i first observed with a routine program i used, then reduced some obvious things but i could not keep removing iterations so i think it is related to maybe rotary coordinate + complexity or length -- it is a real example and reproducible in sim (for me on uspace anyway)
[19:03:03] <seb_kuzminsky> i can repro the problem using dgarr's instructions in the current tip of 2.6
[19:03:41] <jepler> I also reproduce it
[19:05:01] <seb_kuzminsky> and i dont see the problem in 1b7a523, which is in 2.6 just before i started monkeying with Task
[19:05:11] <seb_kuzminsky> so i think this is my bug
[19:05:56] <jepler> so having turned on some debug flags, I see task repeatedly issuing Issuing EMC_SET_DEBUG -- ( +22,+24, +565, +986,)
[19:05:59] <jepler> same values
[19:06:18] <seb_kuzminsky> bbl
[19:06:27] <jepler> using the debug configuration pop-up from axis
[19:07:23] <jepler> somehow it'll be this commit? Task: simplify handling of emcCommand
[19:34:47] <jepler> 20 minutes later it's still logging the same message
[19:35:40] <jepler> but once I touch the rapid slider once, then it starts doing *that* NML command even more frequently than the EMC_SET_DEBUG
[19:35:59] <jepler> but somehow going back to estop state fixes this
[19:38:09] <jepler> yes, somehow it is that commit. boooo
[19:40:31] <jepler> case EMC_SET_DEBUG_TYPE:
[19:40:32] <jepler> ...
[19:40:36] <jepler> retval = emcTaskIssueCommand(emcCommand);
[19:40:36] <jepler> return retval;
[19:40:36] <jepler> break;
[19:40:52] <jepler> so it doesn't hit the block that acknowledges the message receipt
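The effect of an early return that skips the acknowledgement can be shown with a toy model. This is not LinuxCNC's Task code: the class, the dictionary-shaped command and the echo_serial field are hypothetical stand-ins, and the sketch assumes the sender treats a command as new for as long as the receiver has not echoed its serial number.

```python
class ToyTask:
    """Toy receiver that should echo a command's serial number once handled."""

    def __init__(self):
        self.echo_serial = 0   # last serial number acknowledged
        self.issued = []       # log of command types actually issued

    def issue(self, cmd):
        self.issued.append(cmd["type"])
        return 0

    def plan_buggy(self, cmd):
        if cmd["type"] == "SET_DEBUG":
            return self.issue(cmd)        # early return: the ack below never runs
        retval = self.issue(cmd)
        self.echo_serial = cmd["serial"]  # acknowledge receipt
        return retval

    def plan_fixed(self, cmd):
        retval = self.issue(cmd)
        self.echo_serial = cmd["serial"]  # every path reaches the ack
        return retval


def run_cycles(task, plan, cmd, cycles=5):
    """Run the main loop `cycles` times; return how often the command was issued."""
    for _ in range(cycles):
        if task.echo_serial != cmd["serial"]:  # still looks like a new command
            plan(cmd)
    return len(task.issued)


if __name__ == "__main__":
    cmd = {"type": "SET_DEBUG", "serial": 7}
    buggy, fixed = ToyTask(), ToyTask()
    print("buggy issued:", run_cycles(buggy, buggy.plan_buggy, cmd))
    print("fixed issued:", run_cycles(fixed, fixed.plan_fixed, cmd))
```

In the toy model the buggy path issues the same command on every cycle, which resembles the repeated EMC_SET_DEBUG messages reported above.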
[19:42:11] <jepler> I think we're better off reverting that commit for now
[19:43:52] <jepler> https://emergent.unpythonic.net/files/sandbox/0001-task-Fix-serial-number-handling-after-516deaef.patch
[19:44:06] <jepler> dgarr, seb_kuzminsky ^^^
[19:50:31] <dgarr> jepler: that fix (sandbox one) seems to work on my original example (not simplified, not hw but uspace on my dev machine) in ja15
[19:54:21] <jepler> dgarr: that's good news, thanks for testing!
[19:54:35] <jepler> the revert is actually a bit less straightforward, you'd get a conflict if you tried it.
[20:00:53] <dgarr> jepler: so not sure what to do for ja15? git am your patch works on ja15 -- should i do that and move on? afk -- i will read back
[20:07:27] <jepler> dgarr: let's stick around until seb_kuzminsky decides revert or apply additional fix
[20:07:53] <jepler> whatever it is, he will end up doing it to the 2.6 branch and it will get merged up to master that way
[20:10:19] <jepler> if the thing you pick is the thing he picks, and we do one last JAxx rebase, it'll disappear. otherwise, it'll be an additional conflict to handle in the ultimate merge or rebase
[20:46:45] <jepler> hmm I may be counting my chickens too early, but it looks like my latency gets much better if I switch from nvidia to intel for my X graphics
[20:47:14] <jepler> .. but I can still run nvidia opengl programs with 'optirun'
[20:53:08] <jepler> max latency more like 24us after 7 minutes, seems like a success!
[22:10:03] <seb_kuzminsky> jepler: i think your fix is spot on
[22:11:04] <seb_kuzminsky> i agonized over 516deaefd since it is so invasive, but it was needed to work around part of the bug-clusterfuck that fell out of zultron's bug report
[22:11:40] <seb_kuzminsky> without that commit, the command ack happened in Task's main(), and i needed to defer emcTaskPlan() processing in some situations
[22:11:53] <seb_kuzminsky> ie while draining the interp list
[22:12:30] <seb_kuzminsky> so i had to move the acking of emcCommand to emcTaskPlan, so that main wouldn't erroneously ack commands that emcTaskPlan hadn't processed yet
[22:12:57] <seb_kuzminsky> did that make any sense? i'm very sleepy
[22:38:33] <seb_kuzminsky> jepler: my only comment on your fix is that it adds a call to readahead_reading() where before there wasnt one
[22:41:03] <KGB-linuxcnc> Jeff Epler 2.6-task-fix a38665b linuxcnc src/emc/task/emctaskmain.cc task: Fix serial number handling after 516deaef * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=a38665b
[22:41:26] <seb_kuzminsky> that's my proposal
[22:53:30] <seb_kuzminsky> in the snapshot.debian.org kernels, 4.1-rt installs without drama, but 4.6 needs xorg(!) updates too
[22:53:58] <seb_kuzminsky> so from a packaging & redistributing logistics point of view, 4.1 will be easier to deal with than 4.6
[22:54:08] <seb_kuzminsky> i'm running latency tests on 4.1 now, will report back in the morning