#linuxcnc-devel | Logs for 2015-08-07

[00:40:24] <KGB-linuxcnc> Sebastian Kuzminsky seb/2.7/task-latency-reporting 9498c2e linuxcnc src/emc/task/emctaskmain.cc task: warn if the main loop takes too long * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=9498c2e
[00:40:24] <KGB-linuxcnc> Sebastian Kuzminsky seb/2.7/task-latency-reporting 1282dfa linuxcnc src/emc/usr_intf/halui.cc src/emc/usr_intf/keystick.cc src/emc/usr_intf/xemc.cc UIs: increase task receive timeout to 5.0 seconds * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=1282dfa
[00:40:52] <seb_kuzminsky> i'd appreciate feedback on the seb/2.7/task-latency-reporting branch
[00:41:38] <seb_kuzminsky> it makes task warn us if it's taking too long, and it makes all the UIs (including halui) tolerate up to 5 seconds of latency from task, instead of the 1-2 seconds they accepted before
[01:35:49] <archivist> seb_kuzminsky, does that mean 5 seconds from pressing the esc key for an estop? rather high
[01:36:09] <seb_kuzminsky> no, not really
[01:36:39] <seb_kuzminsky> you press the escape key, the ui asks task to stop immediately, just like before
[01:37:25] <archivist> I have seen a disconcerting delay in that situation
[01:37:39] <seb_kuzminsky> but now the ui lets task take up to 5 seconds to respond, instead of up to 1 or 2, before it calls it an error
[01:38:03] <seb_kuzminsky> this doesn't change how long task takes to respond, it just lets the uis deal with task's long latencies a little better
[01:38:11] <archivist> never been able to put my finger on the culprit though
[01:38:13] <seb_kuzminsky> (and makes task tell us just how bad its latency is)
[01:38:45] <seb_kuzminsky> well, jepler today discovered the source of occasional 2+ second outages from task
[01:39:11] <seb_kuzminsky> it sometimes takes task a long time to store the gcode parameters to disk, and during that time it can't do anything else
[01:40:40] <archivist> I should check on my machine that sometimes exhibits slow ui response (not just axis, the editor too)
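
The idea of a main loop that warns when one pass takes too long can be sketched roughly as below. This is only an illustration, not the code in emctaskmain.cc: `run_loop`, `step`, and the 1.0 second threshold are all made-up names and values, and the clock and warning sink are passed in so the sketch is easy to try out.

```python
import time


def run_loop(step, iterations, warn_after=1.0, clock=time.monotonic, warn=print):
    """Call step() repeatedly and warn when a single pass is slow.

    step        -- hypothetical stand-in for one pass of a main loop
    warn_after  -- threshold in seconds (an assumed value, not LinuxCNC's)
    Returns the indices of the passes that exceeded the threshold.
    """
    slow = []
    for i in range(iterations):
        start = clock()
        step()  # anything blocking in here (e.g. a slow disk write) stalls the loop
        elapsed = clock() - start
        if elapsed > warn_after:
            warn(f"main loop pass {i} took {elapsed:.3f} s")
            slow.append(i)
    return slow
```

A caller that waits for replies from such a loop would need a timeout longer than the worst pass it is willing to tolerate, which is the point of raising the UI timeout separately from adding the warning.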
[08:40:02] <pcw_home> Hmm that's interesting, if you start LinuxCNC uspace from a terminal and close the controlling
[08:40:03] <pcw_home> terminal you lose the GUI ( not sure what else) but HAL goes marching on
[08:40:04] <pcw_home> is there a hal way to know the GUI has gone AWOL?
[08:40:24] <jepler> ok, I think I've actually *eliminated* remap as the cause of this little gem: 9: duplicate O-word label - already defined in line 29: 'O200 IF [#2 GT 0]'
[08:40:46] <jepler> I rewrote the file so it can be called with a regular o-call instead of via remap http://emergent.unpythonic.net/files/sandbox/toolchange1.ngc
[08:41:08] <jepler> start linuxcnc, and use o<toolchange1> call [0] [1] [1] / o<toolchange1> call [1] [2] [2]
[08:41:17] <jepler> when you do it via MDI you get the famous error
[08:41:33] <cradek> when you do O<toolchange> call?
[08:41:38] <jepler> right
[08:41:59] <cradek> wow! I bet that's a great clue
[08:42:43] <jepler> we can stop looking at remap..
[08:54:33] <skunkworks> aww - but we like blaming it!
[09:09:09] <skunkworks_> has anyone gone back to older versions of linuxcnc to see if the bug has always been there?
[09:10:52] <jepler> funny, cradek was just saying that to me over a bagel
[09:11:15] <skunkworks_> Mmmm bagel
[09:11:52] <skunkworks_> I seem to remember mdi stuff being touched in the past
[09:12:59] <cradek> I'm not sure where we first started being able to do O-call in mdi
[09:13:07] <cradek> maybe it was 2.5
[09:16:11] <skunkworks> zlog
[09:19:15] <jepler> o<toolchange> endsub [1]
[09:19:19] <jepler> what's [1] doing here?
[09:19:43] <jepler> oh that's the "return value"
[09:19:50] <cradek> yeah, I don't know what it does
[09:20:45] <cradek> can you even run 2.5?
[09:21:03] <jepler> me?
[09:21:09] <jepler> I haven't tried any other versions yet
[09:21:16] <jepler> I was trying to reduce the subroutine first
[09:22:36] <skunkworks_> So.. I should be able to drop the toolchange1.ngc into my ncfiles directory then run the 2 lines in mdi and get the error right?
[09:22:55] <skunkworks_> oh - I probably have to run his config.
[09:22:56] <jepler> skunkworks_: I am still using andy's config because it has to talk to the simulated carousel
[09:23:22] <skunkworks_> I finally got a failed to align carousel.
[09:27:25] <cradek> ok I think it's 218505cba8 where we gained mdi O-call
[09:27:28] <cradek> that is in 2.5
[09:31:12] <jepler> I think this is the most minimized I'm going to get: http://emergent.unpythonic.net/files/sandbox/obug.ngc
[09:31:16] <jepler> the M66 is required
[09:31:30] <seb_kuzminsky> in the 2.6-remap-bug branch there's a non-carousel config that reproduces it
[09:31:33] <jepler> open linuxcnc, machine on, home all, mdi o<obug> call [0] / o<obug> call [1]
[09:32:46] <cradek> jepler: that's beautiful
[09:33:38] <jepler> that one shouldn't need the rest of the carousel config either
[09:33:57] <cradek> 2.5 builds in 15s
[09:35:09] <cradek> argh I'm getting the thing where AXIS only runs once and then crashes forever after
[09:35:14] <cradek> I forget what fixes that
[09:35:15] <jepler> I do not reproduce it in rs274 in read-from-terminal mode
[09:35:19] <cradek> something about icons
[09:35:55] <seb_kuzminsky> cradek: i sometimes get that when stuff crashes and leaves ipcs behind (sems & shms)
[09:36:03] <jepler> commit 1402d8a118de20edab2761521e132c41c9e68bfb
[09:36:06] <cradek> stash@{1}: On v2.5_branch: fix seticon crashycrashy
[09:36:06] <jepler> axis: drop seticon hack
[09:36:07] <jepler> maybe
[09:36:10] <cradek> heh
[09:37:10] <cradek> whatever's in my stash fixes it
[09:37:47] <cradek> emc/task/emctask.cc 374: interp_error: Unexpected character after O-word
[09:37:47] <cradek> Unexpected character after O-word
[09:38:05] <cradek> in 2.5 I get this on the first call[0]
[09:39:17] <jepler> reproduces in 2.6.0
[09:39:43] <jepler> when did we get named o-words?
[09:39:51] <cradek> I took away the "return value" and it does not reproduce in 2.5
[09:40:02] <cradek> is having the return value important?
[09:40:07] <jepler> for remap
[09:40:19] <cradek> for reproducing in 2.6 I mean
[09:40:34] <jepler> no
[09:40:54] <jepler> .. updated http://emergent.unpythonic.net/files/sandbox/obug.ngc without it
[09:41:20] <jepler> what ref did you check, tip of 2.5?
[09:41:26] <cradek> yeah
[09:41:44] <jepler> what config did you test with?
[09:41:58] <cradek> sim/axis
[09:42:08] <jepler> [RS274NGC]
[09:42:08] <jepler> +SUBROUTINE_PATH=.
[09:42:10] <cradek> I added debugs in the two ifs, and it's all executing properly
[09:42:11] <jepler> did you have to add this?
[09:42:13] <cradek> no
[09:42:18] <jepler> huh
[09:42:34] <jepler> oh you put obug in nc_files
[09:43:08] <cradek> right, sorry
[09:43:15] <jepler> I had been putting it in the config directory
[09:43:23] <cradek> yeah it apparently works right in 2.5
[09:43:30] <cradek> this is good
[09:44:17] <jepler> Bisecting: 1475 revisions left to test after this (roughly 11 steps)
[09:44:27] <cradek> did you limit it to task?
[09:45:01] <cradek> er maybe -- src/emc/task src/emc/rs274ngc
[09:45:02] <jepler> no
[09:45:55] <jepler> oh god redis
[09:49:47] <jepler> hm a lot of unbuildable refs in here
[09:55:46] <seb_kuzminsky> there were some dark days in 2.6
[09:58:04] <jepler> I think I have had 2 buildable refs and 15 unbuildable ones so far
[09:59:05] <jepler> do these refs just not build on wheezy or something?
[10:01:02] <jepler> I'm done for now
[10:01:10] <jepler> maybe with a lucid machine or something
[10:01:54] <seb_kuzminsky> jepler: is it c++ stuff that's not building for you on wheezy?
[10:01:59] <jepler> different stuff
[10:02:08] <jepler> sometimes it's task not linking because python something
[10:02:14] <jepler> sometimes it's commit markers that were committed
[10:02:15] <seb_kuzminsky> ugh
[10:02:18] <seb_kuzminsky> haha
[10:02:21] <jepler> sometimes it links but doesn't start because of shared memory something
[10:02:24] <jepler> er conflict markers
[10:03:12] <seb_kuzminsky> at work we have two special hats of shame that you have to wear if you break the build
[10:03:22] <seb_kuzminsky> one of them is for honest mistakes and subtle bugs
[10:03:34] <jepler> we used a rubber weevil for awhile but then one guy kept it so he could break the build whenever he wanted
[10:03:45] <seb_kuzminsky> but the other one is a ridiculous cowboy hat you have to wear if you make silly mistakes like that
[10:03:55] <jepler> that's racist against cowboys
[10:04:17] <seb_kuzminsky> cowboys are not a race, patrick
[10:04:52] <cradek> well on tv every cowboy I see is white
[10:06:24] <jepler> how much time do you spend watching cowboys on TV?
[10:07:29] <seb_kuzminsky> https://www.youtube.com/watch?v=N_QZNtflyJA
[10:09:24] <cradek> jepler: I'd prefer not to talk about that
[10:10:27] <seb_kuzminsky> the last tv cowboy i watched was cowboy bebop, i'd be happy to talk about that
[10:11:41] <cradek> jepler: what's your bad ref?
[10:13:23] <jepler> git bisect good 405ed60ecc7ebffad1a6b538161aeed4fdb664c9
[10:13:27] <jepler> git bisect bad 1764a195548a68aa1c57ba349720744a6c28324a
[10:13:58] <jepler> full "git bisect log" at http://emergent.unpythonic.net/files/sandbox/bisectlog.txt
[10:15:25] <seb_kuzminsky> awesome
[10:18:13] <skunkworks_> so cradek removing the return value also fixed it? [6905295396fd96537161af67b7cb2419b1b92290] interp/oword: enable optional return values on 'return' and 'endsub
[10:18:39] <cradek> no, I just did that to make it run on 2.5
[10:18:45] <skunkworks_> oh
[10:18:47] <skunkworks_> duh
[10:19:09] <jepler> right, it wasn't needed to reproduce it in 2.6.0 and that allowed it to work (and succeed) in 2.5.4.
[10:19:25] <skunkworks_> but that sure seems like it narrows it down
[10:30:47] <cradek> 395 consecutive commits are unbuildable because they contain conflict markers
[10:31:18] <skunkworks_> what does that mean?
[10:31:35] <skunkworks_> 'conflict markers'
[10:31:50] <cradek> it's hard for me to explain this problem and also be nice
[10:31:57] <skunkworks_> heh - ok
[10:32:18] <cradek> just believe me that it's really quite incredible
[10:34:20] <mozmck> somebody made a bunch of commits in a private branch, then pushed/merged without building?
[11:03:07] <jepler> my personal bet is that it was a problem introduced by rebasing just before pushing
[11:04:44] <cradek> dfe0a86828
[11:04:49] <cradek> interp/oword: missed a merge conflict
[11:04:53] <cradek> also, unsure why these fail during rebase:
[11:04:54] <cradek> ...
[11:04:58] <cradek> so yeah
[11:05:08] <cradek> this is the first buildable one
[11:05:59] <cradek> and the breakage started here: interp_o_word.cc: reformat, sorry for the whitespace breakage. This was beyond readability.
[11:06:36] <cradek> start with a bad idea (reformat everything) and then rebase 400 commits without paying attention, and then push it all.
[11:08:44] <mozmck> makes sense. I wondered how that could happen, but rebase makes sense.
[11:09:12] <cradek> of course you'd notice it at the end when your result didn't build
[11:09:36] <cradek> then all you have to do is not care one bit that your 400 commits are bogus now
[11:10:27] <cradek> which you might not care about, if you don't care whether bisect works in the future
[11:10:37] <cradek> it's maddening
[11:11:51] * mozmck has not used bisect yet
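
For anyone unsure what "conflict markers" are: when a merge or rebase step conflicts, git writes lines starting with `<<<<<<<`, `=======` and `>>>>>>>` into the file, and if those get committed the file usually no longer compiles. A minimal sketch of a checker follows; `find_conflict_markers` is a hypothetical helper name, and it only looks for the default seven-character markers.

```python
import re

# Seven marker characters at the start of a line, followed by a space or end of line.
MARKER = re.compile(r"^(<{7}|={7}|>{7})( |$)")


def find_conflict_markers(text):
    """Return the 1-based line numbers that look like leftover conflict markers."""
    return [n for n, line in enumerate(text.splitlines(), 1) if MARKER.match(line)]
```

Running something like this over each file before committing (or building every commit of a rebased series) would catch the situation described above.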
[11:15:37] <cradek> and dfe0a86828 won't build because of boost-python stuff
[11:19:26] <cradek> skunkworks_: do you have a lucid machine?
[11:33:40] <cradek> oh yay
[11:33:53] <cradek> I hacked dfe0a86828 into running, and it does NOT show the bug
[11:34:10] <jepler> that's after the terribles?
[11:34:15] <cradek> yes!
[11:34:29] <jepler> what is the problem at dfe?
[11:34:58] <cradek> this fixes it: http://paste.ubuntu.com/12021975/
[11:35:51] <jepler> aha
[11:36:06] <jepler> I wonder how far past that ref that went unfixed
[11:36:16] <cradek> I bet the bisect is much less bad with this one marked good
[11:36:40] <jepler> Bisecting: a merge base must be tested
[11:42:09] <jepler> Bisecting: 26 revisions left to test after this (roughly 5 steps)
[11:42:10] <jepler> [fe83fe91e3232e9057e48bee8adb999ad3b90dbe] classicladder -fix mix of whites space with tabs
[11:42:20] <cradek> offs
[11:43:48] <cradek> oh that one is minor, not a full reformat
[11:44:24] <jepler> my bisect only has mah commits left
[11:46:42] <jepler> well this is an interesting result
[11:46:53] <jepler> 0c8ad2c418795c5aeefcc45c0b351e12644c9c5a is the first bad commit
[11:46:56] <jepler> 0c8ad2c interp: firm up O-word handling
[11:47:08] <jepler> it adds error handling of mismatches of the kind that is being reported
[11:47:22] <seb_kuzminsky> ding ding ding
[11:47:35] <jepler> but that means the wrongness is preexisting but undetected :-/
[11:48:24] <jepler> or that the error test is just wrong, of course
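
Why a long run of unbuildable commits hurts bisecting can be seen in a toy model. The sketch below is not how git implements it; it assumes a linear history with a known-good first commit and known-bad last commit, and `bisect_first_bad` and the "good"/"bad"/"skip" verdict strings are invented for the illustration (in git itself, untestable commits are marked with `git bisect skip`).

```python
def bisect_first_bad(commits, test):
    """Narrow down the first bad commit in a linear history.

    commits -- ordered oldest to newest; commits[0] is good, commits[-1] is bad
    test    -- returns "good", "bad", or "skip" (commit cannot be built or run)
    Returns the list of commits that could still be the first bad one.
    """
    lo, hi = 0, len(commits) - 1
    skipped = set()
    while True:
        # Commits strictly between the last good and first known bad that we can still try.
        open_idx = [i for i in range(lo + 1, hi) if i not in skipped]
        if not open_idx:
            break
        mid = (lo + hi) / 2
        i = min(open_idx, key=lambda k: abs(k - mid))  # closest testable commit to the middle
        verdict = test(commits[i])
        if verdict == "good":
            lo = i
        elif verdict == "bad":
            hi = i
        else:
            skipped.add(i)
    return commits[lo + 1 : hi + 1]
```

If the first bad commit sits inside the skipped range, the result is the whole range plus the next testable commit, which is why getting one buildable ref in the middle marked good shrinks the search so much.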
[12:17:18] <jepler> so what we've learned is that M66 leads to emcTaskIssueCommand > emcTaskPlanExecute > Interp::execute setting _setup.sequence_number from <some number> to 0
[12:19:55] <jepler> one possible fix: /shared/home/jepler/patches/0001-WIP-don-t-set-an-invalid-sequence-number.patch
[12:20:12] <cradek> heh, that path isn't going to work
[12:20:14] <jepler> oops
[12:20:27] <jepler> http://emergent.unpythonic.net/files/sandbox/0001-WIP-don-t-set-an-invalid-sequence-number.patch
[12:21:47] <cradek> andypugh: you might want to read the log - jepler and I have been working on that bug
[12:21:50] <cradek> also, good morning
[12:23:14] <andypugh> Interesting, I was just homing in on the settings->sequence number as the problem. More by sleeping on it than analysing it :-)
[12:23:45] <cradek> it became off by a number of lines equal to the line number of the M66
[12:23:59] <cradek> if in MDI
[12:24:24] <andypugh> M66?
[12:24:30] <andypugh> What?
[12:24:49] <cradek> the queue-buster causes it, it's nothing about the O structure
[12:24:58] <andypugh> Aha!
[12:26:04] <jepler> I do believe that fixes it, letting the carousel demo run
[12:26:31] <jepler> I am not pleased with the fix, but when is that ever the case
[12:27:04] <cradek> is it just because you don't understand all the surrounding goop?
[12:27:21] <jepler> I know I'm putting a little self-adhesive bandage on goop
[12:27:37] <jepler> probably the real fix involves a goop solvent
[12:27:56] <cradek> sometimes when you add solvent to goop you just get more goop on you
[12:30:15] <andypugh> It may be that that function shouldn’t even be called in the queue-buster case?
[12:30:41] <andypugh> (Not having looked yet)
[12:31:02] <cradek> I sure don't know
[12:32:09] <jepler> andypugh: if you can confirm that patch fixes your carousel, I'll ask Seb to review it for 2.7, or maybe even 2.6.
[12:32:36] <jepler> I guess I won't need this lucid chroot I was building earlier
[12:32:58] <cradek> yeah I doubt that version built on lucid either
[12:32:59] <andypugh> You can try smoking the lucid chroot instead?
[12:33:14] <cradek> what a swirl
[12:35:31] <jepler> task: remove pseudoMdiLineNumber, calls to interp_list.set_line_number()
[12:35:36] <jepler> - execRetval = emcTaskPlanExecute(command, pseudoMdiLineNumber);
[12:35:39] <jepler> + execRetval = emcTaskPlanExecute(command, 0);
[12:36:05] <jepler> we used to maintain a line number here but that was taken out by mah in 2012 at around the same time the new error checking code was added
[12:36:24] <jepler> I don't know if the old line number we'd been maintaining was right
[12:36:32] <cradek> he did a bunch of stuff with special line numbers (0 and -1) that I never understood
[12:37:15] <cradek> he did a bunch of stuff that I never understood
[12:37:17] <cradek> heh
[12:37:21] <andypugh> I have been seeing a lot of -1 numbers logging this issue
[12:37:37] <seb_kuzminsky> jepler: your patch makes my test pass too
[12:37:40] <andypugh> I had (without evidence) assumed that was an MDI thing
[12:37:41] <seb_kuzminsky> but i dont understand it
[12:38:25] <jepler> seb_kuzminsky: if you investigated the call chain, you'd see that line_number is passed in as 0 in this particular case
[12:38:44] <jepler> and other than that, the line number was properly maintained by the interpreter
[12:39:35] <jepler> NULL command is a special case of some sort
[12:39:40] <seb_kuzminsky> so it's a queue-buster vs o-word bug, not a remap thing?
[12:39:53] <jepler> queue-buster vs o-word vs mdi
[12:39:57] <jepler> remap is off the hook
[12:40:31] <jepler> er is that what I mean? innocent in this matter.
[12:40:32] <cradek> it's only the O word stuff that made the wrong sequence number evident. it has nothing to do with O words.
[12:40:37] <seb_kuzminsky> hah
[12:41:26] <jepler> lunchtime
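
A toy model of the symptom discussed above, for readers following along. This is not the actual patch (the real code is C++ in the interpreter); `ToyInterp`, `run`, and the `line_number <= 0` check are assumptions made up for the sketch. It only shows how a caller passing 0 can clobber a counter the interpreter was otherwise maintaining correctly.

```python
class ToyInterp:
    """Toy stand-in for an interpreter that tracks its own current line."""

    def __init__(self, guard):
        self.sequence_number = 0
        self.guard = guard  # True models "don't set an invalid sequence number"

    def read_line(self):
        self.sequence_number += 1  # the interpreter advances its own counter

    def execute(self, line_number):
        # Assumption: the caller passes 0 when it has no meaningful line number.
        if self.guard and line_number <= 0:
            return
        self.sequence_number = line_number


def run(guard, total_lines=5, buster_line=3):
    """Read total_lines lines; at buster_line, model the call that passes 0."""
    interp = ToyInterp(guard)
    for n in range(1, total_lines + 1):
        interp.read_line()
        if n == buster_line:
            interp.execute(0)
    return interp.sequence_number
```

Without the guard the counter ends up short by `buster_line`, which is at least consistent with the "off by the line number of the M66" observation in the log.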
[12:48:46] <andypugh> Having fixed one bug-like thing in carousel-demo, there is also this (which is invisible unless you run from the command line) http://www.pastebin.ca/3092635
[12:49:03] <andypugh> The other Vismach demos don’t seem to do it.
[12:53:14] <andypugh> I am now very puzzled that the duplicate O-word was only reported once. Logging the data shows that _all_ the line numbers were wrong, all of the time.
[13:00:00] <andypugh> Without the fix: http://www.pastebin.ca/3092651 and with the fix: http://www.pastebin.ca/3092657
[13:15:09] <cradek> is it because it stopped at the first error?
[13:15:24] <jepler> andypugh: I do not get "error 1282" running your config
[13:16:04] <jepler> does that happen right at startup?
[13:18:21] <skunkworks> zlog
[13:24:50] <andypugh> jepler: Yes, it happens when loading the Vismach display. It doesn’t seem to stop anything working properly, though
[13:25:07] <andypugh> Possibly it is a feature of my graphics setup?
[13:26:51] <jepler> is it unusual? linux with nvidia card and proprietary driver here.
[13:27:05] <skunkworks> andypugh, our next machine to convert is a matsuura with a changer that will use your work.
[13:28:09] <andypugh> jepler: The machine i am testing on is running in a VM on a Mac, so I think it is fair to say that it has a pretty unusual graphical setup, yes.
[13:28:48] <andypugh> It’s the Wheezy LiveCD on the Linux side
[13:28:56] <skunkworks> andypugh, is your config in 2.7?
[13:29:46] <jepler> skunkworks: branch andypugh/something
[13:29:53] <skunkworks> ah - ok
[13:30:01] <jepler> remotes/origin/andypugh/carousel_demo
[13:30:28] <jepler> it may get to go in 2.7 now, if we take the fix for the MDI M66
[13:30:45] <skunkworks> Great work!
[13:32:55] <jepler> skunkworks: so if you want to try it, get that branch and then apply http://emergent.unpythonic.net/files/sandbox/0001-WIP-don-t-set-an-invalid-sequence-number.patch
[13:33:31] <andypugh> Or you can run the VM anyway, it only breaks on the second MDI toolchange
[13:45:54] <skunkworks> what is a good - high resolution laptop with a real mouse pad?
[13:46:14] <skunkworks> (not a clickpad)
[13:46:24] <cradek> a thinkpad from about '05
[13:46:33] <mozmck> yeah, I buy used :)
[13:46:48] <cradek> and no you can't have mine
[13:46:50] <mozmck> Mine is a dell precision M6400 from around 09
[13:47:01] <mozmck> 17" screen at 1920 x 1200
[13:47:02] <skunkworks> I would too - I would like at least 2nd gen i5 or better
[13:47:03] <cradek> I put a 2048x1536 screen in it
[13:47:12] <mozmck> nice!
[13:47:44] <skunkworks> I have a dell xps which I have had since 11 but it is falling apart
[13:47:45] <mozmck> Mine is core2 extreme, and is really quite fast. has an external 1Gig nvidia card
[13:48:32] <mozmck> You can get an M6500 with an i7 quad and better specs than mine - used for $500 - $600 pretty readily I believe.
[13:48:38] <mozmck> Sometimes less.
[13:49:08] <cradek> does it have 3 real buttons?
[13:49:15] <mozmck> 6
[13:49:24] <cradek> !?
[13:49:43] <mozmck> 3 below the mousepad, and 3 above.
[13:49:55] <cradek> neat
[13:50:11] <andypugh> That’s not a laptop, it’s a https://en.wikipedia.org/wiki/Microwriter
[13:50:28] <mozmck> real handy for different positions. Also has one of the mouse sticks in the middle of the keyboard
[13:52:01] <mozmck> Yeah, when I looked for something new, it was going to be $1200+ for what I wanted, and I got this about 3 years ago for $450.
[13:54:06] <skunkworks> andypugh, where is the config?
[13:54:34] <andypugh> sim/axis/vismach/VMC_toolchanher
[13:54:50] <andypugh> It also demonstrates spindle orient :-(
[13:54:56] <andypugh> :-) I mean
[13:55:12] <skunkworks> ah - I didn't look in vismach
[13:55:14] <mozmck> skunkworks: here's one: http://www.ebay.com/itm/281752279314
[13:55:15] <andypugh> (M19 R 180 etc)
[13:56:10] <mozmck> skunkworks: another, slightly cheaper but not quite as good specs: http://www.ebay.com/itm/141736660770
[14:02:27] <skunkworks> Thanks
[14:03:48] <cradek> that second one looks nice. 1920x1200, keyboard not too screwy
[14:03:54] <cradek> 3 buttons
[14:03:56] <mozmck> np. I like mine, although I did have the motherboard go out. I bought a new one for $40.
[14:04:20] <cradek> I'd rather have the normal middle section of a normal keyboard than a numpad
[14:04:30] <cradek> but nobody else wants that I guess
[14:04:46] <jepler> My daily laptop is a Lenovo Thinkpad T530. 1920x1080, ultranav, 15"
[14:04:50] <mozmck> there are 3 buttons right under the spacebar as well.
[14:05:02] <jepler> http://emergent.unpythonic.net/01365079830
[14:05:05] <skunkworks> a laptop with a number pad screws me up. I would have to get used to it. I am usually off by at least 1 set of keys from home
[14:05:15] <cradek> mine is the one jepler discarded last :-)
[14:05:35] <cradek> minus a few parts I removed with sidecutters
[14:05:39] <skunkworks> I need 1080 at least - anything less looks cartoony to me
[14:06:03] <mozmck> I really like the extra height gain with 1200 pixels tall
[14:06:10] <cradek> yeah
[14:06:38] <skunkworks> does anyone get an exception in the Tkinter.py?
[14:06:39] <jepler> and I also spend a fair amount of time on a samsung 11" chromebook XC303C12, 1366x768, no mouse buttons
[14:06:52] <skunkworks> everything runs though
[14:07:11] <mozmck> the precision is a little heavier than your chromebook I bet ;)
[14:07:18] <jepler> yeah, about 3x I think
[14:07:50] <andypugh> They gave us new laptops at work. Super expensive Panasonic Toughbooks. 1024 x 768 resolution. It’s maddening given that we need screens with hundreds of numbers on.
[14:08:14] <cradek> I can't believe they still make those screens
[14:08:26] <cradek> although I feel that way about x1080 too
[14:08:31] <andypugh> It also has the worst touchpad ever.
[14:08:48] <mozmck> I wonder if this might be related to the task delay writing the var file? http://www.linuxcnc.org/index.php/english/forum/38-general-linuxcnc-questions/29486-computer-total-freezes-only-during-jogging
[14:08:57] <skunkworks> andypugh, awesome config
[14:09:27] <andypugh> The pointer moves about 1/2” in a random direction when you let go, and the side and bottom edge scrolling areas are too wide (and the setting to turn them off doesn’t “take” in the settings dialog)
[14:10:11] <skunkworks> that is one of those clickpads.. I hate those. You cannot rest your thumb on them or they act goofy
[14:10:18] <andypugh> I think a colleague summed it up “After working in a car for 20 minutes with the ToughLuck you want to hurt something. “
[14:10:58] <andypugh> skunkworks: No, it has buttons. Funny rubbery ones that need a strong prod right in the middle.
[14:11:10] <skunkworks> eww
[14:11:17] <cradek> mozmck: I notice they didn't replace the actual hard disk
[14:11:58] <mozmck> hmm, yes.
[14:12:00] <cradek> > In resume the machine behaves like the keyboard key was pressed
[14:12:04] <cradek> I don't understand this part
[14:12:31] <cradek> task wouldn't write out the var file while keyboard jogging
[14:12:34] <andypugh> Maybe he means that releasing the key doesn’t stop the jog
[14:13:09] <mozmck> He needs more roots
[14:13:21] <cradek> I wonder if they have multiple guis (halui etc)
[14:13:36] <cradek> I've seen bad pendant configs cause bewildering message floods
[14:14:35] <cradek> I'd not be using continuous jogs on a 1000 ipm machine, wow
[14:14:48] <mozmck> why?
[14:15:03] <cradek> wheel is so much safer
[14:15:11] <cradek> I only use wheel jogs on my fast machine
[14:15:30] <mozmck> on the plasma table I have it's 1000 ipm, and all I have is continuous jog.
[14:16:06] <andypugh> I have never felt 100% happy with keyboard continuous (that’s all the lathe has, but that also has weak steppers)
[14:16:26] <andypugh> I think I would be happy enough with realtime-continuous.
[14:16:44] <cradek> bet m*v^2 is a lot higher on my mill than your plasma table
[14:17:12] <skunkworks_> zlog
[14:17:33] <mozmck> cradek: I'm sure it is! my gantry is around 100lbs
[14:18:23] <cradek> discomfort is proportional to the square of the kinetic energy :-)
[14:18:59] <cradek> hmm, the difference is 3m of powered usb extension to the keyboard
[14:19:07] <cradek> wonder what powered usb extension means
[14:19:46] <mozmck> powered hub?
[14:20:08] <cradek> I guess
[14:20:18] <mozmck> usb is noise prone
[14:20:26] <cradek> I'd sure watch dmesg for "the hub freaked out and I had to reset it" type messages
[14:20:38] <mozmck> if his keyboard is on that it could be a problem.
[14:21:12] <mozmck> nope, item 2) Keyboard location - directly connected...
[14:22:07] <cradek> We started to think what is different ... : Distance from keyboard/mouse/screen from the actual PC that is on the electrical cabinet (about 3 meters of powered USB extention).
[14:28:41] <skunkworks> jepler, andypugh, fixes it here also. I do get the error 1282 stuff
[14:30:16] <jepler> skunkworks: interesting. what is your video?
[14:30:43] <skunkworks> nvidia I think - open source driver
[14:30:48] <skunkworks> let me look
[14:32:59] <skunkworks_> 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV730/M96-XT [Mobility Radeon HD 4670] (prog-if 00 [VGA controller]
[14:33:11] <skunkworks_> Kernel driver in use: radeon
[14:37:59] <jepler> opengl.org doesn't even document that glMaterialfv can result in GL_INVALID_OPERATION
[14:38:05] <jepler> https://www.opengl.org/sdk/docs/man2/xhtml/glMaterial.xml#errors
[14:44:13] <seb_kuzminsky> is anyone opposed to seb/2.7/task-latency-reporting? it adds some latency info logging to task, and makes all the UIs handle up to 5 seconds of latency from task instead of the 1-2 they currently do
[14:45:56] <cradek> was emcTaskNoDelay dead code?
[14:46:28] <cradek> er wait, you wanted it to do it always
[14:47:50] <cradek> this all looks fine to me
[14:49:16] <seb_kuzminsky> thanks
[14:49:41] <seb_kuzminsky> i think i'll drop it into 2.6 though, since that branch has the same problem
[14:49:50] <cradek> ok
[14:56:20] <jepler> at this rate we'll need another 2.6 release
[14:56:39] <seb_kuzminsky> yeah
[14:57:10] <seb_kuzminsky> once the M66 thing goes in to 2.6 i'll make 2.6.9
[14:59:02] <cradek> ooh there's a touchy+lathe fix too
[14:59:21] <skunkworks> runtest passed
[15:00:41] <seb_kuzminsky> skunkworks: do you know if rob's branch origin/feature/tangent-improvement-2.7-rebase is the correct merge candidate? does it have your Quality blessing?
[15:01:39] <skunkworks> It has my quality blessing.. -rebase is the one I have been testing
[15:02:43] <skunkworks> I have found 1 overage of around 1% but I am not worried about it.
[15:08:42] <jepler> http://image.slidesharecdn.com/guardiansofyourcode-150423095739-conversion-gate02/95/guardians-of-your-code-29-638.jpg?cb=1429783778
[15:09:37] <skunkworks> :)
[15:09:54] <cradek> http://www.marriedtothesea.com/120513/how-to-hand-a-pair-of-scissors.gif
[15:11:15] <andypugh> jepler: Is it bad that I saw “127 new bugs” and thought “ah, that’s a signed byte problem”
[15:12:40] <jepler> ugh
[15:12:41] <jepler> > You can use the Github web interface, but there's a TOCTOU problem: If the pull-requester changes their master (or whatever they're PRing from) between the time you test and the time you merge, then you'll be merging code that you haven't reviewed/tested. So let's do it on the command line.
[15:12:49] <cradek> -127 would have been funny too
[15:13:58] <jepler> (I won't link to the article as a whole because it seems to not be giving good advice0
[15:14:19] <jepler> /0/)
[15:14:33] <cradek> bad advice on the internet?
[15:14:46] <jepler> yup, and about software even
[15:15:04] <andypugh> https://xkcd.com/386/
[15:15:40] <seb_kuzminsky> andypugh: Is it bad that i recognized the xkcd number and chuckled before clicking on it?
[15:16:32] <cradek> I guessed it from context
[15:27:37] <andypugh> 386 (like 123) is one of the ones I just know.
[15:28:15] <andypugh> So, I don’t find it at all unusual that you knew which one it should be
[17:40:33] <andypugh> cradek: You might be gratified to know that, at this very moment, polar coordinates are being used to drill a hole circle.
[18:08:32] <jepler> :)
[18:11:53] <jepler> seb_kuzminsky: so you want that sequence number fix in 2.6?
[18:12:05] <jepler> .. sounds like to me
[18:19:23] <KGB-linuxcnc> Sebastian Kuzminsky 2.6 d3cce60 linuxcnc src/emc/task/emctaskmain.cc task: warn if the main loop takes too long * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=d3cce60
[18:19:23] <KGB-linuxcnc> Sebastian Kuzminsky 2.6 18f57cf linuxcnc src/emc/usr_intf/halui.cc src/emc/usr_intf/keystick.cc src/emc/usr_intf/xemc.cc UIs: increase task receive timeout to 5.0 seconds * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=18f57cf
[18:19:51] <KGB-linuxcnc> seb/2.7/task-latency-reporting 1282dfa linuxcnc . branch deleted * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=1282dfa
[18:27:36] <seb_kuzminsky> jepler: yes pls
[18:36:32] <KGB-linuxcnc> Sebastian Kuzminsky 2.7 1eb77c6 linuxcnc src/emc/task/emctaskmain.cc src/emc/usr_intf/halui.cc src/emc/usr_intf/keystick.cc src/emc/usr_intf/xemc.cc Merge remote-tracking branch 'origin/2.6' into 2.7 * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=1eb77c6
[18:36:32] <KGB-linuxcnc> Sebastian Kuzminsky 2.7 d3a8d71 linuxcnc Merge remote-tracking branch 'origin/feature/tangent-improvement-2.7-rebase' into 2.7 * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=d3a8d71
[18:36:47] <seb_kuzminsky> boom
[18:37:45] <KGB-linuxcnc> Sebastian Kuzminsky master 9b613cc linuxcnc (6 files in 4 dirs) Merge remote-tracking branch 'origin/2.7' * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=9b613cc
[18:39:03] <JT-Shop> seb_kuzminsky, how close is 2.7.0?
[18:39:59] <seb_kuzminsky> how close are the docs? ;-)
[18:40:05] <JT-Shop> I see a bunch of commits from Rob
[18:40:20] <seb_kuzminsky> i merged & pushed his fix branch
[18:40:23] <JT-Shop> pretty much done, some minor things but nothing to stop the show
[18:40:39] <seb_kuzminsky> no show stoppers, i hear you :-)
[18:41:27] <seb_kuzminsky> i want jeff's sequence number fix first, then we're ready i think
[18:41:37] <JT-Shop> Cool!
[18:43:36] <seb_kuzminsky> 2.6.9 at about the same time
[18:43:38] <seb_kuzminsky> bbl
[18:43:43] <JT-Shop> I may have a few things in the morning... dead tired tonight going roof to basement in a 3 story bank then crawling in holes to work on fountains all day
[18:47:02] <andypugh> JT-Shop: No offence meant, but you aren’t really built for that sort of thing, from what I have seen
[18:48:07] <andypugh> seb_kuzminsky: Carousel in or out?
[18:48:24] <andypugh> The component itself can certainly go in.
[19:14:07] <jepler> it's taking more time because I decided to write a test...
[19:18:27] <andypugh> I approve.
[19:19:43] <jepler> iteration 108
[19:19:51] <jepler> and then running it a lot of times :)
[19:23:15] <andypugh> Looking at the two log-files I posted earlier, my surprise is that things were working at all before.
[19:34:46] <seb_kuzminsky> andypugh: can you fix the graphics init warning noise?
[19:35:18] <jepler> fwiw I'm not convinced it's andypugh's fault
[19:35:20] <KGB-linuxcnc> Jeff Epler 2.6 301cc84 linuxcnc src/emc/rs274ngc/rs274ngc_pre.cc interp: don't set an invalid sequence number * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=301cc84
[19:35:20] <KGB-linuxcnc> Jeff Epler 2.6 2793de8 linuxcnc (8 files in 2 dirs) tests: test for the sequence_number problem * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=2793de8
[19:35:20] <KGB-linuxcnc> Jeff Epler 2.7 e836d5e linuxcnc src/emc/rs274ngc/rs274ngc_pre.cc Merge branch '2.6' into 2.7 * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=e836d5e
[19:35:22] <KGB-linuxcnc> Jeff Epler master edb8ad1 linuxcnc Merge branch '2.7' * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=edb8ad1
[19:35:24] <andypugh> seb_kuzminsky: I have no idea what is wrong.
[19:36:04] <andypugh> It’s two separate errors (that just look similar) and they don’t seem to correlate with my code.
[19:36:28] <jepler> oh the two errors are not the same?
[19:36:30] <jepler> I should read it again
[19:36:45] <andypugh> I thought that commenting out all my “color” lines fixed it, then they came back
[19:38:02] <seb_kuzminsky> jepler: there's a test in 2.6-remap-bug, in the wrong place
[19:38:14] <seb_kuzminsky> oh i see you found it
[19:38:16] <seb_kuzminsky> bbl
[19:39:36] <jepler> good news everyone! on my system at home, I do reproduce the other error
[19:39:39] <jepler> File "/home/jepler/src/linuxcnc-2.6/lib/python/vismach.py", line 881, in apply
[19:39:42] <jepler> glMaterialfv(GL_FRONT_AND_BACK, GL_AMBIENT_AND_DIFFUSE, self.color)
[19:39:45] <jepler> error: (1282, 'invalid operation')
[19:39:57] <jepler> so I'll look into it too
[19:40:02] <jepler> at home I have intel graphics
[19:44:37] <andypugh> It’s not a helpful error message, it isn’t a documented error message, and there is no indication which program line is causing it. But assuming that a Vismach script runs linearly, it prints a tag print in every section of the Vismach assembly before it prints the error message. But I am aware that things might be going on in parallel.
[20:22:05] <PCW> In regards to emcPT's issue with KB jogging is that anything that aborts the jog if there's a USB error?
[20:22:21] <PCW> s/is that/is there/
[20:26:50] <andypugh> When you think about it, Keyboard jogging is USB jogging. We should ban it..
[20:27:55] <PCW> I put this in my reply:
[20:27:57] <PCW> Continuous jogging any big machine via a non real time interface (especially one as flaky and prone to long recovery cycles as USB) seems iffy at best
[20:47:14] <jepler> andypugh, skunkworks: please test with http://emergent.unpythonic.net/files/sandbox/0001-vismach-work-around-a-bug-in-mesa.patch -- we may be hitting a mesa bug in the STL code
[20:47:33] <jepler> few vismachs use stl meshes, which would explain why most or all of the others don't trigger the problem
[20:47:53] <jepler> according to an even older bug (bug reported fixed) the complexity of the geometry could also affect whether the bug showed up https://bugs.freedesktop.org/show_bug.cgi?id=7984
[20:53:36] <andypugh> jepler: Yes, that fixes it. Thanks hugely.
[20:53:54] <andypugh> I had been blaming me, as other configs didn’t do it
[20:55:00] <jepler> yay
[20:55:26] <jepler> thanks for finding us problems to fix. otherwise we'd be out of a job.
[20:55:44] <jepler> seb_kuzminsky: another for 2.7? http://emergent.unpythonic.net/files/sandbox/0001-vismach-work-around-a-bug-in-mesa.patch works for andypugh
[21:14:59] <jepler> busy day for pushing stuff to git.l.o
[21:18:05] <andypugh> It’s squashed all the bugs I knew of :-)
[21:20:26] <andypugh> Ah, except this one. I would not be at all surprised to find that today fixed this: http://sourceforge.net/p/emc/bugs/425/
[21:37:31] <jepler> there's sure not enough information in that bug report to test whether it is fixed or not
[21:37:46] <jepler> unless I'm overlooking attached config files or something
[21:41:13] <andypugh> Well, it’s using the lathe-fanucy config, and that works for me, so whatever is different is in his config.
[22:18:31] <linuxcnc-build> build #2685 of 4007.deb-precise-i386 is complete: Failure [failed shell_1] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/4007.deb-precise-i386/builds/2685 blamelist: Jeff Epler <jepler@unpythonic.net>, Sebastian Kuzminsky <seb@highlab.com>, Robert W. Ellenberg <rwe24g@gmail.com>, John Thornton <bjt128@gmail.com>
[22:49:27] <linuxcnc-build> build #3322 of 1300.rip-precise-i386 is complete: Failure [failed compile runtests] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/1300.rip-precise-i386/builds/3322 blamelist: Jeff Epler <jepler@unpythonic.net>
[22:53:34] <cradek> oops that last one looks like it might be real
[23:15:33] <linuxcnc-build> build #3323 of 1200.rip-lucid-i386 is complete: Failure [failed compile runtests] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/1200.rip-lucid-i386/builds/3323 blamelist: Jeff Epler <jepler@unpythonic.net>
[23:50:24] <linuxcnc-build> build #3333 of 0000.checkin is complete: Failure [failed] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/0000.checkin/builds/3333 blamelist: Jeff Epler <jepler@unpythonic.net>