#linuxcnc-devel | Logs for 2014-01-07

[00:06:19] <seb_kuzminsky> linuxcnc-build: force build --branch=unified-build-candidate-3 checkin
[00:06:25] <linuxcnc-build> The build has been queued, I'll give a shout when it starts
[00:50:44] <linuxcnc-build> build #13 of package-rtpreempt-wheezy-source is complete: Failure [failed install-missing-build-dependencies] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/package-rtpreempt-wheezy-source/builds/13
[01:11:44] <linuxcnc-build> build #1647 of checkin is complete: Failure [failed] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/checkin/builds/1647
[01:11:47] <linuxcnc-build> build #1648 forced
[01:11:47] <linuxcnc-build> I'll give a shout when the build finishes
[02:43:19] <linuxcnc-build> build #14 of package-rtpreempt-wheezy-source is complete: Failure [failed install-missing-build-dependencies] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/package-rtpreempt-wheezy-source/builds/14
[02:53:29] <linuxcnc-build> build #11 of deb-precise-xenomai-binary-amd64 is complete: Failure [failed apt-get-update shell_2] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-xenomai-binary-amd64/builds/11
[02:53:57] <linuxcnc-build> build #11 of deb-precise-xenomai-binary-x86 is complete: Failure [failed apt-get-update shell_2] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-xenomai-binary-x86/builds/11
[02:54:41] <linuxcnc-build> build #11 of deb-precise-rtpreempt-binary-x86 is complete: Failure [failed apt-get-update shell_2] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-rtpreempt-binary-x86/builds/11
[02:54:44] <linuxcnc-build> build #11 of deb-precise-rtpreempt-binary-amd64 is complete: Failure [failed apt-get-update shell_2] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-rtpreempt-binary-amd64/builds/11
[02:56:39] <linuxcnc-build> build #1284 of package-rt-hardy-source is complete: Failure [failed making debian source package] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/package-rt-hardy-source/builds/1284
[03:06:35] <linuxcnc-build> build #1648 of checkin is complete: Failure [failed] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/checkin/builds/1648
[03:20:14] <linuxcnc-build> build #1276 of deb-precise-sim-binary-i386 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-sim-binary-i386/builds/1276
[03:21:02] <linuxcnc-build> build #1275 of deb-precise-sim-binary-amd64 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-sim-binary-amd64/builds/1275
[03:21:45] <linuxcnc-build> build #105 of deb-precise-rt-binary-i386 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-rt-binary-i386/builds/105
[03:23:58] <linuxcnc-build> build #1270 of deb-lucid-sim-binary-i386 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-lucid-sim-binary-i386/builds/1270
[03:40:14] <linuxcnc-build> build #1270 of deb-lucid-sim-binary-amd64 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-lucid-sim-binary-amd64/builds/1270
[03:41:59] <linuxcnc-build> build #1271 of deb-lucid-rt-binary-i386 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-lucid-rt-binary-i386/builds/1271
[09:48:10] <seb_kuzminsky> linuxcnc-build: force build --branch=master checkin
[09:48:11] <linuxcnc-build> build #1649 forced
[09:48:11] <linuxcnc-build> I'll give a shout when the build finishes
[11:14:34] <linuxcnc-build> Hey! build checkin #1649 is complete: Success [build successful]
[11:14:34] <linuxcnc-build> Build details are at http://buildbot.linuxcnc.org/buildbot/builders/checkin/builds/1649
[12:04:31] <seb_kuzminsky> linuxcnc-build_: force build --branch=unified-build-candidate-3 checkin
[12:04:36] <linuxcnc-build_> The build has been queued, I'll give a shout when it starts
[12:04:56] <linuxcnc-build_> build #1650 forced
[12:04:56] <linuxcnc-build_> I'll give a shout when the build finishes
[12:26:03] <andypugh> #udoo
[12:26:14] <andypugh> (Sorry)
[12:30:26] <skunkworks> The nurve!
[12:30:31] <skunkworks> heh
[12:30:49] <skunkworks> nerve
[12:55:33] <linuxcnc-build_> build #1646 of lucid-i386-sim is complete: Failure [failed compile runtests] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/lucid-i386-sim/builds/1646
[13:02:04] <linuxcnc-build_> build #1650 of checkin is complete: Failure [failed] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/checkin/builds/1650
[13:34:13] <seb_kuzminsky> that lucid-amd64 failure looks real
[13:35:43] <seb_kuzminsky> here's the output of the test that failed:
[13:35:48] <seb_kuzminsky> http://pastebin.ca/2531438
[13:36:06] <seb_kuzminsky> that's with unified-build-candidate-3, and it's not very well repeatable
[13:37:06] <seb_kuzminsky> linuxcnc-build_: force build --branch=unified-build-candidate-3 checkin
[13:37:07] <linuxcnc-build_> build #1651 forced
[13:37:07] <linuxcnc-build_> I'll give a shout when the build finishes
[14:00:42] <skunkworks> cradek, how did the emc1 trajectory planner work? did it use parabolic blending also? do you remember?
[14:21:57] <mhaberler> seb_kuzminsky: see back on the devlist on 'rate monotonic scheduler' why this test fails
[14:22:18] <mhaberler> this is really a defective spec, so the test is supposed to fail sometimes
[14:23:02] <mhaberler> this test tries to assert that particular behavior, but that was never implemented and thread systems vary in their behavior
[14:24:07] <mhaberler> I dont think that test actually makes any sense, it just happens to work at times
[14:26:25] <mhaberler> Dec 15 2012: 'sim' threading behaviour, HAL/RTAPI thread behavior spec
[14:27:39] <mhaberler> I think the change came about by replacing gnu pth by pthreads
[14:29:39] <cradek> am I correct that this test result means the slow thread sometimes preempts the fast thread?
[14:31:17] <mhaberler> could be - if it were rate monotonic, a say 3xfaster thread would have to execute exactly three times before the slow thread gets its turn
[14:31:46] <mhaberler> I am not aware of any scheduling option to enforce that; the only way I see how this can be achieved is with a single thread
[14:32:01] <cradek> that is not what the test is testing according to my reading
[14:32:25] <cradek> you said in your email that it expects 1..10 1..10 and that is not what I see in checkresult
[14:32:46] <cradek> it expects 1..n 1..n where n can vary
[14:33:09] <cradek> seb's error is that we're seeing 1..n, 0, 1..n
[14:33:53] <cradek> which I think means the slow thread interrupts the fast, which if so, probably violates assumptions in existing components that use fast and slow threads, like stepgen
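The pattern cradek describes (runs of 1..n where n can vary, with a stray 0 counting as an error) can be sketched as a small check. This is not the project's actual checkresult script. The function name `find_violations` and the assumption that the samples arrive as a flat list of integers are hypothetical, for illustration only.

```python
def find_violations(samples):
    """Return the indices where a sequence breaks the '1..n 1..n' pattern.

    Hypothetical helper: a value is accepted if it continues the current
    run (previous value + 1) or starts a new run at 1. Anything else,
    such as a 0 between two runs, is reported as a violation.
    """
    violations = []
    prev = 0  # pretend we are just before the start of a fresh run
    for i, value in enumerate(samples):
        if value != prev + 1 and value != 1:
            violations.append(i)
        prev = value
    return violations


# Runs of varying length are fine; a 0 between runs is flagged.
good = find_violations([1, 2, 3, 1, 2, 3, 4, 1, 2])
bad = find_violations([1, 2, 3, 0, 1, 2])
```

Under these assumptions, `good` is empty and `bad` points at the position of the 0, which matches the "1..n, 0, 1..n" failure described above.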
[14:34:41] <mhaberler> note you do see this on 'sim' aka posix
[14:35:40] <linuxcnc-build_> build #3 of deb-wheezy-rtpreempt-binary-amd64 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-wheezy-rtpreempt-binary-amd64/builds/3
[14:36:11] <cradek> isn't that a threading model that people also use to control hardware?
[14:36:53] <cradek> I think line 628 of that paste is one of the detected errors
[14:36:57] <mhaberler> no, it's what used to be simulator mode - with posix threads all bets are off anyway wrt timing
[14:37:15] <cradek> but this is not about timing, it's about preemption
[14:37:27] <mhaberler> can you point to a spec?
[14:38:06] <cradek> instead, can you tell me whether you think configurations that control hardware have this same preemption problem?
[14:38:47] <mhaberler> what I can tell you is that I have seen this fail on sim/posix only
[14:39:40] <cradek> ok, so we don't know
[14:40:57] <mhaberler> not really. there was an intent, but none of the thread systems actually support that behavior, and preemption is one possible result: http://www.linuxcnc.org/docs/devel/html/man/man3/hal_create_thread.3hal.html
[14:40:58] <seb_kuzminsky> i've never seen it fail in sim in pre-ubc3 branches
[14:41:18] <mhaberler> HAL assigns decreasing priorities to threads that are created later, so creating them from fastest to slowest results in rate monotonic priority scheduling.
[14:41:38] <cradek> yeah, I was just reading that same thing
[14:41:45] <linuxcnc-build_> build #12 of deb-precise-xenomai-binary-amd64 is complete: Failure [failed apt-get-update shell_2] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-xenomai-binary-amd64/builds/12
[14:41:46] <mhaberler> yes, that happened
[14:41:51] <linuxcnc-build_> build #12 of deb-precise-xenomai-binary-x86 is complete: Failure [failed apt-get-update shell_2] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-xenomai-binary-x86/builds/12
[14:42:33] <cradek> so this is a violation of the expectations documented in hal_create_thread
[14:42:47] <mhaberler> no, because that was never implemented
[14:42:49] <cradek> and more importantly, enshrined in at least code
[14:42:55] <cradek> what was never implemented?
[14:43:09] <mhaberler> "…o creating them from fastest to slowest results in rate monotonic priority scheduling." is definitely not implemented,
[14:43:23] <mhaberler> if it were, then preemption would be a violation
[14:43:38] <linuxcnc-build_> build #12 of deb-precise-rtpreempt-binary-x86 is complete: Failure [failed apt-get-update shell_2] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-rtpreempt-binary-x86/builds/12
[14:43:42] <linuxcnc-build_> build #12 of deb-precise-rtpreempt-binary-amd64 is complete: Failure [failed apt-get-update shell_2] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-rtpreempt-binary-amd64/builds/12
[14:43:45] <cradek> to my knowledge, we have never had a slow thread able to preempt faster threads
[14:44:01] <cradek> are you saying you think we have?
[14:45:01] <mhaberler> I dont understand - we have what?
[14:45:12] <cradek> had slow threads preempting fast threads
[14:45:53] <mhaberler> AFAICT none of the underlying thread systems give a guarantee of non-preemption, so I dont see where that could be coming from
[14:46:35] <mhaberler> strike out, mean posix and gnu pth
[14:47:01] <cradek> it seems unlikely that we've just been lucky so far...
[14:48:01] <mhaberler> I think it's pretty likely in fact because there was no change in the sim threads library for a long time
[14:48:55] <mhaberler> probably a good question to ask jmk
[14:48:55] <cradek> this is not very important, compared to needing a guarantee that we don't have this problem in setups that control hardware. do you know that?
[14:49:48] <mhaberler> what problem do you think has been introduced?
[14:50:15] <cradek> I'm 100% sure it's been institutional knowledge all along that threads are only interrupted by faster threads. you're right that jmk would be the authority.
[14:50:31] <mhaberler> the RTAI thread semantics is exactly unchanged, the Xenomai and RT-PREEMPT thread support mirrors that
[14:50:43] <cradek> that's good news
[14:51:15] <mhaberler> actually I would have to read up on non-preemption in the RT cases, it's been very long since
[14:52:14] <mhaberler> yes I think asking jmk would be a good idea. Which assumption in stepgen were you referring to?
[14:52:44] <cradek> that's a much harder question to answer without study
[14:53:22] <cradek> pretty sure you could look through the fast handler and see places it would be bad if it was interrupted by the slow handler
[14:53:57] <cradek> isn't it at least a problem that you would make the fast handler overrun terribly?
[14:54:44] <mhaberler> overrun.. what?
[14:54:55] <cradek> it seems really obvious to me that you don't want the slow interrupting the fast!
[14:55:20] <cradek> if the slow period is two months and it takes one month to run, you don't want to interrupt the millisecond period thread to run it, only the other way around
[14:55:28] <linuxcnc-build_> Hey! build checkin #1651 is complete: Success [build successful]
[14:55:28] <linuxcnc-build_> Build details are at http://buildbot.linuxcnc.org/buildbot/builders/checkin/builds/1651
[14:58:15] <mhaberler> The question IMO is if there is more than one variable set from servo which is consumed by base and which can impact behavior, and that access needs to be atomic (which is not a given). Single scalars are atomic anyway.
[15:01:51] <cradek> I don't agree that you must find an example of that assumption in order to call this a problem, but feel free to look, I think you will find one.
[15:04:03] <cradek> the expectation has been documented in hal_create_thread since the first version of that manpage, and the statement was made stronger by jmk in 2007/05
[15:06:34] <linuxcnc-build_> build #4 of deb-wheezy-rtpreempt-binary-amd64 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-wheezy-rtpreempt-binary-amd64/builds/4
[15:07:28] <mhaberler> the answer is simple: RT-PREEMPT uses http://man7.org/linux/man-pages/man2/sched_setscheduler.2.html SCHED_FIFO which according to my reading guarantees non-preemption; this is skipped in the Posix flavor because it requires elevated privileges
[15:07:58] <cradek> ok, that's good news
[15:09:19] <mhaberler> so it's either non-setuid, fiddling privs, or having preemption in sim
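The privilege trade-off mhaberler describes can be illustrated with a minimal Python sketch. It is not the code in rt-preempt.c. The helper name `try_enable_fifo` is hypothetical, and the sketch assumes a Linux system where Python exposes `os.sched_setscheduler`. Under SCHED_FIFO a running thread is only preempted by a higher-priority thread, but requesting that policy normally needs elevated privileges, so an unprivileged process has to fall back to the default time-sharing policy.

```python
import os


def try_enable_fifo():
    """Try to switch the calling process to SCHED_FIFO.

    Hypothetical helper: returns True if the policy was applied and
    False if the platform lacks the call or the request was refused
    (typically for lack of privileges).
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # not available on this platform
    try:
        # Lowest valid SCHED_FIFO priority; pid 0 means "this process".
        priority = os.sched_get_priority_min(os.SCHED_FIFO)
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        # Includes PermissionError when run without elevated privileges.
        return False


fifo_enabled = try_enable_fifo()
```

When `fifo_enabled` is False, the process keeps its default policy, which is the situation in which the chat participants expect preemption between HAL threads to be possible.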
[15:09:21] <linuxcnc-build_> build #1278 of deb-precise-sim-binary-i386 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-sim-binary-i386/builds/1278
[15:09:53] <mhaberler> https://github.com/mhaberler/linuxcnc/blob/unified-build-candidate-3/src/rtapi/rt-preempt.c#L353
[15:10:08] <linuxcnc-build_> build #1277 of deb-precise-sim-binary-amd64 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-sim-binary-amd64/builds/1277
[15:10:46] <linuxcnc-build_> build #107 of deb-precise-rt-binary-i386 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-precise-rt-binary-i386/builds/107
[15:13:04] <linuxcnc-build_> build #1272 of deb-lucid-sim-binary-i386 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-lucid-sim-binary-i386/builds/1272
[15:27:32] <cradek> oh, gnu pth is totally non-preemptive and sim_rtapi_run_threads runs things proportionally
[15:27:47] <cradek> no wonder it worked
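Why a single non-preemptive loop trivially satisfies the test can be seen in a small Python sketch. This is not the real sim_rtapi_run_threads. The function `run_proportionally`, the thread names, and the integer period units are all assumptions for illustration, and the sketch assumes each period is a whole multiple of the fastest one.

```python
def run_proportionally(periods, total_time):
    """Return the order in which 'threads' run inside one cooperative loop.

    Hypothetical model: time advances in steps of the fastest period, and
    at each step every thread whose period has elapsed runs to completion,
    fastest first. Nothing can interrupt a thread that is running.
    """
    step = min(periods.values())
    next_due = {name: 0 for name in periods}
    order = []
    now = 0
    while now < total_time:
        for name in sorted(periods, key=periods.get):
            if now >= next_due[name]:
                order.append(name)
                next_due[name] += periods[name]
        now += step
    return order


# A thread that is 3x faster runs exactly three times per slow run.
order = run_proportionally({"fast": 1, "slow": 3}, 6)
```

In this model the slow thread can never appear in the middle of a fast run, so a check for exact ratios or for missing preemption passes by construction rather than because of any scheduler guarantee.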
[15:28:23] <andypugh> yes
[15:28:32] <andypugh> Wrong window
[15:28:44] <linuxcnc-build_> build #1272 of deb-lucid-sim-binary-amd64 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-lucid-sim-binary-amd64/builds/1272
[15:30:42] <linuxcnc-build_> build #1273 of deb-lucid-rt-binary-i386 is complete: Failure [failed shell_3] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/deb-lucid-rt-binary-i386/builds/1273
[15:32:25] <seb_kuzminsky> cradek: does it run all "threads" in one actual thread?
[15:32:44] <cradek> yes
[15:33:32] <seb_kuzminsky> then that failure is even more perplexing
[15:33:37] <cradek> wait
[15:33:47] <cradek> this is before ubc3, when we used gnu pth for sim
[15:34:31] <seb_kuzminsky> did it change in ubc3? does ubc3 use a posix thread for each hal thread now?
[15:34:37] <cradek> post-ubc3 I understand it's multiple threads concurrently (even on multiple CPUs) which is what makes it breaky
[15:34:42] <cradek> yes that's my understanding
[15:34:52] <seb_kuzminsky> oh
[15:37:22] <cradek> if this is only a problem in sim, we should just disable the tests in sim that test assumptions that are wrong (ratios, preemption)
[15:38:00] <cradek> if we need those assumptions to be right, maybe we can go back to using pth
[15:39:36] <cradek> I have a really uneasy feeling: that all of hal will be broken in a way that rarely shows up, in sim mode (or if any of the other modes don't guarantee ordering)
[15:50:26] <zultron> Hey guys, drive-by comment (I gotta run): There was some discussion of this about a year ago, IIRC. Had to do with whether threads ought to be 'harmonic' or something (I forget the term).
[15:51:45] <zultron> pth thread semantics are pretty different, and I think you've got it right about the preemption.
[15:52:07] <cradek> I'm not immediately finding what harmonic would mean
[15:52:16] <zultron> Hold on, I'm looking for that email.
[15:52:40] <cradek> do you know - was it for a fundamental reason that pth was replaced by pthreads?
[15:54:20] <zultron> This looks related:
[15:54:22] <zultron> http://sourceforge.net/mailarchive/forum.php?thread_name=50CD4CA4.3010009%40pico-systems.com&forum_name=emc-developers
[15:55:36] <zultron> Yeah, it was a choice between the separate ordeal of porting the pth 'sim' threads to the new RTOS and making the small changes to rt-preempt to not worry about RT.
[15:56:18] <cradek> oh I see, so no fundamental reason, it's just that a new kind of sim "fell out of" code you were already having to write
[15:56:18] <zultron> rt-preempt is basically POSIX threads with some additions for RT_PREEMPT support.
[15:56:27] <zultron> Yes.
[15:57:12] <cradek> oh here's the answer from jmk to the question we were thinking of asking him
[15:57:22] <cradek> you're darn good at searching archives
[15:58:27] <zultron> We found that test was breaking on POSIX threads, but it seemed like the assumption that other threads systems would operate like pth was wrong. That led to the email thread above, which I've forgotten the meat of (need to re-read).
[15:59:10] <cradek> well there are two reasons it can break, and I think the thing it's been testing changed over time
[15:59:33] <cradek> I think it was testing jmk#2 but is now testing jmk#3. I think jmk#3 is much more important for correct operation of hal
[15:59:35] <zultron> Ah, 'monotonic', not 'harmonic'.
[15:59:57] <cradek> jmk#2/3 from http://sourceforge.net/mailarchive/message.php?msg_id=30235135
[16:00:17] <cradek> if I understand correctly, jmk#3 means the same as monotonic
[16:00:42] <zultron> Oh yeah, I'm starting to remember. But, gotta run, see y'all!
[16:00:43] <cradek> looks like ubc3-sim is NOT monotonic today
[16:02:10] <cradek> ubc3-rt-preempt had better be, but I don't know how to test it
[16:12:27] <seb_kuzminsky> the current threads.0 test tests for monotonicity, and it passes on the rtpreempt build slaves
[16:12:49] <seb_kuzminsky> it's not a perfect test, obviously...
[16:14:48] <cradek> it would be nice to put a time-eating thing in the "fast" thread (which doesn't have to be very fast) to make a bigger window
[16:15:53] <cradek> I'm not sure how to code a time-eater though
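One possible shape for such a time-eater is a busy-wait loop, sketched here in Python. A real HAL version would have to be a realtime component written in C; the function name `eat_time` and its interface are hypothetical. The point of spinning rather than sleeping is that a sleep would give up the CPU, while a spin keeps the thread running and so keeps the window for an unwanted preemption open.

```python
import time


def eat_time(duration_s):
    """Busy-wait for roughly duration_s seconds and return the spin count.

    Hypothetical helper: it polls a monotonic clock instead of sleeping,
    so the calling thread stays runnable for the whole interval.
    """
    deadline = time.monotonic() + duration_s
    spins = 0
    while time.monotonic() < deadline:
        spins += 1
    return spins


start = time.monotonic()
eat_time(0.01)  # occupy the CPU for about 10 ms
elapsed = time.monotonic() - start
```

The duration would need to stay well below the thread period, otherwise the time-eater itself would cause the fast thread to overrun.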
[16:23:04] <PCW> skunkworks: the problem looks like a hm2_eth driver problem, nothing to do specifically
[16:23:06] <PCW> with the rest of your setup, and for whatever reason it seems to only happen on 7I80HD configs
[16:23:08] <PCW> (probably an off-by-one error somewhere)
[16:39:53] <skunkworks> PCW, awesome
[16:42:00] <PCW> probably micges needs to seed the code with some cookie reads to find where it's gone astray
[16:44:03] <skunkworks> (well - that it wasn't me anyway... - so do the other boards work with my motherboard?)
[16:52:57] <skunkworks> PCW, ^
[17:11:56] <skunkworks> *motherboard, 12.04 install
[17:25:56] <PCW> I actually did not test because I had the same symptoms with your FPGA card and a mesa FPGA card with the same config
[17:27:05] <skunkworks> oh
[17:29:04] <PCW> so pretty much has to be a driver bug (7I80 Ethernet code is the same in all)
[17:51:30] <skunkworks> I had the same issue with whatever config was loaded in it initially also
[17:54:35] <skunkworks> PCW, I think this was the initial config http://pastebin.ca/2460941?srch=7i80
[17:56:43] <PCW> Yeah I suspect it has to do with the exact number of modules/pins or something like that
[18:19:14] <skunkworks> ah
[20:23:11] <skunkworks> I hope it is an easy fix. does micges have a 7i80?
[20:26:31] <skunkworks> he does.. http://www.youtube.com/watch?v=n6DdWQ25Ur8
[20:27:29] <skunkworks> why oh why didn't he see that? :)
[20:43:51] <Tom_itx> skunkworks did you figure out your problem?
[20:49:48] <skunkworks> Peter did
[22:17:26] <skunkworks> looks like the forum is down
[22:17:44] <skunkworks> Database Error: Unable to connect to the database:Could not connect to MySQL
[22:19:38] <cradek> perhaps related to this 20-minute-old problem? http://www.dreamhoststatus.com/
[22:24:46] <skunkworks> could be!
[22:35:35] <Tom_itx> i got that yesterday as well
[23:12:04] <CaptHindsight> www.linuxcnc.org Database Error: Unable to connect to the database:Could not connect to My SQL
[23:14:30] <CaptHindsight> how often do they perform maintenance during the week?
[23:18:21] <CaptHindsight> "This is the third time in 12 months that this has happened" affecting all of the US-West (Irvine) Datacenter