#linuxcnc-devel | Logs for 2015-09-04

Back
[08:04:12] <skunksleep> Second to the last day of vacation. Stella was the only one catching fish. One after another. Bluegills
[08:18:41] <jepler> Runtest: 1 tests run, 1 successful, 0 failed + 0 expected
[08:18:41] <jepler> iteration 26627
[08:46:47] <jepler> https://c3.nasa.gov/dashlink/static/media/other/ObservedFailures37.html "commercial semiconductor road maps show component reliability timescales are being reduced to 5-7 years"
[08:57:38] <jepler> as usual, running instead on a heavily-loaded system provoked a failure relatively quickly
[08:57:44] <jepler> iteration 370
[08:57:44] <jepler> Running test: tests/hm2-idrom
[08:57:44] <jepler> *** tests/hm2-idrom: FAIL: test run exited with 1
[08:57:44] <jepler> Runtest: 1 tests run, 0 successful, 1 failed + 0 expected
[08:57:46] <jepler> Failed:
[08:57:49] <jepler> tests/hm2-idrom
[09:33:35] <seb_kuzminsky> ooh nice
[09:40:02] <jepler> so the evidence is destroyed because it continues with all the other runs after one fails
[09:40:12] <jepler> and now no failures on the next 700+ iterations after I tried to make it fail early
[09:40:25] <seb_kuzminsky> jeez, what kind of clown writes a test like that
[09:40:28] <jepler> - ./check-dmesg "${Error[$TEST_PATTERN]}" || result=$?
[09:40:28] <jepler> + ./check-dmesg "${Error[$TEST_PATTERN]}" || exit $?
[09:40:36] <cradek> oops
[09:41:35] <jepler> I looked at strace of rtapi_app, and it does the writes to stderr before it does the write to the file descriptor that tells halcmd the load is done
[09:41:40] <jepler> so the most obvious ordering problem is not it
[09:44:18] <jepler> unless write() from one process is not enough to make the data visible to open+read() from another process on the same system, scary thought
[09:51:21] <seb_kuzminsky> thanks for looking at it, jepler
[09:56:38] <jepler> sure thing
[09:56:54] <jepler> closing in on 1100 iterations on the loaded system without another failure
[10:26:42] <jepler> 1618, let's see what I get
[10:27:42] <jepler> aha
[10:27:45] <jepler> http://paste.ubuntu.com/12273760/
[10:27:49] <jepler> Waiting for component 'hm2_test' to become ready.hm2/hm2_test.0: invalid config name, got '', expected 'HOSTMOT2'
[10:29:10] <cradek> ??
[10:30:38] <jepler> hm but that shouldn't be enough to break the test, it's not using anchored searches
[10:32:50] <jepler> this message is missing
[10:32:50] <jepler> hm2_test: loading HostMot2 test driver with test pattern 14
[10:35:18] <jepler> so now I'm more puzzled, not less
[10:39:42] <jepler> .. under load, rtapi_print_msg(RTAPI_MSG_ERR can just disappear?
[10:40:45] <jepler> but on uspace that just boils down to vfprintf
[10:41:10] <jepler> I remain puzzled
[10:49:59] <seb_kuzminsky> jepler: maybe two threads are writing to the same file via different file descriptors, and one overwrites the others output sometimes?
[10:51:18] <jepler> it's certainly the case here that two processes wrote to halrun-stderr
[10:51:22] <jepler> "Waiting for component" is from halcmd
[10:55:07] <jepler> bbl
[11:12:54] <jepler> $ (echo 1 & echo 2 & wait; wait) > /tmp/foo; cat /tmp/foo
[11:13:15] <jepler> my mental model is something like: /tmp/foo has to contain 1, 2, and two newlines
[11:13:50] <jepler> I thought that was guaranteed by the O_APPEND open / fcntl flag, but only >> redirections set that flag
[11:14:02] <jepler> so maybe my expectation is wrong, because O_APPEND is not set.
[11:14:39] <jepler> the shell does set O_APPEND for >> redirection
[11:19:17] <jepler> so is 'make -j > foo' also broken and prone to loss of output?
[11:19:35] <cradek> andypugh: those are some utterly frightening selenium rectifiers
[11:30:52] <cradek> jepler: does your shell command sometimes fail?
[11:36:01] <jepler> cradek: not in a few tries, no
[11:36:16] <cradek> that would be pretty disturbing
[12:38:16] <seb_kuzminsky> i think both the echo commands, and all the children of 'make -j', and all the children of halrun (in the hm2 test) all write to the same file descriptor, so writes will be interleaved according to buffering settings and write timing, but nothing should be lost
[12:39:21] <seb_kuzminsky> but that's just, like, my opinion, man
[12:50:03] <seb_kuzminsky> the write(2) manpage says:
[12:50:13] <seb_kuzminsky> The adjustment of the file offset and the write operation are performed as an atomic step.
[13:13:45] <jepler> and file position is shared between processes?
[13:16:59] <seb_kuzminsky> i think it's per filedescriptor
[13:17:11] <seb_kuzminsky> and those are inherited by children
[13:19:01] <seb_kuzminsky> the fork(2) manpage confirms this belief
[15:43:56] <jepler> there are no seeks or the like in halcmd or rtapi_app as far as I can imagine
[15:45:10] <jepler> dup/dup2s also share position