#linuxcnc-devel | Logs for 2015-07-17

[00:10:12] <CaptHindsight> wow
[00:21:59] <seb_kuzminsky> jepler: 6eb58550 hm2_eth: allow multiple instances (up to 4)
[00:22:20] <seb_kuzminsky> if an init_board() fails, num_boards will be wrong
[00:22:58] <seb_kuzminsky> chances are good the next thing that happens is hm2_eth_probe will get a partially-initialized board and fail, which will cause rtapi_app_main() to fail completely
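A minimal sketch of the concern raised above, assuming boards are initialized one at a time in a loop. The count is bumped only after a board initializes successfully, so a later probe step never walks over a partially-initialized entry. All names here (`sketch_board_t`, `sketch_init_board`, `sketch_init_all`) are hypothetical and are not the actual hm2_eth code.

```c
#include <string.h>

#define SKETCH_MAX_BOARDS 4   /* the commit under review allows up to 4 instances */

typedef struct {
    char addr[32];
    int initialized;
} sketch_board_t;

/* Hypothetical stand-in for per-board setup; fails on a missing or empty address. */
static int sketch_init_board(sketch_board_t *b, const char *addr) {
    if (addr == NULL || addr[0] == '\0') return -1;
    strncpy(b->addr, addr, sizeof(b->addr) - 1);
    b->addr[sizeof(b->addr) - 1] = '\0';
    b->initialized = 1;
    return 0;
}

/* Returns 0 on success, -1 on the first failure.
 * *num_boards counts only boards that were fully initialized. */
static int sketch_init_all(sketch_board_t *boards, const char *const *addrs,
                           int n, int *num_boards) {
    *num_boards = 0;
    for (int i = 0; i < n && i < SKETCH_MAX_BOARDS; i++) {
        sketch_board_t *b = &boards[*num_boards];
        memset(b, 0, sizeof(*b));
        if (sketch_init_board(b, addrs[i]) != 0) return -1;
        (*num_boards)++;   /* increment only after init succeeded */
    }
    return 0;
}
```

With this shape, the caller's cleanup path can trust the count to describe exactly the boards that need tearing down.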
[00:23:44] <seb_kuzminsky> the iflist kvlist of interfaces that have had their iptables rule installed is clever
[00:23:48] <seb_kuzminsky> megusta.jpg
[00:24:06] <seb_kuzminsky> the error0 target shouldn't call hal_exit() because it must have been hal_init() that failed
[00:24:45] <seb_kuzminsky> 491e63fe hostmot2: support split reads
[00:25:39] <seb_kuzminsky> hm2_eth_enqueue_read()'s overloaded size argument is giving me a headache
[00:26:34] <seb_kuzminsky> the call chain of hm2_read_request() calling hm2_queue_read() calling llio->queue_read(size=-1) illustrates my confusion
[00:27:01] <seb_kuzminsky> hm2_queue_read() does not queue a read as its name implies, instead it drains already-queued read requests (by sending them)
[00:27:31] <seb_kuzminsky> it got its name because it calls the llio's queue_read() in the magic way that means "don't queue a read, instead drain the read queue"
[00:28:22] <seb_kuzminsky> maybe new llio API functions could more cleanly (or less obscurely) take over the jobs of llio->queue_read(size=-1) and (size=-2)? llio->send_queued_reads() and llio->receive_queued_reads() maybe?
[00:28:43] <seb_kuzminsky> i'd say at least this size thing deserves a comment
[00:29:31] <seb_kuzminsky> i like that the way hm2_read_request() is split out from hm2_read() is nice and backwards compatible (i mean if .read_request doesn't get called)
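One way to picture the split suggested above, as a sketch only: three separate entry points replace the overloaded size argument. The struct layout, the mock functions, and the mapping of size=-1 to "send" and size=-2 to "receive" are assumptions for illustration; this is not the real hostmot2 llio structure.

```c
/* Hypothetical low-level I/O interface with one job per function. */
typedef struct sketch_llio sketch_llio_t;

struct sketch_llio {
    /* Queue one read request; size is always a real byte count. */
    int (*queue_read)(sketch_llio_t *self, unsigned addr, void *buf, int size);
    /* Transmit everything queued so far (presumably the old size=-1 job). */
    int (*send_queued_reads)(sketch_llio_t *self);
    /* Collect the replies (presumably the old size=-2 job). */
    int (*receive_queued_reads)(sketch_llio_t *self);
    int queued;      /* requests waiting to be sent */
    int in_flight;   /* requests sent, replies not yet collected */
};

/* Mock implementations that only track counts. */
static int mock_queue_read(sketch_llio_t *self, unsigned addr, void *buf, int size) {
    (void)addr; (void)buf;
    if (size <= 0) return -1;   /* no magic negative sizes any more */
    self->queued++;
    return 0;
}

static int mock_send_queued_reads(sketch_llio_t *self) {
    self->in_flight += self->queued;
    self->queued = 0;
    return 0;
}

/* Returns the number of replies collected. */
static int mock_receive_queued_reads(sketch_llio_t *self) {
    int n = self->in_flight;
    self->in_flight = 0;
    return n;
}
```

The point of the design is that a caller reading `send_queued_reads()` does not need a comment to know what happens, which the `size=-1` convention does.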
[00:32:25] <seb_kuzminsky> 81ff3fba hm2_eth: in case of failed recv(), show an error
[00:32:58] <seb_kuzminsky> i have nothing nitpicky about names to say about this commit
[00:34:08] <seb_kuzminsky> it's nice that zeroing errno makes the error message in case of short packets sensible
[00:35:28] <seb_kuzminsky> 81ff3fba hm2_eth: in case of failed recv(), show an error
[00:35:32] <seb_kuzminsky> oops
[00:35:39] <seb_kuzminsky> 7be45d77 hm2_eth: make unrecognized boards work
[00:35:42] <seb_kuzminsky> that's the one
[00:36:35] <seb_kuzminsky> hm2_eth_probe boldly violates layers by parsing the IDROM, but i think the end justifies the means
[00:39:04] <seb_kuzminsky> ... and llio.ioport_connector_name[] is only used for printing, no hal pin generation, so ?? is no problem
[00:39:25] <seb_kuzminsky> this concludes my review
[00:39:28] <seb_kuzminsky> nice branch!
[01:03:34] <KGB-linuxcnc> Sebastian Kuzminsky 2.7 c8b7864 linuxcnc docs/src/Master_Documentation_es.txt docs: fix an include file name in Spanish * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=c8b7864
[01:31:52] <linuxcnc-build_> build #1453 of 1405.rip-wheezy-armhf is complete: Failure [failed compile runtests] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/1405.rip-wheezy-armhf/builds/1453 blamelist: Sebastian Kuzminsky <seb@highlab.com>
[01:41:48] <linuxcnc-build_> build #3275 of 0000.checkin is complete: Failure [failed] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/0000.checkin/builds/3275 blamelist: Sebastian Kuzminsky <seb@highlab.com>
[01:42:44] <seb_kuzminsky> huh, the test failure on armhf looks weird
[01:43:22] <seb_kuzminsky> it's supposed to print increasing integers that reset back to 1 when they get to 10 or so
[01:43:37] <seb_kuzminsky> but right in the middle it printed this instead:
[01:44:59] <seb_kuzminsky> http://paste.ubuntu.com/11891493/
[01:45:09] <seb_kuzminsky> stray 0 on line 17
[01:45:16] <seb_kuzminsky> maybe my u3 is getting flaky
[01:45:40] <seb_kuzminsky> it's approaching 1 year of uptime, maybe it wants a therapeutic reboot & cool-down
[01:46:03] <seb_kuzminsky> oh and there's a bug in threads.0's checkresult
[01:46:39] <KGB-linuxcnc> Sebastian Kuzminsky 2.7 f460494 linuxcnc tests/threads.0/checkresult threads.0 test: report correct line number on error * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=f460494
[01:55:56] <KGB-linuxcnc> Sebastian Kuzminsky 2.7 974d78e linuxcnc src/hal/utils/scope.c halscope: fix a printf format string * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=974d78e
[01:56:17] <seb_kuzminsky> g'nite
[07:27:03] <jepler> seb_kuzminsky: I wouldn't be surprised if that failure represents a real, if hard to reproduce, problem.
[07:28:57] <jepler> seb_kuzminsky: the fast thread is executing "count++", which of course is not an atomic operation; it's implemented as a read-modify-write instruction sequence: register x = count; increment x; count = x;
[07:30:26] <jepler> hm, no, that doesn't really explain things
[07:30:57] <jepler> logging a "0" would happen if the reset in the slow thread happened after the increment in the fast thread
[07:31:12] <jepler> but I don't see which scenario allows the value to reset up to the expected one the next loop around
[07:35:51] <jepler> seb_kuzminsky: anyway I'll break out the queue_read function into 3 functions as requested. overloading it for one extra thing was not so bad (and anyway it was the design I inherited from micges' work) but I see that overloading it for 2 things is getting worse
[07:36:24] <jepler> I'll also double-check the error handling
[07:37:02] <jepler> .. I agree about the layering violation. a better solution would be to allow those fields which are not filled out in llio to be filled from idrom; for instance, if they have a placeholder value of 0 or -1
[07:37:37] <jepler> I tried to do this, but it didn't work out for some reason; I think some checking was done before idrom reading and I was hitting the first error checking.
[07:55:12] <skunkworks> zlog
[09:21:04] <seb_kuzminsky> doesn't arm's gcc have atomic intrinsics?
[09:25:17] <pcw_home> Captain, we must arm the atomic intrinsics!
[09:28:40] <cradek> why does it matter whether the fast thread uses an atomic increment? the fast thread shouldn't be interrupted - that's the whole point of the test.
[09:48:44] <seb_kuzminsky> agreed
[09:50:33] <cradek> I agree getting 8 afterward is the most baffling bit
[09:54:09] <seb_kuzminsky> it's not the first time the armhf buildslave has said 0 when it meant something else
[09:54:12] <seb_kuzminsky> http://buildbot.linuxcnc.org/buildbot/builders/1405.rip-wheezy-armhf/builds/1100/steps/runtests/logs/stdio
[09:54:17] <seb_kuzminsky> search for limit3
[09:54:38] <seb_kuzminsky> then it goes back to what it's supposed to be doing like nothing happened
[09:55:16] <cradek> I bet it's a sampler bug
[09:55:32] <seb_kuzminsky> that only bites on arm?
[09:55:58] <cradek> mumble mumble
[09:57:00] <seb_kuzminsky> the sampler's queue is larger than the test output, it's not an overrun
[09:58:24] <cradek> I was just wondering that
[09:58:46] <cradek> how big is sampler's queue?
[10:01:15] <seb_kuzminsky> 4096, for that test
[10:01:34] <seb_kuzminsky> but looking at sampler.c, i think there's a bug in the queue allocation
[10:02:05] <cradek> I was just squinting at size =
[10:02:31] <cradek> what do you see?
[10:03:06] <seb_kuzminsky> nevermind
[10:04:47] <seb_kuzminsky> each slot in the queue holds a pin_data_t, which is a union of all the data types it can be
[10:05:07] <seb_kuzminsky> i was worried it was holding 32-bit things, but our floats are 64 bits now
[10:05:30] <seb_kuzminsky> but it's written so well it's immune to stupidity like that
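A small sketch of why a union-typed slot shrugs off the float width change: the queue is sized from `sizeof` of the union, so widening one member widens every slot automatically. The type and function names are hypothetical, and the member list is an assumption, not a copy of sampler.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue slot: one value of any supported pin type. */
typedef union {
    unsigned char b;   /* bit */
    int32_t s;         /* signed 32-bit */
    uint32_t u;        /* unsigned 32-bit */
    double f;          /* float, 64 bits wide */
} sketch_slot_t;

/* Bytes needed for a queue of `depth` samples with `pins` values per sample.
 * Sizing from sizeof(sketch_slot_t) means no member can overflow its slot. */
static size_t sketch_queue_bytes(size_t depth, size_t pins) {
    return depth * pins * sizeof(sketch_slot_t);
}
```

Code that had hard-coded 4 bytes per value would have broken when floats grew to 64 bits; sizing from the union avoids that.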
[10:09:21] <KGB-linuxcnc> Sebastian Kuzminsky 2.7 ce8a5c5 linuxcnc src/rtapi/uspace_rtapi_app.cc uspace: warn on errors in harden_rt() * http://git.linuxcnc.org/?p=linuxcnc.git;a=commitdiff;h=ce8a5c5
[11:17:18] <jepler> does the RT part of sampler accidentally say a sample is available before writing all the data for that sample?
[11:17:21] <jepler> that could explain this
[11:19:51] <cradek> I still don't understand the fifo handshake
[11:20:21] <jepler> or we are assuming too much about the ARM memory coherency model, and it actually does permit the write to fifo->in to become visible before the earlier store to dptr, from the viewpoint of another CPU
[11:22:05] <cradek> oh, the rt and usr are certainly on different CPUs aren't they
[11:22:10] <cradek> unlike the fast and slow rt threads
[11:36:41] <jepler> so, yes, we are asking too much of the ARM memory model
[11:36:56] <jepler> > For software programmers, considering the model at the Application level, the key factor is that for accesses to Normal memory barriers are required in some situations where the order of accesses observed by other observers must be controlled.
[11:38:55] <cradek> parse error at "for accesses to Normal memory"
[11:39:26] <jepler> the key factor is that (for accesses to 'Normal memory'), barriers are required in some situations
[11:39:30] <jepler> that's my parse
[11:40:23] <cradek> is big-N-Normal something defined nearby?
[11:40:27] <jepler> yes
[11:40:41] <jepler> basically, anything but memory-mapped I/O is Normal
[11:41:03] <cradek> what's a barrier?
[11:41:07] <seb_kuzminsky> we must assume intel-style memory access ordering all over our code
[11:41:21] <seb_kuzminsky> memory accesses are not reordered across barriers
[11:41:22] <archivist> barrier=a lock ?
[11:41:55] <seb_kuzminsky> ie: write0, barrier, write1 will always be perceived by other cpus as write0, write1, never as write1, write0
[11:41:56] <jepler> barriers are special instructions
[11:42:49] <seb_kuzminsky> memory access reordering made my blood run cold when i first learned of it
[11:42:51] <cradek> do you ever just get an uneasy feeling about software?
[11:42:57] <cradek> ok yes
[11:43:01] <seb_kuzminsky> heh
[11:43:20] <jepler> so in this case, I think a barrier would be required just before updating fifo->in, so that the writes to that 'line' of the fifo have to appear before the updated value of fifo->in appears
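The writer-side fix described above might look like the following sketch, written with C11 `<stdatomic.h>` rather than whatever barrier primitive the project would actually pick. The struct, its field layout, and the depth are assumptions for illustration and do not match the real sampler fifo.

```c
#include <stdatomic.h>

#define SKETCH_FIFO_DEPTH 8   /* assumed depth, for illustration only */

typedef struct {
    double data[SKETCH_FIFO_DEPTH];
    atomic_uint in;    /* next slot to write; advanced by the producer */
    atomic_uint out;   /* next slot to read; advanced by the consumer */
} sketch_fifo_t;

/* Producer side. Returns 1 if the value was queued, 0 if the fifo is full. */
static int sketch_fifo_put(sketch_fifo_t *f, double v) {
    unsigned in = atomic_load_explicit(&f->in, memory_order_relaxed);
    unsigned out = atomic_load_explicit(&f->out, memory_order_acquire);
    unsigned next = (in + 1) % SKETCH_FIFO_DEPTH;
    if (next == out) return 0;   /* full */

    f->data[in] = v;             /* write the sample first */

    /* Barrier: the sample must be visible to other CPUs before
     * the updated index is. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&f->in, next, memory_order_relaxed);
    return 1;
}
```

On x86 the fence costs little because stores are already ordered; on ARM it becomes a real barrier instruction, which is exactly the ordering the unfenced code was taking for granted.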
[11:44:45] <cradek> can you think of a way we could make it worse, for testing the barrier fix?
[11:46:02] <jepler> not offhand
[11:55:04] <jepler> I mean, assuming the task is "identify where barriers are needed for correct operation", no. probably you could write a standalone threaded program without barriers that more quickly showed up a problem
[11:56:13] <jepler> since you get a comparatively strong ordering guarantee on x86, this is just something we've taken for granted on arm
[11:56:30] <jepler> boy I've been looking at this documentation for 15 minutes and it's still clear as mud
[11:57:14] <jepler> > The second part of the definition of Group A is recursive. Ultimately, membership of Group A derives from the observation by Py of a load before Py performs an access that is a member of Group A as a result of the first part of the definition of Group A.
[11:57:19] <jepler> yeah thanks a lot
[11:58:00] <jepler> http://herbsutter.com/2012/08/02/strong-and-weak-hardware-memory-models/ boils it down to "man we hate this for writing efficient software" but is sparse on details unless you already know what terms like SC-DRF mean (I don't)
[11:58:10] <archivist> I sent a flame to ARM about one of their manuals and got told off by the boss at the time
[11:58:16] <jepler> hah
[12:04:10] <seb_kuzminsky> i just realized that cradek called it: it's a sampler bug that only bites on arm
[12:04:32] <jepler> yeah that's basically the position I'm coming to as well
[12:05:12] <seb_kuzminsky> and likely there are other bugs with the same underlying cause sprinkled liberally through our code
[12:05:20] <jepler> the other thing I can think of this affecting linuxcnc offhand is in the task/motion interface where we write a big buffer and the last(-ish) write to that buffer marks the fresh buffer as available and filled out for the other side
[12:05:37] <seb_kuzminsky> yep, and the ui/task interface
[12:05:51] <jepler> yeah so the nml shared memory stuff
[12:05:53] <jepler> ugh
[12:05:55] <seb_kuzminsky> and streamer
[12:05:59] <jepler> well yeah
[12:06:03] <seb_kuzminsky> and halscope
[12:06:10] <jepler> I have a patch set that unifies streamer/sampler because it's really all the same stuff
[12:06:14] <jepler> hm yeah
[12:06:19] <seb_kuzminsky> anything that uses shmem
[12:06:22] <seb_kuzminsky> of any sort
[12:07:02] <jepler> hal shared memory has never really promised an ordering between updating signal foo and signal bar..
[12:07:52] <jepler> I feel like it's a good thing we don't actively encourage users to use ARM systems, we keep running into pitfalls
[12:07:52] <seb_kuzminsky> but i bet some code relies on ordering for handshaking
[12:08:15] <seb_kuzminsky> encoder writes a latched position, then clears index-enable?
[12:08:28] <seb_kuzminsky> probe works similarly
[12:08:31] <jepler> oh yeah that would be a good example
[12:08:48] <cradek> but is it still a problem if it's just one cpu, ie all in realtime threads?
[12:09:16] <jepler> everything that runs on one CPU is not a problem
[12:09:24] <seb_kuzminsky> i think it's only an issue across cpus
[12:09:37] <cradek> so then it's only when the writer is in rt and the reader in userland, or the other way
[12:09:45] <cradek> ... or both are in userland
[12:09:46] <seb_kuzminsky> each cpu will think its memory accesses happened in the order it asked for
[12:09:55] <seb_kuzminsky> yep, all of uspace is vulnerable
[12:10:24] <cradek> in uspace are different rt threads on different cpus? I thought they weren't
[12:10:50] <seb_kuzminsky> hmm you may be right
[12:10:52] <jepler> so if all memory starts out zeroed, and you execute CPU0: [X]=1; [Y]=1; CPU1: r1=[X]; r2=[Y];
[12:11:52] <jepler> errrr
[12:12:00] <jepler> CPU1: r2=[Y]; r1=[X]
[12:12:17] <jepler> .. it's possible to get r2==1 [write to Y appears to have occurred] but r1==0 [write to X appears not to have occurred]
[12:12:46] <jepler> which, as far as I understand, isn't one of the possibilities on the x86 memory model
[12:13:04] <jepler> and the fix is for CPU0 to execute [X]=1; barrier; [Y]=1;
[12:13:24] <seb_kuzminsky> jepler: i think the behavior you described can be observed on weak memory ordering systems like arm, and can not be observed on strong memory ordering systems like i386 and amd64
[12:14:13] <seb_kuzminsky> and i agree with your fix
[12:16:40] <seb_kuzminsky> i opened the arm manual and saw this: http://goo.gl/ZX2n6O
[12:17:45] <jepler> actually in the "ARM Architecture Reference Manual (ARM ARM?) ARMv7-A and ARMv7-R edition" way down at appendix G it is greatly clarified with real-world examples
[12:18:00] <jepler> in fact G2.2 "Message Passing" is exactly what we do and what they say not to do
[12:18:40] <jepler> and the fix is given as a barrier on *both* sides (reader and writer)
[12:19:38] <cradek> I don't understand how you can barrier at the reader
[12:20:09] <seb_kuzminsky> to prevent reordering of the reads?
[12:20:21] <cradek> oh I see
[12:20:21] <jepler> yes, I think the read of R1 could be reordered above the read of R2
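The two-sided fix from the message-passing discussion above can be sketched with C11 fences. Here `payload` plays the role of [X] and `ready` the role of [Y]; the type and function names are hypothetical, and C11 fences are a stand-in for the ARM barrier instructions the manual talks about.

```c
#include <stdatomic.h>

typedef struct {
    int payload;        /* the data, [X] in the example */
    atomic_int ready;   /* the flag, [Y] in the example */
} sketch_mailbox_t;

/* Writer (CPU0): store the data, barrier, then raise the flag. */
static void sketch_publish(sketch_mailbox_t *m, int value) {
    m->payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Reader (CPU1): check the flag, barrier, then load the data.
 * Returns 1 and fills *out if the flag was set, otherwise 0. */
static int sketch_try_read(sketch_mailbox_t *m, int *out) {
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed)) return 0;
    /* Without this barrier the payload load could be satisfied
     * before the flag load, giving r2==1 but r1==0. */
    atomic_thread_fence(memory_order_acquire);
    *out = m->payload;
    return 1;
}
```

The release fence on the writer and the acquire fence on the reader pair up; dropping either one brings back the reordering window on weakly ordered CPUs.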
[12:22:37] <jepler> ugh the trick on the next page where the read-side barrier is replaced by an address dependency is super-gross
[12:22:54] <jepler> r2 = *(r3 + (r1 & 0))
[12:23:13] <seb_kuzminsky> now they're just fucking with us
[12:45:55] <jepler> yeah I agree
[12:46:22] <seb_kuzminsky> only a psychopath would write code like that in public
[12:46:44] <jepler> wellllllll
[12:47:03] <jepler> it depends on how harmful the barrier instruction is compared to those rather stupid instructions
[12:47:09] <jepler> for performance
[12:48:06] <seb_kuzminsky> no number of nanoseconds is worth that
[13:33:52] <cradek> I wonder what guarantees you get for line 3 if you use an implicit barrier like that
[13:35:04] <cradek> or would you have to write that for every read forever
[13:42:00] <jepler> "imagine N boots stomping concurrently on a programmer's face -- forever" -- not george orwell
[14:26:45] <jepler> wow the 96boards website is pretty special. on firefox it frequently takes >30s to load a page, and makes the rest of the browser unresponsive for that long.
[14:52:37] <skunkworks> well... I like are mini van...
[14:52:42] <skunkworks> our
[14:54:02] <mozmck> only thing I don't like about ours is that it's not diesel
[14:54:41] <skunkworks> yah.. not many of them
[14:54:48] <skunkworks> volkswagen
[14:55:23] <mozmck> I don't know if they have a diesel one. I read their van is actually made by dodge
[14:58:19] <skunkworks> we need 4wd so that limited us to toyota for the most part.
[14:59:30] <mozmck> I see. Ours is an '02 honda, and it's been great.
[15:03:18] <skunkworks> we had a subaru forester - with 2 kids and 2 adults - that is all you can fit. no taking grandma or others.
[15:03:48] <skunkworks> unless you stuff them in the back.. ;)
[16:40:46] <mozmck> having multiple identical keys in the ini file causes problems with some inifile parsers.
[16:42:58] <cradek> that's not too surprising. our ini file spec (haha) does allow that, though.
[16:44:07] <mozmck> yes. I found one library that can handle multiple keys, but it writes them back with each multiple key next to the other.
[16:44:16] <mozmck> EMBED_TAB_NAME = THC
[16:44:17] <mozmck> EMBED_TAB_NAME = THC Settings
[16:44:17] <mozmck> EMBED_TAB_LOCATION = thc_box
[16:44:17] <mozmck> EMBED_TAB_LOCATION = thc_settings
[16:44:29] <mozmck> Which I'm pretty sure won't work :)
[16:44:39] <cradek> why not?
[16:44:52] <cradek> you mean it reorders the lines?
[16:44:56] <mozmck> Yes.
[16:45:17] <cradek> the order of lines with matching keys matters, but otherwise I don't think order matters
[16:45:50] <cradek> I think queries are of the form (first,second,third) occurrence of (key) in (section)
[16:47:29] <mozmck> The two EMBED_TAB_COMMAND lines were next. So the first TAB_COMMAND would be associated with the first TAB_LOCATION?
[16:48:05] <cradek> I don't know how they're used exactly
[16:48:32] <mozmck> If so, it would probably work. I'm making a simple gui configurator, but I want to read and modify the ini and hal files directly instead of writing them new each time.
[16:48:37] <cradek> but I'm saying that I think if you don't change the order of *matching* keys within a section you don't change the meaning of the ini file
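A sketch of the "nth occurrence of key in section" query as cradek describes it, to show why grouping matching keys together leaves the answers unchanged. The entry table and `sketch_ini_find` are hypothetical, not LinuxCNC's inifile API, and the `DISPLAY` section name used in the usage note is an assumption.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *section;
    const char *key;
    const char *value;
} sketch_ini_entry_t;

/* Return the value of the nth (1-based) occurrence of key in section,
 * scanning entries in file order, or NULL if there is no such occurrence. */
static const char *sketch_ini_find(const sketch_ini_entry_t *e, size_t count,
                                   const char *section, const char *key, int nth) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(e[i].section, section) == 0 && strcmp(e[i].key, key) == 0) {
            if (--nth == 0) return e[i].value;
        }
    }
    return NULL;
}
```

Whether the file interleaves NAME/LOCATION pairs or groups both NAME lines ahead of both LOCATION lines, asking for occurrence 2 of `EMBED_TAB_NAME` in `DISPLAY` yields `THC Settings` either way, because only the relative order of matching keys is consulted.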
[16:49:23] <mozmck> I don't either. I may have to dig into that farther. I would prefer the keys that go together be grouped together though.
[16:52:00] <mozmck> huh, it worked!
[16:52:28] <cradek> haha
[16:52:33] <cradek> said like a true programmer
[16:52:47] <cradek> oh wait, that's "huh, it compiled!"
[16:53:34] <mozmck> I'm playing with https://github.com/brofield/simpleini
[16:54:12] <mozmck> the test file reads and re-writes the ini file, and I took that output and ran it in linuxcnc and it worked fine.
[17:08:33] <jepler> an ideal inifile modifier would somehow preserve all the layout choices of the original, except for the part that was requested to be changed
[17:14:24] <mozmck> Yes, I've been trying to find something like that. This one at least preserves the comments, which most don't do.
[17:15:19] <jepler> anybody getting e-mail from the lists today? My last mail was received Thu, 16 Jul 2015 11:22:46 -0500
[17:15:53] <jepler> and a message I sent at Fri, 17 Jul 2015 12:23:42 -0500 hasn't shown up yet
[17:16:08] <jepler> also I have trouble not typing "sourceforget"
[17:16:20] <mozmck> latest I have is about the same, from kirk
[17:16:37] <mozmck> sourceforge has been losing respect lately.
[17:18:10] <jepler> yeah the list service website is down http://https.downforeveryone.com/check.php?url=https://lists.sourceforge.net/
[17:18:19] <jepler> though the smtp service (at mx.sourceforge.net.) is up
[17:18:56] <jepler> > The sourceforge.net website is temporarily in static offline mode.
[17:18:56] <jepler> Only a very limited set of project pages are available until the main website returns to service.
[17:19:06] <jepler> (you get a 404 if you click the 'help' link on their front page)
[17:19:22] <jepler> > SourceForge down due to storage platform bug, working 24x7 on recovery and data validation, service restoral. Slashdot restored.
[17:20:12] <jepler> ah this brings back memories of that one time that cvs was broken for a month
[17:22:27] <mozmck> ouch!
[17:23:14] <mozmck> Just make sure they don't start distributing linuxcnc with adware/spyware in the installer :)
[17:43:59] <jepler> we use them for two things of consequence -- bug tracker and mailing lists
[17:44:17] <jepler> I think we could easily move the bug tracker to github
[17:44:21] <jepler> mailing lists are harder to move
[17:44:34] <jepler> whatever you do, you lose a lot of subscribers
[17:44:48] <jepler> and aside from google groups I don't know who is out there for free
[17:55:55] <mozmck> Yeah, I don't know. Kicad uses launchpad for the development list (and development)
[17:56:16] <mozmck> launchpad could get more interesting as they are switching to git.
[19:42:35] <cradek> http://linuxcnc.org/index.php/english/forum/18-computer/29429-the-weekest-link#60708
[19:43:31] <cradek> (although I'm suppressing a lot of comments about this post) I think people worry a lot more about latency numbers than is necessary, and I wonder if we could affect that somehow
[19:44:30] <cradek> I wonder if we could determine whether software stepgen is in use, and if not, relax that realtime-delay warning error a lot
[19:44:44] <cradek> er maybe presence of base thread?
[19:45:31] <cradek> I wonder if it's a software bug, documentation bug, or self-perpetuating culture bug
[19:47:11] <archivist> or, we admit it, and the dark side gloss over the issue
[19:47:48] <cradek> it's not just that
[19:48:02] <cradek> I see people saying they've worked on it for weeks and they can only get their latency down to 20000
[19:48:10] <cradek> which is JUST FINE even with software stepgen
[19:48:23] <cradek> I wonder what makes them do this
[19:48:31] <archivist> I have PCs that are about 22k at best
[19:49:41] <cradek> I guess I have fpga cards on all my machines now
[19:50:09] <cradek> if I had latencies of 50000 it would be fine except I might see the warningerror
[19:50:16] <archivist> I have no fpga and have to live within my means
[19:51:38] <PCW> I think you will only get a warning if the servo thread jitter is > 20% of the period (so 200 usec on a 1 KHz thread)
[19:52:04] <cradek> pretty sure that's not the rule
[19:52:27] <cradek> it's the variance between this servo cycle and the last several
[19:52:43] <cradek> but your rule really might be better
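For reference, the rule as PCW states it (warn only when jitter exceeds 20% of the thread period) is easy to write down. This is a sketch of that proposed rule only; cradek doubts it matches the existing check, and the function name is hypothetical.

```c
#include <stdint.h>

/* Returns nonzero when jitter is more than 20% of the thread period.
 * Both arguments are in nanoseconds. Multiplying by 5 avoids division. */
static int sketch_jitter_warns(int64_t period_ns, int64_t jitter_ns) {
    return jitter_ns * 5 > period_ns;
}
```

For a 1 kHz servo thread the period is 1,000,000 ns, so the threshold lands at 200,000 ns (200 usec), and a latency of 50,000 would pass quietly under this rule.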
[19:54:03] <PCW> even 200 usec is OK if you retime the stepgen and encoder position sampling
[20:34:13] <PCW> archivist: I'll send you a FPGA card for free if you don't mind soldering 2 wires on it
[20:34:14] <PCW> (have a dozen or so first rev 7I80DBs that have the Ethernet EEPROM DI/DO wires swapped)
[20:34:16] <PCW> we patched some but that got old fast
[20:46:49] <jepler> ah that's what those wires are on the ones you sent cradek
[20:55:20] <PCW> if it's 2 green wires, that's what it is (I always make that DI/DO mistake)
[20:56:36] <PCW> probably why they use MOSI and MISO for SPI, harder to confuse
[21:52:49] <cradek> Shank diameter : 3mm (1/8 inch)
[21:57:14] <jepler> mumble mumble government work
[21:59:13] <cradek> I can't seem to find the tooling I need :-/
[21:59:28] <cradek> these diamond burrs work fine, but the shanks are undersized and so crappy they pull out of the collet
[22:00:01] <cradek> they're for a dremel or something
[22:01:27] <cradek> somehow even after two screwups I haven't ruined the work quite yet