#linuxcnc-devel | Logs for 2013-09-13

[03:07:47] <cmorley> Skunkworks: I think you should mention your mach/linuxcnc interp findings on the mailing list. Real-world comparisons are useful.
[03:10:52] <cmorley> Jeff: yes I agree the holecircle error is obviously not the real problem, but the fix makes it useable. I added a comment to the source to mention why it was done in case someone wishes to investigate deeper.
[03:11:11] <cmorley> night
[07:31:50] <skunkworks> I may not understand what P and Q do in G64. I thought they were separate: P was strictly path deviation, while Q was how far segments could deviate and still be combined together.
[07:33:16] <skunkworks> Q being the naive cam detector
[07:34:20] <skunkworks> but in my example above with the straight line - Just P seems to combine segments together. Maybe that is just a special case..
[08:47:21] <cradek> I tried reading the docs, but I don't trust them to have this exactly right
[08:48:12] <cradek> one is supposed to be blend tolerance (corner divergence) and the other is supposed to be NCD-joining path diversion tolerance
[08:48:43] <cradek> I bet if you specify P only, it uses it for both tolerances (or it should)
[08:48:50] <cradek> beware I haven't looked at the source...
[08:50:37] <skunkworks> heh - ok.
[08:50:48] <skunkworks> cradek, on vacation?
[08:52:23] <cradek> I was, but I'm back to my "normal" life now
[08:52:34] <skunkworks> heh - I thought you seemed to disappear.. :)
[08:53:34] <skunkworks> camping?
[08:58:53] <cradek> nope, I stayed in a hotel for a week at shocking expense
[08:59:31] <cradek> went to denver
[08:59:47] <cradek> I just couldn't bear to drive the bus that far
[09:05:01] <skunkworks> Neat - is seb around there?
[09:05:14] <skunkworks> Hopefully not near the flooding
[09:37:06] <cradek> he's nearby in boulder, which is the center of the flooding, but he and his family are doing fine
[09:37:21] <cradek> I saw him tuesday night before we left, and it was sure raining hard
[09:48:32] <cradek> er, that was monday night, we left denver tuesday morning
[10:31:17] <archivist> skunkworks, for real comparison, can you actually measure the error from the current lines (peak error) that may show one or the other being better/worse
[10:32:00] <archivist> is one snake oil accuracy and one real
[10:32:57] <archivist> I can imagine not many can dynamically measure the path taken
[10:33:02] <skunkworks> well - I sort of believe mach... I know I know... because it peaks out the same velocity that linuxcnc does for a G2/3 circle the same diameter...
[10:36:07] <archivist> sort of a convincing argument...if they got the maths right and don't have some "that will be near enough no one will notice" maths
[10:36:40] <skunkworks> right
[10:39:21] <archivist> which ours could be too hence actual measurement, to see which one is correctly finding the merged curve
[10:39:41] <skunkworks> Hey - look - You can set the read-ahead in mach to 2 - it runs the same speed as linuxcnc...
[10:41:43] <archivist> which implies our lack of read ahead is a real problem
[10:41:56] <skunkworks> that is my conclusion...
[10:42:25] <skunkworks> it is interesting that it does run the same speed - must be using similar maths...
[10:43:01] <archivist> similar? what are you implying :)
[10:43:51] <skunkworks> heh
[10:45:04] <skunkworks> well - I think mach is quite a bit older than the trajectory re-write chris did..
[10:46:05] <archivist> should just tell the other side one (or more I do not know) of us have reverse engineered code for court copyright case
[10:48:20] <skunkworks_> at a read ahead of 2 - mach runs the 500 lines of spiral just a tad slower than linuxcnc.
[10:48:27] <skunkworks_> (seconds)
[10:56:55] <skunkworks_> for that program - it looks like a read-ahead of about 9 makes the outside diameter run at the maximum velocity
[11:00:03] <skunkworks_> Well that is neat. Someone just needs to do some programming ;) I am sure it is easy.
[11:00:28] <skunkworks_> (it must be because I don't understand it)
[11:01:50] <archivist> I find maths hard.....
[11:03:30] <skunkworks_> No - the work is all done - It already reads ahead 1 line segment. I am sure you just have to increase a number somewhere.
[11:03:38] <skunkworks_> ;)
[11:04:14] <archivist> yes dear
[11:06:01] <cradek> I recall someone tried to make cutter comp handle concave corners by just removing the error
[11:07:33] <skunkworks_> doesn't that work?
[11:07:53] <skunkworks_> onerror resume next
[11:08:13] <cradek> well it DID cause it to not error on the concave corners, so you might say yes it does work
[11:35:10] <jepler> does it give good results in a range of situations human operators can understand?
[11:40:47] <cradek> oh certainly not
[13:34:35] <andypugh> This is going to be fun to debug.
[13:35:07] <andypugh> 3 times out of 4 the SSI driver (plus associated changes) starts, runs and exits with no errors.
[13:44:22] <andypugh> the 4th time it crashes hard on exit. No error message (tail -f kern.log is running) but machine completely frozen into unplug-the-PSU reset requirement.
[13:44:44] <andypugh> I don't really know where to start :-)
[13:46:59] <andypugh> Ah, no, wait, there is a clue... Oddly it started up twice with no errors, then the third time this warning: http://pastebin.com/9rNZgN1Q
[13:47:25] <andypugh> However, it appears to run perfectly well after the warning, but have horrible problems on exit.
[13:50:39] <jepler> If I had to guess, something is writing to an out of bounds area of memory allocated by kmalloc/kzalloc/krealloc. A subsequent attempt to allocate memory finds the housekeeping information to be corrupted, and in this case is unable to find the size of the memory block being reallocated
[13:51:44] <cradek> do you have a patch we can eyeball?
[13:53:11] <andypugh> Give me a moment
[13:53:20] <cradek> is hm2_absenc_parse_md new code?
[13:53:33] <cradek> ok
[13:54:55] <jepler> also useful to know whether it's necessary to start rt threads for the problem to occur
[13:55:34] <andypugh> I can run that test.
[13:57:00] <andypugh> This is the new file, though the actual changes are more extensive.
[13:57:01] <andypugh> http://pastebin.com/3ny6qvRt
[14:00:13] <cradek> is hm2->absenc.chans initialized to NULL?
[14:00:20] <cradek> ... before your first krealloc
[14:01:01] <andypugh> No.
[14:01:11] <cradek> I bet that's bad
[14:01:17] <andypugh> Ah, well, it might be.
[14:01:34] <cradek> you should check krealloc for failure too
[14:01:46] <cradek> (I'm assuming it works like realloc, which might be bad to assume)
[14:02:31] <andypugh> A few more clues, it managed to not quite die this time without realtime starting: http://pastebin.com/jqSBhDkk
[14:03:17] <andypugh> I think that the containing hm2 struct is zeroed.
[14:03:54] <andypugh> I think I need to figure out which krealloc is causing the trouble.
[14:04:19] <cradek> is chan->confs initialized to NULL before the first krealloc?
[14:05:21] <andypugh> Do I have to keep flags and kzalloc first time then krealloc?
[14:05:52] <cradek> I assume krealloc(NULL, ...) is the same as kmalloc(...)
[14:05:57] <andypugh> I think I am beginning to understand the question.
[14:06:24] <andypugh> krealloc may not know that this is a "new" struct.
[14:06:36] <cradek> it surely doesn't know anything
[14:06:58] <andypugh> I may be assuming too much of the memory management pixies.
[14:06:59] <cradek> mypointer = krealloc(uninitialized pointer, ...) is going to be crashy
[14:08:45] <cradek> also be aware that realloc CAN move the whole region in order to resize it, so you must not save pointers into it
[14:09:02] <cradek> I think krealloc *always* moves the region
[14:10:02] <cradek> http://lxr.free-electrons.com/source/mm/util.c#L110
[14:10:44] <andypugh> Can you elaborate on that last point?
[14:11:10] <cradek> a = krealloc(a, LARGER_SIZE)
[14:11:24] <cradek> after this, expect that a might have changed
[14:11:48] <andypugh> I need to think about whether I am doing that.
[14:11:51] <cradek> so if you had previously done p=a or p=&a[3] etc earlier, p will now be invalid
[14:13:10] <andypugh> pointers in the structs ought to still point to the same stuff, I think?
[14:13:41] <cradek> yes the data inside a will be unchanged
[14:14:16] <cradek> but pointers previously pointing to the data inside a will be invalid because a has moved
[14:15:06] <cradek> and after a=krealloc(a,...) always check a for being NULL
[14:15:21] <andypugh> This could all be quite hard, thinking about it.
[14:15:24] <cradek> I bet one of these three things is the problem :-)
[14:15:37] <cradek> yeah every use of realloc is hard, unfortunately
[14:15:50] <andypugh> a contains pointers to b with pointers to c.
[14:16:04] <cradek> but the CONTENTS of a are fine
[14:16:14] <cradek> ... if a is the thing you're reallocing
[14:16:44] <cradek> it's pointers INTO a, or copies of the a pointer itself, that become wrong
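The three points cradek raises (start from NULL, check the result, expect the block to move) can be sketched in userspace with plain realloc. This is only an analogue of the kernel code under discussion: `struct chan`, `chan_init` and `chan_add_conf` are hypothetical names loosely modelled on the `chan->confs` / `chan->num_confs` fields mentioned above, and the element type is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one channel of the driver. */
struct chan {
    int *confs;      /* grown with realloc; must start out as NULL */
    int  num_confs;  /* number of valid entries in confs */
};

/* Put a freshly created channel into a known state, so that the first
 * realloc(NULL, ...) behaves like malloc(...). */
static void chan_init(struct chan *c)
{
    assert(c != NULL);
    c->confs = NULL;
    c->num_confs = 0;
}

/* Append one value. Returns 0 on success, -1 if the allocation failed. */
static int chan_add_conf(struct chan *c, int value)
{
    int *tmp;

    assert(c != NULL);
    /* Use a temporary: if realloc fails, the old block is still valid
     * and still reachable through c->confs. */
    tmp = realloc(c->confs, (size_t)(c->num_confs + 1) * sizeof(*tmp));
    if (tmp == NULL)
        return -1;

    /* The block may have moved. Any pointer saved earlier into the old
     * block is stale now; keep indices rather than pointers. */
    c->confs = tmp;
    c->confs[c->num_confs++] = value;
    return 0;
}
```

Calling `chan_init` once when the channel is created, and never again on later passes, keeps the code re-enterable without wiping pointers that are already in use.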
[14:17:15] <andypugh> I think I have that.
[14:17:36] <cradek> at first scan I don't think the region moving is your problem. I think a, initially not being NULL, is the problem.
[14:18:14] <cradek> I can see that krealloc(chan->confs is definitely uninitialized
[14:18:29] <cradek> you just need to init it near chan->num_confs=0
[14:19:02] <cradek> and krealloc(hm2->absenc.chans needs to be initialized somewhere else
[14:19:40] <andypugh> A complication is that this code may be called by three different module types that all get lumped together, so the code has to be re-enterable.
[14:20:33] <andypugh> so, the absenc.chans might first have 3 x SSI modules, then have a couple of BISS modules appended to the end, and so on.
[14:21:07] <andypugh> (that bit isn't there yet, it is just a feature I am trying to cover for)
[14:21:18] <cradek> that doesn't seem like a problem?
[14:22:01] <andypugh> Except that I need to be careful not to null all my pointers the second time through the parse_md code
[14:22:34] <cradek> you should NULL hm2->absenc.chans before any of this is ever called
[14:22:56] <cradek> and you should NULL each new chan->confs when you make the new chan
[14:23:58] <andypugh> OK. I just realized that the first thing needs to happen way back up the tree in hostmot2.c
[14:26:06] <andypugh> Let's see how I get on. back in a bit.
[14:57:50] <andypugh> Well, early days yet, but it is looking rather better behaved
[15:06:47] <cradek> which fix? the NULL init?
[15:23:53] <andypugh> Yes
[15:25:01] <cradek> yay
[15:25:47] <andypugh> I might even risk re-enabling the clean-up code now.
[15:36:47] <andypugh> And that seems to work too.
[15:37:18] <andypugh> Clearly I am not safe to be let loose in a kernel without a memory manager nanny
[15:39:42] <andypugh> I have a feeling that given an arbitrary bit-length number I should either sign-extend it _or_ gray-code convert it?
[15:40:04] <andypugh> ie, I probably want to assume that gray-code is unsigned?
[15:42:11] <cradek> I don't understand what you're asking at all
[15:43:34] <andypugh> OK, now that the code isn't crashing, I have moved on to the fact it isn't working :-)
[15:44:29] <andypugh> The encoder (well, actually, what I have is a laser rangefinder, but it was the cheapest SSI device on eBay that week)
[15:45:01] <andypugh> The output is an error bit in LSB then 24 bits of gray-code encoder counts.
[15:45:21] <andypugh> (Yes, I know, gray-code on serial is pointless).
[15:46:04] <jepler> To convert a gray code to an integer, you're going to use shift and mask operations. In C, use unsigned types when performing shift operations because with signed types the result of some shift operations is undefined.
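A minimal sketch of the shift-and-mask decoding jepler describes, using only unsigned types. The function name `gray_to_binary` and the choice of a 64-bit container are assumptions for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an nbits-wide reflected-binary (Gray) value to plain binary.
 * Each output bit is the XOR of all Gray bits at or above its position;
 * doubling the shift each pass builds that running XOR in a few steps. */
static uint64_t gray_to_binary(uint64_t gray, unsigned nbits)
{
    uint64_t bin;
    unsigned shift;

    assert(nbits >= 1 && nbits <= 64);
    if (nbits < 64)
        gray &= (UINT64_C(1) << nbits) - 1;  /* drop bits above the field */

    bin = gray;
    for (shift = 1; shift < nbits; shift <<= 1)
        bin ^= bin >> shift;
    return bin;
}
```

The result is an unsigned count in the range 0 to 2^nbits - 1; whether and how to treat it as signed is a separate step.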
[15:46:10] <andypugh> The "several fields of data in a bit-field" is so much like smart-serial that it is actually being passed to smart-serial for decoding.
[15:46:50] <andypugh> But smart-serial sign-extends its numbers.
[15:47:23] <andypugh> Because smart serial encoders are (known to be) signed.
[15:48:18] <andypugh> But I am not even sure if there is such a thing as signed gray-code, so I think I should skip the sign-extension in that case.
[15:48:33] <andypugh> jepler: And that might be part of it too.
[15:49:18] <jepler> well .. consider good old quadrature. It's a gray code. It isn't "signed" per se, but you can arrive at both positive and negative values depending on the change of the codes over time
[15:50:15] <andypugh> Yes, and I spent several evenings on that part with arbitrary bit-lengths in a 64-bit buffer.
[15:50:42] <jepler> I think you'd use that same logic on the integer you get from grey decoding
[15:50:57] <andypugh> The simulated index still looks wrong to me, but works.
[15:51:00] <jepler> e.g., if the old decoded value was 0 and the new decoded value is MAX then the net change is -1 counts
[15:52:25] <andypugh> Now I am confused, an American just wrongly spelt "Gray" as "grey" when they normally miss-spell "grey" as "gray" :-)
[15:53:28] <jepler> I don't know which way the color is spelled, and I don't know which way the code is spelled
[15:53:54] <cradek> I doubt the code is named after the color
[15:53:57] <jepler> no, I know it's not
[15:54:00] <cradek> the color can be spelled either way
[15:54:18] <jepler> > reflected binary code, also known as Gray code after Frank Gray, is ...
[15:55:17] <andypugh> The code was invented by a Frank Gray. (Not, as is often claimed, by Elisha Gray) http://en.wikipedia.org/wiki/Frank_Gray_(researcher)
[15:56:16] <andypugh> The colour grey can be spelt only one way :-)
[15:56:56] <andypugh> (I am trying to think of a more dense example of differences in US/UK orthography)
[15:57:30] <cradek> yeah I was aware of "color" and "spelled" as I typed that, and I did it on purpose...
[15:59:39] <jepler> what's the alternative -- "spelt"?
[15:59:47] <andypugh> Yes.
[15:59:52] <jepler> you know that's a grain, right?
[16:00:38] <andypugh> Yes, in fact the generic reply on rec.motorcycles a long time ago was "bulgar off" when that was mentioned.
[16:01:47] <jepler> anyway .. when you talk about "sign extending" you mean tracking the successive values so that you can reconstruct a count larger than the number of input bits, as long as it never changes by too much (half or more of the total counts) between reads?
[16:02:07] <jepler> if so, I'll reiterate that you should do the Gray code decoding and then that step
[16:02:56] <andypugh> No, I mean assuming that if the MSB of the N-bit buffer is set then all the rest to the top of the 64-bit buffer should be.
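Sign extension in the sense andypugh describes here (if the top bit of the N-bit field is set, set every bit above it) might look like this for a two's complement field. The name `sign_extend` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low nbits of value as a two's complement number and
 * return it as a 64-bit signed integer. */
static int64_t sign_extend(uint64_t value, unsigned nbits)
{
    uint64_t sign;

    assert(nbits >= 1 && nbits <= 63);
    value &= (UINT64_C(1) << nbits) - 1;   /* keep only the field */
    sign = UINT64_C(1) << (nbits - 1);     /* the field's top bit */

    /* Flipping the sign bit and then subtracting its weight maps
     * 0..2^(nbits-1)-1 to itself and the upper half to negatives.
     * Both operands fit in int64_t, so nothing overflows. */
    return (int64_t)(value ^ sign) - (int64_t)sign;
}
```

For a field that is known to be unsigned, such as a Gray-decoded position that never goes negative, this step would simply be skipped.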
[16:03:13] <andypugh> The adding of deltas part comes next.
[16:04:14] <jepler> well then maybe you want to do the Gray code, then the sign extension, then the "adding of deltas" as you call it
[16:04:26] <andypugh> In the Smart-Serial case I know that I have an arbitrary bit-length 2s complement number.
[16:05:02] <andypugh> In the case of a gray-code value it is actually quite unlikely to be 2s-complement.
[16:05:55] <andypugh> (arbitrary but known bit-length, I should point out)
[16:08:53] <jepler> In your place I'd arrange this gray-decoding, sign-extending, and adding-of-deltas code so that I could build a userspace program that would serve as a test
[16:09:13] <andypugh> Context: In Smart Serial the counter data type can be any bit-length (though is typically 8 or 16) and is 2s complement notation. That code has been released and working for a year or so.
[16:09:32] <jepler> (for instance by putting it in a separate file which you can build as a two-file project that happens to live beside linuxcnc)
[16:11:40] <andypugh> In SSI (And BISS, Fanuc, whatever else PCW produces) there is a modparam like "error%1bencoder%24g" which means "1 bit of error value followed by 24 bits of Gray-coded encoder"
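Pulling the two fields out of a received frame could be sketched like this. The layout (error bit at bit 0, 24 Gray-coded bits above it) follows the description of this particular device earlier in the log; `extract_field` is a hypothetical helper and does not parse the modparam string itself.

```c
#include <assert.h>
#include <stdint.h>

/* Return the width-bit field of raw whose lowest bit is at position lsb. */
static uint64_t extract_field(uint64_t raw, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width < 64);
    assert(lsb + width <= 64);
    return (raw >> lsb) & ((UINT64_C(1) << width) - 1);
}
```

With that assumed layout, `extract_field(raw, 0, 1)` gives the error bit and `extract_field(raw, 1, 24)` gives the Gray-coded count, still to be decoded.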
[16:14:21] <andypugh> I am vacillating between "%24ge" / "%24e" and "%24g" / "%24e" as the way to do this, too. Is gray-code a modifier to be applied to any data type? Is it possible that the entire bit packet would be gray-coded?
[16:15:16] <andypugh> jepler: You might be shocked to learn that I have no idea at all how to compile code outside of LinuxCNC.
[16:17:03] <andypugh> Heck, I don't even know how to code in userspace. I am possibly the world's most limited C-coder.
[16:17:29] <jepler> gcc -o myprogram a.c b.c; ./myprogram
[16:17:37] <jepler> ^^^ basic super-simple thing
[16:18:00] <andypugh> And I have seen it done, to be honest. I just never have.
[16:25:23] <andypugh> The vast majority of the stuff I have done needs the rest of rtapi to work even a little bit. For one-off test code I rather like codepad.org
[16:59:46] <andypugh> S64 / double. Is that a built-in? I am getting funny results.
[17:00:20] <andypugh> *pin->float_pin = (pin->accum - pin->offset) / param->float_param;
[17:01:41] <andypugh> accum is -5760, offset is 0 (both S64) and float_param is 1.0. The result is 5.8e-273
[17:02:54] <andypugh> (sorry, if it matters, -5650)
[17:13:34] <andypugh> The minus is certainly bogus. This device doesn't do negative.
[18:28:17] <jepler> in that expression, s64 is converted to double, and then double/double is computed. I would have expected that we do s64->double conversions elsewhere. It usually turns into a single inline machine instruction (fild), so there's not much to go wrong
[18:34:10] <jepler> I assume in your real code you dereferenced accum and offset
[18:35:54] <jepler> I'd double check at the site of declaration that pin->float_pin has the type expected (hal_float * I suppose)
[18:36:44] <jepler> that nothing else is unexpectedly writing to that address
[18:36:53] <jepler> write *pin->float_pin = 3.1415; and see what happens
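The expression can be checked in isolation in a small userspace test, along the lines jepler suggested earlier. `scaled_position` is a hypothetical name, and `int64_t` and `double` stand in for the S64 and float types; it only shows that the arithmetic gives the expected value, which would point the search at the pointers and types around it.

```c
#include <assert.h>
#include <stdint.h>

/* (accum - offset) / scale, with the integer difference converted to
 * double before the division. */
static double scaled_position(int64_t accum, int64_t offset, double scale)
{
    assert(scale != 0.0);
    return (double)(accum - offset) / scale;
}
```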
[20:53:04] <andypugh> Sorry, only just got back.
[20:54:54] <skunkworks> andypugh: I don't get your reply to the list
[20:54:57] <andypugh> The real code does not dereference accum and offset (they are not actually pins, just variables).
[20:56:02] <andypugh> I don't think that they need dereferencing, unless the fact that they keep moving about (krealloc) is a problem?
[20:56:37] <skunkworks> I was saying a circle with a radius of 2" runs at around 450ipm while a circle made up of line segments runs at about 100ipm (in linuxcnc), but the mach planner will run the short line segment program also at 450ipm (approx)
[20:58:37] <andypugh> For fun, can you cut the spiral at a very slow speed, then run Mach and LCNC through the same code and weigh the chips? (Is LinuxCNC rubbish, or is Mach sacrificing accuracy?)
[20:59:07] <andypugh> jepler: Writing an explicit idea is a good idea, let me try it.
[20:59:59] <andypugh> (value, first time round, I think that suggests that I think 3 words ahead of my typing)
[21:06:04] <andypugh> So, I do get the right answer (3.14157) if I set the pin directly, so it is the arithmetic. (using an _accurate_ value of pi risks an unlucky coincidence :-)
[21:21:22] <skunkworks> (plus if I tell mach to only look ahead 2 segments - it runs the same as linuxcnc - around 100ipm)
[21:21:27] <skunkworks> andypugh: should you not be in bed?
[21:22:10] <andypugh> I should. I have a long drive tomorrow