Is Cobol holding you hostage with Math?

200 points by mbellotti almost 7 years ago

29 comments

simonbyrne almost 7 years ago

There is a little more to it than "languages don't provide decimal or fixed point". Both Java and Python provide decimal implementations as part of their standard libraries, and languages don't provide fixed-point as it's largely trivial: "just do integer operations and scale them appropriately".

The problem is that fixed point is ambiguous: there are multiple ways to do rounding (unlike floating point which has been largely standardised since the early 90s). In fact, the COBOL rounding rules are rather complicated: <a href="https://stackoverflow.com/a/30215718/392585" rel="nofollow">https://stackoverflow.com/a/30215718/392585</a>

This has some interesting consequences:

I know of an insurance company where there were difficulties trying to replicate the exact premium calculation (which was done on a mainframe in COBOL), part of which involved taking 1.01 to the power of a positive integer (which depended on various risk factors of the policy).

COBOL was not really intended for numerical work, and so doesn't have a "pow" function (or at least this version didn't): instead, it turns out that the programmer had used a simple loop which would iteratively multiply a variable by 1.01, incurring round-off at each iteration. So the only way to emulate it exactly was to use the same arithmetic _and_ the same ugly hacks used in the original software.
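The effect described in this comment can be sketched in Python with the standard `decimal` module. This is only an illustration: the function names, the four-decimal scale and the half-up rounding mode are assumptions, not the rules the insurer's COBOL program actually used.

```python
from decimal import Decimal, ROUND_HALF_UP


def compound_by_loop(factor: Decimal, n: int, places: Decimal) -> Decimal:
    """Multiply by `factor` n times, rounding to `places` after every step."""
    result = Decimal(1)
    for _ in range(n):
        result = (result * factor).quantize(places, rounding=ROUND_HALF_UP)
    return result


def compound_by_pow(factor: Decimal, n: int, places: Decimal) -> Decimal:
    """Compute factor**n first, then round once at the end."""
    return (factor ** n).quantize(places, rounding=ROUND_HALF_UP)


FOUR_PLACES = Decimal("0.0001")  # assumed scale, for illustration only
looped = compound_by_loop(Decimal("1.01"), 10, FOUR_PLACES)   # 1.1045
powered = compound_by_pow(Decimal("1.01"), 10, FOUR_PLACES)   # 1.1046
```

After ten steps the two results already differ in the last digit, which is why a port has to reproduce the loop and its per-step rounding rather than call a power function.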


hyperman1 almost 7 years ago

I had many adventures interfacing other stuff with mostly MicroFocus COBOL. Some personal observations:

* The COBOL knowledge shortage is a myth. I got pretty good at reading their code and giving the COBOL guys small bugfixes while most of the time not even having the compiler at my disposal. Learning COBOL the language is pretty easy, even if actually doing something with it is slow, verbose and bureaucratic. Training new devs will never be the problem if a company and a person are found willing.
* But training is the problem, politically. Why would a dev learn a language that is a bad mark on the resume while being paid less? Why would management want to risk their career by pushing forward an evolutionary dead end?
* And the language is a dead end. NLS in Cobol is really weak, which means it's a no-go for anything global. Fixed width everywhere causes massive technical debt as there is a tendency to 'cheap out on bytes', after which you're completely locked in by the required data migration everywhere. Cobol culture is rife with obsolete practices. One program equals one file (and some libs managed by other Cobol devs) most of the time, so methods of a few thousand lines are the norm.
* The real value of the COBOL devs I know is that they are business people first, technical second. They know their business, deep. They've seen everything and smell if an idea is good or bad. Because of this, they run rings around younger devs and analysts when you look at business value, especially if said devs/analysts are outside consultants that know basically nothing and just code what's been fed to them. While training for Cobol is easy, training for the actual business is close to impossible.
* Another Cobol upside is being a technical dinosaur: The Enterprise Architecture Astronauts or bungee-boss CEO won't touch it with a 10-foot pole, so there is no new great architectural vision every other year. Lava flow architecture is not that huge a problem. Besides, your code base has lived more than 40 years and is already ugly as hell. The technological investment and training cost have been made long ago. Cobol devs can just convert problem to program without weird technological surprises, integration hell, or architectural distractions.

Don't get me wrong. I'm very glad I don't do COBOL. But it's not burn-it-down-and-start-over-now bad either.


otabdeveloper2 almost 7 years ago

Daily reminder: floats are emulated reals.

Money values are rationals, not reals.

Languages need to support the full number stack, including the rationals!

"Floating point decimal" is wrong and the worst of both worlds. Unless you live in the USA, you need to do currency conversions anyways, and those aren't decimal.


jillesvangurp almost 7 years ago

Java has BigDecimal in the standard library. That gives you arbitrary precision. Also, using a double instead of a float gives you a bit better precision. The performance hit of using double instead of float on modern hardware is not something that should be a showstopper.

I don't get the argument about libraries and performance. We're talking about companies using emulators of ancient hardware to run decades old software. Pulling in some library is a complete non-issue from a performance point of view.

Speaking of hardware, that is orders of magnitude faster than anything imaginable when most Cobol was first written. Performance is not the key concern here.


rossdavidh almost 7 years ago

While much of this was interesting and perhaps true, I don't really think it's why COBOL is still being used. It's still being used because it works for cases where the "BO" part of "COBOL" is relevant, in domains where the potential downsides of migrating are very, very large (if it goes badly), and the potential upside is actually pretty limited. Best case, your migration doesn't get noticed as causing any problems, and now your programmers keep leaving for other industries because their skills are more general. That's a whole lot of scary scenarios on one side of the scale, against a pretty meager upside on the other side of the scale.


jim_lawless almost 7 years ago

I'd like to point out that COBOL generates rather efficient fixed-point math code on IBM mainframes because those mainframes have a dedicated set of machine-level instructions that deal with fixed-point math.

The data type used is "Packed Decimal" where each nybble in a string of bytes represents a digit, except the last nybble. The last nybble describes the sign of the overall number. It's similar to BCD with a sign-nybble at the end.

Here's a list of the Packed Decimal instructions with a description of each.

<a href="http://faculty.cs.niu.edu/~byrnes/csci360/notes/360pack.htm" rel="nofollow">http://faculty.cs.niu.edu/~byrnes/csci360/notes/360pack.htm</a>
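To make the layout concrete, here is a rough Python sketch of the encoding described above: one digit per nybble, with the sign in the final nybble. The helper names are hypothetical, and the sign codes used (0xC for plus, 0xD for minus) are the conventional ones as far as I know; treat the details as an assumption rather than a reference implementation.

```python
def pack_decimal(value: int) -> bytes:
    """Encode an integer as packed decimal: digit nybbles plus a sign nybble."""
    sign = 0xD if value < 0 else 0xC          # assumed conventional sign codes
    nibbles = [int(d) for d in str(abs(value))] + [sign]
    if len(nibbles) % 2:                      # pad to a whole number of bytes
        nibbles.insert(0, 0)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_decimal(data: bytes) -> int:
    """Decode packed decimal bytes back into an integer."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()                      # the last nybble carries the sign
    value = int("".join(str(n) for n in nibbles))
    return -value if sign in (0x0B, 0x0D) else value
```

For example, `pack_decimal(12345)` yields the three bytes `12 34 5C`, and `unpack_decimal` reverses that. The implied decimal point is not stored; it lives in the program's declaration of the field.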


dmitriid almost 7 years ago

The main reason COBOL is still around is not math.

The main reason is that business and domain logic exists only as COBOL logic. COBOL devs are ~60 on average, and many people who wrote system requirements are probably already dead. Existing code runs millions or billions of dollars worth of transactions often based on arcane financial rules and internal regulations. Good luck untangling that without stopping the business and without losing that functionality.

Relevant links:

- <a href="https://uk.reuters.com/article/uk-usa-banks-cobol-idUKKBN17C0DZ" rel="nofollow">https://uk.reuters.com/article/uk-usa-banks-cobol-idUKKBN17C...</a>
- Reverse engineering a factory: <if someone can find a link to this fascinating story, please help me :)>

Animats almost 7 years ago

<pre><code> PIC 9(3)V9(15). </code></pre>

is really shorthand for

<pre><code> PICTURE 999V999999999999999. </code></pre>

Fixed point numbers are declared like that.

COBOL is a decent language for business logic. Especially when money amounts are involved. Certainly better than PHP.
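As a loose analogy in Python, a field with three integer digits and fifteen fractional digits can be imitated with the standard `decimal` module. The helper name is hypothetical, and truncating excess digits (rather than rounding) is an assumption made for this sketch; it is not a full emulation of COBOL semantics.

```python
from decimal import Decimal, ROUND_DOWN

FRACTION_STEP = Decimal("1E-15")  # 15 digits after the implied decimal point
UPPER_LIMIT = Decimal(1000)       # three integer digits: 0 .. 999.999...


def to_pic_9_3_v9_15(value) -> Decimal:
    """Fit an unsigned value into the 3.15 digit layout (hypothetical helper)."""
    d = Decimal(value)
    if not (0 <= d < UPPER_LIMIT):
        raise OverflowError(f"{value} does not fit in three integer digits")
    # Assumption for this sketch: excess fractional digits are truncated.
    return d.quantize(FRACTION_STEP, rounding=ROUND_DOWN)
```

With this helper, `to_pic_9_3_v9_15("4.25")` keeps all fifteen fractional digits, and anything of 1000 or more is rejected.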


mastax almost 7 years ago

C#/.NET has the built-in decimal type which is technically floating point, but it has a 96-bit mantissa and a base-10 exponent which makes it behave similarly to fixed point numbers.


slaymaker1907 almost 7 years ago

The statement about Decimal being an import is a garbage, misleading statement. Decimal IS PART OF THE STANDARD LIBRARY!!!! It might not be in the global context, but that is a far cry from having to install an external library, which is what this article makes it seem like.

Also, performance-wise COBOL might be faster on a single machine, but good luck trying to scale it out. Plus, a distributed architecture can make it easier to deploy software and hardware upgrades since you can generally take out a few nodes without bringing down the whole system.

I'm not saying all this code should be rewritten. If you have some code out in the wild that works, you should always weigh carefully the risks in terms of cost and potential new bugs before doing a rewrite. However, lack of fixed point is a lousy excuse to not upgrade.
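For reference, this is roughly what "part of the standard library" means in practice in Python: a single import, with nothing to install. The variable names are just for illustration.

```python
from decimal import Decimal  # ships with Python; no external package needed

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2

# Decimal values built from strings keep the decimal digits exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
```

Here `float_sum` does not compare equal to `0.3`, while `decimal_sum` compares equal to `Decimal("0.3")`.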


tonysdg almost 7 years ago

Couldn't you at least port your COBOL to C? It's also statically typed, compiled, there are high-performance libraries that support arbitrary levels of numeric precision (GNU GMP), and every engineer trained in the last 20 years at least has some C experience.


ashton314 almost 7 years ago

I'm curious to see how languages with support for rational types would handle this problem. (E.g. Scheme, Clojure, Haskell) Seems to me that they would eliminate the round-off problem entirely for a great number of commonly-faced applications in the finance world. (So easy to use `x/100`.)
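Python is not on that list, but its standard `fractions` module gives a feel for the idea. The sketch below keeps amounts as exact rationals and rounds only once, at the end; the helper name and the half-up rounding rule are assumptions chosen for illustration.

```python
import math
from fractions import Fraction


def to_cents_half_up(amount: Fraction) -> int:
    """Round an exact, non-negative amount to whole cents, half up."""
    return math.floor(amount * 100 + Fraction(1, 2))


price = Fraction(2150, 100)   # $21.50, held exactly
rate = Fraction(1, 100)       # 1%
tax = price * rate            # exactly 43/200; no round-off so far
tax_cents = to_cents_half_up(tax)

# Thirds are exact too, which neither binary floats nor decimals can offer.
one = Fraction(1, 3) * 3
```

The trade-off is that numerators and denominators can grow without bound in long calculations, and a rounding rule still has to be chosen whenever a result is turned back into money.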


zengid almost 7 years ago

I'm doing an internship at a logistics company, and we're building projects in .NET. They're keeping us away from the Cobol, but part of me really wants to learn it. There are a lot of Senior Devs who only work in Cobol, and they're getting close to retiring. I feel like it might be a good trade to be an expert in Cobol and .NET, since most enterprises are going to have some of both.

Any Cobol freelancers hanging out here? What are your job prospects like? Are you filthy rich?


le-mark almost 7 years ago

Good article, very informative. You used to hear the "can't do the math" excuse a lot around y2k with respect to migrating these systems to something (anything?) else. You don't hear it as much nowadays. I think a lot of that boiled down to properly handling rounding, which is nontrivial as other posters have mentioned.

A lot of commenters here are missing the point about decimal library support. The point is not that language x does or doesn't have some sort of support for decimal math, the point is it's not native support. Even if language x supports a decimal type, backed by a byte[] (for example), there is still function/method call overhead for basic operations (+-/*). For the high volume stuff she's talking about, it adds up, fast.

I think there are a few languages with similar native decimal support; Ada and PL/1 iirc.

Cobol is an albatross that's going to be with us for a loooong time to come.


kwccoin almost 7 years ago

There is confusion in many of the comments.

COBOL can be used for online systems, and performance-wise even a small mainframe with 16 MB (M not G) (PC on P5!) can easily support 1000 users. Whilst it is totally dated, no one supports it and IBM charges a lot, none of that is related to COBOL, or CICS or IMS ...

For the floating point part ... we are not using those. Most of the attention goes to handling and agreeing upon how to do exact calculations, including remainders. No rounding per se in the system. Nothing is lost. Not even one cent. Hence, floating point ...

And change it to C, Ada etc.? It is an English-like programming language. The hard part is to translate and test. And to explain the use of decimal as an exact number. ... Lots of projects have successfully migrated. But unless you get that right (COBOL is not slow, and it is exact number computation with in-depth business know-how), good luck.

sampo almost 7 years ago

# Short version

The Muller’s Recurrence is a mathematical problem that will converge to 5 only with precise arithmetic. This has nothing to do with COBOL and nothing to do with floating and fixed point arithmetic as such. The more precision your arithmetic has, the closer you get to 5 before departing and eventually converging to 100. Python's fixed point package has 28 decimal digits of precision by default, whereas normal 64bit floating point has about 16 decimal digits. If you increase your precision, you can linger longer near 5, but eventually you will diverge and then converge to 100.

# Long version

What's going on here is that someone has tried to solve the roots of the polynomial

<pre><code> x^3 - 108 x^2 + 815 x - 1500, </code></pre>

which is equal to

<pre><code> (x - 3)(x - 5)(x - 100). </code></pre>

So the roots are 3, 5 and 100. We can derive a two-point iteration method by

<pre><code> x^3 = 108 x^2 - 815 x + 1500
 x^2 = 108 x - 815 + 1500/z
 x   = 108 - (815 - 1500/z)/y
</code></pre>

where y = x_{n-1} and z = x_{n-2}. But at this point, we don't know yet whether this method will converge, and if yes, to which roots.

This iteration method can be seen as a map F from R^2 to R^2:

<pre><code> F(y,z) = (108 - (815 - 1500/z)/y, y). </code></pre>

The roots of the polynomial are 3, 5 and 100, so we know that this map F has fixed points (3,3), (5,5) and (100,100). Looking at the derivative of F (meaning the Jacobian matrix) we can see that the eigenvalues of the Jacobian at these fixed points are, respectively, 100/3 and 5/3, 20 and 3/5, 1/20 and 3/100.

So (3,3) is a repulsive fixed point (both eigenvalues > 1), any small deviation from this fixed point will be amplified when the map F is applied iteratively. (100,100) is an attracting fixed point (both eigenvalues < 1). And (5,5) has one eigenvalue much larger than 1, and one slightly less than 1. So this fixed point is attracting only when approached from a specific direction.

Kahan [1, page 3] outlines a method to find sequences that converge to 5. We can choose beta and gamma freely in his method (Kahan has different values for the coefficients of the polynomial, though) and with lots of algebra (took me 2 pages with pen and paper) we can eliminate the beta and gamma and get to the bottom of it. What it comes down to, is that for any 3 < z < 5, choose y = 8 - 15/z, and this pair z,y will start a sequence that converges to 5. But only if you have precise arithmetic with no rounding errors.

For the big picture, we have this map F, you can try to plot a 2D vector field of F or rather F(x,y) - (x,y) to see the steps. Almost any point in the space will start a trajectory that will converge to (100,100), except that (3,3) and (5,5) are fixed points themselves, and then there is this peculiar small segment of a curve from (3,3) to (5,5): if we start exactly on that curve and use exact arithmetic, we converge to (5,5).

Now that we understand the mathematics, we can conclude:

Any iteration with only finite precision will, at every step, accrue rounding errors and step by step end up further and further away from the mathematical curve, inevitably leading to finally converging to (100,100). Using higher precision arithmetic, we can initially get the semblance of closing in to (5,5), but eventually we will reach the limit of our precision, and further steps will take us away from (5,5) and then converge to (100,100).

The blog post is maybe a little misleading. This has nothing to do with COBOL and nothing to do with fixed point arithmetic. It just happens that by default Python's Decimal package has more precision (28 decimal places) than 64bit floating point (53 binary places, so around 16 decimals). Any iteration, any finite precision no matter how much, run it long enough and it will eventually diverge away from 5 and then converge to 100.

Specifically, if you were to choose floating point arithmetic that uses higher precision than the fixed point arithmetic, then the floating point would "outperform" the fixed point, in the sense of closing in nearer to 5 before going astray.

[1] <a href="https://people.eecs.berkeley.edu/~wkahan/Math128/M128Bsoln09Feb04.pdf" rel="nofollow">https://people.eecs.berkeley.edu/~wkahan/Math128/M128Bsoln09...</a>
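A small Python sketch of the recurrence makes the contrast visible. The starting pair z = 4, y = 8 - 15/4 = 4.25 follows the rule given above; the function name and the step count of 30 are arbitrary choices for illustration.

```python
from fractions import Fraction


def muller(x0, x1, steps):
    """Iterate x_n = 108 - (815 - 1500 / x_{n-2}) / x_{n-1}."""
    prev, cur = x0, x1
    for _ in range(steps):
        prev, cur = cur, 108 - (815 - 1500 / prev) / cur
    return cur


# Exact rational arithmetic stays on the curve and keeps closing in on 5.
exact = muller(Fraction(4), Fraction(17, 4), 30)

# 64-bit floats pick up rounding error, drift off the curve and end up at 100.
approx = muller(4.0, 4.25, 30)
```

The same function also accepts `decimal.Decimal` arguments. With more digits the drift away from 5 should simply set in later, which is the point made above.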


sanxiyn almost 7 years ago

Delphi has Currency type, and I heard that it is largely responsible for Delphi's hold on financial software.

codeisawesome almost 7 years ago

Wow. I didn't even finish reading the whole post, but just the beginnings of the post gave me a much better functioning intuition about how floating point works! Thank you Marianne!

incompatible almost 7 years ago

If it's decimal currency, why not process amounts as integer cents?


tzs almost 7 years ago

I recently went through our Perl, Python, PHP, and JavaScript code making sure sales tax and VAT calculations were right, particularly when the sale amount and tax rate were both in floating point (damn legacy code...). During the course of this, I found some good test cases. They are given below.

Let R = the tax rate x 10000, in a jurisdiction where the tax rate is an integral multiple of 0.0001. Note that R is an integer.

Let A = the sale amount x 100, in a jurisdiction where prices are an integral multiple of 0.01. I.e., in the US, A is the sale amount in cents. Note that A is an integer.

Let T = the tax x 100, or in US terms, the tax in cents. Note that T is an integer.

If you can arrange to keep your amounts in cents and your rates in the x 10000 form (or whatever is appropriate for the jurisdiction), then you only need integers and things are simple:

<pre><code>  def tax(A, R):
      # A is in cents, R is the rate x 10000; adding 5000 rounds half up
      T = (A * R + 5000)//10000
      return T
</code></pre>

You probably have to go from cents to dollars somewhere, such as when informing the user of prices, taxes, and totals. I believe that integer/100, rounded to 2 places in all cases and printed in all of the above languages will be correct, but I kind of cheated and for display results I treated it as a string manipulation problem, not a number problem (which also takes care of making sure amounts less than $1 have a leading 0, and that multiples of 0.1 have a trailing zero) [1].

If you don't have the amount and rate in the nice integer forms above, but rather have them in floating point such as you get from parsing a string like 12.34 (for a price of $12.34) or 0.086 (for a tax of 8.6%), here are three functions to return the tax in cents that might seem reasonable, and you might think are properly handling rounding:

<pre><code>  def tax_f1(amt, rate):
      tax = round(amt * rate,2)
      return round(tax * 100)

  def tax_f2(amt, rate):
      return round(amt*rate*100)

  def tax_f3(amt, rate):
      return round(amt*rate*100+.5)
</code></pre>

Alas, they are all flawed.

<pre><code>  input           f1   f2   f3
  -------------  ---  ---  ---
   1% of $21.50   21   22   22  ( 22 is right)
   3% of $21.50   65   64   65  ( 65 is right)
   6% of $21.50  129  129  130  (129 is right)
  10% of $21.15  211  211  212  (212 is right)
</code></pre>

It does work to convert from floating point to the x 100 and x 10000 integer form, and then use the integer function given earlier:

<pre><code>  def tax_f4(amt, rate):
      amt = round(amt * 100)
      rate = round(rate * 10000)
      tax = (amt * rate + 5000)//10000
      return tax

  def tax_f5(amt, rate):
      amt = int(amt * 100 + .5)
      rate = int(rate * 10000 + .5)
      tax = (amt * rate + 5000)//10000
      return tax
</code></pre>

Both of those are right in the test cases above, and I believe in all other cases (well, all other cases where everything is positive...). For Python I've done brute force testing of all combinations of amount = 0.01 to 25.00 in steps of 0.01 and rate = 0.0001 to 1.0000 in steps of 0.0001 to verify that.

I've also done a brute force C test that involved sscanf(..., "%lf",...) of strings of the form "0.ddd...ddd" where the 'd' are decimal digits, and there are up to 9 of them. In all cases multiplying the resulting double by 10^k, where k is the number of digits after the decimal point, and calling round() on that gave the correct integer. Assuming that Python, PHP, etc., are using IEEE 754 when they do floating point, the results should be the same in all of those, which is why I believe that tax_f4 and tax_f5 should work for all cases, not just the ones I actually tested in the Python brute force test.

I did another C test, over the same range as the sscanf test, to verify that given an integer I, in the range [0, 10^k] for positive k up to 9, if you compute (double)I/10^k, then multiply that by 10^k and round(), you get back I.

My conclusions (assuming IEEE 754 or something that behaves similarly):

1. It is OK to store money values and tax rates in floating point, at least as long as you have 9 or fewer digits after the decimal point. Just avoid doing calculations in this form.
2. Converting from a floating point representation to an integer x 10^k representation by doing a floating point multiply by 10^k and a round to nearest integer works, at least as long as k <= 9.
3. sscanf '%lf', and I expect most reasonable string to float parsers, if applied to a floating point number of the form 0.ddd... with up to 9 digits after the decimal point, will work as expected, in the sense that they will give you a floating point number that when converted to integer x 10^k representation as described in #2 will give you the integer you expect and want.
4. I did not do any tests of floating point amounts that had large integer parts. With a large enough integer part, the places where I mention k <= 9 above might need to have that 9 lowered.

[1] e.g., in PHP:

<pre><code>  function cents_to_dollars($cents)
  {
      $cents = strval($cents);
      while (strlen($cents) < 3)
          $cents = '0' . $cents;
      return substr($cents,0,strlen($cents)-2) . '.' . substr($cents,-2);
  }
</code></pre>
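For comparison, here is a minimal sketch of the same calculation done with Python's standard `decimal` module, taking the amount and rate as strings so that no binary floating point is involved. The function name is hypothetical, and half-up rounding is an assumption chosen to match the "right" column in the table above.

```python
from decimal import Decimal, ROUND_HALF_UP


def tax_cents_decimal(amount: str, rate: str) -> int:
    """Return the tax in cents for a decimal amount and rate given as strings."""
    tax = Decimal(amount) * Decimal(rate)   # exact decimal product
    cents = (tax * 100).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return int(cents)
```

With the four test cases from the table, `tax_cents_decimal("21.50", "0.01")` gives 22, and the 3%, 6% and 10% cases give 65, 129 and 212. Like the integer version, this only settles the arithmetic; which rounding rule a given jurisdiction requires is a separate question.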

dwheeler almost 7 years ago

Ada also has decimal fixed point built in, without the overheads the author is worried about. But I agree with the author that built-in, easy and efficient support of decimal arithmetic is not so common.

wedesoft almost 7 years ago

Many languages have rational numbers (fractions) and big integers in their stack. This has the advantage that one has full control over where and when to trade off performance against accuracy.

stevew20 almost 7 years ago

You really should write a new opening line, Marianne; fractions are legit, and tons of people like them. Like most people who can do simple division without a calculator... Starting off with "no one likes fractions" is not a way to make friends or impress people.

nitwit005 almost 7 years ago

It's not exactly difficult to just write your own fixed point library in Java. I'm sure it'd be trickier to get exactly the same behavior as the old code, but it still doesn't seem like it should be a huge time sink on a large project.

eyphka almost 7 years ago

Have found similar to be true in my experience with COBOL, and older banking systems that rely on COBOL.

krick almost 7 years ago

Does Rust have solid support for fixed-point decimals, BTW? Would be quite a selling point, I suppose.


crb002 almost 7 years ago

Time for COBOL to get its Elixir.

dmead almost 7 years ago

Marianne, I'd like to chat with you about this article not in public. What's the easiest way to do that?

flossball almost 7 years ago

Isn't this more that floating point is not the solution for everything, and that most 'computer scientists' have little clue? Posits for the win (within a larger range of win at least)!