# Intro and Problem Statement

Hello everyone, I thought it would be ideal to start with Project Euler Problem #16 as a follow up to my first post on my Arbitrary Precision Calculator, PowArray. If you haven’t yet checked out that first post yet, I invite you to do so! I will be posting a lot of the code that went into the making of PowArray such as my solutions to the problem I will be discussing in this post and PE #20. ** You are free to use all the code posted in this section in your own projects provided that you provide appropriate attribution as per the Creative Commons Attribution 4.0 International license. See the “About This Site” section for more details.** While I haven’t made the PowArray source code open source, a lot of the code that I will be sharing will contain some of the key pieces which would allow one to recreate an Arbitrary Precision Calculator functionally identical to mine.

Moving on, let me introduce Project Euler problem # 16:

Power digit sum2

^{15}= 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26.What is the sum of the digits of the number 2

^{1000}?

# The Approach

I’ll spare you the details of my initial approach, and leave you to the “About This Site” section if you are curious. However, I will describe as best as I can the approach which led me to the correct answer. Hopefully I’ve stated enough times on this site that I would never be able to settle with the trivial and lazy “one liner library” approach, as the purpose of Project Euler is not to teach one how to use libraries and functions which others have created, but to pique one’s curiosity to try and understand the mathematics and underlying concepts behind a problem. Thus I refrained from using Java’s very convenient BigInteger library and decided to construct my own from scratch. A nice challenge indeed. Now folks, here’s my attempt at explaining how I was able to solve this problem.

# The Solution in a Nutshell

So the idea here was basically to first take in some user input consisting of two ints, one for the base and one for the exponent (I wanted to test the algorithm for any arbitrary base and exponent at the time). The next step would be to calculate the number of digits in the result and create an array of ints (or any integer primitive data type. The ideal primitive would be byte, since you would only need the numbers from 1-9 in each cell. Moreover, this would consume the least memory) with the same length as the number of digits to be expected in the final result. After instantiating the array, a value of one would be assigned to the last cell in the array. Now here’s where the tricky part comes, implementing your own BigInteger operations. So in this particular scenario, all you would need is a simple single digit multiplication algorithm that treated the array and its cells as a whole unit (each cell consists of one digit, read from left to right, just like on paper). Essentially, this would involve having carryover and product variables which would store the result of a a multiplication between the base and a single digit (one cell in the array) as well as the carryover to the next multiplication. The individual operations would involve arithmetic multiplications and taking division remainders. After performing the exponentiation, the next step in the process would be to find the sum of the elements in the array and print the answer. This would be one of the easiest steps in the process, as it could be done trivially with a for loop and a counter variable.

# Calculating the Number of Digits in *x*^{n}

^{n}

So let’s start off with the calculation of the number of digits in the exponentiation. The derivation for the formula that will be used is as follows:

# Getting User Input and Translating Math into Java

Now we can start talking Java. Putting together the first bit, the program asks for user input (base x and exponent n), then calculates the number of digits in *x ^{n}*. Next, the array holding the product is declared and is instantiated with 1 in its last index. Then we declare our digit product and carryover variables. We will also need something to keep track of how many zeroes are scanned in the calculation loop with a counter variable to avoid scanning the entire array at each iteration, as there will be many zeroes to sift through. Finally we need our index variable which will keep track of the current index of the array the program is operating on.

private static Scanner in; public static void main(String[] args) { // ---GET BASE AND EXPONENT FROM SCANNER INPUT--- System.out.println("Enter a positive integer base of an exponent to calculate (from one to ten)"); System.out.println(); in = new Scanner(System.in); int x = in.nextInt(); System.out.println(); System.out.println("Enter a positive power of " + x + " to calculate"); System.out.println(); double n = in.nextDouble(); double digits; // ---CALCULATION FOR NUMBER OF DIGITS--- digits = Math.floor(n * Math.log10(x) + 1); // ---VARIABLE DECLARATION FOR EXPONENTIATION ALGORITHM--- int[] pow = new int[(int) digits]; pow[(int) digits - 1] = 1; int digprod = 0; int carryover = 0; int zerocount = 0; int i = (int) digits - 1; }

# The Exponentiation Algorithm

Next comes the real deal, the multiplication algorithm loop which will carry out the exponentiation. Once it is finished, the answer will be stored in our int array. Unfortunately if I were to explain every single line of code in this snippet, this blog post would become a novella. Instead, I’ve annotated this section with brief comments.

// ---EXPONENTIATION ALGORITHM FOR CALCULATING X^N--- // loop executes n (exponent) number of times for (int a = 0; a < n; a++) { // Check if 100 leading zeroes have been counted and if the // index variable has reached the // beginning of the array. If both are true, perform the next iteration while (zerocount < 100 && i >= 0) { // Get the product of the base and the digit // stored in the current index. digprod = pow[i] * x; // Get the mod 10 of the product and add it to carryover. // Take the mod 10 of the whole expression. // This will ensure that no matter how large the product is // (it will always be between 0 - 81), only a single digit // will be stored in each cell. pow[i] = (digprod % 10 + carryover) % 10; // The carryover will simply be the tens digit of the product plus // the previous carryover. carryover = (digprod + carryover) / 10; // Check if all that remains in the array are leading zeroes. // Zerocount will increment by 1 every time 4 consecutive // zeroes are found. This only applies if the base is not equal // to ten. if (i > 2 && pow[i] == 0 && pow[i - 1] == 0 && pow[i - 2] == 0 && pow[i - 3] == 0 && x != 10) { zerocount++; } // Shift the array index down by one for the next iteration. i--; } // Reset the values of the carryover, zerocount and the index variable // for the subsequent multiplication. carryover = 0; zerocount = 0; i = (int) digits - 1; }

# Icing the Cake

Finally, we write our little summation loop and print out the result. I’ve included a little extra code which will count the number of occurrences of each digit in the answer as well.

// ---CALCULATE SUM OF DIGITS--- public static int sum(int[] a) { int summation = 0; for (int q = 0; q < a.length; q++) { summation += a[q]; } return summation; } // ---PRINT OUTPUT - THE EXPONENTIATION RESULT, THE SUM OF THE DIGITS, // THE NUMBER OF DIGITS AND THE OCCURRENCES OF EACH DIGIT--- System.out.println(); System.out.println(x + " to the power of " + (int) n + " is: "); System.out.println(); int charcount = 0; int dignums[] = new int[10]; for (int c = 0; c < pow.length; c++) { System.out.print(pow[c]); charcount++; if (charcount == 180) { System.out.println(); charcount = 0; } if (pow[c] >= 0 && pow[c] <= 9) dignums[pow[c]]++; } System.out.println(); System.out.println(); System.out.println("The number of digits in the result is: " + pow.length); System.out.println(); System.out.println("The sum of the digits of the result is: " + sum(pow)); System.out.println(); System.out.println("# Zeroes: " + dignums[0]); System.out.println("# Ones: " + dignums[1]); System.out.println("# Twos: " + dignums[2]); System.out.println("# Threes: " + dignums[3]); System.out.println("# Fours: " + dignums[4]); System.out.println("# Fives: " + dignums[5]); System.out.println("# Sixes: " + dignums[6]); System.out.println("# Sevens: " + dignums[7]); System.out.println("# Eights: " + dignums[8]); System.out.println("# Nines: " + dignums[9]);

# Piecing Together the Solution

Putting it all together, we have:

package projectEuler; import java.util.Scanner; public class PowerDigitSum { private static Scanner in; // ---CALCULATE SUM OF DIGITS--- public static int sum(int[] a) { int summation = 0; for (int q = 0; q < a.length; q++) { summation += a[q]; } return summation; } public static void main(String[] args) { // ---GET BASE AND EXPONENT FROM SCANNER INPUT--- System.out .println("Enter a positive integer base of an exponent to calculate (from one to ten)"); System.out.println(); in = new Scanner(System.in); int x = in.nextInt(); System.out.println(); System.out.println("Enter a positive power of " + x + " to calculate"); System.out.println(); double n = in.nextDouble(); double digits; // ---CALCULATION FOR NUMBER OF DIGITS--- digits = Math.floor(n * Math.log10(x) + 1); // ---VARIABLE DECLARATION FOR EXPONENTIATION ALGORITHM--- int[] pow = new int[(int) digits]; pow[(int) digits - 1] = 1; int digprod = 0; int carryover = 0; int zerocount = 0; int i = (int) digits - 1; // ---EXPONENTIATION ALGORITHM FOR CALCULATING X^N--- for (int a = 0; a < n; a++) { while (zerocount < 100 && i >= 0) { digprod = pow[i] * x; pow[i] = (digprod % 10 + carryover) % 10; carryover = (digprod + carryover) / 10; if (i > 2 && pow[i] == 0 && pow[i - 1] == 0 && pow[i - 2] == 0 && pow[i - 3] == 0 && x != 10) { zerocount++; } i--; } carryover = 0; zerocount = 0; i = (int) digits - 1; } // ---PRINT OUTPUT - THE EXPONENTIATION RESULT, THE SUM OF THE DIGITS // THE NUMBER OF DIGITS AND THE OCCURRENCES OF EACH DIGIT--- System.out.println(); System.out.println(x + " to the power of " + (int) n + " is: "); System.out.println(); int charcount = 0; int dignums[] = new int[10]; for (int c = 0; c < pow.length; c++) { System.out.print(pow[c]); charcount++; if (charcount == 180) { System.out.println(); charcount = 0; } if (pow[c] >= 0 && pow[c] <= 9) dignums[pow[c]]++; } System.out.println(); System.out.println(); System.out.println("The number of digits in the result is: " + pow.length); System.out.println(); System.out.println("The sum of the digits of the result is: " + sum(pow)); System.out.println(); System.out.println("# Zeroes: " + dignums[0]); System.out.println("# Ones: " + dignums[1]); System.out.println("# Twos: " + dignums[2]); System.out.println("# Threes: " + dignums[3]); System.out.println("# Fours: " + dignums[4]); System.out.println("# Fives: " + dignums[5]); System.out.println("# Sixes: " + dignums[6]); System.out.println("# Sevens: " + dignums[7]); System.out.println("# Eights: " + dignums[8]); System.out.println("# Nines: " + dignums[9]); } }

# Runtime Complexity

My attempt at evaluating the complexity of the algorithm for calculating the n^{th} power of x: The complexity is in polynomial time and is: , as the nested while loop within the for loop executes until there are 100 leading zeros found after the last digit has been multiplied. Therefore the complexity of the nested while loop is approximately . The complexity of the for loop is , since it executes n times. I plan on improving this algorithm in the near future and implementing exponentiation by squaring or binary splitting to reduce the complexity significantly.

# Wrapping Up

In summary, this program should take in user input, namely a base from 1 to 10 and an exponent (less than 2^{31}-1), perform the exponentiation and return the answer including the sum of the digits and the occurrences of each digit. I hope you enjoyed this post!

OK, nice.

So far π

Working with decimal number safes you the hassle of base conversion, admitted, but it is extremely slow if you use only one digit at time. With 32 bit signed integers you can use any power of 10 lower than 2^15 without bit-juggling. The first such power is 4 and it is 9 for 64 bit respectively.

It is a bit more work to extract the individual digits but not much and if you use the smaller base (10^4) you can even make a table with ten thousand entries such that you can get the digit-sum of the individual big-digits in O(1).

I would also part the multiplication from the exponentiation.

The exponentiation of x^n as is needs O(n) multiplications. It can be done with O(log n) multiplications by way of multiplication by squaring.

Pseudocode (assuming WordPress’ bbcode works in comments, too) for a recursive version:

See also the Wikipedia entry for this method which has more pseudocode and talks a lot about the different variations of it.

BTW: squaring can be done faster than multiplying because you have a lot of similar small products. It is faster only by a constant but it gets quite significant for larger numbers!

And while you are at it: why not implement a complete BigInteger library? You may take a look at Tom St Denis’ book which is freely available[1] on the net, just search for a file named tommath.pdf. It has all the necessary things in it to build one. Some are amiss, like FFT multiplication, fast division, logarithms, and roots (it is meant as a basis for cryptography, no need for such things in crypto) but it is a very good start; highly recommended!

Another thing I saw: you are using only one data type for your integers, so the whole bunch of

can be cut down to the single line

It might be good style to check if the integer at

`pow[c]`

is inside the range [0-9].“Will never happen! Never!” is one of the most frequently used last words of programmers! π

[1] it has been put by the author into Public Domain explicitly.

LikeLike

Hi Cristoph,

Thank you very much for your reply.

Agreed. What exactly do you mean by using one digit at a time? Are you referring to the fact that my algorithm performs the calculation by operating on a single index in the array at any given time? Is there a way to do this using multithreading or is there an even simpler way?

OK, I’m a little confused here. Why are you referring to 2

^{15}as an upper bound? In Java the max value of a signed int is 2^{31}-1. So do you mean take the highest power of 10 below the max value of a signed int/long? So essentially what you’re saying is to work in a larger base system? I thought of this earlier, and this is something I might do in the future.Great idea, I will consider implementing this. Just out of curiosity, why stop at squaring? Why not extend to cubing, or any power x? π

Hi,

(Thsi is the comment I had problems with, so let’s see if it gets published twice)

The inner loop computes only one decimal digit per round.

Yes, you can use multiple threads to do it but the overhead is way too large. It is more profitable to keep as much data as possible in the L1 cache to speed up the multiplication of native integers. That means that it is better to make the individual limbs (that’s how the big-digits get called in big integer libraries) as big as possible. It is quite common to use the largest native integer for that task.

Yes.

As said above it is common to use the largest native integer for it but that needs a base conversion for the output and some bit-juggling for multiplying because the magnitude of the result is obviously larger than the biggest native integer.

A way to avoid the conversion is to use base 10 limbs and a way to avoid the bit-juggling is to use only half of the maximum such that the result of a multiplication is always smaller than the native integer.

Hence my suggestion to use 10^4 for 32 bit native integers (2 * 4 = 8 and the largest possible power of ten is 9 for 32 bit) and 10^9 for 64 bit. Java has 64 bit integers (I don’t know how they handle it on 32 bit machines and how fast/slow that is) so I would propose to use 32-bit integers for the limbs, 10^9 as the decimal size, and 64 bit integers for the intermediate results.

One large disadvantage: all of the descriptions of algorithms for arbitrary precision assume binary bases, so you have to work it out all on your own. Not the worst thing to do for a student but long division will get a bit tricky to get right and sufficiently fast. Feel free to ask if you get stuck.

Not any power of x, it must be a positive integer larger than one.

You stop at squaring because the overhead of cubing or more would be too large. But nobody keeps you from giving it a try. Modern hardware might support it in some way I do not know. Have fun digging through thousands of pages of opcode description for the myriads of modern CPUs (http://ref.x86asm.net/ might be a start andhttp://www.agner.org/optimize/instruction_tables.pdf has a list of latencies,http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html has all the opcodes for Intel cpus and http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/ does the same for AMD) and GPUs (somewhere at http://developer.amd.com/ for AMD (e.g.: http://developer.amd.com/wordpress/media/2012/10/R700-Family_Instruction_Set_Architecture.pdf but there are most probably newer ones) http://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#instruction-set-ref fro NVIDIA and so on). It’s a wide subject.

I did it a couple of years ago to see if I can use the then new 128 bit SSE registers to compare two MD5 sums (yes, a

longtime ago π ) for equality.Doing it the old way was faster btw. The md5 sum consisted of four 32 bit integers in a row and the probability that the first integers of two such structures differ is already very high. The 128 bit were also not really 128 bit because the data bus was still 64 bit wide at most and the xmm registers were floating point registers. It was quite a waste of time but interesting nevertheless.

A quick search for an already published paper was not successful so your paper might get published in a well-known journal if you found a way to do it.

Your judgement is correct and it was of course the fault of Signore Copypasta, as always! π

But don’t ask me how it got there and in that form, I honestly cannot imagine.

What the…?

Oh, my… *sigh*

Good luck with that, it’s hard these days.

But I think you mean the school internal journal you mentioned once?

LikeLike

Yes, as it turns out, Akismet automatically flagged your two comments as spam as it contained a tad too many links. π

Anyways, I’ve changed those parameters so that the issue shouldn’t be as frequent.

LikeLike

Just deleted the duplicate.

LikeLike

OK good, so this is what I thought you meant. Quite a good point indeed.

Yeah, it’s my school’s journal. It is considered a research journal for experimental science, so it is worth a shot. Not sure how strict the procedure is and what the formalities are, but I’ve browsed through a few publications and it seems there are none on the subject of computer science. They mostly contain articles regarding medicine, biology, chemistry, and even mathematical proofs (I would’ve never thought!). I think they’ll be open to the idea. π

So you mention the term “limbs” quite often, but what exactly is a limb in this context? Could you point me to a resource explaining this concept? I have my own mental idea of what you’re saying and this is what I’m thinking:

Instantiate a two-dimensional array, which will be an array of 32 bit integer limbs, with each limb holding a maximum possible value of 10

^{9}. But then where I’m confused is, how can you operate on all limbs at once? Once you do perform the multiplications on each of the limbs, what do you do when the maximum value is reached, and you can’t multiply any further? Perhaps I’m not thinking on the right track with this whole “limbs” concept.LikeLike

Hi,

Probably not as strict regarding the themes (I don’t think they expect original work, just something outside of the standard curriculum) but I hope that the procedures and formalities do not differ much.

Yes.

*points to himself*

But serious: your are already quite on it:

It is actually a one-dimensional array (zero dimensions make a point, one dimension is a line, two dimensions is a plane, three dimensions is a body, four dimensions include the time that body lasts, five dimensions look a bit like mice in the three dimensional projections…but I digress π ) but otherwise correct, yes.

“limb” is just an expression for “single digit of the base x”.

You don’t. You work with one after the other. You can slice it up and give every CPU its own slice to work with but that is normally not very useful, because it is hard to implement, the overhead is just to large and there are better methods to speed up multiplication which are also better suited for multi-threading.

That’s called a carry.

I guess you are thinking way too complicated?

The concept is very simple, just like the form of long multiplication you learned in primary school with the single difference that the digits are larger. That’s all. (and actually wrong, because you learned something different in primary school, you applied a shortcut but nobody told you that it is a shortcut)

Digits are the numbers from zero to nine in a number based on the number ten. Base ten is quite common these days but in some under-developed countries you may find a measurement system with base 12 (the foot and inch system used in your southern neighbourhood, y’know? π ).

And for multiplying numbers that consist of several digits you multiply two digits and when the result x is larger than 10 you slice the result r = x%10 and c = floor(x/10) where r is the rest and c is the carry.

The only difference between digits and limbs is the size.

You can take the above equations and replace the “10” with “10^9” and you have what you need, so in general with the number, a positive integer the base, the carry and the rest.

Another difference is that the carry can be larger than one.

And you have to manage the memory yourself, so don’t forget that the magnitude of every multiplication is up to twice as large as the individual factors (e.g.: 9*9 = 18, 99*99 = 9801 and so on.), so if the product of two of the largest possible limbs does not fit into the same data type as the limb you need a bigger one. For example: you need a 64 bit large data type to hold the product of two 32 bit large numbers. You don’t need to but avoiding it needs some sophisticated bit-juggling. It is not

thatcomplicated but nothing I would recommend for a start and if you keep it modular you can replace it later without a problem, it will be only at one place (actually two places, with the other one being squaring). if you want to try it nevertheless: take both 32-bit values as two digit numbers (low and high) and do long multiplication with them.Assuming 32 bit integers a and b and written for legibility.

Caveat: the following pseudocode might contain typographical and other errors but does not contain any checks and balances, please control before use.

Obvious problem: that is for binary based numbers not for decimal numbers but the difference is only superfluous.

Bit shift right for binary numbers is the same as dividing (dropping the rest) by a power of ten for decimal numbers (the difference between “>>” and “>>>” in Java and JavaScript is between signed and unsigned shifting).

Bit shift left is the same as multiplying with a power of ten

Oring (the vertical bar “|”) is the same as adding.

But as said at the start: keep it simple and just multiply and use a larger data type as the intermediate result.

Ah, found a rather good (YMMV, of course!) explanation of the squaring algorithm here: http://www.cygnus-software.com/docs/html/64bitcalcs.htm

Another hint if you want to implement a full big-integer library, although it’s one of the YMMV tips:

You do all computation on unsigned big-integers and handle the sign apart. In my JavaScript implementation I used a JavaScript object containing the array, the length of the array (can differ from the actual length) and the sign. You can put it all in one large array (length and sign need only one integer) but that makes it unnecessary complicated and the overhead of a full object is negligible in contrast to the many operations the array inside of the object gets treated with.

LikeLike