### Building a polynomial out of its roots

Building a polynomial out of its roots is not particularly hard, one just multiplies the corresponding linear/first degree polynomials. But this isnt fast, even doing it recursively and using FFT based multiplication isnt all that great its still O(nlog^{2}n). We can under some circumstances get rid of one of these log(n) factors as ill describe in a moment below. But first i wonder if below is the best that can be done or if iam silly and theres a faster or simpler way?

The idea of this method is not to build up the polynomial coefficients but to build up a vector of polynomial evaluations at evenly spaced points. Doing this for a linear factor “prototype” like x-1 on m points costs us O(m). next we build a sparse vector that is 1 where its index matches a root (and 2 for double roots,…) otherwise its 0. Now we can almost build our evaluation of the final polynom by convolution, just that convolution adds its terms while we need the factors multiplied of course. The solution is simply to convert the evaluation of the “prototype” linear factor by elementwise log() before convolution and by exp() afterwards. The value for log(0) does not matter for us except for numerical stabilty, we have to after exp() reset all roots to 0 anyway. With log(0)=0 one gets first order derivatives at the roots though. Also one can implement this using clasic log/exp and a complex value fft or finite field log/exp with a real or finite field fft. The last step of turning the evaluation vector of our polynom into coefficients can be done with a finite field fft. This make the whole thing run in O(m logm) time for a field size of m.

Whats annoying on it is that the the first part works with samples evenly spaced (aka an additative subgroup) while the second, that is turning the evaluation into coefficients is on a multiplicative subgroup of a finite field. In practice that means while my roots are along a multiplicative subgroup of GF(2^{16}+1) i have to apply the rdft over the whole field. which is kinda feeling like a waste of cpu cycles

Suggestions to improve this are welcome. Also alternatively if one knows of a linear time method to zero pad in the frequency domain a block of size 2^{n} to twice its size than the resursive multiplication variant should also run in O(n logn) time.