Interleaving Computations With Iterators
Prime Number Factorization in Rust
Finding prime numbers and factorizing numbers into their prime components is one of my stock programs: Whenever I learn a new programming language, I try to solve the following problems:
- Given a number
n
, find all prime numbers in the interval[2;n]
.- Example: The prime numbers up to
20
are[2, 3, 5, 7, 11, 13, 17, 19]
.
- Example: The prime numbers up to
- Given a number
x
, factorizex
into its prime components.- Example: The number
234
can be factorized into[2, 3, 3, 13]
.
- Example: The number
The two problems are related: In order to find the prime factors of a number,
one needs to find the prime numbers that could possibly make up its factors. In
order to find the prime factors of x
, a naive approach would be to first find
all prime numbers up to x
—or up to the square root of x
as an optimization.
(See my explanation of the Trial Division
Method
for a rationale; I’ve written this paper when I implemented the same stock
program concurrently in Elixir.) So let’s do what Rustaceans do: re-implement
it in Rust!
Re-implementing factor
in Rust
The factor(1)
program, which probably is written in C—and therefore
compulsively needs to be rewritten in Rust—factorizes the given number:
$ factor 234
234: 2 3 3 13
Let’s start with a new Cargo project:
$ cargo new factor
Created binary (application) `factor` package
The Hello, World!
program shall be replaced with this boiler-plate code:
fn main() {
let mut args = std::env::args();
args.next().expect("skipping the executable name");
let argument = args.next().expect("number argument missing");
let number: u64 = argument.parse().expect("argument is not a number");
println!("{}", number);
}
This allows us to parse a single command line argument as a number. The program panics (i.e. exits) if no positive number was provided.
With this out of the way, let’s find the prime numbers up to number
!
Prime Sieve
The Prime Sieve of Eratosthenes finds prime numbers up to n
by only trying to
divide i
within [2;n]
for values of i
that are prime numbers themselves.
(Rationale: if a number is divisible by 4
, it is also divisible by 2
, and
therefore checking 2
is enough.)
Let’s implement this algorithm:
fn prime_sieve(n: u64) -> Vec<u64> {
let mut primes = Vec::new();
for i in 2..=n {
if !primes.iter().any(|p| i % p == 0) {
primes.push(i);
}
}
primes
}
The numbers from 2
(which is the smallest prime number) up to and including
n
are processed in a for
loop. For every number i
in the interval [2;n]
it is checked whether or not it is divisible by any prime number already found.
If it is not divisible, i
is a prime number, and thus added to the vector of
prime numbers.
A little testing won’t hurt:
mod tests {
use super::*;
#[test]
fn prime_sieve_up_to_20() {
assert_eq!(prime_sieve(20), vec![2, 3, 5, 7, 11, 13, 17, 19]);
}
}
Prime Factorization: Trial Division Method
Given a number n
to factorize, and having found the prime numbers up to n
(or up to the square root of n
, for that matter), we can implement the Trial
Division Method for prime factorization:
fn factorize(n: u64) -> Vec<u64> {
let n_sqrt = (n as f64).sqrt().ceil() as u64;
let primes = prime_sieve(n_sqrt);
let mut factors = Vec::new();
let mut n = n;
let mut i = 0;
while n > 1 {
let prime = primes[i];
if n % prime == 0 {
factors.push(prime);
n /= prime;
} else {
i += 1;
}
}
factors
}
First, the prime numbers up to the square root of n
are computed using
the prime_sieve
function just written.
Second, the trial division is repeated as long as the number n
(shadowed with
a mutable version) is bigger than 1, at which point the factorization is
finished. The prime numbers are picked one by one. As long as a prime number
divides n
without a remanider, the prime number is a prime factors and pushed
onto factors
as such, and n
becomes the remainder of dividing it by the
current prime number. Otherwise, the next prime number is tried, until a prime
number is reached that divides the remainder of n
to 1, at which point the
loop terminates.
This test case demonstrates that factors
works for n=234
:
#[test]
fn factorize_234() {
assert_eq!(factorize(234), vec![2, 3, 3, 13]);
}
However, the following test fails:
#[test]
fn factorize_39() {
assert_eq!(factorize(39), vec![3, 13]);
}
The algorithm runs out of prime numbers and fails to use the remainder 13
as a
prime factor:
thread 'tests::factorize_39' panicked at src/main.rs:58:27:
index out of bounds: the len is 4 but the index is 4
The logic could be slightly modified by using an Iterator
instead of the index
variable i
:
fn factorize(n: u64) -> Vec<u64> {
let n_sqrt = (n as f64).sqrt().ceil() as u64;
let primes = prime_sieve(n_sqrt);
let mut primes = primes.iter();
let mut factors = Vec::new();
let mut n = n;
let mut prime = match primes.next() {
Some(p) => p,
None => {
return vec![n];
}
};
while n > 1 {
if n % prime == 0 {
factors.push(*prime);
n /= prime;
} else {
prime = match primes.next() {
Some(p) => p,
None => {
factors.push(n);
break;
}
}
}
}
factors
}
The initial primes
vector is shadowed by a (mutable) iterator over those prime
numbers. Since our primes Iterator
returns an Option<u64>
instead of a plain
u64
, we make sure to handle the case of running out of prime numbers properly.
(In this case, the remainder n
is the last factor to be added.)
The code hasn’t gotten any shorter, quite the opposite, but the factorize
function no longer violates index bounds—and is now ready to make use of an
Iterator
of prime numbers, which will come in very handy soon…
Minible Viable factor
Let’s extend the main function to replicate the core functionality of factor
by making use of factorize
:
fn main() {
let mut args = std::env::args();
args.next().expect("skipping the executable name");
let argument = args.next().expect("number argument missing");
let number: u64 = argument.parse().expect("argument is not a number");
let factors = factorize(number);
let factors = factors
.iter()
.map(|x| x.to_string())
.collect::<Vec<String>>()
.join(" ");
println!("{}: {}", number, factors);
}
Our factor
implementation now can be used from the command line (discarding
compiler output):
$ cargo run -- 234 2>/dev/null
234: 2 3 3 13
Rewriting it in Rust should not lead to a performance penalty. However, our
re-implementation works visibly slower than factor
with big numbers (e.g. one
billion), even with the release
build profile:
$ time factor 1000000000
real 0m0.002s
user 0m0.000s
sys 0m0.002s
$ cargo build --release && time target/release/factor 1000000000
1000000000: 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5
real 0m0.024s
user 0m0.020s
sys 0m0.004s
Stripping down the main
function into a prime number finder might give a hint:
fn main() {
let mut args = std::env::args();
args.next().expect("skipping the executable name");
let argument = args.next().expect("number argument missing");
let number: u64 = argument.parse().expect("argument is not a number");
let n_sqrt = (number as f64).sqrt().ceil() as u64;
println!("{} prime numbers computed", prime_sieve(n_sqrt).len());
}
Computing the prime numbers takes up most of the time:
$ cargo build --release && time target/release/factor 1000000000
3401 prime numbers computed
real 0m0.027s
user 0m0.023s
sys 0m0.003s
However, only the two prime numbers 2
and 5
are actually used for this
particular input. This is a waste of computing power!
Let’s make it faster by being lazy.
Iterators are Rust’s (Lazy) Streams
Instead of first finding all the prime numbers that possibly will be used and only then starting the factorization process, the prime numbers could be just computed on demand, i.e. when the next one is actually needed. In functional programming languages such as Elixir, this is achieved using a Stream, which is a virtually endless collection that computes the next value as needed. (For details, see my Elixir implementation of the prime sieve.)
Rust has a similar concept: Iterators. An Iterator
has an internal state,
which is advanced as the next()
method is called on it. This returns an
Option
, i.e. either the next value, or None
if all the elements have been
consumed and the Iterator
is exhausted. For prime numbers, the latter case
won’t happen, at least not within the u64
value range.
So let’s implement an Iterator
for finding prime numbers!
An Iterator for Prime Numbers
First, we need a data structure that holds the internal state of the iteration.
For the prime sieve, this is the last number that has been checked for
primality (i
), and the prime numbers found so far (primes
):
struct PrimeIterator {
i: u64,
primes: Vec<u64>,
}
For convenience, a new()
associated method is provided, which initializes the
iteration at the number 2
, which is the first prime number:
impl PrimeIterator {
fn new() -> Self {
PrimeIterator {
i: 2,
primes: Vec::new(),
}
}
}
The vector of prime numbers is initially empty. (The current value of i
, which
is 2
, will be moved into primes
upon the first iteration.)
To implement the Iterator
trait, the data type of the Item
and the next()
method need to be defined:
impl Iterator for PrimeIterator {
type Item = u64;
fn next(&mut self) -> Option<u64> {
let mut i = self.i;
loop {
if !self.primes.iter().any(|p| i % p == 0) {
self.primes.push(i);
self.i = i + 1;
return Some(i);
}
i += 1;
}
}
}
We use u64
values, and next()
returns an Option<u64>
. First, the current
value of i
is grabbed from self
. Then, i
is checked for divisibility
against the prime numbers found so far. If no prime number divides it, a new
prime number is found, which is added to the primes
vector. The variable i
is advanced by one, so that the next iteration starts with the its successor.
The found prime number is returned.
This test case demonstrates how the PrimeIterator
can be used:
#[test]
fn prime_iterator() {
let mut iter = PrimeIterator::new();
assert_eq!(iter.next(), Some(2));
assert_eq!(iter.next(), Some(3));
assert_eq!(iter.next(), Some(5));
assert_eq!(iter.next(), Some(7));
assert_eq!(iter.next(), Some(11));
assert_eq!(iter.next(), Some(13));
assert_eq!(iter.next(), Some(17));
assert_eq!(iter.next(), Some(19));
}
It would be possible to define an upper limit n
for prime numbers, but it only
complicates matters and is therefore left as an exercise for the reader.
Factorizing Faster with PrimeIterator
In factorize
, we simply replace the Iterator
acquired from the returned
vector of prime_sieve
with a PrimeIterator
:
fn factorize(n: u64) -> Vec<u64> {
let mut primes = PrimeIterator::new();
let mut factors = Vec::new();
let mut n = n;
let mut prime = match primes.next() {
Some(p) => p,
None => {
return vec![n];
}
};
while n > 1 {
if n % prime == 0 {
factors.push(prime);
n /= prime;
} else {
prime = match primes.next() {
Some(p) => p,
None => {
factors.push(n);
return factors;
}
}
}
}
factors
}
Let’s retry our factor
re-implementation with the input from before, i.e. with
one billion:
$ cargo build --release && time target/release/factor 1000000000
1000000000: 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5
real 0m0.001s
user 0m0.001s
sys 0m0.000s
Notice how the speedup vanishes as other inputs require more prime numbers to be computed (using one billion and one):
$ cargo build --release && time target/release/factor 1000000001
1000000001: 7 11 13 19 52579
real 0m0.073s
user 0m0.069s
sys 0m0.004s
However, this is still fast enough, expecially compared to the timings yielded by the concurrent Elixir implementation.
Conclusion
Iterators allow us to interleave two stateful computations without mixing up their code. The concerns—finding prime numbers, factorizing a number into its prime components—remain separated and can be tested independently. The state of the iterator is only advanced as much as needed, which avoids unnecessary computations and, compared to the original computation in advance, speeds up the program considerably (depending on the input).
The final code can be found in my factor repository.