Taxman conquered

Presentation of results of Franklín & Moniot (2023), proving

likelihood that optimal Taxman play is NP-hard;
existence of a winning strategy that is efficient to compute;
relatively tight upper and lower bounds on the optimal score that are efficient to compute.

Proofs of results are omitted here. See the publication: Franklín, A. F. and Moniot R. K. The difficulty of beating the Taxman. Discrete Applied Mathematics, 339, 166-171 (2023).

The reason for the somewhat hubristic title of this page is that these outstanding questions about the taxman game have now been satisfactorily answered.

Taxman as a graph matching problem

The Taxman graph

Given a pot of numbers from 1 to N, create a graph in which the vertices are the numbers, and each number is connected by an edge to each of the numbers that divide it with a single prime factor divisor. Thus, for instance, 20 is connected by an edge to 4 (which divides it with a divisor 5) and 10 (which divides it with a factor 2), but not to 5 (whose divisor is 4, not a prime).

Define the rank of a number as the number of prime factors it has, counted with multiplicity. Thus 1 has rank 0 and all primes have rank 1. The number 20 = 2 × 2 × 5 has rank 3. In the Taxman graph, each edge connects vertices whose ranks differ by 1. For a pot of size N, the number of largest rank is of the form 2^k for the largest k such that 2^k ≤ N. The graph can be partitioned into sets of vertices having the same rank, with edges connecting only vertices in adjacent partitions having rank that differs by 1. We define a subgraph of rank r as a subset of the Taxman graph for which the vertices have rank r - 1 and r, for 1 ≤ r ≤ k, together with the edges connecting them.

A matching on any graph is a collection of edges such that no two edges in the matching have a vertex in common. This is also called an independent edge set. If the edges are assigned weights, the weight of the matching is the sum of the weights of the matched edges. For the Taxman graph, we define the weight of an edge as the value of its larger vertex.

Given a matching on a graph, an alternating path is a sequence of connected edges such that the path alternates between matched and unmatched edges. A cycle is a path that ends on the same vertex where it began.

Playing Taxman as a graph matching problem

Given a matching on a Taxman graph, we can construct a series of moves by the player, in which the player takes the larger (rank r) vertex of a matched edge, and the taxman takes the smaller (rank r - 1) vertex to which the vertex is matched, as well as any other divisors of that vertex. This method may not give a sequence of moves that allows all the matched edges to be used by the player. The condition for it to do so is given by the following theorem:

Theorem 1. A matching on a Taxman graph corresponds to a valid sequence of moves for a game if and only if there is no alternating cycle within any subgraph of a given rank. The player's score is the weight of the matching.

For convenience, we call such an alternating cycle, in which the vertices all have rank r - 1 or r for some r, a flat alternating cycle.

Playing Taxman optimally is probably NP-hard

Theorem 1 implies that the optimal score for a Taxman game is the maximum weight of a matching on the graph that is free of flat alternating cycles.

If we replace the rank function by a function that has only 2 ranks, 0 and 1, while retaining the rule that vertices only connect to vertices of rank that differs by 1, then the graph becomes bipartite, with edges only connecting vertices of rank 0 with vertices of rank 1. If we also replace the weight function by a weight of 1 for each edge, then the maximum-weight matching becomes a maximum-cardinality matching, i.e., a matching that includes the largest possible number of edges.

The problem of finding a maximum-cardinality matching on a bipartite graph that is free of alternating cycles has been shown to be NP-complete. Finding the optimal Taxman score seems at least as hard as that problem. Therefore this result suggests that optimal Taxman play is probably NP-hard.

For the reader who is unfamiliar with the concept of NP-hardness, in essence what it means is that if a solution to an NP-hard problem exists, once provided it can be easily verified. However, finding a solution is, in general, inefficient and soon becomes out of reach as the size of the problem increases. For instance, if I tell you that you can achieve a score of 301 for a Taxman game of size N = 30, and give you the sequence of moves 29, 25, 15, 21, 27, 18, 30, 20, 16, 24, 28, 26, 22, you can play Taxman with those moves and verify that they give a legal game with a score of 301. But finding the sequence of moves that give that score is not so easy, and quickly gets harder as N increases.

That said, Brian Chess has made substantial progress on finding optimal move sequences and optimal scores. Currently the optimal score and move sequence are known out to N = 1000. He has exploited the fact that for many cases, the optimal play for N can be quickly derived from that for N - 1. An example of this is when N is prime: then the optimal play is found simply by replacing the first move, which was the largest prime less than or equal to N - 1, by the new largest prime, N. Also, the maximum score for N cannot be larger than N plus the maximum score for N - 1. This allows the search space to be pruned considerably. See https://github.com/bvchess/taxman/wiki for more details.

Upper bound on maximum score

Suppose we have a maximum-weight matching on a graph. Clearly, adding the constraint that the matching must be free of flat alternating cycles can only reduce the weight. Hence, we have the following result:

Theorem 2: The optimum score for a Taxman game cannot exceed the weight of a maximum-weight matching on the graph.

An algorithm to find a maximum-weight matching on a graph of N vertices exists that has running time O(N² log N). This is quite efficient, and allows this upper bound to be calculated out to quite large values of N. For N ≤ 701, the largest N for which the optimal score was known at the time of this work, this bound was found to be within 1% of the optimal score in almost all cases.

(For the reader unfamiliar with “big-O” notation, the expression O(f(N)) means a compute time —basically, the number of steps required— that is dominated by a term proportional to f(N) for large N.)

Lower bound on the maximum score

Given a maximum-weight matching on the Taxman graph, we can remove some edges from the matching to turn it into a matching that is free of flat alternating cycles, and obtain a valid move sequence that way. The heuristic is to remove as little weight as possible to break the existing cycles. By changing the undirected edges in the original definition of the graph into directed edges, in which an unmatched edge is directed from the lower-rank vertex to the upper-rank vertex, and matched edges are directed oppositely, this problem is turned into a well-known problem called the minimum feedback arc set problem. Although this problem is NP-hard, heuristic algorithms exist to compute a good solution in time O(V × E) where V is the number of vertices and E is the number of edges. Hence the time complexity of this heuristic method is still O(N² log N). The score using this method is thus a lower bound on the maximum score.

This heuristic method for playing Taxman compares favorably with other heuristics that have been published, often exceeding their score. In most cases tested, it yields a score between 60% and 62% of the pot, while the optimal play yields between about 62% and 64%. Thus it is within approximately 3% of the optimal score.

Some comparisons of this method to other heuristics that have been published, and to the optimal scores, are given here. (These comparisons were not included in the paper.)

A winning strategy

It is possible to define a matching that by construction has no flat alternating cycles, and hence always yields a playable move set. Given a Taxman game of size N, for each prime p ≤ N define a set of edges as follows:

S_p = { (x, px) ∣ ∃ i : p²ⁱ⁺¹x ≤ N < p²ⁱ⁺² x }

This set can be produced by iterating through the values of x from N down to 1 and matching x to x / p whenever x is a multiple of p and x has not already been matched with px. One can think of the edges in this set for a given value of x as forming an alternating path of multiples of x times powers of p from the largest value of p^k x ≤ N down to either p x or x. Now form a set of edges by forming the union of the sets S_p for all primes p ≤ N, starting with the largest prime and excluding any edges from a smaller prime that connect to a vertex that has already been included. That is, if q is a prime, then include in the union only the edges of S_q that are independent of the edges in all S_p for p > q.

Theorem 3. The set of edges formed as described above is a matching that is free of flat alternating cycles.

One can calculate a polynomial expression for a lower bound on the score produced by this matching. This lower bound can be shown to be greater than 50% of the pot, i.e., a winning score, for all N ≥ 847. For all values of N < 847 the algorithm is observed to yield a winning score except for N = 1, 3, 7, and 13. For N = 1 the game ends immediately since there is nothing the player can take, and the taxman wins by a score of 1 to 0. For N = 3, the optimal play is to take 3, giving the taxman 1 and 2 and yielding a 3-3 tie. For 7 and 13 the move sequence given by this matching can be replaced by the known optimal move sequence. Hence we have the following result:

Theorem 4. For all values of N other than 1 and 3, the Taxman game can be won.

The significance of this result is that while there have been some prior proofs of the existence of a winning strategy, they were only able to show that the strategy was guaranteed to win for values of N larger than some undetermined value. This proof covers all N except the two cases that cannot be won.

Ordering the picks

The matching identifies the numbers that can be picked to achieve a winning score, but they need to be put into a playable sequence. This can be done as follows.

Start with the the subgraph of rank 1, and proceed sequentially through the higher ranks. The matched vertices of a given rank r are not connected by matched edges to vertices of higher rank, and so can be taken without preventing any future picks of higher rank. Define the order of a matched vertex of rank r as the number of edges connecting it to vertices within the subgraph of rank r. Select a matched vertex of rank r having order 1, which means the only edge connected to it is a matched edge. Such a vertex must exist, otherwise there would be a cycle within this subgraph. Remove that vertex and the lower-rank vertex to which it is matched. This removes any (unmatched) edges connecting the lower-rank vertex to other vertices of rank r, lowering their order by 1. Thus as the process proceeds, other rank r vertices become of order 1 and can be taken. Continue until all the rank r matched vertices in the subgraph are taken. Then proceed to the subgraph of rank r + 1 until all subgraphs are processed.

This method can be implemented in a way that takes O(N log N) time. The score has been calculated out to N = 100 000, and is observed to settle down to a fraction of about 57% of the pot for large N.