1 Introduction
Recurrent neural networks (RNNs) are an attractive apparatus for probabilistic language modeling (Mikolov and Zweig, 2012). Recent experiments show that RNNs significantly outperform other methods in assigning high probability to held-out English text (Jozefowicz et al., 2016). Roughly speaking, an RNN works as follows. At each time step, it consumes one input token, updates its hidden state vector, and predicts the next token by generating a probability distribution over all permissible tokens. The probability of an input string is obtained as the product of the predictions of the tokens constituting the string, followed by a terminating token. In this manner, each RNN defines a
weighted language; i.e., a total function from strings to weights. Siegelmann and Sontag (1995) showed that single-layer rational-weight RNNs with saturated linear activation can compute any computable function. In particular, a specific architecture with 886 hidden units can simulate any Turing machine in real time (i.e., each Turing machine step is simulated in a single time step). However, their RNN encodes the whole input in its internal state, performs the actual computation of the Turing machine when reading the terminating token, and then encodes the output (provided an output is produced) in a particular hidden unit. In this way, their RNN allows "thinking" time (equivalent to the computation time of the Turing machine) after the input has been encoded.

We consider a different variant of RNNs that is commonly used in natural language processing applications. It uses ReLU activations, consumes an input token at each time step, and produces softmax predictions for the next token. It thus halts immediately after reading the last input token, and the weight assigned to the input is simply the product of the input token predictions at each step.
Other formal models that are currently used to implement probabilistic language models, such as finite-state automata and context-free grammars, are by now well understood. A fair share of their utility derives directly from their nice algorithmic properties. For example, the weighted languages computed by weighted finite-state automata are closed under intersection (pointwise product) and union (pointwise sum), and the corresponding unweighted languages are closed under intersection, union, difference, and complementation (Droste et al., 2013). Moreover, toolkits like OpenFST (Allauzen et al., 2007) and Carmel¹ implement efficient algorithms on automata such as minimization, intersection, and finding the highest-weighted path and the highest-weighted string.

¹ https://www.isi.edu/licensedsw/carmel/
RNN practitioners naturally face many of these same problems. For example, an RNN-based machine translation system should extract the highest-weighted output string (i.e., the most likely translation) generated by an RNN (Sutskever et al., 2014; Bahdanau et al., 2014). Currently this task is solved by approximation techniques like heuristic greedy and beam searches. To facilitate the deployment of large RNNs on limited-memory devices (like mobile phones), minimization techniques would be beneficial; again, currently only heuristic approaches like knowledge distillation (Kim and Rush, 2016) are available. Meanwhile, it is unclear whether we can even determine if the computed weighted language is consistent; i.e., if it is a probability distribution on the set of all strings. Without a determination of the overall probability mass assigned to all finite strings, a fair comparison of language models with regard to perplexity is simply impossible.
The goal of this paper is to study the above problems for the mentioned ReLU variant of RNNs. More specifically, we ask and answer the following questions:

Consistency: Do RNNs compute consistent weighted languages? Is the consistency of the computed weighted language decidable?

Highest-weighted string: Can we (efficiently) determine the highest-weighted string in a computed weighted language?

Equivalence: Can we decide whether two given RNNs compute the same weighted language?

Minimization: Can we minimize the number of neurons for a given RNN?
2 Definitions and notations
Before we introduce our RNN model formally, we recall some basic notions and notation. An alphabet Σ is a finite set of symbols, and we write |Σ| for the number of symbols in Σ. A string over the alphabet Σ is a finite sequence of zero or more symbols drawn from Σ, and we write Σ* for the set of all strings over Σ, of which ε is the empty string. The length |w| of a string w coincides with the number of symbols constituting the string. As usual, we write B^A for the set of all functions from A to B. A weighted language is a total function from strings to real-valued weights. For example, the function that assigns to each string w the weight (2|Σ|)^(−|w|) / 2 is such a weighted language.
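As a small illustration (the two-symbol alphabet and the particular weights here are our own choices, not fixed by the definition), a weighted language can be written down directly as a function from strings to rationals; this one happens to be a probability distribution, since its weights sum to 1 over all strings:

```python
from fractions import Fraction

SIGMA = ("a", "b")  # a two-symbol alphabet (illustrative choice)

def weight(w: str) -> Fraction:
    """A toy weighted language: each symbol contributes a factor 1/4
    and termination contributes a factor 1/2, so a string of length n
    has weight (1/4)^n * 1/2."""
    assert all(c in SIGMA for c in w)
    return Fraction(1, 4) ** len(w) * Fraction(1, 2)

def mass_up_to(n: int) -> Fraction:
    """Total weight of all strings of length at most n; it approaches 1,
    because there are 2^k strings of each length k."""
    return sum(len(SIGMA) ** k * Fraction(1, 4) ** k * Fraction(1, 2)
               for k in range(n + 1))
```

Here `mass_up_to(10)` equals 1 − 2⁻¹¹, so the total mass converges to 1 and this weighted language is consistent in the sense discussed in Section 3.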
We restrict the weights in our RNNs to the rational numbers ℚ. In addition, we reserve the use of a special marker symbol, which we assume not to occur in any considered alphabet, to mark the start and end of an input string; we write Σ′ for the alphabet Σ extended by this marker.
Definition 1.
A single-layer RNN is a tuple R = (Σ, N, h_0, W, b, E, b′), in which

Σ is an input alphabet,

N is a finite set of neurons,

h_0 ∈ ℚ^N is an initial activation vector,

W ∈ ℚ^(N×N) is a transition matrix,

b = (b_a) is an indexed family of bias vectors b_a ∈ ℚ^N, one for each letter a of the extended alphabet Σ′,

E is a prediction matrix with one row per letter of Σ′, and

b′ is a prediction bias vector with one entry per letter of Σ′.
Next, let us define how such an RNN works. We first prepare our input encoding and the effect of our activation function. For an input string w = w_1 ⋯ w_n with w_i ∈ Σ, we mark its start and end with the reserved marker symbol and thus assume that w_0 and w_{n+1} both equal that marker. Our RNNs use ReLUs (Rectified Linear Units), so for every vector x we let relu(x) (the ReLU activation) be the vector such that relu(x)_i = max(0, x_i) for every component i. In other words, the ReLUs act like identities on nonnegative inputs, but clip negative inputs to 0. We use softmax predictions, so for every vector y and index a we let softmax(y)_a = exp(y_a) / Σ_{a′} exp(y_{a′}).
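As a minimal sketch, the two operations just described can be written down directly (the function names `relu` and `softmax` are ours):

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """ReLU activation: identity on nonnegative components, negatives clipped to 0."""
    return np.maximum(x, 0.0)

def softmax(y: np.ndarray) -> np.ndarray:
    """Softmax prediction: exponentiate and normalize.  Every component of the
    result is strictly positive, so no next token ever gets probability 0."""
    e = np.exp(y - np.max(y))  # subtract the max for numerical stability
    return e / e.sum()
```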
RNNs act in discrete time steps, reading a single letter at each step. We now define the semantics of our RNNs.
Definition 2.
Let R = (Σ, N, h_0, W, b, E, b′) be an RNN, w = w_1 ⋯ w_n ∈ Σ* an input string of length n, and t ≥ 1 a time step. We define

the hidden state vector h_t given by h_t = relu(W h_{t-1} + b_{w_t}),
where h_0 is the initial activation vector and we use the standard matrix product and pointwise vector addition,

the next-token prediction vector y_t = E h_t + b′, and

the next-token distribution p_t = softmax(y_t).
Finally, the RNN computes the weighted language ρ, which is given for every input w as above by ρ(w) = ∏_{t=0}^{n} p_t(w_{t+1}), where w_{n+1} is the end marker.
In other words, each component of the hidden state vector is the ReLU activation applied to a linear combination of all the components of the previous hidden state vector together with a summand that depends on the t-th input letter. Thus, we often specify the hidden state directly as a linear combination instead of specifying the matrix W and the bias vectors. The semantics is then obtained by predicting the letters of the input and the final terminator and multiplying the probabilities of the individual predictions.
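Under one plausible reading of this definition, the weight of a string can be computed by the following sketch (the helper names, the interface, and the toy parameters in the usage below are our own; the end marker is represented by the last entry of `vocab`):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(y):
    e = np.exp(y - np.max(y))
    return e / e.sum()

def string_weight(w, vocab, h0, W, bias, E, b2):
    """Weight of the string w under a single-layer ReLU RNN, read as the
    product of the softmax probabilities of each input letter and of the
    final end marker.  `vocab` lists the alphabet with the end marker as
    its last entry; `bias` maps each letter to its bias vector."""
    end = vocab[-1]
    h = h0
    weight = 1.0
    for target in list(w) + [end]:
        p = softmax(E @ h + b2)             # next-token distribution from current state
        weight *= p[vocab.index(target)]
        if target != end:
            h = relu(W @ h + bias[target])  # consume the letter just predicted
    return weight
```

For any parameters, the resulting weight lies strictly between 0 and 1, because every softmax probability does.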
Let us illustrate these notions on an example. We consider the RNN with and

and ,

and

and and

and .
In this case, we obtain the linear combinations
computing the next hidden state components. Given the initial activation, we thus obtain . Using this information, we obtain
Consequently, we assign weight to input , weight to , and, more generally, weight to .
Clearly the weight assigned by an RNN is always in the interval [0, 1], which enables a probabilistic view. Similar to weighted finite-state automata or weighted context-free grammars, each RNN is a compact, finite representation of a weighted language. The softmax operation enforces that probability 0 is impossible as an assigned weight, so each input string is in principle possible. In practical language modeling, smoothing methods are used to change distributions such that impossibility (probability 0) is removed. Since our RNNs avoid impossibility outright, this can be considered a feature rather than a disadvantage.
The hidden state of an RNN can be used as scratch space for computation. For example, with a single neuron h we can count the input symbols in w via
h_t = relu(h_{t-1} + 1).
Here the letter-dependent summand is universally 1. Similarly, for an alphabet Σ we can use the method of Siegelmann and Sontag (1995) to encode the complete input string in a suitable base β using
h_t = relu(h_{t-1}/β + code(w_t)/β),
where code is a bijection from Σ to {1, …, |Σ|}. In principle, we can thus store the entire input string (of unbounded length) in the hidden state value, but our RNN model outputs weights at each step and terminates immediately once the final delimiter is read. It must assign a probability to a string incrementally using the chain-rule decomposition p(w) = ∏_t p(w_t | w_1 ⋯ w_{t-1}).

Let us illustrate our notion of RNNs on some additional examples. They all use the alphabet and are illustrated and formally specified in Figure 1. The first column shows an RNN that assigns . The next-token prediction matrix ensures equal values for and at every time step. The second column shows the RNN , which we already discussed. In the beginning, it heavily biases the next-symbol prediction towards , but counters it starting at . The third RNN uses another counting mechanism with . The first two components are ReLU-thresholded to zero until , at which point they overwhelm the bias towards , turning all future predictions to .
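The two scratch-space tricks above can be sketched as follows (the base-3 encoding with nonzero digit values is an illustrative choice in the spirit of the construction, not the exact scheme of Siegelmann and Sontag):

```python
def relu(x: float) -> float:
    return max(x, 0.0)

def count_symbols(w: str) -> float:
    """Counting neuron: h_t = relu(h_{t-1} + 1), so the hidden value
    after reading w equals |w|."""
    h = 0.0
    for _ in w:
        h = relu(h + 1.0)
    return h

def encode(w: str, sigma=("a", "b")) -> float:
    """Fractal encoding of the whole input in a single hidden value:
    each letter shifts the current value right by one base-(|sigma|+1)
    digit and appends its own nonzero digit, so distinct strings get
    distinct values in [0, 1)."""
    base = len(sigma) + 1
    code = {c: i + 1 for i, c in enumerate(sigma)}  # bijection sigma -> {1, ..., |sigma|}
    h = 0.0
    for c in w:
        h = relu(h / base + code[c] / base)
    return h
```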
3 Consistency
We first investigate the consistency problem for an RNN, which asks whether the recognized weighted language is indeed a probability distribution. Consequently, an RNN is consistent if the weights it assigns to all strings sum to 1. We first show that there is an inconsistent RNN, which together with our examples shows that consistency is a nontrivial property of RNNs.²
² For comparison, all probabilistic finite-state automata are consistent, provided no transitions exit final states. Not all probabilistic context-free grammars are consistent; necessary and sufficient conditions for consistency are given by Booth and Thompson (1973). However, probabilistic context-free grammars obtained by training on a finite corpus using popular methods (such as expectation-maximization) are guaranteed to be consistent (Nederhof and Satta, 2006).

We immediately use a slightly more complex example, which we will later reuse.
Example 3.
Let us consider an arbitrary RNN
with the single-letter alphabet , the neurons , initial activation for all , and the following linear combinations:
Now we distinguish two cases:
Case 1: If for all , then and and . Hence we have and . In this case the termination probability (i.e., the likelihood of predicting the end marker) shrinks rapidly towards 0, so the RNN assigns less than 15% of the probability mass to the terminating sequences (i.e., the finite strings), and the RNN is therefore inconsistent (see Lemma 15 in the appendix).
Case 2: Suppose that there exists a time point such that for all . Then for all and otherwise. In addition, we have and . Hence we have , which shows that the probability of predicting the end marker increases over time and eventually far outweighs the probability of predicting the input letter. Consequently, in this case the RNN is consistent (see Lemma 16 in the appendix).
We have seen in the previous example that consistency is not trivial for RNNs, which takes us to the consistency problem for RNNs:
Consistency:
Given an RNN , return “yes” if is consistent and “no” otherwise.
We recall the following theorem, which, combined with our example, will prove that consistency is unfortunately undecidable for RNNs.
Theorem 4 (Theorem 2 of Siegelmann and Sontag (1995)).
Let be an arbitrary deterministic Turing machine. There exists an RNN
with saturated linear activation, input alphabet , and designated neuron such that for all and

if does not halt on , and

if does halt on empty input after steps, then
In other words, such RNNs with saturated linear activation can semi-decide halting of an arbitrary Turing machine in the sense that a particular neuron achieves value 1 at some point during the evolution if and only if the Turing machine halts on empty input. An RNN with saturated linear activation is an RNN following our definition with the only difference that instead of our ReLU activation the following saturated linear activation σ is used: for every vector x and component i, let
σ(x)_i = min(1, max(0, x_i)).
Since σ(x) = relu(x) − relu(x − 1) for all x, and the right-hand side is a difference of two ReLU applications of affine maps, we can easily simulate saturated linear activation in our RNNs. To this end, each neuron of the original RNN is replaced by two neurons and in the new RNN such that for all and , where the evaluation of is performed in the new RNN. More precisely, we use the transition matrix and bias function given, for all and , by the construction above, where and are the two neurons corresponding to and and are the two neurons corresponding to (see Lemma 17 in the appendix).
Corollary 5.
Let be an arbitrary deterministic Turing machine. There exists an RNN
with input alphabet and designated neurons such that for all and

if does not halt on , and

if does halt on empty input after steps, then
We can now use this corollary together with the RNN of Example 3 to show that the consistency problem is undecidable. To this end, we simulate a given Turing machine and identify the two designated neurons of Corollary 5 with the neurons and of Example 3. It follows that the Turing machine halts if and only if the constructed RNN is consistent. Hence we have reduced the undecidable halting problem to the consistency problem, which proves that the consistency problem is undecidable.
Theorem 6.
The consistency problem for RNNs is undecidable.
As mentioned in Footnote 2, probabilistic context-free grammars obtained after training on a finite corpus using the most popular methods are guaranteed to be consistent. At least for two-layer RNNs, this guarantee does not hold.
Theorem 7.
A two-layer RNN trained to a local optimum using backpropagation through time (BPTT) on a finite corpus is not necessarily consistent.
Proof.
The first layer of the RNN with a single alphabet symbol uses one neuron and has the following behavior:
The second layer uses neuron and takes as input at time :
Let the training data be . Then the objective we wish to maximize is simply . The derivative of this objective with respect to each parameter is 0, so applying gradient descent updates does not change any of the parameters, and we have converged to an inconsistent RNN. ∎
It remains an open question whether there is a singlelayer RNN that also exhibits this behavior.
4 Highest-weighted string
Given a function, we are often interested in the highest-weighted string. This corresponds to the most likely sentence in a language model or the most likely translation for a decoder RNN in machine translation.
For deterministic probabilistic finite-state automata or context-free grammars, only one path or derivation exists for any given string, so identifying the highest-weighted string is the same task as identifying the most probable path or derivation. However, for nondeterministic devices, the highest-weighted string is often harder to identify, since the weight of a string is the sum of the probabilities of all possible paths or derivations for that string. A comparison of the difficulty of identifying the most probable derivation and the highest-weighted string for various models is summarized in Table 1, in which our results are marked in bold face.
                     Best path       Best string
General RNN          Undecidable     Undecidable
Consistent RNN       NP-complete³    NP-complete³
Det. PFSA/PCFG       P⁴              P⁴
Nondet. PFSA/PCFG    P               NP-complete⁵

³ Restricted to solutions of polynomial length.
⁴ Dijkstra shortest path / Knuth (1977).
⁵ Casacuberta and de la Higuera (2000) / Simaan (1996).
We present various results concerning the difficulty of identifying the highest-weighted string in a weighted language computed by an RNN. We also summarize some available algorithms. We start with the formal presentation of the three problems studied.

Best string: Given an RNN and a threshold weight, does there exist a string whose weight exceeds the threshold?

Consistent best string: Given a consistent RNN and a threshold weight, does there exist a string whose weight exceeds the threshold?

Consistent best string of polynomial length: Given a consistent RNN, a polynomial bound on the string length (in the size of the RNN), and a threshold weight, does there exist a string of length within the bound whose weight exceeds the threshold?
As usual, the corresponding optimization problems are not significantly simpler than these decision problems. Unfortunately, the general problem is also undecidable, which can easily be shown using our example.
Theorem 8.
The best string problem for RNNs is undecidable.
Proof.
Let be an arbitrary Turing machine and again consider the RNN of Example 3 with the neurons and identified with the designated neurons of Corollary 5. We note that in both cases. If does not halt, then for all . On the other hand, if halts after steps, then
using Lemma 14 in the appendix. Consequently, a string with weight above the threshold exists if and only if the Turing machine halts, so the best string problem is also undecidable. ∎
If we restrict the RNNs to be consistent, then we can easily decide the best string problem by simple enumeration.
Theorem 9.
The consistent best string problem for RNNs is decidable.
Proof.
Let R be the RNN over the alphabet Σ and θ the given threshold. Since Σ* is countable, we can enumerate it as w_1, w_2, …. In the algorithm we compute the weight of w_i for increasing values of i. If we encounter a weight exceeding θ, then we stop with answer "yes." Otherwise we continue until the accumulated weight of the strings enumerated so far reaches 1 − θ, at which point we stop with answer "no," since none of the remaining strings can exceed θ.
Since R is consistent, the weights sum to 1, so this algorithm is guaranteed to terminate, and it obviously decides the problem. ∎
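The enumeration argument in this proof can be sketched as follows (the function names and the toy weighted language in the usage below are ours; `weight` stands for the RNN's computed weighted language):

```python
from itertools import count, product

def consistent_best_string(weight, sigma, theta):
    """Decision-procedure sketch for the consistent best string problem:
    enumerate strings by increasing length; answer True on the first
    string whose weight exceeds theta, and False once the mass seen so
    far reaches 1 - theta, since the remaining (unseen) mass then cannot
    contain a string of weight above theta.  Termination relies on
    consistency: the weights sum to 1."""
    seen_mass = 0.0
    for n in count():
        for letters in product(sigma, repeat=n):
            rho = weight("".join(letters))
            if rho > theta:
                return True
            seen_mass += rho
            if seen_mass >= 1.0 - theta:
                return False
```

With the consistent toy language weight(w) = (1/4)^|w| · 1/2 over {a, b}, a threshold of 0.4 yields True (the empty string has weight 0.5), while a threshold of 0.5 yields False.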
Next, we investigate the length of the shortest string of maximal weight in the weighted language generated by a consistent RNN in terms of its (binary storage) size. As already mentioned by Siegelmann and Sontag (1995) and evidenced here, only small-precision rational numbers are needed in our constructions, so we assume that the size is at most linear in the number of neurons, up to a (reasonably small) constant factor. We show that no computable bound on the length of the best string can exist, so its length can surpass all reasonable bounds.
Theorem 10.
Let be the function with
for all . There exists no computable function with for all .
Proof.
In the previous section (before Theorem 6) we presented an RNN that simulates an arbitrary (single-track) Turing machine with states. By Siegelmann and Sontag (1995) we have . Moreover, we observed that this RNN is consistent if and only if the Turing machine halts on empty input. In the proof of Theorem 8 we additionally saw that the length of its best string exceeds the number of steps required to halt.
For every , let be the -th "Busy Beaver" number (Radó, 1962), which is
It is well known that the Busy Beaver function cannot be bounded by any computable function. However,
so clearly cannot be computable, and no computable function can provide bounds for it. ∎
Finally, we investigate the difficulty of the best string problem for consistent RNNs restricted to solutions of polynomial length.
Theorem 11.
Identifying the best string of polynomial length in a consistent RNN is NP-complete and APX-hard.
Proof sketch.
Clearly, we can guess an input string of polynomial length, run the RNN, and verify in polynomial time whether its weight exceeds the given bound. Therefore the problem is trivially in NP. For NP-hardness, we reduce from the 0-1 Integer Linear Programming Feasibility problem:
0-1 Integer Linear Programming Feasibility:
Given: variables that can only take values in {0, 1}, and a set of linear constraints over these variables. Return: "yes" iff there is a feasible solution that satisfies all constraints.
Suppose we are given an instance of the above problem. We construct an instance of the consistent best string of polynomial length problem such that the only length at which a string can have weight greater than the threshold is the number of variables. Thus, if there is any string whose weight is greater than the threshold, the given instance of 0-1 Integer Linear Programming Feasibility is feasible; otherwise it is not.
Our reduction is a polynomial-time approximation scheme (PTAS) reduction and preserves approximability. Since 0-1 Integer Linear Programming Feasibility is NP-complete and the corresponding maximization problem is APX-complete, consistent best string of polynomial length is NP-complete and APX-hard, meaning that there is no PTAS for finding the best string of polynomially bounded length (i.e., the best we can hope for in polynomial time is a constant-factor approximation algorithm) unless P = NP.
The full proof is given in the appendix.
If we assume that the solution length is bounded by some finite number, we can adapt the algorithms of de la Higuera and Oncina (2013) for computing the most probable string in PFSAs for use with RNNs. Such algorithms would be similar to the beam search (Lowerre, 1976) most widely used in practice.
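Such a heuristic can be sketched as a standard beam search (the interface, where `step_probs` maps a prefix to a next-token distribution over the alphabet plus the end marker, is our own simplification; nothing here guarantees optimality, consistent with the hardness results above):

```python
def beam_search(step_probs, vocab, end, beam_size=4, max_len=10):
    """Keep the beam_size highest-weight prefixes at each length and
    record every terminated hypothesis; return the best finished string
    found.  `step_probs(prefix)` returns probabilities aligned with
    list(vocab) + [end]."""
    beams = [((), 1.0)]          # (prefix, product of probabilities so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, wgt in beams:
            probs = step_probs(prefix)
            for i, sym in enumerate(list(vocab) + [end]):
                if sym == end:
                    finished.append((prefix, wgt * probs[i]))
                else:
                    candidates.append((prefix + (sym,), wgt * probs[i]))
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_size]
    best_prefix, best_weight = max(finished, key=lambda c: c[1])
    return "".join(best_prefix), best_weight
```

For a memoryless toy model that always predicts (0.2, 0.1, 0.7) for ("a", "b", end), the search returns the empty string with weight 0.7.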
5 Equivalence
We prove that equivalence of two RNNs is undecidable. For comparison, equivalence of two deterministic WFSAs can be tested in time , where and are the numbers of states of the two WFSAs and is the size of the alphabet (Cortes et al., 2007); equivalence of nondeterministic WFSAs is undecidable (Griffiths, 1968). The decidability of language equivalence for deterministic probabilistic pushdown automata (PPDA) is still open (Forejt et al., 2014), although equivalence of deterministic unweighted pushdown automata (PDA) is decidable (Sénizergues, 1997).
The equivalence problem is formulated as follows:
Equivalence:
Given two RNNs, return "yes" if they assign the same weight to every string, and "no" otherwise.
Theorem 12.
The equivalence problem for RNNs is undecidable.
Proof.
We proceed by contradiction. Suppose a Turing machine decides the equivalence problem. Given any deterministic Turing machine , construct the RNN that simulates it as described in Corollary 5. Let and . If the machine does not halt, then for all , ; if it halts after steps, then , . Let be the trivial RNN that computes . We run the decider on this pair of RNNs. If it returns "no," the machine halts; otherwise it does not halt. Therefore the halting problem would be decidable if equivalence were decidable, so equivalence is undecidable. ∎
6 Minimization
We look next at the minimization of RNNs. For comparison, state minimization of a deterministic PFSA takes time , where is the number of transitions and is the number of states (Aho et al., 1974). Minimization of a nondeterministic PFSA is PSPACE-complete (Jiang and Ravikumar, 1993).
We focus on minimizing the number of hidden neurons () in RNNs:
Minimization:
Given an RNN and a nonnegative integer, return "yes" if there exists an RNN with at most that number of hidden units computing the same weight for every string, and "no" otherwise.
Theorem 13.
Minimization of RNNs is undecidable.
Proof.
We reduce from the halting problem. Suppose a Turing machine decides the minimization problem. For any Turing machine , construct the same RNN as in Theorem 12 and run the decider on it with bound 0. Note that an RNN with no hidden units can only output the same constant distribution at every step. Therefore the number of hidden units can be minimized to 0 if and only if the RNN always outputs the same prediction. If the decider returns "yes," the machine does not halt; otherwise it halts. Therefore minimization is undecidable. ∎
7 Conclusion
We proved the following hardness results regarding RNNs as recognizers of weighted languages:

Consistency:

Inconsistent RNNs exist.

Consistency of RNNs is undecidable.


Highest-weighted string:

Finding the highest-weighted string for an arbitrary RNN is undecidable.

Finding the highest-weighted string for a consistent RNN is decidable, but the solution length can surpass all computable bounds.

Restricting to solutions of polynomial length, finding the highest-weighted string is NP-complete and APX-hard.


Testing equivalence of RNNs and minimizing the number of neurons in an RNN are both undecidable.
Although our undecidability results are upshots of the Turing-completeness of RNNs (Siegelmann and Sontag, 1995), our NP-completeness and APX-hardness results are original and surprising, since the analogous hardness results for PFSAs rely on the fact that there are multiple derivations for a single string (Casacuberta and de la Higuera, 2000). The fact that these results hold for the relatively simple RNNs used in this paper suggests that the same would hold for the more complicated models used in NLP, such as long short-term memory networks (LSTMs; Hochreiter and Schmidhuber, 1997).

Our results show the nonexistence of (efficient) algorithms for interesting problems that researchers applying RNNs to natural language processing tasks may have hoped to find. On the other hand, the nonexistence of such efficient or exact algorithms gives evidence for the necessity of approximation, greedy, or heuristic algorithms to solve those problems in practice. In particular, since finding the highest-weighted string in an RNN is the same as finding the most likely translation in a sequence-to-sequence RNN decoder, our NP-completeness and APX-hardness results provide some justification for employing greedy and beam search algorithms in practice.
Acknowledgments
This work was supported by DARPA (W911NF1510543 and HR001115C0115). Andreas Maletti was financially supported by DFG Graduiertenkolleg 1763 (QuantLA).
References
 Aho et al. (1974) Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. 1974. The design and analysis of computer algorithms. AddisonWesley.
 Allauzen et al. (2007) Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. 2007. OpenFst: A General and Efficient Weighted FiniteState Transducer Library, Springer Berlin Heidelberg, Berlin, Heidelberg, pages 11–23.
 Bahdanau et al. (2014) D. Bahdanau, K. Cho, and Y. Bengio. 2014. Neural machine translation by jointly learning to align and translate. In Proc. ICLR.
 Booth and Thompson (1973) T. L. Booth and R. A. Thompson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers C22(5):442–450. https://doi.org/10.1109/tc.1973.223746.
 Casacuberta and de la Higuera (2000) Francisco Casacuberta and Colin de la Higuera. 2000. Computational complexity of problems on probabilistic grammars and transducers. Grammatical Inference: Algorithms and Applications Lecture Notes in Computer Science page 15–24. https://doi.org/10.1007/9783540452577_2.
 Cortes et al. (2007) Corinna Cortes, Mehryar Mohri, and Ashish Rastogi. 2007. L distance and equivalence of probabilistic automata. International Journal of Foundations of Computer Science 18(04):761–779. https://doi.org/10.1142/s0129054107004966.
 de la Higuera and Oncina (2013) Colin de la Higuera and José Oncina. 2013. Computing the most probable string with a probabilistic finite state machine. In FSMNLP.
 Droste et al. (2013) Manfred Droste, Werner Kuich, and Heiko Vogler. 2013. Handbook of Weighted Automata. Springer Berlin.
 Forejt et al. (2014) Vojtěch Forejt, Petr Jančar, Stefan Kiefer, and James Worrell. 2014. Language equivalence of probabilistic pushdown automata. Information and Computation 237:1–11. https://doi.org/10.1016/j.ic.2014.04.003.
 Griffiths (1968) T. V. Griffiths. 1968. The unsolvability of the equivalence problem for free nondeterministic generalized machines. Journal of the ACM 15(3):409–413. https://doi.org/10.1145/321466.321473.
 Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long shortterm memory. Neural Comput. 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
 Jiang and Ravikumar (1993) Tao Jiang and B. Ravikumar. 1993. Minimal NFA problems are hard. SIAM Journal on Computing 22(6):1117–1141. https://doi.org/10.1137/0222067.
 Jozefowicz et al. (2016) Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the limits of language modeling. https://arxiv.org/pdf/1602.02410.pdf.
 Kim and Rush (2016) Yoon Kim and Alexander M. Rush. 2016. Sequencelevel knowledge distillation. In EMNLP.
 Knuth (1977) Donald E. Knuth. 1977. A generalization of Dijkstra’s algorithm. Information Processing Letters 6(1):1–5. https://doi.org/10.1016/00200190(77)900023.
 Lowerre (1976) Bruce T. Lowerre. 1976. The Harpy Speech Recognition System.. Ph.D. thesis, Pittsburgh, PA, USA. AAI7619331.
 Mikolov and Zweig (2012) Tomas Mikolov and Geoffrey Zweig. 2012. Context dependent recurrent neural network language model. 2012 IEEE Spoken Language Technology Workshop (SLT) https://doi.org/10.1109/slt.2012.6424228.
 Nederhof and Satta (2006) MarkJan Nederhof and Giorgio Satta. 2006. Estimation of consistent probabilistic contextfree grammars. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics  https://doi.org/10.3115/1220835.1220879.
 Radó (1962) Tibor Radó. 1962. On noncomputable functions. Bell System Technical Journal 41:877–884.
 Sénizergues (1997) Géraud Sénizergues. 1997. The equivalence problem for deterministic pushdown automata is decidable. In Proc. Automata, Languages and Programming: 24th International Colloquium, Springer Berlin Heidelberg, pages 671–681.
 Siegelmann and Sontag (1995) Hava T. Siegelmann and Eduardo D. Sontag. 1995. On the computational power of neural nets. Journal of Computer and System Sciences 50(1):132–150. https://doi.org/10.1006/jcss.1995.1013.
 Simaan (1996) Khalil Simaan. 1996. Computational complexity of probabilistic disambiguation by means of treegrammars. In COLING. https://doi.org/10.3115/993268.993392.
 Sutskever et al. (2014) I. Sutskever, O. Vinyals, and Q. V. Le. 2014. Sequence to sequence learning with neural networks. In Proc. NIPS.
Appendix
Theorem 11.
Identifying the best string of polynomial length in a consistent RNN is NP-complete and APX-hard.
Proof.
Clearly, we can guess an input string of polynomial length, run the RNN, and verify in polynomial time whether its weight exceeds the given bound. Therefore the problem is trivially in NP. For NP-hardness we now reduce from the 0-1 Integer Linear Programming Feasibility problem to our problem:
0-1 Integer Linear Programming Feasibility:
Given: variables that can only take values in {0, 1}, and a set of linear constraints over these variables. Return: "yes" iff there is a feasible solution that satisfies all constraints.
Suppose we are given an instance of the above problem. Construct an instance of the consistent best string of polynomial length problem with input , where:

is an RNN as follows:
Let , . We pick a big enough positive rational number so that if we define ,
(1) When , set
Therefore
When , one can verify that we can set
where
so that
since the range of is a finite set of values .

From equation (1) we get
so we can pick such that its length written in binary is logarithmic in and . Hence the weights in the matrices that produce are polynomial in and . The same is true for the weights that produce . Written in binary, has length
which is polynomial in and . So our construction is polynomial.
We now prove that if we can solve the instance of consistent best string of polynomial length in polynomial time, we can also solve the given instance of 01 Integer Linear Programming Feasibility in polynomial time.
By our design, at time , reads a binary string into the neurons while predicting almost half-half probability for either 0 or 1 and an infinitesimal probability for termination. Therefore no string with length less than has weight greater than the threshold.
At time , since is an integer, is the indicator for whether the th constraint is satisfied:
Therefore is the total number of clauses satisfied by a given setting of (). The termination probability at is . If all clauses are satisfied, this setting of would have termination probability and therefore weight . If fewer than clauses are satisfied, would have weight at most .
When , continues to assign almost half-half probability to either 0 or 1 and an infinitesimal probability to termination. Therefore any string of length greater than has weight smaller than the threshold. From that point on the output vector is constant, so the RNN is consistent. Notice that the weights of strings decrease monotonically with length, except at length .
Therefore our construction ensures that the only length at which a string can have weight greater than the threshold is . Thus, if there is any string whose weight is greater than the threshold, the given instance of the 0-1 Integer Linear Programming Feasibility problem is feasible; otherwise it is not.
Define the maximum number of clauses satisfied over all assignments of :
By our construction, when , the highest-weighted string occurs at length and has weight proportional to