Cryptographic hash functions have a wide range of applications, from various data security  and cryptocurrency protocols  to theoretical methods for justifying the cryptographic resistance of different cryptosystems [3, 4]. The Merkle-Damgard construction [5, 6] is considered one of the most successful paradigms for constructing cryptographically resistant hash functions. The MD4 hash function  is one of the first examples of hash functions based on the Merkle-Damgard construction. The widely known works [8, 9] demonstrated how to construct collisions for the MD4 and MD5 hash functions, so these functions have been compromised with respect to the collision attack. However, even MD4 still remains practically resistant to the so-called preimage attack, which consists in finding, for a known hash value, a corresponding input message. In this context, preimage attacks on truncated variants of the MD4 hash function are of interest. A truncated variant of MD4 is a variant of the original algorithm that contains fewer steps (the non-truncated variant consists of 48 steps). Hereinafter, by MD4-k we denote the truncated variant of MD4 with $k$ steps, $k < 48$.
The first successful attack on a truncated variant of MD4 with a relatively large number of steps was described by H. Dobbertin in . That work showed that the two-round version of MD4, i.e. MD4-32, is not one-way. The main idea of Dobbertin's attack is to impose additional constraints on chaining variables at certain steps of the MD4 algorithm; the information derived from these constraints leads to fast solving of the corresponding cryptanalysis equations.
To the best of our knowledge, the attack described in  is currently the best known attack on truncated variants of MD4. This attack is a SAT variant of Dobbertin's attack: it uses constraints of Dobbertin's type, in the sense that they are applied to the same chaining variables as in . The resulting system of cryptanalysis equations was reduced to the Boolean Satisfiability Problem (SAT) and then solved using the minisat  SAT solver. For MD4-32 the SAT variant of Dobbertin's attack turned out to be very effective. The main novelty of  is the use of Dobbertin's constraints together with state-of-the-art SAT solvers to find preimages for MD4-k, $k \le 39$, within a reasonable time. However, the corresponding computational experiment for MD4-39 took a long time (about 8 hours on one processor core). In addition, only hash values of special kinds were considered in . Surprisingly, until 2017 there was apparently no progress in practical preimage attacks that would be more effective than the attack from .
In  we presented a parallel SAT variant of Dobbertin's attack on MD4-k, . One of the main results of  is an automatic search procedure for Dobbertin's constraints. For MD4-39, it made it possible to solve the preimage finding problem relatively fast for the hash value  (hereinafter, $\alpha^{l}$ denotes a word consisting of $l$ symbols $\alpha$).
In the present paper we improve the results from  in the following directions. First, we consider the problem of finding relaxation constraints of Dobbertin's type as a problem of black-box optimization over the Boolean hypercube. To solve this problem we develop a metaheuristic algorithm from the class of Tabu Search algorithms. Using this algorithm we construct new relaxation constraints of Dobbertin's type for the MD4-39 preimage finding problem. These constraints, which differ from the ones presented in  and , make it possible to find MD4-39 preimages for 65-75% of randomly generated 128-bit vectors within one minute of minisat2.2 runtime on a single core of an Intel i7-3770K (3.5 GHz) processor, whereas with the constraints from  and  minisat2.2 cannot solve these problems within several hours.
As mentioned above, MD4 is a cryptographic hash function based on the Merkle-Damgard construction. It can be used to compute hash values for messages of arbitrary length. The input message is split into 512-bit blocks. The resulting hash value is written to a special 128-bit register called the hash register, which is divided into four 32-bit parts. According to the Merkle-Damgard construction, a fixed Initial Value (IV) is written to the hash register before the first block of the input message is hashed. Then the contents of the hash register are iteratively modified: before hashing the 512-bit block number $i$, the hash register contains the result of hashing the message blocks numbered 1 to $i-1$. Hashing one 512-bit block consists of 3 rounds of 16 steps each (48 steps in total). The contents of the hash register are mixed with the input message using round functions. In total, MD4 uses three round functions, detailed descriptions of which can be found in a variety of sources (e.g. ). At each step number $j$, a variable called the chaining variable is associated with one of the four parts of the hash register.
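To make the step structure concrete, the following Python sketch implements one-block MD4 compression truncated to $k$ steps, following the public MD4 specification (RFC 1320). It also returns the chaining variable produced at every step, since these are the values that relaxation constraints fix. One assumption is labeled explicitly: the truncated output is formed like the full MD4 output, by adding the IV to the registers after step $k$; whether exactly this feed-forward applies to the MD4-k variants studied here is not stated in the text above.

```python
import struct
from typing import List, Tuple

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)  # A, B, C, D (RFC 1320)

def rotl(x: int, s: int) -> int:
    """32-bit left rotation."""
    return ((x << s) | (x >> (32 - s))) & MASK

def F(x: int, y: int, z: int) -> int:  # round 1: bitwise selection
    return (x & y) | (~x & z)

def G(x: int, y: int, z: int) -> int:  # round 2: bitwise majority
    return (x & y) | (x & z) | (y & z)

def H(x: int, y: int, z: int) -> int:  # round 3: parity
    return x ^ y ^ z

# For each round: round function, additive constant, message word order, shifts.
ROUNDS = [
    (F, 0x00000000, list(range(16)), (3, 7, 11, 19)),
    (G, 0x5A827999, [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15], (3, 5, 9, 13)),
    (H, 0x6ED9EBA1, [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15], (3, 9, 11, 15)),
]

def md4_k_compress(block: bytes, k: int = 48) -> Tuple[List[int], bytes]:
    """Run the first k steps of MD4 on one 64-byte block.

    Returns the chaining variable produced at every step and the 16-byte output.
    Assumption: the IV is added to the registers after step k (feed-forward),
    as in full MD4 (k = 48)."""
    if len(block) != 64 or not 1 <= k <= 48:
        raise ValueError("need a 64-byte block and 1 <= k <= 48")
    words = struct.unpack("<16I", block)
    regs = list(IV)                  # regs[0..3] hold A, B, C, D
    chaining = []
    for step in range(k):
        f, const, order, shifts = ROUNDS[step // 16]
        i = step % 16
        t = (0, 3, 2, 1)[step % 4]   # steps update A, D, C, B in turn
        a, b, c, d = (regs[(t + j) % 4] for j in range(4))
        regs[t] = rotl((a + f(b, c, d) + words[order[i]] + const) & MASK, shifts[i % 4])
        chaining.append(regs[t])
    out = [(r + v) & MASK for r, v in zip(regs, IV)]
    return chaining, struct.pack("<4I", *out)
```

With `k=48` and a padded one-block message this reproduces the RFC 1320 test vectors; in the preimage problem the input is an arbitrary 512-bit block (no padding), and `md4_k_compress(block, k=39)` yields the 39 chaining variables that Dobbertin-type constraints refer to.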
Hereinafter, we consider the problem of finding a preimage (preimage attack) for a function of the kind

$$\text{MD4-}k\colon \{0,1\}^{512} \rightarrow \{0,1\}^{128}, \tag{1}$$

assuming that at the initial moment the hash register contains the IV given by the MD4 specification. In other words, we consider the problem of finding a 512-bit MD4-k preimage for a known 128-bit hash value. The main object of further interest is the function MD4-39.
Let us briefly recall the idea of Dobbertin's attack . Based on an analysis of the properties of the round functions, H. Dobbertin proposed fixing to a constant the values of certain chaining variables, namely those corresponding to the steps of the algorithm with numbers:
Substituting the corresponding values into the cryptanalysis equations makes it possible to derive a significant part of the values of the variables that encode the unknown 512-bit input message. This, in turn, further simplifies the problem. As a result, in 1998 H. Dobbertin managed to find preimages for the MD4-32 hash function on a personal computer. We use the term Dobbertin's constraints for the additional constraints stating that the chaining variable at the $j$-th step equals a fixed constant, where $j$ goes through the set of numbers (2). In general, similar constraints can be imposed on steps of the MD4 algorithm other than those in (2). For all such constraints we use the term relaxation constraints.
The next step is to use a powerful combinatorial algorithm to solve the cryptanalysis equations with additional relaxation constraints. As mentioned above, this idea was proposed in , where Dobbertin's constraints were used with constant  and the corresponding cryptanalysis equations were solved using the minisat  SAT solver.
Recall that SAT (short for “Satisfiability”) is the Boolean satisfiability problem: for an arbitrary formula $F$ over a set of Boolean variables $X$, decide whether there exists an assignment of the variables from $X$ that makes $F$ true. It is usually considered in the variant where $F$ is presented in conjunctive normal form (CNF).
The approach in which modern SAT solvers are used to solve cryptanalysis problems is called SAT-based cryptanalysis. To reduce the preimage finding problem (inversion problem) of an arbitrary total discrete function of the kind $f\colon \{0,1\}^{n} \rightarrow \{0,1\}^{m}$ to SAT, one can use various automatic translation systems, such as Cryptol  or URSA . In our work we use the software system Transalg , which is specially designed to produce SAT encodings for the inversion problems of cryptographic functions. Transalg performs a symbolic execution  of a program that specifies the considered function $f$. The result of such an execution is a CNF $C$ called the template CNF. By $C(y)$ we denote the result of substituting a known image $y$ of the function $f$, $y \in \mathrm{Range}\, f$, into CNF $C$. It can be shown that $C(y)$ is satisfiable. Once a satisfying assignment for $C(y)$ is found by a SAT solver, a preimage $x$ such that $f(x) = y$ can be extracted from this assignment. Using SAT-based cryptanalysis methods to find preimages of a cryptographic hash function is called a SAT-based preimage attack on this function.
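A toy illustration of the template CNF, the substitution of a known image, and the extraction of a preimage may help. The function below is hypothetical (3 input bits, 2 output bits) and its template CNF is written by hand in Tseitin style, not generated by Transalg. The known image is substituted as unit clauses on the output variables, and a brute-force search stands in for a real SAT solver, which is only feasible at toy sizes.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Clause = List[int]

INPUT_VARS = [1, 2, 3]   # x1, x2, x3
OUTPUT_VARS = [4, 5]     # y1, y2

# Hand-written template CNF (DIMACS-style literals) for the hypothetical
# function f(x1, x2, x3) = (x1 AND x2, x2 XOR x3).
TEMPLATE: List[Clause] = [
    [-4, 1], [-4, 2], [4, -1, -2],                     # y1 <-> x1 AND x2
    [-5, 2, 3], [-5, -2, -3], [5, -2, 3], [5, 2, -3],  # y2 <-> x2 XOR x3
]

def f(x: Sequence[int]) -> Tuple[int, int]:
    x1, x2, x3 = x
    return (x1 & x2, x2 ^ x3)

def substitute(template: List[Clause], out_vars: Sequence[int], y: Sequence[int]) -> List[Clause]:
    """Template CNF plus one unit clause per known output bit."""
    return template + [[v if bit else -v] for v, bit in zip(out_vars, y)]

def brute_force_sat(cnf: List[Clause], n_vars: int) -> Optional[Dict[int, int]]:
    """Stand-in for a SAT solver: try all assignments (toy sizes only)."""
    for bits in product((0, 1), repeat=n_vars):
        assign = dict(zip(range(1, n_vars + 1), bits))
        if all(any(assign[abs(l)] == (l > 0) for l in clause) for clause in cnf):
            return assign
    return None

def find_preimage(y: Sequence[int]) -> Optional[Tuple[int, ...]]:
    model = brute_force_sat(substitute(TEMPLATE, OUTPUT_VARS, y), n_vars=5)
    # The preimage is read off the input variables of the satisfying assignment.
    return None if model is None else tuple(model[v] for v in INPUT_VARS)
```

For example, `find_preimage((1, 1))` returns `(1, 1, 0)`, and applying `f` to it gives back `(1, 1)`.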
As mentioned above, in  a parallel version of the SAT-based preimage attack on MD4-k () from  was proposed. However, it used the same Dobbertin's constraints as relaxation constraints. In the next section we consider the generation of relaxation constraints as a problem of black-box optimization over the Boolean hypercube. We also present computational results obtained using new relaxation constraints.
III The generation of relaxation constraints as a problem of black-box optimization
Let us consider the preimage finding problem for a function of the kind (1) with fixed $k$ and reduce this problem to SAT. Let $C$ be the template CNF for this problem and $X$ the set of all Boolean variables in $C$. By $C(\chi)$ we denote the CNF obtained by substituting a hash value $\chi$ into $C$.
Below we briefly describe the idea of switching variables introduced in . Suppose that there is a set of relaxation constraints $R = \{R_1, \dots, R_r\}$, where an arbitrary constraint $R_j$ is usually a conjunction of literals, i.e. a formula of the kind

$$R_j = l_1^{j} \wedge l_2^{j} \wedge \dots \wedge l_{s_j}^{j} \tag{3}$$

(recall that a literal is a formula of the kind $x$ or $\neg x$, where $x$ is a Boolean variable).
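As an illustration of (3), a Dobbertin-type constraint "the chaining variable at some step equals a 32-bit constant" is a conjunction of 32 literals, one per bit. The sketch below assumes a hypothetical numbering in which the bits of the chaining variable are encoded by given CNF variables, least significant bit first; the actual variable layout produced by Transalg is not described here.

```python
from typing import List, Sequence

def word_equals_constant(bit_vars: Sequence[int], constant: int) -> List[int]:
    """Literals whose conjunction states: word == constant.

    bit_vars[i] is the (hypothetical) CNF variable encoding bit i of the
    word, least significant bit first; literals use DIMACS signs."""
    width = len(bit_vars)
    if not 0 <= constant < 2 ** width:
        raise ValueError("constant does not fit into the word")
    return [v if (constant >> i) & 1 else -v for i, v in enumerate(bit_vars)]

# Hypothetical numbering: the chaining variable of some step uses variables 513..544.
bits = list(range(513, 545))
zero_constraint = word_equals_constant(bits, 0)  # only negative literals
```

With the constant 0 every literal of the conjunction is negative, i.e. the constraint consists only of literals with negation.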
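Consider a new set of Boolean variables $U = \{u_1, \dots, u_r\}$, $U \cap X = \emptyset$, one variable per relaxation constraint. We call such variables switching variables. Let us associate with an arbitrary constraint $R_j$ of the kind (3) the following CNF:

$$C_j = (\neg u_j \vee l_1^{j}) \wedge (\neg u_j \vee l_2^{j}) \wedge \dots \wedge (\neg u_j \vee l_{s_j}^{j}).$$

Note that if $u_j = 1$, the literals $l_1^{j}, \dots, l_{s_j}^{j}$ are derived from CNF $C_j$ by the Unit Propagation (UP) rule , and this new information is then propagated further according to UP. On the other hand, if $u_j = 0$, every clause of $C_j$ is satisfied, i.e. $C_j|_{u_j = 0} \equiv 1$, so in this case the constraint $R_j$ gives no additional information. We say that the constraint $R_j$ is active if $u_j = 1$ and inactive if $u_j = 0$.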
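The effect of a switching variable can be checked with a small unit propagation routine. This is a minimal sketch with DIMACS-style integer literals and a naive propagation loop (real CDCL solvers use watched literals); the constraint and the variable numbers are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Clause = List[int]

def switching_cnf(u: int, literals: Iterable[int]) -> List[Clause]:
    """Clauses (-u OR l) for each literal l of the constraint: u = 1 enforces all of them."""
    return [[-u, l] for l in literals]

def unit_propagate(cnf: List[Clause], assumptions: Iterable[int]) -> Tuple[Set[int], bool]:
    """Exhaustive unit propagation; returns (assigned literals, conflict found)."""
    assigned = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in cnf:
            if any(l in assigned for l in clause):
                continue                      # clause already satisfied
            free = [l for l in clause if -l not in assigned]
            if not free:
                return assigned, True         # every literal is false
            if len(free) == 1:
                assigned.add(free[0])         # unit clause: its literal is forced
                changed = True
    return assigned, False

# Constraint "x1 = 0, x2 = 0, x3 = 1" guarded by switching variable 100.
cnf = switching_cnf(100, [-1, -2, 3])
active, _ = unit_propagate(cnf, [100])     # contains -1, -2 and 3
inactive, _ = unit_propagate(cnf, [-100])  # only -100: nothing is derived
```

Setting the switching variable to 1 forces every literal of the constraint, while setting it to 0 satisfies all of its clauses and derives nothing.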
Let $U$ be the set of switching variables. The set of all possible assignments of the variables from $U$ is $\{0,1\}^{r}$. Thus, each nonzero Boolean vector $\lambda \in \{0,1\}^{r}$ specifies a set of active relaxation constraints from $R$. Our first goal is to learn how to distinguish more effective sets of relaxation constraints from less effective ones (in the sense of increasing the efficiency of the corresponding SAT-based preimage attack). To solve this problem we use an approach similar to the one applied in [19, 20] for searching SAT partitionings  of SAT instances arising in cryptanalysis problems. In particular, we introduce a measure of efficiency for an arbitrary set of relaxation constraints from $R$ and consider the problem of finding sets of relaxation constraints with good efficiency as the problem of maximizing a specially defined function over the Boolean hypercube $\{0,1\}^{r}$.
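The problem of choosing an adequate measure of efficiency for relaxation constraints is quite nontrivial. At this stage, after a large number of experiments, the measure was defined as follows. Consider an arbitrary vector $\lambda \in \{0,1\}^{r}$, and let $\{j_1, \dots, j_m\}$ be the set of indices of its components equal to 1. These components define a set of active relaxation constraints from $R$, namely, the constraints $R_{j_1}, \dots, R_{j_m}$. Consider the following CNF:

$$C_\lambda(\chi) = C(\chi) \wedge C_{j_1} \wedge \dots \wedge C_{j_m} \wedge u_{j_1} \wedge \dots \wedge u_{j_m}. \tag{4}$$

Everywhere below, we use the notation $C_\lambda(\chi) \vdash_{UP} l$ to denote that the literal $l$ is derived from CNF (4) using UP. By $X^{in}$ we denote the set of Boolean variables in $C$ that encode the unknown 512-bit input of the considered function.
For an arbitrary $\lambda \in \{0,1\}^{r}$ we consider the function

$$F_\chi(\lambda) = \left|\left\{\, l : l \in \{x, \neg x\} \text{ for some } x \in X^{in},\ C_\lambda(\chi) \vdash_{UP} l \,\right\}\right|. \tag{5}$$

In other words, $F_\chi(\lambda)$ is the number of literals over the variables from $X^{in}$ that are derived by UP from CNF (4) as a result of activating the relaxation constraints corresponding to the vector $\lambda$.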
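A minimal sketch of evaluating the objective function (5) on a toy instance follows. The base CNF stands in for the CNF with the substituted hash value, all variable numbers are hypothetical, and inactive switching variables are assigned 0 (an assumption of this sketch). A UP conflict is reported as `None`, since such a set of constraints is already inconsistent with the instance.

```python
from typing import Iterable, List, Optional, Sequence, Set

Clause = List[int]

def unit_propagate(cnf: List[Clause], assumptions: Iterable[int]) -> Optional[Set[int]]:
    """Exhaustive unit propagation; returns the assigned literals, or None on conflict."""
    assigned = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in cnf:
            if any(l in assigned for l in clause):
                continue
            free = [l for l in clause if -l not in assigned]
            if not free:
                return None
            if len(free) == 1:
                assigned.add(free[0])
                changed = True
    return assigned

def objective(lam: Sequence[int], base_cnf: List[Clause],
              constraint_cnfs: Sequence[List[Clause]], switch_vars: Sequence[int],
              input_vars: Set[int]) -> Optional[int]:
    """Number of literals over input_vars derived by UP once the constraints
    selected by the 0/1 vector lam are activated (None if UP hits a conflict)."""
    cnf = list(base_cnf)
    for clauses in constraint_cnfs:
        cnf.extend(clauses)
    # Active constraints get u_j = 1, inactive ones u_j = 0 (assumption of this sketch).
    assumptions = [u if bit else -u for u, bit in zip(switch_vars, lam)]
    derived = unit_propagate(cnf, assumptions)
    if derived is None:
        return None
    return sum(1 for l in derived if abs(l) in input_vars)

# Toy instance: inputs 1..3, base CNF (x1 OR x2) AND (NOT x1 OR x3);
# constraint 1: x1 = 0 (switch 11), constraint 2: x3 = 0 (switch 12).
BASE = [[1, 2], [-1, 3]]
CONSTRAINTS = [[[-11, -1]], [[-12, -3]]]
```

Here activating only the second constraint derives all three input literals, while activating only the first derives two, so the second constraint scores higher.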
We consider the problem of maximizing (5) over the Boolean hypercube $\{0,1\}^{r}$. Function (5) is a black-box function, and its analytical properties are unknown. Therefore, it is justified to use metaheuristic algorithms to maximize (5). At this stage, we implemented a special variant of the Tabu Search algorithm . In the computational experiments discussed below, we considered Hamming neighborhoods of radius 1 in $\{0,1\}^{r}$. The pseudocode of the algorithm is presented below.
Let us give a more detailed description of the A1 algorithm. The input of A1 is the CNF encoding the MD4-39 preimage finding problem for a known hash value $\chi$ and a starting point $\lambda_{start}$ with a corresponding set of relaxation constraints of the kind (3). Either a random point or some known point can be chosen as the starting point. The contents of the lists $L_1$ and $L_2$ are initialized by the function initializeLists. At the initial moment the list $L_1$ is empty, $L_2$ contains the point $\lambda_{start}$, the record point $\lambda_{best}$ is equal to $\lambda_{start}$, and $F_{best}$ is the value of the objective function $F_\chi(\lambda_{start})$.
In the main loop of the algorithm, the neighborhood of the current center $\lambda_{center}$, denoted by $N(\lambda_{center})$, is considered. The function getNewPoint chooses any unchecked point from $N(\lambda_{center})$ as the current point $\lambda$. The function markPointInTabuLists adds the point $\lambda$ to $L_2$ and then marks $\lambda$ as checked in all neighborhoods of points from $L_2$ that contain $\lambda$. This allows the algorithm to avoid re-processing the same points. If the neighborhood of some point contains only checked points, this point is moved to $L_1$.
For the current point $\lambda$ and the corresponding CNF of the kind (4), the function isCorrectPoint runs a SAT solver for a short period of time. If the CNF is proven unsatisfiable, the algorithm moves to the next point from $N(\lambda_{center})$. Otherwise, the value $F_\chi(\lambda)$ is computed and compared with $F_{best}$.
If the value $F_{best}$ was not improved in the neighborhood of $\lambda_{center}$, a new center must be selected from $L_2$. The function getNewCenter chooses the point from $L_2$ whose objective function value is closest to the current record $F_{best}$.
The algorithm terminates if a given time limit is exceeded or the entire search space has been processed (in this case $L_2$ is empty). The output of the algorithm is the point $\lambda_{best}$ and the corresponding value of the objective function $F_{best}$.
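The following Python sketch mirrors the description of A1 under several simplifying assumptions, so it should not be read as the authors' implementation: an evaluation budget replaces the time limit; the objective is any black-box callable that returns `None` for points screened out by the short SAT run; a point counts as checked as soon as it is evaluated (instead of per-neighborhood marks); and the center moves to a new record point as soon as one is found (first improvement). The list names `L1` and `L2` are chosen here for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Point = Tuple[int, ...]

def neighbourhood(p: Point) -> List[Point]:
    """Hamming neighbourhood of radius 1 in {0,1}^n."""
    return [p[:i] + (1 - p[i],) + p[i + 1:] for i in range(len(p))]

def tabu_search(start: Point,
                score: Callable[[Point], Optional[int]],
                max_evals: int = 10_000) -> Tuple[Point, int]:
    """Maximize a black-box score over {0,1}^n with a simple tabu search.

    score(p) returns None for screened-out points (e.g. a short SAT run
    proved the corresponding CNF unsatisfiable)."""
    checked: Dict[Point, Optional[int]] = {start: score(start)}
    best, best_val = start, checked[start]
    assert best_val is not None, "the starting point must not be screened out"
    L1: Set[Point] = set()      # points whose neighbourhoods are fully checked
    L2: List[Point] = [start]   # valid checked points with unchecked neighbours
    center, evals = start, 1
    while L2 and evals < max_evals:
        improved = False
        for p in neighbourhood(center):
            if p in checked:
                continue        # tabu: never re-evaluate a point
            val = score(p)
            evals += 1
            checked[p] = val
            if val is not None:
                L2.append(p)
                if val > best_val:          # new record point
                    best, best_val, improved = p, val, True
                    break
            if evals >= max_evals:
                break
        # Move points whose neighbourhoods are fully checked from L2 to L1.
        done = [q for q in L2 if all(n in checked for n in neighbourhood(q))]
        for q in done:
            L2.remove(q)
            L1.add(q)
        if improved:
            center = best
        elif L2:
            # No improvement around the centre: continue from the L2 point
            # whose value is closest to the current record.
            center = min(L2, key=lambda q: best_val - checked[q])
    return best, best_val
```

For instance, on $\{0,1\}^6$ with `score = sum` the search returns the all-ones point with value 6; returning `None` from `score` imitates points that the short SAT run refutes.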
IV Computational experiments
In this section we describe computational results of the MD4-39 preimage attack that uses the method of relaxation constraints generation described above. At the current stage, the A1 algorithm is implemented as a single-threaded application. To calculate the value of function (5), we use the Unit Propagation procedure implemented in all modern CDCL solvers.
Everywhere below, constraints of the kind (3) consisting only of negative literals were used as relaxation constraints. Thus, we used constraints of Dobbertin's type with the constant 0 (all bits of the chaining variable set to zero).
Note that the structure of the MD4-39 hash function makes it impossible to impose constraints on the first four steps and on the last four steps (those preceding the calculation of the final hash value) of the MD4-39 algorithm. Accordingly, the sets of new relaxation constraints were selected (via the values of the corresponding switching variables) from a set of $39 - 8 = 31$ constraints, one per remaining step. Thus, the problem of maximizing function (5) over the Boolean hypercube $\{0,1\}^{31}$ was considered.
Early experiments showed that some sets of relaxation constraints produce CNFs whose unsatisfiability can be proven quite quickly (within a few seconds). Therefore, in the practical implementation of the algorithm, for each assignment of the switching variables that specifies a set of relaxation constraints, not only was the value of function (5) calculated, but a SAT solver was also run on the corresponding SAT instance with a short time limit. This step allows some points to be screened out without computing the objective function.
To summarize, the A1 algorithm performs the following actions: selection of a starting search point; screening out points for which unsatisfiability is proven quickly; accumulation of all record points; escape from local maxima.
The A1 algorithm was run on one core of an Intel i7-3770K (3.5 GHz) processor under Linux (Ubuntu 16.04). In all computational experiments, the MD4-39 preimage finding problem for the hash value  was considered. For the points obtained by A1 with objective function values close to the maximal possible value (i.e. 512), we established that the corresponding sets of relaxation constraints define unsatisfiable CNFs. The satisfiability problems for such CNFs were considered as separate problems, which in some cases required a significant amount of time. Thus, it was necessary to select points for which there was a good chance that the corresponding CNF of the kind (4) is satisfiable. To find such points, the following heuristic was used: first, select only those points where the value of the function was improved (i.e. record points); second, among them select the points whose function values lie in the interval . Record points made up approximately 2% of all points processed in several hours, and the promising points identified by this heuristic made up 0.5%. For each promising point, the minisat2.2 SAT solver was applied to the corresponding CNF with a small time limit (60 seconds).
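The two-stage selection of promising points can be sketched as a filter over the search history. The bounds of the value interval are not given in the text above, so `lo` and `hi` are hypothetical parameters; the history is assumed to be the sequence of (point, value) pairs in evaluation order, with `None` for screened-out points. Each selected point would then be handed to a SAT solver with a short time limit, which is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, ...]
History = Sequence[Tuple[Point, Optional[int]]]

def record_points(history: History) -> List[Tuple[Point, int]]:
    """Points whose value improved on every value seen before them."""
    records: List[Tuple[Point, int]] = []
    best: Optional[int] = None
    for point, value in history:
        if value is None:
            continue  # screened out by the short SAT run
        if best is None or value > best:
            records.append((point, value))
            best = value
    return records

def promising_points(history: History, lo: int, hi: int) -> List[Tuple[Point, int]]:
    """Record points whose objective value lies in [lo, hi] (bounds hypothetical)."""
    return [(p, v) for p, v in record_points(history) if lo <= v <= hi]
```

Filtering record points first and then applying the interval reflects the ordering described above, so the promising set is always a subset of the record set.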
As a result of the above actions, two new sets of relaxation constraints were obtained. These sets are specified by the following vectors of values of switching variables from :
Using these sets, one can find MD4-39 preimages for the known hash values  and  within a minute of minisat2.2 runtime, whereas with the constraints from  solving the preimage finding problem for  takes about 2 hours, and the problem for  cannot be solved within 8 hours. The corresponding results are presented in Table I, where  denotes the set of relaxation constraints described in  and  denotes the variant of Dobbertin's constraints from  with constant . Below, these relaxation constraints are specified by the vectors of values of switching variables from  (in notation similar to that of  and ):
Of particular interest is that the new sets of relaxation constraints  and  also make it possible to consistently find MD4-39 preimages for randomly generated 128-bit hash values. To obtain this result, we considered a test set of 500 randomly generated vectors from $\{0,1\}^{128}$. Each vector from this set was taken as a hash value of the MD4-39 hash function, and the preimage finding problem for this value was solved using constraints  and . For the majority of the tasks (65-75%), solutions were successfully found by the minisat2.2 SAT solver, with an average time of less than 1 minute per preimage. The remaining tasks (25-35%) corresponded to 128-bit vectors for which there were no MD4-39 preimages under constraints  and . These results are presented in Table II. Note that even within a few hours we did not manage to solve the preimage finding problem for any vector from the test set using the constraints from  or .
In the present paper, a new SAT-based preimage attack on the 39-step variant of the MD4 cryptographic hash function is proposed. This attack makes it possible to solve the MD4-39 preimage finding problem for 65-75% of randomly generated 128-bit vectors, spending less than a minute of minisat2.2 runtime on one such vector. The proposed attack is much more effective than the best previously known attack on this truncated variant of the MD4 hash function, presented about 10 years ago in .
We intend to extend the approach described in this paper to the preimage finding problem for MD4-k, $k \geq 40$. Preliminary results show that the corresponding problem for MD4-40 demands significantly more computational resources than for MD4-39: the relaxation constraints constructed using the method described in this paper do not make it possible to solve the MD4-40 preimage finding problem on a single processor core. At the same time, no such effect is observed between the preimage attacks on MD4-38 and MD4-39. In the near future we plan to apply parallel SAT solvers to the inversion problems of MD4-k, $k \geq 40$, with relaxation constraints constructed using the method presented in this paper.
Table I

| Relaxation constraints | Solving time (s) |
|---|---|

Table II

| Relaxation constraints | Avg. solving time (s) | Max. solving time (s) | Solved instances with preimages (SAT), % of total | Solved instances with no preimages (UNSAT), % of total |
|---|---|---|---|---|
VI Related Work
The first mention of the approach to constructing hash functions that is widely known today as the Merkle-Damgard construction can be found in . In  and  R. Merkle and I. Damgard independently described a number of important properties of hash functions based on this construction. One of the first practical implementations of the Merkle-Damgard construction was the MD4 hash function  developed by R. Rivest. In  the MD4 hash function was completely compromised with respect to the collision attack. The collision search problem for functions of the MD family was first formulated as SAT in . However, practical results in this direction were obtained later in . The propositional encodings presented in  made it possible to find collisions for the MD4 hash function (with the help of modern SAT solvers) about 1000 times faster than in .
The resistance of the MD4 hash function to the preimage attack was studied in a number of works. Today it is generally accepted that MD4 is not resistant to the preimage attack, although the best known preimage attack on the full-round version of MD4 is theoretical . The first practical preimage attack on a truncated variant of MD4 was implemented by H. Dobbertin: the algorithm presented in  finds preimages of MD4-32 on a personal computer. As far as we know, for the last 10 years the best practical attack on truncated variants of MD4 was the attack described in , in which the minisat  SAT solver was used to find preimages of MD4-39 weakened by additional constraints. In the present paper we significantly improve the results presented in .
The research was funded by Russian Science Foundation (project No. 16-11-10046).
-  B. Schneier, Applied Cryptography - Protocols, Algorithms, and Source Code in C, 2nd Edition. Wiley, 1996.
-  S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system.” [Online]. Available: http://bitcoin.org/bitcoin.pdf
-  D. Pointcheval and J. Stern, “Security arguments for digital signatures and blind signatures,” Journal of Cryptology, vol. 13, no. 3, pp. 361–396, 2000.
-  N. Koblitz and A. J. Menezes, “The random oracle model: A twenty-year retrospective,” Des. Codes Cryptography, vol. 77, no. 2-3, pp. 587–610, 2015.
-  R. C. Merkle, “A certified digital signature,” in Advances in Cryptology - CRYPTO ’89, Proceedings, ser. Lecture Notes in Computer Science, G. Brassard, Ed., vol. 435. Springer, 1989, pp. 218–238.
-  I. Damgård, “A design principle for hash functions,” in Advances in Cryptology - CRYPTO ’89, Proceedings, ser. Lecture Notes in Computer Science, G. Brassard, Ed., vol. 435. Springer, 1989, pp. 416–427.
-  R. L. Rivest, “The MD4 message digest algorithm,” in Advances in Cryptology - CRYPTO ’90, Proceedings, ser. Lecture Notes in Computer Science, A. Menezes and S. A. Vanstone, Eds., vol. 537. Springer, 1990, pp. 303–311.
-  X. Wang, X. Lai, D. Feng, H. Chen, and X. Yu, “Cryptanalysis of the hash functions MD4 and RIPEMD,” in Advances in Cryptology - EUROCRYPT 2005, Proceedings, ser. Lecture Notes in Computer Science, R. Cramer, Ed., vol. 3494. Springer, 2005, pp. 1–18.
-  X. Wang and H. Yu, “How to break MD5 and other hash functions,” in Advances in Cryptology - EUROCRYPT 2005, Proceedings, ser. Lecture Notes in Computer Science, R. Cramer, Ed., vol. 3494. Springer, 2005, pp. 19–35.
-  H. Dobbertin, “The first two rounds of MD4 are not one-way,” in Fast Software Encryption, ser. Lecture Notes in Computer Science, S. Vaudenay, Ed. Springer Berlin Heidelberg, 1998, vol. 1372, pp. 284–292.
-  D. De, A. Kumarasubramanian, and R. Venkatesan, “Inversion attacks on secure hash functions using SAT solvers,” in Theory and Applications of Satisfiability Testing - SAT 2007, Proceedings, ser. Lecture Notes in Computer Science, J. Marques-Silva and K. A. Sakallah, Eds., vol. 4501. Springer, 2007, pp. 377–382.
-  N. Eén and N. Sörensson, “An extensible SAT-solver,” in SAT, ser. Lecture Notes in Computer Science, E. Giunchiglia and A. Tacchella, Eds., vol. 2919. Springer, 2003, pp. 502–518.
-  I. Gribanova, O. Zaikin, I. Otpuschennikov, and A. Semenov, “Using parallel SAT solving algorithms to study the inversion of MD4 hash function,” in Parallel Computing Technologies, PCT 2017, Kazan, Russia, April 3-7, 2017, Proceedings, 2017, pp. 100–109.
-  L. Erkök and J. Matthews, “Pragmatic equivalence and safety checking in Cryptol,” in Proceedings of the 3rd ACM Workshop Programming Languages meets Program Verification, PLPV 2009, Savannah, GA, USA, 2009, pp. 73–82.
-  P. Janicic, “URSA: a system for uniform reduction to SAT,” Logical Methods in Computer Science, vol. 8, no. 3, pp. 1–39, 2012.
-  I. Otpuschennikov, A. Semenov, I. Gribanova, O. Zaikin, and S. Kochemazov, “Encoding cryptographic functions to SAT using Transalg system,” in ECAI 2016, Proceedings, ser. Frontiers in Artificial Intelligence and Applications, G. A. Kaminka, M. Fox, P. Bouquet, E. Hüllermeier, V. Dignum, F. Dignum, and F. van Harmelen, Eds., vol. 285. IOS Press, 2016, pp. 1594–1595.
-  J. C. King, “Symbolic execution and program testing,” Commun. ACM, vol. 19, no. 7, pp. 385–394, 1976.
-  W. F. Dowling and J. H. Gallier, “Linear-time algorithms for testing the satisfiability of propositional Horn formulae,” The Journal of Logic Programming, vol. 1, no. 3, pp. 267–284, Oct. 1984.
-  A. Semenov and O. Zaikin, “Using Monte Carlo method for searching partitionings of hard variants of Boolean satisfiability problem,” in Parallel Computing Technologies, ser. Lecture Notes in Computer Science, V. Malyshkin, Ed., vol. 9251. Springer International Publishing, 2015, pp. 222–230.
-  A. Semenov and O. Zaikin, “Algorithm for finding partitionings of hard variants of Boolean satisfiability problem with application to inversion of some cryptographic functions,” SpringerPlus, vol. 5, no. 1, pp. 1–16, 2016.
-  A. E. J. Hyvärinen, “Grid based propositional satisfiability solving,” Ph.D. dissertation, Aalto University, Helsinki, Finland, 2011.
-  F. Glover and M. Laguna, TABU search. Kluwer, 1999.
-  R. C. Merkle, “Secrecy, authentication, and public key systems,” Ph.D. dissertation, Stanford University, Stanford, USA, 1979.
-  D. Jovanovic and P. Janicic, “Logical analysis of hash functions,” in Frontiers of Combining Systems, 5th International Workshop, FroCoS 2005, Proceedings, ser. Lecture Notes in Computer Science, B. Gramlich, Ed., vol. 3717. Springer, 2005, pp. 200–215.
-  I. Mironov and L. Zhang, “Applications of SAT solvers to cryptanalysis of hash functions,” in SAT, ser. Lecture Notes in Computer Science, A. Biere and C. P. Gomes, Eds., vol. 4121. Springer, 2006, pp. 102–115.
-  G. Leurent, “MD4 is not one-way,” in Fast Software Encryption, ser. Lecture Notes in Computer Science, K. Nyberg, Ed., vol. 5086. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 412–428.