
GE852: A Dataset of 852 Game Engines

Game engines provide a platform for developers to build games with an interface tailored to handle the complexity of game development. To reduce effort and improve the quality of game development, there is a strong need to understand and analyze the quality of game engines and their various aspects such as API usability, code quality and code reuse. To the best of our knowledge, no dataset in the literature caters to game engines. To this end, we present GE852, a dataset of 852 game engine repositories mined from GitHub in two languages, namely Java and C++. The dataset contains metadata of all the mined repositories, including commits, pull requests and issues. We believe that our dataset can lay a foundation for empirical investigation in the area of game engines.


I Introduction

Games are considered one of the most sophisticated and complex forms of software [25][3]. Studies indicating positive effects of playing video games have opened up new opportunities for game developers to develop innovative games [9]. This has also led to a drastic increase in the number of games available on platforms such as Google Play Store and Steam. The rise in the number of games has led to diversified ways of game development, and researchers have noted that the process of game development is different from software development [18, 22]. At the same time, game software development is considered an effort-intensive activity [1]. A recent study by Kasurinen et al. [13] reveals several concerns of developers in game development, such as code reusability, API dependencies, compatibility and maintenance.

Game engines provide developers with reusable software development kits with inbuilt APIs and functionality, which help developers perform various tasks more efficiently [12]. Paul et al. [19] discussed different types of game engines, their specifications and features in detail. The development of game engines is itself considered a complex software development process [14] with complex implementation [10]. However, the increasing number of game engines makes it difficult for game developers to choose a game engine that is suitable in the context of their game development [19]. It is also critical for game developers to analyze game engines for API compatibility, software evolution, dependencies, code clones and reuse, and other issues that align with their own software development process. Existing empirical studies on game engines have largely focused on performance evaluation. Messaoudi et al. [17, 16] have presented a study on performance evaluation of the CPU and GPU of different modules in Unity3D. However, there is a strong need to perform empirical research on other aspects of game engines, such as energy efficiency, code clones and reuse, API usage patterns, release engineering and pull requests, to support game developers.

While several datasets exist to support empirical research in broad areas of software engineering, such as API usage patterns [26, 24], energy consumption [4, 15], code clones [21] and source code metrics [11], to the best of our knowledge no dataset provides game engines for empirical research. With this background, we present GE852, a dataset of 852 distinct game engines mined from GitHub in two programming languages, Java and C++; the dataset is available online. Of these, 408 game engines are written in Java and 444 in C++. Overall, GE852 contains metadata of 2627 game engines including the forked projects. We believe that our dataset will help foster new research directions in the domain of game engines.

The remainder of this paper is structured as follows. Section II discusses the data collection process and Section III details the database schema. Section IV lists the potential usages of the dataset for researchers and practitioners. Finally, we discuss limitations in Section V, followed by conclusions and future work in Section VI.

Fig. 1: Flow of extraction process

II Data Collection

The data extraction process is shown in Figure 1 and is elaborated here. The collection of metadata began with preliminary automated filtering, followed by manual validation, and ended with mining the actual repositories using GHTorrent, which is widely used for mining GitHub repositories [8].

We followed the steps below to create the dataset.

Step 1 - Collection of game engine repository links. As an initial step, we queried GitHub with the keywords “game engine”, which returned over 29 thousand repositories.

Step 2 - Filtration of game engines by programming language. We then restricted the results to game engines written in Java and C++.

Step 3 - Manual validation of the obtained results. We noticed that even after filtration, some repositories did not truly belong to the game engine category that we intended to download. An instance of this was first noticed on the second search page of Java game engines when sorted by most stars, where the repository “arjanfrans/mario-game” was listed among game engine repositories. Anticipating many more false positives, we manually validated every listed repository by reading its description. This gave us a corpus of GitHub links pointing to game engine repositories.

Step 4 - Collection of metadata. The collection of metadata from GitHub could be done either using custom-made scripts or using an open source tool such as PyDriller [23], RepoDriller, GitMiner or Boa [6]. GHTorrent has been used by researchers to mine data from GitHub since 2012 and continuously publishes daily dumps. For our study, we independently mined data using GHTorrent rather than using the dumps it provides.

Step 5 - Store GitHub API responses in MongoDB. We stored the raw data as MongoDB documents and the structured data in MariaDB (a fully open source fork of MySQL) tables. The raw data contains the responses to every REST API call.

Step 6 - Dropping irrelevant tables. By default, GHTorrent retrieves the metadata with fields as described by its schema. We retained the tables and attributes that are relevant for conducting empirical research in the context of game engines and dropped the rest.
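
As an illustration of Steps 1 and 2, the following minimal sketch builds repository-search URLs for the GitHub REST API, combining the keywords with a language qualifier and sorting by stars. The exact query string used for GE852 is not stated above, so the quoted phrase, the page size and the helper name build_search_url are assumptions made for this example. The sketch only constructs URLs and performs no network request.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(keywords, language, page=1, per_page=100):
    """Return a GitHub repository-search URL for a phrase and one language.

    Hypothetical helper: the query form is an assumption, not the
    query reported for GE852.
    """
    # `language:` is a GitHub search qualifier; the phrase is quoted
    # so that both keywords are matched together.
    query = f'"{keywords}" language:{language}'
    params = {
        "q": query,
        "sort": "stars",
        "order": "desc",
        "page": page,
        "per_page": per_page,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# One query per language covered by the dataset.
urls = [build_search_url("game engine", lang) for lang in ("Java", "C++")]
for url in urls:
    print(url)
```

Results obtained this way would still need the manual validation of Step 3, since a keyword match does not guarantee that a repository is a game engine.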

Fig. 2: Database Schema

III Database Schema

The schema of the database is shown in Figure 2. Descriptions of all 11 tables are given below:

  • projects: This table contains attributes such as name, url, descriptor, language, owner_id that describe preliminary information about the repository. The forked_from field indicates the project_id of the project from where it has been forked. It contains a NULL value for repositories that have not been forked. The deleted field indicates if the repository has been deleted from GitHub.

  • users: It gives the information of all the users related to game engine projects. It contains various attributes depicting the information about the users such as name, email, login, company, country_code. The fake attribute states whether the user is real or fake. Real users can own projects, push commits and create pull requests whereas fake users can only appear as authors or committers of commits. There is another attribute with name deleted, which indicates an earlier presence of the user on GitHub whose details no longer exist.

  • commits: project_id refers to the project with which a particular commit was first associated, which might not be the project it was initially pushed to. Another attribute of this table is sha, which acts as a global identifier for each commit.

  • projects_commits: It stores the relation between commits and projects, as more than one project can share the same commit if one is a fork of another.

  • commit_comments: It stores the comments associated with each commit. If a commit is associated with a pull request, its comments can be obtained from the pull_request_comments table.

  • issues: The id field is the primary key, whereas repo_id is a foreign key pointing to the projects table. assignee_id refers to the user to whom the issue was assigned at the time of its creation. issue_id is the unique identifier given by GitHub to each issue. The pull_request and pull_request_id fields link an issue to its associated pull request. The creation date of the issue is stored in the created_at column.

  • issue_comment: It stores information about the discussions in the issues section on GitHub. Each comment is related to an issue by issue_id. user_id uniquely identifies the user who made the comment.

  • pull_requests: A pull request is initiated from (head_repo_id, head_commit_id) to (base_repo_id, base_commit_id), so the head and base information is stored in these attributes. pull_request_id is a unique GitHub identifier for pull requests. intra_branch indicates whether the head and base repositories of the pull request are the same or different.

  • pull_request_comments: It stores all the comments made on commits associated with any pull request.

  • pull_request_commits: It stores the association of a commit with a pull request.

  • pull_request_history: This table contains the history of pull_requests.

In Figure 2, the dotted and solid lines depict the connections between different tables in the database schema. A solid line denotes an identifying relationship, which means that the primary key of the parent entity is included in the primary key of the child entity. A dotted line denotes a non-identifying relationship, which means that the primary key of the parent table is included in the child entity but not as part of its primary key.
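
To make the two kinds of relationship concrete, the sketch below recreates three of the tables in an in-memory SQLite database, used here only as a stand-in for MariaDB. The column types, the id primary keys, the commit_id column and the sample rows are assumptions for illustration; only the table names and the attributes name, language, forked_from, deleted, sha and project_id come from the descriptions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE projects (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    language    TEXT,
    forked_from INTEGER REFERENCES projects(id),  -- NULL when not a fork
    deleted     INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE commits (
    id         INTEGER PRIMARY KEY,
    sha        TEXT NOT NULL UNIQUE,
    -- Non-identifying: the parent key is an ordinary foreign key column.
    project_id INTEGER REFERENCES projects(id)
);
CREATE TABLE projects_commits (
    -- Identifying: both parent keys are part of the child's primary key.
    project_id INTEGER NOT NULL REFERENCES projects(id),
    commit_id  INTEGER NOT NULL REFERENCES commits(id),
    PRIMARY KEY (project_id, commit_id)
);
""")

# Hypothetical sample rows: two original engines and one fork.
conn.executemany(
    "INSERT INTO projects (id, name, language, forked_from) VALUES (?, ?, ?, ?)",
    [(1, "engine-a", "Java", None),
     (2, "engine-a-fork", "Java", 1),
     (3, "engine-b", "C++", None)],
)
conn.execute("INSERT INTO commits (id, sha, project_id) VALUES (10, 'a1', 1)")
conn.executemany("INSERT INTO projects_commits VALUES (?, ?)", [(1, 10), (2, 10)])

# Distinct (non-forked) engines per language.
originals = conn.execute(
    "SELECT language, COUNT(*) FROM projects "
    "WHERE forked_from IS NULL GROUP BY language ORDER BY language"
).fetchall()

# Number of projects that share the commit with sha 'a1'.
shared = conn.execute(
    "SELECT COUNT(*) FROM projects_commits pc "
    "JOIN commits c ON c.id = pc.commit_id WHERE c.sha = 'a1'"
).fetchone()[0]

print(originals, shared)
```

Filtering on forked_from IS NULL is one way to separate the distinct engines from the forked projects when counting repositories.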

IV Dataset Usage or Applications

| API pattern | Probability of usage |
|---|---|
| SDL2.SDL.UTF8_ToNative | 0.21983 |
| SDL2.SDL.UTF8_ToManaged, SDL2.SDL.SDL_free | 0.15948 |
| OpenDiablo2.Common.Models.Mobs.Stat.GetCurrent, OpenDiablo2.Common.Models.Mobs.Stat.GetMax | 0.10345 |
| OpenDiablo2.Common.Interfaces.Mobs.IStatModifier.GetValue | 0.10345 |
| OpenDiablo2.Common.Interfaces.IRenderWindow.LoadSprite | 0.03017 |

TABLE I: API usage patterns found in the Open Diablo game engine using the PAM tool

The main goal of the GE852 dataset is to create a platform for researchers and practitioners to empirically investigate research challenges in broad areas of software engineering that have been largely under-explored in the context of game engines. We enumerate a set of preliminary research directions based on our dataset.

IV-A Discovering API usage patterns of game engines

There is existing research on API usability for software development [27]. However, API usage in game engines, which is equally important to understand, has not been studied so far. Using our dataset, we conducted a preliminary experiment to extract API usage patterns and the probability of APIs occurring together with other APIs. Table I shows some statistics for one game engine, calculated using a tool called PAM (Probabilistic API Miner) [7]. The table shows that the API OpenDiablo2.Common.Models.Mobs.Stat.GetCurrent is used more frequently than the API OpenDiablo2.Common.Interfaces.IRenderWindow.LoadSprite. We see scope for extensive research in this direction, and as a first step we are working on determining API use and misuse in game engines.
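
As a rough intuition for what a usage probability of a pattern expresses, the sketch below computes a plain relative frequency: the fraction of client methods whose calls include every API in a pattern. This is not PAM's algorithm, which fits a probabilistic model rather than counting occurrences, and the call sequences and shortened API names are hypothetical data made up for this example.

```python
def pattern_support(call_sequences, pattern):
    """Fraction of call sequences that contain every API named in `pattern`."""
    if not call_sequences:
        return 0.0
    wanted = set(pattern)
    hits = sum(1 for calls in call_sequences if wanted.issubset(calls))
    return hits / len(call_sequences)


# Hypothetical client methods, each reduced to the list of APIs it calls.
call_sequences = [
    ["Stat.GetCurrent", "Stat.GetMax"],
    ["Stat.GetCurrent", "Stat.GetMax", "IRenderWindow.LoadSprite"],
    ["SDL.UTF8_ToNative"],
    ["Stat.GetCurrent"],
]

stat_pair = pattern_support(call_sequences, ["Stat.GetCurrent", "Stat.GetMax"])
load_sprite = pattern_support(call_sequences, ["IRenderWindow.LoadSprite"])
print(stat_pair, load_sprite)
```

On this toy input the pair of Stat calls appears in half of the methods and LoadSprite in a quarter, so the pair would rank higher, which mirrors how the rows of Table I are compared.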

IV-B Analyzing pull-requests and commits

Rahman et al. [20] have emphasized the need to study successful and unsuccessful pull-requests. The GE852 dataset could be used to understand the reasons for acceptance or rejection of pull-requests and their relationship to developer- and project-specific information. Beyond this, pull requests and issues can be studied across game engine repositories. The commit history of a project can be analyzed to derive insights that could help developers, and code changes can be analyzed across the version history of the game engines.
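
One simple starting point for such a study is the share of closed pull requests that were merged, computed per base repository. The sketch below does this on an in-memory SQLite database with a handful of made-up rows. The action column of pull_request_history and its values 'opened', 'closed' and 'merged' are assumptions based on the GHTorrent schema and are not described in Section III; the reduced table definitions are likewise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pull_requests (
    id           INTEGER PRIMARY KEY,
    base_repo_id INTEGER NOT NULL
);
CREATE TABLE pull_request_history (
    id              INTEGER PRIMARY KEY,
    pull_request_id INTEGER NOT NULL REFERENCES pull_requests(id),
    action          TEXT NOT NULL  -- assumed values: opened, closed, merged
);
""")

# Hypothetical data: PR 1 merged, PR 2 closed unmerged, PR 3 still open.
conn.executemany("INSERT INTO pull_requests VALUES (?, ?)",
                 [(1, 100), (2, 100), (3, 200)])
conn.executemany(
    "INSERT INTO pull_request_history (pull_request_id, action) VALUES (?, ?)",
    [(1, "opened"), (1, "merged"), (1, "closed"),
     (2, "opened"), (2, "closed"),
     (3, "opened")],
)

# Per base repository: number of closed PRs and how many of them were merged.
ACCEPTANCE_SQL = """
SELECT pr.base_repo_id,
       COUNT(*) AS closed_prs,
       SUM(EXISTS (SELECT 1 FROM pull_request_history m
                   WHERE m.pull_request_id = pr.id
                     AND m.action = 'merged')) AS merged_prs
FROM pull_requests pr
WHERE EXISTS (SELECT 1 FROM pull_request_history c
              WHERE c.pull_request_id = pr.id AND c.action = 'closed')
GROUP BY pr.base_repo_id
ORDER BY pr.base_repo_id
"""
rows = conn.execute(ACCEPTANCE_SQL).fetchall()

# Acceptance rate = merged PRs / closed PRs for each repository.
rates = {repo: merged / closed for repo, closed, merged in rows}
print(rows, rates)
```

Open pull requests are excluded from the denominator because their outcome is not yet known; joining the result with the users and projects tables would allow the rate to be related to developer and project attributes.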

IV-C Issues in game engines

Bissyandé et al. [2] emphasize that issues play a critical role in improving a project’s performance. Issues help software developers understand flaws in specific modules, their source, the potential reasons behind them and, eventually, what measures could be taken to resolve them. There is a need to empirically investigate issues in the context of game engines, as game engines form the basis for the development of a large number of games. Hence, our dataset could be used to conduct empirical studies on understanding, analyzing and reporting issues in game engines.

IV-D Energy Efficiency

In the last few years, there has been an ever-increasing demand for energy-efficient apps and games, especially on mobile platforms. This presents a tremendous opportunity for empirical research that analyzes game engines for energy efficiency, a topic largely ignored in the current literature.

IV-E Code quality and cloning

Code quality and code clones are an active area of research in games [21, 5]. However, we are not aware of existing research on analyzing code clones, code reuse and code quality patterns in game engines, which could be conducted through our dataset.

V Limitations

  • In the current dataset, we consider game engines written in Java and C++ only. We intend to extend the dataset to include game engines in other languages such as C# and JavaScript.

  • Even after careful examination, some noisy data may remain in the dataset. We plan to refine the process and extend the dataset in our future work.

VI Conclusion and Future Work

We have presented GE852, a dataset consisting of 852 game engines mined from GitHub. It contains metadata such as commits, pull requests and issues of the respective game engines. To the best of our knowledge, this is the first dataset made available in the domain of game engines. We believe that this dataset will help in conducting a number of empirical studies on game engines. Analyzing the quality of game engines with this dataset may in turn lead to better game engines. The dataset will also help in exploring why developers choose a particular game engine over others to develop their games. Moreover, this data can help in building tools that benchmark game engines in terms of API usage, structural usability and performance. Thus, we believe that GE852 will open new opportunities for research in the domain of game engines.

References


  • [1] S. Aleem, L. F. Capretz, and F. Ahmed, “Game development software engineering process life cycle: a systematic review,” Journal of Software Engineering Research and Development, vol. 4, no. 1, p. 6, 2016.
  • [2] T. F. Bissyandé, D. Lo, L. Jiang, L. Réveillere, J. Klein, and Y. Le Traon, “Got issues? who cares about it? a large scale investigation of issue trackers from github,” in Software Reliability Engineering (ISSRE), 2013 IEEE 24th International Symposium on.   IEEE, 2013, pp. 188–197.
  • [3] J. Blow, “Game development: Harder than you think,” Queue, vol. 1, no. 10, p. 28, 2004.
  • [4] E. Capra, C. Francalanci, and S. A. Slaughter, “Is software “green”? application development environments and energy efficiency in open source applications,” Information and Software Technology, vol. 54, no. 1, pp. 60–71, 2012.
  • [5] J. R. Cordy and C. K. Roy, “The nicad clone detector,” in Program Comprehension (ICPC), 2011 IEEE 19th International Conference on.   IEEE, 2011, pp. 219–220.
  • [6] R. Dyer, H. A. Nguyen, H. Rajan, and T. N. Nguyen, “Boa: A language and infrastructure for analyzing ultra-large-scale software repositories,” in 35th International Conference on Software Engineering, ser. ICSE’13, May 2013, pp. 422–431.
  • [7] J. Fowkes and C. Sutton, “Parameter-free probabilistic api mining across github,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.   ACM, 2016, pp. 254–265.
  • [8] G. Gousios, “The ghtorrent dataset and tool suite,” in Proceedings of the 10th Working Conference on Mining Software Repositories, ser. MSR ’13.   Piscataway, NJ, USA: IEEE Press, 2013, pp. 233–236.
  • [9] I. Granic, A. Lobel, and R. C. Engels, “The benefits of playing video games.” American psychologist, vol. 69, no. 1, p. 66, 2014.
  • [10] J. Gregory, Game engine architecture.   AK Peters/CRC Press, 2014.
  • [11] S. Haefliger, G. Von Krogh, and S. Spaeth, “Code reuse in open source software,” Management science, vol. 54, no. 1, pp. 180–193, 2008.
  • [12] E. Hudlicka, “Affective game engines: motivation and requirements,” in Proceedings of the 4th international conference on foundations of digital games.   ACM, 2009, pp. 299–306.
  • [13] J. Kasurinen, M. Palacin-Silva, and E. Vanhala, “What concerns game developers?: A study on game development processes, sustainability and metrics,” in Proceedings of the 8th Workshop on Emerging Trends in Software Metrics.   IEEE Press, 2017, pp. 15–21.
  • [14] M. Lewis and J. Jacobson, “Game engines,” Communications of the ACM, vol. 45, no. 1, p. 27, 2002.
  • [15] M. Linares-Vásquez, G. Bavota, C. Bernal-Cárdenas, R. Oliveto, M. Di Penta, and D. Poshyvanyk, “Mining energy-greedy api usage patterns in android apps: an empirical study,” in Proceedings of the 11th Working Conference on Mining Software Repositories.   ACM, 2014, pp. 2–11.
  • [16] F. Messaoudi, A. Ksentini, G. Simon, and P. Bertin, “Performance analysis of game engines on mobile and fixed devices,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 13, no. 4, p. 57, 2017.
  • [17] F. Messaoudi, G. Simon, and A. Ksentini, “Dissecting games engines: The case of unity3d,” in Proceedings of the 2015 International Workshop on Network and Systems Support for Games.   IEEE Press, 2015, p. 4.
  • [18] E. Murphy-Hill, T. Zimmermann, and N. Nagappan, “Cowboys, ankle sprains, and keepers of quality: How is video game development different from software development?” in Proceedings of the 36th International Conference on Software Engineering.   ACM, 2014, pp. 1–11.
  • [19] P. S. Paul, S. Goon, and A. Bhattacharya, “History and comparative study of modern game engines,” International Journal of Advanced Computed and Mathematical Sciences, vol. 3, no. 2, pp. 245–249, 2012.
  • [20] M. M. Rahman and C. K. Roy, “An insight into the pull requests of github,” in Proceedings of the 11th Working Conference on Mining Software Repositories.   ACM, 2014, pp. 364–367.
  • [21] C. K. Roy, J. R. Cordy, and R. Koschke, “Comparison and evaluation of code clone detection techniques and tools: A qualitative approach,” Science of computer programming, vol. 74, no. 7, pp. 470–495, 2009.
  • [22] R. E. S. Santos, C. V. C. de Magalhães, L. F. Capretz, J. S. C. Neto, F. Q. B. da Silva, and A. Saher, “Computer games are serious business and so is their quality: Particularities of software testing in game development from the perspective of practitioners,” CoRR, vol. abs/1812.05164, 2018.
  • [23] D. Spadini, M. Aniche, and A. Bacchelli, “Pydriller: Python framework for mining software repositories,” in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering.   ACM, 2018, pp. 908–911.
  • [24] J. Wang, Y. Dang, H. Zhang, K. Chen, T. Xie, and D. Zhang, “Mining succinct and high-coverage api usage patterns from source code,” in Proceedings of the 10th Working Conference on Mining Software Repositories.   IEEE Press, 2013, pp. 319–328.
  • [25] M. Washburn Jr, P. Sathiyanarayanan, M. Nagappan, T. Zimmermann, and C. Bird, “What went right and what went wrong: an analysis of 155 postmortems from game development,” in Proceedings of the 38th International Conference on Software Engineering Companion.   ACM, 2016, pp. 280–289.
  • [26] H. Zhong, T. Xie, L. Zhang, J. Pei, and H. Mei, “Mapo: Mining and recommending api usage patterns,” in European Conference on Object-Oriented Programming.   Springer, 2009, pp. 318–343.
  • [27] M. F. Zibran, F. Z. Eishita, and C. K. Roy, “Useful, but usable? factors affecting the usability of apis,” in Reverse Engineering (WCRE), 2011 18th Working Conference on.   IEEE, 2011, pp. 151–155.