Artificial Intelligence for Game Playing

Published: 2022-06-20 23:15:55 Editor: wangda1203

Abstract

This paper discusses the history of AI systems for playing games. The part on learning and self-play in AI covers how self-play can be applied to various games, whether deterministic ones such as Chess, Go, and Checkers, or games with hidden information and randomness such as Poker, Bridge, and Backgammon. This raises several questions, including: "How has deep learning been successfully applied to self-play?" and "To what extent have these learning and self-play issues changed through the history of AI?" To answer these questions, research on and experiments with self-play will be discussed.

This will be followed by addressing the extent to which machine learning is important, and which machine learning techniques matter, for developing high-quality AI programs that play games. Finally, how these advancements have changed the history of AI will be addressed as well.

1. Introduction

A satisfying level of computing artificial intelligence (AI) is perceived as necessary by game players these days, as virtual environments have become increasingly realistic. Even today, the AI of virtually all games is predicated on a finite set of actions whose sequence can easily be anticipated by knowledgeable players (Aiolli, 2008). Instead, the behaviour of the players in a game can be classified by applying machine learning techniques to this aim. When a game is taken into consideration, its machine can process/play it in two ways. One of these is self-play, where the system plays against itself repeatedly. The other is learning from opponent moves, where the player and the AI have only restricted data about the game state, and it is part of the game to guess the knowledge hidden by the opponent.

The purpose of the present paper is to look at questions pertaining to three main areas. First, learning issues, such as the importance of machine learning in the development of AI programs for games. Second, self-play issues, such as the extent to which self-play can be applied to games, whether AI gains expertise in such games through self-play, and whether deep learning has been successfully applied in self-play. Finally, a historical perspective will be taken, drawing together all the questions and sub-questions discussed throughout. A brief explanation of the history of AI and gaming is included in this section so as to introduce the topic.

1.1 History of game-playing in AI

There is a long history of games and AI. A lot of research on AI for games is related to the creation of game-playing agents with or without a learning component. Historically, this was the first and, for a long time, the only way to use AI in games. Since artificial intelligence was recognized as a field, early pioneers of computer science have written game-playing programs because they wanted to test whether "computers could solve tasks where intelligence is needed" (Togelius, 2018). The first software that managed to master a game was developed by A.S. Douglas (Togelius, 2018).

A digital version of the Tic-Tac-Toe game was programmed by Douglas in 1952 as part of his doctoral dissertation at Cambridge. Because he used a unique computer (the EDSAC) to run it, no one outside the university could play the game. It was the first graphical computer game to be played between a human and a computer. A mechanical telephone dialler was used to enter a player's move (either a nought or a cross) (Douglas, 1952). A few years later, Arthur Samuel invented the form of machine learning now called reinforcement learning, using a program that learned to play checkers by playing against itself. Reinforcement learning is defined as "including algorithms from the temporal difference learning problem" (Togelius, 2018).

According to Togelius (2018), most of the early research on game-playing AI focused on classic board games such as chess, checkers, and Go, which are beneficial to work with because they are simple to model in code and can be simulated very fast. He also notes that, using AI techniques, modern computers can simulate millions of moves per second.

2. Learning and Self-play in AI

This section concentrates on machine learning and its techniques in AI; on self-play in AI, with some examples of self-play and their outcomes; on the questions related to learning and self-play; and on how new techniques have improved self-play in AI.

2.1 Machine Learning

"Machine learning usually refers to the changes in systems that perform tasks associated with artificial intelligence (AI). Such tasks involve recognition, diagnosis, planning, robot control, prediction, etc" (Nilsson, 1998).

Machine learning techniques are classified into two main categories:

Supervised Learning

Unsupervised Learning

Supervised learning trains on labelled datasets, that is, on existing or known inputs and outputs, whereas in unsupervised learning the outputs are unknown and the algorithm must discover hidden patterns in the input data, so the output cannot be determined in advance (Nilsson, 1998).
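
As a minimal sketch of this distinction (scikit-learn is assumed here purely for illustration; it is not mentioned in the sources cited), a supervised classifier is fitted on known input-output pairs, while a clustering algorithm receives only the inputs:

```python
# Hypothetical illustration of supervised vs. unsupervised learning.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[0.1, 1.0], [0.2, 0.9], [0.9, 0.2], [1.0, 0.1]])  # inputs
y = np.array([0, 0, 1, 1])                                      # known labels

# Supervised: the model is trained on known input/output pairs.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.15, 0.95]]))   # predicts a label for a new input

# Unsupervised: only the inputs are given; structure must be discovered.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                    # cluster assignments, no labels used
```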

Apart from the above-mentioned techniques of machine learning, there are some newer techniques such as self-supervised learning, reinforcement learning, artificial neural networks, support vector machines, and decision tree learning. Self-supervised learning is still a form of supervised learning; the only difference is that the training data is labelled automatically, without any human interaction.

2.2 Self-play

Self-play refers to an artificial game-playing system that acquires skill at a game by playing against clones of itself, instead of acquiring the skill through tuning of the system by a human expert (assignment specification). According to many studies, self-play is still not completely understood, because the performance of the system is not guaranteed. To explain this claim, this section includes a few well-known examples of self-play and their results.

The AlphaZero algorithm learned to play Go, chess, and shogi at a superhuman level. In other words, it played against itself through reinforcement learning. The performance of AlphaZero during self-play reinforcement learning is measured on the Elo scale as a function of training steps. In chess, AlphaZero defeated Stockfish after only 4 hours (300,000 steps); in shogi, AlphaZero first defeated Elmo after 2 hours (110,000 steps); and in Go, AlphaZero beat AlphaGo Lee for the first time after 30 hours (74,000 steps). The training algorithm achieved similar performance across all independent runs, suggesting that the high performance of AlphaZero's training algorithm is repeatable (Silver et al., 2018).

In 1959, a program to play checkers was written by Arthur L. Samuel (see Samuel, 1959). The system learned from samples gathered from both self-play and human play, such as "board configuration, game results" (ibid.). The system used a combination of methods to evaluate the board and determine its next move, such as a lookup table, a depth-limited search tree pruned with alpha-beta, and an evaluation function consisting of manually engineered features (ibid.). After identifying some limitations in the machine learning techniques used earlier, such as limited progress and the optimization of playing strategies, he concluded that machine learning techniques, with some advancements, can now be used more efficiently than in the earlier stages and can be applied to many problems.
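
The combination Samuel describes, a depth-limited search tree with alpha-beta pruning and a static evaluation function, can be sketched roughly as follows. The board representation, move generator, and evaluation here are hypothetical placeholders, not Samuel's actual program:

```python
# Rough sketch of depth-limited alpha-beta search over a game tree.
# `legal_moves`, `apply_move`, and `evaluate` stand in for a real move
# generator and a hand-engineered evaluation function.

def alphabeta(state, depth, alpha, beta, maximizing, legal_moves, apply_move, evaluate):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)            # static evaluation at the search horizon
    if maximizing:
        value = float("-inf")
        for move in moves:
            value = max(value, alphabeta(apply_move(state, move), depth - 1,
                                         alpha, beta, False,
                                         legal_moves, apply_move, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cut-off: opponent will avoid this branch
                break
        return value
    else:
        value = float("inf")
        for move in moves:
            value = min(value, alphabeta(apply_move(state, move), depth - 1,
                                         alpha, beta, True,
                                         legal_moves, apply_move, evaluate))
            beta = min(beta, value)
            if alpha >= beta:             # alpha cut-off
                break
        return value

# Toy usage: a made-up game where a "state" is just a number, each move adds 1 or 2,
# and the evaluation is simply the number itself.
print(alphabeta(0, 3, float("-inf"), float("inf"), True,
                legal_moves=lambda s: [1, 2] if s < 6 else [],
                apply_move=lambda s, m: s + m,
                evaluate=lambda s: s))
```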

Gerald Tesauro created a neural network (NN) named TD-Gammon that plays backgammon by using temporal difference learning (Tesauro, 1992). These networks were trained from the starting position all the way to the end of the game. They were also tested in actual game play against a benchmark program from Sun Microsystems, and the results were plotted as a graph (ibid.). The main difference between backgammon and games such as chess, checkers, and Go is that it involves randomness, as it uses dice.
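
The temporal difference idea behind TD-Gammon can be sketched, under simplifying assumptions (a table of state values rather than Tesauro's neural network), as nudging each state's value toward the value of the state that follows it in a self-played game:

```python
# Minimal TD(0)-style value update over one self-played game trajectory.
# A dictionary stands in for TD-Gammon's neural network value function.

def td0_update(values, trajectory, final_reward, alpha=0.1, gamma=1.0):
    """trajectory: list of hashable states from the start to the terminal state."""
    for t in range(len(trajectory) - 1):
        s, s_next = trajectory[t], trajectory[t + 1]
        # Intermediate rewards are zero; the game's outcome arrives only at the end.
        v_next = final_reward if t + 1 == len(trajectory) - 1 else values.get(s_next, 0.0)
        td_error = gamma * v_next - values.get(s, 0.0)
        values[s] = values.get(s, 0.0) + alpha * td_error   # move toward the successor's value
    return values

# Hypothetical state sequence from one self-play game that ended in a win (+1).
print(td0_update({}, ["start", "mid", "win"], final_reward=1.0))
```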

OpenAI was able to defeat humans in the game of Dota2 (Rodriguez, 2018). The OpenAI team developed a system using five coordinated neural networks that each represent a different player. Called OpenAI Five, the model uses cutting-edge reinforcement learning techniques such as proximal policy optimization (PPO) to master the details of the game. The ultimate goal is to defeat the world's top professional team. The state space of Dota2 is large, continuous, and only partially observable. Dota2's game state includes 20,000 dimensions, and the team used approximately 128,000 CPU cores on the Google Cloud Platform. One of the most notable outcomes of this training was the ability of the OpenAI Five agents to maximize long-term rewards over short-term gains. Compare this to chess's 8x8 board with 12 different piece types (768 bits), or Go's 19x19 board with 2 piece types. In chess and Go the complete board is usually visible to each player. In Dota2 only a small part of the full game state is visible; mix all of that together for each of the five players and you have a very complicated game. OpenAI is solving this game through self-play (for all facts and figures see Rodriguez, 2018).
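
OpenAI Five's training details are not given in the sources cited here, but the core of proximal policy optimization is its clipped surrogate objective. A minimal numpy sketch of that loss, with made-up probability ratios and advantages rather than any real OpenAI Five data, might look like this:

```python
import numpy as np

def ppo_clip_loss(ratios, advantages, eps=0.2):
    """PPO clipped surrogate objective (returned as a loss, i.e. negated).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantages: advantage estimates for those actions
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# Toy example with hypothetical numbers:
ratios = np.array([1.1, 0.8, 1.4])
advantages = np.array([0.5, -0.2, 1.0])
print(ppo_clip_loss(ratios, advantages))
```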

"OpenAI Five is actually five different neural networks that coordinate through a hyperparameter called team spirit" (Jones, 2019).Team spirit is what determines whether the AI will focus on their own individual rewards (e.g. killing other heroes and collecting treasure), or on the rewards of the whole team (Jones, 2019).

According to Sweet (2018), self-play is an exciting idea because it holds the promise of relieving the engineer not only from specifying a solution to a problem (as it optimizes the parameters) but also from specifying the purpose. This is a higher level of autonomy than we usually consider when studying reinforcement learning. While self-play is not fully understood, it certainly works. He notes that naively training a neural network against itself creates a "roshambo" problem: when we train a neural network by playing against the best result of the previous generation, the network learns a strategy B to defeat strategy A and becomes the "champion"; a new network then learns a strategy that defeats B, and the cycle can continue like rock-paper-scissors. According to Sweet (2018), this should be an interesting subject to watch for many years.

Self-play allows a player to acquire great knowledge of a game despite having no prior information about it, but this depends on the machine learning technique used. According to Togelius (2018), games trained with reinforcement learning take a lot of time to learn, that is, they must be played thousands of times before the system plays well. So, he says, reinforcement learning is applicable when there is plenty of learning time. Togelius (2018) also says there are many systems which learned to play without any starting information about the game, such as TD-Gammon, which uses temporal difference learning, but performance was limited due to the lack of optimisation functions.

Reinforcement learning will be successful if temporal difference learning is integrated with function approximators. Finally, the major breakthrough came in 2015, when Google DeepMind released a paper stating that they had trained deep neural networks to play many different games for the classic Atari 2600 game console (Togelius, 2018). Each network was trained to play a single game, taking the raw pixels of the game's visuals and the score as input, and producing controller instructions and fire-button presses as output. The method used to train the deep networks is deep Q-networks, essentially standard Q-learning applied to neural networks with multiple layers (some of the layers in the architecture were convolutional). Crucially, they managed to overcome the problems associated with combining temporal difference techniques and neural networks by a method in which short sequences of gameplay are stored, then replayed to the network in a different order, to break up long chains of similar states (Togelius, 2018).
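
The replay mechanism described above, where stored transitions are re-sampled out of order to break up correlated runs of similar states, can be sketched as a simple buffer. This is generic DQN-style pseudologic for illustration, not DeepMind's exact implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) tuples and samples
    them in random order, breaking up long runs of similar states."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# Training would interleave gameplay (adding transitions) with Q-learning
# updates computed on randomly sampled minibatches from the buffer.
```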

2.3 Deep Learning

Deep learning is a subset of machine learning techniques inspired by the concept of artificial neural networks. Deep learning can be supervised, unsupervised, or reinforcement learning. The key difference between other machine learning techniques and deep learning is that classical machine learning algorithms use a more structured format of data, whereas deep learning uses neural networks with multiple layers of algorithms, known as artificial neural networks. In deep learning the input is passed through the different layers of the network and no human intervention is needed. The invention of these neural networks has helped artificial intelligence in many areas such as game design, voice assistants, self-learning, etc. (Brownlee, 2019).
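
As a minimal sketch of "input passed through different layers", here is a two-layer network forward pass in plain numpy, with arbitrary made-up layer sizes used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # first layer weights and biases
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)   # second layer weights and biases

def forward(x):
    h = np.maximum(0, x @ W1 + b1)   # hidden layer with ReLU activation
    return h @ W2 + b2               # output layer (e.g. scores for two actions)

print(forward(np.array([0.2, -1.0, 0.5, 0.0])))
```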

For many years, game developers have been wary of machine learning (ML), and this has limited its usage in many games. In fact, there are no major game releases built around machine learning concepts. Some attribute this to the notion that ML techniques are not important for the development of games. However, by using deep learning there are new possibilities that could lead many game development companies to create games that match the player's potential rather than merely improving on it (Shaleynikov, n.d.).

According to Yossi (2019), the reason why deep learning is so successful is that it does not require handholding, or as he puts it, "Unlike machine learning, we're not trying to understand what's inside, (…) We're feeding it only raw images; we're not writing any code". Deep learning algorithms can do their thing with a few tens of lines of code. They learn by trial and error, moving at the speed of the machine to become more and more accurate. As just one example of how far an AI can get in learning to act with superhuman accuracy, he points to AlphaGo ("The impact deep learning is having on artificial intelligence", 2019).

AlphaGo started with no idea of how to play the ancient Chinese game of Go. With on the order of 10 to the power of 170 possible configurations on its 19x19 board, Go has far more possible configurations than chess, making it an ideal proving ground for AI. After exposing the program to 160,000 amateur games to give it the idea, AlphaGo's developers then had the machine play more and more games against itself. By the end of the process, in 2015, the machine had defeated the European Go champion in a 5–0 match, and it went on to defeat the world champion 4–1. Yossi said, "After training against itself for three days, it became the best in the world." Now, he said, "There is no game that a machine cannot play better than a human being" ("The impact deep learning is having on artificial intelligence", 2019).

According to Pollack and Blair (1996), evidence of success in learning backgammon using simple hill-climbing indicates that the reinforcement and temporal difference methods used by Tesauro in TD-Gammon were not essential to its success. The success came instead from the setup of co-evolutionary self-play biased by the dynamics of backgammon. TD-Gammon may thus be a major milestone for a kind of evolutionary machine learning in which the initial specification of the model is far simpler than expected, because the training environment is not fixed in advance but emerges as a result of co-evolution between the learning system and its training environment.

The idea of machine learning based on evolution is often associated with Holland, in the context of his pioneering work on genetic algorithms. However, that work focuses on the optimization of a given target expressed as an absolute fitness function (a fitness function is used, for example, to optimize the parameters of a neural network with CMA-ES; it is the value of the network that we specify). Using the idea of co-evolution in learning, one recognizes the difference between optimization based on absolute fitness and optimization based on relative fitness (Pollack & Blair, 1996).
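
The distinction drawn here between absolute and relative fitness can be sketched as follows; `objective_fn` and `play_match` are hypothetical placeholders, not functions from the cited work:

```python
# Absolute fitness: score a candidate against a fixed, externally specified objective.
def absolute_fitness(candidate, objective_fn):
    return objective_fn(candidate)

# Relative fitness: score a candidate only by how it fares against the current
# population (e.g. win rate in self-play), as in co-evolution.
def relative_fitness(candidate, population, play_match):
    wins = sum(play_match(candidate, opponent) for opponent in population)
    return wins / len(population)

# Toy usage with made-up candidates, where higher numbers "win".
pop = [1.0, 2.0, 3.0]
print(absolute_fitness(2.5, objective_fn=lambda c: -abs(c - 3.0)))
print(relative_fitness(2.5, pop, play_match=lambda a, b: 1 if a > b else 0))
```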

From the above examples of self-play and their outcomes, it is clear that new advancements in machine learning such as reinforcement learning, temporal difference learning, and deep learning have had a great influence on self-play in AI. Apart from these advancements, there have also been contributions that improve already existing techniques, such as self-supervised learning; according to Abshire (2018), that technique has been used in self-driving to compare the driver's interaction using video footage. So self-play has depended more on the new techniques of machine learning. The analysis and conclusions vary for different games and AI mechanisms, as they depend on the game flow, the behaviour of the player, the technique used, and the adaptability of the game's features. As the key features differ for each game, the mechanism and analysis need not be the same. For example, backgammon cannot be implemented with alpha-beta pruning and search trees like checkers, and the techniques used for developing backgammon cannot be applied to develop AlphaGo and other games.

Even though machine learning techniques have many advantages in games, and there have been many breakthroughs depending on machine learning as stated in the sections above, Stephenson (2018) says that machine learning applications still face major challenges in gaming. A major challenge is the lack of data to learn from. These algorithms must model complex systems and actions, and we do not have good historical data on these complex interactions. In addition, machine learning algorithms developed for the gaming industry must not break the experience of the game or the player. This means that the algorithms must be correct, but they must also be fast and unobtrusive from the player's point of view. Anything that slows or breaks the game pulls the player out of the immersion the game has created. That said, most major game development studios have teams researching, refining, and applying AI to their games (Stephenson, 2018).

2.4 Historical Changes in AI

From the above-mentioned examples, breakthroughs, and key points, it can be seen that there has been a huge change in games from the past to the present. Early game-playing AI used strategies and techniques such as branching factors and tree search, whereas the techniques used today in gaming and self-play are more advanced: they can even learn from the basic level without any initial information about the game. There has also been an increase in performance with the invention of function approximators and fitness functions. With the discovery of advanced machine learning techniques, game-playing AI is now in a position where self-learning is more effective and creative than learning from humans, and can even defeat human players in competitions.

3. Conclusion

The advancements in machine learning techniques have had a great impact on game-playing AI. Gaming in AI started with the techniques of machine learning, and newer techniques are emerging day by day. As there are some limitations with the earlier machine learning techniques (limited progress, hand-optimised playing strategies, etc.), self-play has come to depend more on advanced machine learning techniques such as neural networks, reinforcement learning, and more. There have been breakthroughs building on these techniques, from the first digital version of tic-tac-toe by Douglas in 1952 to absolute fitness functions.

AI is often depicted as altering or complementing human abilities, but rarely as a full team member performing tasks similar to humans. As these game experiments involve machine-human collaboration, they offer a glimpse of the future. In the Capture the Flag case study, human players considered the bots to be more collaborative than other humans, but Dota 2 players gave mixed responses about their AI teammates. Some were quite excited, saying that they felt supported and learned from playing with them; a professional Dota 2 player also teamed up with the bots and reported on his experience (Lavanchy, 2019).

Lavanchy (2019) raises the question: should AI learn from us or continue to teach itself? Self-learning can give AI greater efficiency and creativity without mimicking humans, but it may also make algorithms more suitable for tasks that do not involve human collaboration, such as warehouse robots. On the other hand, one could argue that it would be more intuitive to have a machine trained by humans, because humans using such an AI could then understand why the machine acted as it did. As AI gets smarter, we humans will keep being surprised.
