When implementing A*, you need sets, and choosing the wrong container can drastically hurt execution speed. Yesterday a short-distance pathfinding query took about 1 ms; after today's optimization, one query takes roughly 0.02–0.06 ms. The map is an 8-direction tilemap.
Based on today's optimization work, here are the key points:

Avoid containers, or use as few as possible; where you can, prefer vector over map and set.
I use a single multiset only for the open list; it holds array indices sorted by f. All other data lives in plain arrays. I also eliminated the separate closed list: the closed flag is stored together with f and g.
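A minimal sketch of that layout (all names hypothetical, not the author's actual code): the open list is a std::multiset of tile indices ordered by f, and the closed flag lives in the same flat array as f and g:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical per-node search record: f, g and the "closed" flag live
// together in one flat array indexed by tile index, so no separate
// CLOSED container is needed.
struct NodeState {
    float f = 0.0f;
    float g = 0.0f;
    bool  closed = false;
};

// Orders open-list entries (tile indices) by their f value.
struct FLess {
    const std::vector<NodeState>* states;
    bool operator()(int a, int b) const {
        return (*states)[a].f < (*states)[b].f;
    }
};

struct OpenList {
    std::vector<NodeState> states;       // one slot per map tile
    std::multiset<int, FLess> open;      // indices into 'states', sorted by f

    explicit OpenList(int tileCount)
        : states(tileCount), open(FLess{&states}) {}

    void push(int tile, float g, float f) {
        states[tile].g = g;
        states[tile].f = f;
        open.insert(tile);
    }

    int popBest() {                      // smallest f first
        int tile = *open.begin();
        open.erase(open.begin());
        states[tile].closed = true;      // closed info stored in-place
        return tile;
    }
};
```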
Preallocate memory, avoid dynamically resizing vectors, and use as little memory as possible.
The open list and the search-state arrays are allocated up front with enough reserved space; once allocated, they are never freed until the program exits. Laying out data structures carefully to use less memory generally makes better use of the CPU cache.
Cache computed results.
Don't repeat the same computation.
You can also run a raycast collision check first: if there is no obstacle between start and goal, skip A* entirely and walk in a straight line.
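A minimal sketch of such a pre-check on a grid, assuming a hypothetical row-major `blocked` array and using Bresenham's line algorithm as the "raycast":

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Walks the Bresenham line from (x0,y0) to (x1,y1) over a row-major
// grid of width 'width'. If no blocked tile is hit, the unit can move
// in a straight line and A* can be skipped.
bool lineIsClear(const std::vector<bool>& blocked, int width,
                 int x0, int y0, int x1, int y1) {
    int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    while (true) {
        if (blocked[y0 * width + x0]) return false;   // obstacle on the line
        if (x0 == x1 && y0 == y1) return true;        // reached the goal
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```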
Overall, the main performance gain came from the first point. I didn't measure the gain from the second, because I wrote the preallocation from the start and have no before/after comparison; my guess is around 2x.
References:
The following English article was a great inspiration to me:
After I changed std::sort to std::multiset, my A* ran 68 times faster.
My recommendation: The best generic choice is a binary heap. If you have a binary heap library available, use it. If not, start with sorted arrays or unsorted arrays, and switch to binary heaps if you want more performance. If you have more than 10,000 elements in your OPEN set, then consider more complicated structures such as a bucketing system.
Unsorted arrays or linked lists
The simplest data structure is an unsorted array or list. The membership test is slow, O(F) to scan the entire structure. Insertion is fast, O(1) to append to the end. Finding the best element is slow, O(F) to scan the entire structure. Removing the best element is O(1) with linked lists, and O(1) with unsorted arrays since we can change the order. The increase-priority operation is O(F) to find the node and O(1) to change its value.
Sorted arrays
To make remove-best fast, we can keep the array sorted. Membership is then O(log F), since we can use binary search. Insertion is slow, O(F) to move all the elements to make space for the new one. Finding the best is fast, O(1) since it's already at the end. And removing the best is O(1) if we make sure the best sorts to the end of the array. The increase-priority operation is O(log F) to find the node and O(F) to change its value/position.
Make sure the array is sorted so that the best element is at the end.
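A sketch of this layout (hypothetical names), keeping the array in descending f order so the best (lowest-f) node sits at the back:

```cpp
#include <cassert>
#include <algorithm>
#include <vector>

struct Entry { float f; int node; };

// Keeps 'open' sorted in DESCENDING f order, so the lowest-f element
// is at the back and remove-best is an O(1) pop_back.
void insertSorted(std::vector<Entry>& open, Entry e) {
    // First position whose f is <= e.f; inserting there preserves the order.
    auto it = std::lower_bound(open.begin(), open.end(), e,
        [](const Entry& a, const Entry& b) { return a.f > b.f; });
    open.insert(it, e);                // O(F): shifts later elements
}

Entry removeBest(std::vector<Entry>& open) {
    Entry best = open.back();          // O(1): best is at the end
    open.pop_back();
    return best;
}
```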
Sorted linked lists
With sorted arrays, insertion is slow. If we use a linked list, we can make that fast. Membership in the linked list is slow, O(F) to scan the list. Inserting a new link is fast, O(1), but finding the right position for it is O(F). Finding the best remains fast, O(1) because the best is at the end. Removing the best is also O(1). The increase-priority operation is O(F) to find the node and O(1) to change its value/position.
Binary heaps
A binary heap (not to be confused with a memory heap) is a tree structure that is stored in an array. Unlike most trees, which use pointers to refer to children, the binary heap uses indexing to find children.
In a binary heap, membership is O(F), as you have to scan the entire structure. Insertion is O(log F) and remove-best is O(log F).
The increase-priority operation is tricky: it takes O(F) to find the node and, surprisingly, only O(log F) to adjust its priority. Unfortunately most priority queue libraries don't include this operation. Fortunately, it's not strictly necessary, so I recommend not worrying about it unless you absolutely need to. Instead of increasing the priority, insert a new element into the priority queue. You'll potentially end up processing the node twice, but that's relatively cheap compared to implementing increase-priority.
In C++, use the priority_queue class, which doesn't have increase-priority, or Boost's mutable priority queue, which does. In Python, use the heapq library.
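The "insert a new element instead of increase-priority" idiom might look like this with std::priority_queue (a sketch; detecting stale entries via a per-node best-f array is one possible implementation, not the only one):

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

using Entry = std::pair<float, int>;   // (f, node)
using MinHeap =
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

// Pops entries until one matches the node's current best f; stale
// duplicates left over from earlier insertions are simply discarded.
// Returns the node index, or -1 if the queue ran dry.
int popBest(MinHeap& open, const std::vector<float>& bestF) {
    while (!open.empty()) {
        auto [f, node] = open.top();
        open.pop();
        if (f == bestF[node]) return node;   // fresh entry
        // otherwise: stale duplicate from before the priority changed
    }
    return -1;
}
```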
You can combine a hash table or indexed array for membership and a priority queue for managing priorities; see the hybrid section below.
As part of my newer A* tutorial, I have a complete A* implementation in Python and C++ using binary heaps for the priorities and hash tables for membership. I do not implement increase-priority, and explain why in the optimization section. I also have some C# code there.
A variant of the binary heap is the d-ary heap, which has more than 2 children per node. Inserts and increase-priority become a little faster, but removals become a little slower. They may have better cache performance. B-heaps are also worth a look if your frontier is large.
I have only used binary heaps and bucket approaches for my own pathfinding projects. If binary heaps aren’t good enough, then consider pairing heaps, sequence heaps, or a bucket based approach. The paper Priority Queues and Dijkstra’s Algorithm is worth a read if you are unable to shrink the graph and need a faster priority queue.
Sorted skip lists
Searching an unsorted linked list is slow. We can make that faster if we use a skip list instead of a linked list. With a skip list, membership is fast if you have the sort key: O(log F). Insertion is O(1) like a linked list if you know where to insert. Finding the best node is fast if the sort key is f, O(1), and removing a node is O(1). The increase-priority operation involves finding a node, removing it, and reinserting it.
If we use skip lists with the map location as the key, membership is O(log F), insertion is O(1) after we've performed the membership test, finding the best node is O(F), and removing a node is O(1). This is better than unsorted linked lists in that membership is faster.
If we use skip lists with the f value as the key, membership is O(F), insertion is O(1), finding the best node is O(1), and removing a node is O(1). This is no better than sorted linked lists.
Indexed arrays
If the set of nodes is finite and reasonably sized, we can use a direct indexing structure, where an index function i(n) maps each node n to an index into an array. Unlike the unsorted and sorted arrays, which have a size corresponding to the largest size of OPEN, with an indexed array the array size is always max(i(n)) over all n. If your function is dense (i.e., no indices are unused), then max(i(n)) will be the number of nodes in your graph. Whenever your map is a grid, it's easy to make the function dense.
Assuming i(n) is O(1), the membership test is O(1), as we merely have to check whether Array[i(n)] contains any data. Insertion is O(1), as we set Array[i(n)]. Find-and-remove-best is O(numnodes), since we have to search the entire structure. The increase-priority operation is O(1).
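For a grid, i(n) = y*width + x is dense, and the structure can be sketched as (hypothetical names):

```cpp
#include <cassert>
#include <vector>

// Direct-indexed OPEN membership for a W*H grid: i(n) = y*W + x is
// dense, so one flag per tile gives O(1) membership and insertion.
struct IndexedOpenSet {
    int width;
    std::vector<bool> inOpen;   // one flag per tile, sized W*H up front

    IndexedOpenSet(int w, int h) : width(w), inOpen(w * h, false) {}

    int  index(int x, int y) const { return y * width + x; }
    void insert(int x, int y)      { inOpen[index(x, y)] = true; }
    bool contains(int x, int y) const { return inOpen[index(x, y)]; }
};
```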
Hash tables
Indexed arrays take up a lot of memory to store all the nodes that are not in the OPEN set. An alternative is to use a hash table, with a hash function h(n) that maps each node n into a hash code. Keep the hash table twice as big as N to keep the chance of collisions low. Assuming h(n) is O(1), the membership test is expected O(1), insertion is expected O(1), and remove-best is O(numnodes), since we have to search the entire structure. The increase-priority operation is O(1).
Hash tables are best for set membership but not for managing priorities. In my newer A* tutorial, I use hash tables for membership and binary heaps for priorities. I combined the OPEN and CLOSED sets into one, which I call VISITED.
Splay trees
Heaps are a tree-based structure with expected O(log F) time operations. However, the problem is that with A*, the common behavior is that a low-cost node is removed (causing O(log F) behavior, since values have to move up from the very bottom of the tree) followed by low-cost nodes being added (causing O(log F) behavior, since these values are added at the bottom and bubble up to the very top). The expected-case behavior of heaps here is equivalent to the worst-case behavior. We may be able to do better if we find a data structure where the expected case is better, even if the worst case is no better.
Splay trees are a self-adjusting tree structure. Any access to a node in the tree tends to bring that node up to the top. The result is a "caching" effect, where rarely used nodes go to the bottom and don't slow down operations. It doesn't matter how big your splay tree is, because your operations are only as slow as your "cache size". In A*, the low-cost nodes are used a lot, and the high-cost nodes aren't used for a long time, so those high-cost nodes can move to the bottom of the tree.
With splay trees, membership, insertion, remove-best, and increase-priority are all expected O(log F), worst case O(F). Typically, however, the caching keeps the worst case from occurring. Dijkstra's Algorithm and A* with an underestimating heuristic, however, have some peculiar characteristics that may keep splay trees from being the best. In particular, f(n') >= f(n) for a node n and its neighboring node n'. When this happens, it may be that the insertions all occur on one side of the tree and put it out of balance. I have not tested this.
Bucket queues
A bucket queue is useful when the range of movement costs is limited to small integers, especially for Dijkstra’s Algorithm. Given a restricted range, there are often better algorithms. For example, sorting can be done on arbitrary values in O(N log N) time, but when there is a fixed range the bucket or radix sorts can perform sorting in O(N) time.
Consider Dijkstra's Algorithm: the frontier has priorities p through p+k, where k is the largest movement cost. For example, if your movement costs are 1, 2, or 3, then everything in the frontier has priority p, p+1, p+2, or p+3. Use 4 buckets, one for each priority. There's no need to sort within a bucket because the priorities are all the same. And there's no need to sort the buckets. That means insertion is O(1) and remove-best is O(1).
Note that Breadth First Search can be viewed as using a bucketed priority queue with exactly two buckets.
With A* it’s a little more complicated because you also have to look at the effect of the heuristic on the priority. There will be more buckets than with Dijkstra’s Algorithm.
HOT queues
HOT Queues are a variant of bucket queues that accept a larger range of values. Instead of each bucket having exactly one priority, each bucket has a range of priorities. Since we’re only removing elements from the top bucket, only the top bucket has to be ordered.
HOT queues make the topmost bucket use a binary heap. All other buckets are unsorted arrays. The membership test is O(F) because we don't know which bucket the node is in. Insertion and remove-best in the top bucket are O(log (F/K)), for K buckets. Insertion into other buckets is O(1), and remove-best never runs on other buckets. If the top bucket empties, then we need to convert the next bucket, an unsorted array, into a binary heap. It turns out this operation ("heapify") can be run in O(F/K) time. The increase-priority operation is best treated as an O(F/K) removal followed by an O(log (F/K)) or O(1) insertion.
In A*, many of the nodes we put into OPEN we never actually need. HOT Queues are a big win because the elements that are not needed are inserted in O(1) time. Only elements that are needed get heapified (which is not too expensive). The only operation that is more than O(1) is node deletion from the heap, which is only O(log (F/K)).
One person reported that HOT queues are as fast as heaps for at most 800 nodes in the OPEN set, and are 20% faster when there are at most 1500 nodes. I would expect that HOT queues get faster as the number of nodes increases.
A cheap variant of a HOT queue is a two-level queue: put good nodes into one data structure (a heap or an array) and put bad nodes into another data structure (an array or a linked list). Since most nodes put into OPEN are "bad", they are never examined, and there's no harm putting them into the big array.
Pairing heaps
Fibonacci heaps are good priority queues for A*, in theory. However, in practice they’re not used. A pairing heap can be thought of as a simplified Fibonacci heap. They are said to work well in practice; I have never used them.
Here’s the original paper describing them.
Soft heaps
A soft heap is a type of heap that gives the nodes in approximately the right order. By approximating, it can provide results faster than a regular heap.
I haven’t tried soft heaps. The algorithms described in this paper seem fairly short and straightforward to implement. However, this answer on Stack Overflow says the structure is useful in theory but “unlikely to be useful in practice”.
For pathfinding in games, we often do not need the exact shortest path and usually would prefer to have a reasonably short path computed quickly. So this may be one of the applications where it is useful in practice. I don’t know yet; if you have studied this data structure, please email me.
Sequence heaps
I haven’t looked into them. See Peter Sanders’s paper Fast Priority Queues for Cached Memory.
“Sequence heaps may currently be the fastest available data structure for large comparison based priority queues both in cached and external memory. This is particularly true if the queue elements are small and if we do not need deletion of arbitrary elements or decreasing keys.”
Data Structure Comparison
It is important to keep in mind that we are not merely looking for asymptotic (“big O”) behavior; we also want a low constant factor. To see why, consider an algorithm that is O(log F) and another that is O(F), where F is the number of elements in the heap. It may be that on your machine an implementation of the first algorithm takes 10,000 * log(F) seconds, while an implementation of the second takes 2 * F seconds. For F = 256, the first would take 80,000 seconds and the second 512 seconds. The “faster” algorithm takes more time in this case, and only starts to win once F exceeds roughly 80,000.
You cannot merely compare two algorithms; you have to compare implementations of those algorithms. You also have to know what size your data might be. In the example above, the first implementation is faster beyond roughly F = 80,000, but if in your game F stays under 30,000, the second implementation would have been better.
None of the basic data structures is entirely satisfactory. Unsorted arrays or lists make insertion very cheap and membership and removal very expensive. Sorted arrays or lists make membership somewhat cheap, removal very cheap and insertion very expensive. Binary heaps make insertion and removal somewhat cheap, but membership is very expensive. Splay trees make everything somewhat cheap. HOT queues make insertions cheap, removals fairly cheap, and membership tests somewhat cheap. Indexed arrays make membership and insertion very cheap, but removals are incredibly expensive, and they can also take up a lot of memory. Hash tables perform similarly to indexed arrays, but they can take up a lot less memory in the common case, and removals are merely expensive instead of extremely expensive.
For a good list of pointers to more advanced priority queue papers and implementations, see Lee Killough’s Priority Queues page.
http://theory.stanford.edu/~amitp/GameProgramming/ImplementationNotes.html