Array

An array is a low-level data structure where elements are identified by integer indices. Arrays are…

Quick reference

An array organizes items sequentially, one after another in memory.

Each position in the array has an index, starting at 0.

Strengths:

  • Fast lookups. Retrieving the element at a given index takes O(1) time, regardless of the length of the array.
  • Fast appends. Adding a new element at the end of the array takes O(1) time.

Weaknesses:

  • Fixed size. You need to specify how many elements you're going to store in your array ahead of time. (Unless you're using a fancy dynamic array.)
  • Costly inserts and deletes. You have to "scoot over" the other elements to fill in or close gaps, which takes worst-case O(n) time.

Table 1: Worst Case

  space   O(n)
  lookup  O(1)
  append  O(1)
  insert  O(n)
  delete  O(n)

In Python 2.7

Some languages (including Python) don't have these bare-bones arrays.

Here's what arrays look like in Java.

// instantiate an array that holds 10 integers
int gasPrices[] = new int[10];

gasPrices[0] = 346;
gasPrices[1] = 360;
gasPrices[2] = 354;

Inserting

If we want to insert something into an array, first we have to make space by "scooting over" everything starting at the index we're inserting into:

In the worst case we're inserting into the 0th index in the array (prepending), so we have to "scoot over" everything in the array. That's O(n) time.

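For example, here's a sketch of what that "scooting over" looks like in Python, using a plain list as a stand-in for a fixed-size array (the function name, the separate length parameter, and the sample values are just for illustration):

def insert_at(arr, length, index, value):
    # arr is a fixed-size list; length is how many slots are in use.
    # Assumes there's at least one unused slot (length < len(arr)).
    # Shift everything from index onwards one slot to the right...
    for i in range(length, index, -1):
        arr[i] = arr[i - 1]
    # ...then drop the new value into the gap we opened up.
    arr[index] = value
    return length + 1

gas_prices = [346, 360, 354, None]
insert_at(gas_prices, 3, 1, 350)
# gas_prices is now [346, 350, 360, 354]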

Deleting

Array elements are stored adjacent to each other. So when we remove an element, we have to fill in the gap – "scooting over" all the elements that were after it:

In the worst case we're deleting the 0th item in the array, so we have to "scoot over" everything else in the array. That's O(n) time.

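A matching sketch for deletion, under the same assumptions (plain list standing in for a fixed-size array; names and values are illustrative):

def delete_at(arr, length, index):
    # Close the gap by shifting everything after index one slot to the left.
    for i in range(index, length - 1):
        arr[i] = arr[i + 1]
    # The last slot is now stale; the returned length no longer includes it.
    return length - 1

gas_prices = [346, 350, 360, 354]
new_length = delete_at(gas_prices, 4, 0)
# The first new_length slots are [350, 360, 354]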

Why not just leave the gap? Because the quick lookup power of arrays depends on everything being sequential and uninterrupted. This lets us predict exactly how far from the start of the array the 138th or 9,203rd item is. If there are gaps, we can no longer predict exactly where each array item will be.

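That prediction is just arithmetic. Roughly (the start address and item size below are made up for illustration):

def address_of(start_address, item_size, index):
    # Contiguous, equally sized items mean any index is one multiply
    # and one add away from its memory address.
    return start_address + item_size * index

address_of(1000, 8, 138)    # 2104 -- where the item at index 138 lives
address_of(1000, 8, 9203)   # 74624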

Data structures built on arrays

Arrays are the building blocks for lots of other, more complex data structures.

Don't want to specify the size of your array ahead of time? One option: use a dynamic array.

Want to look up items by something other than an index? Use a dictionary.

Array Slicing

This is a common tool, but you need to know what it does under the hood!

Array slicing involves taking a subset from an array and allocating a new array with those elements.

In Python 2.7 you can create a new list of the elements in my_list, from start_index to end_index (exclusive), like this:

my_list[start_index:end_index]

You can also get everything from start_index onwards by just omitting end_index:

my_list[start_index:]

Careful: there's a hidden time and space cost here! It's tempting to think of slicing as just "getting elements," but in reality you are:

  1. allocating a new list;
  2. copying the elements from the original list to the new list.

This takes O(n) time and O(n) space, where n is the number of elements in the resulting list.

That's a bit easier to see when you save the result of the slice to a variable:

tail_of_list = my_list[1:]

But a bit harder to see when you don't save the result of the slice to a variable:

return my_list[1:]
# Whoops, I just spent O(n) time and space!
for item in my_list[1:]:
    # Whoops, I just spent O(n) time and space!
    pass

So keep an eye out. Slice wisely.

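If you only need to iterate over part of a list and want to skip the copy, one option (a side note, not something the notes above require) is itertools.islice, which walks the original list lazily:

import itertools

for item in itertools.islice(my_list, 1, None):
    # Still O(n) time to visit the items, but no O(n) copy is allocated.
    pass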

In-Place Algorithms

  • State "ANCHOR" from "DONE" [2020-02-04 Tue 12:02]
  • State "DONE" from "ANCHOR" [2020-02-04 Tue 12:02]

An in-place algorithm operates directly on its input and changes it, instead of creating and returning…

An in-place function modifies data structures or objects outside of its own stack frame (i.e., stored on the process heap or in the stack frame of a calling function). Because of this, the changes made by the function remain after the call completes.

In-place algorithms are sometimes called destructive, since the original input is "destroyed" (or modified) during the function call.

Careful: "In-place" does not mean "without creating any additional variables!" Rather, it means "without creating a new copy of the input." In general, an in-place function will only create additional variables that are O(1) space.

An out-of-place function doesn't make any changes that are visible to other functions. Usually, those functions copy any data structures or objects before manipulating and changing them.

In many languages, primitive values (integers, floating point numbers, or characters) are copied when passed as arguments, while more complex data structures (lists, heaps, or hash tables) are passed by reference. This is effectively what Python does.

Here are two functions that do the same operation on a list, except one is in-place and the other is out-of-place:

def square_list_in_place(int_list):
    for index, element in enumerate(int_list):
        int_list[index] *= element

    # NOTE: no need to return anything -- we modified
    # int_list in place


def square_list_out_of_place(int_list):
    # We allocate a new list with the length of the input list
    square_list = [None] * len(int_list)

    for index, element in enumerate(int_list):
        square_list[index] = element ** 2

    return square_list

Working in-place is a good way to save time and space. An in-place algorithm avoids the cost of initializing or copying data structures, and it usually has an O(1) space cost.

But be careful: an in-place algorithm can cause side effects. Your input is "destroyed" or "altered," which can affect code outside of your function. For example:

original_list = [2, 3, 4, 5]
square_list_in_place(original_list)

print("original list: %s" % original_list)
# Prints: original list: [4, 9, 16, 25], confusingly!
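
For contrast, the out-of-place version leaves the original untouched (same input as above):

original_list = [2, 3, 4, 5]
squared_list = square_list_out_of_place(original_list)

print("original list: %s" % original_list)
# Prints: original list: [2, 3, 4, 5] -- unchanged
print("squared list: %s" % squared_list)
# Prints: squared list: [4, 9, 16, 25]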

Generally, out-of-place algorithms are considered safer because they avoid side effects. You should only use an in-place algorithm if you're space constrained or you're positive you don't need the original input anymore, even for debugging.

Stack Frame

Overview

The call stack is what a program uses to keep track of function calls. The call stack is made up of stack frames – one for each function call.

For instance, say we called a function that rolled two dice and printed the sum.

import random

def roll_die():
    return random.randint(1, 6)

def roll_two_and_sum():
    total = 0
    total += roll_die()
    total += roll_die()
    print(total)

roll_two_and_sum()

First, our program calls roll_two_and_sum(). It goes on the call stack:

roll_two_and_sum()

That function calls roll_die(), which gets pushed onto the top of the call stack:

roll_die()
roll_two_and_sum()

Inside of roll_die(), we call random.randint(). Here's what our call stack looks like then:

random.randint()
roll_die()
roll_two_and_sum()

When random.randint() finishes, we return to roll_die() by removing ("popping") random.randint()'s stack frame.

roll_die()
roll_two_and_sum()

Same thing when roll_die() returns:

roll_two_and_sum()

We're not done yet! roll_two_and_sum() calls roll_die() again:

roll_die()
roll_two_and_sum()

Which calls random.randint() again:

random.randint()
roll_die()
roll_two_and_sum()

random.randint() returns, then roll_die() returns, putting us back in roll_two_and_sum():

roll_two_and_sum()

Which calls print():

print()
roll_two_and_sum()
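
You can watch the call stack grow and shrink yourself with Python's inspect module (a small illustrative sketch, separate from the dice example):

import inspect

def outer():
    inner()

def inner():
    # Each entry in inspect.stack() corresponds to one stack frame,
    # innermost first.
    for frame_info in inspect.stack():
        print(frame_info.function)

outer()
# Prints something like: inner, outer, <module>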

What's stored in a stack frame?

What actually goes in a function's stack frame?

A stack frame usually stores:

  • Local variables
  • Arguments passed into the function
  • Information about the caller's stack frame
  • The return address – what the program should do after the function returns (i.e.: where it should "return to"). This is usually somewhere in the middle of the caller's code.

Some of the specifics vary between processor architectures. For instance, AMD64 (64-bit x86) processors pass some arguments in registers and some on the call stack. And, ARM processors (common in phones) store the return address in a special register instead of putting it on the call stack.

For example, if we wanted to multiply all the numbers between 1 and n, we could use this recursive approach:

def product_1_to_n(n):
    return 1 if n <= 1 else n * product_1_to_n(n - 1)

What would the call stack look like when n = 10?

First, product_1_to_n() gets called with n = 10:

product_1_to_n()                # n = 10

Which calls product_1_to_n() with n = 9, which in turn calls it with n = 8:

product_1_to_n()                # n = 8
product_1_to_n()                # n = 9
product_1_to_n()                # n = 10

And so on until we get to n = 1.

product_1_to_n()                # n = 1
product_1_to_n()                # n = 2
product_1_to_n()                # n = 3
product_1_to_n()                # n = 4
product_1_to_n()                # n = 5
product_1_to_n()                # n = 6
product_1_to_n()                # n = 7
product_1_to_n()                # n = 8
product_1_to_n()                # n = 9
product_1_to_n()                # n = 10

Look at the size of all those stack frames! The entire call stack takes up O(n) space. That's right – we have O(n) space cost even though our function itself doesn't create any data structures!

What if we'd used an iterative approach instead of a recursive one?

def product_1_to_n(n):
    # We assume n >= 1
    result = 1
    for num in range(1, n + 1):
        result *= num

    return result

This version takes a constant amount of space. At the beginning of the loop, the call stack looks like this:

product_1_to_n()                # n = 10, result = 1, num = 1

As we iterate through the loop, the local variables change, but we stay in the same stack frame because we don't call any other functions.

product_1_to_n()                # n = 10, result = 2, num = 2
product_1_to_n()                # n = 10, result = 6, num = 3
product_1_to_n()                # n = 10, result = 24, num = 4

In general, even though the compiler or interpreter will take care of managing the call stack for you, it's important to consider the depth of the call stack when analyzing the space complexity of an algorithm.

Be especially careful with recursive functions! They can end up building huge call stacks.

What happens if we run out of space? It's a stack overflow! In Python 3.6, you'll get a RecursionError.
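
A quick way to see this in Python (the recursion limit shown is CPython's usual default; it varies by interpreter and can be changed):

import sys

print(sys.getrecursionlimit())   # typically 1000

def recurse_forever():
    return recurse_forever()

try:
    recurse_forever()
except RecursionError:
    print("stack overflow! (a RecursionError in Python)")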

If the very last thing a function does is call another function, then its stack frame might not be needed any more. The function could free up its stack frame before doing its final call, saving space.

This is called tail call optimization (TCO). If a recursive function is optimized with TCO, then it may not end up with a big call stack.

In general, most languages don't provide TCO. Scheme is one of the few languages that guarantee tail call optimization. Some Ruby, C, and JavaScript implementations may do it. Python and Java decidedly don't.
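
For illustration, here's the product function rewritten in a tail-call style; the helper name and accumulator parameter are ours, and CPython will still build an O(n) call stack because it doesn't do TCO:

def product_1_to_n_tail(n, accumulator=1):
    if n <= 1:
        return accumulator
    # The recursive call is the very last thing we do, so a language
    # with TCO could reuse this stack frame instead of adding a new one.
    return product_1_to_n_tail(n - 1, accumulator * n)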

Dynamic Array

A dynamic array automatically doubles its size when you try to make an insertion and there's no mor…

Quick reference

A dynamic array is an array with a big improvement: automatic resizing.

One limitation of arrays is that they're fixed size, meaning you need to specify the number of elements your array will hold ahead of time.

A dynamic array expands as you add more elements. So you don't need to determine the size ahead of time.

Strengths:

  • Fast lookups. Just like arrays, retrieving the element at a given index takes O(1) time.
  • Variable size. You can add as many items as you want, and the dynamic array will expand to hold them.
  • Cache-friendly. Just like arrays, dynamic arrays place items right next to each other in memory, making efficient use of caches.

Weaknesses:

  • Slow worst-case appends. Usually, adding a new element at the end of the dynamic array takes O(1) time. But if the dynamic array doesn't have any room for the new item, it'll need to expand, which takes O(n) time.
  • Costly inserts and deletes. Just like arrays, elements are stored adjacent to each other. So adding or removing an item in the middle of the array requires "scooting over" other elements, which takes O(n) time.

In Python 3.6

In Python, dynamic arrays are called lists.

Here's what they look like:

gas_prices = []

gas_prices.append(346)
gas_prices.append(360)
gas_prices.append(354)

Size vs. Capacity

When you allocate a dynamic array, your dynamic array implementation makes an underlying fixed-size array. The starting size depends on the implementation – let's say our implementation uses 10 indices. Now say we append 4 items to our dynamic array. At this point, our dynamic array has a length of 4. But the underlying array has a length of 10.

We'd say this dynamic array's size is 4 and its capacity is 10. The dynamic array stores an end_index to keep track of where the dynamic array ends and the extra capacity begins.

Doubling Appends

What if we try to append an item but our array's capacity is already full?

To make room, dynamic arrays automatically make a new, bigger underlying array. Usually twice as big.

Why not just extend the existing array? Because that memory might already be taken by another program.

Each item has to be individually copied into the new array.

Copying each item over costs O(n) time! So whenever appending an item to our dynamic array forces us to make a new double-size underlying array, that append takes O(n) time.

That's the worst case. But in the best case (and the average case), appends are just O(1) time.
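
Here's a minimal sketch of that doubling strategy (the class name, starting capacity of 10, and internal field names are illustrative, not a standard API):

class DynamicArray:
    def __init__(self):
        self.capacity = 10                    # length of the underlying fixed-size array
        self.size = 0                         # how many slots are actually in use
        self._array = [None] * self.capacity

    def append(self, item):
        if self.size == self.capacity:
            # Out of room: make a new underlying array twice as big and
            # copy every element over -- the O(n) "doubling append".
            self.capacity *= 2
            new_array = [None] * self.capacity
            for i in range(self.size):
                new_array[i] = self._array[i]
            self._array = new_array
        # The common case: an O(1) write into the next free slot.
        self._array[self.size] = item
        self.size += 1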

Amortized cost of appending

  1. The time cost of each special O(n) "doubling append" doubles each time.
  2. At the same time, the number of O(1) appends you get until the next doubling append also doubles.

These two things "cancel out": over any long run of n appends, the copies cost about 1 + 2 + 4 + ... + n ≈ 2n operations in total, so we can say each append has an average cost or amortized cost of O(1).

Given this, in industry we usually wave our hands and say dynamic arrays have a time cost of O(1) for appends, even though strictly speaking that's only true for the average case or the amortized cost.

Date: 2020-01-30 Thu 11:53

Author: Jack Liu

Created: 2020-02-08 Sat 21:25
