NumPy Arrays
It is sometimes said that Python, compared to low-level languages such as C++, improves development time at the expense of runtime. Fortunately, there are a handful of ways to speed up operation runtime in Python without sacrificing ease of use. One option suited for fast numerical operations is NumPy, which deservedly bills itself as the fundamental package for scientific computing with Python.
Granted, few people would categorize something that takes 50 microseconds (fifty millionths of a second) as “slow.” However, computers might beg to differ. The runtime of an operation taking 50 microseconds (50 μs) falls under the realm of microperformance, which can loosely be defined as operations with a runtime between 1 microsecond and 1 millisecond.
Why does speed matter? The reason that microperformance is worth monitoring is that small differences in runtime become amplified with repeated function calls: an incremental 50 μs of overhead, repeated over 1 million function calls, translates to 50 seconds of incremental runtime.
When it comes to computation, there are really three concepts that lend NumPy its power:
- Vectorization
- Broadcasting
- Indexing
In this tutorial, you’ll see step by step how to take advantage of vectorization and broadcasting, so that you can use NumPy to its full capacity. While you will use some indexing in practice here, NumPy’s complete indexing schematics, which extend Python’s slicing syntax, are their own beast. If you’re looking to read more on NumPy indexing, grab some coffee and head to the Indexing section in the NumPy docs.
Getting into Shape: Intro to NumPy Arrays
The fundamental object of NumPy is its ndarray (or numpy.array), an n-dimensional array that is also present in some form in array-oriented languages such as Fortran 90, R, and MATLAB, as well as predecessors APL and J.
Let’s start things off by forming a 3-dimensional array with 36 elements:
>>> import numpy as np

>>> arr = np.arange(36).reshape(3, 4, 3)
>>> arr
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17],
        [18, 19, 20],
        [21, 22, 23]],

       [[24, 25, 26],
        [27, 28, 29],
        [30, 31, 32],
        [33, 34, 35]]])
Picturing high-dimensional arrays in two dimensions can be difficult. One intuitive way to think about an array’s shape is to simply “read it from left to right.” arr is a 3 by 4 by 3 array:
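One way to confirm this reading is to inspect the array's attributes. A minimal sketch, rebuilding `arr` exactly as defined above:

```python
import numpy as np

arr = np.arange(36).reshape(3, 4, 3)

print(arr.shape)  # (3, 4, 3): three 4x3 grids
print(arr.ndim)   # 3 axes
print(arr.size)   # 36 elements in total
```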
Visually, arr could be thought of as a container of three 4×3 grids (or a rectangular prism).
Higher dimensional arrays can be tougher to picture, but they will still follow this “arrays within an array” pattern.
Where might you see data with greater than two dimensions?
- Panel data can be represented in three dimensions. Data that tracks attributes of a cohort (group) of individuals over time could be structured as (respondents, dates, attributes). The 1979 National Longitudinal Survey of Youth follows 12,686 respondents over 27 years. Assuming that you have ~500 directly asked or derived data points per individual, per year, this data would have shape (12686, 27, 500) for a total of 171,261,000 data points.
- Color-image data for multiple images is typically stored in four dimensions. Each image is a three-dimensional array of (height, width, channels), where the channels are usually red, green, and blue (RGB) values. A collection of images is then just (image_number, height, width, channels). One thousand 256×256 RGB images would have shape (1000, 256, 256, 3). (An extended representation is RGBA, where the A, for alpha, denotes the level of opacity.)
For more detail on real-world examples of high-dimensional data, see Chapter 2 of François Chollet’s Deep Learning with Python.
What is Vectorization?
Vectorization is a powerful ability within NumPy to express operations as occurring on entire arrays rather than their individual elements. Here’s a concise definition from Wes McKinney:
This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact [seen] in any kind of numerical computations. [source]
When looping over an array or any data structure in Python, there’s a lot of overhead involved. Vectorized operations in NumPy delegate the looping internally to highly optimized C and Fortran functions, making for cleaner and faster Python code.
Counting: Easy as 1, 2, 3…
As an illustration, consider a 1-dimensional vector of True and False for which you want to count the number of “False to True” transitions in the sequence:
>>> np.random.seed(444)

>>> x = np.random.choice([False, True], size=100000)
>>> x
array([ True, False,  True, ...,  True, False,  True])
With a Python for-loop, one way to do this would be to evaluate, in pairs, the truth value of each element in the sequence along with the element that comes right after it:
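A minimal sketch of such a loop follows. The function name `count_transitions` and the small `demo` array are illustrative choices, not taken from the text:

```python
import numpy as np

def count_transitions(x) -> int:
    """Count "False to True" transitions with an explicit Python loop."""
    count = 0
    # Pair each element with the element that comes right after it.
    for prev, nxt in zip(x[:-1], x[1:]):
        if nxt and not prev:
            count += 1
    return count

demo = np.array([True, False, True, True, False, False, True])
print(count_transitions(demo))  # 2
```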
In vectorized form, there’s no explicit for-loop or direct reference to the individual elements:
>>> np.count_nonzero(x[:-1] < x[1:])
24984
How do these two equivalent functions compare in terms of performance? In this particular case, the vectorized NumPy call wins out by a factor of about 70 times:
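One way to measure this yourself is with `timeit`. The sketch below is an assumption about how such a comparison could be set up (the helper names, the seed, and the repeat count are hypothetical), and the ratio it prints will vary by machine and NumPy version:

```python
from timeit import timeit

import numpy as np

def count_transitions(x) -> int:
    """Pure-Python loop over adjacent pairs."""
    count = 0
    for prev, nxt in zip(x[:-1], x[1:]):
        if nxt and not prev:
            count += 1
    return count

def count_transitions_numpy(x) -> int:
    """Vectorized equivalent: False < True only at a False-to-True step."""
    return int(np.count_nonzero(x[:-1] < x[1:]))

rng = np.random.default_rng(444)              # hypothetical seed
x = rng.choice([False, True], size=100_000)

num = 3  # number of repetitions per timing
pytime = timeit(lambda: count_transitions(x), number=num)
nptime = timeit(lambda: count_transitions_numpy(x), number=num)
print(f"Speed difference: {pytime / nptime:0.1f}x")
```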
Technical Detail: Another term is vector processor, which is related to a computer’s hardware. When I speak about vectorization here, I’m referring to the concept of replacing explicit for-loops with array expressions, which in this case can then be computed internally with a low-level language.
Buy Low, Sell High
Here’s another example to whet your appetite. Consider the following classic technical interview problem:
Given a stock’s price history as a sequence, and assuming that you are only allowed to make one purchase and one sale, what is the maximum profit that can be obtained? For example, given prices = (20, 18, 14, 17, 20, 21, 15), the max profit would be 7, from buying at 14 and selling at 21.
(To all of you finance people: no, short-selling is not allowed.)
There is a solution with n-squared time complexity that consists of taking every combination of two prices where the second price “comes after” the first and determining the maximum difference.
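For reference, the quadratic approach could be sketched as follows. The name `profit_bruteforce` is hypothetical, and the sketch relies on `itertools.combinations` preserving input order so that the purchase always precedes the sale:

```python
from itertools import combinations

def profit_bruteforce(prices):
    """Maximum single buy/sell profit by checking every ordered pair: O(n^2)."""
    best = 0
    # combinations() keeps input order, so `buy` always comes before `sell`.
    for buy, sell in combinations(prices, 2):
        best = max(best, sell - buy)
    return best

prices = (20, 18, 14, 17, 20, 21, 15)
print(profit_bruteforce(prices))  # 7
```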
However, there is also an O(n) solution that consists of iterating through the sequence just once and finding the difference between each price and a running minimum. It goes something like this:
>>> def profit(prices):
...     max_px = 0
...     min_px = prices[0]
...     for px in prices[1:]:
...         min_px = min(min_px, px)
...         max_px = max(px - min_px, max_px)
...     return max_px

>>> prices = (20, 18, 14, 17, 20, 21, 15)
>>> profit(prices)
7
Can this be done in NumPy? You bet. But first, let’s build a quasi-realistic example:
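A sketch of one way to build such a series: pin a few turning points, interpolate linearly between them, and add noise. The turning points, noise scale, and seed here are assumptions, so the figures printed later in this section (such as a maximum profit of about 44.25) will not be reproduced exactly:

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed

# Start from a mostly-NaN array with a few hand-picked turning points.
prices = np.full(100, fill_value=np.nan)
prices[[0, 25, 60, -1]] = [80.0, 30.0, 75.0, 50.0]

# Linearly interpolate the missing values, then add some noise.
x = np.arange(len(prices))
is_valid = ~np.isnan(prices)
prices = np.interp(x=x, xp=x[is_valid], fp=prices[is_valid])
prices += rng.standard_normal(len(prices)) * 2
```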
Here’s what this looks like with matplotlib. The adage is to buy low (green) and sell high (red):
>>> import matplotlib.pyplot as plt

# Warning! This isn't a fully correct solution, but it works for now.
# If the absolute min came after the absolute max, you'd have trouble.
>>> mn = np.argmin(prices)
>>> mx = mn + np.argmax(prices[mn:])
# A marker is needed for the two points to show up when linestyle is empty.
>>> kwargs = {'marker': 'o', 'markersize': 12, 'linestyle': ''}

>>> fig, ax = plt.subplots()
>>> ax.plot(prices)
>>> ax.set_title('Price History')
>>> ax.set_xlabel('Time')
>>> ax.set_ylabel('Price')
>>> ax.plot(mn, prices[mn], color='green', **kwargs)
>>> ax.plot(mx, prices[mx], color='red', **kwargs)
What does the NumPy implementation look like? While there is no np.cummin() “directly,” NumPy’s universal functions (ufuncs) all have an accumulate() method that does what its name implies:
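A small sketch of `accumulate()` on two binary ufuncs, using a made-up array so the running values are easy to follow:

```python
import numpy as np

values = np.array([8, 5, 7, 3, 9, 1])

# Running minimum: each entry is the smallest value seen so far.
print(np.minimum.accumulate(values))  # [8 5 5 3 3 1]

# The same method on np.add gives a running total, like np.cumsum().
print(np.add.accumulate(values))      # [ 8 13 20 23 32 33]
```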
Extending the logic from the pure-Python example, you can find the difference between each price and a running minimum (element-wise), and then take the max of this sequence:
>>> cummin = np.minimum.accumulate  # running minimum via the ufunc's accumulate()

>>> def profit_with_numpy(prices):
...     """Price minus cumulative minimum price, element-wise."""
...     prices = np.asarray(prices)
...     return np.max(prices - cummin(prices))

>>> profit_with_numpy(prices)
44.2487532293278
>>> np.allclose(profit_with_numpy(prices), profit(prices))
True
How do these two operations, which have the same theoretical time complexity, compare in actual runtime? First, let’s take a longer sequence. (This doesn’t necessarily need to be a time series of stock prices at this point.)
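A longer sequence could be generated along these lines. The value range, length, and seed are assumptions; only the name `seq` is taken from the timing code in this section:

```python
import numpy as np

rng = np.random.default_rng(0)            # hypothetical seed
seq = rng.integers(0, 100, size=100_000)  # 100,000 integers in [0, 100)
```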
Now, for a somewhat unfair comparison:
>>> from timeit import timeit

>>> setup = ('from __main__ import profit_with_numpy, profit, seq;'
...          ' import numpy as np')
>>> num = 250
>>> pytime = timeit('profit(seq)', setup=setup, number=num)
>>> nptime = timeit('profit_with_numpy(seq)', setup=setup, number=num)
>>> print('Speed difference: {:0.1f}x'.format(pytime / nptime))
Speed difference: 76.0x
Above, treating profit_with_numpy() as pseudocode (without considering NumPy’s underlying mechanics), there are actually three passes through a sequence:
- cummin(prices) has O(n) time complexity
- prices - cummin(prices) is O(n)
- max(...) is O(n)
This reduces to O(n), because O(3n) reduces to just O(n): constant factors are dropped, and the n “dominates” as n approaches infinity.
Therefore, these two functions have equivalent worst-case time complexity. (Although, as a side note, the NumPy function comes with significantly more space complexity.) But that is probably the least important takeaway here. One lesson is that, while theoretical time complexity is an important consideration, runtime mechanics can also play a big role. Not only can NumPy delegate to C, but with some element-wise operations and linear algebra, it can also take advantage of computing within multiple threads. But there are a lot of factors at play here, including the underlying library used (BLAS/LAPACK/Atlas), and those details are for a whole ‘nother article entirely.
Intermezzo: Understanding Axes Notation
In NumPy, an axis refers to a single dimension of a multidimensional array:
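A short sketch with a made-up 2×3 array shows what summing along each axis returns:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [10, 20, 30]])

print(arr.sum(axis=0))  # [11 22 33]: collapses the rows, one sum per column
print(arr.sum(axis=1))  # [ 6 60]: collapses the columns, one sum per row
```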
The terminology around axes and the way in which they are described can be a bit unintuitive. In the documentation for Pandas (a library built on top of NumPy), you may frequently see something like:
axis : {'index' (0), 'columns' (1)}
You could argue that, based on this description, the results above should be “reversed.” However, the key is that axis refers to the axis along which a function gets called. This is well articulated by Jake VanderPlas:
The way the axis is specified here can be confusing to users coming from other languages. The axis keyword specifies the dimension of the array that will be collapsed, rather than the dimension that will be returned. So, specifying axis=0 means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated. [source]
In other words, summing an array for axis=0 collapses the rows of the array with a column-wise computation.
With this distinction in mind, let’s move on to explore the concept of broadcasting.
Broadcasting
Broadcasting is another important NumPy abstraction. You’ve already seen that operations between two NumPy arrays (of equal size) operate element-wise:
>>> a = np.array([1.5, 2.5, 3.5])
>>> b = np.array([10., 5., 1.])
>>> a / b
array([0.15, 0.5 , 3.5 ])
But what about unequally sized arrays? This is where broadcasting comes in:
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. [source]
The way in which broadcasting is implemented can become tedious when working with more than two arrays. However, if there are just two arrays, then whether they can be broadcast together can be described with two short rules:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when:
- they are equal, or
- one of them is 1
That’s all there is to it.
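The two rules can be sketched as a small helper that walks both shapes from the trailing end. The name `shapes_compatible` is hypothetical, and treating a missing leading dimension as length 1 follows the more general definition quoted later in this section:

```python
from itertools import zip_longest

import numpy as np

def shapes_compatible(shape_a, shape_b) -> bool:
    """Compare two shapes dimension by dimension, starting at the end."""
    # A dimension missing from the shorter shape is treated as length 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    return all(m == n or m == 1 or n == 1 for m, n in pairs)

print(shapes_compatible((3, 2), (2,)))    # True
print(shapes_compatible((3, 2), (3,)))    # False
print(shapes_compatible((3, 2), (3, 1)))  # True

# Cross-check one compatible case against NumPy itself.
print((np.zeros((3, 2)) + np.zeros(2)).shape)  # (3, 2)
```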
Let’s take a case where we want to subtract each column-wise mean of an array, element-wise:
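The array used in this example can be reconstructed from the outputs printed later in this section. The sketch below hard-codes those values, rounded to four decimals; how the array was originally generated (presumably by random sampling) is an assumption:

```python
import numpy as np

# Values back-computed from the row minimums and differences shown below.
sample = np.array([[1.816, 23.703],
                   [2.8395, 12.2607],
                   [3.5901, 24.2115]])

print(sample.shape)         # (3, 2)
print(sample.mean(axis=0))  # column-wise means, roughly 2.75 and 20.06
```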
In statistical jargon, sample consists of two samples (the columns) drawn independently from two populations with means of 2 and 20, respectively. The column-wise means should approximate the population means (albeit roughly, because the sample is small):
>>> mu = sample.mean(axis=0)
>>> mu
array([ 2.7486, 20.0584])
Now, subtracting the column-wise means is straightforward because broadcasting rules check out:
Here’s an illustration of subtracting out column-wise means, where a smaller array is “stretched” so that it is subtracted from each row of the larger array:
Technical Detail: The smaller-sized array or scalar is not literally stretched in memory: it is the computation itself that is repeated.
This extends to standardizing each column as well, making each cell a z-score relative to its respective column:
>>> (sample - sample.mean(axis=0)) / sample.std(axis=0)
array([[-1.2825,  0.6605],
       [ 0.1251, -1.4132],
       [ 1.1574,  0.7527]])
However, what if you want to subtract out, for some reason, the row-wise minimums? You’ll run into a bit of trouble:
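The failure is easy to reproduce with any array of the same shape. A minimal sketch, using a placeholder array of zeros in place of the real data:

```python
import numpy as np

sample = np.zeros((3, 2))     # placeholder with the same shape as `sample`
row_min = sample.min(axis=1)  # shape (3,)

try:
    sample - row_min          # trailing dimensions are 2 and 3: incompatible
except ValueError as err:
    print(err)
```

Shapes (3, 2) and (3,) fail the trailing-dimension check because 2 and 3 are neither equal nor 1.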
The problem here is that the smaller array, in its current form, cannot be “stretched” to be shape-compatible with sample. You actually need to expand its dimensionality to meet the broadcasting rules above:
>>> sample.min(axis=1)[:, None]  # 3 minimums across 3 rows
array([[1.816 ],
       [2.8395],
       [3.5901]])

>>> sample - sample.min(axis=1)[:, None]
array([[ 0.    , 21.887 ],
       [ 0.    ,  9.4212],
       [ 0.    , 20.6214]])
Note: [:, None] is a means by which to expand the dimensionality of an array, to create an axis of length one. np.newaxis is an alias for None.
There are some significantly more complex cases, too. Here’s a more rigorous definition of when any arbitrary number of arrays of any shape can be broadcast together:
A set of arrays is called “broadcastable” to the same shape if the following rules produce a valid result, meaning one of the following is true:
1. The arrays all have exactly the same shape.
2. The arrays all have the same number of dimensions, and the length of each dimension is either a common length or 1.
3. The arrays that have too few dimensions can have their shapes prepended with a dimension of length 1 to satisfy property #2.
[source]
This is easier to walk through step by step. Let’s say you have the following four arrays:
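Four inputs consistent with the shapes printed in this walkthrough could look like the sketch below. The specific values and the seed are assumptions; only the names `a`, `b`, `c`, `d` and the resulting shapes come from the text:

```python
import numpy as np

rng = np.random.default_rng(0)      # hypothetical seed

a = np.sin(np.arange(10)[:, None])  # shape (10, 1)
b = rng.standard_normal((1, 10))    # shape (1, 10)
c = np.full_like(a, 10)             # shape (10, 1)
d = 8                               # a plain Python scalar
```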
Before checking shapes, NumPy first converts scalars to arrays with one element:
>>> arrays = [np.atleast_1d(arr) for arr in (a, b, c, d)]
>>> for arr in arrays:
...     print(arr.shape)
...
(10, 1)
(1, 10)
(10, 1)
(1,)
Now we can check criterion #1. If all of the arrays have the same shape, a set of their shapes will condense down to one element, because the set() constructor effectively drops duplicate items from its input. This criterion is clearly not met:
The first part of criterion #2 also fails, meaning the entire criterion fails:
>>> len(set((arr.ndim) for arr in arrays)) == 1
False
The final criterion is a bit more involved:
The arrays that have too few dimensions can have their shapes prepended with a dimension of length 1 to satisfy property #2.
To codify this, you can first determine the dimensionality of the highest-dimension array and then prepend ones to each shape tuple until all are of equal dimension:
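A sketch of that padding step, working directly from the shape tuples printed earlier so that it runs on its own. The names `shape_tuples` and `maxdim` are illustrative; `shapes` matches the name used in the masking step:

```python
import numpy as np

# Shape tuples of the four arrays after np.atleast_1d().
shape_tuples = [(10, 1), (1, 10), (10, 1), (1,)]

maxdim = max(len(shape) for shape in shape_tuples)  # highest dimensionality
# Prepend ones until every tuple has `maxdim` entries.
shapes = np.array([(1,) * (maxdim - len(shape)) + shape
                   for shape in shape_tuples])
print(shapes)
# [[10  1]
#  [ 1 10]
#  [10  1]
#  [ 1  1]]
```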
Finally, you need to test that the length of each dimension is either (drawn from) a common length, or 1. A trick for doing this is to first mask the array of “shape-tuples” in places where it equals one. Then, you can check if the peak-to-peak (np.ptp()) column-wise differences are all zero:
>>> masked = np.ma.masked_where(shapes == 1, shapes)
>>> np.all(masked.ptp(axis=0) == 0)  # ptp: max - min
True
Encapsulated in a single function, this logic looks like this:
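One possible encapsulation is sketched below. The name `can_broadcast_by_rules` is hypothetical, and the final check uses plain sets rather than the masked-array trick above: per dimension, once the ones are ignored, at most one distinct length may remain. Criteria #1 and #2 are covered as special cases of that check.

```python
import numpy as np

def can_broadcast_by_rules(*arrays) -> bool:
    """Check broadcastability by comparing left-padded shape tuples."""
    arrays = [np.atleast_1d(arr) for arr in arrays]
    maxdim = max(arr.ndim for arr in arrays)
    # Prepend ones so that every shape tuple has the same length.
    shapes = np.array([(1,) * (maxdim - arr.ndim) + arr.shape
                       for arr in arrays])
    # Each column holds the lengths of one dimension across all arrays.
    return all(len(set(col[col != 1].tolist())) <= 1 for col in shapes.T)

print(can_broadcast_by_rules(np.ones((10, 1)), np.ones((1, 10)), 8))  # True
print(can_broadcast_by_rules(np.ones((3, 2)), np.ones(3)))            # False
```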
Luckily, you can take a shortcut and use np.broadcast() for this sanity-check, although it’s not explicitly designed for this purpose:
>>> def can_broadcast(*arrays) -> bool:
...     try:
...         np.broadcast(*arrays)
...         return True
...     except ValueError:
...         return False
...
>>> can_broadcast(a, b, c, d)
True
For those interested in digging a little deeper, PyArray_Broadcast is the underlying C function that encapsulates broadcasting rules.
Array Programming in Action: Examples
In the following 3 examples, you’ll put vectorization and broadcasting to work with some real-world applications.
Clustering Algorithms
Machine learning is one domain that can frequently take advantage of vectorization and broadcasting. Let’s say that you have the vertices of a triangle (each row is an x, y coordinate):
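A set of vertices consistent with the centroid and distances printed in this section is sketched below. The values are reconstructed from those outputs rather than copied from the original listing:

```python
import numpy as np

# Each row is one (x, y) vertex of the triangle.
tri = np.array([[1, 1],
                [3, 1],
                [2, 3]])

print(tri.mean(axis=0))  # centroid: x = 2.0, y is about 1.6667
```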
The centroid of this “cluster” is an (x, y) coordinate that is the arithmetic mean of each column:
>>> centroid = tri.mean(axis=0)
>>> centroid
array([2.    , 1.6667])
It’s helpful to visualize this:
Many clustering algorithms make use of Euclidean distances of a collection of points, either to the origin or relative to their centroids.
In Cartesian coordinates, the Euclidean distance between points p and q is:
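Written out for two points $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$, the standard definition is:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}
```

In the two-dimensional case used in this section, the sum has just two terms, one for the x coordinates and one for the y coordinates.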
[source: Wikipedia]
So for the set of coordinates in tri from above, the Euclidean distance of each point from the origin (0, 0) would be:
>>> np.sum(tri**2, axis=1) ** 0.5  # Or: np.sqrt(np.sum(np.square(tri), 1))
array([1.4142, 3.1623, 3.6056])
You may recognize that we are really just finding Euclidean norms:
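The same distances fall out of `np.linalg.norm()` when it is applied row by row. A minimal sketch, with the triangle vertices reconstructed from the outputs in this section:

```python
import numpy as np

tri = np.array([[1, 1], [3, 1], [2, 3]])  # reconstructed triangle vertices

# Row-wise Euclidean (L2) norm: one distance from the origin per vertex.
norms = np.linalg.norm(tri, axis=1)
print(norms.round(4))  # [1.4142 3.1623 3.6056]
```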
Instead of referencing the origin, you could also find the norm of each point relative to the triangle’s centroid:
>>> np.linalg.norm(tri - centroid, axis=1)
array([1.2019, 1.2019, 1.3333])
Finally, let’s take this one step further: let’s say that you have a 2d array X and a 2d array of multiple (x, y) “proposed” centroids. Algorithms such as K-Means clustering work by randomly assigning initial “proposed” centroids, then reassigning each data point to its closest centroid. From there, new centroids are computed, with the algorithm converging on a solution once the re-generated labels (an encoding of the centroids) are unchanged between iterations. A part of this iterative process requires computing the Euclidean distance of each point from each centroid:
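A sketch of inputs that match the shapes and centroid values printed in this section: ten points in two noisy "blobs" plus two proposed centroids. The noise and the seed are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed

# Ten points: five scattered around (5, 5) and five around (10, 10).
X = np.repeat([[5, 5], [10, 10]], [5, 5], axis=0)
X = X + rng.standard_normal(X.shape)

centroids = np.array([[5, 5], [10, 10]])
print(X.shape, centroids.shape)  # (10, 2) (2, 2)
```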
In other words, we want to answer the question, to which centroid does each point within X belong? We need to do some reshaping to enable broadcasting here, in order to calculate the Euclidean distance between each point in X and each point in centroids:
>>> centroids[:, None]
array([[[ 5,  5]],

       [[10, 10]]])
>>> centroids[:, None].shape
(2, 1, 2)
This enables us to cleanly subtract one array from another using a combinatoric product of their rows:
In other words, the shape of X - centroids[:, None] is (2, 10, 2), essentially representing two stacked arrays that are each the size of X. Next, we want the label (index number) of each closest centroid, finding the minimum distance on the 0th axis from the array above:
>>> np.argmin(np.linalg.norm(X - centroids[:, None], axis=2), axis=0)
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
You can put all this together in functional form:
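A sketch of such a function follows. The name `get_labels` is an assumption, while `labels` matches the variable used by the plotting code in this section. The four sample points are made up so that the result is easy to verify by eye:

```python
import numpy as np

def get_labels(X, centroids) -> np.ndarray:
    """Return the index of the nearest centroid for each row of X."""
    # (k, 1, d) against (n, d) broadcasts to (k, n, d) differences.
    distances = np.linalg.norm(X - centroids[:, None], axis=2)  # shape (k, n)
    return np.argmin(distances, axis=0)

X = np.array([[4.0, 5.5], [6.0, 4.5], [9.0, 11.0], [10.5, 9.5]])
centroids = np.array([[5, 5], [10, 10]])
labels = get_labels(X, centroids)
print(labels)  # [0 0 1 1]
```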
Let’s inspect this visually, plotting both the two clusters and their assigned labels with a color-mapping:
>>> c1, c2 = ['#bc13fe', '#be0119']  # https://xkcd.com/color/rgb/
>>> llim, ulim = np.trunc([X.min() * 0.9, X.max() * 1.1])

>>> _, ax = plt.subplots(figsize=(5, 5))
>>> ax.scatter(*X.T, c=np.where(labels, c2, c1), alpha=0.4, s=80)
>>> ax.scatter(*centroids.T, c=[c1, c2], marker='s', s=95,
...            edgecolor='yellow')
>>> ax.set_ylim([llim, ulim])
>>> ax.set_xlim([llim, ulim])
>>> ax.set_title('One K-Means Iteration: Predicted Classes')
Amortization Tables
Vectorization has applications in finance as well.
Given an annualized interest rate, payment frequency (times per year), initial loan balance, and loan term, you can create an amortization table with monthly loan balances and payments, in a vectorized fashion. Let’s set some scalar constants first:
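The constants can be inferred from the amortization table printed in this section: 1,125.00 of interest on 200,000 in month 1 implies a 6.75% annual rate, and the table has 360 monthly rows. The sketch below sets them up under that assumption and computes the level payment with the standard closed-form annuity formula. The payment is positive here, whereas the financial functions used in this section return it as a negative outflow:

```python
freq = 12      # payments per year
rate = 0.0675  # 6.75% annualized (inferred from the table)
nper = 30      # loan term in years
pv = 200_000   # initial loan balance

rate /= freq   # periodic (monthly) rate
nper *= freq   # total number of payments: 360

# Closed-form level payment for a fully amortizing loan.
payment = pv * rate / (1 - (1 + rate) ** -nper)
print(round(payment, 2))
```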
NumPy has historically shipped with a handful of financial functions that, unlike their Excel cousins, are capable of producing vector outputs. (These functions were deprecated in NumPy 1.18 and removed in 1.20; they now live in the separate numpy-financial package.)
The debtor (or lessee) pays a constant monthly amount that is composed of a principal and interest component. As the outstanding loan balance declines, the interest portion of the total payment declines with it.
# np.ppmt(), np.ipmt(), and np.pmt() were removed in NumPy 1.20;
# the numpy-financial package provides the same functions.
>>> import numpy_financial as npf

>>> periods = np.arange(1, nper + 1, dtype=int)
>>> principal = npf.ppmt(rate, periods, nper, pv)
>>> interest = npf.ipmt(rate, periods, nper, pv)
>>> pmt = principal + interest  # Or: pmt = npf.pmt(rate, nper, pv)
Next, you’ll need to calculate a monthly balance, both before and after that month’s payment, which can be defined as the future value of the original balance minus the future value of an annuity (a stream of payments), using a discount factor d:
Functionally, this looks like:
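A sketch of that definition as a function. The name and argument order follow the `balance(pv, rate, periods, -pmt)` calls in this section; the body is an assumption derived from the verbal description, taking the factor d to be (1 + rate) ** nper:

```python
import numpy as np

def balance(pv, rate, nper, pmt) -> np.ndarray:
    """Outstanding balance after `nper` payments of size `pmt`."""
    d = (1 + rate) ** nper                # the factor d from the text
    return pv * d - pmt * (d - 1) / rate  # FV of balance minus FV of annuity

rate, pv = 0.0675 / 12, 200_000             # inferred monthly rate and balance
pmt = pv * rate / (1 - (1 + rate) ** -360)  # level monthly payment (positive)
periods = np.arange(1, 361)

# Ending balance after the first and the last payment.
print(balance(pv, rate, periods, pmt)[[0, -1]].round(2))
```

Because `nper` can be an array of period numbers, a single call returns the whole balance schedule at once.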
Finally, you can drop this into a tabular format with a Pandas DataFrame. Be careful with signs here. PMT is an outflow from the perspective of the debtor.
>>> import pandas as pd

>>> cols = ['beg_bal', 'prin', 'interest', 'end_bal']
>>> data = [balance(pv, rate, periods - 1, -pmt),
...         principal,
...         interest,
...         balance(pv, rate, periods, -pmt)]

>>> table = pd.DataFrame(data, columns=periods, index=cols).T
>>> table.index.name = 'month'

>>> with pd.option_context('display.max_rows', 6):
...     # Note: Using floats for $$ in production-level code = bad
...     print(table.round(2))
...
       beg_bal     prin  interest    end_bal
month
1    200000.00  -172.20  -1125.00  199827.80
2    199827.80  -173.16  -1124.03  199654.64
3    199654.64  -174.14  -1123.06  199480.50
...        ...      ...       ...        ...
358    3848.22 -1275.55    -21.65    2572.67
359    2572.67 -1282.72    -14.47    1289.94
360    1289.94 -1289.94     -7.26      -0.00
At the end of year 30 (month 360), the ending balance is zero to within floating-point rounding, so the loan is paid off.
Note: While using floats to represent money can be useful for concept illustration in a scripting environment, using Python floats for financial calculations in a production environment might cause your calculation to be a penny or two off in some cases.
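To illustrate the note above, here is a small sketch using the standard library's decimal module, one common way to keep currency arithmetic exact. The amounts are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                 # comparison on binary floats
print(decimal_total == Decimal("0.30"))   # comparison on exact decimals

# Round a payment to whole cents with an explicit rounding rule.
payment = Decimal("1297.196")
cents = payment.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(cents)
```

Building Decimal values from strings rather than floats matters here: Decimal(0.1) would inherit the float's representation error.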
Image Feature Extraction
In one final example, we’ll work with an October 1941 image of the USS Lexington (CV-2), the wreck of which was discovered off the coast of Australia in March 2018. First, we can map the image into a NumPy array of its pixel values:
>>> import matplotlib.pyplot as plt
>>> from skimage import io
>>> url = ('https://www.history.navy.mil/bin/imageDownload?image=/'
...        'content/dam/nhhc/our-collections/photography/images/'
...        '80-G-410000/80-G-416362&rendition=cq5dam.thumbnail.319.319.png')
>>> # Older scikit-image releases spelled this keyword as_grey
>>> img = io.imread(url, as_gray=True)
>>> fig, ax = plt.subplots()
>>> ax.imshow(img, cmap='gray')
>>> ax.grid(False)
For simplicity’s sake, the image is loaded in grayscale, resulting in a 2d array of 64-bit floats rather than a 3-dimensional MxNx4 RGBA array, with lower values denoting darker spots.
One technique commonly employed as an intermediary step in image analysis is patch extraction. As the name implies, this consists of extracting smaller overlapping sub-arrays from a larger array and can be used in cases where it is advantageous to “denoise” or blur an image.
This concept extends to other fields, too. For example, you’d be doing something similar by taking “rolling” windows of a time series with multiple features (variables). It’s even useful for building Conway’s Game of Life, although convolution with a 3×3 kernel is a more direct approach there.
Here, we will find the mean of each overlapping 10×10 patch within img. Taking a miniature example, the first 3×3 patch array in the top-left corner of img would be:
>>> img[:3, :3]
array([[0.8078, 0.7961, 0.7804],
       [0.8039, 0.8157, 0.8078],
       [0.7882, 0.8   , 0.7961]])
>>> img[:3, :3].mean()
0.7995642701525054
The pure-Python approach to creating sliding patches would involve a nested for-loop. You’d need to consider that the starting index of the right-most patches will be n - 3, where n is the width of the array, so there are n - 3 + 1 starting positions along each row. In other words, if you were extracting 3×3 patches from a 10×10 array called arr, the last patch taken would be from arr[7:10, 7:10]. Also keep in mind that Python’s range() does not include its stop parameter, so each loop runs over range(n - 3 + 1) in order to reach that final starting index.
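A sketch of that nested loop, assuming a 10×10 patch and using a random array with the photo's shape as a stand-in for img (the real pixel values are not needed to show the pattern). The names size, mm, nn, and patch_means are illustrative.

```python
import numpy as np

# Stand-in for the grayscale photo: random floats with the same shape.
rng = np.random.default_rng(0)
img = rng.random((254, 319))

size = 10
m, n = img.shape
mm, nn = m - size + 1, n - size + 1   # number of patch positions per axis

patch_means = np.empty((mm, nn))
for i in range(mm):                   # last i is mm - 1, the final valid start row
    for j in range(nn):               # last j is nn - 1, the final valid start column
        patch_means[i, j] = img[i:i + size, j:j + size].mean()

print(patch_means.shape)
```

Every entry of patch_means costs one slice and one method call at the Python level, which is the overhead a vectorized approach avoids.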
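With this loop, you’re performing a lot of Python calls: for the (254, 319) image and 10×10 patches, the loop body runs 245 × 310 = 75,950 times, and each pass slices the array and calls .mean().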
An alternative that will be scalable to larger RGB or RGBA images is NumPy’s stride_tricks.
An instructive first step is to visualize, given the patch size and image shape, what a higher-dimensional array of patches would look like. We have a 2d array img with shape (254, 319) and a (10, 10) 2d patch. This means our output shape (before taking the mean of each “inner” 10×10 array) would be:
>>> shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
>>> shape
(245, 310, 10, 10)
You also need to specify the strides of the new array. An array’s strides attribute is a tuple giving the number of bytes to jump in each dimension when moving along the array. Each pixel in img is a 64-bit (8-byte) float, meaning the total image size is 254 x 319 x 8 = 648,208 bytes.
Internally, img is kept in memory as one contiguous block of 648,208 bytes. strides is hence a sort of “metadata”-like attribute that tells us how many bytes we need to jump ahead to move to the next position along each axis. We move in blocks of 8 bytes along the rows but need to traverse 8 x 319 = 2,552 bytes to move “down” from one row to another.
>>> img.strides
(2552, 8)
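To make the byte arithmetic concrete, here is a small sketch on a stand-in array with the same shape and dtype as img. It checks that element [i, j] sits at byte offset i * strides[0] + j * strides[1] from the start of the buffer; the indices chosen are arbitrary.

```python
import numpy as np

# Stand-in array: same shape and dtype as the photo, filled with 0, 1, 2, ...
img = np.arange(254 * 319, dtype=np.float64).reshape(254, 319)

print(img.itemsize)   # bytes per element
print(img.nbytes)     # total bytes in the buffer
print(img.strides)    # bytes to step along (rows, columns)

# Element [i, j] sits at byte offset i * strides[0] + j * strides[1].
i, j = 5, 7
offset = i * img.strides[0] + j * img.strides[1]
flat_index = offset // img.itemsize
same_value = img.ravel()[flat_index] == img[i, j]
print(same_value)
```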
In our case, the strides of the resulting patches will just repeat the strides of img twice, that is, strides = 2 * img.strides, or (2552, 8, 2552, 8).
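One way to see why the strides simply repeat, sketched on a stand-in array of the same shape: moving to the next patch down or to the right shifts the window by exactly one image row or one image column, which are the same byte jumps as moving between pixels inside a patch.

```python
import numpy as np

img = np.zeros((254, 319))     # stand-in: same shape and dtype as the photo
strides = 2 * img.strides      # tuple repetition, not arithmetic multiplication
print(strides)

# How each axis of the 4d patch array maps onto the image buffer:
#   axis 0: next patch down   -> one image row    (2552 bytes)
#   axis 1: next patch right  -> one image column (8 bytes)
#   axis 2: next row in patch -> one image row    (2552 bytes)
#   axis 3: next col in patch -> one image column (8 bytes)
```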
Now, let’s put these pieces together with NumPy’s stride_tricks:
>>> from numpy.lib import stride_tricks
>>> patches = stride_tricks.as_strided(img, shape=shape, strides=strides)
>>> patches.shape
(245, 310, 10, 10)
The first 10×10 patch, patches[0, 0], is simply the top-left block img[:10, :10].
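As a sketch on synthetic data (the photo itself is not needed), the snippet below rebuilds the patch view and checks that its first entry is the top-left block of the image. It also compares the result with sliding_window_view, a higher-level helper available in NumPy 1.20 and later that produces the same read-only windows without hand-computed strides. Passing writeable=False to as_strided is an added precaution, not part of the code above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

# Synthetic stand-in for the photo: same shape and dtype, predictable values.
img = np.arange(254 * 319, dtype=np.float64).reshape(254, 319)
size = 10
shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)

# Read-only view: overlapping windows share memory, so writes would be unsafe.
patches = as_strided(img, shape=shape, strides=2 * img.strides, writeable=False)

first_patch = patches[0, 0]
print(first_patch.shape)
print(np.array_equal(first_patch, img[:size, :size]))

# Same windows via the higher-level helper (NumPy >= 1.20).
windows = sliding_window_view(img, (size, size))
print(np.array_equal(windows, patches))
```

as_strided performs no bounds checking, so a wrong shape or strides tuple can read memory outside the array; the helper avoids that class of mistake.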
The last step is tricky. To get a vectorized mean of each inner 10×10 array, we need to think carefully about the dimensionality of what we have now. The result should collapse the last two dimensions so that we’re left with a single 245×310 array.
One (suboptimal) way would be to reshape patches first, flattening the inner 2d arrays to length-100 vectors, and then computing the mean on the final axis:
>>> veclen = size ** 2
>>> patches.reshape(*patches.shape[:2], veclen).mean(axis=-1).shape
(245, 310)
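However, you can also specify axis as a tuple, as in patches.mean(axis=(-1, -2)), computing a mean over the last two axes. This should be more efficient than reshaping, because reshaping the overlapping strided view forces NumPy to copy all 245 × 310 × 100 = 7,595,000 elements into a new array first.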
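A sketch comparing the two reductions on a synthetic stand-in for img. Both give the same (245, 310) array of patch means, but the tuple-axis form does not need an intermediate copy.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Synthetic stand-in for the photo.
rng = np.random.default_rng(0)
img = rng.random((254, 319))
size = 10
shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
patches = as_strided(img, shape=shape, strides=2 * img.strides, writeable=False)

# Reduce over both inner axes at once.
strided_means = patches.mean(axis=(-1, -2))

# Reshape-then-reduce: materializes a full copy of every patch first.
reshaped_means = patches.reshape(*patches.shape[:2], size ** 2).mean(axis=-1)

print(strided_means.shape)
print(np.allclose(strided_means, reshaped_means))
```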
Let’s make sure this checks out by comparing equality to our looped version. It does:
>>> strided_means = patches.mean(axis=(-1, -2))
>>> np.allclose(patch_means, strided_means)
True
If the concept of strides has you drooling, don’t worry: Scikit-Learn has already embedded this entire process nicely within its feature_extraction module (see extract_patches_2d in sklearn.feature_extraction.image).
A Parting Thought: Don’t Over-Optimize
In this article, we discussed optimizing runtime by taking advantage of array programming in NumPy. When you are working with large datasets, it’s important to be mindful of microperformance.
However, there is a subset of cases where avoiding a native Python for-loop isn’t possible. As Donald Knuth advised, “Premature optimization is the root of all evil.” Programmers may incorrectly predict where in their code a bottleneck will appear, spending hours trying to fully vectorize an operation that would result in a relatively insignificant improvement in runtime.
There’s nothing wrong with for-loops sprinkled here and there. Often, it can be more productive to think instead about optimizing the flow and structure of the entire script at a higher level of abstraction.
More Resources
NumPy Documentation:
- What is NumPy?
- Broadcasting
- Universal functions
- NumPy for MATLAB Users
- The complete NumPy Reference index
Books:
- Travis Oliphant’s Guide to NumPy, 2nd ed. (Travis is the primary creator of NumPy)
- Chapter 2 (“Introduction to NumPy”) of Jake VanderPlas’ Python Data Science Handbook
- Chapter 4 (“NumPy Basics”) and Chapter 12 (“Advanced NumPy”) of Wes McKinney’s Python for Data Analysis 2nd ed.
- Chapter 2 (“The Mathematical Building Blocks of Neural Networks”) from François Chollet’s Deep Learning with Python
- Robert Johansson’s Numerical Python
- Ivan Idris: Numpy Beginner’s Guide, 3rd ed.
Other Resources:
- Wikipedia: Array Programming
- SciPy Lecture Notes: Basic and Advanced NumPy
- EricsBroadcastingDoc: Array Broadcasting in NumPy
- SciPy Cookbook: Views versus copies in NumPy
- Nicolas Rougier: From Python to Numpy and 100 NumPy Exercises
- TensorFlow docs: Broadcasting Semantics
- Theano docs: Broadcasting
- Eli Bendersky: Broadcasting Arrays in Numpy
Translated from: https://www.pybloggers.com/2018/04/look-ma-no-for-loops-array-programming-with-numpy/