AIX Memory Management and Tuning


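Memory types in AIX
The AIX Virtual Memory Manager (VMM) divides memory into the following types:
(1) Working memory, also called computational memory
Also known as working storage or working segments. This is the data space that the VMM allocates temporarily to a running program (for example, its stack and data areas). The data changes as instructions execute, and the paging subsystem may page it in to or out of physical memory. The pageable parts of the AIX kernel are also working segments.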

 

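(2) Persistent (or permanent) memory: file memory / file cache
Also called file memory or cache. This is the disk cache that the VMM maintains to speed up file I/O (AIX will use all otherwise unused physical memory as file cache). It temporarily holds file system data. When a process opens and reads a file, the data is paged in to this area. If the process modifies the file, the page is marked dirty and is eventually paged out directly to the disk blocks where the file data lives (that is, the file on disk is refreshed), not to the ordinary paging space. If the data has not been modified, the page can simply be discarded and is never written to paging space. Both the data files a program uses and the program code itself are persistent objects. The difference is that program code is never modified (except when the program is recompiled), so it never needs to be written out. Freeing this kind of memory first reduces the disk writes caused by page-outs (the pages can simply be dropped because the original file data is still on disk) and uses memory more efficiently.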

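(3) Client objects
Used by file systems other than JFS, such as NFS and JFS2. Memory is likewise allocated when data is read. If the data is modified, the page is marked and later written back to where the data is stored (or sent over the network to the NFS server). Unlike persistent objects, however, unmodified data is not simply released: it is paged out to local paging space so that it can be reused more cheaply next time (AIX assumes that local disk I/O is faster than re-reading over the network). Client memory used as the JFS2 cache is managed slightly differently: only specially created temporary files (a special file mode) are paged out this way, and everything else is handled the same way as the JFS cache.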

(4) Log
This memory is used for reading and writing the JFS/JFS2 file system logs.

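(5) Mapping
This type exists to support the mmap() programming interface. It lets an application map several memory types into one memory segment, giving a mixed-type memory region, and is typically used for communication between programs.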

 

 

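To make this easier to picture, imagine the following scene (see the AIX memory allocation diagram): a sealed tank holds two liquids, water and oil. The oil on top represents computational memory and the water below represents file system cache. The empty space above the two layers represents free physical memory. The tank is connected to four pipes. The two on the left are used to pump oil and water in (or out). The two on the right lead to two buffer areas, the paging space and the file systems on disk. There is also a drain outlet: when file cache that has not been modified needs to be freed, it can simply be thrown away, because a copy of the original file data is guaranteed to exist on disk.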

 

 

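Now let us walk through the whole cycle of AIX memory allocation and release. At first the tank is completely empty, which means every page is available. This is the state right after the operating system boots, when vmstat shows a large fre value. When the system runs a program it must first read the program code into memory, which allocates file system cache (every read of a disk file allocates file system cache). In the diagram this is water being pumped into the tank. As the program runs, it initializes variables and requests temporary space, which generates requests for computational memory, equivalent to pumping in oil. Note that the oil floats on the water, so as both kinds of memory are allocated the liquid level is the sum of the oil and the water. When free memory is nearly used up and the level rises to the minfree mark, the system starts lrud. In the diagram this is the buffer valves opening: oil and water flow to the paging space and to the file systems respectively, and water may also leave directly through the drain (for example, file cache that has not been modified).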

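When lrud decides to move computational memory out, the data must be preserved unconditionally ("oil is precious"). For file system cache, the system first checks whether the data has changed. If it has, it is written back to the file system. If it has not (which is the usual case), it is simply discarded through the drain, because a copy still exists in the file system on disk and can be read back in when needed.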
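The decision described above can be sketched in a few lines of Python. This is only an illustration of the water-and-oil model in this article, not the actual lrud logic; the names `PageKind` and `reclaim_action` are made up for the example.

```python
from enum import Enum


class PageKind(Enum):
    COMPUTATIONAL = "computational"  # working storage, the "oil"
    FILE = "file"                    # file system cache, the "water"


def reclaim_action(kind, dirty):
    """Return what happens to a page that has been chosen for freeing.

    Simplified model of the text above; a real VMM tracks far more state.
    """
    if kind is PageKind.COMPUTATIONAL:
        # Computational data has no copy on disk, so it must be saved.
        return "page out to paging space"
    if dirty:
        # Modified file data goes back to the file it came from.
        return "write back to the file system"
    # Unmodified file data is still on disk and can just be dropped.
    return "discard"
```

Only the first two outcomes cost a disk write, which is why freeing unmodified file cache is the cheapest way to get memory back.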

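Likewise, when a program frees memory or a file is deleted (deleting a file on disk also releases its cached pages in memory), the liquid level drops and the fre value grows.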

Thoughts on free memory


Let us look again at the memory-related information from vmstat.


$vmstat 1 5
System Configuration :lcpu=16  mem=31488MB
kthr         memory        page                        faults              cpu
____________   __________  ___________________        ________        _________
  r     b      avm   fre   re pi po fr     sr    cy        in    sy    cs     us  sy id  wa
  4     2   3127273  3272  0  4  8  338    937   0        2357  22962  1561   13  1  83  3
  5     3   3127257  1556  0  4  0  0        0   0        2890  15790  2165   20  0  59 21
  2     4   3127903   958  0  7  4 1014   1673   0        2353   7290  1670   17  1  53 29
  3     3   3127948   966  0 23 12 1392   2693   0        2620   7943  1779   18  1  62 19
 
 
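  avm: not "available memory" but active virtual memory, the total virtual memory currently allocated by the system (covering pages held in physical memory and pages in paging space), counted in 4 KB pages. avm does not include file system cache (that is, persistent memory).
 
  fre: free physical memory, in 4 KB pages
  sr: pages scanned while searching for memory to free
  fr: pages actually freed out of those scanned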
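As a rough illustration of how these columns can be read, the Python sketch below parses one data row of the sample output and converts page counts to megabytes. It assumes 4 KB pages and the 17-column layout shown above; the field names in `FIELDS` and the helper names are chosen for this example (the second `sy` column is renamed `sy_cpu` to avoid a clash).

```python
PAGE_KB = 4  # assumption: 4 KB pages

# Column order of the vmstat sample above; "sy_cpu" is the cpu "sy" column.
FIELDS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
          "in", "sy", "cs", "us", "sy_cpu", "id", "wa"]


def parse_vmstat_row(line):
    """Turn one numeric vmstat line into a dict keyed by column name."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of columns")
    return dict(zip(FIELDS, values))


def pages_to_mb(pages):
    """Convert a count of 4 KB pages to megabytes."""
    return pages * PAGE_KB / 1024


row = parse_vmstat_row(
    "  2     4   3127903   958  0  7  4 1014   1673   0"
    "        2353   7290  1670   17  1  53 29")
free_mb = pages_to_mb(row["fre"])      # free physical memory in MB
active_mb = pages_to_mb(row["avm"])    # active virtual memory in MB
freed_ratio = row["fr"] / row["sr"]    # share of scanned pages that were freed
```

For this row fre is under 4 MB while avm is roughly 12 GB, and about 60% of the scanned pages could be freed.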
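  The most confusing thing in AIX is free memory. Nearly every AIX beginner dislikes how little free physical memory an AIX system shows. Frankly, I did not like it at first either. Once I understood the reason, though, I saw that low free memory is not a problem and is in fact an advantage. Physical memory is precious and far faster than disk (at least 1000 times faster), so why leave it idle? By default AIX therefore turns all unused physical memory into file system cache to speed up disk access, and keeps only a small amount free for sudden memory requests.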

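  If a memory request exceeds the free physical memory currently available (which rarely happens), the system scans physical memory, which is the sr column in vmstat. Scanned pages that can be freed are released (the fr column) and given to the requesting program. As long as the system can free enough pages in time, the AIX virtual memory system keeps running well. Even if it cannot, the only effect is slower program start-up or execution, and there is usually no serious problem.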


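If programs on your system frequently request very large amounts of memory at once (hundreds of MB), the small fre value may make the system respond slowly. Raising fre by adjusting system parameters can solve this. To repeat, such frequent sudden requests for huge amounts of memory are extremely rare and generally happen only at program start-up. If your program also does this while running, talk to the developers first (or change the program's parameters) to fix this bad habit. Commercial software generally does not behave this way.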

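The commands below increase the (displayed) free memory. They mean that when fre drops below 5000 pages (5000*4KB, about 20 MB), the system starts freeing physical memory through page-outs, and stops once fre reaches 10000 pages (10000*4KB, about 40 MB). Note that although you will see more free memory, memory is actually used less efficiently, which may reduce system performance.
#/usr/samples/kernel/vmtune  -F  10000 -f 5000   -> the setting is lost when the system reboots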

#vmo -p -o maxfree=10000 -o minfree=5000
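The page arithmetic is easy to get wrong by hand, so here is a small Python sketch that converts a target in MB to 4 KB pages and formats the corresponding vmo command line. It only builds a string and does not run anything; `mb_to_pages` and `vmo_free_command` are hypothetical helper names, and the 4 KB page size is an assumption.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages


def mb_to_pages(megabytes):
    """Number of 4 KB pages in the given number of megabytes."""
    return megabytes * 1024 // PAGE_SIZE_KB


def vmo_free_command(minfree_pages, maxfree_pages):
    """Format a persistent vmo command that sets maxfree and minfree."""
    if maxfree_pages <= minfree_pages:
        raise ValueError("maxfree must be larger than minfree")
    return f"vmo -p -o maxfree={maxfree_pages} -o minfree={minfree_pages}"


command = vmo_free_command(5000, 10000)
exact_20mb = mb_to_pages(20)  # 20 MB is 5120 pages, so 5000 pages is a little less
```

Each tunable gets its own `-o name=value` pair, with no spaces around the equals sign.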

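Table 3-2 shows how the maxfree and minfree parameters control free memory.

     Table 3-2  How maxfree and minfree control free memory

  Actual free memory                 System action
  More than maxfree                  No action
  Between minfree and maxfree        If memory is currently being freed, continue until free memory exceeds maxfree; otherwise no action
  Less than minfree                  Start freeing memory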
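The table describes a hysteresis loop, which the following Python sketch simulates. It is a simplified model of the behavior described here, not AIX code. The function names are invented for the example, and treating "free memory equal to maxfree" as the point where freeing stops is my assumption about the boundary case.

```python
def next_state(free_pages, freeing, minfree, maxfree):
    """Return True if page freeing is active after this observation."""
    if free_pages < minfree:
        return True       # below minfree: start (or keep) freeing
    if free_pages >= maxfree:
        return False      # maxfree reached: stop freeing
    return freeing        # in between: keep doing whatever we were doing


def simulate(samples, minfree, maxfree):
    """Feed a series of free-page readings through the state machine."""
    freeing = False
    states = []
    for free_pages in samples:
        freeing = next_state(free_pages, freeing, minfree, maxfree)
        states.append(freeing)
    return states


states = simulate([12000, 6000, 4000, 7000, 9999, 10000, 8000],
                  minfree=5000, maxfree=10000)
```

The same reading (for example 6000 or 8000 free pages) leads to different actions depending on whether freeing was already under way, which is the point of having two thresholds instead of one.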

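The AIX virtual memory scheme is usually very effective, but it runs into trouble in the following situation. If the software on the system generates heavy disk I/O and caches and optimizes that I/O itself, caching it again in the operating system lowers efficiency (for example, a database whose data files or tablespaces live on a file system). In particular, when the software uses raw devices (logical volumes) directly instead of a file system, its memory requests still compete with the operating system cache for physical memory. To correct this inefficient use of memory, you can adjust parameters that control how the operating system uses memory. By default the operating system uses two thresholds, 80% and 20%, for the memory used as file cache (persistent objects). The current percentage is the ratio of persistent memory (file system cache) to total physical memory. Table 3-3 shows what the parameters do.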

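Table 3-3  How maxperm and minperm control the memory used for caching
File memory as a percentage of physical memory      Operating system behavior
> 80% (maxperm)                                     Free only persistent memory (file system cache) for use by programs
20% < file memory < 80%                             Depending on repage rates, free (page out) persistent memory (file system cache) first
< 20% (minperm)                                     Free (page out) persistent and working memory equally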

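Note: freeing persistent memory does not require paging out. A page that has not been modified can be released directly, which saves a disk write and improves efficiency.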
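The three bands in Table 3-3 amount to a simple threshold check, sketched below in Python. The function name and return strings are illustrative only, and the defaults of 20 and 80 are the values quoted in this article rather than values read from a live system.

```python
def steal_policy(file_pct, minperm_pct=20.0, maxperm_pct=80.0):
    """Describe which pages are freed first for a given file-cache share.

    file_pct is file memory as a percentage of physical memory.
    """
    if file_pct > maxperm_pct:
        return "free file pages only"
    if file_pct < minperm_pct:
        return "free file and working pages equally"
    return "prefer file pages, depending on repage rates"
```

Lowering both thresholds, as in the raw-device settings, shrinks the range in which file cache is protected, so computational pages are less likely to be paged out.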

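To reclaim cache (persistent) memory first and give the freed memory to programs (working memory), use one of the following commands:
#/usr/samples/kernel/vmtune  -P 10 -p 5   # older AIX releases
#vmo -p -o maxperm%=10 -o minperm%=5      # newer AIX releases, where vmo replaces vmtune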

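With these settings the VMM gives higher priority to freeing file memory (persistent memory, cache). Table 3-4 lists commonly used settings with explanations.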


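             Table 3-4  Settings and explanations
Parameter             Usual requirement                  Explanation
maxclient             = maxperm                          maxclient must be less than or equal to maxperm; maxclient applies to JFS2 file systems
minfree                                                  Usually left at the default of 128 (4 KB pages)
maxfree               maxfree = minfree + maxpgahead     maxpgahead is usually 16; use the larger of maxpgahead (JFS) and j2_maxPageReadAhead (JFS2) as the maxpgahead term

Mainly JFS file systems
minperm%              15%
maxperm%              30%                                maxclient must be less than or equal to maxperm

Mainly JFS2 file systems
minperm%              default
maxperm%              default

Mainly raw devices
minperm%              5%
maxperm%              10%                                maxclient must be less than or equal to maxperm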
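The maxfree rule in the table can be written out as a one-line calculation. The Python sketch below is only a worked example of that rule; the parameter names are mine, and the read-ahead values passed in are sample inputs, not recommended settings.

```python
def recommended_maxfree(minfree, jfs_read_ahead, jfs2_read_ahead):
    """maxfree = minfree + the larger of the JFS and JFS2 read-ahead values."""
    return minfree + max(jfs_read_ahead, jfs2_read_ahead)


# minfree of 128 pages, JFS read-ahead of 16 pages, JFS2 read-ahead of 8 pages
jfs_heavy = recommended_maxfree(128, 16, 8)
# same minfree, but a JFS2 read-ahead of 128 pages dominates
jfs2_heavy = recommended_maxfree(128, 16, 128)
```

The gap between minfree and maxfree is sized to the largest read-ahead so that one full read-ahead request can be satisfied from the free list.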

 

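Another important vmo parameter is lru_file_repage, which controls how lrud behaves when the file cache percentage lies between minperm and maxperm. If lru_file_repage=1, lrud also checks whether file system cache pages are thrashing. If a page is thrashing it is not freed (it stays in the cache); otherwise it is freed. Thrashing means that, to make room for new data, the system has to evict another block of data from memory, only to need that evicted data again as soon as the eviction completes, so the system wastes a large share of its resources shuffling the same two blocks back and forth. System performance then drops sharply and a great deal of disk I/O occurs.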

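lru_file_repage helps the system recognize this situation and reduce thrashing. If physical memory is severely short, however, no parameter can prevent it; the only fix is to add physical memory.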

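AIX V5.3 and later tend to be configured with a small minperm% (for example 5%), a large maxperm% (for example 90%), and lru_file_repage set to 1. With these settings, while the file cache is between 5% and 90% the system tries to steal file cache first and keep computational memory, and pages out computational memory only when the file cache is thrashing (pages are needed again right after being freed). This is a good choice for most systems, but not for a dedicated database server whose tablespaces are on file systems. In that case much of the database I/O becomes file I/O. The database already optimizes I/O with its own internal cache, so also using the operating system file cache caches the same data twice and uses twice the memory for one block of data, which is clearly very wasteful. Here the file cache should always be stolen first whether or not thrashing occurs, which means setting lru_file_repage to 0.
In a database environment, to avoid double caching entirely even with file-system tablespaces, you can mount the file system with a special option that prevents file caching for it. This means changing the mount options to enable Direct I/O (DIO).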

   

 

About Direct I/O


Use Direct I/O to improve performance of your AIX applications

Learn the benefits and the rules and find out when it pays to implement Direct I/O

Introduction

The alternative I/O technique called Direct I/O was first introduced in AIX 4.3 and has been available for all later releases of AIX, including AIX 5L. It bypasses the Virtual Memory Manager (VMM) altogether and transfers data directly to/from the disk to/from the user's buffer. You may find improved performance of your applications when you implement this technique for file handling.

In the following discussion any reference to JFS will imply reference to both JFS and JFS2. JFS (Journaled File System) is native to the POWER-based platform. Although JFS2 (also known as Enhanced Journaled File System) is not native to the POWER-based platform, it is available on POWER. Both JFS and JFS2, used in AIX, exploit database journaling techniques to maintain their structural consistency. This prevents damage to the file system when the system is halted abnormally.


Overview

Normally, when an I/O request to a JFS file is invoked, the I/O goes from the application buffer to the Virtual Memory Manager (VMM) and then from the Virtual Memory Manager to the JFS. When the application makes a request for a file read, if the file page is not in memory, the JFS reads the data from the disk into the file buffer cache, then copies the data from the file buffer cache to the user's buffer. On the other hand, when the application makes a request for a write, the data is copied from the user's buffer into the file buffer cache. The actual writes to disk are done later if the write requests cannot be accommodated immediately.

This type of caching policy can be extremely effective in improving performance of JFS I/O when the cache hit rate is high. It would fully exploit the read-ahead and write-behind features of JFS. This would allow file writes to be asynchronous so that the applications can continue to process without having to wait for I/O requests to complete. On the other hand, if the applications have poor cache hit rates or if they do very large I/Os, such caching policy may not be of much benefit.

If you know that certain files have poor cache-utilization characteristics, then you could open those files as Direct I/O files. Doing this most likely will lead to improved performance of your application.

Direct I/O for files and raw I/O for devices are functionally equivalent, and using Direct I/O has no effect on raw I/O performance. Raw I/O is still slightly faster than Direct I/O, but Direct I/O provides the benefits of a JFS as well as enhanced performance.


Enabling your applications to use Direct I/O

At the programming level, Direct I/O access to a file is enabled by passing the O_DIRECT flag to the open function. This flag is defined in fcntl.h. Applications must be compiled with _ALL_SOURCE enabled to have the definition of O_DIRECT available.

At the user level, starting with AIX 5.1D Direct I/O is enabled using the "dio" option on the mount command e.g. mount -odio /xyz where xyz is a filesystem. This works for both JFS and JFS2 filesystems. A filesystem mounted with the "dio" option will have all I/O treated as Direct I/O as long as the alignment requirements are met. The I/O should be aligned to page (4K byte) boundaries and in multiples of the page size. If the I/O doesn't meet those requirements, then the I/O will go through kernel buffers and the buffers will be flushed after the I/O completes. This will result in poor performance. Therefore, you should use the "dio" option on the mount command only if all applications running against the files in the filesystem are well behaved in this respect.

Once Direct I/O is implemented, it's easy to verify if it's working: Mount a filesystem with the dio option and record the number of memory pages used. Repeat the process with the filesystem mounted without the dio option. Notice that for the Direct I/O implemented filesystem, memory pages will NOT be used to cache pages, hence the vmtune parameter 'numperm' would not grow as in the case of normal I/O.


Rules for Direct I/O at the API level

There are very strict rules for Direct I/O at the API level. Buffers for the I/O requests need to be 4K byte aligned, and the I/O lengths must be in multiples of 4K bytes. Failure on either at the API level will bypass Direct I/O. Normally, databases naturally obey these rules as they are true of raw logical I/O, too.
Direct I/O does not bypass i-node locking. If i-node locking is a problem because of writes, it will likely continue to be a problem with Direct I/O.
Direct I/O is unbuffered, so writes are synchronous. If the application does lots of writes which are buffered without Direct I/O, it may run very slowly with Direct I/O.
Direct I/O is unbuffered, so there is no read-ahead. If the application is doing a lot of sequential reads and taking advantage of the filesystem making them into bigger physical I/Os, Direct I/O may be slower.
Direct I/O does not coalesce contiguous I/O's. This would be a possible issue for applications using aio, listio, or readv/writev.
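The alignment rules above lend themselves to a quick check. The Python sketch below tests whether a request meets the 4K requirements and rounds a length up to the next 4K multiple. It is a minimal illustration only: the function names are hypothetical, and including the file offset in the check is my reading of "aligned to page boundaries" rather than something the rules list spells out.

```python
DIO_ALIGN = 4096  # 4K bytes, as stated in the rules above


def dio_eligible(buffer_addr, file_offset, length):
    """True if the buffer address, offset, and length are all 4K aligned."""
    return (buffer_addr % DIO_ALIGN == 0
            and file_offset % DIO_ALIGN == 0
            and length > 0
            and length % DIO_ALIGN == 0)


def round_up_to_4k(length):
    """Smallest multiple of 4096 that is greater than or equal to length."""
    return -(-length // DIO_ALIGN) * DIO_ALIGN
```

A request that fails this check would fall back to the kernel-buffered path described earlier, so an application that cares about Direct I/O would pad or split its requests before issuing them.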

Benefits of Direct I/O

Direct I/O is only supported for program working storage, that is, local persistent files. The main benefit of Direct I/O is in the reduction of CPU cycles needed for file reads and writes. This results from not having to copy files from the VMM file cache to the user buffer as in the normal cache situations. For normal cache situation, if the cache hit rate is low, most read requests would go to the disk. As mentioned before, these are the ideal situations where applications would benefit from Direct I/O implementation. However, for cases where cache hit rate is high in normal cache situations, applications would see reduced CPU utilization from Direct I/O implementation but would not be able to take advantage of the read-ahead algorithms available under normal cache policy. Writes are faster with normal cached I/O in most cases. But if a file is opened with O_SYNC or O_DSYNC, then the writes have to go to disk. In these cases, applications would benefit from Direct I/O because the overhead of data copy is eliminated.

Another benefit of Direct I/O is that it doesn't allow applications to compromise the effectiveness of caching of other files. When a file is read or written, the file competes for space in the file cache, and this could cause other file data to be pushed out of the cache. If you know that certain files have poor cache-utilization characteristics, then only those files could be opened with O_DIRECT.


Performance of Direct I/O reads

Even though the use of Direct I/O has the potential to reduce the need of CPU cycles for application execution, ironically it leads to longer elapsed times in many cases. This is especially true for a series of small I/O requests.

Direct I/O reads from the disk are synchronous, and this can result in poor performance if the data was likely to be in memory under the normal caching policy. Direct I/O bypasses the VMM read-ahead algorithms because the I/Os would not go through the VMM. The read-ahead algorithm can be quite useful for sequential access to files because the VMM can initiate disk requests and have the pages already be resident in memory before the application has requested the pages. Applications can compensate for the loss of this read-ahead feature by using one of the following methods:

Issue larger read requests.
Issue asynchronous Direct I/O read-ahead by the use of multiple threads.
Use the asynchronous I/O facilities such as aio_read() or lio_listio().

Performance of Direct I/O writes

Direct I/O writes bypass the VMM and go directly to the disk, so that there can be a significant performance penalty; in normal cached I/O, the writes can go to memory and then get flushed to disk later. Because Direct I/O writes do not get copied into memory, when a sync operation is performed, it will not have to flush these pages to disk, thus reducing the amount of work the syncd daemon has to perform.

Performance example

In the following example, performance is measured on an RS/6000 server running AIX 4.3.1. KBPS is the throughput in kilobytes per second, and %CPU is CPU usage in percent.
Listing 1. Performance example
       
# of 2.2 GB SSA Disks        1        2        4        6        8
# of PCI SSA Adapters        1        1        1        1        1

        Sequential read throughput, using normal I/O

KBPS                                7108  14170  18725 18519  17892
%CPU                                23.9   56.1   92.1  97.0   98.3

        Sequential read throughput, using Direct I/O
 
KBPS                                7098  14150  22035  27588  30062
%CPU                                 4.4    9.1   22.0   39.2   54.4

        Sequential read throughput, using raw I/O

KBPS                                7258  14499  28504  30946  32165
%CPU                                 1.6    3.2   10.0   20.9   24.5
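One way to read Listing 1 is to divide throughput by CPU usage. The Python sketch below does that with the numbers copied from the listing. The metric (KB per second per percentage point of CPU) is my own choice for illustration and does not come from the original measurements.

```python
DISKS = [1, 2, 4, 6, 8]

# Figures copied from Listing 1.
RESULTS = {
    "normal": {"kbps": [7108, 14170, 18725, 18519, 17892],
               "cpu": [23.9, 56.1, 92.1, 97.0, 98.3]},
    "direct": {"kbps": [7098, 14150, 22035, 27588, 30062],
               "cpu": [4.4, 9.1, 22.0, 39.2, 54.4]},
    "raw": {"kbps": [7258, 14499, 28504, 30946, 32165],
            "cpu": [1.6, 3.2, 10.0, 20.9, 24.5]},
}


def kbps_per_cpu_percent(mode):
    """Throughput per percentage point of CPU, one value per disk count."""
    data = RESULTS[mode]
    return [kbps / cpu for kbps, cpu in zip(data["kbps"], data["cpu"])]


efficiency_at_8_disks = {mode: kbps_per_cpu_percent(mode)[-1] for mode in RESULTS}
direct_speedup_at_8_disks = RESULTS["direct"]["kbps"][-1] / RESULTS["normal"]["kbps"][-1]
```

With eight disks, normal I/O is CPU-bound at about 98% and tops out below 18 MB/s, while Direct I/O delivers roughly 1.7 times the throughput at about half the CPU.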
      

 


Conflicting file access modes

In order to avoid consistency issues between programs that use Direct I/O and programs that use normal cached I/O, Direct I/O is by default used in an exclusive use mode. If there are multiple opens of a file and some of them are direct and others are not, the file will stay in its normal cached access mode. Only when the file is open exclusively by Direct I/O programs will the file be placed in Direct I/O mode.

Similarly, if the file is mapped into virtual memory via the shmat() or mmap() system calls, then the file will stay in normal cached mode.

The JFS or JFS2 will attempt to move the file into Direct I/O mode any time the last conflicting, non-direct access is eliminated (either by close(), munmap(), or shmdt() subroutines). Changing the file from normal mode to Direct I/O mode can be rather expensive since it requires writing all modified pages to disk and removing all the file's pages from memory.
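The exclusive-use rule can be summarized as a small predicate. The Python sketch below is a simplified model of the behavior described in this section, not the file system's implementation; `file_access_mode` and its arguments are names invented for the example.

```python
def file_access_mode(opens_are_direct, is_mapped):
    """Return "direct" or "cached" for a file given its current users.

    opens_are_direct: one boolean per open of the file, True if that
    open requested Direct I/O.
    is_mapped: True if the file is mapped via shmat() or mmap().
    """
    if is_mapped:
        return "cached"   # any mapping keeps the file in cached mode
    if opens_are_direct and all(opens_are_direct):
        return "direct"   # every current open asked for Direct I/O
    return "cached"       # mixed opens, or no direct opens at all
```

For example, a file with two direct opens and one cached open stays in cached mode, and would switch to Direct I/O mode only after the cached open is closed.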


Candidates for Direct I/O

I/O-intensive applications that don't benefit much from the normal caching policy are likely to see improved performance when Direct I/O is implemented.

Programs that are typically CPU-limited and perform lots of disk I/O are good candidates for Direct I/O. Codes that have large sequential I/Os are good candidates as well. Applications that do numerous small I/Os will typically see less performance benefit, since Direct I/O is unable to exploit read-ahead or write-behind algorithms available under normal caching policy. Applications that benefit from striping are also good candidates.

 

Resources

AIX 5L Version 5.1 Performance Management Guide


About the author

Shiv Dutta is a technical consultant for IBM eServer group where he assists independent software vendors with the enablement of their applications on pSeries servers. Shiv has considerable experience as a software developer, system administrator and an instructor. He provides AIX support in the areas of system administration, problem determination, performance tuning and sizing guides. Shiv has worked with AIX from its inception. He holds a Ph.D. in Physics from Ohio University and can be reached at sdutta@us.ibm.com.

 
