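A whole week of torture!!! Damn it!!!
Once I had some free time, I swore I would turn this thing inside out and see what it is made of!!!

True to my word, today I'm pulling the coat off monkey patch to see what is really underneath.

It turns out there is nothing mysterious about it; it is a pretty simple thing. But when you don't know that someone has monkey patched your program, it really will drive you crazy, just like it did me. A good tool has to be used with care, otherwise it is pitfalls all the way down ~_~ !!!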

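Definition

So what is a monkey patch? There are several stories online about where the name comes from; look them up if you are curious, I'll go straight to the point. A monkey patch modifies modules, classes, or methods dynamically while the program is running, instead of changing their implementation in the static source code.

Let's start with a simple example: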

class XiaoMing(object):
    def favorite(self):
        print "apple"

Say XiaoMing's favorite thing as a child is apple; we can find that out by calling his favorite method.

xiaoming=XiaoMing()
xiaoming.favorite()
>> apple

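But one day, God no longer wants XiaoMing to like apple, because God likes banana. XiaoMing has already been created, and God doesn't want to change how XiaoMing is manufactured. What to do? Give him a monkey patch!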

class XiaoMing(object):
    def favorite(self):
        print "apple"
        
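# No self parameter: this function is assigned to the instance below,
# so Python stores it as a plain attribute and does not bind it as a method.
def new_favorite():
    print "banana"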

xiaoming=XiaoMing()
xiaoming.favorite()
>> apple

xiaoming.favorite = new_favorite
xiaoming.favorite()
>> banana

The code above may look a bit crude, so here is a fancier version that patches the class itself:

class XiaoMing(object):
    def favorite(self):
        print "apple"

class God(object):
    @classmethod
    def new_xiaoming_favorite(cls):
        print "banana"

    @classmethod
    def monkey_patch(cls):
        XiaoMing.favorite = cls.new_xiaoming_favorite

God.monkey_patch()

xiaoming = XiaoMing()
xiaoming.favorite()
>> banana

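Doesn't that look a lot like real-world usage? In practice, of course, monkey patches are usually applied to modules, which is somewhat more involved; eventlet, for example, can monkey patch modules such as thread and socket.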
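To make the module case concrete, here is a minimal sketch of patching a module attribute, in this case time.sleep. The replacement function fake_sleep and the calls list are made up for this illustration; the pattern of saving the original and restoring it afterwards is one common way to keep such a patch from leaking, not something the eventlet code does. The code is written so that it should run on Python 3 as well as the Python 2.7 used in this post.

```python
import time

calls = []  # records the durations requested through the patched function

def fake_sleep(seconds):
    """Hypothetical replacement: record the request instead of blocking."""
    calls.append(seconds)

original_sleep = time.sleep      # keep a reference so the patch can be undone
time.sleep = fake_sleep          # the monkey patch: rebind the module attribute
try:
    time.sleep(5)                # returns immediately, nothing actually sleeps
finally:
    time.sleep = original_sleep  # always restore the original

print(calls)  # prints [5]
```

Any other code that calls time.sleep through the module while the patch is active gets the replacement too, which is exactly why an unnoticed monkey patch can be so confusing.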

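How it works

So, in the spirit of being a first-rate programmer, we have to ask: why is this possible? That is the real focus of this article.

NameSpace

Before getting there, we need one of Python's core concepts: the namespace.

What is a namespace? Simply put, it is a mapping from names to objects. To picture it, think of a Python dictionary; in fact, some namespaces really are implemented as dictionaries. Python has four main kinds of namespace: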

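namespace type | namespace description
locals | The namespace of a function; it records only the objects inside the current function.
enclosing function | A more special case: a closure (nested function) has this namespace, which records the objects inside the function that encloses it.
globals | The namespace of a Python module; every module has its own, recording the classes, functions, and so on defined in the module.
__builtins__ | Python's built-in namespace (the __builtin__ module in Python 2.7), created when the interpreter starts; it holds the many built-in functions.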
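The "namespace is basically a dictionary" idea can be poked at directly. The sketch below uses only the built-in functions globals(), locals(), and vars(); the names answer, show_locals, and inner are invented for the example. It looks up the builtins module under both its Python 3 name (builtins) and its Python 2 name (__builtin__).

```python
import sys

# The builtins module is named __builtin__ on Python 2 and builtins on Python 3.
builtins_module = sys.modules.get("builtins") or sys.modules["__builtin__"]

answer = 42  # a module-level name, stored in the globals namespace

def show_locals():
    inner = "local value"
    return locals()  # snapshot of the function's local namespace as a dict

module_names = globals()
function_names = show_locals()

print(type(module_names) is dict)       # True: the module namespace is a dict
print(module_names["answer"])           # 42
print(function_names)                   # {'inner': 'local value'}
print("len" in vars(builtins_module))   # True: len lives in the builtins namespace
```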

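So what is a namespace for? In Python, accessing any object by name (a variable, module, function, and so on) means looking that name up in the namespaces. The lookup follows a fixed order known as LEGB:

locals -->> enclosing function -->> globals -->> __builtins__

If no object with the given name is found in any of these four namespaces, Python raises a NameError: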

# python
Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> not_exist_object()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'not_exist_object' is not defined
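The LEGB order described above can be sketched with one small script. All function and variable names here are invented for the illustration; each function reads a name that can only be resolved at one particular level, and the last one reproduces the NameError from the transcript.

```python
name = "global"          # G: module-level (global) namespace

def outer():
    name = "enclosing"   # E: namespace of the enclosing function

    def inner_local():
        name = "local"   # L: found first, in the function's own namespace
        return name

    def inner_enclosing():
        return name      # not local, so the enclosing function is searched next

    return inner_local(), inner_enclosing()

def read_global():
    return name          # neither local nor enclosing, so globals are searched

def read_builtin():
    return len("abc")    # len is not defined in this module: found in builtins

def read_missing():
    try:
        return not_exist_object   # present in none of the four namespaces
    except NameError as exc:
        return type(exc).__name__

results = (outer(), read_global(), read_builtin(), read_missing())
print(results)  # (('local', 'enclosing'), 'global', 3, 'NameError')
```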

Module Import

Monkey patching also involves another core piece of Python: module import.

When Python starts up, it creates a global dictionary, sys.modules:

# python
Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import sys
>>> 
>>> print sys.modules
{
    'copy_reg': <module 'copy_reg' from '/usr/lib64/python2.7/copy_reg.pyc'>,    
    'sre_compile': <module 'sre_compile' from '/usr/lib64/python2.7/sre_compile.pyc'>,    
    '_sre': <module '_sre' (built-in)>,    
    'encodings': <module 'encodings' from '/usr/lib64/python2.7/encodings/__init__.pyc'>,    
    'site': <module 'site' from '/usr/lib64/python2.7/site.pyc'>,    
    '__builtin__': <module '__builtin__' (built-in)>,    
    'sysconfig': <module 'sysconfig' from '/usr/lib64/python2.7/sysconfig.pyc'>,    
    '__main__': <module '__main__' (built-in)>,    
    'encodings.encodings': None,    
    'abc': <module 'abc' from '/usr/lib64/python2.7/abc.pyc'>,    
    'posixpath': <module 'posixpath' from '/usr/lib64/python2.7/posixpath.pyc'>,    
    '_weakrefset': <module '_weakrefset' from '/usr/lib64/python2.7/_weakrefset.pyc'>,    
    'errno': <module 'errno' (built-in)>,    
    'encodings.codecs': None,    
    'sre_constants': <module 'sre_constants' from '/usr/lib64/python2.7/sre_constants.pyc'>,
    're': <module 're' from '/usr/lib64/python2.7/re.pyc'>,    
    '_abcoll': <module '_abcoll' from '/usr/lib64/python2.7/_abcoll.pyc'>,    
    'types': <module 'types' from '/usr/lib64/python2.7/types.pyc'>,    
    '_codecs': <module '_codecs' (built-in)>,    
    'encodings.__builtin__': None,    
    '_warnings': <module '_warnings' (built-in)>,    
    'genericpath': <module 'genericpath' from '/usr/lib64/python2.7/genericpath.pyc'>,    
    'stat': <module 'stat' from '/usr/lib64/python2.7/stat.pyc'>,    
    'zipimport': <module 'zipimport' (built-in)>,    
    '_sysconfigdata': <module '_sysconfigdata' from '/usr/lib64/python2.7/_sysconfigdata.pyc'>,    
    'warnings': <module 'warnings' from '/usr/lib64/python2.7/warnings.pyc'>,    
    'UserDict': <module 'UserDict' from '/usr/lib64/python2.7/UserDict.pyc'>,    
    'sys': <module 'sys' (built-in)>,    
    'codecs': <module 'codecs' from '/usr/lib64/python2.7/codecs.pyc'>,    
    'readline': <module 'readline' from '/usr/lib64/python2.7/lib-dynload/readline.so'>,    
    'os.path': <module 'posixpath' from '/usr/lib64/python2.7/posixpath.pyc'>,    
    'signal': <module 'signal' (built-in)>,    
    'traceback': <module 'traceback' from '/usr/lib64/python2.7/traceback.pyc'>,    
    'linecache': <module 'linecache' from '/usr/lib64/python2.7/linecache.pyc'>,    
    'posix': <module 'posix' (built-in)>,    
    'exceptions': <module 'exceptions' (built-in)>,    
    'sre_parse': <module 'sre_parse' from '/usr/lib64/python2.7/sre_parse.pyc'>,    
    'os': <module 'os' from '/usr/lib64/python2.7/os.pyc'>,    
    '_weakref': <module '_weakref' (built-in)>
}
>>> 
>>> type(sys.modules)
<type 'dict'>

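When we import a new module, two things happen:

  1. A key-value pair is inserted into sys.modules: the key is the module name and the value is the imported module object. The next time the same module is imported, Python first looks it up in sys.modules and, if it is there, uses that module object directly instead of loading it again.
  2. The module object is bound to a name in the global namespace of the importing module (for a module-level import). When the program later uses the module, the name is looked up in that global namespace.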
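Both steps are easy to observe. The sketch below uses colorsys purely as a convenient standard-library module that is unlikely to be loaded already; any module would do.

```python
import sys

import colorsys                 # first import: loads the module and caches it

# 1. The module object is now stored in sys.modules under its name.
cached = sys.modules["colorsys"]
print(cached is colorsys)                    # True

# 2. The name "colorsys" is bound in the importing module's global namespace.
print(globals()["colorsys"] is colorsys)     # True

# A second import does not reload anything; it returns the cached object.
import colorsys as again
print(again is cached)                       # True
```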

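Implementing monkey patch

Given those two points, can you guess how monkey patch is implemented? That's right, it is quite simple: modify the module object stored in sys.modules, either by replacing its attributes with new implementations or by replacing the entry altogether, loading the module first if it has not been imported yet. From then on, whenever the program imports the module, it gets the modified module object from sys.modules, and that is the monkey patch.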
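Here is a minimal sketch of that idea, assuming a throwaway module built with types.ModuleType instead of a real library. The module name demo_target and the functions greet and patched_greet are hypothetical. It also shows one pitfall worth knowing: a name copied out with "from ... import ..." before the patch keeps pointing at the old function.

```python
import sys
import types

# Build a throwaway module and register it, as if it had been imported.
# "demo_target" and greet() are hypothetical names used only for this sketch.
target = types.ModuleType("demo_target")
exec("def greet():\n    return 'apple'", target.__dict__)
sys.modules["demo_target"] = target

from demo_target import greet as early_ref   # reference copied BEFORE the patch

def patched_greet():
    return "banana"

# The patch: rebind the attribute on the module object held in sys.modules.
sys.modules["demo_target"].greet = patched_greet

import demo_target               # served from sys.modules, so it sees the patch
print(demo_target.greet())       # banana
print(early_ref())               # apple: the earlier copy was not affected

del sys.modules["demo_target"]   # clean up the demo module
```

That second result is the reason patches like this are usually applied as early as possible, before other code has had a chance to copy references out of the module.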

Here is an example from real-world use: eventlet's monkey patching of modules such as thread and socket. Straight to the code:

def monkey_patch(**on):
    """Globally patches certain system modules to be greenthread-friendly.

    The keyword arguments afford some control over which modules are patched.
    If no keyword arguments are supplied, all possible modules are patched.
    If keywords are set to True, only the specified modules are patched.  E.g.,
    ``monkey_patch(socket=True, select=True)`` patches only the select and
    socket modules.  Most arguments patch the single module of the same name
    (os, time, select).  The exceptions are socket, which also patches the ssl
    module if present; and thread, which patches thread, threading, and Queue.

    It's safe to call monkey_patch multiple times.
    """

    # Workaround for import cycle observed as following in monotonic
    # RuntimeError: no suitable implementation for this system
    # see https://github.com/eventlet/eventlet/issues/401#issuecomment-325015989
    #
    # Make sure the hub is completely imported before any
    # monkey-patching, or we risk recursion if the process of importing
    # the hub calls into monkey-patched modules.
    eventlet.hubs.get_hub()

    accepted_args = set(('os', 'select', 'socket',
                         'thread', 'time', 'psycopg', 'MySQLdb',
                         'builtins', 'subprocess'))
    # To make sure only one of them is passed here
    assert not ('__builtin__' in on and 'builtins' in on)
    try:
        b = on.pop('__builtin__')
    except KeyError:
        pass
    else:
        on['builtins'] = b

    default_on = on.pop("all", None)

    for k in six.iterkeys(on):
        if k not in accepted_args:
            raise TypeError("monkey_patch() got an unexpected "
                            "keyword argument %r" % k)
    if default_on is None:
        default_on = not (True in on.values())
    for modname in accepted_args:
        if modname == 'MySQLdb':
            # MySQLdb is only on when explicitly patched for the moment
            on.setdefault(modname, False)
        if modname == 'builtins':
            on.setdefault(modname, False)
        on.setdefault(modname, default_on)

    if on['thread'] and not already_patched.get('thread'):
        _green_existing_locks()

    modules_to_patch = []
    for name, modules_function in [
        ('os', _green_os_modules),
        ('select', _green_select_modules),
        ('socket', _green_socket_modules),
        ('thread', _green_thread_modules),
        ('time', _green_time_modules),
        ('MySQLdb', _green_MySQLdb),
        ('builtins', _green_builtins),
        ('subprocess', _green_subprocess_modules),
    ]:
        if on[name] and not already_patched.get(name):
            modules_to_patch += modules_function()
            already_patched[name] = True

    if on['psycopg'] and not already_patched.get('psycopg'):
        try:
            from eventlet.support import psycopg2_patcher
            psycopg2_patcher.make_psycopg_green()
            already_patched['psycopg'] = True
        except ImportError:
            # note that if we get an importerror from trying to
            # monkeypatch psycopg, we will continually retry it
            # whenever monkey_patch is called; this should not be a
            # performance problem but it allows is_monkey_patched to
            # tell us whether or not we succeeded
            pass

    imp.acquire_lock()
    try:
        for name, mod in modules_to_patch:
            orig_mod = sys.modules.get(name)
            if orig_mod is None:
                orig_mod = __import__(name)
            for attr_name in mod.__patched__:
                patched_attr = getattr(mod, attr_name, None)
                if patched_attr is not None:
                    setattr(orig_mod, attr_name, patched_attr)
            deleted = getattr(mod, '__deleted__', [])
            for attr_name in deleted:
                if hasattr(orig_mod, attr_name):
                    delattr(orig_mod, attr_name)
    finally:
        imp.release_lock()

    if sys.version_info >= (3, 3):
        import importlib._bootstrap
        thread = original('_thread')
        # importlib must use real thread locks, not eventlet.Semaphore
        importlib._bootstrap._thread = thread

        # Issue #185: Since Python 3.3, threading.RLock is implemented in C and
        # so call a C function to get the thread identifier, instead of calling
        # threading.get_ident(). Force the Python implementation of RLock which
        # calls threading.get_ident() and so is compatible with eventlet.
        import threading
        threading.RLock = threading._PyRLock

First it checks the accepted arguments, verifying that the modules to be patched are within the allowed set ('os', 'select', 'socket', 'thread', 'time', 'psycopg', 'MySQLdb', 'builtins', 'subprocess'). Then it works out which modules actually need patching:

    for name, modules_function in [
        ('os', _green_os_modules),
        ('select', _green_select_modules),
        ('socket', _green_socket_modules),
        ('thread', _green_thread_modules),
        ('time', _green_time_modules),
        ('MySQLdb', _green_MySQLdb),
        ('builtins', _green_builtins),
        ('subprocess', _green_subprocess_modules),
    ]:
        if on[name] and not already_patched.get(name):
            modules_to_patch += modules_function()
            already_patched[name] = True
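How the on flags end up being set is easy to miss in the full listing, so here is a simplified sketch of just that part. resolve_flags is a hypothetical helper written for this post, not an eventlet function; it mirrors the default_on logic shown earlier and leaves out the __builtin__ alias handling and everything that actually patches anything.

```python
ACCEPTED = ('os', 'select', 'socket', 'thread', 'time',
            'psycopg', 'MySQLdb', 'builtins', 'subprocess')

def resolve_flags(**on):
    """Hypothetical helper mirroring the flag handling in the listing above."""
    default_on = on.pop("all", None)
    for key in on:
        if key not in ACCEPTED:
            raise TypeError("unexpected keyword argument %r" % key)
    if default_on is None:
        # No explicit True anywhere means "patch everything by default".
        default_on = True not in on.values()
    for modname in ACCEPTED:
        if modname in ('MySQLdb', 'builtins'):
            on.setdefault(modname, False)   # these two are opt-in only
        on.setdefault(modname, default_on)
    return on

def enabled(flags):
    """Return the sorted names whose flag is switched on."""
    return sorted(name for name, value in flags.items() if value)

everything = enabled(resolve_flags())
only_socket = enabled(resolve_flags(socket=True))
all_but_os = enabled(resolve_flags(os=False))

print(everything)    # every accepted name except MySQLdb and builtins
print(only_socket)   # ['socket']
print(all_but_os)    # same as the first list, minus 'os'
```

So passing nothing patches nearly everything, passing a True switches to "only what I named", and passing only False values switches off just those modules.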

After that comes the core of the implementation:

    imp.acquire_lock()
    try:
        for name, mod in modules_to_patch:
            orig_mod = sys.modules.get(name)
            if orig_mod is None:
                orig_mod = __import__(name)
            for attr_name in mod.__patched__:
                patched_attr = getattr(mod, attr_name, None)
                if patched_attr is not None:
                    setattr(orig_mod, attr_name, patched_attr)
            deleted = getattr(mod, '__deleted__', [])
            for attr_name in deleted:
                if hasattr(orig_mod, attr_name):
                    delattr(orig_mod, attr_name)
    finally:
        imp.release_lock()

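This part of the code is the heart of eventlet's monkey patch. It iterates over every module that needs patching; if the module has not yet been imported into sys.modules, it imports it first. It then uses setattr to replace the attributes named in __patched__ on that module, and delattr to remove any listed in __deleted__.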
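To watch that loop work without installing eventlet, here is a stripped-down sketch. apply_patches is a simplified copy of the loop above with the import lock left out, and the two modules demo_blocking and demo_green are hypothetical stand-ins built with types.ModuleType; real green modules are of course far more elaborate.

```python
import sys
import types

def apply_patches(modules_to_patch):
    """Simplified version of the loop above, without the import lock."""
    for name, mod in modules_to_patch:
        orig_mod = sys.modules.get(name)
        if orig_mod is None:
            orig_mod = __import__(name)        # load the target on demand
        for attr_name in mod.__patched__:
            patched_attr = getattr(mod, attr_name, None)
            if patched_attr is not None:
                setattr(orig_mod, attr_name, patched_attr)
        for attr_name in getattr(mod, '__deleted__', []):
            if hasattr(orig_mod, attr_name):
                delattr(orig_mod, attr_name)

# Hypothetical target module standing in for something like socket or time.
target = types.ModuleType("demo_blocking")
target.wait = lambda: "blocking wait"
target.legacy = lambda: "legacy"
sys.modules["demo_blocking"] = target

# Hypothetical "green" module that declares what it replaces and removes.
green = types.ModuleType("demo_green")
green.__patched__ = ["wait"]
green.__deleted__ = ["legacy"]
green.wait = lambda: "cooperative wait"

apply_patches([("demo_blocking", green)])

import demo_blocking                       # comes straight from sys.modules
print(demo_blocking.wait())                # cooperative wait
print(hasattr(demo_blocking, "legacy"))    # False

del sys.modules["demo_blocking"]           # clean up the demo module
```

Note that the module object in sys.modules stays the same object throughout; only its attributes change, which is why every importer sees the patch.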