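How are PHP's built-in functions implemented internally?

Question:

Are these functions written the same way as user-defined functions? I mean, with PHP code, regular expressions, and things like that?

For example: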

filter_var($email, FILTER_VALIDATE_EMAIL);

http://www.totallyphp.co.uk/code/validate_an_email_address_using_regular_expressions.htm

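PHP is written in C. Its built-in functions are written in C and then compiled to form the PHP language library.

If you want to extend PHP (edit existing functions or write your own), check this out: http://www.php.net/~wez/extending-php.pdf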
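To make "built-in functions are compiled C" a little more concrete, here is a minimal, self-contained sketch of the general idea: an interpreter keeps a table that maps script-level names to C function pointers. The names `builtin_fn`, `builtin_table` and `find_builtin` are hypothetical and chosen for illustration; they are not PHP's actual structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a "built-in": takes a C string, returns a long. */
typedef long (*builtin_fn)(const char *arg);

/* Two toy built-ins implemented directly in C. */
static long builtin_strlen(const char *arg)
{
    return (long)strlen(arg);
}

static long builtin_count_at(const char *arg)
{
    long n = 0;
    for (; *arg != '\0'; arg++) {
        if (*arg == '@') {
            n++;
        }
    }
    return n;
}

/* Name -> function pointer table, compiled into the interpreter binary. */
struct builtin_entry {
    const char *name;
    builtin_fn  handler;
};

static const struct builtin_entry builtin_table[] = {
    { "strlen",   builtin_strlen },
    { "count_at", builtin_count_at },
    { NULL,       NULL } /* sentinel marking the end of the table */
};

/* Look up a built-in by the name used in script code; NULL if unknown. */
static builtin_fn find_builtin(const char *name)
{
    const struct builtin_entry *e;

    assert(name != NULL);
    for (e = builtin_table; e->name != NULL; e++) {
        if (strcmp(e->name, name) == 0) {
            return e->handler;
        }
    }
    return NULL;
}
```

Calling `find_builtin("strlen")("hello")` runs the compiled C routine directly, which is roughly why a built-in does not need any PHP code behind it.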

Edit:

Here you go:

This is the C source code of the function:

/* {{{ proto mixed filter_var(mixed variable [, long filter [, mixed options]])
 * Returns the filtered version of the variable.
 */
PHP_FUNCTION(filter_var)
{
    long filter = FILTER_DEFAULT;
    zval **filter_args = NULL, *data;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z/|lZ", &data, &filter, &filter_args) == FAILURE) {
        return;
    }

    if (!PHP_FILTER_ID_EXISTS(filter)) {
        RETURN_FALSE;
    }

    MAKE_COPY_ZVAL(&data, return_value);

    php_filter_call(&return_value, filter, filter_args, 1, FILTER_REQUIRE_SCALAR TSRMLS_CC);
}
/* }}} */



static void php_filter_call(zval **filtered, long filter, zval **filter_args, const int copy, long filter_flags TSRMLS_DC) /* {{{ */
{
    zval *options = NULL;
    zval **option;
    char *charset = NULL;

    if (filter_args && Z_TYPE_PP(filter_args) != IS_ARRAY) {
        long lval;

        PHP_FILTER_GET_LONG_OPT(filter_args, lval);

        if (filter != -1) { /* handler for array apply */
            /* filter_args is the filter_flags */
            filter_flags = lval;

            if (!(filter_flags & FILTER_REQUIRE_ARRAY || filter_flags & FILTER_FORCE_ARRAY)) {
                filter_flags |= FILTER_REQUIRE_SCALAR;
            }
        } else {
            filter = lval;
        }
    } else if (filter_args) {
        if (zend_hash_find(HASH_OF(*filter_args), "filter", sizeof("filter"), (void **)&option) == SUCCESS) {
            PHP_FILTER_GET_LONG_OPT(option, filter);
        }

        if (zend_hash_find(HASH_OF(*filter_args), "flags", sizeof("flags"), (void **)&option) == SUCCESS) {
            PHP_FILTER_GET_LONG_OPT(option, filter_flags);

            if (!(filter_flags & FILTER_REQUIRE_ARRAY || filter_flags & FILTER_FORCE_ARRAY)) {
                filter_flags |= FILTER_REQUIRE_SCALAR;
            }
        }

        if (zend_hash_find(HASH_OF(*filter_args), "options", sizeof("options"), (void **)&option) == SUCCESS) {
            if (filter != FILTER_CALLBACK) {
                if (Z_TYPE_PP(option) == IS_ARRAY) {
                    options = *option;
                }
            } else {
                options = *option;
                filter_flags = 0;
            }
        }
    }

    if (Z_TYPE_PP(filtered) == IS_ARRAY) {
        if (filter_flags & FILTER_REQUIRE_SCALAR) {
            if (copy) {
                SEPARATE_ZVAL(filtered);
            }
            zval_dtor(*filtered);
            if (filter_flags & FILTER_NULL_ON_FAILURE) {
                ZVAL_NULL(*filtered);
            } else {
                ZVAL_FALSE(*filtered);
            }
            return;
        }
        php_zval_filter_recursive(filtered, filter, filter_flags, options, charset, copy TSRMLS_CC);
        return;
    }
    if (filter_flags & FILTER_REQUIRE_ARRAY) {
        if (copy) {
            SEPARATE_ZVAL(filtered);
        }
        zval_dtor(*filtered);
        if (filter_flags & FILTER_NULL_ON_FAILURE) {
            ZVAL_NULL(*filtered);
        } else {
            ZVAL_FALSE(*filtered);
        }
        return;
    }

    php_zval_filter(filtered, filter, filter_flags, options, charset, copy TSRMLS_CC);
    if (filter_flags & FILTER_FORCE_ARRAY) {
        zval *tmp;

        ALLOC_ZVAL(tmp);
        MAKE_COPY_ZVAL(filtered, tmp);

        zval_dtor(*filtered);

        array_init(*filtered);
        add_next_index_zval(*filtered, tmp);
    }
}

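And this is the routine that validates the email address, which answers your question: yes, it is done internally with a regular expression.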

void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */ 
{ 
    /* 
    * The regex below is based on a regex by Michael Rushton. 
    * However, it is not identical. I changed it to only consider routeable 
    * addresses as valid. Michael's regex considers a@b a valid address
    * which conflicts with section 2.3.5 of RFC 5321 which states that: 
    * 
    * Only resolvable, fully-qualified domain names (FQDNs) are permitted 
    * when domain names are used in SMTP. In other words, names that can 
    * be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed 
    * in Section 5) are permitted, as are CNAME RRs whose targets can be 
    * resolved, in turn, to MX or address RRs. Local nicknames or 
    * unqualified names MUST NOT be used. 
    * 
    * This regex does not handle comments and folding whitespace. While 
    * this is technically valid in an email address, these parts aren't 
    * actually part of the address itself. 
    * 
    * Michael's regex carries this copyright: 
    * 
    * Copyright © Michael Rushton 2009-10 
    * http://squiloople.com/ 
    * Feel free to use and redistribute this code. But please keep this copyright notice. 
    * 
    */ 
    const char regexp[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD"; 

    pcre       *re = NULL;
    pcre_extra *pcre_extra = NULL;
    int         preg_options = 0;
    int         ovector[150]; /* Needs to be a multiple of 3 */
    int         matches;

    /* The maximum length of an e-mail address is 320 octets, per RFC 2821. */
    if (Z_STRLEN_P(value) > 320) {
        RETURN_VALIDATION_FAILED
    }

    re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
    if (!re) {
        RETURN_VALIDATION_FAILED
    }
    matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

    /* 0 means that the vector is too small to hold all the captured substring offsets */
    if (matches < 0) {
        RETURN_VALIDATION_FAILED
    }
}
/* }}} */ 
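The routine above depends on PHP's internal macros and on PCRE, so it cannot be compiled on its own. As a rough standalone illustration of the same shape (length check, compile a pattern, run it, report match or no match), here is a sketch that swaps in POSIX `regcomp`/`regexec` from `<regex.h>` instead of PCRE. The pattern is deliberately simplified and is an assumption for illustration, not the regex PHP uses; `simple_validate_email` is a hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical pattern: NOT the regex PHP uses. */
static const char simple_email_pattern[] =
    "^[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\\.)+[A-Za-z]{2,}$";

/* Returns 1 if the address passes the length check and the pattern, else 0. */
static int simple_validate_email(const char *value)
{
    regex_t re;
    int matched;

    assert(value != NULL);

    /* Same early length check as the PHP routine. */
    if (strlen(value) > 320) {
        return 0;
    }

    /* Compile; REG_NOSUB because only match / no match is needed. */
    if (regcomp(&re, simple_email_pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return 0;
    }

    matched = (regexec(&re, value, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

This sketch recompiles the pattern on every call to stay short. The PHP routine instead obtains its compiled pattern through `pcre_get_compiled_regex`, so the two are not comparable in cost.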

That regular expression string is huge. Wouldn't a PHP version be faster? – Alex 2011-04-15 23:25:03

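PHP functions are either:

  • written in C, not in PHP
  • or just wrappers around other libraries (for example, the functions provided by PHP's curl extension are just wrappers around the curl library)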


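If you are curious, you can take a look at PHP's source code. Here is its SVN: http://svn.php.net/viewvc/

For example, the filter_var() function should be defined somewhere in the source of the filter extension.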

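No. PHP's internal functions are written in C, not in PHP code. The C code looks rather clumsy because of the many Zend runtime macros and the way parameters are transferred from PHP into C structures.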
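To give a feel for what "transferring parameters into C structures" involves, here is a small self-contained sketch of a format-string driven argument parser. It is only loosely modeled on the idea behind `zend_parse_parameters`; `struct value`, `parse_args` and the two-letter spec language are hypothetical simplifications, not the Zend API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for a dynamically typed script value. */
enum value_type { TYPE_LONG, TYPE_STRING };

struct value {
    enum value_type type;
    union {
        long        lval;
        const char *sval;
    } u;
};

/*
 * Unpack `argc` script values into C variables according to `spec`:
 * 'l' expects a long (destination: long *), 's' expects a string
 * (destination: const char **). A '|' marks the following arguments
 * as optional. Returns 0 on success, -1 on a count or type mismatch.
 */
static int parse_args(const struct value *argv, int argc, const char *spec, ...)
{
    va_list ap;
    int i = 0, optional = 0, status = 0;

    assert(spec != NULL);
    va_start(ap, spec);
    for (; *spec != '\0' && status == 0; spec++) {
        if (*spec == '|') {
            optional = 1;
            continue;
        }
        if (i >= argc) {
            /* Ran out of arguments: fine only if the rest are optional. */
            if (!optional) {
                status = -1;
            }
            break;
        }
        if (*spec == 'l' && argv[i].type == TYPE_LONG) {
            *va_arg(ap, long *) = argv[i].u.lval;
        } else if (*spec == 's' && argv[i].type == TYPE_STRING) {
            *va_arg(ap, const char **) = argv[i].u.sval;
        } else {
            status = -1; /* wrong type or unknown spec character */
        }
        i++;
    }
    if (status == 0 && i < argc) {
        status = -1; /* more arguments than the spec allows */
    }
    va_end(ap);
    return status;
}
```

A call such as `parse_args(args, n, "s|l", &str, &filter)` leaves `filter` at its default when the second argument is omitted. Every built-in has to do this kind of unpacking and type checking before its real work starts, which accounts for much of the boilerplate.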

This particular function does use a regular expression. It is also a good example:
http://svn.php.net/repository/php/php-src/branches/PHP_5_3/ext/filter/logical_filters.c
Look for regexp[] somewhere in the middle.