Last updated: Sun, 02 May 2004

XCIII. Regular Expression Functions (POSIX Extended)

Introduction

Tip: PHP also supports regular expressions with a Perl-compatible syntax through the PCRE functions. Those functions support non-greedy matching, assertions, conditional subpatterns, and a number of other features that the POSIX extended syntax does not support.
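
As a minimal sketch of two of the PCRE-only features named in the tip, the following compares greedy with non-greedy matching and uses a lookahead assertion. The sample strings and variable names are illustrative assumptions, not part of the manual.

```php
<?php
$html = '<b>bold</b>';

// Greedy: .+ consumes as much as possible, so the match runs to the last '>'.
preg_match('/<.+>/', $html, $greedy);

// Non-greedy: .+? stops at the first '>' that lets the pattern succeed.
preg_match('/<.+?>/', $html, $lazy);

// Lookahead assertion: match "foo" only when it is followed by "bar".
$followed    = preg_match('/foo(?=bar)/', 'foobar');
$notFollowed = preg_match('/foo(?=bar)/', 'foobaz');

echo $greedy[0], "\n"; // <b>bold</b>
echo $lazy[0], "\n";   // <b>
```

Neither the ? quantifier suffix nor (?=...) has an equivalent in the POSIX extended syntax used by the ereg functions.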

Warning

These regular expression functions are not binary-safe. The PCRE functions are.

In PHP, regular expressions are used for complex string manipulation. The functions that support regular expressions are ereg(), ereg_replace(), eregi(), eregi_replace(), split(), and spliti().

All of these functions take a regular expression as their first argument. PHP uses POSIX extended regular expressions as defined by POSIX 1003.2. For a full description of POSIX regular expressions, see the regex manual page included in the regex directory of the PHP distribution. It is in man format, so to read it you need to run something like man /usr/local/src/regex/regex.7.

Requirements

No external libraries are needed to use this extension.

Installation

Warning

Do not change TYPE unless you know what you are doing.

To enable regex support, configure PHP with --with-regex[=TYPE]. TYPE can be one of system, apache, php. The default is php.

The Windows version of PHP has built-in support for this extension. You do not need to load any additional extension in order to use these functions.

Runtime Configuration

This extension defines no configuration directives in php.ini.

Resource Types

This extension defines no resource types.

Predefined Constants

This extension defines no constants.

Examples

Example 1. Regular Expression Examples

<?php
/* Returns true if "abc"
   is found anywhere in $string. */
ereg("abc", $string);

/* Returns true if "abc"
   is found at the beginning of $string. */
ereg("^abc", $string);

/* Returns true if "abc"
   is found at the end of $string. */
ereg("abc$", $string);

/* Returns true if the client browser
   is Netscape 2, 3 or MSIE 3. */
eregi("(ozilla.[23]|MSIE.3)", $HTTP_USER_AGENT);

/* Places three space-separated words
   into $regs[1], $regs[2] and $regs[3]. */
ereg("([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)", $string, $regs);

/* Puts a <br /> tag at the beginning of $string. */
$string = ereg_replace("^", "<br />", $string);

/* Puts a <br /> tag at the end of $string. */
$string = ereg_replace("$", "<br />", $string);

/* Removes every newline character
   from $string. */
$string = ereg_replace("\n", "", $string);
?>
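
For comparison, here is a hedged sketch of how most of the calls above could be written with the PCRE functions mentioned in the tip. The sample values of $string and $agent are assumptions made up for the illustration.

```php
<?php
$string = "abc def ghi";

// "abc" anywhere, at the beginning, and at the end of $string.
$anywhere = preg_match('/abc/', $string);   // 1
$atStart  = preg_match('/^abc/', $string);  // 1
$atEnd    = preg_match('/abc$/', $string);  // 0, the string ends with "ghi"

// The case-insensitive eregi() call becomes the i modifier.
$agent = 'Mozilla/3.0 (hypothetical user agent)';
$oldBrowser = preg_match('/(ozilla.[23]|MSIE.3)/i', $agent);

// Three space-separated words captured in $regs[1], $regs[2] and $regs[3].
preg_match('/([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)/', $string, $regs);

// Prepend a <br /> tag, and strip every newline character.
$withTag    = preg_replace('/^/', '<br />', $string);
$noNewlines = preg_replace('/\n/', '', "line one\nline two\n");
```

The main visible difference is that PCRE patterns need delimiters (here the slashes) around the expression.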

See Also

For more details on Perl-compatible regular expressions, see the chapter on the PCRE functions. The fnmatch() function provides the simpler shell-style wildcard pattern matching.
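
A short sketch of shell-style wildcard matching with fnmatch(). The file names are invented for the example.

```php
<?php
// Shell-style wildcards: * matches any run of characters, ? a single one,
// and [0-9] a single character from the given range.
$isText  = fnmatch('*.txt', 'notes.txt');
$isData  = fnmatch('data-[0-9].csv', 'data-7.csv');
$isOther = fnmatch('*.txt', 'notes.md');

var_dump($isText, $isData, $isOther);
```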

Table of Contents
ereg_replace -- Replace using a regular expression
ereg -- Regular expression match
eregi_replace -- Replace using a regular expression, case insensitive
eregi -- Case insensitive regular expression match
split -- Split a string into an array using a regular expression
spliti -- Split a string into an array using a regular expression, case insensitive
sql_regcase -- Make a regular expression for case insensitive matching


User Contributed Notes
Regular Expression Functions (POSIX Extended)
AccountDemander at gmx dot de
30-Nov-2003 07:10
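It's always been a great problem to exclude whole strings. It's easy to exclude single characters with [^jkhsd], but you cannot exclude whole strings that way... you think!

Here's an easy way to exclude strings with PCRE, using a negative lookahead (putting the string inside [^...] would only exclude its individual characters):

$s = "string to exclude";

preg_match("/^(?!.*" . preg_quote($s, "/") . ")/s", $string);

This returns 1 only when $string does not contain $s. Hope this might help...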
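
A sketch of the difference between a negated bracket expression, which excludes single characters, and a PCRE negative lookahead, which excludes a whole string. The word "cat" and the test subjects are assumptions chosen for the illustration.

```php
<?php
$s = 'cat';

// Negated bracket expression: matches any single character that is not
// 'c', 'a' or 't'. It says nothing about the word "cat" as a whole.
$classOnCat   = preg_match('/[^' . $s . ']/', 'cat');    // 0
$classOnCatch = preg_match('/[^' . $s . ']/', 'catch');  // 1: 'h' is not in the set

// Negative lookahead: succeeds only if "cat" occurs nowhere in the subject.
$pattern = '/^(?!.*' . preg_quote($s, '/') . ')/s';
$noCatInDog   = preg_match($pattern, 'dog');    // 1
$noCatInCatch = preg_match($pattern, 'catch');  // 0
```

The preg_quote() call keeps any regex metacharacters in $s from being interpreted as part of the pattern.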
mina86 at tlen dot pl
19-Oct-2003 04:14
I tested how fast POSIX and Perl regular expressions are, and here are the results:

           | POSIX Extended  | Perl-Compatible |  POSIX - Perl
-----------+-----------------+-----------------+-----------------
     match |    0.1296420097 |    0.1006720066 |  0.0289700031
   match i |    0.1204010248 |    0.1101620197 |  0.0102390051
   replace |    0.1896649599 |    0.1298999786 |  0.0597649813
 replace i |   10.6998120546 |    0.1453789473 | 10.5544331074

So, as you can see, preg_* functions are faster than ereg* functions. You can find source code of my test script here: http://mina86.home.staszic.waw.pl/temp/regexp-speed-test.txt
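
A rough sketch of how such a comparison could be timed. This is not the linked test script: the helper time_calls(), the subject string, and the iteration count are all assumptions. The ereg* functions are absent from current PHP builds, so that branch runs only where ereg() exists.

```php
<?php
// Hypothetical helper: time $iterations calls of $fn and return elapsed seconds.
function time_calls(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$subject = str_repeat('lorem ipsum ', 100) . 'needle';

$timings = [
    'match'   => time_calls(fn () => preg_match('/needle/', $subject), 10000),
    'match i' => time_calls(fn () => preg_match('/NEEDLE/i', $subject), 10000),
];

// Time the POSIX function only if this PHP build still provides it.
if (function_exists('ereg')) {
    $timings['ereg match'] = time_calls(fn () => ereg('needle', $subject), 10000);
}

foreach ($timings as $label => $seconds) {
    printf("%-10s %.10f\n", $label, $seconds);
}
```

Absolute numbers depend on the machine and the PHP version, so only the relative ordering within one run is meaningful.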
russlndr at online dot no
02-Jul-2003 12:55
The Regex Coach - interactive regular expressions:
http://www.weitz.de/regex-coach/#install
tino at infeon dot com
11-Jun-2003 08:49
The book "Mastering Regular Expressions" is an invaluable resource.

http://www.oreilly.com/catalog/regex/
Anand Thakur
25-Mar-2003 05:43
I saw a link to this page somewhere.  It is a library of user-submitted regular expressions for various things.  Some good stuff there.
 
http://www.regexplib.com/
Robin
15-Jan-2003 04:53
Ever wondered how to exclude "[" and "]"?
Here it goes: "[^][]". Extra characters to exclude can be added right in the middle like this: "[^]fobar[]".
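
A small sketch of the same bracket expressions, run here through the PCRE functions, which accept this placement of "]" and "[" as well. The subject strings are made up for the example.

```php
<?php
// ']' placed first (right after '^') and '[' placed last are both literal,
// so this class means "any character except ']' and '['".
$onlyBrackets = preg_replace('/[^][]/', '', 'a[b]c');

// Extra excluded characters go in the middle.
$kept = preg_replace('/[^]fobar[]/', '', 'foo[bar]baz!');

echo $onlyBrackets, "\n"; // []
echo $kept, "\n";         // foo[bar]ba
```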
moc DOT liamtoh AT ssengnorw
18-Oct-2002 03:28
In a PCRE pattern, \s matches whitespace, both outside and inside a character class (it is the POSIX ereg functions that do not understand \s):

preg_match('/\s/', ' ');   // match
preg_match('/[\s]/', ' '); // match

Within a character class, [:space:] matches any single whitespace character:

$pattern = '/[[:space:]]/';
$subject = "space tab\tnewline\n";
preg_match_all($pattern, $subject, $out); // == 3

To match a hyphen from within a character class, it must either be first or last (or, in PCRE, escaped); otherwise, it will act as a range operator.

Example: To match a blank string or a string containing only uppercase letters, underscores, spaces, and hyphens:

preg_match('/^[A-Z_ -]*$/', $subject);

To match any whitespace, not just spaces:

preg_match('/^[A-Z_[:space:]-]*$/', $subject);
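
A runnable sketch of whitespace and hyphen handling inside PCRE character classes. The subject strings are assumptions chosen for the illustration.

```php
<?php
// \s is understood by PCRE both outside and inside a character class.
$outside = preg_match('/\s/', ' ');
$inside  = preg_match('/[\s]/', ' ');

// [:space:] is only valid inside a bracket expression.
$count = preg_match_all('/[[:space:]]/', "space tab\tnewline\n", $out);

// A hyphen placed last in the class is a literal hyphen.
$pattern = '/^[A-Z_[:space:]-]*$/';
$ok  = preg_match($pattern, "FOO_BAR\tBAZ-QUX");
$bad = preg_match($pattern, 'foo');

var_dump($outside, $inside, $count, $ok, $bad);
```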
paper
09-Sep-2002 05:57
I have also experienced the same problem as bps7j@yahoo.com had been experiencing, except I did not recognize the problem until after many hours of debugging.

"\s" does not seem to represent whitespace in the POSIX functions; however, "[[:space:]]" does.

Another problem I was having was matching dashes/hyphens '-'. Place them, unescaped, at the beginning or end of a bracket expression. A backslash is not an escape character inside POSIX brackets, so "\-" would also allow a literal backslash.

Example: To match a blank string or a string containing only uppercase letters, underscores, spaces, and hyphens:

^([A-Z_-]|[[:space:]])*$

Hope this saves someone some time from debugging like I was. :)
bps7j at yahoo dot com
22-Aug-2002 01:40
Something that really got me: I'm used to using Perl's regexps, and so I used \s to check for a whitespace character in a password on a website. My PHP book (Wrox Press, Professional PHP Programming) agreed with me that this is exactly the same as [ \r\n\t\f\v], but it's NOT. In fact, what it did was keep anyone from joining the site if they put an 's' in their password! So beware, check for subtle differences between what you're used to and PHP.

[[:space:]] works fine, by the way.

I'm going to use the pcre functions from now on... I like Perl :o)
david at NOgreenhammerSPAM dot com
09-Mar-2002 04:40
Sadly, the POSIX regexp evaluator (PHP 4.1.2) does not seem to support multi-character collating sequences, even though such sequences are included in the man-page documentation.

Specifically, the man page discusses the expression "[[.ch.]]*c", which matches the first five characters of "chchcc". Running this expression in ereg_replace generates the error "Warning: REG_ECOLLATE". (Running an equivalent expression with only one character between the periods does work, however.)

Multi-character collating sequences are not supported!

This is really, really too bad, because it would have provided a simple way to exclude words from the target.

I'm going to go learn PCRE, now.  :-(
regex at dan42 dot cjb dot net
08-Mar-2002 04:33
Follow-up to my previous post:
Some simple optimization allowed me to realize that excluding a word at the beginning of a string has a degree of complexity O(n) rather than O(n^2). I only had to follow the logic:

if str[0] != badword[0] then OK
else
  if str[1] != badword[1] then OK
  else
   if str[2] != badword[2] then OK
   else ...

So excluding the word 'abc' at the beginning of a string is much simpler than I had made it out to be:
  ^([^a]|a[^b]|ab[^c])
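
A quick check of this pattern, sketched with preg_match() since the expression is also valid PCRE. The test subjects are assumptions. One caveat shows up: a subject shorter than the excluded word that is also a prefix of it, such as "ab", is rejected too, because no branch can consume a differing character.

```php
<?php
// Accepts a subject whose first characters differ from "abc" at some position.
$notAbc = '/^([^a]|a[^b]|ab[^c])/';

$cases = ['abcdef', 'abd', 'xbc', 'axc', 'ab'];
$results = [];
foreach ($cases as $subject) {
    $results[$subject] = preg_match($notAbc, $subject);
}
print_r($results);
```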
spiceee at potentialvalleys dot com
07-Mar-2002 04:26
sorry to be picky here but saying ^ is beginning of a line or $ is end of line is rather misleading, if you're working on a daily basis with regexes.

it might be that it is most of the time correct BUT in some occasions you'd be better off to think of ^ as "start of string" and $ as "end of string".

there are ways to make your regex engine forget about your system's notion of a newline, it's what is commonly referred to as multiline regexes...
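
A sketch of the distinction, using the PCRE functions and their m (multiline) modifier as the illustration. The sample text is an assumption.

```php
<?php
$text = "first line\nsecond line";

// Default: ^ anchors to the start of the whole subject,
// so "second" is not found at the start.
$default = preg_match('/^second/', $text);

// With the m modifier, ^ also matches right after an embedded newline.
$multiline = preg_match('/^second/m', $text);

var_dump($default, $multiline);
```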
luciano_at_braziliantranslation.net
03-Mar-2002 06:15
mholdgate wrote a very nice quick reference guide in the next page (http://www.php.net/manual/en/function.ereg.php), but I felt it could be improved a little:
________________

^        Start of line
$        End of line
n?        Zero or only one single occurrence of character 'n'
n*        Zero or more occurrences of character 'n'
n+        One or more occurrences of character 'n'
n{2}        Exactly two occurrences of 'n'
n{2,}        Two or more occurrences of 'n'
n{2,4}        From 2 to 4 occurrences of 'n'
.        Any single character
()        Parentheses to group expressions
(.*)        Zero or more occurrences of any single character, ie, anything!
(n|a)        Either 'n' or 'a'
[1-6]        Any single digit in the range between 1 and 6
[c-h]        Any single lower case letter in the range between c and h
[D-M]        Any single upper case letter in the range between D and M
[^a-z]        Any single character EXCEPT any lower case letter between a and z.

       Pitfall: the ^ symbol only acts as an EXCEPT rule if it is the
       very first character inside a range, and it denies the
       entire range including the ^ symbol itself if it appears again
       later in the range. Outside a range it is an anchor meaning
       "start of line"; inside a range, anywhere but first, it is
       treated as a regular ^ symbol.
       In other words, you cannot deny a word with ^undesired_word
       or a group with ^(undesired_phrase).
       Read more detailed regex documentation to find out what is
       necessary to achieve this.

[_4^a-zA-Z]    Any single character which can be the underscore or the
       number 4 or the ^ symbol or any letter, lower or upper case

?, +, * and the {} count parameters can be appended not only to a single character, but also to a group() or a range[].

therefore,
^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$
would mean:

^.{2}        = A line beginning with any two characters,
[a-z]{1,2}    = followed by either 1 or 2 lower case letters,
_?        = followed by an optional underscore,
[0-9]*        = followed by zero or more digits,
([1-6]|[a-f])    = followed by either a digit between 1 and 6 OR a
       lower case letter between a and f,
[^1-9]{2}    = followed by any two characters except digits
       between 1 and 9 (0 is possible),
a+$        = followed by one or more
       occurrences of 'a' at the end of a line.
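
To see the breakdown in action, here is a minimal sketch that runs the example expression through preg_match(), which accepts the same syntax. Both subject strings are assumptions constructed to follow the breakdown piece by piece.

```php
<?php
$pattern = '/^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$/';

// Pieces: XY | ab | _ | 123 | c | 00 | aa
$match = preg_match($pattern, 'XYab_123c00aa');

// Same string without the trailing run of 'a', so a+$ cannot succeed.
$noMatch = preg_match($pattern, 'XYab_123c00');

var_dump($match, $noMatch);
```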
regex at dan42 dot cjb dot net
21-Feb-2002 03:12
It's easy to exclude characters, but excluding words with a regular expression is a bit more tricky. For parentheses there is no equivalent to the ^ for brackets. The only way I've found to exclude a string is to proceed by inverse logic: accept all the words that do NOT correspond to the string. So if you want to accept all strings except those _beginning_ with "abc", you'd have to accept any string that matches one of the following:
  ^(ab[^c])
  ^(a[^b]c)
  ^(a[^b][^c])
  ^([^a]bc)
  ^([^a]b[^c])
  ^([^a][^b]c)
  ^([^a][^b][^c])

which, put together, gives the regex
  ^(ab[^c]|a[^b]c|a[^b][^c]|[^a]bc|[^a]b[^c]|[^a][^b]c|[^a][^b][^c])

Note that this won't work to detect the word "abc" anywhere in a string. You need to have some way of anchoring the inverse word match
like: ^(a[^b]|[^a]b|[^a][^b])  ;"ab" not at beginning of line
  or: (a[^b]|[^a]b|[^a][^b])$  ;"ab" not at end of line
  or: 123(a[^b]|[^a]b|[^a][^b]) ;"ab" not after "123"

I don't know why "(abc){0,0}" is an invalid syntax. It would've made all this much simpler.
 
 
Slightly off-topic, here's a regex date validator (format yyyy-mm-dd, remove all spaces and linefeeds):
  ^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|
  (0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|
  [02468][48]|[13579][26])-02-29)$
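
A sketch that joins the three lines of the date expression and tries it on a few dates with preg_match(). The sample dates are assumptions. One limitation shows up: the two-digit year "00" is not covered by the leap-year branch, so 2000-02-29 is rejected although it is a valid date.

```php
<?php
// The three lines of the note, joined with spaces and linefeeds removed.
$date = '/^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|'
      . '(0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|'
      . '[02468][48]|[13579][26])-02-29)$/';

$checks = [];
foreach (['2004-12-31', '2004-02-29', '2003-02-29', '2004-04-31', '2000-02-29'] as $d) {
    $checks[$d] = preg_match($date, $d);
}
print_r($checks);
```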
03-Feb-2002 01:02
if you are looking for the abbreviations like tab, carriage return, regex-class definitions 

you should look here:
http://elvin.dstc.edu.au/doc/regex.html

some excerpts:

control characters
   \a    bell
   \b    backspace
   \f    form feed
   \n    line feed
   \r    carriage return
   \t    horizontal tab
   \v    vertical tab

class example
   \cLu    all uppercase letters
webmaster at datamike dot org
18-Dec-2001 11:39
I noticed Cyro's link had gone stale, so I made a copy of the regex manpage and placed it on my site. You can get it from the following address:

http://www.datamike.org/man/regexman.txt

This is primarily for Windows users, who have no access to the man pages in Linux distributions.
bart at framers dot nl
07-Mar-2001 12:53
Dario seems to have made a nice tutorial about regular expressions:

http://www.phpbuilder.com/columns/dario19990616.php3

Thanks Dario! ...
07-Mar-2001 05:38
If you don't have commandline access to the manpage cited above, note that the "POSIX 1003.2 Regular Expressions" manpage is also widely re-published on the web.  See, for instance:

http://www.google.com/search?q=posix+1003%2E2+regular+expressions

The "POSIX 1003.2 Regular Expressions" manpage provides a good basic reference for the syntax used by ereg_* functions.  Most tutorials on "extended regular expressions" are also applicable.

Copyright © 2001-2004 The PHP Group
All rights reserved.
This mirror generously provided by: Italia OnLine S.p.a.
Last updated: Fri May 21 04:11:23 2004 CEST