PHP CLI: Recursive Search and Replace

Revision as of 09:57, 15 August 2009 by Ric

 

PHP CLI Recursive Search and Replace

The previous page covered single-file search and replace. That is adequate for most applications, but occasionally you will need to search a folder and its sub-folders for the files to change. Not a problem: download one of the many classes that can be found on the Internet.

I find they come with too many bells and whistles, hence this page covers a basic PHP script that searches and replaces text in files in any folder and its sub-folders using a recursive function.

By now you will appreciate that I like working code examples; these you can hack around and tailor to your own applications. In keeping with this, the following examples take a look at problems associated with a recursive design using PHP.

Initial test setup

Edit our two test files Run.bat and test_1.php contained in folder UniServer to have the following content:

Run.bat

TITLE CLI TEST BAT
COLOR B0
@echo off
cls
echo.
usr\local\php\php.exe -n test_1.php
echo.
pause


Batch file to run test_1.php script

Note 1: UniServer Mona users change paths as shown:

  • udrive\usr\local\php\php.exe -n test_1.php
  • ./udrive/usr/local/mysql

All scripts on this page will require the above change.

test_1.php  
<?php
$sfolder = "./usr/local/mysql";        // Start folder

$Array=recur_dir($sfolder);             // Retrieve file list

foreach($Array as $line){               // Print list
 echo $line."\n";
}

//=== Recursive Directory ==============================================
  
function recur_dir($dir){

  $dirlist = opendir($dir);            // Open start directory
  while ($file = readdir($dirlist)){   // Iterate through list
      $newpath = $dir.'/'.$file;       // Create path. Either dir or file 
         $Array[]= $newpath;           // Save full file path to array
  }

  closedir($dirlist);                  // Close handle 
  return $Array;                       // Return array of files for 
}                                      // further processing
//========================================== END Recursive Directory =====
exit(0);
?>
  • First line sets the initial starting folder in variable $sfolder
  • Next line calls the recursive function and saves the returned list of files in array $Array
  • The foreach loop iterates through the array. Later examples use it to call preg_replace() on each file; in this case it just prints the list of files.

Recursive function:

  • The recursive function takes as parameter the starting directory $dir
  • opendir() opens a handle to the starting directory and saves it in variable $dirlist
  • readdir() function returns a single entry from the currently open directory and saves it in variable $file.
    Note: The returned value is either name of a directory or name of a file including its file extension.
  • The while loop is used to read every entry in the open directory until there are no more entries to read.
  • Inside the while loop the full path is assembled and assigned to variable $newpath. This is added to the array $Array[]
  • With no more data to read we drop out of the while loop and close the directory handle with closedir($dirlist). The file list $Array is returned to the caller.
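
One detail about the while loop is worth a short sketch. readdir() returns false when there are no more entries, but an entry whose name PHP treats as false (for example a file called 0) would also end a loose test early. The version below uses a strict comparison instead. The function name list_entries() is hypothetical and not part of the original script; treat it as an illustration rather than a replacement.

```php
<?php
// Hypothetical helper (not part of the original script): list every entry
// in one directory, stopping only when readdir() really returns false.
function list_entries($dir){
  $entries = array();                            // Always return an array
  $dirlist = opendir($dir);                      // Open directory
  if ($dirlist === false){                       // Could not open it
    return $entries;
  }
  while (false !== ($file = readdir($dirlist))){ // Strict test: "0" is a valid name
    $entries[] = $dir.'/'.$file;                 // Save full path
  }
  closedir($dirlist);                            // Close handle
  return $entries;                               // Includes . and .. like the script above
}
```

Called as list_entries("./usr/local/mysql") it should give the same list as the script above, special entries included.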

Run the batch file (double click on Run.bat) Result as follows:

./usr/local/mysql/.
./usr/local/mysql/..
./usr/local/mysql/bin
./usr/local/mysql/data
./usr/local/mysql/my.cnf
./usr/local/mysql/share
Press any key to continue . . .


Well! It should be obvious there is no recursion yet; the code is only a starting point.

The output contains a mixture of folders (bin, data, share) and files (my.cnf).

You will notice there are two special sub-directory names, [.] and [..]. These are normally hidden, and every folder contains them. A single period [.] means "the current directory". Two periods [..] means "the directory which contains the current directory", also known as the parent directory. They are useful for navigation, however they will cause problems if not removed from a directory listing: a recursive function that followed them would keep re-entering the same folders.
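
As a small illustration of removing the two special entries, the sketch below uses scandir() and array_diff(). This is a different approach from the readdir() loop used on this page, and the function name real_entries() is hypothetical.

```php
<?php
// Hypothetical helper: return the names found in $dir without "." and "..".
function real_entries($dir){
  $names = scandir($dir);                    // Sorted list of names, or false on error
  if ($names === false){
    return array();
  }
  // Drop the two special entries and renumber the array from 0
  return array_values(array_diff($names, array('.', '..')));
}
```

The result holds bare names only, so a path still has to be built with $dir.'/'.$name before calling is_dir().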

Note: The above script is only a skeleton and needs refining.


Refine - Remove and separate

The two special sub-folders need to be removed from any listings. Files and folders require separation. The following example is the next step before looking at recursion.

Edit test_1.php to have the following content:

test_1.php  
<?php
$sfolder = "./usr/local/mysql";         // Start folder
$Array=recur_dir($sfolder);             // Retrieve file list

foreach($Array as $line){               // Print list
 echo $line."\n";
}
//=== Recursive Directory ==============================================
  
function recur_dir($dir){

  $dirlist = opendir($dir);             // Open start directory
  while ($file = readdir($dirlist)){    // Iterate through list
    if ($file != '.' && $file != '..'){ // Skip if . or ..
      $newpath = $dir.'/'.$file;        // Create path. Either dir or file 

      if (is_dir($newpath)){            // Is it a folder
        recur_dir($newpath);            // yes: Repeat this function
      }                                 // for that new folder
      else{                             // no: It's a file
         $Array[]= $newpath;            // Save full file path to array
      }

    }
  }
  closedir($dirlist);                  // Close handle 
  return $Array;                       // Return array of files for 
}                                      // further processing
//========================================== END Recursive Directory =====
exit(0);
?>
  • Within the while loop, first check for the special sub-folders. If the entry is neither of these, continue; otherwise skip it and let the while loop pick up another entry.
  • Using the is_dir() function, check whether $newpath is a directory or a file.
  • If it is a directory, call the function recur_dir() with the new folder (a function calling itself is referred to as recursion).
  • Else it is a file, so save $newpath to $Array.
  • The whole process is repeated until there are no more entries to process.
  • Handle is closed using closedir($dirlist) and array returned to caller.

Run the batch file (double click on Run.bat) Result as follows:

./usr/local/mysql/my.cnf

Well what a pain!

What happened to the recursion?

After all, the function is calling itself.


Modification

 if (is_dir($newpath)){         // Is it a folder
   recur_dir($newpath);         // yes: Repeat this function
   echo "Path = ".$newpath."\n";
 }                              // for that new folder


Add the echo line as shown to the above script.

It prints out $newpath displaying any folders.

Run the batch file.


Result


Path = ./usr/local/mysql/bin
Path = ./usr/local/mysql/data/mysql
Path = ./usr/local/mysql/data/phpmyadmin
Path = ./usr/local/mysql/data
Path = ./usr/local/mysql/share/charsets
Path = ./usr/local/mysql/share/english
Path = ./usr/local/mysql/share

./usr/local/mysql/my.cnf

Press any key to continue . . .


An interesting result: it clearly shows recursion is taking place.

For every sub-folder to be printed, the function must have run to completion inside each one; otherwise the script would still be stuck in a while loop.

Why are only files in the starting directory listed?

Look at the order of the Path lines. Each time readdir() returns a folder, the function calls itself for that folder before reading the next entry, and the echo runs only after that call returns. The script therefore works down a folder chain until no more folders are found, then works back up the chain. Net result: sub-folders are printed before their parent, and the entries of the starting folder are finished last.

Solutions

To answer the question: local variables, including local arrays, are not shared between function calls. Every time the function calls itself a new, empty $Array is created for that call, and because the value returned by the recursive call is never used, whatever that call collected is lost. One solution is to keep the list in a static variable or a class property; many of the recursive solutions found on the Internet use classes.

Another solution is to use a global array. It's not neat because it's detached from the function, hence the need to remember its name and to clear it before use.

A neater solution is to pass the array when calling the function; this keeps the array alive and its data intact, as the next example shows.
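
Besides the options listed above, one more is possible, although it is not the approach used on this page: keep the array local and merge in whatever each recursive call returns. The sketch below assumes a hypothetical function name recur_dir_merge() and is only meant to show where the lost data goes.

```php
<?php
// Sketch of an alternative (not the approach used on this page): every call
// builds its own local list, and the caller merges in what the recursive
// call returns, so nothing is lost when that call ends.
function recur_dir_merge($dir){
  $list = array();                               // Local to this call
  $dirlist = opendir($dir);                      // Open directory
  if ($dirlist === false){                       // Could not open it
    return $list;
  }
  while (false !== ($file = readdir($dirlist))){ // Iterate through list
    if ($file != '.' && $file != '..'){          // Skip if . or ..
      $newpath = $dir.'/'.$file;                 // Create path
      if (is_dir($newpath)){                     // Folder: keep what the call returns
        $list = array_merge($list, recur_dir_merge($newpath));
      }
      else{                                      // File: save full path
        $list[] = $newpath;
      }
    }
  }
  closedir($dirlist);                            // Close handle
  return $list;
}
```

The cost is an array_merge() per folder, which is why passing one array through the calls can be the tidier choice for large trees.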


Refine - pass array back to function

The solution is to pass the array back to the function during a recursive call as follows:

Edit test_1.php to have the following content:

test_1.php  
<?php
$sfolder = "./usr/local/mysql";  // Start folder
$Array=recur_dir($sfolder);             // Retrieve file list

foreach($Array as $line){               // Print list
 echo $line."\n";
}
//=== Recursive Directory ==============================================
  
function recur_dir($dir,&$Array){

  $dirlist = opendir($dir);             // Open start directory
  while ($file = readdir($dirlist)){    // Iterate through list
    if ($file != '.' && $file != '..'){ // Skip if . or ..
      $newpath = $dir.'/'.$file;        // Create path. Either dir or file 

      if (is_dir($newpath)){            // Is it a folder
        recur_dir($newpath,$Array);     // yes: Repeat this function
      }                                 // for that new folder
      else{                             // no: It's a file
         $Array[]= $newpath;            // Save full file path to array
      }

    }
  }
  closedir($dirlist);                  // Close handle 
  return $Array;                       // Return array of files for 
}                                      // further processing
//========================================== END Recursive Directory =====
exit(0);
?>


Run the batch file (double click on Run.bat) Result as follows:

Warning: Missing argument 2 for recur_dir(), called in ... test_1.php on line 3 and defined in ... test_1.php on line 10

Note: The warning is followed by a list of all files, including sub-folder files.

Generally warnings are not an issue, however this one is a pain and needs to be resolved. It occurs because of a parameter mismatch: the function is defined with two parameters, but the initial call passes only one.

We have seen this before. The solution is to change this line:

function recur_dir($dir,&$Array){

To:

function recur_dir($dir,&$Array=false){

The initial call to the function passes a single parameter.

Recursive calls pass two parameters.

Essentially that’s it for recursion. All that is required is to add some filtering; see the next section:
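
A variation worth considering, offered here as my own assumption rather than the page's code: default the parameter to an empty array instead of false. The function then returns an array even when no files are found, so the foreach loop always has something it can iterate. Recent PHP versions (8.1 onwards) also report a deprecation notice when false is silently turned into an array. The name recur_dir_ref() is hypothetical.

```php
<?php
// Sketch: same recursion as recur_dir(), but the by-reference parameter
// defaults to an empty array, so the return value is always an array.
function recur_dir_ref($dir, &$Array = array()){
  $dirlist = opendir($dir);                      // Open start directory
  if ($dirlist === false){                       // Could not open it
    return $Array;
  }
  while (false !== ($file = readdir($dirlist))){ // Iterate through list
    if ($file != '.' && $file != '..'){          // Skip if . or ..
      $newpath = $dir.'/'.$file;                 // Create path
      if (is_dir($newpath)){                     // Folder: recurse, sharing $Array
        recur_dir_ref($newpath, $Array);
      }
      else{                                      // File: save full path
        $Array[] = $newpath;
      }
    }
  }
  closedir($dirlist);                            // Close handle
  return $Array;
}
```

The initial call still passes one parameter, for example $list = recur_dir_ref("./usr/local/mysql");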


Refine - Add file filtering

Filtering files can be achieved using the function preg_match($pattern_regex, $string_to_search)

The pattern was covered in preg_replace(); it has the following format: '/regex_pattern/'

Hence to filter files with a specific extension use something like this:

'/(\.txt|\.cnf|\.conf)/'

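The pattern is delimited using '/'. The alternatives are enclosed in parentheses, allowing the vertical bar (a special character meaning "or") to separate them. The period (full stop) is a special regex character, hence requires escaping using a backslash. Note that the pattern is not anchored, so it matches these strings anywhere in the path, not only at the end of the file name.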
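
To see how the filter behaves, here is a small sketch comparing the page's pattern with an anchored variant that only accepts the extension at the very end of the path. The anchored pattern and the function name has_wanted_ext() are my own additions for illustration.

```php
<?php
// Hypothetical helper: true when $path ends in .txt, .cnf or .conf.
function has_wanted_ext($path){
  // \. is a literal dot, | separates the alternatives, $ anchors to the end
  return preg_match('/\.(txt|cnf|conf)$/', $path) === 1;
}

var_dump(has_wanted_ext('./usr/local/mysql/my.cnf'));   // bool(true)
var_dump(has_wanted_ext('./usr/local/notes.txt.bak'));  // bool(false)

// The unanchored pattern from this page accepts the second path as well
var_dump(preg_match('/(\.txt|\.cnf|\.conf)/', './usr/local/notes.txt.bak')); // int(1)
```

Whether the looser match matters depends on what is in the folders being searched; for the UniServer tree used here both patterns may well give the same list.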

Final recursive file search

Edit test_1.php to have the following content:

test_1.php  
<?php
$sfolder = "./usr/local";               // Start folder
$File_list_array=recur_dir($sfolder);   // Retrieve file list

foreach($File_list_array as $line){     // Print list
 echo $line."\n";
}
//=== Recursive Directory ==============================================
  
function recur_dir($dir,&$Array=false){
  $f_str='/(\.txt|\.cnf|\.conf)/';         // Filter, required files

  $dirlist = opendir($dir);                // Open start directory

  while ($file = readdir($dirlist)){       // Iterate through list
    if ($file != '.' && $file != '..'){    // Skip if . or ..
      $newpath = $dir.'/'.$file;           // Create path. Either dir or file 

      if (is_dir($newpath)){               // Is it a folder
        recur_dir($newpath,$Array);        // yes: Repeat this function
      }                                    // for that new folder
      else{                                // no: It's a file
       if (preg_match($f_str, $newpath)){  // Filter extension. Required files
         $Array[]= $newpath;               // Save full file path to array
       }                                   // includes file name
      }
    }
  }
  closedir($dirlist);                      // Close handle 
  return $Array;                           // Return array of files for 
}                                          // further processing

//========================================== END Recursive Directory =====
exit(0);
?>
  • Run the batch file (double click on Run.bat)
  • Result: only files matching the filter /(\.txt|\.cnf|\.conf)/ are listed.

Note 1: The start folder was moved up a level, allowing more folders to be searched. Outside of the function, $Array was changed to $File_list_array to avoid confusion.

Note 2: The array is passed to the function using the & operator, &$Array=false, which is referred to as passing by reference. With & the parameter $Array is not a copy; it refers to the caller's variable, so every recursive call adds to the same array. The default value =false allows the first call to omit the parameter. It looks a little odd that the default is not an array, however the array is not created until a value is first assigned with $Array[].
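
The effect of the & operator is easiest to see in isolation. The two function names below are hypothetical and exist only for this comparison.

```php
<?php
function add_by_value($list){        // Receives a copy of the caller's array
  $list[] = 'x';                     // Changes the copy only
}

function add_by_reference(&$list){   // Refers to the caller's own array
  $list[] = 'x';                     // Changes the caller's array
}

$a = array();
add_by_value($a);
echo count($a)."\n";                 // 0
add_by_reference($a);
echo count($a)."\n";                 // 1
```

This is the same mechanism that lets every recursive call of recur_dir() add to one shared list.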

File filtering is performed using preg_match(); if a match is found the file path is saved to $Array:

 if (preg_match($f_str, $newpath)){ 
   $Array[]= $newpath;             
 }                                 

Complete: Essentially that completes the recursive file search template. You can now add replace code, either outside the function or by converting the function to perform both search and replace.

Result of running the above script

./usr/local/apache2/conf/httpd.conf
./usr/local/apache2/conf/ssl.conf
./usr/local/apache2/LICENSE.txt
./usr/local/mysql/my.cnf

Press any key to continue . . .

Search and replace example 1

Edit test_1.php to have the following content:

test_1.php  
<?php
$s_str = '/\nListen\s\d+/';              // String to search for
$r_str = "\nListen 8080";                // Replacement string

$sfolder = "./usr/local";                // Start folder
$File_list_array=recur_dir($sfolder);    // Retrieve file list


foreach($File_list_array as $sfile){     // Scan file list
  $fh = fopen($sfile, 'r');              // Open file for read
  $Data = fread($fh, filesize($sfile));  // Read all data into variable
  fclose($fh);                           // close file handle

  $Data = preg_replace($s_str, $r_str, $Data); // Search and replace

  $fh = fopen($sfile, 'w');              // Open file for write
  fwrite($fh, $Data);                    // Write to file
  fclose($fh);                           // close file handle
}
//=== Recursive Directory ==============================================
  
function recur_dir($dir,&$Array=false){
  $f_str='/(\.txt|\.cnf|\.conf)/';         // Filter, required files

  $dirlist = opendir($dir);                // Open start directory

  while ($file = readdir($dirlist)){       // Iterate through list
    if ($file != '.' && $file != '..'){    // Skip if . or ..
      $newpath = $dir.'/'.$file;           // Create path. Either dir or file 

      if (is_dir($newpath)){               // Is it a folder
        recur_dir($newpath,$Array);        // yes: Repeat this function
      }                                    // for that new folder
      else{                                // no: It's a file
       if (preg_match($f_str, $newpath)){  // Filter extension. Required files
         $Array[]= $newpath;               // Save full file path to array
       }                                   // includes file name
      }
    }
  }
  closedir($dirlist);                      // Close handle 
  return $Array;                           // Return array of files for 
}                                          // further processing

//========================================== END Recursive Directory =====
exit(0);
?>

To perform a global search and replace:

  • Set the file types you wish to match
  • Set a search string – regex format
  • Set a replacement string
  • Set a start folder
  • Run the recur_dir function to obtain a list of files.
  • Loop through this list; for each file:
    • Open the file and read its contents
    • Perform search and replace using function preg_replace()
    • Write the result back and close the file
  • Repeat above steps for all files in the list
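
The per-file steps above can also be sketched with file_get_contents() and file_put_contents(), which open, read or write, and close in one call. This is an alternative to the fopen()/fread()/fwrite() sequence in the script, not the page's own method; it also copes with empty files, where asking fread() for zero bytes can raise an error. The function name replace_in_file() is hypothetical.

```php
<?php
// Hypothetical helper: apply one regex replacement to one file.
// Returns the number of replacements made, or false on a read/write/regex error.
function replace_in_file($sfile, $s_str, $r_str){
  $Data = file_get_contents($sfile);                  // Read whole file into variable
  if ($Data === false){                               // Could not read
    return false;
  }
  $New = preg_replace($s_str, $r_str, $Data, -1, $count); // Search and replace
  if ($New === null){                                 // Regex error
    return false;
  }
  if ($count > 0){                                    // Only write when something changed
    if (file_put_contents($sfile, $New) === false){   // Could not write
      return false;
    }
  }
  return $count;
}
```

Because 0 replacements and false are different outcomes, a caller should test the result with === false rather than a plain if.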


Search and replace example 2

The previous examples were designed to demonstrate certain concepts and potential issues during a recursive function design. Interestingly, by making the function what I refer to as self-contained, most of the issues disappear.

There was no real need to return an array containing a list of files; having found a matching file, why not just perform a string search and replace? All that is required is to throw parameters at the function and let it get on with the job. This example does that. I have also changed a few names to make them more meaningful.

Edit test_1.php to have the following content:

test_1.php  
<?php
$start_dir   = './usr/local';             // Start folder
$file_type   = '/(\.txt|\.cnf|\.conf)/';  // Filter, required files
$search_str  = '/\nListen\s\d+/';         // String to search for
$replace_str = "\nListen 8080";           // Replacement string

if(file_sr_global($start_dir,$file_type,$search_str,$replace_str)){
 echo "\n Search and replace complete\n";
}

//=== Recursive File Search and replace  =======================================
  
function file_sr_global($start_dir,$file_type,$search_str,$replace_str){

  $dirlist = opendir($start_dir);              // Open start directory

  while ($file = readdir($dirlist)){           // Iterate through list
    if ($file != '.' && $file != '..'){        // Skip if . or ..
      $newpath = $start_dir.'/'.$file;         // Create path. Either dir or file 

      if (is_dir($newpath)){                   // Is it a folder
                                               // yes: Repeat this function
        file_sr_global($newpath,$file_type,$search_str,$replace_str); 
      }                                        // for that new folder
      else{                                    // no: It's a file
       if (preg_match($file_type, $newpath)){  // Filter extension. Required files

       $fh = fopen($newpath, 'r');             // Open file for read
       $Data = fread($fh, filesize($newpath)); // Read all data into variable
       fclose($fh);                            // close file handle

       $Data = preg_replace($search_str, $replace_str, $Data); // Search and replace

       $fh = fopen($newpath, 'w');             // Open file for write
       fwrite($fh, $Data);                     // Write to file
       fclose($fh);                            // close file handle
       echo $newpath."\n"; //***** Delete this line ***************************
       }                                       
      }
    }
  }
  closedir($dirlist);                          // Close handle 
  return true;                                 // Return 
}                                              
//=================================== END Recursive File Search and replace  ===

exit(0);
?>


To perform a global search and replace:

  • Set a folder to start the search from
  • Set the file types you wish to search
  • Set a search string – regex format
  • Set a replacement string
  • Pass these parameters to the function file_sr_global()

Since the line $Array[]= $newpath; has been replaced, there is no feedback. Hence this test line displays the files that have been searched:

echo $newpath."\n";

After testing it can be removed.


  • Run the batch file (double click on Run.bat)
  • Result: Since it’s a powerful function I have restricted it to change only the Listen parameter in Apache’s configuration files. That said, it still kills the server, hence to restore it re-run the script after changing this line:
$replace_str = "\nListen 8080";

To:

$replace_str = "\nListen 80";

Alternatively you can open the file and change it.

There is one final step! Turn it into a finished function and use it.


Final recursive search and replace function

One thing that really annoys me is professional programmers who never document a single line of code; how do they know where it all fits in a few years' time? I do tend to go overboard, but that’s my personal preference. I know the above code is not perfect, but at least you have some idea what each line does. Similarly, when code is turned into a function I add extra information as follows:

//=== Recursive File Search and replace  ==========================================
// Inputs:  $start_dir   Absolute or relative path to starting folder. Do not
//                       include a forward slash at the end. c:/test ./test 
//          $file_type   A regex pattern containing file types to be searched 
//                       e.g.  $file_type = '/(\.txt|\.cnf|\.conf)/' 
//          $search_str  A regex pattern e.g. $search_str  = '/\nListen\s\d+/'
//          $replace_str A plain text string e.g. $replace_str = "\nListen 8080"
//
// Output:  Returns true --- Need to add error checking
//           
// Notes :  Searches for files of the specified type starting at $start_dir and 
//          includes all sub-folders. For each file found a search and replace
//          is performed.
//          
// -----------------------------------------------------------------------------------
 
function file_sr_global($start_dir,$file_type,$search_str,$replace_str){

  $dirlist = opendir($start_dir);                // Open start directory

  while (false !== ($file = readdir($dirlist))){ // Iterate through list
    if ($file != '.' && $file != '..'){          // Skip if . or ..
      $newpath = $start_dir.'/'.$file;           // Create path. Either dir or file 

      if (is_dir($newpath)){                     // Is it a folder
                                                 // yes: Repeat this function
        file_sr_global($newpath,$file_type,$search_str,$replace_str); 
      }                                          // for that new folder
      else{                                      // no: It's a file
       if (preg_match($file_type, $newpath)){    // Filter by file extension.

         $fh = fopen($newpath, 'r');             // Open file for read
         $Data = fread($fh, filesize($newpath)); // Read all data into variable
         fclose($fh);                            // Close file handle

         $Data = preg_replace($search_str, $replace_str, $Data,-1,$count);// S & R
         if($count){                             // Was a replacement made
           $fh = fopen($newpath, 'w');           // yes: Open file for write
           fwrite($fh, $Data);                   // Write new $Data to file
           fclose($fh);                          // Close file handle
           echo $newpath." Replaced ".$count."\n"; //***** Delete this line *******
         }
       }                                       
      }//eof else
    }
  }//eof while

  closedir($dirlist);                          // Close handle 
  return true;                                 // Return 
}                                              
//=================================== END Recursive File Search and replace  ======
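
The header comment above notes that error checking still needs adding. One possible shape for it is sketched below, under my own assumptions: the function stops and returns false as soon as a folder cannot be opened or a file cannot be read or written, and it uses file_get_contents()/file_put_contents() instead of fopen()/fread()/fwrite(). The name file_sr_global_checked() is hypothetical; this is not the finished UniServer function.

```php
<?php
// Sketch only: same traversal as file_sr_global(), with basic error checking.
// Returns true when every folder and matching file was processed,
// false as soon as something could not be opened, read or written.
function file_sr_global_checked($start_dir,$file_type,$search_str,$replace_str){

  $dirlist = @opendir($start_dir);               // Open start directory quietly
  if ($dirlist === false){                       // Could not open it
    return false;
  }

  $ok = true;                                    // Becomes false on first error
  while ($ok && false !== ($file = readdir($dirlist))){
    if ($file == '.' || $file == '..'){          // Skip if . or ..
      continue;
    }
    $newpath = $start_dir.'/'.$file;             // Create path. Either dir or file

    if (is_dir($newpath)){                       // Folder: repeat for that folder
      $ok = file_sr_global_checked($newpath,$file_type,$search_str,$replace_str);
    }
    elseif (preg_match($file_type, $newpath)){   // File: filter by file type
      $Data = file_get_contents($newpath);       // Read all data into variable
      if ($Data === false){                      // Read failed
        $ok = false;
        break;
      }
      $New = preg_replace($search_str, $replace_str, $Data, -1, $count);
      if ($New === null){                        // Regex error
        $ok = false;
        break;
      }
      if ($count && file_put_contents($newpath, $New) === false){
        $ok = false;                             // Write failed
      }
    }
  }

  closedir($dirlist);                            // Close handle
  return $ok;
}
```

Stopping at the first failure is only one design choice; collecting the failed paths and carrying on would be equally reasonable for a maintenance script.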

OK, it’s true: I never practice what I preach.


Summary

To be honest I have never found any old-fashioned PHP code to perform the above, hence the reason for writing it; it seems the trendy thing is classes. To justify this everyone wants to add far more than is required. I like simple! It’s less error prone.

Well, I am not biased in any way and will hack any code if it gets the job done. This series has resulted in the creation of some original code.


Conclusion

The true objective of this tutorial series was to give you an insight into UniServer 5.0-Nano’s new control architecture. All examples in the tutorial can be found within this control architecture or its support scripts.

Esoteric batch files have been reduced to nothing more than interfaces to the PHP scripts. Uniform Server is now uniform regarding both scripting (control) and Web page language.

This I hope will allow you to tailor the server to meet any specific functionality you require without the need to compile any code.



  MPG (Ric)