How to get content of website using CURL

How to get content of website using CURL in PHP

There are various ways to get web page content in PHP, and CURL is one of the most frequently used. CURL stands for Client URL. To use the CURL library, the curl extension must be enabled in PHP. On Windows, edit the php.ini file and uncomment this line: extension=php_curl.dll (in PHP 7.2 and later the line may read extension=curl instead).
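Before relying on CURL, a script can check at runtime whether the extension is actually loaded. The following is a minimal sketch: the helper name curl_is_available() is hypothetical and only wraps PHP's built-in extension_loaded() and function_exists().

```php
<?php
// Hypothetical helper: reports whether the curl extension is usable.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    // curl_version() returns an array; 'version' holds the libcurl version string.
    $info = curl_version();
    echo 'CURL is enabled, libcurl version ' . $info['version'] . PHP_EOL;
} else {
    echo 'CURL is not enabled; check the extension line in php.ini.' . PHP_EOL;
}
```

If the check fails after editing php.ini, the web server or PHP process usually needs a restart before the change takes effect.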

Keep in mind that a CURL request typically involves the following steps.

  • Create a CURL handle using curl_init().
  • Set up the request using curl_setopt() or curl_setopt_array().
  • Request the page using curl_exec().
  • Check if an error occurred using curl_errno().
  • Get information about the transfer, such as the HTTP status code, using curl_getinfo().
  • Close the CURL handle using curl_close().
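The steps above can be sketched as a single function. This is only an illustration under stated assumptions: the helper name fetch_url(), the 30-second timeout, and the shape of the returned array are hypothetical choices, not part of the CURL API.

```php
<?php
// Hypothetical helper that walks through the steps listed above.
// Returns an array with the keys 'body', 'errno', 'error' and 'info'.
function fetch_url(string $url): array
{
    // 1. Create a handle.
    $ch = curl_init();

    // 2. Set up the request.
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 30,   // assumed overall limit, in seconds
    ]);

    // 3. Request the page; curl_exec() returns false on failure.
    $body = curl_exec($ch);

    // 4. Check whether an error occurred (0 means no error).
    $errno = curl_errno($ch);
    $error = curl_error($ch);

    // 5. Get information about the transfer, e.g. $info['http_code'].
    $info = curl_getinfo($ch);

    // 6. Close the handle.
    curl_close($ch);

    return ['body' => $body, 'errno' => $errno, 'error' => $error, 'info' => $info];
}
```

A caller would typically check $result['errno'] first and read $result['body'] only when it is 0, for example after $result = fetch_url('http://www.example.com').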

The basic syntax for CURL is as follows:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
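Because CURLOPT_HEADER is set to 1 in the snippet above, the string returned by curl_exec() contains the response headers followed by the body. A minimal sketch of separating the two, assuming the header length is read with curl_getinfo($ch, CURLINFO_HEADER_SIZE) before the handle is closed. The helper name split_response() is hypothetical, and the response string is hand-written so the example runs without a network connection.

```php
<?php
// Hypothetical helper: splits a raw response into headers and body.
// $headerSize is the byte length of the header block, as reported by
// curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Offline illustration with a hand-written response string.
$raw        = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
$headerSize = strpos($raw, "\r\n\r\n") + 4; // stands in for CURLINFO_HEADER_SIZE
$parts      = split_response($raw, $headerSize);

echo $parts['body'] . PHP_EOL; // the body without the headers
```

If only the body is needed, setting CURLOPT_HEADER to 0 avoids the split altogether.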

Breaking down the code, we have the following:

Code            Description
curl_init()     initialize the cURL handle
curl_setopt()   set an option for the request, such as the URL to load
curl_exec()     perform the cURL request
curl_close()    close the handle

Many configuration options are available for a request. You can find all of them in the official PHP documentation (http://php.net/manual/en/book.curl.php).

Function                  Description
curl_copy_handle          Copy a cURL handle along with all of its preferences
curl_errno                Return the last error number
curl_error                Return a string containing the last error for the current session
curl_escape               URL encodes the given string
curl_exec                 Perform a cURL session
curl_file_create          Create a CURLFile object
curl_getinfo              Get information regarding a specific transfer
curl_init                 Initialize a cURL session
curl_multi_add_handle     Add a normal cURL handle to a cURL multi handle
curl_multi_close          Close a set of cURL handles
curl_multi_errno          Return the last multi curl error number
curl_multi_exec           Run the sub-connections of the current cURL handle
curl_multi_getcontent     Return the content of a cURL handle if CURLOPT_RETURNTRANSFER is set
curl_multi_info_read      Get information about the current transfers
curl_multi_init           Returns a new cURL multi handle
curl_multi_remove_handle  Remove a handle from a set of cURL handles
curl_multi_select         Wait for activity on any curl_multi connection
curl_multi_setopt         Set an option for the cURL multi handle
curl_multi_strerror       Return string describing error code
curl_pause                Pause and unpause a connection
curl_reset                Reset all options of a libcurl session handle
curl_setopt_array         Set multiple options for a cURL transfer
curl_setopt               Set an option for a cURL transfer
curl_share_close          Close a cURL share handle
curl_share_errno          Return the last share curl error number
curl_share_init           Initialize a cURL share handle
curl_share_setopt         Set an option for a cURL share handle
curl_share_strerror       Return string describing the given error code
curl_strerror             Return string describing the given error code
curl_unescape             Decodes the given URL encoded string
curl_version              Gets cURL version information
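The curl_multi_* functions in the table allow several transfers to run concurrently instead of one after another. The following is a minimal sketch, not a definitive implementation: the helper name fetch_all(), the 30-second timeout, and the choice to return bodies keyed like the input array are assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetches several URLs concurrently.
// Returns the response bodies, keyed like the input array.
function fetch_all(array $urls): array
{
    $mh      = curl_multi_init();
    $handles = [];

    // Create one easy handle per URL and attach it to the multi handle.
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30); // assumed limit, in seconds
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            // Wait for activity instead of spinning in a busy loop.
            curl_multi_select($mh, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies and detach each handle.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

A call such as fetch_all(['home' => 'http://www.example.com']) would return an array with the same key. Per-transfer errors are not reported by this sketch; curl_multi_info_read() or curl_errno() on each handle could be used for that.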

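Now, let’s write something useful for a real-life scenario. Let’s create a function that takes a URL as a parameter, executes the request with the CURL library, and returns the result.

<?php
function get_web_page( $url )
{
    // Example user agent string; substitute your own.
    $user_agent = 'Mozilla/5.0 (compatible; ExampleBot/1.0)';

    $options = array(
        CURLOPT_CUSTOMREQUEST  =>"GET",        //set request method type post or get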
        CURLOPT_POST           =>false,        //set to GET
        CURLOPT_USERAGENT      => $user_agent, //set user agent
        CURLOPT_COOKIEFILE     =>"cookie.txt", //set cookie file
        CURLOPT_COOKIEJAR      =>"cookie.txt", //set cookie jar
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
?>

Now we will call this function and handle the response. For this, we write the code below:

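// call function to execute curl
$url    = 'http://www.example.com';
$result = get_web_page( $url );

if ( $result['errno'] != 0 ) {
    // error: bad url, timeout, redirect loop
    die( 'CURL error: ' . $result['errmsg'] );
}

if ( $result['http_code'] != 200 ) {
    // error: no page, no permissions, no service
    die( 'Unexpected HTTP status: ' . $result['http_code'] );
}

$page = $result['content'];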
