How to edit part of the feed before parsing it
Sometimes a feed has a problem that you need to correct before you can parse it. Even if you don't control the feed (and so can't fix it at the source), you can still work around these kinds of issues.
This tutorial assumes that you're already familiar with using SimplePie, including looping through items. This is only sample code, and you should not create real pages using the (horrid) HTML generated by this example.
Compatibility
- Supported in SimplePie 1.0.
- Code in this tutorial should be compatible with PHP 4.3 or newer, and should not use PHP short tags, in order to support the largest number of PHP installations.
Sample "Bad Feed"
Here's a bad feed from Oracle. Invalid scripting code is tacked onto the end of it: http://pressroom.oracle.com/index.jsp?rss=yes
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF >
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
  <channel rdf:about="http://pressroom.oracle.com/index.jsp?rss=yes">
    <title>Oracle Press Releases</title>
    <link>http://pressroom.oracle.com/index.jsp</link>
    <description><![CDATA[]]></description>
    <dc:language>en-US</dc:language>
    <dc:publisher></dc:publisher>
    <dc:creator></dc:creator>
    <dc:rights></dc:rights>
    <dc:date>2007-07-31T18:27:21-07:00</dc:date>
    <items>
      <rdf:Seq>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/jennycraig-july07.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/forrester-aps-wave.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-rd-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-otn-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-psc-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-invests-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-showopener-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-ace-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-partner-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/detskymir.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/schneider.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-11g-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-oa-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/china-rac-owshanghai.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/identity-governance-framework-openliberty.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/fairfield-city-council.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/crm-od-r14.html?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/long-island-university.htm?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/ampac-bd.htm?rssid=rss_ocom_pr</rdf:li>
        <rdf:li>http://www.oracle.com/corporate/press/2007_jul/royal%20groupandinfosys-072307.html?rssid=rss_ocom_pr</rdf:li>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://www.siderean.com/images/16siderean.jpg"/>
  </channel>
  <image rdf:about="http://www.siderean.com/images/16siderean.jpg">
    <title>Siderean Logo</title>
    <url>http://www.siderean.com/images/16siderean.jpg</url>
    <link>http://www.siderean.com</link>
  </image>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/jennycraig-july07.html?rssid=rss_ocom_pr">
    <title>Leading Weight Management Company Jenny Craig Chooses Oracle&#8217;s Siebel CRM</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/jennycraig-july07.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/forrester-aps-wave.html?rssid=rss_ocom_pr">
    <title>Oracle Recognized as a Leader in Application Server Platforms by Independent Research Firm</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/forrester-aps-wave.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-rd-owshanghai.html?rssid=rss_ocom_pr">
    <title>Oracle Establishes New R&D Center in Shanghai</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-rd-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-otn-owshanghai.html?rssid=rss_ocom_pr">
    <title>Oracle Technology Network in China Surpasses 250,000 Members</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-otn-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-psc-owshanghai.html?rssid=rss_ocom_pr">
    <title>Oracle Launches Second Partner Solution Center in China</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-psc-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-invests-owshanghai.html?rssid=rss_ocom_pr">
    <title>Oracle Continues to Increase Investments in China</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-invests-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-showopener-owshanghai.html?rssid=rss_ocom_pr">
    <title>Oracle OpenWorld Asia Pacific in Shanghai to Focus on Helping Customers "Get Better Information for Better Results"</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-showopener-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-ace-owshanghai.html?rssid=rss_ocom_pr">
    <title>Alibaba Chief Database Administrator Becomes Oracle's 100th ACE Worldwide</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-ace-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-partner-owshanghai.html?rssid=rss_ocom_pr">
    <title>Oracle Reports Significant Partner Satisfaction In Asia Pacific, Invests In New Resources For Partners</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-partner-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/detskymir.html?rssid=rss_ocom_pr">
    <title>Detsky Mir Selects Oracle Applications to Support Growth Through Enhanced Forecasting and Improved Visibility into Merchandise Performance</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/detskymir.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/schneider.html?rssid=rss_ocom_pr">
    <title>Schneider National Leverages Oracle(r) Applications and Infrastructure Software to Enhance Enterprise Growth, Efficiency</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/schneider.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-11g-owshanghai.html?rssid=rss_ocom_pr">
    <title>Asia Pacific Customers and Partners Prepare for Release of Oracle(r) Database 11g</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-11g-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-oa-owshanghai.html?rssid=rss_ocom_pr">
    <title>Oracle Launches Program for Vocational Schools and Universities in China</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-oa-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/china-rac-owshanghai.html?rssid=rss_ocom_pr">
    <title>Over 500 Customers in China Select Oracle Real Application Clusters in FY07</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/china-rac-owshanghai.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/identity-governance-framework-openliberty.html?rssid=rss_ocom_pr">
    <title>Industry Leaders Submit Identity Governance Framework to openLiberty.org for Development of Open Source Implementations</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/identity-governance-framework-openliberty.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/fairfield-city-council.html?rssid=rss_ocom_pr">
    <title>Australia&#8217;s Fairfield City Council Successfully Implements Oracle&reg; Utilities Work and Asset Management</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/fairfield-city-council.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/crm-od-r14.html?rssid=rss_ocom_pr">
    <title>Oracle Announces Availability of Siebel CRM On Demand Release 14</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/crm-od-r14.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/long-island-university.htm?rssid=rss_ocom_pr">
    <title>Long Island University Implements Oracle&reg; Applications to Deliver an Enhanced, More Efficient Student Experience</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/long-island-university.htm?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/ampac-bd.htm?rssid=rss_ocom_pr">
    <title>Ampac Fine Chemicals Automates Regulatory Compliance, Enables Improved Manufacturing Efficiency and Automates Core Financial Processes</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/ampac-bd.htm?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
  <item rdf:about="http://www.oracle.com/corporate/press/2007_jul/royal%20groupandinfosys-072307.html?rssid=rss_ocom_pr">
    <title>Royal Group Streamlines Business Processes and Infrastructure While Reducing Costs by Utilizing Infosys Expertise and Oracle Applications</title>
    <link>http://www.oracle.com/corporate/press/2007_jul/royal%20groupandinfosys-072307.html?rssid=rss_ocom_pr</link>
    <description></description>
  </item>
</rdf:RDF>
<!-- Script for Google Analytics tracking -->
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-1152814-2";
urchinTracker();
</script>
</BODY>
</HTML>
```
The Problem with This Feed
Looking at the source of the XML, I found this little nugget of wisdom at the end of the feed:
```xml
</rdf:RDF>
<!-- Script for Google Analytics tracking -->
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-1152814-2";
urchinTracker();
</script>
</BODY>
</HTML>
```
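For those reading this who are new to RSS, this is called malformed XML. An XML document may have only one root element (here, rdf:RDF). After its closing tag, only comments, processing instructions, and whitespace are allowed. The <script> elements and the stray </BODY> and </HTML> tags break that rule, so PHP's built-in XML parser stops with an error as soon as it reaches them. SimplePie is pretty good at correcting many of the more common mistakes, but this is quite an oversight on Oracle's part. We can fix this, but it'll take a bit more effort.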
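To see the problem for yourself, here is a minimal sketch. It assumes only that PHP's bundled xml extension is enabled, and it does not need SimplePie. It runs a tiny stand-in for the Oracle feed through xml_parse() twice: once as served, and once after cutting everything that follows </rdf:RDF>. The sample document and the parses_as_xml() function are made up for illustration. The exact error message depends on how PHP was built (expat or libxml2), so the code simply prints whatever the parser reports.

```php
<?php
// Hypothetical helper: check a string with PHP's built-in XML parser.
// Returns true if $xml is well-formed; otherwise returns false and stores
// the parser's error message and line number in $error.
function parses_as_xml($xml, &$error)
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        $error = '';
        return true;
    }
    $error = xml_error_string(xml_get_error_code($parser))
           . ' (line ' . xml_get_current_line_number($parser) . ')';
    return false;
}

// A tiny stand-in for the Oracle feed: one root element, followed by an
// (allowed) comment and a (not allowed) <script> element.
$bad = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
     . "\n  <item/>\n</rdf:RDF>\n"
     . "<!-- Script for Google Analytics tracking -->\n"
     . '<script type="text/javascript">urchinTracker();</script>' . "\n";

// Drop everything after the closing root tag ("s" lets "." match newlines).
$good = preg_replace('/<\/rdf:RDF>.*/is', '</rdf:RDF>', $bad);

foreach (array('As served' => $bad, 'Trimmed' => $good) as $label => $xml) {
    if (parses_as_xml($xml, $error)) {
        echo $label . ": parsed OK\n";
    } else {
        echo $label . ': ' . $error . "\n";
    }
}
```

The comment after </rdf:RDF> is left in the "as served" sample on purpose. It is legal XML, and the parser only fails once it reaches the <script> tag.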
The Fix
We'll fetch the feed ourselves first, correct the issue, then pass the cleaned-up XML back into SimplePie.
```php
<?php
require_once('../simplepie.inc');

// Fetch the RSS feed by itself first.
$file = new SimplePie_File('http://pressroom.oracle.com/index.jsp?rss=yes');
if (!$file->success) {
	die('Could not fetch the feed: ' . $file->error);
}

// Pass the content through a regular expression that gets rid of everything
// after the closing </rdf:RDF> tag. The "s" modifier lets "." match newlines,
// which avoids the slow (and, on large feeds, failure-prone) (.|\s)* pattern.
$body = preg_replace('/<\/rdf:RDF>.*/is', '</rdf:RDF>', $file->body);

// Now we'll pass our new custom data back into the SimplePie object.
$feed = new SimplePie();
$feed->set_raw_data($body);
$feed->init();

// Set the HTTP headers for the page automatically.
$feed->handle_content_type();

// Loop through each item and display the title.
foreach ($feed->get_items() as $item) {
	echo $item->get_title() . '<br />';
}
?>
```
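If you run into the same problem with other feeds, the trimming step can be pulled out into a small function. This is a sketch, not part of SimplePie: strip_after_root_close() is a hypothetical name. It takes the root element's name as a parameter, so it also covers RSS 2.0 (rss) and Atom (feed) documents. It sticks to functions available in PHP 4.3 to match the compatibility notes above.

```php
<?php
// Hypothetical helper, not part of SimplePie: removes everything after the
// first closing tag of the feed's root element (e.g. 'rdf:RDF', 'rss', 'feed').
function strip_after_root_close($xml, $root)
{
    // preg_quote() escapes regex metacharacters in the tag name; the second
    // argument makes it escape the '/' pattern delimiter too.
    $pattern = '/(' . preg_quote('</' . $root . '>', '/') . ').*/is';

    // "i": match the closing tag in any case; "s": let "." match newlines.
    // "$1" puts back the closing tag exactly as it appeared in the feed.
    $result = preg_replace($pattern, '$1', $xml);

    // preg_replace() returns NULL on error; hand back the input unchanged.
    return ($result === null) ? $xml : $result;
}

echo strip_after_root_close('<rss version="2.0"><channel/></rss></BODY></HTML>', 'rss');
// prints <rss version="2.0"><channel/></rss>
```

In the tutorial code, the preg_replace() line would become `$body = strip_after_root_close($file->body, 'rdf:RDF');`. Like the original fix, this cuts at the first matching closing tag. It therefore assumes that tag doesn't also appear earlier in the feed, for example inside a CDATA section.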
Last modified: 2011/03/06 03:56