Differences
This shows you the differences between two versions of the page.
sp2:goals [2013/03/15 02:53] tonyguards |
sp2:goals [2013/08/11 05:03] (current) rmccue old revision restored |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Goals for SimplePie 2 ====== | ====== Goals for SimplePie 2 ====== | ||
Over the past four years or so, SimplePie has grown from a completely unknown set of functions sitting on top of MagpieRSS to one of the world's most popular feed parsers with thousands and thousands of users all over the world. Also in that time, SimplePie has started to outgrow its architecture. People use SimplePie for all sorts of tasks that we never really anticipated, so we believe we've now reached a point where it's time for a reset. | Over the past four years or so, SimplePie has grown from a completely unknown set of functions sitting on top of MagpieRSS to one of the world's most popular feed parsers with thousands and thousands of users all over the world. Also in that time, SimplePie has started to outgrow its architecture. People use SimplePie for all sorts of tasks that we never really anticipated, so we believe we've now reached a point where it's time for a reset. | ||
SimplePie 2 is currently in the planning stages, and is both a fork and a ground-up re-write of SimplePie. The intention is to enhance the performance by trimming the fat, to build something more extensible, to make it easier to contribute, and to optimize for the kinds of tasks that we see people wanting to do frequently. The purpose of this document is to put together a list of goals for SimplePie 2 that will improve the overall project as a whole, and unshackle some of the early design decisions which now seem to be holding us back. | SimplePie 2 is currently in the planning stages, and is both a fork and a ground-up re-write of SimplePie. The intention is to enhance the performance by trimming the fat, to build something more extensible, to make it easier to contribute, and to optimize for the kinds of tasks that we see people wanting to do frequently. The purpose of this document is to put together a list of goals for SimplePie 2 that will improve the overall project as a whole, and unshackle some of the early design decisions which now seem to be holding us back. | ||
That being said, I took some time to write down some thoughts about what should go into SimplePie 2, and I would really like to get your thoughts as well. Are there things that SP2 should do that SP1 doesn't? Would you like to use SimplePie in ways that are currently more difficult than they should be? Are you somebody who has a different design philosophy and you think we should pay better attention to certain things? This is your chance to weigh in with your thoughts, opinions, comments, and other feedback. | That being said, I took some time to write down some thoughts about what should go into SimplePie 2, and I would really like to get your thoughts as well. Are there things that SP2 should do that SP1 doesn't? Would you like to use SimplePie in ways that are currently more difficult than they should be? Are you somebody who has a different design philosophy and you think we should pay better attention to certain things? This is your chance to weigh in with your thoughts, opinions, comments, and other feedback. | ||
Line 28: | Line 28: | ||
* This should be relatively simple to have, but something I think we should have. The question, if nothing else, is: if we make it accessible from, say, an item level, does changing options there have an effect elsewhere, or just on that item? ~~ gsnedders | * This should be relatively simple to have, but something I think we should have. The question, if nothing else, is: if we make it accessible from, say, an item level, does changing options there have an effect elsewhere, or just on that item? ~~ gsnedders | ||
* Is there a use-case where something like this might be valuable? ~~ skyzyx | * Is there a use-case where something like this might be valuable? ~~ skyzyx | ||
- | * **IRI Module:** We need to cope with converting Internationalized Resource Identifiers (IRIs) to their absolute counterparts as well as mapping IRIs to URIs for the sake of HTTP (e.g. möbius.com). ~~ skyzyx/gsnedders (Currently in development: | + | * **IRI Module:** We need to cope with converting Internationalized Resource Identifiers (IRIs) to their absolute counterparts as well as mapping IRIs to URIs for the sake of HTTP (e.g. möbius.com). ~~ skyzyx/gsnedders (Currently in development: http://hg.gsnedders.com/iri/) |
* **Character Transcoding Module:** Handles on-the-fly conversions between character encodings. Will continue to use UTF-8 internally. Uses the built-in ''iconv'' support by default, but will be enhanced by ''mbstring'' support. ~~ skyzyx | * **Character Transcoding Module:** Handles on-the-fly conversions between character encodings. Will continue to use UTF-8 internally. Uses the built-in ''iconv'' support by default, but will be enhanced by ''mbstring'' support. ~~ skyzyx | ||
- | * On the whole I don't like iconv because we can't guarantee we'll comply with XML (behaviour is system dependent); mbstring is better because it is only dependent on PHP version; but what is better still is . :) Hopefully, that'll be able to cope with virtually anything of sufficient complexity to not be able to be done from a UCM/CharMapML file at reasonable expense, and use a UCM/CharMapML file otherwise. ~~ gsnedders | + | * On the whole I don't like iconv because we can't guarantee we'll comply with XML (behaviour is system dependent); mbstring is better because it is only dependent on PHP version; but what is better still is <http://hg.gsnedders.com/Unicode/>. :) Hopefully, that'll be able to cope with virtually anything of sufficient complexity to not be able to be done from a UCM/CharMapML file at reasonable expense, and use a UCM/CharMapML file otherwise. ~~ gsnedders |
* I suggest ''iconv'' because it's built into PHP5, and ''mbstring'' as an enhancement because it's not. If we can provide the same functionality without the extension dependencies and not take a large performance hit, I think this is a good idea. ~~ skyzyx | * I suggest ''iconv'' because it's built into PHP5, and ''mbstring'' as an enhancement because it's not. If we can provide the same functionality without the extension dependencies and not take a large performance hit, I think this is a good idea. ~~ skyzyx | ||
* **Parsing Module:** Parses conformant feeds into a standard internal data structure. Uses the same namespace-based organization as SP 1.0. ~~ skyzyx | * **Parsing Module:** Parses conformant feeds into a standard internal data structure. Uses the same namespace-based organization as SP 1.0. ~~ skyzyx | ||
Line 36: | Line 36: | ||
* That's very inefficient. What we ought to do, now that we can on PHP5, is use the DOM extension. Also, I think we should go beyond conformant feeds, and use XML5 to parse anything (we should be able to use libxml's XML 1 parser first and fall back to our XML5 parser, as the libxml parser will be quicker), though the XML5 spec needs to be more fully written. ~~ gsnedders | * That's very inefficient. What we ought to do, now that we can on PHP5, is use the DOM extension. Also, I think we should go beyond conformant feeds, and use XML5 to parse anything (we should be able to use libxml's XML 1 parser first and fall back to our XML5 parser, as the libxml parser will be quicker), though the XML5 spec needs to be more fully written. ~~ gsnedders | ||
* We should definitely use DOM. For RSS feeds, I think it would do a better job with oft-ill-formed XML than SimpleXML (which I prefer when I know the data is clean). In regard to XML5, my understanding is that it still has a ways to go. I understand the W3C process enough to know that there should be some solid implementations in place, but I'm not aware of how far along the path the spec is. ~~ skyzyx | * We should definitely use DOM. For RSS feeds, I think it would do a better job with oft-ill-formed XML than SimpleXML (which I prefer when I know the data is clean). In regard to XML5, my understanding is that it still has a ways to go. I understand the W3C process enough to know that there should be some solid implementations in place, but I'm not aware of how far along the path the spec is. ~~ skyzyx | ||
- | * **Core API Layer Module:** Translates the internal data structure into logical API methods that third-party developers interact with. This "core" module will cover the normalization of various RSS/Atom data types, and should include all supported data types. ~~ skyzyx | + | * **Core API Layer Module:** Translates the internal data structure into logical API methods that third-party developers interact with. This "core" module will cover the normalization of various RSS/Atom data types, [[http://microformats.org/wiki/hatom|hAtom]], and should include all supported data types. ~~ skyzyx |
* I, as part of having everything as a module, would prefer that Atom and RSS were different modules (maybe to the extreme of Atom 0.3, Atom 1.0, RSS 0.90, RSS 1, and RSS 2 all being different modules). Then we can have the fun of having it all coming together in one API. Also, I'd rather almost anything returned an object with several methods, dependent on the loaded modules, giving methods like ::get_xhtml(), ::get_html(), and ::get_text(). ~~ gsnedders | * I, as part of having everything as a module, would prefer that Atom and RSS were different modules (maybe to the extreme of Atom 0.3, Atom 1.0, RSS 0.90, RSS 1, and RSS 2 all being different modules). Then we can have the fun of having it all coming together in one API. Also, I'd rather almost anything returned an object with several methods, dependent on the loaded modules, giving methods like ::get_xhtml(), ::get_html(), and ::get_text(). ~~ gsnedders | ||
* That's an interesting way to solve a problem such as whether titles should be text or HTML, but I fear the added complexity another layer of subclasses would provide. I'm wondering if there's value in using something like ''__toString()'' to provide a default value along with subclasses. | * That's an interesting way to solve a problem such as whether titles should be text or HTML, but I fear the added complexity another layer of subclasses would provide. I'm wondering if there's value in using something like ''__toString()'' to provide a default value along with subclasses. | ||
==== Extended Functionality (i.e. non-standard, optional modules) ==== | ==== Extended Functionality (i.e. non-standard, optional modules) ==== | ||
- | * **HTTP Module:** Handles requesting data over HTTP (with proper HTTP 1.1 support), and can understand and format the response into something more usable. Based on cURL, and supports curl_multi_exec() for parallel fetching of feeds. Should also have support for proxies, HTTP basic auth, and HTTP digest auth. ~~ skyzyx (Will be based on RequestCore:) | + | * **HTTP Module:** Handles requesting data over HTTP (with proper HTTP 1.1 support), and can understand and format the response into something more usable. Based on cURL, and supports curl_multi_exec() for parallel fetching of feeds. Should also have support for proxies, HTTP basic auth, and HTTP digest auth. ~~ skyzyx (Will be based on RequestCore: http://requestcore.googlecode.com/svn/trunk/) |
* I'd rather not use cURL. I've had too much bad experience with it. That HTTP class is also rather useless in the real world, and doesn't really even work well for some stuff valid per HTTP/1.1. ~~ gsnedders | * I'd rather not use cURL. I've had too much bad experience with it. That HTTP class is also rather useless in the real world, and doesn't really even work well for some stuff valid per HTTP/1.1. ~~ gsnedders | ||
* I'd much prefer to use cURL. Leveraging ''curl_multi_exec()'' will substantially improve the fetch times for MultiFeed users because it can fetch in parallel. I'm hoping that we can determine the issues it has with HTTP/1.1 and improve it to make a more robust standalone fetching class. Although cURL seems to be fairly widely supported, I'm certainly interested in including baseline fetching support using another method. It's just that using ''fsockopen()'' has caused increased maintenance that I'd prefer to avoid moving forward. ~~ skyzyx | * I'd much prefer to use cURL. Leveraging ''curl_multi_exec()'' will substantially improve the fetch times for MultiFeed users because it can fetch in parallel. I'm hoping that we can determine the issues it has with HTTP/1.1 and improve it to make a more robust standalone fetching class. Although cURL seems to be fairly widely supported, I'm certainly interested in including baseline fetching support using another method. It's just that using ''fsockopen()'' has caused increased maintenance that I'd prefer to avoid moving forward. ~~ skyzyx | ||
* I've moved these to "Extended" instead of "Standard" as our focus should be on the parsing instead of fetching the data. Fetches should be manual. | * I've moved these to "Extended" instead of "Standard" as our focus should be on the parsing instead of fetching the data. Fetches should be manual. | ||
- | * **Caching Module:** Extensible caching system that manages functionality, along with an actual caching plugin (file-based). ~~ skyzyx (File-based, APC, Memcache, MySQL, PostgreSQL, and SQLite caching will be based on CacheCore: | + | * **Caching Module:** Extensible caching system that manages functionality, along with an actual caching plugin (file-based). ~~ skyzyx (File-based, APC, Memcache, MySQL, PostgreSQL, and SQLite caching will be based on CacheCore: http://cachecore.googlecode.com/svn/trunk/) |
* This for the most part looks fine. However, how are we going to cache? Using the DOM extension we have cheap enough XML parsing to just save XML, but then do we cache processed content (like sanitized HTML content)? ~~ gsnedders | * This for the most part looks fine. However, how are we going to cache? Using the DOM extension we have cheap enough XML parsing to just save XML, but then do we cache processed content (like sanitized HTML content)? ~~ gsnedders | ||
* Provide separate cache files for each feed item, much like how we do for MySQL caching in the trunk. We would still need a way to keep track of which items are in the feed though. This is easy with a SQL system, but a bit more challenging with a flat cache (File, APC, Memcache). Alternatively, flat caches are better for storing large chunks of data for the long term. We need to find the proper balance between the two and make sure they're well supported. | * Provide separate cache files for each feed item, much like how we do for MySQL caching in the trunk. We would still need a way to keep track of which items are in the feed though. This is easy with a SQL system, but a bit more challenging with a flat cache (File, APC, Memcache). Alternatively, flat caches are better for storing large chunks of data for the long term. We need to find the proper balance between the two and make sure they're well supported. | ||
Line 61: | Line 61: | ||
* Content munging modules such as text shortening, extracting images, and other related things. Caching content images. Finding and caching favicons. ~~ skyzyx | * Content munging modules such as text shortening, extracting images, and other related things. Caching content images. Finding and caching favicons. ~~ skyzyx | ||
* These are oft-requested bits of functionality, but have no place in the standard package. These should be developed (by someone) as purely optional modules. ~~ skyzyx | * These are oft-requested bits of functionality, but have no place in the standard package. These should be developed (by someone) as purely optional modules. ~~ skyzyx | ||
- | * Fetching favicons is harder than people realize. | + | * Fetching favicons is harder than people realize. http://nick.typepad.com/blog/2008/11/favicon-hell-sm.html |
* JSON Web Service module that translates the internal data structure into JSON and can serve it efficiently using REST-style methods (same as SimplePie Live!) ~~ skyzyx | * JSON Web Service module that translates the internal data structure into JSON and can serve it efficiently using REST-style methods (same as SimplePie Live!) ~~ skyzyx | ||
* If we're just using DOM as the internal data structure, what's the difference from just using cross-domain XHR? Really, an ECMAScript implementation of a feed reader should be entirely separate, which would allow you to do cool ECMAScript stuff, and not have something PHP-like (though admittedly you'd need a proxy to circumvent same-origin restrictions on XHR, unless all feeds are served with whatever the suitable Access Control header is (which is also going to be used by XDR, as well as XHR Level 2)). ~~ gsnedders | * If we're just using DOM as the internal data structure, what's the difference from just using cross-domain XHR? Really, an ECMAScript implementation of a feed reader should be entirely separate, which would allow you to do cool ECMAScript stuff, and not have something PHP-like (though admittedly you'd need a proxy to circumvent same-origin restrictions on XHR, unless all feeds are served with whatever the suitable Access Control header is (which is also going to be used by XDR, as well as XHR Level 2)). ~~ gsnedders | ||
Line 69: | Line 69: | ||
==== Other Cool Ideas ==== | ==== Other Cool Ideas ==== | ||
* (I'm not sure where this should go, as I'm a first timer here, but ..) Really looking for good Atompub (Atom Publishing Protocol) support ... an API for building entries and POSTing, PUTting, and DELETEing them from collections. -- lewen7er9 | * (I'm not sure where this should go, as I'm a first timer here, but ..) Really looking for good Atompub (Atom Publishing Protocol) support ... an API for building entries and POSTing, PUTting, and DELETEing them from collections. -- lewen7er9 | ||
- | * Ability to cache favicons and content images to a third-party CDN service such as (leveraging the third-party toolkit for example). ~~ skyzyx | + | * Ability to cache favicons and content images to a third-party CDN service such as [[http://aws.amazon.com/s3|Amazon S3]] (leveraging the third-party [[http://tarzan-aws.com|Tarzan]] toolkit for example). ~~ skyzyx |
* I don't think something like this should be included at all. It simply requires far too much code to bundle. ~~ gsnedders | * I don't think something like this should be included at all. It simply requires far too much code to bundle. ~~ gsnedders | ||
* Perhaps as a non-standard module. If we have an (optional) module for parsing out images and favicons, caching them to S3 would be a simple matter of binding the two. ~~ skyzyx | * Perhaps as a non-standard module. If we have an (optional) module for parsing out images and favicons, caching them to S3 would be a simple matter of binding the two. ~~ skyzyx | ||
Line 75: | Line 75: | ||
* I would love to see a module that helps you update a database with feed items based on a set of feed URLs (making sure that you are not entering the same items, etc.). I guess it's similar to database caching. ~~ eb | * I would love to see a module that helps you update a database with feed items based on a set of feed URLs (making sure that you are not entering the same items, etc.). I guess it's similar to database caching. ~~ eb | ||
* I'd love for SimplePie to be able to parse larger RSS files than currently possible (larger than my memory). I understand this will mean it can't use a method that loads the whole DOM in memory. However, I don't fully know the consequences of this. ~~PanMan | * I'd love for SimplePie to be able to parse larger RSS files than currently possible (larger than my memory). I understand this will mean it can't use a method that loads the whole DOM in memory. However, I don't fully know the consequences of this. ~~PanMan | ||
- | * I'd like to see a way of differentiating which entries come from which feed when aggregating feeds. Say if I had an array with appropriate keys eg: ''$feeds['twitter'] = "' etc.. then I could access that ''$key'' in the loop to use as a CSS class name or to do some custom parsing/display of the feed item. ~~ sanchothefat | + | * I'd like to see a way of differentiating which entries come from which feed when aggregating feeds. Say if I had an array with appropriate keys eg: ''$feeds['twitter'] = "http://twitter.com/username/rss";'' etc.. then I could access that ''$key'' in the loop to use as a CSS class name or to do some custom parsing/display of the feed item. ~~ sanchothefat |
* It would be great to have support for CSS sprites, e.g. for the favicons. If you have a site with, let's say, 90 feeds, there are a lot of HTTP requests, which slows down page loads. ~~ marcfalk | * It would be great to have support for CSS sprites, e.g. for the favicons. If you have a site with, let's say, 90 feeds, there are a lot of HTTP requests, which slows down page loads. ~~ marcfalk | ||
- | * I understand the performance implications (i.e. However you can't do spriting on the fly, which means you'd have to pull the favicons and create a sprite ahead of time. You'd have to add this to your CSS/HTML manually, so I'm not sure how this relates to SimplePie. Clarification? | + | * I understand the performance implications (i.e. [[http://developer.yahoo.com/performance/rules.html#opt_sprites|CSS Sprites]]). However you can't do spriting on the fly, which means you'd have to pull the favicons and create a sprite ahead of time. You'd have to add this to your CSS/HTML manually, so I'm not sure how this relates to SimplePie. Clarification? |
* No I see. Well, I just need an opportunity to implement it, right now I cannot? You could add favposition to the array, so that when I e.g. call 'favicon' => '../images/source.png', I could also specify 'favposition' => '0px,-52px' ... I don't know. Anyway, as you say it probably doesn't relate that much to SimplePie, and it IS possible without integrating it. | * No I see. Well, I just need an opportunity to implement it, right now I cannot? You could add favposition to the array, so that when I e.g. call 'favicon' => '../images/source.png', I could also specify 'favposition' => '0px,-52px' ... I don't know. Anyway, as you say it probably doesn't relate that much to SimplePie, and it IS possible without integrating it. | ||
* Oh, you're talking about NewsBlocks. That is a completely separate demo from the SimplePie 2.0 core. | * Oh, you're talking about NewsBlocks. That is a completely separate demo from the SimplePie 2.0 core. | ||
Line 116: | Line 116: | ||
* ''rparman.interface.wordpress.php'' | * ''rparman.interface.wordpress.php'' | ||
===== Requirements ===== | ===== Requirements ===== | ||
- | * PHP 5.1.x (which includes iconv) | + | * PHP 5.1.x (which includes [[http://php.net/iconv|iconv]]) |
- | * PHP 5.2, please! What I was planning on doing would be very hard without PHP 5.2, and PHP 5.2 is //already// | + | * PHP 5.2, please! What I was planning on doing would be very hard without PHP 5.2, and PHP 5.2 is //already// wide-spread enough (heck, PHP 5.3 may be possible as a realistic requirement when SP2 ships). ~~ gsnedders |
+ | * [[http://php.net/pcre|PCRE]] (regular expression support) | ||
+ | * I'd rather require PCRE with Unicode support compiled in (which it has had by default for several years now). ~~ gsnedders | ||
+ | * [[http://php.net/domdocument|DOMDocument]] (better than SimpleXML at handling malformed HTML/XML markup) | ||
+ | * No, it's no better in terms of parsing stuff that isn't well-formed. They use the same parser. The issue is that SimpleXML has had behaviour changes within PHP 5.2, and also (used to?) vary on Windows/*nix OSes. ~~ gsnedders | ||
+ | * Raise the knowledge requirement to something sensible. People should already be able to know how to write and call simple functions, jump in-and-out of PHP blocks, write to the page (i.e. output buffer), define simple arrays (indexed and associative), know how to utilize existing constants, and understand the basic parent-child nature of objects and methods/properties. SimplePie is a toolkit for PHP developers. It needs to start acting like one. |
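
The exchange under the Core API Layer Module, where gsnedders suggests returning an object with methods like ''get_html()'' and ''get_text()'' and the reply wonders about using ''__toString()'' for a default value, might look roughly like the sketch below. This is only an illustration under stated assumptions: the class name ''ContentValue'', its constructor, and the choice of HTML as the default string form are hypothetical and not part of any SimplePie release, and the scalar type hints assume PHP 7 or later rather than the PHP 5.x versions discussed above.

```php
<?php
// Hypothetical value object (not a SimplePie class): wraps one feed value,
// such as a title, and remembers whether the source supplied text or HTML.
final class ContentValue
{
    private $raw;
    private $isHtml;

    public function __construct(string $raw, bool $isHtml)
    {
        $this->raw = $raw;
        $this->isHtml = $isHtml;
    }

    // Plain-text view: strip tags and decode entities if the source was HTML.
    public function get_text(): string
    {
        if (!$this->isHtml) {
            return $this->raw;
        }
        return html_entity_decode(strip_tags($this->raw), ENT_QUOTES, 'UTF-8');
    }

    // HTML view: escape the value if the source was plain text.
    // Sanitizing untrusted HTML is deliberately left out of this sketch.
    public function get_html(): string
    {
        if ($this->isHtml) {
            return $this->raw;
        }
        return htmlspecialchars($this->raw, ENT_QUOTES, 'UTF-8');
    }

    // Assumed default: casting the object to a string yields the HTML view.
    public function __toString(): string
    {
        return $this->get_html();
    }
}

$title = new ContentValue('Tom & Jerry', false);
echo $title, "\n";             // Tom &amp; Jerry
echo $title->get_text(), "\n"; // Tom & Jerry
```

With a default like this, templates that simply ''echo'' a value keep working, while callers that care about the text/HTML distinction can ask for the form they need explicitly. Whether that is worth the extra layer of objects is exactly the open question raised in the discussion.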
sp2/goals.1363315988.txt.gz · Last modified: 2013/03/15 02:53 by tonyguards