SimplePie Developer Weblog.  Not that we really have anything to say, but if you'll listen, why not?

How can SimplePie’s API improve? 14 Mar 2008 

[Development]

SimplePie is a tool that I use nearly every single day, in nearly every single project I work on. I use it partially because I work on it, and partially because I really believe that it’s the best tool for the job (when that job is RSS/Atom parsing). At the same time I know that there are things that I do that are very different from what other people are doing with SimplePie, such as processing thousands of feeds at a time, building web-based aggregators, building start pages (a la PopURLs, Original Signal, and others), and doing all sorts of other things that I may or may not even know about.

I know that Geoffrey has talked a bit about SimplePie 2.0’s planned modularity (keeping the fetching, parsing, caching, and API components separate but being able to load them when necessary, or even swap in your own components), and we’d like to see SimplePie 2.0 be better commented and slimmer code-wise, opting to move some of the more superfluous functionality into helpers and other outside classes.

So my big question is: what can we do to make your job easier as it pertains to SimplePie? I know that there are some cool things that we’ve talked about for SimplePie 2.0, but there are also cool things coming in SimplePie 1.2. I’ve been thinking about the kinds of things that would make things easier for me and how we could bundle them as on-demand “helpers” instead of necessarily building them into the core (like how our Internationalized Domain Name support is separate).

The first (very simple) thing that comes to mind is a function that shortens text (e.g. titles and descriptions). This is something that has been asked for hundreds of times and we’ve got some sample code in the wiki for it, but what if we could bundle helpers like this in an on-the-side fashion? What kinds of things would you like to see? What kinds of tasks do you find yourself doing over and over that the community might be able to benefit from?
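
To make the idea concrete, here is a minimal sketch of what such a text-shortening helper might look like. It is written in Python for brevity and is not SimplePie code: the name `shorten_text` and its parameters are hypothetical, and it assumes the input is plain text with no markup.

```python
def shorten_text(text: str, max_chars: int, suffix: str = "...") -> str:
    """Shorten plain text to at most max_chars characters.

    Hypothetical helper for illustration only. It cuts at a word
    boundary where possible and appends `suffix` when text was removed.
    """
    # Collapse runs of whitespace so the length check is predictable.
    text = " ".join(text.split())
    if len(text) <= max_chars:
        return text

    # Reserve room for the suffix inside the overall limit.
    budget = max_chars - len(suffix)
    if budget <= 0:
        return suffix[:max_chars]

    cut = text[:budget]
    # If the cut landed in the middle of a word, back up to the last space.
    if text[budget] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip() + suffix


print(shorten_text("The quick brown fox jumps", 15))  # prints "The quick..."
```

A helper like this only makes sense for titles or descriptions that have already been reduced to plain text; markup needs different handling.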

Perhaps people parsing thousands of feeds might like to have (or contribute) their parsing scripts and/or cron jobs? Perhaps people building feed aggregators would like to see improved HTTP status code messages to know whether they should update a feed URL in a database? Perhaps it’d be nice to have some software-specific helpers for Drupal or CodeIgniter?

Fire away! :)

Posted by Ryan Parman at 5:21 pm.

Comment by Michael Shipley 14 Mar 2008 at 9:13 pm 

Anything that gives the SP user more control and granularity over how SP does its thing is a very good idea if you ask me. One example is the “set_image_handler” function. I’d like to see that put in a separate class so I could have more control over it, because right now it seems to be an all-or-nothing proposition. All the images in an item are fetched and cached when you do a “get_content”, even if you end up not displaying most of the items. I’m actually forced to write my own custom image caching right now because of the lack of control. I can’t just override the image caching class because there is none.

So yes, I agree completely that modularization is the way to go. Let the user decide what pieces of the ‘Pie they want so they can mix and match, or at least control it better. I think SP should focus on being more of a framework that allows you to easily plug in existing best-of-breed modules, don’t you think? Why do everything from scratch when it’s already been done better ages ago? This is what open source is or should be all about, isn’t it? Take the best code from here, the best code from there? Like the emperor said: “Focus! It makes you strongga! Muahaha!” Do one thing and do it well, as they say. For instance, I prefer htmlpurifier to SP’s built-in HTML sanitizer. But the sanitize class tries to do so many things you can’t just override it easily as far as I can tell.

What is SimplePie all about? What is it best at? SP is best at making RSS feeds easy to get, parse, and display. Anything else, like caching files or sanitizing HTML, could be done by other modules that are already the best in their class. SP should just be the glue that binds them all together. IMHO.

Comment by Geoffrey Sneddon 15 Mar 2008 at 6:50 am 

I’ve been thinking about the kinds of things that would make things easier for me and how we could bundle them as on-demand “helpers” instead of necessarily building them into the core (like how our Internationalized Domain Name support is separate).

FWIW, there won’t be any need to keep the IDNA support separate in SP1.2, as the new IRI class (which provides all kinds of things for dealing with IRIs all over the place) will cover it.

The first (very simple) thing that comes to mind is a function that shortens text (e.g. titles and descriptions).

I’m sure you, Ryan, know my opinion of this well: the only way to properly shorten HTML is to use an HTML parser (as we end up needing all kinds of mad things to deal with fallback content within object elements, for example).
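
As a rough illustration of the parser-based approach, the sketch below truncates HTML by walking it with Python’s standard `html.parser` module, counting only text characters and closing any elements left open at the cut. It is not SimplePie code; `truncate_html` and `_Truncator` are hypothetical names, and the sketch deliberately ignores the harder cases mentioned above (fallback content in object elements, script and style contents), so treat it as a starting point under those assumptions.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    """Hypothetical helper: re-emits markup until the text budget runs out."""

    def __init__(self, max_chars: int) -> None:
        super().__init__(convert_charrefs=True)
        self.remaining = max_chars
        self.out = []          # pieces of the output document
        self.open_tags = []    # stack of elements still open
        self.done = False      # set once the text budget is exhausted

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close any unclosed inner elements first, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) > self.remaining:
            data = data[: self.remaining]
            self.done = True
        self.remaining -= len(data)
        self.out.append(escape(data, quote=False))


def truncate_html(html: str, max_chars: int) -> str:
    """Keep at most max_chars characters of text, leaving the markup balanced."""
    parser = _Truncator(max_chars)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing


print(truncate_html("<p>Hello <b>wonderful</b> world</p>", 9))
# prints "<p>Hello <b>won</b></p>"
```

Because the cut is made on parsed text rather than on the raw string, tags are never split in half and entities are re-escaped on output.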

Perhaps people parsing thousands of feeds might like to have (or contribute) their parsing scripts and/or cron jobs?

I still don’t really want to even suggest using SP for such things: it really isn’t suited to such huge numbers of feeds, due to the speed of PHP. Even something like Python (and Universal Feed Parser) is far better suited (IMO, by then it is quick enough for the network to be the bottleneck, unless you have an insanely quick connection).

Perhaps people building feed aggregators would like to see improved HTTP status code messages to know whether they should update a feed URL in a database

To fully do this, we also need to be able to give a new SimplePie::subscribe_url(), or to return null when we get 410 Gone. This is certainly something I want to see, but implementing this is far from simple: the current HTTP support can’t cope with it, and nor can most PHP implementations. I don’t really want to even attempt to redo the support until http-parsing has at least a -01 draft published. I don’t particularly think that any aggregator needs to actually know the specific status code. Of course, if anyone thinks otherwise, feel free to say.

Why do everything from scratch when it’s already been done better ages ago? This is what open source is or should be all about, isn’t it? Take the best code from here, the best code from there?

Part of the reason why so much code was written for SP itself was that although it is easy to find code to do such things, most of it really doesn’t work very well: for example, the only decent working HTML sanitiser was kses; HTTP tends to be very very badly implemented (SP’s is far from brilliant, too). The only other thing is the desire for SP to work without needing to have loads of things thrown together to get some basic parsing.

But the sanitize class tries to do so many things you can’t just override it easily as far as I can tell.

Splitting it up would be a huge effort, though. Part of me would much prefer to just wait until SP2 before having it split up… Though if someone else splits it up I won’t complain :P

Comment by Ryan Parman 15 Mar 2008 at 1:46 pm 

Great feedback so far, guys. We’re not making any definitive decisions here just yet, but we’re eager to hear back from more people about what, in your opinion, could be better or easier. Keep it coming! :)

Comment by Ryan McCue 16 Mar 2008 at 12:28 am 

I’d like to see some way of intercepting the DB cache writes, for example, to have a moderation queue of items from each feed (one of our requested features).

I don’t particularly think that any aggregator needs to actually know the specific status code.

I have to agree with Geoffrey on this one. I think that the aggregator should just check whether subscribe_url() equals the old one and, if not, cache the new one instead. They shouldn’t have to check status codes themselves, unless they want to.
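
The check described here could be sketched roughly as follows. This is a language-neutral illustration in Python, not SimplePie’s API: `reconcile_feed_url` is a hypothetical name, and the `None` case assumes the proposal made earlier in this thread that subscribe_url() would return null on 410 Gone, which is not current behaviour.

```python
from typing import Optional, Tuple


def reconcile_feed_url(stored_url: str,
                       subscribe_url: Optional[str]) -> Tuple[str, Optional[str]]:
    """Decide what an aggregator should do with its stored feed URL.

    Hypothetical helper. `subscribe_url` stands for the value the feed
    library reports after fetching; None models the proposed
    "410 Gone" signal from the discussion above (an assumption).
    """
    if subscribe_url is None:
        # The feed is gone for good: stop polling it.
        return ("remove", None)
    if subscribe_url != stored_url:
        # The feed has moved: remember the new address.
        return ("update", subscribe_url)
    return ("keep", stored_url)


# Usage with a plain dict standing in for the aggregator's database.
feeds = {"example": "http://example.com/old.xml"}
action, url = reconcile_feed_url(feeds["example"], "http://example.com/new.xml")
if action == "update":
    feeds["example"] = url
elif action == "remove":
    del feeds["example"]
print(action, feeds)
```

The aggregator never inspects a status code here; it only compares URLs, which is the point being made above.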

What is SimplePie all about? What is it best at? SP is best at making RSS feeds easy to get, parse, and display. Anything else, like caching files or sanitizing HTML, could be done by other modules that are already the best in their class. SP should just be the glue that binds them all together. IMHO.

Agreed also. As another example, the social sharing URLs should be in a module, as most people aren’t going to use them.
