Wednesday, October 22, 2008

PHP Blog Aggregator

So here's a bit of what I'm working on for the backend of this year's Digital Arts BFA website.

It takes this XML (provided by my feed reader, NewsLife) and dumps all the feeds into one very large XML file.

<blogs>

<outline text="Nina Pavlich" title="Nina Pavlich" description="" type="RSS" version="RSS" htmlUrl="http://www.ninalp.com/bfarts" xmlUrl="http://www.ninalp.com/bfarts/rss/"/>

<outline text="Zach Rose" title="Zach Rose" description="" type="RSS" version="RSS" htmlUrl="http://zachrose.tumblr.com/" xmlUrl="http://zachrose.tumblr.com/rss"/>

<outline text="Dominic C" title="Dominic C" description="" type="Atom" version="Atom" htmlUrl="http://dom4art225.blogspot.com/" xmlUrl="http://dom4art225.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Joe" title="Joe" description="" type="Atom" version="Atom" htmlUrl="http://contempjoe.blogspot.com/" xmlUrl="http://contempjoe.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Nathan Emerson-Verhoeven" title="Nathan Emerson-Verhoeven" description="" type="Atom" version="Atom" htmlUrl="http://nevpdx.blogspot.com/" xmlUrl="http://nevpdx.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Sarah Moore" title="Sarah Moore" description="" type="Atom" version="Atom" htmlUrl="http://smoore5.blogspot.com/" xmlUrl="http://smoore5.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Bryson" title="Bryson" description="" type="Atom" version="Atom" htmlUrl="http://gazzookabazookaz.blogspot.com/" xmlUrl="http://gazzookabazookaz.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Daniel Strong" title="Daniel Strong" description="" type="Atom" version="Atom" htmlUrl="http://danielstrongdesign.blogspot.com/" xmlUrl="http://danielstrongdesign.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Mac" title="Mac" description="" type="Atom" version="Atom" htmlUrl="http://macschubert.blogspot.com/" xmlUrl="http://macschubert.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Travis" title="Travis" description="" type="Atom" version="Atom" htmlUrl="http://thelightisfading.blogspot.com/" xmlUrl="http://thelightisfading.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Shawna" title="Shawna" description="" type="Atom" version="Atom" htmlUrl="http://shawna-x.blogspot.com/" xmlUrl="http://shawna-x.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Lindsay AuCoin" title="Lindsay AuCoin" description="" type="Atom" version="Atom" htmlUrl="http://sheddingthequills.blogspot.com/" xmlUrl="http://sheddingthequills.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Peter Baston" title="Peter Baston BFA 08" description="" type="Atom" version="Atom" htmlUrl="http://bastonbfa08.blogspot.com/" xmlUrl="http://bastonbfa08.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Dustin Design" title="Dustin Design" description="" type="Atom" version="Atom" htmlUrl="http://dybevikda1.blogspot.com/" xmlUrl="http://dybevikda1.blogspot.com/feeds/posts/default?alt=rss"/>

<outline text="Andrew Parnell" xmlUrl="http://andrewparnell.blogspot.com/feeds/posts/default?alt=rss"/>

</blogs>


The code in question:

<?php

header ("content-type: text/xml");


$blogfeeds = array(); // one entry per <outline>, filled in by the loop below

$doc = new DOMDocument();

$doc->load( 'blogs.xml' );


$blogs = $doc->getElementsByTagName("outline");

$x=0;

foreach($blogs as $blog) {

$blogfeeds[$x]["path"] = $blog->getAttribute("xmlUrl");

$blogfeeds[$x]["title"] = $blog->getAttribute("text");

//array_push($blogfeeds[0], $blog->getAttribute("xmlUrl"));

//array_push($blogfeeds[1], $blog->getAttribute("text"));

$x++;

}


$out = new DOMDocument();

$out->preserveWhiteSpace = false;

$out->loadXML("<blogs/>");

for($i=0;$i<count($blogfeeds);$i++) {

$docBlog[$i] = new DOMDocument();

$docBlog[$i]->preserveWhiteSpace = false;




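// Blogger limits feeds to the 25 latest entries by default: probe with max-results=0 to read the total, then ask for all of them.

if(strpos($blogfeeds[$i]["path"],"blogspot.com")!==false)

{

$tot = new DOMDocument();

$tot->preserveWhiteSpace = false;

$total = 0; // reset each time so a failed probe never reuses the previous blog's count

if($tot->load($blogfeeds[$i]["path"].'&max-results=0')) {

foreach($tot->getElementsByTagNameNS('http://a9.com/-/spec/opensearch/1.1/', 'totalResults') as $openSearch_totalResults) $total = (int)$openSearch_totalResults->nodeValue;

}

if($total > 0) $blogfeeds[$i]["path"] .= "&max-results=".$total;


}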


$docBlog[$i]->load($blogfeeds[$i]["path"]);

$docBlog[$i]->formatOutput = true;



foreach($docBlog[$i]->getElementsByTagName("channel") as $chan) {

//$chan->setAttribute("auth",$blogfeeds[$i]);

$in = $out->importNode($chan, true);

$in->setAttribute("auth",$blogfeeds[$i]["title"]);

$in->setAttribute("path",$blogfeeds[$i]["path"]);

$out->documentElement->appendChild($in);

}



//echo $docBlog[$i]->saveXML();

}


$out->formatOutput = true;

echo $out->saveXML();

?>




Notes:
Nina's feed seems to be home-brewed, so everything she has is there. AWESOME!
Blogger feeds are by default limited to only the 25 latest entries. I get around that by first requesting them with max-results=0, which returns no entries but still includes the openSearch:totalResults node, and then reopening them with the query string max-results=[that total] appended.
Tumblr seems to also truncate its feed, but I can't find any way around that (the feed also doesn't make any mention of this truncation).
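
The Blogger workaround in the notes above can be sketched as a small standalone function. This is only an illustration under some assumptions: the name fullFeedUrl, the sample probe XML, and the example.blogspot.com address are made up here, and the probe response is passed in as a string so the snippet runs without touching the network. It also assumes, as in the script above, that the feed URL already carries a query string (?alt=rss), so the extra parameter is joined with "&".

```php
<?php
// Hypothetical helper, not part of the original script.
// $feedUrl  : a Blogger feed URL that already has a query string (e.g. ?alt=rss)
// $probeXml : the body returned by requesting $feedUrl . '&max-results=0'
function fullFeedUrl(string $feedUrl, string $probeXml): string
{
    $probe = new DOMDocument();

    // An empty or unparseable probe means we cannot know the total:
    // fall back to the default (truncated) feed URL.
    if ($probeXml === '' || !@$probe->loadXML($probeXml)) {
        return $feedUrl;
    }

    // openSearch:totalResults lives in the OpenSearch 1.1 namespace.
    $nodes = $probe->getElementsByTagNameNS(
        'http://a9.com/-/spec/opensearch/1.1/',
        'totalResults'
    );
    if ($nodes->length === 0) {
        return $feedUrl;
    }

    $total = (int) $nodes->item(0)->nodeValue;
    if ($total <= 0) {
        return $feedUrl;
    }

    return $feedUrl . '&max-results=' . $total;
}

// Made-up stand-in for what a max-results=0 request might return.
$sampleProbe = <<<XML
<?xml version="1.0"?>
<rss version="2.0" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example</title>
    <openSearch:totalResults>137</openSearch:totalResults>
  </channel>
</rss>
XML;

echo fullFeedUrl('http://example.blogspot.com/feeds/posts/default?alt=rss', $sampleProbe), "\n";
// prints http://example.blogspot.com/feeds/posts/default?alt=rss&max-results=137
```

Falling back to the untouched URL when the total is missing means a broken probe costs at most the older posts for that one blog, rather than carrying over a count from a different feed.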

Still a work in progress.
