Dear Lazyweb: scrape video from IO9?

Dear Lazyweb: The folks at IO9 have some great videos, but they're not presently embeddable -- while they're waiting for the next rev of their player, I need a way to embed them here on BB. I'm looking for a simple way of scraping and reposting (or simply embedding) the videos they post there. Cross-platform is great -- but I need Ubuntu-friendly tools if I'm gonna be able to use it. Kibitz in the comments!

Discussion

Take a look at this

I dunno if this will work for you, but I was able to grab it to my desktop using Safari 3.0.4 in Mac OS X 10.4.11.

Open the IO9 link in a Safari window. Command + Option + A opens the Activity window. View these two windows side-by-side.

When you hit the play button back in your main browser window, you'll see a file in the Activity window whose size keeps growing as the .flv video streams and loads into your main window. The total size will be about 17.6MB.

In the activity window, double click on the name of this file: "http://io9.com/5013523/when-humans-punch-aliens-the-video-remix"

Safari will open a new window, and that file will appear there as raw text/code. Go to Safari's file menu and choose "Save as..." Ditch the ".txt" file extension so that you're saving the file as "knockuuout.flv."

With the free Perian codec/plugin for QuickTime, it'll load and play in a QT player. You can get that here: http://perian.org/

I'm playing it right now from my desktop as I compose this message.

Take a look at this

My apologies; I can't edit my first message.

The file name to look for in Safari's Activity window was supposed to paste into my message as:

"http://cache.io9.com/assets/video/knockuuout.flv"

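Once the .flv URL is known, as it is here, the Ubuntu-friendly way to grab it is plain wget. A minimal bash sketch; flv_filename and fetch_flv are hypothetical helper names, not part of any tool mentioned in the thread:

```shell
#!/usr/bin/env bash
# Sketch: download an .flv whose URL is already known (for example, copied
# from Safari's Activity window). flv_filename and fetch_flv are
# hypothetical helpers made up for this example.

# Print the file name at the end of a URL, ignoring any query string.
flv_filename() {
  local url="${1%%[?]*}"     # drop "?..." if present
  printf '%s\n' "${url##*/}" # keep what follows the last slash
}

# Save the URL in the current directory under that file name.
fetch_flv() {
  wget -O "$(flv_filename "$1")" "$1"
}
```

Running `fetch_flv http://cache.io9.com/assets/video/knockuuout.flv` should leave knockuuout.flv in the current directory, ready for mplayer or totem.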
Take a look at this

Haven't tried this, but using Firefox and the 'unplug' add-on/extension may work.

Take a look at this

Most Flash video players end up dumping the .flv into a temp file (/tmp/FlashXXXXXX, at least for YouTube vids; that might be a Firefox or Flash naming convention) and just play that.

Once the video's completely downloaded, you can just grab that flv and play it in mplayer, totem, whatever, and transcode it as easily. Then you can upload it to another, more accessible service, probably lossily.
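Grabbing that temp file can be scripted. A bash sketch, assuming the player really does buffer to files named Flash* in /tmp as described here (a plugin detail, not a guarantee); latest_flash_tmp and save_flash_tmp are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch, assuming the Flash player buffers to /tmp/Flash* files.
# latest_flash_tmp and save_flash_tmp are hypothetical helper names.

# Print the most recently modified Flash* file in a directory (default /tmp).
latest_flash_tmp() {
  local dir="${1:-/tmp}"
  ls -t "$dir"/Flash* 2>/dev/null | head -n 1   # newest first, keep one
}

# Copy the newest Flash* temp file to a named .flv once playback has finished.
save_flash_tmp() {
  local src dest="${2:-video.flv}"
  src="$(latest_flash_tmp "$1")"
  if [ -z "$src" ]; then
    echo "no Flash* temp file found" >&2
    return 1
  fi
  cp "$src" "$dest"
}
```

After `save_flash_tmp /tmp knockuuout.flv`, something like `ffmpeg -i knockuuout.flv knockuuout.avi` can transcode the copy, with some quality loss.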

Reposting is likely illegal unless the video's license explicitly allows it.

Take a look at this

Hmm..

I tried looking at the page's source, and found the still image in the article referenced from "http://cache.gawker.com/assets/stills/knockuuout.flv.jpg"

So then I did a Google search (using the site:/inurl: modifiers) for the Gawker video directory.

That turned up: "http://cache.gawker.com/assets/util/videoModule.swf?videoURL=blahblahblah.flv"

And then inserted the original video reference name to give: "http://cache.gawker.com/assets/util/videoModule.swf?videoURL=knockuuout.flv"
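The string juggling in those steps (still image name to .flv name to player URL) is easy to script. A bash sketch; the cache.gawker.com paths are only the ones seen here, other Gawker videos may not follow them, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of the URL juggling described above. The cache.gawker.com paths
# come from this one example; still_to_flv_name and gawker_player_url are
# hypothetical helper names.

# ".../assets/stills/knockuuout.flv.jpg" -> "knockuuout.flv"
still_to_flv_name() {
  local name="${1##*/}"          # strip everything up to the last slash
  printf '%s\n' "${name%.jpg}"   # strip the trailing .jpg
}

# Wrap a bare .flv name in Gawker's own player, as in the URL above.
gawker_player_url() {
  printf 'http://cache.gawker.com/assets/util/videoModule.swf?videoURL=%s\n' "$1"
}
```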
__

Weirdly, when I tried Phos' method it worked too.. how does that work, with two different sources?

Take a look at this
#6 posted by Kent , June 9, 2008 10:52 AM

Maybe VodPod.com could do it?

Take a look at this

While I realize the swf references the flv (which could explain the two different URLs), when I actually watch both videos in the Activity window, the flvs are downloading from those two different URLs. If the flv were only referenced, I would expect it to resolve to the one actual source.

Take a look at this

I just tried the "unplug" add-on/extension/whateverthehellthey'recallingthemnow thingy with Firefox for a few minutes, and it doesn't seem to find the actual .flv file, just HTML containers and 4k pointers to the actual media.

Either way, thanks for the heads up on that, Lennox.

Take a look at this

Yeah, it definitely has two URLs. This is the resolution of the swf reference file I mentioned above:

http://cache.gawker.com/assets/video/knockuuout.flv

And Phos'

http://cache.io9.com/assets/video/knockuuout.flv

..take your pick Cory :)
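With two hosts serving the same file, a script can try both and keep whichever answers. A bash sketch; the host list is only what was observed in this thread, and candidate_urls and first_live_url are hypothetical names. `wget --spider` checks that a URL exists without downloading it:

```shell
#!/usr/bin/env bash
# Sketch: list both candidate URLs for a bare .flv name and keep the first
# one the server answers. Hosts are the two seen in this thread.

candidate_urls() {
  local host
  for host in cache.io9.com cache.gawker.com; do
    printf 'http://%s/assets/video/%s\n' "$host" "$1"
  done
}

# wget --spider only asks whether the URL exists; it downloads nothing.
first_live_url() {
  local url
  for url in $(candidate_urls "$1"); do
    if wget -q --spider "$url"; then
      printf '%s\n' "$url"
      return 0
    fi
  done
  return 1
}
```

`first_live_url knockuuout.flv` prints the first URL that responds, or returns non-zero if neither does.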

Take a look at this

Here's a nice Firefox add-on that's handy anyhow. It will easily let you get the full FLV URL:
https://addons.mozilla.org/en-US/firefox/addon/966

Take a look at this

On a moral note.. Is embedding the same as hot-linking, if you aren't given express permission (eg. an "embed" link on the page) to do it?

I have no clue, but it sounds the same.

Take a look at this

@ #4: Then you can upload it to another, more accessible service, probably lossily.

How do you figure lossless? Every free service I can think of (YouTube, Blip, Vimeo, Revver) automatically transcodes uploaded files. Transcode = compression loss. Still, this method will definitely work, it's just clumsy.

Take a look at this

We have developed a content syndication tool that will effortlessly allow the sharing/embedding of these videos.

If anyone is interested, please contact me at:

a s t a n h o p e AT g m a i l dot c o m

Take a look at this
#15 posted by Takuan , June 9, 2008 11:50 AM

does that constitute "guerrilla marketing"?

Take a look at this

@#13
Had he meant lossless, he'd likely have said losslessly instead of lossily. ;)

Take a look at this

If you know the URL of the .flv file (which seems to be the case here), you can embed it using FreeVideoCoding's FLV player, which is just a SWF file that can be fed the FLV's address right in the URL.

For example, to play

http://airshowfan.com/fencecheck/RedBullAirRaces.flv

you embed

http://freevideocoding.com/flvplayer.swf?file=http://airshowfan.com/fencecheck/RedBullAirRaces.flv&autoStart=false

such as by writing

<embed src="http://freevideocoding.com/flvplayer.swf"
width="640"
height="500"
flashvars="file=http://airshowfan.com/fencecheck/RedBullAirRaces.flv&autoStart=false"/>

Of course, you can use whichever FLV player you like:

<embed src="http://www.jeroenwijering.com/embed/mediaplayer.swf"
width="640"
height="500"
flashvars="file=http://airshowfan.com/fencecheck/RedBullAirRaces.flv&autoStart=false"/>

which embeds

http://www.jeroenwijering.com/embed/mediaplayer.swf?file=http://airshowfan.com/fencecheck/RedBullAirRaces.flv&autoStart=false
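Both examples follow one pattern: the player's .swf URL, then file= and the FLV's URL, then &autoStart=false. A bash sketch that generates the direct URL and the embed markup for any player/FLV pair; player_url and embed_tag are hypothetical names, and the file/autoStart parameters are simply the ones these two players accepted here:

```shell
#!/usr/bin/env bash
# Sketch: build "player.swf?file=..." URLs and matching <embed> markup.
# player_url and embed_tag are hypothetical helpers; other players may use
# different parameter names.

# The FLV URL is pasted in unencoded, as above; that breaks if the FLV URL
# itself contains "&", because the player would split the value there.
player_url() {
  printf '%s?file=%s&autoStart=false\n' "$1" "$2"
}

# Same thing as an <embed> tag. "&amp;" is the properly escaped HTML form of
# "&"; the browser decodes it before passing flashvars to the player.
embed_tag() {
  local width="${3:-640}" height="${4:-500}"
  printf '<embed src="%s" width="%s" height="%s" flashvars="file=%s&amp;autoStart=false"/>\n' \
    "$1" "$width" "$height" "$2"
}
```

For the io9 clip, `embed_tag http://freevideocoding.com/flvplayer.swf http://cache.io9.com/assets/video/knockuuout.flv` prints a tag ready to paste into a post.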

And yes, this is the same as hotlinking. By that I mean: if you don't have permission (or if the video is not Creative Commons, etc.), it is evil and of dubious legality.

Downloading the FLV, uploading it to a site like YouTube, and embedding THAT, is not any less evil or illegal. I say, if you've decided you're going to embed the video, at least embed the original (hotlink) so that the creator/host can see where his viewers are, rather than having multiple untrackable copies floating around.

Take a look at this

@16
oops. You are wise and your reading comprehension kung fu is strong. Though you have to admit, "lossily" is a silly word. Thanks for correcting me niceily.

Take a look at this

#13
He doesn't even need to reupload/encode if it's just embedding he wants to do, and hot-linking isn't an issue (like with youtube).

Take a look at this

Apparently the start of my HTML code got auto-baleeted. There should be a [smaller-than sign]EMBED[space] before each of the two SRC=...

Take a look at this

Let me try my own idea with the URL you guys have:

With jeroenwijering.com's player:

http://www.jeroenwijering.com/embed/mediaplayer.swf?file=http://cache.io9.com/assets/video/knockuuout.flv&autoStart=false

<embed
src="http://www.jeroenwijering.com/embed/mediaplayer.swf"
width="640"
height="500"
flashvars="file=http://cache.io9.com/assets/video/knockuuout.flv&autoStart=false"/>

With freevideocoding.com's:

http://freevideocoding.com/flvplayer.swf?file=http://cache.io9.com/assets/video/knockuuout.flv&autoStart=false

<embed
src="http://freevideocoding.com/flvplayer.swf"
width="640"
height="500"
flashvars="file=http://cache.io9.com/assets/video/knockuuout.flv&autoStart=false"/>

Take a look at this

Messed up that last one, pasted the flashvars line twice and... Ah, never mind, you get the idea.

Take a look at this

@ airshowfan:

Let's see if the character entity '&lt;' will work:

<embed src="http://freevideocoding.com/flvplayer.swf"
width="640"
height="500"
flashvars="file=http://airshowfan.com/fencecheck/RedBullAirRaces.flv&autoStart=false"/>

<embed src="http://www.jeroenwijering.com/embed/mediaplayer.swf"
width="640"
height="500"
flashvars="file=http://airshowfan.com/fencecheck/RedBullAirRaces.flv&autoStart=false"/>

Magic PreviewBall says "YES!"

:o)

Take a look at this

Aw, crap.

Pre-submission Preview showed the code perfectly.

Actually posting it killed the character-entity coding.

Sorry for the mess.
I loathe inconsistencies like that.

Take a look at this

You guys are silly, with your third-party extensions. If you are going to suggest the use of any extension, it might as well be the DOM Inspector.

But fuck that noise.

Highlight the video in the standard way by click-and-drag. Start selecting in the video description text block at "Sure, we come in peace. . ." and move backwards to highlight the video. To make sure the video is highlighted, you can include some of the text in the heading ("When Humans Punch Aliens: The Video Remix") in the selection. Bring up the context menu (usually right click) and select "View selection source." Then, you can just copy the embed element straight from the source. Make sure you modify the src param to include the io9.com domain.

That should give you:

<embed type="application/x-shockwave-flash" src="http://io9.com/assets/util/videoModule.swf" style="" id="videoPlayer_knockuuout" name="videoPlayer_knockuuout" bgcolor="#000000" quality="best" scale="noscale" salign="tl" allowscriptaccess="always" flashvars="videoURL=knockuuout.flv&amp;permalink=undefined&autoplay=undefined&stageWidth=506&stageHeight=423&waterMarkImageURL=" height="423" width="506">

Sorry if that breaks H-scroll for anyone.

Aside: "[Y]ou may use HTML tags for style" is incredibly misleading. And why does the preview modify the contents of the comment in the textarea? That breaks things, e.g., if you use HTML entities, like &gt; to properly escape your HTML, the comment preview evaluates those strings and replaces those with their respective representations—as it should—but also modifies the contents of the textarea to use those evaluated entities, e.g., "&gt;" is evaluated and replaced with a literal ">". So if you preview like any responsible commenter, and then POST without rechecking the contents of the textarea, what you post doesn't match your preview.
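The copy-and-fix-src step can also be done on the command line. A bash sketch, assuming the input file already contains the markup (for example, the "View selection source" output pasted into a file), the tag sits on one line, and its src is the site-relative "/assets/..." path that needs the io9.com domain added. A copy fetched with wget may not contain the tag at all if the page writes it from JavaScript, since "View selection source" shows the page after scripts have run. extract_embed is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: pull the videoModule.swf <embed> tag out of a file of HTML and make
# its src absolute. Assumes a one-line tag with a site-relative src.

extract_embed() {
  # -o prints only the matching part; [^>]* keeps the match inside one tag
  grep -o '<embed[^>]*videoModule\.swf[^>]*>' "$1" \
    | sed 's|src="/assets/|src="http://io9.com/assets/|'
}
```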

Take a look at this

Find the best screencast software for Ubuntu (I use a Mac and use iShowU for the most part), set the capture area to fit over the video, and hit record.

As others have said, this means transcoding and some loss of quality, but it is quick and dirty. From there, drop it into whatever you want... I've had to use this software to get movies onto my phone when other 'fair use' options didn't allow for it, or to get the crap up for a class (I had to do that often enough that I ended up working with the techies at my university to study ways of transmitting video: http://www.iupui.edu/~nmstream/ ...I should write a tutorial for educators on how to rip this stuff using various OSes).

But screencasting and reposting on your own server (or YouTube, if they don't get all pissy about it) is by far the easiest and least technical way.
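On Ubuntu the screencast route can be scripted with ffmpeg's X11 screen capture. A sketch, assuming an ffmpeg build that includes the x11grab input; the option names are those of current ffmpeg releases and may differ on older ones, and the helper names are hypothetical. It grabs picture only; sound would need a separate audio input:

```shell
#!/usr/bin/env bash
# Sketch: record a screen region with ffmpeg's x11grab input.
# grab_source and record_region are hypothetical helper names.

# "display+X,Y" string that x11grab takes as its input, e.g. ":0.0+100,200".
grab_source() {
  printf '%s+%s,%s\n' "${DISPLAY:-:0.0}" "$1" "$2"
}

# Record a WxH region whose top-left corner is at X,Y; press q to stop.
record_region() {
  local w="$1" h="$2" x="$3" y="$4" out="${5:-capture.mkv}"
  ffmpeg -video_size "${w}x${h}" -framerate 25 -f x11grab \
    -i "$(grab_source "$x" "$y")" "$out"
}
```

For the 506x423 io9 player, something like `record_region 506 423 X Y knockuuout.mkv` with the player's on-screen position filled in for X and Y.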

Take a look at this

Hoo-ray, Phos noticed it as well.

Seriously though, why is everyone going through so much effort to do this? Downloading extensions? Capturing the URL? Then rewriting embed elements with the src pointing to that URL? Just copy it from the source.

Also, I should have added a disclaimer that the previous method only works in Firefox, which I'm guessing Cory is using ("on Ubuntu"). Safari (for those on Mac and Windows machines), Opera, Konqueror, etc. may or may not have similar features accessible through different means.

Take a look at this

#26
Noooooooooo!!!!

dude, that is a serious ghetto-rip.

Just d/l if you want to keep it, or hot-link/embed if you just want to show it.

Take a look at this

Dang it, how did Throne347 get the smaller-than sign? I even tried "ampersand LT;" and "ampersand number sixty;"...

And while I'm here, I should also mention that you can download those swf-based flv players (or make your own) and/or the flv files themselves, put them on your server, and thus embed the video in a self-contained way that does not rely on other people's servers playing nice.

(I tend to like downloading the FLV and putting it on my server, but embedding the SWF-based player right off of some other server. Hmmm, I may have just confessed to copyright infringement...)

And now that I think about it, maybe the question to start with was what tool to use to FIND the url of the flv file, in the cases when it's neither clear from looking at the html code nor formulaic from the url of the page where the video is shown. I must admit I only know of Windows solutions for the former case, and for the latter I use web-based solutions like KeepVid...
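When the URL isn't obvious, a crude first pass on Ubuntu is to grep the page for anything ending in .flv. A bash sketch, assuming the name appears literally in the HTML; players that assemble the URL in JavaScript or load it from a separate playlist file will slip past it. find_flv_refs is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: list every distinct string in an HTML file that ends in ".flv".
# Stops at quotes, whitespace, "=", "&" and "?" so each hit is one token.

find_flv_refs() {
  # -o: print each match on its own line; -E: extended regex
  grep -oE '[^"'\''[:space:]=&?]+\.flv' "$1" | sort -u
}
```

Bare names like knockuuout.flv still need a directory prefix, which is where the formulaic part comes back in.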

Take a look at this

@ arkizzle

"Dude, that is a serious ghetto-rip."

You are right...but quite a few systems don't allow for hotlinking or embedding, which is the problem here. Most of the time, if I'm putting this on my own blog, I do a screen capture of the video player and make it a fake-out link to the original, so the reader thinks they are actually going to see it embedded.

But yeah, it is a ghetto-rip...and it works :-)

Take a look at this
#31 posted by arkizzle , June 9, 2008 1:13 PM

Shit, I didn't mean to sound "above" ghetto tactics, I've been there a-plenty :)

I just meant, in this case, there are flvs and swf links found, so no need to go the extra mile. But when there's no other option.. do it!

Take a look at this
#32 posted by Phos...., June 9, 2008 1:27 PM

@airshowfan in #29:

"And now that I think about it, maybe the question to start with was what tool to use to FIND the url of the flv file, in the cases when it's neither clear from looking at the html code nor formulaic from the url of the page where the video is shown."
A'yup, there's the rub. Before I learned the Safari/Activity Window trick a couple weeks ago, the trick was finding the actual file name, especially if it's obfuscated, either intentionally, or because of CMS renaming to something that is humanly-inscrutable.

Since I learned the Safari/Perian trick, I've been a .flv downloadin' foo'!

Take a look at this
#33 posted by chip , June 9, 2008 1:56 PM

Whenever I need to embed an unfriendly video in my tumblog, I grab it with the Firefox extension Video DownloadHelper: http://www.downloadhelper.net Then I upload it to an embeddable video sharing site that doesn't pay close attention to things like copyright (using livevideo.com at the moment). It's a multistep process, but it has yet to fail me.

This is probably better than hotlinking directly to the video, which will probably be pretty obvious to any observant site admin, and may earn you some bad mojo.

Take a look at this
#34 posted by mlennox, June 9, 2008 3:55 PM

I'm thinking some *nix guru with sed and/or awk skills could whip together a script to parse the HTML from the page and find the EMBED tag (with attributes 'type="application/x-shockwave-flash"' and 'src="/assets/util/videoModule.swf"'); the kernel of what we need is then in the 'flashvars' attribute of the same tag.

We are looking for the 'videoURL' key in the list of name-value pairs - grab the value and paste it into the string below and pipe/append it to a file:

http://cache.gawker.com/assets/video/{filename}

and Bob's your uncle

Anyone with the requisite skills?
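The parsing step described above can be done with sed alone. A minimal bash sketch, assuming the flashvars value has already been copied out of the embed tag, and that the videoURL value is a bare file name living under the cache.gawker.com video directory mentioned above; video_url_from_flashvars is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: pull the videoURL value out of a flashvars string and prefix it
# with the cache.gawker.com video directory seen in this thread.

video_url_from_flashvars() {
  local name
  # keep only the value after "videoURL=", up to the next & or quote
  name="$(printf '%s\n' "$1" | sed -n 's/.*videoURL=\([^&"]*\).*/\1/p')"
  [ -n "$name" ] || return 1
  printf 'http://cache.gawker.com/assets/video/%s\n' "$name"
}
```

Fed the flashvars from the io9 embed tag, it prints http://cache.gawker.com/assets/video/knockuuout.flv, which can go straight to `wget`.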

Take a look at this
#35 posted by pjcamp , June 9, 2008 7:11 PM

Use one of the many Firefox plugins designed for this exact purpose. Download Embedded, for example.

Take a look at this

Does anyone know who invented the 'lazyweb'? Was it Jamie Zawinski? He certainly seems to use it frequently.

Take a look at this
#37 posted by mlennox, June 10, 2008 3:02 AM

Cory, I know you use Linux, so:

Cut and paste the following into a .sh file:

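#!/bin/sh
# Save the io9 article page to a temp file
page=$(mktemp) || exit 1
list=$(mktemp) || exit 1
wget -q -O "$page" "$1"

# Turn newVideoPlayer("something.flv", ...) into a full cache.gawker.com URL
sed -n 's|.*newVideoPlayer("\([^"]*\.flv\)".*|http://cache.gawker.com/assets/video/\1|p' "$page" > "$list"

# Download every URL found, then clean up
wget -i "$list"
rm -f "$page" "$list"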


Then save the file and, from the command line, run:

chmod +x yourfilename.sh

Now to grab a video, all you have to do is:

./yourfilename.sh http://io9URLtovideopage

This will download the flv file to your current directory.

It's not the best solution - but it works!

Take a look at this

Doesn't this mean io9 will be paying for the bandwidth without getting the ad click-throughs that probably pay for their site?

I thought this sort of thing was kind of uncool, but BoingBoing isn't known for uncoolness (more like the opposite), so... what am I missing?
