Your own wikipedia….



I’ve made quite a bit of use of the wikipedia in recent years. I know it has its flaws (I’ve run across some first hand), but I’ve found typos in textbooks as well, and the flaws don’t keep it from being a very useful reference. In fact, in some of my browsing I’ve gone through the Spanish language version of the wikipedia, putting some of my Spanish reading skills to the test. Anyway, in the last couple of days I became curious, for various reasons, about actually downloading a copy and installing the wikipedia locally. Now, I know one of the benefits of the wikipedia is that it’s collaborative, and this way I’ll miss out on current and changing/improving/updating articles. But I can see some reasons to want to have a “snapshot”.


The first reason I can think of is that, of all our utilities, the cable internet seems to be the most fragile. Not that we’ve had major problems, but…. It would be nice to have that comprehensive a reference available “offline”. I feel like I go through Google withdrawal every time we have an outage of that type. An offline wikipedia might be enough to tide me over. The real dream might be to have it on my laptop, so that no matter where I am, wireless connection or no, I’ve still got a good reference available…. Anyway, it is possible.

Here’s what you need: a web server with PHP/MySQL support. (A basic LAMP setup (Linux/Apache/MySQL/PHP) should do the trick.) Next, install MediaWiki into a subfolder of the web root. (The install is fairly straightforward.) Next, it’s time to get some data. My first experiment was with the Spanish language wikipedia, since it has fewer articles than the English language version (quicker download…). The database dumps are found at http://download.wikimedia.org/wikipedia/es/ (substitute another language shortcut for es to find another language, like en for English). Anyway, you’ll see a lot of database dumps there. I went for the most recent dated “pages_articles” download. pages_articles has the basic articles, pages_current is the articles plus discussion, and pages_full adds the history of each page to that. For most, either pages_articles or pages_current will be your pick.
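The URL pattern above can be sketched as a small shell helper. This is only a sketch: dump_url is a hypothetical name, and it assumes the dump file sits directly under the language folder and is named date_kind.xml.bz2 (as in 20051211_pages_articles.xml.bz2). Check the directory listing for the actual file names before relying on it.

```shell
# Build the download URL for one dump file.
# Assumption: the file lives directly under the language folder and is
# named <date>_<kind>.xml.bz2; dump_url is a hypothetical helper name.
dump_url() {
    lang="$1"    # language shortcut, e.g. es or en
    stamp="$2"   # dump date as it appears in the file name, e.g. 20051211
    kind="$3"    # pages_articles, pages_current or pages_full
    printf 'http://download.wikimedia.org/wikipedia/%s/%s_%s.xml.bz2\n' \
        "$lang" "$stamp" "$kind"
}
```

Something like wget "$(dump_url es 20051211 pages_articles)" would then fetch the Spanish articles dump, and swapping es for en points at the English one.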

After downloading, do the following on the machine that you’re importing the data on (run it from the MediaWiki folder, since the path to the script is relative):

bunzip2 -dc 20051211*.xml.bz2 | php maintenance/importDump.php

(In my case the Spanish language download file was 20051211_pages_articles.xml.bz2; the importDump.php script imports the articles into the database.) It took longer, I think, to import than to download, but it seemed to work well.
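Since the import takes a while, it may be worth checking the download first. The sketch below is not part of MediaWiki: count_pages and check_dump are hypothetical helper names, and the page count assumes each article’s opening <page> tag sits on its own line in the dump.

```shell
# Count the <page> elements arriving on standard input.
# Assumption: each opening <page> tag is on its own line.
count_pages() {
    grep -c '<page>'
}

# Test the archive, then report roughly how many pages it holds.
# check_dump is a hypothetical helper, not a MediaWiki script.
check_dump() {
    dump="$1"
    bzip2 -t "$dump" || return 1        # -t checks integrity without extracting
    bunzip2 -dc "$dump" | count_pages   # same streaming decompress as the import
}
```

Running check_dump 20051211_pages_articles.xml.bz2 fails early on a truncated download, and otherwise gives a rough number to compare against the wiki once the import finishes.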

More information can be found here. And more information here.

Of course, there is another way to do it: a “static” html download of a “snapshot” of the wikipedia. Some of those are available here. It’s worth noting that the previous download link is for personal use only, not for redistribution of the static wikis.

This last option, the static html download, is probably the simplest way to have your own personal copy of the wikipedia for disconnected use. The English language version uncompresses to 29GB, though, from what they say, so don’t download it if you’re cramped for space. (The compressed download is between 1 and 2GB.) The only problem I see with the “static” wikipedia downloads (other than size) is that they’re a bit further out of date (currently 2 months). (Images for the English language wikipedia are 75GB, btw…)
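With sizes like that, a quick check of free space before unpacking can save some grief. A minimal sketch, assuming a df that supports the POSIX -P and -k options; free_kb and has_room are hypothetical helper names.

```shell
# Print the free space, in kilobytes, on the filesystem holding a folder.
# Assumption: df supports -P (one line per filesystem) and -k (1024-byte units).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'   # 4th column is "Available"
}

# Succeed if the folder (default: current one) has at least N gigabytes free.
has_room() {
    need_gb="$1"
    dir="${2:-.}"
    need_kb=$((need_gb * 1024 * 1024))
    [ "$(free_kb "$dir")" -ge "$need_kb" ]
}
```

For the English static download, has_room 29 . && echo "enough space" would cover the uncompressed size quoted above; the compressed file needs its 1 to 2GB on top of that while both are on disk.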

Maybe they’ll release the wikipedia on a dedicated external hard drive, a Blu-ray disc, or something similar for those who would like to hedge against connectivity outages.
