A Look at Online Information Databases
icekin — Wed, 2006/03/08 - 07:42
For the purpose of this document, a database refers to a site that contains a large collection of data in some form, like text, images, audio files etc. Nearly all databases are accessed through the WWW. Databases differ from a regular website in that they hold a large volume of information on their subject. Most databases, excepting Media Sites and Online/Digital Libraries, are run by people or organizations who are authorities on the subject. Even information from libraries generally bears a good level of reliability.
Contents
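1) Online Libraries
2) Digital Libraries
3) Ebook Repositories
4) University Resources
5) Professional Organizations, Companies & Interest Groups
6) Media Sites
a) Drawbacks of Media for information
i) Audience Bias
ii) Political Bias
iii) Exaggeration
b) Independent Media & News Sites
c) Open Content News Sites
7) Summary
8) Bibliography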
Databases can be further classified as:
1) Online Libraries
Most online libraries are in fact brick and mortar libraries that maintain a website which is often linked to their Online Public Access Catalog (OPAC). This means that it becomes possible to search through all the library's books online. Some libraries have also made it possible to view the contents of a book online, if the book is out of copyright. If the book is an E-book, it is usually possible to download a copy for later use. It costs time and money to create and maintain a database of all the books and other content in a library. Thus, only libraries with good finances and technical know-how have been able to do this so far. There are probably several thousand small street side, university and county level libraries that have out of print books and other rare data which will never be digitized, let alone be put on the WWW. Efforts like the Million Book Project and Google Print are thus to be commended. Hopefully more such projects will be started in other parts of the world, either through governments, companies or community efforts. With cheap hardware and free library management solutions now available, this is quite an achievable goal.
Several countries have at least one of these. Most national libraries have a website which lists the items in the library's collection. If an item is in the public domain, it can be viewed online if the library has digitized it. When looking for detailed historical data about a country, this is the best place to head to. Old archives of national records and documents are usually maintained at these libraries. Several libraries are involved in efforts to make old, out of copyright (and out of print) data available for online viewing in the public domain. An example is American Memory by the Library of Congress (LOC). The LOC also showcases certain events through the exhibition sections of its site. Some information found at a national library, such as images and original document scans, often can't be found anywhere else on the net.
University libraries come after national and public libraries in volume of data, but are still significant. Academic journals and whitepapers are best found here, especially if the author was from that university. Not all academic research papers succeed in being accepted into national and public libraries. E-Books and audiobooks may also be available for download. More university libraries can be found by following links from the official university website.
Hosted at the University of Michigan, it has a good number of links to other sources in addition to containing information of its own.
2) Digital Libraries
These are libraries that are entirely digital. Many such libraries exist only on the net and are not actual buildings. Several brick and mortar libraries have however started digital library projects (e.g. National Digital Library) along with the online library system explained in the previous section. Also see list of digital libraries and list of open content projects.
Run together with UNC-Chapel Hill, this is one of the best digital libraries online. Information is available on several research topics, as text, images, audio files and other formats. Quite a lot of the data is public domain, so it's free for use in other places. Anyone who has public domain or openly licensed data (e.g. GNU FDL or CC licensed) can send in the data and it will be reviewed and added to the digital library.
Contains archives of several years of BBS posts from the pre-90s era. Interesting information one can find includes several guides on hacking, cracking, phreaking, anarchy etc. Totse also contains a very detailed archive of BBS posts on anarchy activities and hacking. Some folks have told me that Totse has archived more BBS messages than Textfiles.
A database of purely FAQs and RFCs on Usenet, mainly related to technology, is maintained at the Internet FAQ archives. FAQs and RFCs are ranked according to popularity and usage. The Etext Archives is a similar project that attempts to archive Usenet posts of lesser known groups on topics like activism, politics and anarchy. However, with Google Groups acquiring Deja News, there is a freely accessible archive of all Usenet posts since 1981.
Ingenta is a digital library specializing only in academic publications. It catalogs over 100,000 academic and professional research papers, whitepapers and other publications. Some of the data requires a paid subscription for viewing, but a large amount of it is also available for free viewing and downloading as .pdf. Another similar site indexing academic and scientific journals is the Directory of Open Access Journals.
The main objective is to create and maintain content related to the history of the Hartford African-American community. There is also a good world history section with over 9000 documents.
3) Ebook Repositories
E-Books are books that are distributed in electronic formats like .txt, .html, .xml or other formats. They can be read with the appropriate software, usually free. Several sites have attempted to make e-books freely available for download and use. Many digital libraries also hold e-books for free viewing and download.
The oldest project to make out of copyright works available in electronic format. While most of the content here is fiction (and good fiction), there are also a number of non-fiction works which I have found. There is also a Project Gutenberg Australia, which offers some other books that are still copyrighted in the US but in the public domain in Australia. Due to the open nature of the net though, anyone is able to access all the information. The works of George Orwell are one example of US copyrighted data that can be obtained from the Australian server for free. It would however be illegal to use the data if you were a resident of the US. You can assist Project Gutenberg by proofreading texts before they are added to the database. The University of Pennsylvania Online Books Page contains some books in common with Project Gutenberg, but also some other works.
National Academies Press (NAP)
From the NAP web site : "NAP was created to publish reports issued by the National Academy of Sciences, the National Academy of Engineering, the Institute of Medicine, and the National Research Council. NAP publishes more than 200 books a year on a wide range of topics in science, engineering, and health, capturing the most authoritative views on important issues in science and health policy. The institutions represented by the NAP are unique in that they attract the nation's leading experts in every field to serve on their award-winning panels and committees. This is the right place for definitive information on everything from space science to animal nutrition."
Mainly a site that attempts to catalog and review free books. Reviews are also written by visitors and everyone is encouraged to review a book once they have read it. Links are provided to the books. Most books are either in .html or .pdf formats. The index is also browsable by subject and category.
Alex Catalogue of Electronic Texts
Documents on American and English literature and philosophy. All data is in the public domain. Some of the data is in common with Project Gutenberg. All files are downloadable as text and pdf.
An independent effort that has several books in common with Project Gutenberg, but also some others. Books are viewable on PDAs and iPods as well. All books are free.
More of a project to find out if the availability of free e-books encourages sales of paper books. Around 70 free e-books, mainly in the science fiction category, are available.
University of Virginia Etext Center Free E-books
The Electronic Text Center has a goal to create a large repository of freely accessible text and images. The ebooks require a program that can read the Palm Reader format.
Some free e-books available here. All free books can be read online and searched, but there seems to be no way of downloading for personal use.
Also see end of this article for some sites that distribute free E-books. The Internet Archive also maintains an Open Library. TeleRead is a project to bring e-book access to the needy. They have published a short guide to e-books, which is worth reading.
4) University Resources
In addition to a university library, one can find large amounts of academic material in the form of lecture notes, quizzes and assignments at various course websites. Several major courses taught at universities in North America and Western Europe have course websites. In several cases, the course website displays content in an unrestricted manner. Some universities use Academic Learning Portals like Blackboard, WebCT and Moodle to distribute notes, hence restricting access to the notes to only those with a valid course login. On the other hand, some universities have even gone as far as openly licensing the academic material for distribution. Massachusetts Institute of Technology was among the first to do this with the OpenCourseWare program. Links to other similar projects from other academic institutions can be found at the bottom of this page. If there is no clear Open Notes program like MIT's, you can try going to the specific department's course listing page. There will usually be a link to the sites of all the classes being taught that semester. Another way is to scour the personal sites of professors in the department. Many of them post notes, past papers and other information on their sites.
One can also find academic articles through a specialized search engine like a scientific journal search engine. Good examples include CiteSeer and ArXiv. CiteULike, Connotea and Unalog, though technically social bookmarking services, are worth mentioning since their speciality is academic articles and resources. Certain private sites like Sparknotes and BookRags also provide notes and academic material, mainly related to literature.
5) Professional Organizations, Companies & Interest Groups
Several organizations and groups, either profit oriented or otherwise, form teams of specialists to make databases on various subjects. In some cases, these organizations may be the deciding groups that define the industry standards that everyone else follows. Examples include IUPAC for defining chemistry naming conventions and the IETF for creating and maintaining various internet protocols. In addition, some individuals also start information databases, maybe as a hobby at first, but find that it has grown to a large capacity and has become one of the main sites on that topic.
About writes guides on a large variety of topics and is a useful source of information, at least to get to other links. The guides are written by a wide group of people, some of whom are experts in that field.
Contains detailed statistics on every country. Widely used as a reference when looking for numbers to back up facts in a report. Most of the information seems to be taken from the CIA World Factbook. Has good presentation, with pie charts and bar graphs for all the statistics.
Database on all the known languages in the world, with links to other sources and guides for learning the languages.
Over a decade old and keeps a large database of myths and folklore from all around the world. Includes both ancient and modern myths. Similar sites include Snopes and AFU Urban legends, both of which document several well known modern day myths and stories. Many modern day myths are internet hoaxes, spread mainly through email, like the Nigerian Money Laundering incident. Sites like Quatloos keep archives of various hoax incidents including emails and documents received from scammers etc. Scambusters has useful information on protecting oneself from internet scams. Virus Myths keeps an eye on fake virus reports that are sometimes spread on the internet, labelling a file or program as a virus or trojan when it actually isn't.
Run by the Center for Media and Democracy, a think tank that focuses on public relations. Think tanks are usually not deciding authorities on a public matter. Rather, they do research into large scale phenomena and then report their findings, usually to governments. This makes them quite influential in the decision making process. Since they collect a good deal of information through research, think tank sites are also a good database of information.
Source Watch itself is an Open Content News site (actually a wiki), meaning anyone can add or edit data. The Center for Media and Democracy also publishes another quarterly called PR Watch, which is more of a news source.
Here is a list of other such think tanks and consultancies around the world.
6) Media Sites
These are news websites that keep tabs on new information, either in a broad sense or of a specific category, like technology news, politics etc. Media, in any form, be it radio, TV, print or the net, has always served as a first source of information. Media is usually what gets us to know about the issue before we go ahead and try to find out more related information out of curiosity. Media sites are also the best place to head to for news on ongoing events like current wars, natural disasters etc. Despite the wide audience that this source of information has, many do not realize the drawbacks in media sites.
a) Drawbacks of Media for information
i) Audience Bias
To begin with, the media can be and usually are biased. Journalists may not be biased themselves, but remember that news companies are also profit making corporations. This means that they sometimes need to provide the type of news that the majority of their audience desires. Sometimes, they need to support the view of this majority to strengthen their position as a people's news source.
ii) Political Bias
Not all countries have freedom of speech and an unregulated press. Even in a democratic country, political powers can influence and control the release of news behind the scenes. Thus, journalists and news sites can't always publish the naked truth, should they even be aware of it.
iii) Exaggeration
It would be wrong to say that the news lies. What news outlets do is take a small truth, which could in fact be valid, and then speculate on what else could have happened or might happen. Combined with subtle use of the right language, you have a hot news item, one that people will be gossiping about for a long time. This means that, apart from astute and wise readers, few will be able to judge the level of accuracy of the article and value the information accordingly. They may simply jump to hasty conclusions.
The bottom line is that while media sites are a juicy read, don't use them as the absolute source of information for anything. Rather, the media sites are meant to get you interested and provide a nice introduction to the subject. The real data should then be found through information databases and search engines. Also see A Look at Searching to learn how to search the net for any kind of information.
Sometimes, media sites may be the only source of information on a topic, like in the case of current events. In such a case, at least draw the information from a number of media sources, both corporations as well as organizations like independent media sites (see below) and individuals (i.e. Blogs). Also remember to assess your information correctly and not elevate it to the accuracy of a factual article like an academic journal.
b) Independent Media & News Sites
These are generally not owned or backed by a big corporation or political party and hence will be less biased, at least with respect to political views. Note that I write 'less biased' because there can never possibly be a bias free news source. The news is still written by a journalist, a person who has undergone experiences throughout life, all of which can influence the standpoint from which the news report is written. Unless we get news reporting machines in the future, we will always have at least a small percentage of bias and opinion along with facts in every article. These last few lines themselves contain an opinion. Here is a more comprehensive list of independent media sites.
A media watch group that aims to reduce bias, erroneous reporting and censorship in the media. Nearly 20 years old and more focused on North American news, but also covers major world news like the War on Iraq etc.
This is one of the sites owned by the Independent Media Institute, different from Indymedia. Also publishes discussions on several topics.
Run by Sonoma State University, Project Censored publishes news which was less covered or not reported by mainstream media services. The news is collected and compiled into articles by students and reviewed by faculty, experts and others. The majority of the news is North American focused. The project is nearly 3 decades old and well established as an alternative news source.
More of an article source than an up to date news site. Focuses on internet related news and comes out in monthly issues. Anyone is free to contribute articles, which are then peer reviewed. All content is strictly assessed, thus the final polished articles are a great read and a valuable information source.
More information and links to sites on specific topics can be found on Alternative Media and Alternative News Agencies under Wikipedia.
c) Open Content News Sites
These are similar to Independent Media since the content is not published by a big corporation. Anyone can submit news and see it appear instantly. The news is then fine tuned by edits from other members. This is analogous to a Wiki, with minor differences in edit policy. Some sections of the site may be written only by moderators, but the rest of the news is entirely community contributed. Hence, lesser known news and incidents may be found here in good detail. The most outstanding example of such a site is Slashdot. Community contributed means the content may be inaccurate when initially published. However, articles are extensively peer reviewed, so quality is maintained fairly well.
Also more commonly known as IndyMedia. A well known alternative news source, but it also has its critics, especially with respect to its content editing policies. IndyMedia does a good job of covering local and international news. There are currently over 150 local IndyMedia Centers across the globe.
A community that focuses on technology and culture related issues. A great number of people have contributed articles here, maybe because it was perceived to be better than Slashdot in its system of peer review and moderation.
A community driven site on politics and culture. Uses the same software as Slashdot to run the site.
This site provides a useful stop for information, mainly on acronyms like those found on the internet. Also contains extended information on several topics. Hyperdictionary, Acronym Search and Webopedia also give definitions of various terms. Such sites often appear in the results of natural language and answer based search engines.
7) Summary
Most of the Online and Digital libraries, Organizations and Open Content Media sites can be found through CompletePlanet, a site that classifies over 70,000 databases. This is logically the first place to start when searching the invisible internet. Databases themselves are classified, for the purpose of this document, into six types. Each type of database offers a different nature of information and corresponding level of credibility. One must ideally choose data from a variety of databases and cross check the facts, especially when using them for research purposes.
8) Bibliography
There is a List of World portals and Databases at the U.S. Library of Congress.
UNESCO has a good list of portals on various topics like digital libraries and Open Source Software.


