Pathways To Learning... Since 2005 Hong Kong Registered School 566985 & 600733

SEARCH THE DEEP WEB (INVISIBLE WEB/HIDDEN WEB)

Search Engines for Academic Research

A great selection of academic and scholarly research tools to get results not displayed by mainstream search engines.

Search engines work by indexing pages. They scan the web using a “spider” (or crawler) program that follows links from page to page, and build a huge index of what they find. When you search, the results you see are selected from this index; the indexed part of the web is often referred to as the surface web. But many pages are never reached by the spider and so remain hidden. This is what we mean by the Deep Web (which is different from the “Dark Web”, content that is deliberately concealed).
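
To make the idea concrete, here is a minimal Python sketch. It is not how any real search engine is built, only an illustration of the principle: a crawler that follows links outward from a starting page, plus an inverted index (word to pages) built from what the crawler found. The tiny in-memory WEB dictionary and its page names are hypothetical. A page that nothing links to never enters the index, which is exactly what makes it part of the deep web.

```python
from collections import deque

# A tiny, hypothetical "web": each URL maps to (page text, outgoing links).
# "/catalogue-record" has no inbound links, so the crawler never finds it.
WEB = {
    "/home": ("welcome to our library", ["/about", "/news"]),
    "/about": ("about the library and its archive", ["/home"]),
    "/news": ("latest library news", []),
    "/catalogue-record": ("rare archive manuscript record", []),
}


def crawl(seed):
    """Breadth-first crawl: visit only pages reachable by following links."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in WEB[url][1]:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def build_index(urls):
    """Inverted index: each word maps to the set of URLs containing it."""
    index = {}
    for url in urls:
        for word in WEB[url][0].split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, word):
    """Answer a query from the index only, as a search engine does."""
    return sorted(index.get(word, set()))


index = build_index(crawl("/home"))
print(search(index, "library"))     # ['/about', '/home', '/news']
print(search(index, "manuscript"))  # [] - the unlinked page was never indexed
```

The query for “manuscript” comes back empty even though the page exists, because the search only ever consults the index.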

According to some estimates, the deep web is 400 to 500 times larger than the surface web. Applied to the size of the indexed web (see www.worldwidewebsize.com), that would mean there could be over 1,800 billion hidden items of content.

It makes sense then that the deep web is an excellent potential source of information for those doing academic research and data mining.


Quick links:

[Multidisciplinary web portals/search services]   [Multidisciplinary websites that provide access to research]  [Digitized books]


Subject/discipline Specific Deep Web Resources

[Art & Design]  [Business]  [Law and Politics]  [Medical and Health]  [Science and Technology]  [Video & Audio Resources]


What makes up the Deep Web?

  • Private websites - such as those reached through a VPN (virtual private network) and sites that require passwords and logins.
  • Contextual web pages - with content varying for different access contexts (e.g., ranges of client IP addresses or previous navigation sequence).
  • Limited-access content sites - these restrict access by technical means, such as CAPTCHAs, the Robots Exclusion Standard (robots.txt) or no-cache HTTP headers, which keep search engines from crawling them or creating cached copies.
  • Unlinked content - pages that no other page links to, so web crawlers have no path to reach them.
  • Non-HTML/non-textual content - often encoded in image or video files or in file formats that search engines do not handle.
  • Dynamic content - created for a single purpose and not part of a larger collection of items. These pages are returned in response to a submitted query or reached only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard for a crawler to fill in without domain knowledge.
  • Scripted content - pages accessible only with JavaScript, as well as content loaded using Flash or Ajax.
  • Software - certain content is intentionally hidden from the regular Internet and is accessible only with special software, such as Tor, I2P or other darknet software. For example, Tor lets users reach websites at .onion addresses anonymously, hiding their IP address.
  • Web archives - web archival services such as the Wayback Machine let users see archived versions of web pages over time, including websites that have since become inaccessible; these archived copies are not indexed by search engines such as Google.
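
The Robots Exclusion Standard mentioned in the list is simply a plain-text robots.txt file that well-behaved crawlers check before fetching a page. Python's standard-library module urllib.robotparser can evaluate such rules; the sketch below parses the rules from a string so no network access is needed. The robots.txt content and the library.example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary library website.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogue/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # load the rules without fetching anything

for url in ("https://library.example/about",
            "https://library.example/catalogue/item/42",
            "https://library.example/search?q=maps"):
    # A well-behaved crawler skips any URL that can_fetch() rejects,
    # so those pages never reach a search engine's index.
    print(url, parser.can_fetch("*", url))
```

Here the /about page is allowed, while the catalogue record and the search-results page are disallowed, which is how whole catalogues and query-driven pages end up in the deep web.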

What is available for Academic Research?

There are many high-value collections to be found within the deep web. Material found there that most people would recognize and potentially find useful includes:

  • Academic studies and papers
  • Blog platforms
  • Pages created but not yet published
  • Scientific research
  • Academic and corporate databases
  • Government publications
  • Electronic books
  • Bulletin boards
  • Mailing lists
  • Online card catalogues
  • Directories
  • Many subscription journals
  • Archived videos
  • Images

Searching the Deep Web


Don’t just accept Paywalls

If you find content that you think may be useful but it is behind a paywall, don’t assume that you have to pay. A lot of content is published in more than one place, and the same work may be available in an open-access format elsewhere. The paywalled versions often come up first because the sites earning money from them make the effort to rank well in search engine results.

Unpaywall, a free extension for the Google Chrome browser, searches the web for a free version of content that is protected by a paywall.
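
Unpaywall also offers a public REST API, which is handy when you have a whole list of DOIs to check. A minimal sketch, assuming the v2 endpoint (https://api.unpaywall.org/v2/{doi}?email=...) and its is_oa and best_oa_location response fields; check the current API documentation before relying on it. The helper names are our own, and only lookup() makes a network request.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build an Unpaywall lookup URL; the API asks callers to identify themselves by email."""
    return API + quote(doi) + "?" + urlencode({"email": email})


def free_copy(record):
    """Return a link to a free (open access) copy, or None if none is known."""
    if not record.get("is_oa"):
        return None
    best = record.get("best_oa_location") or {}
    # Prefer a direct PDF link, fall back to the landing-page URL.
    return best.get("url_for_pdf") or best.get("url")


def lookup(doi, email):
    """Live network request: fetch the record for one DOI and extract a free link."""
    with urlopen(unpaywall_url(doi, email)) as response:
        return free_copy(json.load(response))
```

Calling lookup() with a real DOI and your own email address returns either a link to a free copy or None, in which case the paywall may genuinely be the only route.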


Multidisciplinary Web Portals and Academic Search Services that give access to the Deep Web

Academic Index – an academic search engine that searches only websites previously selected by librarians, teachers, and library and educational consortia.

Archive.org – the Internet Archive is a San Francisco–based nonprofit digital library with the stated mission of "universal access to all knowledge." It provides free public access to collections of digitized materials, including websites, software applications/games, music, movies/videos, moving images, and nearly three million public-domain books. Its web archive, the Wayback Machine, contains over 308 billion web captures.  The Archive also oversees one of the world's largest book digitization projects.
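
The Wayback Machine can also be queried directly for the archived copy of a page closest to a given date. A minimal Python sketch, assuming the Internet Archive's public availability endpoint (https://archive.org/wayback/available) and its archived_snapshots.closest response shape; the function names are our own, and only find_archived_copy() touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def availability_url(page, timestamp=None):
    """Build a Wayback Machine availability query; timestamp is YYYYMMDD[hhmmss]."""
    params = {"url": page}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response):
    """Extract the closest archived copy's URL, or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_archived_copy(page, timestamp=None):
    """Live network request to archive.org."""
    with urlopen(availability_url(page, timestamp)) as resp:
        return closest_snapshot(json.load(resp))
```

For example, find_archived_copy("example.com", "20060101") would ask for the snapshot nearest to 1 January 2006 and return its web.archive.org address, or None if the page was never captured.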

ArchiveGrid - includes over 5 million records describing archival materials, bringing together information about historical documents, personal papers, family histories, and more. With over 1,000 different archival institutions represented, ArchiveGrid helps researchers looking for primary source materials held in archives, libraries, museums and historical societies.

BASE - is one of the world’s most voluminous search engines especially for academic open access web resources. BASE provides more than 120 million documents from more than 6,000 sources. BASE is operated by Bielefeld University Library, Bielefeld, Germany.

Chabot College - Chabot College librarians have created a custom search engine that searches only quality websites.

Directory of Open Access Journals (DOAJ) - a community-curated online directory that indexes and provides access to quality open access, peer-reviewed journals. It focuses on providing access only to journals that apply high quality standards to their content. It presently indexes 9,740 journals with more than 1.5 million articles from 133 countries.

Education Resources Information Center  (ERIC) - is an online library of education research and information, sponsored by the Institute of Education Sciences (IES) of the U.S. Department of Education.

FindArticles.com – has articles from about 500 periodicals with coverage back to 1998, and is completely free of charge. Funded by CBS.

FOIA Data – search the US Freedom of Information Act (FOIA) database.

Google Scholar - is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.

GovInfo - provides free online access to official publications from all three branches of the US Federal Government.

Journal Seek  - (Genamics) promotes itself as “the largest completely categorized database of freely available journal information available on the internet,” with more than 100,000 titles currently. Categories range from Arts and Literature, through both hard- and soft-sciences, to Sports and Recreation.

JSTOR – (short for Journal Storage) is a digital library founded in 1995. Originally containing digitized back issues of academic journals, it now also includes books and primary sources, and current issues of journals. It provides full-text searches of almost 2,000 journals.

LibGuides – searches research guides written by librarians at thousands of institutions worldwide, and is a valuable resource for anyone doing research.

Library Genesis / LibGen - a search engine for scientific articles and books on various topics that gives free access to content that is otherwise paywalled or not digitized elsewhere. Based in Russia, it is the largest and longest-running collection of its kind that is still openly available. LibGen has several collections: (i) over 1.5 million files of mainly non-fiction e-books, (ii) a similar number of mainly fiction e-books, (iii) more than 20 million papers from journals of science, history, art, etc., and (iv) comics, magazines and paintings, amounting in total to at least 100 TB. It is easily the Library of Congress of the digital world.

Library of Congress – the library has been collecting documents and books for more than 200 years and has been digitizing items from its collections. An excellent source for academic research.

Microsoft Academic Search - is a free public web search engine for academic publications and literature, developed by Microsoft Research. Re-launched in 2016, the tool features an entirely new data structure and search engine using semantic search technologies. It currently indexes over 375 million entities, 170 million of which are academic papers. The Academic Knowledge API offers information retrieval from the underlying database using REST endpoints for advanced research purposes.

Open Access Journals Search Engine (OAJSE) - covers free, full-text, quality-controlled journals and aims to include English-language journals in all subjects. There are over 4,500 journals in the directory.

OpenDOAR -  is an authoritative directory of academic open access repositories. Each OpenDOAR repository has been visited by project staff to check the information that is recorded here. This in-depth approach does not rely on automated analysis and gives a quality-controlled list of repositories.

Reference Repository - is a comprehensive resource for article and book references. It is a digital platform that holds research output and provides free, immediate and permanent access to research results for anyone to use, download and distribute.

RefSeek - academic search engine for students and researchers. Locates relevant academic search results from web pages, books, encyclopaedias, and journals.

The Internet Public Library - (ipl and ipl2) is a non-profit, student-run website at Drexel University. Students volunteer to act as librarians and respond to questions from visitors. Categories of content include sections aimed at children and teens.

Smithsonian Institution Libraries — 20 libraries from museum complexes around the world.

The Multidisciplinary Digital Publishing Institute (MDPI) - based in Switzerland, a publisher of more than 110 peer-reviewed, open access journals covering arts, sciences, technology and medicine.

UC Santa Barbara Library - offers access to a diverse group of research databases useful to students, researchers and the casual searcher. It should be noted that many of these resources are password protected.

USA.gov - offers access to a huge volume of information, including all types of forms, databases, and information sites representing most government agencies.

Virtual LRC - the Virtual Learning Resources Center helps high school and community college students find quality information for school and college academic projects. The Virtual LRC InfoBot searches a range of excellent general-information websites with a single search, including About.com, InfoSeek Web Guide, Librarians' Index to the Internet, Big Hub, Smithsonian, the LibrarySpot and an increasing number of university library Internet subject guides.

Voice of the Shuttle - (VoS) offers access to a diverse assortment of sites, including literature, literary theory, philosophy, history and cultural studies.


Websites that provide access to research

CIA Factbook — reference materials containing information on every country in the world.

DataBank – A service from the World Bank. DataBank is an analysis and visualisation tool that contains collections of time series data on a variety of topics. You can create your own queries; generate tables, charts, and maps; and easily save, embed, and share them.

Encyclopedia Britannica – information can be found by subject or with a search function. Current news is provided by The New York Times, the BBC, and SBS World News. Videos are available for viewing online. Timelines give interactive breakdowns of the history of subject areas. Data and statistics are available for every country. Maps are provided. Quotations can be browsed by author or subject.

GPO’s Catalog of US Government Publications — USA Federal publications database. The CGP is the finding tool for federal publications that includes descriptive information for historical and current publications as well as direct links to the full document, when available. Users can search by authoring agency, title, subject, and general keywords, or click on "Advanced Search" for more options.

National Security Archive – declassified papers and related records. It provides online access to critical declassified records on issues including U.S. national security, foreign policy, diplomatic and military history, intelligence policy, and more. Updated frequently.

Scholarpedia - a peer-reviewed open-access encyclopedia written and maintained by scholarly experts from around the world. Scholarpedia is inspired by Wikipedia and aims to complement it by providing in-depth scholarly treatment of topics within the fields of mathematics and sciences including physical, biological, behavioural, and social sciences.

The National Archives – the US National Archives’ research tools and online database. This searchable database focuses on the United States and covers both historical and current information.

UK Statistics Authority - an independent, non-ministerial department that works to promote and safeguard the production and publication of official statistics that serve the public good.


Websites that provide access to digitized versions of books

Getty Research Institute – the Getty Research Institute collections include over 1 million books, periodicals, photos, and catalogs. The institute also has a large collection of rare or unique items that focus largely on art history and architecture.

Google Books – a service from Google Inc. that searches the full text of books and magazines that Google has scanned, converted to text using optical character recognition (OCR), and stored in its digital database. Books are provided either by publishers and authors, through the Google Books Partner Program, or by Google's library partners, through the Library Project. Google has also partnered with a number of magazine publishers to digitize their archives. All scanned text is searchable, but how much of a result you can read depends on the copyright status of that individual work.

HathiTrust – a partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future. There are more than sixty partners in HathiTrust, and membership is open to institutions worldwide.

InTech - a pioneer and the world's largest multidisciplinary open access publisher of books covering science, technology and medicine. It describes itself as the world's leading publisher of open access books, “built by scientists, for scientists.”

Project Gutenberg – offers over 57,000 free eBooks in formats such as EPUB and Kindle, which you can download or read online.

Scribd – a digital library and e-book and audiobook subscription service that includes one million titles. Scribd hosts 60 million documents on its open publishing platform. The documents section allows users to upload almost anything with very few restrictions, so it has become a repository for a very large number of textbooks and other academic content.

The Online Books Page — is an index of e-text books available on the Internet.  It is hosted by the library of the University of Pennsylvania. The Online Books Page lists over 2 million books and has several features, such as A Celebration of Women Writers and Banned Books Online.

The web site was named one of the best free reference web sites in 2003 by the Machine-Assisted Reference Section of the American Library Association.



Subject / discipline Specific Deep Web Resources


Art & Design

ArtNet  - deals with pricing and sourcing work in the art market. They also keep track of the latest news and artists in the industry.

Musée du Louvre - the renowned museum maintains a site filled with navigable sections covering its collections.

Public Art Online - a resource detailing sources, creators, prices, projects, legal issues, success stories, resources, education and all other aspects of the creation of public art.

Smithsonian Art Inventories Catalog - a subset of the Smithsonian Institution Research Information System (SIRIS). A browsable database of over 400,000 art inventory items held in public and private collections.

The Metropolitan Museum of Art - the museum’s site hosts an impressively interactive body of information on its collections, exhibitions, events and research.

The National Gallery of Art - this premier art museum also maintains a site detailing the highlights, exhibitions and education efforts the institution oversees.

Web Gallery of Art  - is a searchable database of European art, containing nearly 34,000 reproductions. Additional database information includes artist biographies, period music and commentaries.


Business

Better Business Bureau - (BBB) US Information System Search allows consumers to locate the details of ratings, consumer experience, governmental action and more of both BBB accredited and non-accredited businesses.

BPubs.com - the business publications search engine. They offer more than 200 free subscriptions to business and trade publications.

BusinessUSA - is an excellent and complete database of everything a new or experienced business owner or employer should know.

EDGAR: U.S. Securities and Exchange Commission - the Securities and Exchange Commission's database, which posts copies of corporate filings from US businesses, as well as press releases and public statements.

FRED – up-to-date financial data. Download, graph, and track 508,000 US and international time series from 86 sources, provided by the Federal Reserve Bank of St. Louis. FRED links out to a number of other equally impressive resources for economic data. It is an excellent primary resource for anyone doing research in finance and economic theory. Its companion service ALFRED (ArchivaL Federal Reserve Economic Data) allows you to retrieve vintage versions of economic data that were available on specific dates in history.

Global Edge - delivers a comprehensive research tool for academics, students and businesspeople to seek out answers to international business questions.

Hoover’s - a subsidiary of Dun & Bradstreet, is one of the best known databases of American and International business. A complete source of company and industry information, especially useful for investors.

The National Bureau of Economic Research - is perhaps the leading private, non-partisan research organization dedicated to unbiased analysis of US economic policy. This database maintains archives of research data, meetings, activities, working papers and publications.

U.S. Department of Commerce - its Bureau of Economic Analysis (BEA) is the source of many of the economic statistics we hear in the news, including national income and product accounts (NIPAs), gross domestic product, consumer spending, balance of payments and much more.


Law and Politics

Federal Bureau of Investigation (FBI) Stats & Services – organizes US crime statistics, criminal history checks, a sex offender registry, resources for businesses, communities, crime victims, law enforcement, job seekers, researchers and students.

Global Legal Information Network — laws, regulations, judicial decisions, and other legal sources.

Homeland Security Digital Library - (HSDL) maintains databases, policy and strategy statements, special collections and research tools.

Law Library of Congress – contains the world's largest collection of law books and legal resources, with strong foreign law holdings and comprehensive United States legal publications.

LexisNexis – a corporation providing computer-assisted legal research as well as business research and risk management services. During the 1970s, LexisNexis pioneered the electronic accessibility of legal and journalistic documents. As of 2006, the company had the world's largest electronic database for legal and public-records related information. This is a paid service.

National Criminal Justice Reference Service - (NCJRS) is a US federally funded resource offering extensive databases on justice, substance abuse and assistance for victims of crime, among other topics.

Quandl – a collection of 9,000,000 financial, economic and social datasets. An excellent source for financial, economic and alternative datasets, serving investment professionals.

Social Work Policy - The Social Work Policy Institute (US) examines issues that relate to the work of social workers, including how to serve people who have multiple or complex needs and how public agencies and other structures deliver health and human services.

U.S. Department of Justice Resources - is a comprehensive database for the Department of Justice, including archives, initiatives, news, publications and resources.


Medical and Health

BioMed Central - is the UK-based publisher of 258 peer-reviewed open access journals. Their published works span science, technology and medicine and include many well-regarded titles.

Globalhealthfacts.org – a project of the Henry J. Kaiser Family Foundation, provides free, up-to-date and easy-to-access data by country on HIV/AIDS, tuberculosis, malaria and other key health and socio-economic indicators. Global Health Facts comprises more than 100 indicators and lets users map, rank and download the data for custom analyses. This indexed database of world health information is searchable by disease type, country, conditions, symptoms, and more.

Cases Database - is a searchable database of more than 32,000 peer-reviewed medical case reports from 270 journals covering a variety of medical conditions.

Centers for Disease Control and Prevention (CDC) - CDC WONDER’s online databases give access to the substantial public health data resources held by the CDC.

HCUPnet - is an online query system for those seeking access to statistical data from the US Agency for Healthcare Research and Quality.

National Center for Biotechnology Information - (NCBI) is an offshoot of the US National Institutes of Health (NIH). This site provides access to some 65 databases from the various project categories currently being researched.

National Institute for Health Research Archive — database of ongoing or completed projects funded by the British  NHS.

New England Journal of Medicine – a weekly medical journal published by the Massachusetts Medical Society and one of the leading medical journals. Full-text past issues are available online; some content requires payment, but much is available for free.

OMIM - offers access to the combined research of many decades into genetics and genetic disorders. With daily updates, it represents perhaps the most complete single database of this sort of data.

PubMed – a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. PubMed comprises more than 28 million citations for biomedical literature from MEDLINE, life science journals, and online books.
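
PubMed can also be searched programmatically through NCBI's E-utilities, which is useful for systematic reviews or data mining. A minimal sketch, assuming the ESearch endpoint with retmode=json and its esearchresult.idlist field; the helper names are hypothetical, and only search_pubmed() makes a network request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build an E-utilities ESearch query against the PubMed database."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def pmids(response):
    """Pull the list of PubMed IDs (PMIDs) out of an ESearch JSON response."""
    return response.get("esearchresult", {}).get("idlist", [])


def search_pubmed(term, retmax=20):
    # Live request; NCBI asks for no more than 3 requests per second without an API key.
    with urlopen(esearch_url(term, retmax)) as resp:
        return pmids(json.load(resp))
```

ESearch returns only IDs; fetching the matching abstracts or summaries is a separate E-utilities call.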

TOXNET - is the access portal to the US Toxicology Data Network, an offshoot of the National Library of Medicine.

U.S. National Library of Medicine - a database of medical research, available grants and other resources. The site is maintained by the National Institutes of Health.

Wiley Open Access  - a subsidiary of New Jersey-based global publishers John Wiley & Sons, Inc., publishes peer reviewed open access journals specific to biological, chemical and health sciences.

World Health Organization - (WHO) is a comprehensive site covering the many initiatives the WHO is engaged in around the world.


Science and Technology

AGRIS - (International Information System for Agricultural Science and Technology) is a global, public domain database maintained in multiple languages by the Food and Agriculture Organization of the United Nations. They provide free access to agricultural research and information.

arXiv - a repository of electronic preprints (known as e-prints) approved for publication after moderation, consisting of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance, which can be accessed online. In many fields of mathematics and physics, almost all scientific papers are self-archived on the arXiv repository.
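
arXiv also exposes a public query API that returns results as an Atom feed. A minimal sketch, assuming the export.arxiv.org/api/query endpoint with its search_query, start and max_results parameters; the function names are our own, and only search_arxiv() makes a network request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_query_url(query, max_results=10):
    """Build an arXiv API query, e.g. query='all:electron' or 'ti:graphene'."""
    params = {"search_query": query, "start": 0, "max_results": max_results}
    return API + "?" + urlencode(params)


def parse_entries(atom_xml):
    """Return (id, title) pairs from an Atom feed returned by the API."""
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM)
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        # Titles may contain line breaks; collapse all whitespace runs.
        results.append((entry_id.strip(), " ".join(title.split())))
    return results


def search_arxiv(query, max_results=10):
    # Live request; arXiv asks API users to space out repeated calls.
    with urlopen(arxiv_query_url(query, max_results)) as resp:
        return parse_entries(resp.read())
```

Keeping URL building and feed parsing in separate functions means both can be checked offline against a saved feed before any live queries are sent.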

Copernicus Publications - has been an open-access scientific publisher in Germany since 2001. They are strong supporters of the researchers who create these articles, providing top-level peer review and promotion for their work.

De Gruyter Open - (formerly Versita)  is one of the world’s leading publishers of open access scientific content. Today De Gruyter Open (DGO) publishes about 600 own and third-party scholarly journals across all major disciplines.

Digital Library for Physics and Astronomy – Physics and Astronomy data engine for academic papers.

EDP Sciences - (Édition Diffusion Presse Sciences) is a France-based scientific publisher with an international mission. They publish more than 50 scientific journals, with some 60,000 published pages annually.

Elsevier of Amsterdam - is a world leader in advancing knowledge in the science, technology and health fields. They publish nearly 2,200 journals, including The Lancet and Cell, and over 25,000 book titles, including Gray’s Anatomy and Nelson’s Pediatrics.

Hindawi Publishing Corporation - based in Egypt, publishes 434 peer-reviewed, open access journals covering all areas of Science, Technology and Medicine, as well as a variety of Social Sciences.

IEEE Xplore Digital Library – contains over 1.4 million documents from the Institute of Electrical and Electronics Engineers. Searchable database of up-to-date materials in relation to electrical engineering and technology as a whole.

National Science Digital Library - (NSDL) is a source for science, technology, engineering and mathematics educational data. It is funded by the US National Science Foundation.

Networked Computer Science Technical Reports Library (NCSTRL) - was developed as a collaborative effort between NASA Langley, Virginia Tech, Old Dominion University and University of Virginia.

Open Science Directory  - contains about 13,000 scientific journals, with another 7,000 special programs titles.

Osti.gov – US Government website. OSTI provides access to energy, science, and technology information through publicly available web-based systems, with supporting tools and technologies to enable information search, retrieval and re-use.

ResearchGate - is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education, it is the largest academic social network in terms of active users.

SciELO (Scientific Electronic Library Online) - is a bibliographic database, digital library, and cooperative electronic publishing model of open access journals. SciELO was created to meet the scientific communication needs of developing countries and provides an efficient way to increase the visibility of and access to scientific literature. Originally established in Brazil in 1997, today there are 14 countries in the SciELO network and its journal collections: Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, Mexico, Peru, Portugal, South Africa, Spain, Uruguay, and Venezuela. Paraguay is developing a journal collection.

Science-advisor.net - is a free and open-source inspired online forum for scientific discussions addressed to researchers and students.

ScienceHUβ - is a global science and technology publisher that provides free access to research articles and the latest research information, without barriers, to the scientific community.

Science.gov – a web portal and specialized search engine. Using federated search technology, Science.gov serves as a gateway to United States government scientific and technical information and research. It searches a database of 200 million publications and journals, and is great for people researching topics that fall mainly under the “science” classification.

Science Research - is a free, publicly available deep web search engine that uses a sophisticated technology to query more than 300 science and technology sites simultaneously, with the results collated, ranked and stripped of duplicates.

Scopus - is the largest abstract and citation database of peer-reviewed literature: scientific journals, books and conference proceedings. Delivering a comprehensive overview of the world's research output in the fields of science, technology, medicine, social sciences, and arts and humanities, Scopus features smart tools to track, analyze and visualize research. Scopus offers free features to non-subscribed users, and is available through Scopus Preview.

Springer Open - offers a roster of more than 160 peer-reviewed, open access journals, as well as their more recent addition of free access books, covering all scientific disciplines.

TechXtra — For technology, free access to reports, articles, key websites, books, the latest industry news, job announcements, ejournals, eprints, technical reports, the latest research, thesis & dissertations and more. It serves as an archive for submitted scientific abstracts and other research products.

The SAO/NASA Astrophysics Data System – is an online database of over eight million astronomy and physics papers from both peer-reviewed and non-peer-reviewed sources. Abstracts are available free online for almost all articles, and full scanned articles are available in GIF and PDF formats for older articles. It was developed by the National Aeronautics and Space Administration (NASA), and is managed by the Harvard–Smithsonian Center for Astrophysics.

WebCASPAR - provides access to science and engineering data from a variety of US educational institutions. It incorporates a table builder, allowing a combined result from various National Science Foundation and National Center for Education Statistics data sources.

World Wide Science.org - is a global scientific gateway made up of US and international scientific databases. Because it is multilingual, it allows real-time search and translation of results from an extensive group of databases.


Video & Audio Resources

LibriVox.org – a huge selection of audio recordings and audiobooks read by volunteer voice artists.

VideoLectures.net – The world's biggest academic online video repository with 14,251 video lectures delivered by 10,763 presenters.
