A smarter way to search Google Scholar

Most of us are familiar with Google Scholar: a freely available service from Google that indexes the world’s scholarly literature across a range of disciplines. With its extensive database of scientific literature (claimed to contain over 389 million documents, including articles, citations and patents), it has become an indispensable resource for scholars and researchers across the globe. Enter a few keywords in that iconic search box, and that knowledge is at our fingertips in a fraction of a second. For research questions needing rapid answers, Google Scholar can feel like all we’ll ever need.

But not all research tasks require quick answers to simple questions. Some require a highly structured approach, based on rigour and repeatability. Systematic literature reviews, for example, are predicated on exhaustive search methods with strict standards for reporting and reproducibility. In these cases, how can we achieve the same rigour and repeatability when using Google Scholar? What approaches can we adopt to support effective systematic searching with it?

One approach is to use Boolean search queries. By combining keywords with Boolean operators and parentheses, it is possible to construct search queries of arbitrary complexity. But as we have seen in previous posts, crafting Boolean strings is tedious, inefficient and error-prone. Moreover, a Boolean query written in the syntax of one database will rarely work in another, forcing searchers to ‘translate’ their search strategies from one source to another. This creates further sources of inefficiency and error, and undermines the ability of others to validate or reproduce the results.

So we’re delighted this week to announce support for Google Scholar, which means that in addition to Google, Bing and PubMed, you can now use 2dSearch to search Google Scholar. But what might this mean in practice, and why should you care? To answer that, let’s recap some of the basics.

A visual approach to systematic searching

At the heart of 2dSearch is a graphical editor which allows the user to formulate search strategies using a visual framework in which concepts are expressed as objects on a two-dimensional canvas. Concepts can be simple keywords or attribute:value pairs representing controlled vocabulary terms (e.g. MeSH terms) or database-specific search operators (e.g. field tags and other commands). They can be combined using Boolean (and other) operators to form higher-level groups and then iteratively nested to create expressions of arbitrary complexity. Groups can be expanded or collapsed on demand to facilitate readability and comprehension. Like conceptual Lego blocks, they can be effortlessly combined and recombined to explore different configurations, offering a more transparent and intuitive approach to search strategy development and validation.
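To make the idea of concepts, groups and nesting concrete, here is a minimal sketch of one way such a query could be represented as a tree and serialised back into a Boolean string. The `Term` and `Group` classes, the `attribute` field and the `render` method are hypothetical names chosen for illustration; they are not 2dSearch’s internal data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Term:
    """A single concept: a keyword, a phrase, or an attribute:value pair."""
    text: str
    attribute: Optional[str] = None  # hypothetical: e.g. a field tag or vocabulary name

    def render(self) -> str:
        # Quote multi-word phrases; prefix attribute:value pairs with their attribute.
        value = f'"{self.text}"' if " " in self.text else self.text
        return f"{self.attribute}:{value}" if self.attribute else value


@dataclass
class Group:
    """An optionally named block that combines its children with one Boolean operator."""
    operator: str  # "AND" or "OR"
    children: List[Node] = field(default_factory=list)
    name: str = ""  # a human-readable label; never part of the query string

    def render(self) -> str:
        parts = [child.render() for child in self.children]
        if len(parts) == 1:
            return parts[0]
        # Parenthesise every multi-child group so the nesting is explicit.
        return "(" + f" {self.operator} ".join(parts) + ")"


Node = Union[Term, Group]

telemedicine = Group("OR", [Term("telemedicine"), Term("telehealth"), Term("Internet based")], name="Telemedicine")
diabetes = Group("OR", [Term("diabetes"), Term("diabetic")], name="Diabetes")
query = Group("AND", [telemedicine, diabetes])

print(query.render())
# ((telemedicine OR telehealth OR "Internet based") AND (diabetes OR diabetic))
```

Because a group’s name lives on the node rather than in the string, labels such as ‘Telemedicine’ survive editing but never leak into the executed query.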

The application itself consists of two panes: a query canvas on the left and a search results pane on the right (which can be resized or detached in a separate tab or window):

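[Figure: the 2dSearch interface, with the query canvas on the left and Google Scholar search results on the right]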

The canvas can be resized or zoomed, and includes an ‘overview’ widget which allows the user to view or navigate to elements that may be outside the current viewport. Adopting design cues from Google’s Material Design language, a sliding menu is offered on the left, providing file I/O and other options. This is complemented by a navigation bar across the top which provides support for common document-level functions such as naming and sharing search strategies.

Although 2dSearch supports the creation of complete strategies from a blank canvas, its value is most readily understood by reference to an existing (i.e. text-based) search strategy such as the following, which was developed for a systematic review of ‘Asynchronous and synchronous teleconsultation for diabetes care’:

((telemedicine |telehealth |"Internet based"|telecare|"web based"|"mobile phone"|telemedical|videoconferencing|"text messaging"|"e mail"|telephone|"cell Phone"|pda|"e health")(diabetes|diabetic|insulin))| Telediabetes

This strategy was published as part of a study comparing the recall of Google Scholar and PubMed for biomedical systematic reviews, and is one of a larger set available in that study’s appendix. Although the expression is relatively simple, its conceptual structure is hard to visualise, and searches expressed in this form are difficult to optimise, debug or reliably reproduce. However, when it is opened in 2dSearch, its structure becomes much more transparent:

[Figure: the telediabetes search strategy opened in 2dSearch]

It can be seen that the overall structure consists of a disjunction between a single term (‘Telediabetes’) and a conjunction of two disjunctions, the first of which articulates variations on the telemedicine concept, while the second articulates variations on the diabetes concept. Moreover, in the above example, we have taken the opportunity to give meaningful names to these components, making them easier to re-use productively.
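The structure described above can also be recovered mechanically. The sketch below is a small recursive-descent parser for a simplified reading of the syntax used in the telediabetes string, assuming that `|` means OR, juxtaposed terms are implicitly ANDed and double quotes mark phrases. It illustrates the idea only; it is not how 2dSearch or Google Scholar actually parse queries, and the `parse` function and tuple-based node format are hypothetical.

```python
import re
from typing import List, Tuple, Union

# A node is either a term (str) or an (operator, children) pair.
Node = Union[str, Tuple[str, list]]

# Tokens: quoted phrases, parentheses, the | operator, or bare words.
TOKEN = re.compile(r'"[^"]*"|\(|\)|\||[^\s()|"]+')


def parse(query: str) -> Node:
    """Parse a string in which | means OR and adjacency means AND."""
    tokens: List[str] = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or() -> Node:
        nonlocal pos
        children = [parse_and()]
        while peek() == "|":
            pos += 1
            children.append(parse_and())
        return children[0] if len(children) == 1 else ("OR", children)

    def parse_and() -> Node:
        children = []
        # Juxtaposed operands are implicitly ANDed until |, ) or end of input.
        while peek() not in (None, "|", ")"):
            children.append(parse_primary())
        if not children:
            raise ValueError(f"expected a term at token {pos}")
        return children[0] if len(children) == 1 else ("AND", children)

    def parse_primary() -> Node:
        nonlocal pos
        token = peek()
        pos += 1
        if token == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            return node
        return token.strip('"')  # drop phrase quotes, keep the phrase text

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


strategy = (
    '((telemedicine |telehealth |"Internet based"|telecare|"web based"|"mobile phone"'
    '|telemedical|videoconferencing|"text messaging"|"e mail"|telephone|"cell Phone"'
    '|pda|"e health")(diabetes|diabetic|insulin))| Telediabetes'
)
tree = parse(strategy)
operator, (conjunction, single_term) = tree
print(operator, single_term)                 # OR Telediabetes
print(conjunction[0], len(conjunction[1]))   # AND 2
```

The resulting tree mirrors the description above: an OR whose operands are an AND of two OR-groups and the single term ‘Telediabetes’.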

Although visualising search strategies in this manner can offer immediate utility, the true value of the approach lies less in the information design than in the interaction design. For example, we can move terms from one block to another using direct manipulation, and create new groups simply by combining terms. We can also cut, copy, delete, and lasso multiple objects. If we want to study the effect of one block in isolation, we can execute it individually. Conversely, if we want to remove one element from consideration, we can temporarily disable it. In each case, the effects of each editing operation are displayed in real time in the adjacent search results pane.
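One simple way to support executing a block in isolation and temporarily disabling an element is to keep an `enabled` flag on every node and let serialisation skip anything that is switched off. The sketch below assumes that design; the `Node` class and `render` function are hypothetical, not 2dSearch’s code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A term (text set, no operator) or a group (operator set), with an on/off toggle."""
    text: str = ""
    operator: str = ""  # "AND" or "OR" for groups; empty for terms
    children: List["Node"] = field(default_factory=list)
    enabled: bool = True


def render(node: Node) -> Optional[str]:
    """Serialise a node to a Boolean string, skipping disabled elements.

    Returns None when nothing enabled remains, so emptied groups vanish too.
    """
    if not node.enabled:
        return None
    if not node.operator:
        return f'"{node.text}"' if " " in node.text else node.text
    parts = [p for p in (render(child) for child in node.children) if p is not None]
    if not parts:
        return None
    if len(parts) == 1:
        return parts[0]
    return "(" + f" {node.operator} ".join(parts) + ")"


tele = Node(operator="OR", children=[Node("telemedicine"), Node("telehealth"), Node("telephone")])
diab = Node(operator="OR", children=[Node("diabetes"), Node("insulin")])
query = Node(operator="AND", children=[tele, diab])

print(render(tele))  # run one block on its own: (telemedicine OR telehealth OR telephone)
tele.children[2].enabled = False  # temporarily switch off "telephone"
print(render(query))  # ((telemedicine OR telehealth) AND (diabetes OR insulin))
```

Re-enabling an element is just a matter of setting the flag back to `True`, which keeps this kind of exploratory toggling cheap compared with editing a text string by hand.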

We may also choose to explore the use of automated search suggestions to identify and include related concepts:

[Figure: automated search suggestions for related concepts]

If we inadvertently apply too many, the system will automatically highlight any duplicates:

[Figure: duplicate terms highlighted on the canvas]

In this instance we can either delete the spurious terms manually, or simply hit the ‘Undo’ button and they are gone.
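The post does not say how duplicates are identified. A minimal sketch, assuming that two terms in the same block count as duplicates when they match after ignoring case, surrounding quotes and repeated spaces, might look like this; `normalise` and `duplicate_positions` are hypothetical names.

```python
from typing import List


def normalise(term: str) -> str:
    """Ignore case, surrounding quotes and repeated spaces when comparing terms."""
    return " ".join(term.strip('"').casefold().split())


def duplicate_positions(terms: List[str]) -> List[int]:
    """Return the indices of terms that repeat an earlier term in the same block."""
    seen = set()
    duplicates = []
    for index, term in enumerate(terms):
        key = normalise(term)
        if key in seen:
            duplicates.append(index)  # a later copy: a candidate for highlighting
        else:
            seen.add(key)
    return duplicates


block = ["telemedicine", "telehealth", "Telehealth", '"e health"', "e  health", "telecare"]
print(duplicate_positions(block))  # [2, 4]
```

Returning the positions of the later copies, rather than the terms themselves, makes it straightforward to highlight (or delete) exactly the spurious additions while leaving the originals in place.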

Validation and reproducibility

It is common for researchers to want to search more than one database, particularly when undertaking a systematic literature review. In practice, this requires a process of ‘translation’ of the search strategy to match the syntax of the target database and the search operators it supports. For a relatively simple query this may not be a major undertaking, particularly if such operators form a relatively small proportion of the overall search strategy. However, the user still has to understand which elements are platform-specific, identify the closest equivalent in the other database and manually edit their query, all of which is laborious and time-consuming.
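One way to reduce that manual effort is to keep a single, database-neutral representation of the strategy and render it into each target’s syntax on demand. The sketch below assumes that approach. The `Dialect` rules are simplified and illustrative (AND by juxtaposition and `|` for OR in the Scholar-style output, a title restriction mapped to Google Scholar’s `intitle:` and to PubMed’s `[ti]` field tag); neither the classes nor the mapping are taken from 2dSearch, and the live services may treat some combinations differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Term:
    text: str
    title_only: bool = False  # restrict the term to the title field


@dataclass
class Group:
    operator: str  # "AND" or "OR"
    children: List[Union["Term", "Group"]] = field(default_factory=list)


@dataclass(frozen=True)
class Dialect:
    """Hypothetical per-database rendering rules (illustrative, not exhaustive)."""
    and_op: str
    or_op: str
    title: Callable[[str], str]


DIALECTS = {
    "google_scholar": Dialect(" ", " | ", lambda t: f"intitle:{t}"),
    "pubmed": Dialect(" AND ", " OR ", lambda t: f"{t}[ti]"),
}


def translate(node: Union[Term, Group], dialect: Dialect) -> str:
    """Render the same strategy tree in a target database's syntax."""
    if isinstance(node, Term):
        text = f'"{node.text}"' if " " in node.text else node.text
        return dialect.title(text) if node.title_only else text
    joiner = dialect.and_op if node.operator == "AND" else dialect.or_op
    inner = joiner.join(translate(child, dialect) for child in node.children)
    return f"({inner})" if len(node.children) > 1 else inner


query = Group("AND", [
    Group("OR", [Term("telemedicine"), Term("telehealth", title_only=True)]),
    Group("OR", [Term("diabetes"), Term("insulin")]),
])
for name, dialect in DIALECTS.items():
    print(f"{name}: {translate(query, dialect)}")
# google_scholar: ((telemedicine | intitle:telehealth) (diabetes | insulin))
# pubmed: ((telemedicine OR telehealth[ti]) AND (diabetes OR insulin))
```

The platform-specific knowledge is concentrated in one small table, so supporting another database means adding a rule set rather than re-editing every strategy by hand.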

2dSearch supports search strategy translation through a ‘Messages’ tab on the results pane. Like the console or messages pane in a software IDE, it alerts the user to issues and offers advice, fixes and workarounds. For example, if the user tries to run a query containing PubMed-specific operators against Google Scholar, an alert highlights the problematic elements:

[Figure: the Messages tab alerting the user to PubMed-specific operators]

In due course, this mechanism could be extended to offer a greater degree of interactive support for the automated translation of strategies across databases.
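A very small version of the kind of check the Messages tab performs might scan the query for syntax the target database does not understand. The sketch below assumes a single rule, namely that PubMed-style bracketed field tags such as `[ti]` or `[MeSH Terms]` are flagged when the target is Google Scholar; the regular expression, the `WORKAROUNDS` table and the message wording are hypothetical, not 2dSearch’s actual rules.

```python
import re
from typing import Dict, List

# Bracketed field tags such as [ti], [tiab] or [MeSH Terms] are PubMed syntax.
FIELD_TAG = re.compile(r"\[([A-Za-z][A-Za-z ]*)\]")

# Illustrative workarounds for a few tags (an assumption, not an exhaustive list).
WORKAROUNDS: Dict[str, str] = {
    "ti": "use the intitle: operator instead",
    "au": "use the author: operator instead",
}


def check_for_google_scholar(query: str) -> List[str]:
    """Return one warning per PubMed-style field tag found in the query."""
    messages = []
    for match in FIELD_TAG.finditer(query):
        tag = match.group(1)
        advice = WORKAROUNDS.get(tag.lower(), "remove it or rephrase it as keywords")
        messages.append(
            f"[{tag}] at position {match.start()} is PubMed syntax not supported "
            f"by Google Scholar: {advice}."
        )
    return messages


for message in check_for_google_scholar('telehealth[ti] AND "diabetes mellitus"[MeSH Terms]'):
    print(message)
```

Each warning carries the character position of the offending tag, which is the information a user interface would need to highlight it in place.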

Moreover, strategies expressed in this way can be shared at the click of a button as reproducible, executable artefacts. Alternatively, they can be exported (or indeed imported) as traditional Boolean strings. In addition, 2dSearch supports copying and pasting of multi-line, single line, multi-block and single block search strategies, so conventional and new ways of working can harmoniously co-exist, with clear interoperability between them.
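Multi-line strategies are often written as numbered lines in which later lines combine earlier ones by reference (for example `#1 AND #2`). As an illustration of how such a strategy could be converted into single-line Boolean form, here is a minimal sketch; the `#n` reference convention is assumed, and `flatten` is a hypothetical helper rather than 2dSearch’s import logic.

```python
import re
from typing import Dict, List

LINE_REF = re.compile(r"#(\d+)")


def flatten(lines: List[str]) -> str:
    """Expand a numbered multi-line strategy into one single-line Boolean string.

    Line n (counting from 1) may refer to any earlier line as #k; the last
    line is treated as the final query.
    """
    if not lines:
        raise ValueError("empty strategy")
    expanded: Dict[int, str] = {}
    for number, line in enumerate(lines, start=1):
        def substitute(match):
            ref = int(match.group(1))
            if ref not in expanded:
                raise ValueError(f"line {number} refers to #{ref}, which is not an earlier line")
            # Parenthesise the referenced line so operator precedence is preserved.
            return f"({expanded[ref]})"

        expanded[number] = LINE_REF.sub(substitute, line.strip())
    return expanded[len(lines)]


strategy = [
    "telemedicine OR telehealth OR telecare",
    "diabetes OR diabetic OR insulin",
    "#1 AND #2",
    "#3 OR telediabetes",
]
print(flatten(strategy))
# ((telemedicine OR telehealth OR telecare) AND (diabetes OR diabetic OR insulin)) OR telediabetes
```

Wrapping every substituted line in parentheses is deliberately conservative: it may add redundant brackets, but it never changes the meaning of the combined expression.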

In closing

2dSearch is a framework for search query formulation in which concepts are expressed as objects on a two-dimensional canvas. Transforming logical structure into physical structure mitigates many of the shortcomings of Boolean strings, eliminates many sources of syntactic error and makes the query semantics more transparent. Moreover, it offers new ways for search strategies to be validated, shared and made reproducible. By integrating with Google Scholar, we hope to offer a tool of immediate utility to anyone wishing to search the world’s scientific literature in a systematic manner.

In due course, we hope to undertake a formal, user-centric evaluation of the approach, and we welcome feedback of any sort. In the meantime, head on over to 2dSearch, and let us know what you think.