Systematic searching using hit counts and error detection

Faceted search offers tremendous potential for transforming the search experience. It provides a flexible framework that can satisfy a wide variety of user needs, from simple fact retrieval to complex exploratory search. It is now the dominant interaction paradigm for many library sites and is increasingly applied to a wide variety of professional search applications and resources.

A key part of the faceted search experience is support for hit counts: by revealing the number of results associated with each facet value, hit counts act as numerical signposts that provide an overview of the information space and a preview to guide the user to regions of maximal productivity. In tasks predicated on discovery and exploration, hit counts play a pivotal role in the search experience.

But not all search tasks are based around serendipitous discovery and exploration. On the contrary, some search tasks require a highly structured approach, based on rigour and repeatability. Systematic literature review, for example, is predicated on the use of exhaustive methods with strict standards for reporting and reproducibility. In these instances, how can we provide a similar level of signposting and support? What approaches can we adopt to guide the user toward effective search strategies?

Hit counts as navigational signposts

One approach is to look to the past. For example, Anick et al. (1989) developed a system for creating structured queries as a set of movable tiles on a two-dimensional canvas. Crucially, the system displayed the number of hits in the lower left corner of each tile, allowing the user to optimise their query based on an understanding of the effects of each query component:


This idea is remarkably prescient. However, it would be misleading to portray this innovation as unique. On the contrary, the use of hit counts is quite routine in many professional search applications, and is offered as a standard feature of traditional query builder interfaces such as PubMed (labelled as ‘Items found’):


Presenting hit counts in this manner is clearly a useful source of insight. But it misses the opportunity to go one step further. Whereas Anick’s system allowed users to create structured queries through a process of exploration and optimisation, using the hit count feedback in an interactive manner, PubMed’s hit counts are little more than a static, historical record of search events. Like operating system commands in your console history, they represent what took place in a search session, but offer little scope for creative refinement, remixing or reuse.

Which is why we’re pleased to announce support this week for hit counts on 2dSearch, which means that when you’re searching PubMed, you get interactive feedback to support your work. Like Anick’s system, we display hit counts as numerical signposts that adorn objects on the canvas (in their upper right corner), providing insight into the effects of each individual component:


But crucially, and unlike conventional query builders, the user is at liberty to compose or decompose canvas elements in whatever manner they wish and to study the effects in real time as the results update. Like conceptual Lego blocks, these elements can be endlessly combined and recombined to explore different configurations, offering a more transparent approach to search strategy validation and optimisation.
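To make the idea of per-component hit counts concrete, here is a minimal sketch in Python. It is not how 2dSearch or PubMed computes its counts: the toy in-memory corpus, the `Term` and `Group` classes and the `hit_counts` helper are all assumptions made for illustration. It simply shows how a nested Boolean query can report a count for every component, not just for the query as a whole.

```python
from dataclasses import dataclass, field

# Toy corpus (illustrative data only): document id -> set of index terms.
CORPUS = {
    1: {"asthma", "children", "steroids"},
    2: {"asthma", "adults"},
    3: {"copd", "steroids"},
    4: {"asthma", "steroids"},
}


@dataclass
class Term:
    """A single search term (hypothetical stand-in for a canvas object)."""
    text: str


@dataclass
class Group:
    """A Boolean group of child nodes combined with AND or OR."""
    op: str
    children: list = field(default_factory=list)


def matches(node):
    """Return the set of document ids matched by a term or group."""
    if isinstance(node, Term):
        return {doc_id for doc_id, terms in CORPUS.items() if node.text in terms}
    child_sets = [matches(child) for child in node.children]
    if not child_sets:
        return set()
    if node.op == "AND":
        return set.intersection(*child_sets)
    if node.op == "OR":
        return set.union(*child_sets)
    raise ValueError(f"unsupported operator: {node.op}")


def label(node):
    """Render a node as a conventional Boolean string."""
    if isinstance(node, Term):
        return node.text
    return "(" + f" {node.op} ".join(label(child) for child in node.children) + ")"


def hit_counts(node):
    """Yield (label, hit count) for a node and every component beneath it."""
    yield label(node), len(matches(node))
    if isinstance(node, Group):
        for child in node.children:
            yield from hit_counts(child)


query = Group("AND", [Group("OR", [Term("asthma"), Term("copd")]), Term("steroids")])
counts = dict(hit_counts(query))
for text, count in counts.items():
    print(f"{count:>3}  {text}")
```

Moving a term from one group to another and calling `hit_counts` again gives the kind of before-and-after comparison described above. A real implementation would request counts from the database rather than from a local dictionary.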

Moreover, these strategies can be shared at the click of a button as reproducible, executable artefacts. If you prefer to use plain text as your favoured representation, that’s fine too - you can export (or indeed import) your work at any time as a traditional Boolean string. In fact, 2dSearch also supports copying and pasting multi-line, single line, multi-block and single block search strategies. So ‘old’ and ‘new’ ways of working can harmoniously co-exist, with a clear interoperability path between them.

Automatic error detection

Finally, did you notice the little orange exclamation marks in the group shown top left? If you thought those items were duplicates, you’d be correct. This example is actually taken from a published search strategy which is now part of a curated collection. Ironically, it seems this duplication is in fact an error, but presumably one that remained undetected throughout the review/curation process. However, when strategies such as this are opened using 2dSearch, duplicates are highlighted by default, and the user is at liberty to correct or ignore them as they prefer. We discovered this particular error more or less by accident, but it would be interesting to retrospectively apply this treatment to validate other published strategies, and see how they stand up to a similar degree of scrutiny.
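A duplicate check of this kind can be sketched in a few lines of Python. The rules 2dSearch applies are not described here, so the normalisation below (case folding and collapsing whitespace) and the function names are assumptions chosen for illustration.

```python
from collections import Counter


def normalise(term):
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def find_duplicates(terms):
    """Return the normalised terms that occur more than once, in first-seen order."""
    counts = Counter(normalise(term) for term in terms)
    return [term for term, n in counts.items() if n > 1]


# Illustrative group of terms, not taken from the published strategy.
group = ["asthma", "wheeze", "Asthma", "bronchial  spasm", "wheeze"]
duplicates = find_duplicates(group)
print(duplicates)
```

Flagging rather than silently removing the duplicates leaves the decision with the user, which matches the behaviour described above.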

In closing

If you were tasked with designing a framework for structured searching from scratch, it is unlikely that you would start from a command line paradigm. That approach reflects the days when searches were conducted using command line instructions to remote databases, and in that respect, it represents the past, not the future. Moreover, text-based Boolean strings suffer from a number of fundamental shortcomings, in particular regarding scalability, efficiency and transparency.

In this article, we have explored an alternative approach in which structured queries are expressed as objects on a two-dimensional canvas. In particular, we have focused on the use of hit counts as visual signposts to help optimise search strategies and the use of automated error checking to validate them. We have also discussed how strategies can be shared as executable artefacts, which contributes to their reproducibility while offering a clear migration path to and from traditional reporting approaches.

In due course, we hope to undertake a formal, user-centric evaluation, particularly in relation to traditional query builders and approaches. But for now, try out 2dSearch for yourself, and let us know what you think.

Tony Russell-Rose