DISQOVER User Manual

2. Start Your Search

A search with an ordinary search engine usually returns a large number of results, but it is often time-consuming and challenging to distill relevant information. In DISQOVER, the data is immediately shown within its context. Semantic data types help you guide your search in the right direction by picking types that fit the context of your search. In this way, you get smarter search results, faster.

Canonical types represent the context of a search and refer to underlying real-life concepts (active substances, diseases, genes, …). They are used to categorize searchable entities, which are called instances. Properties characterize instances. For example, the canonical type “Disease” contains the instances “Malaria” and “Anemia”. Properties such as “Prevalence” and “Causative agent” apply to all instances of the canonical type “Disease”, but their values are different for each disease. Every instance in DISQOVER belongs to at least one canonical type. If no specific canonical type is available, the instance is automatically classified under the “Uncategorized” canonical type.

2.1. The start screen

After login, you are directed to the start screen from which you can start your search. Then you are redirected to the dashboard environment (section 3), where you can view your search results (section 3.2) and refine or analyze them using widgets (section 3.3). Through linked data, you can then navigate further to related results.

The start screen or search page (Figure 2.1) contains the following elements:

  1. The general menu
    • Dashboard environment
    • Data
    • Collections
    • Notifications
    • Help and feedback
    • Settings
  2. Dashboard overview
    • Dashboard search bar
    • New search button
    • Existing dashboards
  3. Search bar
  4. Canonical types

Figure 2.1 The typical start of a new search.

The search page is highly customizable and can be adapted to different use cases. This means your search page can have a different layout (for example, Figure 2.2). You can find an overview of the customization options in section 4.1.

Figure 2.2 A customized search page.

2.2. How to search

2.2.1. Basic queries

To start a new search, enter your search term(s) into the search bar. When DISQOVER performs a search, it matches your search term against the instances and their properties and shows you which instances are hits. Alternatively, you can browse through all the data in DISQOVER by leaving the search field blank. Keywords in the search bar are automatically tokenized, which simplifies complex searches (see section 2.2.2).

Depending on the results found, the tiles containing the canonical types change color. The number within each tile indicates how many instances from that canonical type match your search term. If no instances belonging to a canonical type are found, that tile is grayed out. If your search term matches the instance name or one of its synonyms, the match is called a semantic hit, and the corresponding canonical type is highlighted. The number of semantic hits is shown in addition to the total number of instances found. If you click the round tile, you open a dashboard with all results within that canonical type. If you click the “exact hit” box underneath the tile, you open a dashboard with only semantic hits.

To continue your search, click the relevant canonical type.
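The distinction between semantic hits and plain text hits can be illustrated with a minimal sketch. It assumes that a semantic hit means a case-insensitive match of the whole search term against an instance name or synonym, and that any other occurrence of the term counts as a text hit. The dictionary layout and the function names `classify_hit` and `tile_counts` are hypothetical and are not part of DISQOVER.

```python
from collections import Counter


def classify_hit(term, instance):
    """Return "semantic", "text" or None for one instance (assumed rules)."""
    needle = term.strip().lower()
    labels = [instance["name"], *instance.get("synonyms", [])]
    # Assumption: a semantic hit is a whole-label match on name or synonym.
    if any(needle == label.lower() for label in labels):
        return "semantic"
    # Any other occurrence in a label or property value is a text hit.
    texts = labels + list(instance.get("properties", {}).values())
    if any(needle in text.lower() for text in texts):
        return "text"
    return None


def tile_counts(term, instances):
    """Count total hits and semantic hits per canonical type."""
    totals, semantic = Counter(), Counter()
    for instance in instances:
        kind = classify_hit(term, instance)
        if kind is None:
            continue
        for canonical_type in instance["canonical_types"]:
            totals[canonical_type] += 1
            if kind == "semantic":
                semantic[canonical_type] += 1
    return totals, semantic


# Illustrative records only; they do not reflect actual DISQOVER data.
records = [
    {"name": "Malaria", "canonical_types": ["Disease"]},
    {"name": "Cerebral malaria", "canonical_types": ["Disease"]},
    {"name": "Anemia", "canonical_types": ["Disease"]},
    {"name": "Study 1", "canonical_types": ["Clinical study"],
     "properties": {"title": "A vaccine trial against malaria"}},
]
totals, semantic = tile_counts("malaria", records)
```

With these illustrative records, the “Disease” tile would report two hits, one of which is semantic, while the “Clinical study” tile would report one text hit and no semantic hits.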

Example

Malaria is a disease, so you expect a semantic hit with the “Disease” canonical type. When performing the search, you do see that 1 out of the 239 results within the Disease tile is a semantic hit. On top of that, you see 10 semantic hits within the Project canonical type and 1 within Medicine and Chemical. The search term also has hits based purely on a text search within most canonical types. For example, 2.80k clinical studies refer to “malaria” in some way.

To see all clinical trials mentioning malaria, click the canonical type “Clinical trials”, which brings you to the dashboard.

2.2.2. Query fine-tuning

Synonyms

It is possible to expand a search with synonyms of the entered search term to achieve more robust and accurate search results. Suppose you want to find all the information about the concept “Aspirin”. Some documents may use the actual term “Aspirin”, but others may only use the more scientific term “Acetylsalicylic acid”. On the other hand, not all synonyms of a term have the same prevalence or relevance, while some may even cause ambiguity. DISQOVER contains an algorithm to score synonyms and avoid reduced precision caused by the addition of ambiguous synonyms, while still retaining sufficient recall by including the most relevant synonyms.

In order for synonyms to be added, two conditions must be met:

  1. Instances must have the same canonical type. For example, it does not make sense to merge the synonyms of the “DOG” gene and the animal “dog”. Instances only merge if there is at least one overlapping canonical type. Note that some instances have multiple canonical types: “Aspirin” is both a “Chemical” and an “Active Substance”.
  2. Instances must share a minimum number of synonyms. For example, the search term “ALS” yields 2 semantic hits in the “Chemical” canonical type. By definition, they share one synonym (“ALS”). Nevertheless, it is clear that Antilymphocyte Serum and Ammoniumlaurylsulfate are entirely unrelated chemicals which happen to share an abbreviation. This second rule prevents the merging of these instances into the same semantic concept. The minimum number of shared synonyms is set to 3 as this has proven to provide optimal results during calibration tests.
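The two conditions above can be sketched as a simple check. This is an illustration under stated assumptions, not DISQOVER's actual scoring algorithm: the `Instance` class and the helpers `make_instance` and `can_merge` are hypothetical, synonyms are compared case-insensitively, and the synonym sets below are shortened examples.

```python
from dataclasses import dataclass

MIN_SHARED_SYNONYMS = 3  # threshold quoted in the text above


@dataclass(frozen=True)
class Instance:
    name: str
    canonical_types: frozenset
    synonyms: frozenset  # stored lower-cased so the comparison ignores case


def make_instance(name, canonical_types, synonyms):
    """Build an Instance with normalized (lower-cased) synonyms."""
    return Instance(name, frozenset(canonical_types),
                    frozenset(s.lower() for s in synonyms))


def can_merge(a, b, min_shared=MIN_SHARED_SYNONYMS):
    """Condition 1: overlapping canonical type. Condition 2: enough shared synonyms."""
    shares_type = bool(a.canonical_types & b.canonical_types)
    shared_synonyms = a.synonyms & b.synonyms
    return shares_type and len(shared_synonyms) >= min_shared


# Shortened, illustrative synonym sets.
serum = make_instance("Antilymphocyte Serum", {"Chemical"},
                      {"ALS", "Antilymphocyte Serum"})
surfactant = make_instance("Ammoniumlaurylsulfate", {"Chemical"},
                           {"ALS", "Ammoniumlaurylsulfate"})
aspirin_a = make_instance("Aspirin", {"Chemical", "Active Substance"},
                          {"Aspirin", "Acetylsalicylic acid", "ASA"})
aspirin_b = make_instance("Acetylsalicylic acid", {"Chemical"},
                          {"aspirin", "acetylsalicylic acid", "asa",
                           "2-acetoxybenzoic acid"})

print(can_merge(serum, surfactant))    # False: only "als" is shared
print(can_merge(aspirin_a, aspirin_b))  # True: same type, three shared synonyms
```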

To select synonyms of your search term, click the plus sign next to the tokenized search term. You can choose which classes of synonyms to include by clicking the corresponding switch. You can select or deselect individual terms by clicking on their tag (Figure 2.3). When you have made your selection, you see the number of included synonyms next to the search term. In composite searches using query operators (see below), you can select synonyms for keywords separately.

Figure 2.3 Query expansion allows inclusion of synonyms of a search term. To perform query expansion, click the “+”-button next to a search term.

Often, synonyms are extensions or abbreviations of one another. In the case of Aspirin, some synonyms are Aspirin lysine, Aspirin sodium, Aspirin calcium, and Aspirin potassium. However, adding these generic drug names does not yield extra search hits: they are already found by searching for Aspirin. These synonyms are called shadowed synonyms. Showing them in the application could seriously clutter the overview of the synonyms, so they are hidden in the default user interface (Figure 2.4 and Figure 2.5).
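One way to picture shadowing is as token containment: a synonym is shadowed when every word of another, shorter synonym already occurs in it. The sketch below rests on that assumption, which the manual does not confirm as DISQOVER's actual rule, and the function name `split_shadowed` is hypothetical.

```python
def split_shadowed(synonyms):
    """Split synonyms into (visible, shadowed) lists, preserving input order.

    Assumption: a synonym is shadowed if the word set of some other synonym
    is a proper subset of its own word set.
    """
    tokenized = {s: set(s.lower().split()) for s in synonyms}
    visible, shadowed = [], []
    for synonym, tokens in tokenized.items():
        is_shadowed = any(
            other_tokens < tokens  # proper subset: strictly fewer words
            for other, other_tokens in tokenized.items()
            if other != synonym
        )
        (shadowed if is_shadowed else visible).append(synonym)
    return visible, shadowed


visible, shadowed = split_shadowed([
    "Aspirin",
    "Aspirin lysine",
    "Aspirin sodium",
    "Acetylsalicylic acid",
])
print(visible)   # ['Aspirin', 'Acetylsalicylic acid']
print(shadowed)  # ['Aspirin lysine', 'Aspirin sodium']
```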

To further optimize the synonym search, DISQOVER leverages its ability to bring together many data sources. The instance for Aspirin, for example, collects data from no fewer than 13 different databases, of which 8 contribute to the synonyms. In total, the public databases give 75 distinct synonyms, not all of which are equally relevant. By requiring synonyms to be present in at least two data sources (if multiple data sources are available), DISQOVER can separate the relevant from the less relevant synonyms. The most relevant synonyms are called the core set.

When testing this filter on the 500 most popular search terms, the resulting core set consisted of only 16% of all synonyms on average. Conversely, the number of search hits was only reduced by 1.7%. One of the reasons this reduction in recall is so low is that texts that use uncommon synonyms often use more common synonyms as well. You can inspect all synonyms of your search term, including the less relevant ones, by unchecking the “Core set” checkbox. You can override the default choice by selecting individual synonyms or by selecting everything by clicking the switch box for that semantic concept.
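The core set filter described above can be sketched as follows. This is a minimal illustration, assuming synonyms are compared case-insensitively and that an instance with a single data source keeps all of its synonyms. The function name `core_set` and the source names are hypothetical.

```python
from collections import Counter


def core_set(synonyms_by_source, min_sources=2):
    """Keep synonyms reported by at least `min_sources` data sources.

    `synonyms_by_source` maps a data source name to its set of synonyms.
    If fewer than `min_sources` sources are available, nothing is filtered.
    """
    counts = Counter()
    for synonyms in synonyms_by_source.values():
        # Normalize per source so a source counts each synonym only once.
        counts.update({s.lower() for s in synonyms})
    if len(synonyms_by_source) < min_sources:
        return set(counts)
    return {synonym for synonym, n in counts.items() if n >= min_sources}


# Illustrative data only.
sources = {
    "source_a": {"Aspirin", "Acetylsalicylic acid"},
    "source_b": {"aspirin", "ASA"},
    "source_c": {"Acetylsalicylic acid", "ASA", "Rarely used name"},
}
print(sorted(core_set(sources)))  # ['acetylsalicylic acid', 'asa', 'aspirin']
```

Unchecking the “Core set” checkbox corresponds to skipping this filter and showing every synonym.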

Figure 2.4 A search term that covers shadowed synonyms can be recognized by the arrow next to its name.

Figure 2.5 Clicking the arrow displays the shadowed synonyms.

Query operators

Within the search box, you can use the following query operators which give you the flexibility to define advanced searches. Figure 2.6 shows an example of a search using query operators.

  • OR: this logic finds all the instances that match with either of the entered search keywords.
  • AND: if two or more words are entered into the search bar separated by a space, the query uses AND logic by default. With AND logic, an instance only matches if all keywords match that single instance. For example, using plantar fasciitis as a search query results in 14 reported instances for the “Disease” canonical type, while using plantar OR fasciitis reports 488 “Disease” instances (exact numbers observed may vary due to updates of the data). Note that the keywords do not necessarily need to be present together in the same property value or instance name. Indeed, the two keywords may match separately within two different property values.
  • NOT: using NOT excludes any instance from the results that match with the keyword following the operator. For example, using cancer NOT lung finds all the instances that match with “cancer”, but do not match with “lung”.
  • Exact match: an exact match can be performed by using double quotation marks around two or more search terms. The keywords cannot match separately, as the search query forms one coherent unit that must be matched as is. “Lung cancer” has 170 matches within the “Disease” canonical type, while lung AND cancer has 407.
  • Tilde (~n): the tilde operator allows for an extension of the exact match where n number of words are allowed between search keywords. Thus “attention deficit disorder”~0 is the same as an exact match and “attention deficit disorder”~1 matches both Attention Deficit Disorder and Attention Deficit Hyperactivity Disorder.
  • Wildcards (? or *): two wildcard characters are supported: “*” and “?”. If you want to match one character exactly but it does not matter which one, you can use a question mark. For example, “rpo?” matches the genes rpoE, rpoC, rpoA,… To match any number of characters, use an asterisk instead. Thus, “rpo*” matches the genes rpoE, rpoC, rpoA,… but also rpoC1 and rpoC2.
  • Round brackets: when combining multiple query operators and keywords, the search is performed from left to right. Brackets can be used to group the different operations such that the operations within these groups are prioritized. For example, the query lung OR breast AND cancer finds “lung” and “breast cancer”, but (lung OR breast) AND cancer matches with both “lung cancer” and “breast cancer”.

Figure 2.6 A search that uses query operators.
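The behavior of several operators listed above can be approximated in a few lines. This sketch only illustrates the matching rules and is not DISQOVER's query engine. It assumes that an instance is a dictionary of property values, that text is split into lower-cased word tokens, and that exact and tilde matches are evaluated within a single property value. The names `tokenize`, `has` and `phrase_match` are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Split text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def has(instance, keyword):
    """Keyword match on any property value; '?' and '*' act as wildcards."""
    pattern = keyword.lower()
    return any(
        fnmatchcase(token, pattern)
        for value in instance.values()
        for token in tokenize(value)
    )


def _in_order(words, tokens, slop):
    """True if `words` occur in order with at most `slop` extra words between."""
    for start, token in enumerate(tokens):
        if token != words[0]:
            continue
        pos = start
        try:
            for word in words[1:]:
                pos = tokens.index(word, pos + 1)  # earliest next occurrence
        except ValueError:
            continue
        # Number of extra words between the first and the last keyword.
        if pos - start - (len(words) - 1) <= slop:
            return True
    return False


def phrase_match(instance, phrase, slop=0):
    """Exact match (slop=0) or tilde match (slop=n) within one property value."""
    words = tokenize(phrase)
    return bool(words) and any(
        _in_order(words, tokenize(value), slop) for value in instance.values()
    )


# Illustrative instances only.
heel = {"name": "Heel pain", "note": "plantar surface; may indicate fasciitis"}
adhd = {"name": "Attention Deficit Hyperactivity Disorder"}
gene = {"name": "rpoC1"}

print(has(heel, "plantar") and has(heel, "fasciitis"))            # True
print(phrase_match(heel, "plantar fasciitis"))                    # False
print(phrase_match(adhd, "attention deficit disorder", slop=0))   # False
print(phrase_match(adhd, "attention deficit disorder", slop=1))   # True
print(has(gene, "rpo?"), has(gene, "rpo*"))                       # False True
```

In this sketch, AND, OR and NOT map directly onto Python's `and`, `or` and `not` applied to `has(...)` results, so `cancer NOT lung` would read `has(x, "cancer") and not has(x, "lung")`, and explicit parentheses play the role of round brackets.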
