2. Start Your Search
A search with an ordinary search engine usually returns a large number of results, but it is often time-consuming and challenging to distill relevant information. In DISQOVER, the data is immediately shown within its context. Semantic data types help you guide your search in the right direction by picking types that fit the context of your search. In this way, you get smarter search results, faster.
Canonical types represent the context of a search and refer to underlying real-life concepts (active substances, diseases, genes, …). They are used to categorize searchable entities, which are called instances. Properties characterize instances. For example, the canonical type “Disease” contains the instances “Malaria” and “Anemia”. Properties such as “Prevalence” and “Causative agent” apply to all instances of the canonical type “Disease”, but their values are different for each disease. Every instance in DISQOVER belongs to at least one canonical type. If no specific canonical type is available, the instance is automatically classified under the “Uncategorized” canonical type.
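The relationship between canonical types, instances, and properties can be pictured as a small data model. The sketch below is illustrative only: the class name `Instance`, its fields, and the property value are assumptions made for this example and do not reflect DISQOVER's internal schema.

```python
from dataclasses import dataclass, field

UNCATEGORIZED = "Uncategorized"


@dataclass
class Instance:
    """A searchable entity (hypothetical model, not DISQOVER's schema)."""

    name: str
    canonical_types: set[str] = field(default_factory=set)
    properties: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every instance belongs to at least one canonical type, so fall
        # back to "Uncategorized" when no specific type was supplied.
        if not self.canonical_types:
            self.canonical_types = {UNCATEGORIZED}


# Illustrative data: the property value is an example, not DISQOVER content.
malaria = Instance("Malaria", {"Disease"}, {"Causative agent": "Plasmodium parasites"})
unclassified = Instance("Some unclassified record")

print(malaria.canonical_types)       # {'Disease'}
print(unclassified.canonical_types)  # {'Uncategorized'}
```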
2.1. The start screen
After login, you are directed to the start screen from which you can start your search. Then you are redirected to the dashboard environment (section 3), where you can view your search results (section 3.2) and refine or analyze them using widgets (section 3.3). Through linked data, you can then navigate further to related results.
The start screen or search page (Figure 2.1) contains the following elements:
- The general menu
  - New search
  - Dashboard environment
  - Data
  - Collections
  - Notifications
  - Help and feedback
  - Settings
- Search bar
- Canonical types
The search page is highly customizable and can be adapted to different use cases. This means your search page can have a different layout (for example, Figure 2.2). If a search page template is global, you can share it with others via a link by clicking the share icon. You can find an overview of the customization options in section 4.1.
2.2. How to search
2.2.1. Basic queries
To start a new search, enter your search term(s) into the search bar. When DISQOVER performs a search, it matches your search term to the instances and instance properties and shows you which instances are a hit. Alternatively, you can browse through all the data in DISQOVER by leaving the search field blank. Note that keywords in the search bar are automatically tokenized, which simplifies complex searches (see section 2.2.3).
Depending on the results found, the tiles containing the canonical types change color. The numbers within each tile indicate how many instances from that canonical type match your search term. If no instances belonging to a canonical type are found, that tile is grayed out. If your search term matches the instance name or one of its synonyms, it is called a semantic hit, and the corresponding canonical type is highlighted. The number of semantic hits is shown in addition to the total number of instances found. If you click the round tile, you open a dashboard with all results within that canonical type. If you click the “exact hit” box underneath the tile, you open a dashboard with only semantic hits.
To continue your search, click the relevant canonical type.
Example
Malaria is a disease, so you expect a semantic hit with the “Disease” canonical type. When performing the search, you do see that 1 out of the 238 results within the Disease tile is a semantic hit. On top of that, you see 10 semantic hits within the Project canonical type and 1 within Medicine and Chemical. The search term also has hits based purely on a text search within most canonical types. For example, 5.59k clinical studies refer to “malaria” in some way.
To see all clinical trials mentioning malaria, click the canonical type “Clinical trials”, which brings you to the dashboard.
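The difference between a semantic hit and a plain text hit can be sketched as follows. This is a simplified illustration, not DISQOVER's matching logic: it assumes that a semantic hit is a case-insensitive exact match on the instance name or a synonym, and that a text hit is a substring match anywhere in the name, synonyms, or property values. The function names and the sample data are hypothetical.

```python
from typing import Optional


def classify_hit(term, name, synonyms, property_values) -> Optional[str]:
    """Return "semantic", "text", or None for a single instance."""
    needle = term.casefold()
    labels = [name, *synonyms]
    # Assumption: a semantic hit is an exact match on the name or a synonym.
    if any(needle == label.casefold() for label in labels):
        return "semantic"
    # Assumption: a text hit is a substring match in any searchable text.
    searchable = labels + list(property_values)
    if any(needle in text.casefold() for text in searchable):
        return "text"
    return None


def tile_counts(term, instances):
    """Return (total hits, semantic hits) for the instances of one tile."""
    kinds = [classify_hit(term, *instance) for instance in instances]
    total = sum(kind is not None for kind in kinds)
    return total, kinds.count("semantic")


# Hypothetical instances: (name, synonyms, property values).
diseases = [
    ("Malaria", ["Paludism"], []),
    ("Anemia", [], ["May occur as a complication of malaria"]),
    ("Asthma", [], []),
]
print(tile_counts("malaria", diseases))  # (2, 1)
```

In this toy data set, the tile would show two results, one of which is a semantic hit.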
2.2.2. Synonyms
It is possible to expand a search with synonyms of the entered search term to achieve more robust and accurate search results. Suppose you want to find all the information about the concept “Aspirin”. Some documents may use the actual term “Aspirin”, but others may only use the more scientific term “Acetylsalicylic acid”. On the other hand, not all synonyms of a term have the same prevalence or relevance, while some may even cause ambiguity. DISQOVER contains an algorithm to score synonyms and avoid reduced precision caused by the addition of ambiguous synonyms, while still retaining sufficient recall by including the most relevant synonyms.
In order for synonyms to be added, two conditions must be met:
- Instances must share a canonical type. For example, it does not make sense to merge the synonyms of the “DOG” gene and the animal “dog”. Instances only merge if there is at least one overlapping canonical type. Note that some instances have multiple canonical types: “Aspirin” is both a “Chemical” and an “Active Substance”.
- Instances must share a minimum number of synonyms. For example, the search term “ALS” yields 2 semantic hits in the “Chemical” canonical type. By definition, they share one synonym (“ALS”). Nevertheless, it is clear that Antilymphocyte Serum and Ammoniumlaurylsulfate are entirely unrelated chemicals which happen to share an abbreviation. This second rule prevents the merging of these instances into the same semantic concept. The minimum number of shared synonyms is set to 3 as this has proven to provide optimal results during calibration tests.
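The two conditions above can be expressed as a single check. The following is a minimal sketch under stated assumptions, not DISQOVER's actual scoring algorithm: the function name `can_merge` is hypothetical, synonyms are compared case-insensitively (an assumption), and the sample synonym sets are made up for illustration. Only the threshold of 3 shared synonyms comes from the text.

```python
MIN_SHARED_SYNONYMS = 3  # threshold stated in the text


def can_merge(types_a, synonyms_a, types_b, synonyms_b,
              min_shared=MIN_SHARED_SYNONYMS) -> bool:
    """Check both merge conditions for two instances."""
    # Condition 1: at least one overlapping canonical type.
    shares_type = bool(set(types_a) & set(types_b))
    # Condition 2: a minimum number of shared synonyms
    # (case-insensitive comparison is an assumption of this sketch).
    shared = ({s.casefold() for s in synonyms_a}
              & {s.casefold() for s in synonyms_b})
    return shares_type and len(shared) >= min_shared


# The two "ALS" chemicals share a type but only one synonym: no merge.
print(can_merge({"Chemical"}, {"ALS", "Antilymphocyte Serum"},
                {"Chemical"}, {"ALS", "Ammoniumlaurylsulfate"}))  # False

# Two hypothetical aspirin records sharing three synonyms: merge.
print(can_merge({"Chemical", "Active Substance"},
                {"Aspirin", "Acetylsalicylic acid", "ASA"},
                {"Chemical"},
                {"aspirin", "acetylsalicylic acid", "ASA", "Aspirin sodium"}))  # True
```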
If you want to select synonyms of your search term, click the plus sign next to the tokenized search term. You can choose which classes of synonyms to include by clicking the corresponding switch. You can select or deselect individual terms by clicking on their tag (Figure 2.3). When you have made your selection, you see the number of included synonyms next to the search term. In composite searches using query operators (see below), you can select synonyms for keywords separately.
Often, synonyms are extensions or abbreviations of one another. In the case of Aspirin, some synonyms are Aspirin lysine, Aspirin sodium, Aspirin calcium, and Aspirin potassium. However, adding these generic drug names does not yield extra search hits: they are already found by searching for Aspirin. These synonyms are called shadowed synonyms. Visualizing them in the application might seriously clutter the overview of the synonyms, and therefore, they are hidden in the default user interface (Figure 2.4 and Figure 2.5).
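One way to picture shadowed synonyms is as synonyms that contain another synonym as a complete word sequence. The sketch below uses that definition as an assumption; DISQOVER's own rule may differ, and the function names are hypothetical.

```python
def tokens(text):
    """Split a synonym into lowercase words."""
    return text.casefold().split()


def contains_phrase(haystack, needle):
    """True if the word list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def split_shadowed(synonyms):
    """Return (visible, shadowed) synonyms, preserving input order."""
    visible, shadowed = [], []
    for candidate in synonyms:
        words = tokens(candidate)
        # Assumption: a synonym is shadowed when a different synonym
        # appears inside it as a whole-word phrase.
        is_shadowed = any(
            tokens(other) != words and contains_phrase(words, tokens(other))
            for other in synonyms
        )
        (shadowed if is_shadowed else visible).append(candidate)
    return visible, shadowed


visible, shadowed = split_shadowed(
    ["Aspirin", "Aspirin lysine", "Aspirin sodium", "Acetylsalicylic acid"]
)
print(visible)   # ['Aspirin', 'Acetylsalicylic acid']
print(shadowed)  # ['Aspirin lysine', 'Aspirin sodium']
```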
To further optimize the synonym search, DISQOVER leverages its ability to bring together many data sources. The instance for Aspirin, for example, collects data from no fewer than 13 different databases, of which 8 contribute to the synonyms. In total, the public databases give 75 distinct synonyms, not all of which are equally relevant. By requiring synonyms to be present in at least two data sources (if multiple data sources are available), DISQOVER can separate the relevant from the less relevant synonyms. The most relevant synonyms are called the core set.
When testing this filter on the 500 most popular search terms, the resulting core set consisted of only 16% of all synonyms on average. Even so, the number of search hits was only reduced by 1.7%. One of the reasons this reduction in recall is so low is that texts that use uncommon synonyms often use more common synonyms as well. You can inspect all synonyms of your search term, including the less relevant ones, by unchecking the “Core set” checkbox. You can override the default choice by selecting individual synonyms or by selecting everything by clicking the switch box for that semantic concept.
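The core-set rule (a synonym must appear in at least two data sources when more than one source is available) can be sketched as a simple count. This is an illustration under assumptions rather than DISQOVER's implementation: the function name `core_set`, the source names, and the behavior when only one source contributes synonyms (keep everything) are choices made for this example.

```python
from collections import Counter


def core_set(synonyms_by_source, min_sources=2):
    """Keep synonyms that are confirmed by at least `min_sources` sources."""
    contributing = [syns for syns in synonyms_by_source.values() if syns]
    if len(contributing) < 2:
        # Assumption: with a single source there is nothing to
        # cross-check, so all of its synonyms are kept.
        return set().union(*contributing)
    counts = Counter(syn for syns in contributing for syn in syns)
    return {syn for syn, n in counts.items() if n >= min_sources}


# Hypothetical sources and synonym lists.
sources = {
    "source_a": {"Aspirin", "Acetylsalicylic acid", "ASA"},
    "source_b": {"Aspirin", "Acetylsalicylic acid"},
    "source_c": {"Aspirin", "Rarely used trade name"},
}
print(sorted(core_set(sources)))  # ['Acetylsalicylic acid', 'Aspirin']
```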
2.2.3. Query operators
Within the search box, you can use the following query operators, which give you the flexibility to define advanced searches. Figure 2.7 shows an example of a search using query operators.
- Wildcard ?: If you want the last character of your search term to match one character but it does not matter which one, you can use a question mark. For example, “rpo?” matches the genes rpoE, rpoC, rpoA,…
- Wildcard *: Place an asterisk at the end to match anything that starts with the given characters. For example, “rpo*” matches the genes rpoE, rpoC, rpoA,… but also rpoC1 and rpoC2.
- Quotes “”: An exact match can be performed by using double quotation marks around two or more search terms. The keywords cannot match separately, as the search query forms one coherent unit that must be matched as is. “Lung cancer” has 170 matches within the “Disease” canonical type, while lung AND cancer has 407.
- Proximity ~n: The tilde operator allows for an extension of the exact match where up to n words are allowed between search keywords. Thus “attention deficit disorder”~0 is the same as an exact match and “attention deficit disorder”~1 matches both “Attention Deficit Disorder” and “Attention Deficit Hyperactivity Disorder”.
- Brackets ( ): Brackets can be used to override the default operator precedence, in which AND is evaluated before OR. For example, the query lung OR breast AND cancer finds “lung” and “breast cancer”, but (lung OR breast) AND cancer matches with both “lung cancer” and “breast cancer”.
- NOT: Using NOT excludes from the results any instance that matches the keyword following the operator. For example, using cancer NOT lung finds all the instances that match with “cancer”, but do not match with “lung”.
- Spaced: If two or more search terms are entered into the search bar separated by a space, they are treated as if there were an AND keyword in between.
- AND: Using AND logic results in a query where instances only match if all keywords match the same instance. For example, using plantar fasciitis as a search query results in 14 reported instances for the “Disease” canonical type while using plantar OR fasciitis reports 488 “Disease” instances (exact numbers observed may vary due to updates of the data). Note that the keywords do not necessarily need to be present together in the same property value or instance name. Indeed, the two keywords may match separately within two different property values.
- OR: This logic finds all the instances that match with either of the entered search keywords.
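The behavior of the wildcard, proximity, and bracket operators listed above can be imitated in a few lines of Python. This is only a sketch for building intuition and is not how DISQOVER evaluates queries: it assumes case-insensitive, whitespace-based tokenization, and it interprets ~n as "the keywords appear in order with at most n extra words in between". All function names are hypothetical.

```python
import re


def wildcard_match(pattern: str, token: str) -> bool:
    """Match one token against a pattern containing ? and * wildcards."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one arbitrary character
        elif ch == "*":
            parts.append(".*")   # any run of characters, possibly empty
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), token, re.IGNORECASE) is not None


def proximity_match(phrase: str, text: str, n: int) -> bool:
    """True if the keywords occur in order with at most n extra words."""
    keywords = phrase.casefold().split()
    words = text.casefold().split()
    if not keywords:
        return False
    for start, word in enumerate(words):
        if word != keywords[0]:
            continue
        pos = start
        try:
            # Find each following keyword at its earliest later position.
            for keyword in keywords[1:]:
                pos = words.index(keyword, pos + 1)
        except ValueError:
            continue
        extra_words = (pos - start + 1) - len(keywords)
        if extra_words <= n:
            return True
    return False


def has(word: str, text: str) -> bool:
    return word in text.casefold().split()


def default_precedence(text: str) -> bool:
    # lung OR breast AND cancer is read as lung OR (breast AND cancer)
    return has("lung", text) or (has("breast", text) and has("cancer", text))


def with_brackets(text: str) -> bool:
    # (lung OR breast) AND cancer
    return (has("lung", text) or has("breast", text)) and has("cancer", text)


print(wildcard_match("rpo?", "rpoC1"))  # False: ? stands for one character
print(wildcard_match("rpo*", "rpoC1"))  # True
print(proximity_match("attention deficit disorder",
                      "Attention Deficit Hyperactivity Disorder", 1))  # True
print(default_precedence("lung"), with_brackets("lung"))  # True False
```

The last line shows why brackets matter: a text that only mentions “lung” satisfies the unbracketed query but not the bracketed one.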
Search query problems
When the search query contains a syntax error, a warning icon appears. When you hover over the icon, an explanation of the problem is displayed. If the explanation is not clear, please contact support.