UBiQuati ESPY FULL CONTENT SEARCH

 

 

What is UBiQuati ESPY? The dictionary and thesaurus define the word espy as:

 

To catch sight of (something distant, partially hidden, or obscure); glimpse. To perceive, especially barely or fleetingly (catch, descry, detect, discern, glimpse, spot, spy).

We chose the name primarily because it best fits the challenge of finding information in mounds of documents and their content. ESPY is a search engine that can find text even when it is hidden within an image. It does not matter whether the text is in a Word document or in an image-based file such as a scanned .pdf.

 

ESPY is composed of an Indexer and a Search Engine.

  

ESPY Indexer and Search Engine

 

For scanned images, ESPY relies on OCR (Optical Character Recognition). Our TR OCR solution is the conversion tool: it reads binary (black-and-white) images and converts them to text through a character recognition process. All other documents, whether encapsulated in a PDF or kept in their native format, already contain a text layer. ESPY searches every document that has a text layer.

 

ESPY Search Form

 

The search form has three sections:

  • Section One is a user-defined list box for choosing the search parameters and search types to apply.
  • Section Two is the Results section, which shows the database where each result was found, the folder name within UB, and the file name. A text area displays the first hit within the selected document.
  • Section Three displays a PREVIEW of the requested hits, rendered as HTML with each hit highlighted. To see the document in its native format (.doc, .xls, etc.), click FILE VIEW. A future feature (currently under construction) will also locate and highlight the search results within the native document.

 

Records that are scanned and recognized are saved as .pdf files with hidden text and then placed in a sub-directory alongside other records. "Other records" are formatted records of any kind: Word documents, Excel spreadsheets, or OCR images. On a timed schedule, the ESPY Indexer indexes these documents and produces a searchable index file that the ESPY search engine reads to locate documents in the sub-directory.
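To illustrate what the indexer produces, here is a minimal Python sketch of an inverted index that records, for each word, the files containing it and the word positions within each file. It is not ESPY's index format. It assumes the text of each record has already been extracted (for scanned records, from the hidden OCR text layer), and the names `build_index`, `save_index`, and `lookup`, as well as the JSON file layout, are hypothetical. The timed scheduling of the indexer is left out.

```python
import json
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(documents: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each word to the files that contain it and the word positions
    where it occurs. Hypothetical structure, not ESPY's index file format."""
    index = defaultdict(dict)
    for path, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(path, []).append(position)
    return dict(index)

def save_index(index: dict[str, dict[str, list[int]]], index_path: str) -> None:
    """Write the index to a JSON file that a search process could load later."""
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)

def lookup(index: dict[str, dict[str, list[int]]], word: str) -> list[str]:
    """Return the sorted list of files containing `word`."""
    return sorted(index.get(word.lower(), {}))

# Example: two records in the same sub-directory (text already extracted).
index = build_index({
    "records/invoice.pdf": "Invoice for apple orders",
    "records/notes.doc": "Apple pear apple",
})
print(lookup(index, "Apple"))                # ['records/invoice.pdf', 'records/notes.doc']
print(index["apple"]["records/notes.doc"])   # [0, 2]
```

Keeping word positions, rather than only file names, is one common way to let an index answer proximity requests such as w/5 without rereading the documents.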

 

The ESPY Index Search screen and its search options are shown below:

 


 

Figure 10.1: Index Search screen displaying search options

 

The Search Criteria box is the user interface where you enter your search words or phrases. ESPY has several built-in search types, each defined below:

 

  • Boolean
  • Wildcards
  • Fuzzy Search
  • Synonym
  • Stemming
  • Phonic
  • Natural Language
  • Any Words
  • All Words

 

 

Boolean

A Boolean search request is a group of words, phrases, or macros linked by connectors, such as AND and OR, that indicate the relationship between them.

 

Examples:

Search Request          Meaning
apple and pear          Both words must be present.
apple or pear           Either word can be present.
apple w/5 pear          apple must occur within 5 words of pear.
apple not w/5 pear      apple must occur, but not within 5 words of pear.
apple and not pear      apple must be present and pear must not be present.
name contains smith     The field "name" must contain smith.
apple w/5 xfirstword    apple must occur in the first five words.
apple w/5 xlastword     apple must occur in the last five words.
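The connectors in the table above can be sketched over a tokenized document. The Python below is a simplified illustration, not ESPY's query parser. It assumes that "within 5 words" means the word positions differ by at most 5, treats xfirstword and xlastword as virtual positions just before the first word and just after the last word, and reads "apple not w/5 pear" as "at least one apple with no pear within 5 words of it". The helper names are hypothetical, and field searches such as name contains smith are not covered.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(tokens: list[str], word: str) -> list[int]:
    """Word positions of `word`. Assumption: xfirstword and xlastword are
    virtual positions just before the first word and just after the last."""
    if word == "xfirstword":
        return [-1]
    if word == "xlastword":
        return [len(tokens)]
    return [i for i, token in enumerate(tokens) if token == word]

def has(tokens: list[str], word: str) -> bool:
    return bool(positions(tokens, word))

def within(tokens: list[str], a: str, b: str, n: int) -> bool:
    """a w/n b: some occurrence of a is at most n words from some occurrence of b."""
    pos_b = positions(tokens, b)
    return any(abs(i - j) <= n for i in positions(tokens, a) for j in pos_b)

def not_within(tokens: list[str], a: str, b: str, n: int) -> bool:
    """a not w/n b (assumed reading): a occurs at least once with no b within n words."""
    pos_b = positions(tokens, b)
    return any(all(abs(i - j) > n for j in pos_b) for i in positions(tokens, a))

doc = tokenize("An apple a day keeps the doctor away, but a pear is fine too")
print(has(doc, "apple") and has(doc, "pear"))       # apple and pear        -> True
print(has(doc, "apple") and not has(doc, "pear"))   # apple and not pear    -> False
print(within(doc, "apple", "pear", 5))              # apple w/5 pear        -> False
print(not_within(doc, "apple", "pear", 5))          # apple not w/5 pear    -> True
print(within(doc, "apple", "xfirstword", 5))        # apple w/5 xfirstword  -> True
```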

 

You can use variable term weighting in a search request to weight some words more heavily than others in ranking search results.

 

Example: apple:5 and pear:3
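Term weighting can be pictured as multiplying each word's occurrence count by its weight. This Python sketch assumes that simple scoring rule (ESPY's actual ranking formula is not described here). It handles only the ranking part of a request joined by and/or, on the assumption that the Boolean filter has already chosen the documents, and it gives an unweighted term a default weight of 1. `parse_weights` and `weighted_score` are hypothetical names.

```python
from collections import Counter

def parse_weights(request: str) -> dict[str, int]:
    """Turn 'apple:5 and pear:3' into {'apple': 5, 'pear': 3}.
    Connectors are skipped; assumption: a term without :N gets weight 1."""
    weights: dict[str, int] = {}
    for part in request.lower().split():
        if part in ("and", "or"):
            continue
        word, _, weight = part.partition(":")
        weights[word] = int(weight) if weight else 1
    return weights

def weighted_score(tokens: list[str], weights: dict[str, int]) -> int:
    """Assumed scoring rule: sum of weight x occurrence count per weighted term."""
    counts = Counter(tokens)
    return sum(weight * counts[word] for word, weight in weights.items())

weights = parse_weights("apple:5 and pear:3")
doc_a = "apple apple pear".split()
doc_b = "pear pear pear apple".split()
print(weights)                         # {'apple': 5, 'pear': 3}
print(weighted_score(doc_a, weights))  # 5*2 + 3*1 = 13
print(weighted_score(doc_b, weights))  # 5*1 + 3*3 = 14
```

With these weights, a document that mentions apple twice and pear once still ranks below one that mentions pear three times and apple once.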

 

Search Types

Any words: use quotation marks around phrases, put + (plus) in front of any word or phrase that is required, and - (minus) in front of a word or phrase to exclude it.

 

Examples:

banana pear "apple pie"

"apple pie" -salad +"ice cream"

 

All words: works like an "any words" search, except that every word in the search request must be present for a document to be retrieved. Because this condition is strict, much like an "equal to" comparison, it may return no results, depending on the documents being searched.
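The any words and all words rules can be sketched as a small matcher. This Python sketch is an illustration only, with hypothetical names. It assumes that a document qualifies for an any words request when it contains every + term, none of the - terms, and, if there are no + terms, at least one of the remaining terms. In all words mode, every term not marked with - becomes required.

```python
import re

# Hypothetical request grammar: optional + or - prefix, then a "quoted phrase" or a bare word.
TERM = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')

def tokenize(text: str) -> tuple[str, ...]:
    """Lower-case word tokens as a tuple; punctuation is discarded."""
    return tuple(re.findall(r"[a-z0-9']+", text.lower()))

def parse_request(request: str):
    """Split a request into required (+), excluded (-) and optional terms.
    Each term is a tuple of words, so a quoted phrase stays together."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TERM.findall(request):
        term = tokenize(phrase or word)
        if not term:
            continue  # skip a bare + or - with no word after it
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def contains(tokens: tuple[str, ...], term: tuple[str, ...]) -> bool:
    """True if the words of `term` appear consecutively in `tokens`."""
    n = len(term)
    return any(tokens[i:i + n] == term for i in range(len(tokens) - n + 1))

def matches(text: str, request: str, all_words: bool = False) -> bool:
    tokens = tokenize(text)
    required, excluded, optional = parse_request(request)
    if all_words:
        # All words: every term not marked with - must be present.
        required, optional = required + optional, []
    if any(contains(tokens, t) for t in excluded):
        return False
    if not all(contains(tokens, t) for t in required):
        return False
    # Assumption: with no required terms, at least one optional term must appear.
    return bool(required) or any(contains(tokens, t) for t in optional)

request = '"apple pie" -salad +"ice cream"'
print(matches("We serve apple pie with ice cream", request))  # True
print(matches("Fruit salad with ice cream", request))         # False: salad is excluded
print(matches("banana, pear and apple pie", 'banana pear "apple pie"', all_words=True))  # True
```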

 

Search Features

 

Wildcards

Use * to match any number of characters and ? to match any single character.

Stemming

Stemming finds other grammatical forms of the words in your search request. For example, a search for applies would also find apply, applying, or applied.

 

Fuzzy Search

 

Fuzzy search finds words even when they contain scanning (OCR) or typographical errors. Fuzziness can be set from 1 to 10, depending on the degree of misspelling to tolerate. A search for alphabet with a fuzziness of 1 would find alphaqet; with a fuzziness of 3, it would find both alphaqet and alpkaqet.
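A common way to measure how far a misspelling is from a word is Levenshtein edit distance: the smallest number of single-character insertions, deletions, or substitutions needed to turn one into the other. This manual does not say how ESPY's 1 to 10 fuzziness scale maps to edit distance, so the Python sketch below simply uses a maximum edit distance as a stand-in. `edit_distance`, `fuzzy_matches`, and the `max_edits` parameter are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions, deletions
    or substitutions that turn a into b (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def fuzzy_matches(word: str, vocabulary: list[str], max_edits: int) -> list[str]:
    """Index words within `max_edits` edits of `word` (stand-in for fuzziness)."""
    word = word.lower()
    return sorted(v for v in vocabulary if edit_distance(word, v.lower()) <= max_edits)

vocabulary = ["alphabet", "alphaqet", "alpkaqet", "alpine"]
print(edit_distance("alphabet", "alphaqet"))     # 1
print(edit_distance("alphabet", "alpkaqet"))     # 2
print(fuzzy_matches("alphabet", vocabulary, 1))  # ['alphabet', 'alphaqet']
print(fuzzy_matches("alphabet", vocabulary, 2))  # ['alphabet', 'alphaqet', 'alpkaqet']
```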

 

Synonym

 

Synonym searching finds synonyms of a word that you include in a search request. For example, a search for fast would also find quickly. To enable synonym searching, check the Synonym search box in the search dialog box. You can also enable synonym searching selectively by adding the & character after certain words in your request. Example: improve& w/5 search

UBESPY provides three ways to perform synonym searching:

Check Synonyms to find synonyms using the WordNet concept network included with UBESPY.

Check Related Words to find related words from the WordNet concept network.

Check User synonyms to find synonyms that you have defined in your own thesaurus.

 

 

Synonym Source          Status
User synonyms           To be added
WordNet synonyms        Current and included
WordNet related words   Current and included
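Synonym searching can be pictured as rewriting the request before it runs. The Python sketch below expands each word marked with & (or every word, when all-word expansion is switched on) into an OR group, using a small user thesaurus. The thesaurus contents, the function name, and the parenthesized output form are hypothetical; the sketch does not use WordNet and makes no claim about how ESPY rewrites requests internally.

```python
# Hypothetical user thesaurus; WordNet synonyms and related words are not modelled here.
USER_THESAURUS = {
    "fast": ["quick", "quickly", "rapid"],
    "improve": ["enhance", "better"],
}

def expand_synonyms(request: str, thesaurus: dict[str, list[str]],
                    expand_all: bool = False) -> str:
    """Rewrite words marked with & (or every word when expand_all is True)
    as an OR group of the word and its synonyms. Illustrative output form."""
    expanded = []
    for token in request.split():
        marked = token.endswith("&")
        word = token.rstrip("&")
        synonyms = thesaurus.get(word.lower(), [])
        if (marked or expand_all) and synonyms:
            expanded.append("(" + " or ".join([word, *synonyms]) + ")")
        else:
            expanded.append(word)
    return " ".join(expanded)

print(expand_synonyms("improve& w/5 search", USER_THESAURUS))
# (improve or enhance or better) w/5 search
print(expand_synonyms("fast search", USER_THESAURUS, expand_all=True))
# (fast or quick or quickly or rapid) search
```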

 

Stemming 

 

Stemming Search finds other grammatical forms of the words in your search request.
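To show the idea behind stemming, the Python sketch below uses a deliberately tiny suffix-stripping rule set (ies and ied become y; ing, ed, and s are removed). It is a toy, not ESPY's stemmer and not a full algorithm such as the Porter stemmer, which handles far more cases. `stem`, `stem_matches`, and the rule list are hypothetical; the sample vocabulary reproduces the applies example from the Search Features section.

```python
# Toy suffix rules, tried in order; the first one that applies wins (not Porter).
RULES = [("ies", "y"), ("ied", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def stem(word: str) -> str:
    """Reduce a word to a crude stem by stripping one common English suffix."""
    w = word.lower()
    for suffix, replacement in RULES:
        if suffix == "s" and w.endswith("ss"):
            continue  # keep words such as "glass" intact
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)] + replacement
    return w

def stem_matches(word: str, vocabulary: list[str]) -> list[str]:
    """Index words that share a stem with `word`."""
    target = stem(word)
    return sorted(v for v in vocabulary if stem(v) == target)

vocabulary = ["apple", "applied", "applies", "apply", "applying", "supply"]
print(stem_matches("applies", vocabulary))  # ['applied', 'applies', 'apply', 'applying']
```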

 

 

Phonic Search

 

Phonic Search finds words that sound similar to words in your request, like Smith and Smythe.
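This manual does not say which phonic algorithm ESPY uses. As a well-known example of the technique, the Python sketch below implements American Soundex, which codes a word as its first letter plus three digits for the consonant sounds that follow; Smith and Smythe both code to S530. `soundex` and `sounds_like` are illustrative names, and Soundex is a stand-in, not necessarily ESPY's method.

```python
import string

# Soundex digit for each coded consonant; vowels, h, w and y get no digit.
_CODES = {letter: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for letter in letters}

def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Smith -> S530."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = _CODES.get(letter, "")
        if digit and digit != previous:
            result += digit
            if len(result) == 4:
                break
        if letter not in "hw":  # h and w do not separate equal codes; vowels do
            previous = digit
    return result.ljust(4, "0")

def sounds_like(a: str, b: str) -> bool:
    return soundex(a) == soundex(b)

print(soundex("Smith"), soundex("Smythe"))  # S530 S530
print(sounds_like("Smith", "Smythe"))       # True
```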

 

Natural Language Search

 

A natural language search request is an unstructured, "plain English" query. In a natural language search, words such as AND and OR are disregarded, and so is word order, including phrases. A natural language search can rank the retrieved files from most to least relevant according to the density and rarity of the matching words in your documents.
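Ranking by density and rarity is commonly done with TF-IDF: how often a term occurs within a document, relative to its length (density), multiplied by how few documents in the collection contain it (rarity). The Python sketch below is one TF-IDF variant, offered as an assumption rather than ESPY's actual formula. It drops and, or, and not from the query and treats the query as a set of words, so word order is ignored. The function and sample file names are hypothetical.

```python
import math
import re
from collections import Counter

CONNECTORS = {"and", "or", "not"}  # disregarded in natural language requests

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

def rank(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Assumed TF-IDF ranking: density (term count / document length) times
    rarity (smoothed inverse document frequency), summed over query terms."""
    terms = set(tokenize(query)) - CONNECTORS  # a set, so word order is ignored
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    word_sets = {name: set(tokens) for name, tokens in tokenized.items()}
    n_docs = len(documents)
    results = []
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in terms:
            if counts[term] == 0:
                continue
            doc_freq = sum(1 for words in word_sets.values() if term in words)
            density = counts[term] / len(tokens)
            rarity = math.log((1 + n_docs) / (1 + doc_freq)) + 1
            score += density * rarity
        if score > 0:
            results.append((name, score))
    return sorted(results, key=lambda item: item[1], reverse=True)

documents = {
    "a.txt": "scanned invoices for apple orders",
    "b.txt": "apple apple apple pear",
    "c.txt": "pear pricing",
}
print([name for name, _ in rank("apple and pear", documents)])  # ['b.txt', 'c.txt', 'a.txt']
```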

 

Wild Card Search

 

Wildcard search: use * to match any number of characters and ? to match any single character.
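One way an index-based engine can handle a wildcard is to match the pattern against the words stored in its index. The Python sketch below (hypothetical names, not ESPY's implementation) translates * into "any run of characters" and ? into "exactly one character", escapes every other character so it is matched literally, and requires the whole word to match.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * (any number of characters) and ? (exactly one character)
    into a regular expression; all other characters are matched literally."""
    parts = []
    for ch in pattern.lower():
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def expand_wildcard(pattern: str, vocabulary: list[str]) -> list[str]:
    """Index words that the wildcard pattern matches in full."""
    regex = wildcard_to_regex(pattern)
    return sorted(w for w in vocabulary if regex.fullmatch(w.lower()))

vocabulary = ["ample", "apple", "applied", "apply", "maple"]
print(expand_wildcard("appl*", vocabulary))  # ['apple', 'applied', 'apply']
print(expand_wildcard("appl?", vocabulary))  # ['apple', 'apply']
print(expand_wildcard("*ple", vocabulary))   # ['ample', 'apple', 'maple']
```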

 

 

Searching using the Search Screen

 

Indexes to search

 

The top right of the form shows a list box labeled with the search criteria. Enter the words or phrases you want to find and click Search. Remember that you can use any search type, or a combination of search types.

 

Results are displayed in ranked order, from 100 (highest) down to 1. The second section displays the ranked results along with the first 20 words of the first document. The third section contains the PREVIEW and FILE VIEW tabs; PREVIEW highlights your hits. Click the forward and backward double arrows to move forward and backward through the document.
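This manual does not say how raw relevance scores become the 100-to-1 scale. One simple possibility, sketched here in Python as an assumption, is to scale every score linearly against the best hit and clamp the result to at least 1. `to_percent_ranks` and the sample file names are hypothetical.

```python
def to_percent_ranks(scores: dict[str, float]) -> list[tuple[str, int]]:
    """Assumed scaling: best hit shows 100, every hit shows at least 1,
    listed from highest to lowest."""
    if not scores:
        return []
    best = max(scores.values())
    ranked = []
    for name, score in scores.items():
        percent = round(100 * score / best) if best > 0 else 1
        ranked.append((name, max(1, percent)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

print(to_percent_ranks({"a.pdf": 0.01, "b.doc": 4.0, "c.xls": 2.0}))
# [('b.doc', 100), ('c.xls', 50), ('a.pdf', 1)]
```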

 

To view the original formatted document, click the FILE VIEW tab.

 

 

Note: your first search may take 3 to 5 seconds; after that, searches are instant. The delay occurs because the LAN/WAN server must first establish communication with the index server and the image directory.

 

 

Figure 10.2: PREVIEW and FILE VIEW

 


 

The double arrows move the user forward and backward through the document.

 

 


Figure: FILE VIEW

 

Remember: clicking the FILE VIEW tab switches the display from PREVIEW to FILE VIEW.