|
UBiQuati ESPY FULL CONTENT SEARCH
What is UBiQuati ESPY?
The Dictionary and Thesaurus describe ESPY as:
To catch sight of (something distant, partially hidden, or obscure);
glimpse. To perceive, especially barely or fleetingly (catch, descry, detect, discern, glimpse, spot, spy).
We chose the name primarily because it best fits the challenge of finding information in
mounds of documents and their content.
ESPY is a search engine that relies on text hidden within an
image. It does not matter whether the text is embedded in a Word document or in an
image file such as a .pdf. ESPY is composed of an
Indexer and a Search Engine.
ESPY Indexer and Search Engine
ESPY relies on OCR (Optical Character Recognition). Our TR OCR solution is a
conversion tool that reads binary (black and white) images and converts them
to text through a text recognition process.
All other documents, whether they are encapsulated
in a PDF or kept in their native form, contain a text layer.
ESPY searches through all documents that have a text layer.
ESPY Search Form
There are three sections to the input form.
Section One is the user-defined list box, with a choice of search
parameters and search types to be used.
Section Two is the Results section, which shows the database in which each result
was found, the folder name within UB and the file name. You will
notice a text area that displays
the first hit within the focused document.
Section Three displays the
PREVIEW of the requested hits, embedded in HTML with the hits highlighted. To
view the document in its native format (.doc, .xls, etc.), click on FILE VIEW. In the future (this feature is currently under
construction), the system will locate the requested search results within the
native document and highlight them.
Records that are scanned and recognized are saved as a .pdf file with hidden
text and are then placed into a sub-directory of other records.
In parallel, a timed event (the Indexer) within ESPY indexes these
documents. “Other records”
are any formatted records, whether a Word document, Excel spreadsheet or OCR
image. The Indexer produces a searchable index file that the ESPY search
engine reads to locate records in the sub-directory.
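As a rough illustration of what such an index makes possible, the sketch below builds a small inverted index (each word mapped to the set of file names that contain it) from text that has already been extracted. It is not ESPY's Indexer: the function names build_index and lookup, the tokenizer and the sample records are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """Map each word to the set of record names that contain it.

    `records` is a dict of {file name: extracted text}. In a real system
    the text would come from OCR or from the document's text layer.
    """
    index = defaultdict(set)
    for name, text in records.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def lookup(index, word):
    """Return the sorted names of the records that contain `word`."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical sample records, not taken from any real ESPY installation.
records = {
    "invoice_001.pdf": "Invoice for apple and pear shipment",
    "memo.doc": "Memo about the pear harvest",
}
index = build_index(records)
print(lookup(index, "pear"))   # ['invoice_001.pdf', 'memo.doc']
print(lookup(index, "apple"))  # ['invoice_001.pdf']
```

Because the index is built ahead of time, a search only has to look up words rather than reread every record.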
The ESPY Index Search screen, with its search options, is displayed below:
Figure 10.1: Index Search screen
displaying search options
The Search Criteria box is a user interface element in which the user enters
search words or phrases. Several
types of search are built into ESPY.
They are defined below.
Boolean
Boolean search: a group of words, phrases, or macros linked by connectors
such as AND and OR that indicate the relationship between them.
Examples:
Search Request          Meaning
apple and pear          both words must be present
apple or pear           either word can be present
apple w/5 pear          apple must occur within 5 words of pear
apple not w/5 pear      apple must occur, but not within 5 words of pear
apple and not pear      apple must be present; pear must not
name contains smith     the field name must contain smith
apple w/5 xfirstword    apple must occur in the first five words
apple w/5 xlastword     apple must occur in the last five words
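The connectors in the table can be pictured with a short sketch. The code below is an assumption-laden illustration, not ESPY's query engine: the helper names both, either and within are hypothetical, and "within n words" is taken here to mean that the word positions differ by at most n.

```python
import re

def words(text):
    """Lowercase the text and split it into a list of words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def both(text, a, b):
    """apple and pear: both words must be present."""
    tokens = set(words(text))
    return a.lower() in tokens and b.lower() in tokens

def either(text, a, b):
    """apple or pear: either word can be present."""
    tokens = set(words(text))
    return a.lower() in tokens or b.lower() in tokens

def within(text, a, b, n):
    """apple w/n pear: some `a` lies at most `n` word positions from some `b`."""
    tokens = words(text)
    pos_a = [i for i, tok in enumerate(tokens) if tok == a.lower()]
    pos_b = [i for i, tok in enumerate(tokens) if tok == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

doc = "Apple tart with fresh pear"
print(both(doc, "apple", "pear"))       # True
print(within(doc, "apple", "pear", 5))  # True: the words are 4 positions apart
print(within(doc, "apple", "pear", 2))  # False
```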
You can use variable term weighting in a search request to weight some words
more heavily than others when
ranking search results.
Example: apple:5 and pear:3
Search Types
Any words: use quotation marks around phrases, put + (plus) in front of any
word or phrase that is required, and
- (minus) in front of a word or phrase to exclude it.
Examples:
banana pear "apple pie"
"apple pie" -salad +"ice cream"
All words: like an "any words" search, except that all of the words in the
search request must be present for a
document to be retrieved. This
is like using “equal to”, which, depending on the documents searched,
may return nothing.
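The "any words" and "all words" rules can be sketched as follows. This is a minimal illustration under stated assumptions, not ESPY's parser: the names parse, contains and matches are hypothetical, and the sketch assumes that when a request has required (+) terms, the unmarked terms are optional.

```python
import re

# An optional + or - sign, followed by a quoted phrase or a single word.
TERM = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')

def normalize(text):
    """Lowercase the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def parse(query):
    """Split a request into required, excluded and optional terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TERM.findall(query):
        term = normalize(phrase or word)
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def contains(text, term):
    """True if the word or phrase occurs in `text` on word boundaries."""
    return f" {term} " in f" {normalize(text)} "

def matches(text, query, all_words=False):
    """Apply the "any words" rules, or the "all words" rules if requested."""
    required, excluded, optional = parse(query)
    if any(contains(text, t) for t in excluded):
        return False
    if all_words:
        # In an "all words" search every unmarked term becomes required.
        required = required + optional
    if not all(contains(text, t) for t in required):
        return False
    # With no required terms, at least one optional term must match.
    return bool(required) or any(contains(text, t) for t in optional)

query = 'banana pear "apple pie"'
print(matches("A ripe banana", query))                  # True
print(matches("A ripe banana", query, all_words=True))  # False
```

The second call shows why an "all words" search can return nothing where an "any words" search finds a document.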
Search Features
Wildcards
Use * to match any number of characters and ? to match any single character.
Stemming finds other grammatical forms of the words in your search request.
Example:
A search for applies would also find apply, applying or applied.
Fuzzy Search
Fuzzy search finds matches despite scanning and typographical errors. Fuzziness
can be adjusted from 1 to 10 depending on the
degree of misspelling to tolerate. A search for alphabet with a fuzziness of
1 would find alphaqet; with a fuzziness of 3, it
would find both alphaqet and alpkaqet.
Synonym
Synonym searching finds synonyms of a word that you include in a search
request. For example, a search for fast would also find quickly. To enable
synonym searching, check the Synonym search box in the search dialog box.
You can also enable synonym searching selectively by adding
the & character after certain words in your request.
Example:
improve& w/5 search
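To picture how the & marker selects words for expansion, here is a minimal sketch. It uses a tiny hypothetical in-memory thesaurus in place of WordNet, and the names THESAURUS, expand_term and expand_query are assumptions for illustration only.

```python
# Hypothetical thesaurus entries; the product itself draws on WordNet.
THESAURUS = {
    "fast": {"quick", "quickly", "rapid"},
    "improve": {"enhance", "better"},
}

def expand_term(term, thesaurus=THESAURUS):
    """Return the set of words that a request term should match.

    A trailing & marks the term for synonym expansion. Terms without
    the marker match only themselves.
    """
    if term.endswith("&"):
        word = term[:-1].lower()
        return {word} | thesaurus.get(word, set())
    return {term.lower()}

def expand_query(query, thesaurus=THESAURUS):
    """Expand each whitespace-separated term; connectors pass through unchanged."""
    return [sorted(expand_term(term, thesaurus)) for term in query.split()]

print(expand_query("improve& w/5 search"))
# [['better', 'enhance', 'improve'], ['w/5'], ['search']]
```

Only improve is expanded, because it is the only term carrying the & marker.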
UBESPY provides three ways to perform synonym searching:
Check Synonyms to find synonyms using the WordNet concept network included
with UBESPY.
Check Related Words to find related words from the WordNet concept network.
Check User synonyms to find synonyms that you have defined in your own
thesaurus.
User Synonym             To be added
WordNet Synonym          Current and included
WordNet Related Words    Current and included
Stemming
Stemming search finds other grammatical forms of the words in your search
request.
Phonic Search
Phonic search finds words that sound similar to words in your request, like
Smith and Smythe.
Natural Language Search
A natural language
search request consists of an unstructured natural
language or "plain English" query. In a natural language search request, words such as AND
and OR are disregarded. Any specific word ordering, such as phrases, is also
disregarded. A natural language search can
rank retrieved files from most to least relevant according to the density
and rarity of matching words in your documents.
Wild Card Search
Wildcard search: use * to match any number of characters
and ? to match any single character.
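The two wildcard characters behave like the ones in Python's standard fnmatch module, so a short sketch can show their effect. The helper name wildcard_matches and the sample vocabulary are assumptions. Note that fnmatch also treats square brackets as a character class, which this manual does not describe for ESPY.

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern, words):
    """Return the words that match `pattern`, ignoring case."""
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]

# Hypothetical vocabulary taken from an index.
vocabulary = ["apple", "apply", "applied", "appeal", "maple"]

print(wildcard_matches("appl*", vocabulary))  # ['apple', 'apply', 'applied']
print(wildcard_matches("appl?", vocabulary))  # ['apple', 'apply']
print(wildcard_matches("*ple", vocabulary))   # ['apple', 'maple']
```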
Searching using the Search Screen
Indexes to search
The top right of the form shows a list box labeled Search Criteria. Enter the words, phrases, etc. that you wish to look
for and click Search. Remember that
you can search with any search engine or combination of engines.
As you will see, results are displayed in ranked order (from 100, the
highest, down to 1). Notice the
second section, which displays the ranked results along with the first 20 words
of the first document. Now look at the third section for PREVIEW and
FILE VIEW. Notice that Preview highlights
your hits. The double arrows
allow the user to move forward and backward through the document.
If you decide to view the document in its original format, click the FILE VIEW
tab.
Note: your first search may take as long as 3 to 5 seconds; after that, searches
are instant. The reason is that the LAN/WAN server must first
communicate with the index server and locate the image directory.
Figure 10.2: PREVIEW and
FILE VIEW
Notice that the double arrows move the user through the document forwards
and backwards.
Remember that
clicking on the file tab moves the user from Preview to File View. |