ArgusSearch

Introduction:

Planet's magic full-text search engine ArgusSearch is based on brand-new Artificial Intelligence technology (patent pending) and on Planet's outstanding, award-winning handwriting recognition technology. It allows us to find matches for almost any search request within billions of ancient documents in seconds!

The outstanding features in a short overview:

  • no manual referencing (transcription) of the documents is necessary,
    therefore all scanned documents are accessible

  • super fast full text search engine

  • allows almost arbitrary requests (Regular Expressions are supported)

  • provides a list of matches sorted by confidence

 

In fact, the first point is probably the most exciting one. To make use of this search engine, content providers only need to scan the historic documents; everything else is done by our ArgusSearch-Bots in fully automatic mode. As they read these documents, the bots create a universal reference for all handwritten or machine-printed texts, and thus a complex and dynamic memory waiting for your search requests.

The real magic happens when you start asking our system for results. You can use simple strings as well as complex search expressions (Regular Expressions). Unlike a standard string matching algorithm, however, our system finds matching text areas with instinctive certainty (and sometimes even with a little bit of fantasy). All matches are returned as a list sorted by confidence (matching probabilities expressed as score values).

 

The Demonstrator:
To give a first impression of the look and feel of ArgusSearch, we developed a first prototype based on genealogical documents.
This demonstrator comes with the following limitations:

  • only a limited sample of 1930 US Census data was used (approx. 5,000 pages)

  • only the name fields are referenced by our ArgusSearch-Bots

  • all requests are case-sensitive (please pay attention to the spelling)

  • only a limited result list is provided (at most the 50 best matches, and only those with reasonable score values)

 

Nevertheless, you can explore the power of requests based on Regular Expressions and try out many different search requests. For some ideas, take a look at the HowTo section below.

Please be creative and play around with your own ideas; requests with few matches can be especially interesting. Since the search technology is NOT based on any previous transcription of the texts, you will experience some surprising effects:

  • hardly anything is missed: even nearly unreadable texts will be found

  • especially for exotic requests, the system tries hard to find matches, sometimes even with some fantasy (e.g. try “Caplaw Wolf” and “Leaplaw Wolf”: you will see the same result)

 

HINT: Check the score values provided with each result. Every result is presented with a “cost” value: the lower the cost, the better the match. Costs above 1.0 already indicate an unreliable match; the higher the cost, the more fantasy the system used to find the match.
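
The cost rule can be applied mechanically to a result list. The following Python sketch is only an illustration: the Match record, its field names and the sample values are hypothetical and not part of ArgusSearch. Only the threshold of 1.0 and the limit of 50 matches are taken from the text above.

```python
from dataclasses import dataclass

RELIABLE_COST = 1.0   # from the hint: costs above 1.0 indicate an unreliable match
MAX_RESULTS = 50      # the demonstrator returns at most 50 matches


@dataclass
class Match:
    """Hypothetical result record; the real result format is not documented here."""
    text: str
    cost: float


def rank_matches(matches):
    """Sort by ascending cost (best match first) and keep at most MAX_RESULTS."""
    return sorted(matches, key=lambda m: m.cost)[:MAX_RESULTS]


def is_reliable(match):
    """A match counts as reliable when its cost does not exceed the threshold."""
    return match.cost <= RELIABLE_COST


# Made-up sample values, for illustration only.
sample = [
    Match("Smith William", 0.35),
    Match("Smyth Wilham", 1.40),
    Match("Smith Liam", 0.80),
]
ranked = rank_matches(sample)
reliable = [m.text for m in ranked if is_reliable(m)]
print(reliable)
```

In this made-up sample, the entry with cost 1.40 stays in the ranked list but is filtered out as unreliable.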

 

Short HowTo and Examples are better than any documentation:

Start-Screen with Search Window:


Simply type in your search expression (Google-like) and see what happens! Here are some examples of what such a search request may look like:

  • you can create a very simple search for the name “John” or “Smith”

  • or ask for “(John Smith)|(Smith, John)” to get matches of both name styles

  • or “Smith .*iam” to get all “Smith” with first names ending in “iam”

  • or “Mc Donald (John|James|J )” for Mc Donalds with certain first names

  • or “[A-Z][a-z]{2,10} Newton” for any last name with first name Newton

  • or “Newton [A-Z][a-z]{2,10}” for last name Newton
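
To get a feeling for what these expressions select, they can be tried on plain strings with Python's re module. This is only a sketch: ArgusSearch matches against the scanned images rather than against transcribed strings, its Regular Expression dialect may differ from Python's, and the sample names below are made up.

```python
import re

# Made-up name strings in "Lastname Firstname" order, as the examples above suggest.
names = [
    "Smith William",
    "Smith John",
    "Mc Donald James",
    "Newton Isaac",
    "Baker Newton",
]

patterns = {
    "Smith with first name ending in 'iam'": r"Smith .*iam",
    "Mc Donald with certain first names": r"Mc Donald (John|James|J )",
    "first name Newton": r"[A-Z][a-z]{2,10} Newton",
    "last name Newton": r"Newton [A-Z][a-z]{2,10}",
}


def find(pattern, candidates):
    """Return the candidates that the pattern matches completely."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


results = {label: find(p, names) for label, p in patterns.items()}
for label, hits in results.items():
    print(f"{label}: {hits}")
```

Unlike this exact string matching, ArgusSearch also returns approximate matches and ranks them by score.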

 

Then enjoy the result page! The areas marked in blue indicate the regions where the Regular Expression (shown in the window header) matches. A short intro video can be found here.

 

 

Technological Background:
For those who have really kept reading this far, let's take a short look behind the scenes of the magic full-text search engine and into our patented technology:

The entire system is based on two separate processing steps:

 

Step 1: Our ArgusSearch-Bots process all text lines in fully automatic mode in the background, once for every document, creating a compact and searchable memory (Perception Matrix) for all processed texts. Deep Recurrent Neural Networks give our bots their recognition power.
Once this processing step is done for a document, the document is part of the memory and ready for Step 2 - the Search!

 

Step 2: Each request from the web front-end, expressed as a Regular Expression, is matched against that memory (Perception Matrix) as soon as the user sends it. This matching process is extremely fast and tolerant: it finds even partially fitting matches, and all matches are sorted by matching confidence. The additional ability to process Regular Expressions provides an extremely flexible and powerful search functionality not previously thought possible for historic documents, especially handwritten ones.
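
The idea of tolerant matching against a stored memory instead of a transcription can be illustrated with a toy example. Everything in it is an assumption made for illustration: the text above does not specify the layout of the Perception Matrix or the matching algorithm, so the sketch simply assumes one character probability table per text position, handles only fixed strings (no Regular Expressions), and uses the mean negative log-probability as cost.

```python
import math

FLOOR = 1e-6  # probability assumed for characters a position does not list


def window_cost(matrix, query, start):
    """Mean negative log-probability of `query` placed at position `start`."""
    total = 0.0
    for offset, char in enumerate(query):
        total += -math.log(matrix[start + offset].get(char, FLOOR))
    return total / len(query)


def best_match(matrix, query):
    """Return (cost, start) of the cheapest placement, or None if the line is too short."""
    starts = range(len(matrix) - len(query) + 1)
    return min(((window_cost(matrix, query, s), s) for s in starts), default=None)


# Toy "line" of four character positions; the first character is ambiguous.
line = [
    {"W": 0.6, "M": 0.4},
    {"o": 0.9, "a": 0.1},
    {"l": 0.8, "i": 0.2},
    {"f": 0.95, "t": 0.05},
]

wolf = best_match(line, "Wolf")
molf = best_match(line, "Molf")
print(wolf, molf)
```

Both queries land on the same text area, only with different costs. This mirrors the behaviour described earlier, where two differently spelled requests can return the same result.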