RecogniContact Contact Data & Address Parser

RecogniContact is LoquiSoft's solution for parsing international addresses and contact data. It splits a string of textual contact information into fields such as first name, last name, street address, city, and phone number.

Highlights

   Parses international addresses
   Standardizes data: phone numbers, country names, etc.
   Unmatched recognition performance

Main features

   Supports 13 languages
   Automatically identifies country for postal addresses
   Automatically searches text for contact data
   Automatically identifies a person’s gender
   Automatically identifies mobile phone numbers
   Understands all common formats and conventions

Online demo

To test RecogniContact online, press: Try this now

Products

The RecogniContact technology is available in these LoquiSoft products.

Products for software developers
RecogniContact/Web – RecogniContact as a Web service
RecogniContact/COM – RecogniContact as a Windows COM component

Products for consumers
ContactCopy – Our user-friendly Windows tool for transferring contact information from virtually any source to Microsoft Outlook, Microsoft Excel, etc., with just a mouse click


Countries and languages

For the following countries, RecogniContact recognizes contact information including postal addresses. For these countries, RecogniContact includes a comprehensive database with place names, so that it can identify the country in a postal address even if the country is not explicitly specified.

Countries
RecogniContact splits up contact information including postal address data for the following countries and regions:

  • USA
  • Canada
  • Europe (except countries using the Greek or Cyrillic alphabet)
  • Australia

Languages
RecogniContact recognizes all commonly used strings for structuring contact information (e.g., name: address: phone: email: etc.) in the following languages:

  • Catalan
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Icelandic
  • Italian
  • Norwegian
  • Portuguese
  • Spanish
  • Swedish


Example: USA postal address
Street address=       11 West 53 Street
ZIP/Postal code=      10019
Place name=           New York
State/Region/Province=NY
Country=              United States
ISO country code=     US
Example: UK postal address
Street address=       53 Bankside
ZIP/Postal code=      SE1 9TG
Place name=           London
Country=              United Kingdom
ISO country code=     GB
Example: German postal address
Street address=       Potsdamer Str. 50
ZIP/Postal code=      10785
Place name=           Berlin
Country=              Deutschland
ISO country code=     DE

For all countries of the world, RecogniContact splits up contact information without postal addresses.

The following preconditions apply:

  • The data is written in Latin letters (as opposed to, for example, Greek or Cyrillic letters).
  • Language-dependent elements are specified in one of the 13 languages currently supported.

Country detection

RecogniContact automatically identifies the country that contact information comes from. It uses the following information for this purpose:
  • ZIP/postal code format and place name (the integrated database comprises more than 200,000 place names)
  • Country codes in telephone numbers
  • Country domains in email and Web addresses

RecogniContact uses this information to standardize phone numbers to a uniform format and to add country information that is missing from a postal address.

Example: Detect country from address format and place name
Street address=       Potsdamer Str. 50
ZIP/Postal code=      10785
Place name=           Berlin
Country=              Germany
ISO country code=     DE
Example: Detect country from phone number
Phone=                +43 (1) 5556666
Country=              Austria
ISO country code=     AT
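
As a rough illustration of two of the signals listed above (country codes in telephone numbers and country domains in email and Web addresses), the following Python sketch looks up a country in small tables. It is not RecogniContact's actual algorithm; the table contents and the function names country_from_phone and country_from_domain are assumptions made for this example.

```python
import re

# Hypothetical lookup tables; a real system would need complete data.
# Note that "+1" is shared by the USA and Canada, so "US" is a simplification.
PHONE_PREFIXES = {"1": "US", "43": "AT", "44": "GB", "49": "DE", "358": "FI"}
DOMAIN_COUNTRIES = {"at": "AT", "de": "DE", "fi": "FI", "uk": "GB", "nl": "NL"}


def country_from_phone(phone):
    """Return an ISO country code for an international number, or None."""
    phone = phone.strip()
    if not phone.startswith("+"):
        return None  # no international prefix, nothing to infer
    digits = re.sub(r"\D", "", phone)
    # Country calling codes have one to three digits; try the longest first.
    for length in (3, 2, 1):
        code = PHONE_PREFIXES.get(digits[:length])
        if code:
            return code
    return None


def country_from_domain(address):
    """Infer a country from the top-level domain of an email or Web address."""
    host = address.split("@")[-1].lower()
    host = re.sub(r"^[a-z]+://", "", host).split("/")[0]
    tld = host.rsplit(".", 1)[-1]
    return DOMAIN_COUNTRIES.get(tld)


print(country_from_phone("+43 (1) 5556666"))
print(country_from_domain("https://www.example.co.uk/contact"))
```

Generic domains such as .com carry no country information, so the domain lookup returns None for them and another signal has to decide.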

Format-independent parsing

In addition to standardized formats for addresses and phone numbers, RecogniContact recognizes all other commonly used conventions for each country. In particular, RecogniContact does not require contact information elements to be separated by any specific separators, or separators to be used consistently throughout the text.

This is particularly helpful

  • if contact data comes from sources (email messages, websites) where the items don't have any predefined structure, or
  • if addresses are copied from tabular sources such as spreadsheets or tables on websites.
Even if there are no separators at all, RecogniContact can parse unambiguous data correctly.
Example: Unformatted address, no separators
Street address=       11 West 53 St
ZIP/Postal code=      10019
Place name=           New York
State/Region/Province=New York
Country=              United States
ISO country code=     US

Contact information fields

RecogniContact extracts the following fields from text containing contact information:

  • Person-related fields
    • Name prefixes (Mr., Dr., etc.)
    • First name or initial letter
    • Middle name or initial letter
    • Last name
    • Suffix – titles such as Ph.D., MBA or name suffixes such as Junior, Jr.
    • Position (technical director, marketing manager, etc.)

  • Company/organization-related fields
    • Company/organization name
    • Department

  • Address
    • Street address
    • ZIP/postal code of the street address
    • Post office box address
    • ZIP/postal code of the post office box address
    • Place name
    • Country
    • Region information: state (USA), county (Ireland), province (Italy), canton (Switzerland), Bundesland (Germany) …

  • Telephone numbers
    • Fixed line
    • Mobile phone
    • Fax number

  • Internet
    • Email address
    • Website address

Example: Name parsing
Prefix=               Mr
First name=           Walter
Middle name=          W.
Last name=            Wagoner
Suffix=               PhD
Gender=               M
Example: Name, position and organization
First name=           Walter
Last name=            Wagoner
Company=              Museum of Modern Art
Position=             Exhibition Manager
Gender=               M
Example: Separate street and post office box addresses
Company=              Street address
Street address=       Rozenstraat 59
Postbox address=      Postbus 75082
Postal code of postbox=1070 AB
ZIP/Postal code=      1016 NN
Place name=           Amsterdam
Country=              Netherlands
ISO country code=     NL

Phone number parser

If country information is available, RecogniContact automatically standardizes phone numbers to an international format. For some countries, regional codes are identified.
Example: Standardize phone numbers – country prefix, regional prefix
Country=              Germany
Phone=                +49 (30) 5556666
ISO country code=     DE
If a phone number starts with the prefix of a mobile phone network, RecogniContact automatically assigns it to the mobile phone number field.
Example: Mobile phone numbers
Country=              Germany
Mobile phone=         +49 (151) 5556666
ISO country code=     DE
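
The two steps described in this section (standardizing to an international format and assigning mobile prefixes to the mobile field) can be sketched in Python as follows. This is a simplified illustration, not RecogniContact's implementation: the COUNTRIES table, its handful of German codes, and the function name standardize_phone are assumptions for this example.

```python
import re

# Hypothetical per-country data: calling code, trunk prefix, and a few
# area and mobile network codes (written without the trunk prefix).
COUNTRIES = {
    "DE": {
        "calling_code": "49",
        "trunk_prefix": "0",
        "area_codes": {"30", "40", "89"},
        "mobile_codes": {"151", "160", "170"},
    },
}


def standardize_phone(raw, iso_country):
    """Return (field_name, formatted_number) for a phone number."""
    info = COUNTRIES[iso_country]
    digits = re.sub(r"\D", "", raw)
    # Drop an explicit international prefix, then the national trunk prefix.
    if raw.strip().startswith("+") and digits.startswith(info["calling_code"]):
        digits = digits[len(info["calling_code"]):]
    if digits.startswith(info["trunk_prefix"]):
        digits = digits[len(info["trunk_prefix"]):]
    for field, codes in (("Mobile phone", info["mobile_codes"]),
                         ("Phone", info["area_codes"])):
        # Try longer codes first so a short code cannot shadow a longer one.
        for code in sorted(codes, key=len, reverse=True):
            if digits.startswith(code):
                rest = digits[len(code):]
                return field, f"+{info['calling_code']} ({code}) {rest}"
    # Unknown regional code: keep the national number unsplit.
    return "Phone", f"+{info['calling_code']} {digits}"


print(standardize_phone("030 5556666", "DE"))
print(standardize_phone("0151/5556666", "DE"))
```

Stripping a leading trunk prefix after the calling code suits the German data in this table; it would be wrong for countries where the leading zero is part of the number.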


Integrated databases of known items

RecogniContact maintains a comprehensive database that contains, among other things, the following information:

  • More than 200,000 place-names in the USA, Canada, Australia, and the following European countries:
    Austria, Belgium, Denmark, Finland, France (plus Monaco), Germany, Iceland, Ireland, Italy (plus San Marino and the Vatican), Luxembourg, The Netherlands, Norway, Portugal, Spain (plus Andorra), Sweden, Switzerland (plus Liechtenstein), United Kingdom

    This place name database permits the identification of the country according to the postal address given, even if the country is not specified explicitly.


  • 12,000 first names with gender information.

  • The international country prefixes of all countries in the world, from +1 (USA & Canada) to +998 (Uzbekistan).

  • Multilingual place names such as Vienna, Vienne, Wien, Wenen

  • Country-specific regional information, including state (USA), province (Italy), canton (Switzerland), Bundesland (Germany), county (Ireland) …

  • Strings in 13 languages for the following elements:

    • Country names
      Germany, Deutschland, Allemagne, Duitsland

    • Job titles
      Director, Direktor, Directeur

    • Common street identifiers
      -street, -straße, rue, -straat

    • Post office box identifiers
      P.O. Box, Postfach, Boîte postale, Postbus

    • Salutations
      Mrs., Fr., Mme, Mevr

    • Company types
      Ltd, GmbH, Sarl, BV

    • Strings used to structure contact information:
      Name, Nom, Naam, Namn 
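
To illustrate how multilingual strings like those above could be mapped to a single standardized value, here is a minimal Python sketch. The tables contain only the variants quoted in this section. The function names iso_country and canonical_place are assumptions for this example, as is the choice of the English form as the canonical one.

```python
# Hypothetical excerpts; the variants are the ones quoted in the list above.
COUNTRY_NAMES = {
    "germany": "DE", "deutschland": "DE", "allemagne": "DE", "duitsland": "DE",
}
PLACE_NAMES = {
    "vienna": "Vienna", "vienne": "Vienna", "wien": "Vienna", "wenen": "Vienna",
}


def iso_country(name):
    """Map a country name in any listed language to its ISO code, or None."""
    return COUNTRY_NAMES.get(name.strip().casefold())


def canonical_place(name):
    """Map a known place-name variant to one form; pass unknown names through."""
    cleaned = name.strip()
    return PLACE_NAMES.get(cleaned.casefold(), cleaned)


print(iso_country("Deutschland"))
print(canonical_place("Wien"))
```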

Person's gender

If a contact record contains a person's name, RecogniContact automatically derives the person's gender from the first name and adds it to the record.

First names that do not allow a conclusion about the person's gender, such as Alex, Cameron, Chris, or Sasha, are taken into account.

Example: Gender detection
First name= Walter
Last name=  Wagoner
Gender=     M
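
A first-name lookup of this kind can be sketched in a few lines of Python. The table below is a hypothetical excerpt (the real database is described above as holding 12,000 first names), and the function name detect_gender is an assumption for this example. Names that do not allow a conclusion are stored with no gender, so the function returns None for them instead of guessing.

```python
# Hypothetical excerpt of a first-name table: "M" or "F" for unambiguous
# names, None for names that do not allow a conclusion.
FIRST_NAME_GENDER = {
    "walter": "M",
    "emma": "F",
    "alex": None,
    "cameron": None,
    "chris": None,
    "sasha": None,
}


def detect_gender(first_name):
    """Return "M", "F", or None if the name is unknown or ambiguous."""
    return FIRST_NAME_GENDER.get(first_name.strip().lower())


print(detect_gender("Walter"))
print(detect_gender("Chris"))
```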

Structuring elements, unrecognized information

RecogniContact recognizes structuring elements that are embedded in contact information (Name:, Address:, Tel:, Fax:) and uses them to help interpret the data. RecogniContact understands structuring elements in 13 different languages (see above).

Example: Structuring elements in Finnish
Street address=       Mannerheiminaukio 2
ZIP/Postal code=      00100
Place name=           Helsinki
Country=              Finland 
Phone=                +358 (9) 1733 6501
ISO country code=     FI

Any information RecogniContact is unable to handle will be returned as an unrecognized item. This ensures that no information will be lost in the parsing process.
Example: Unrecognized information
Company=      Museum of Modern Art
Phone=        (212) 708-9400

Unrecognized values:
Opening times: 10:30 a.m.–5:30 p.m.
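
The combination of multilingual structuring elements and an "unrecognized" bucket can be illustrated with the Python sketch below. It handles only the simple case of one labeled item per line, which is far narrower than what this page describes. The LABELS table (a few English, German, and Finnish strings) and the function name parse_labeled_lines are assumptions for this example.

```python
# Hypothetical label table: a few structuring strings in English, German,
# and Finnish, mapped to output field names.
LABELS = {
    "name": "Name", "nimi": "Name",
    "address": "Street address", "adresse": "Street address",
    "osoite": "Street address",
    "tel": "Phone", "phone": "Phone", "telefon": "Phone", "puhelin": "Phone",
    "fax": "Fax",
}


def parse_labeled_lines(text):
    """Split "Label: value" lines into fields; keep everything else."""
    fields, unrecognized = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label, sep, value = line.partition(":")
        field = LABELS.get(label.strip().lower())
        if sep and field and value.strip():
            fields[field] = value.strip()
        else:
            # Nothing is dropped: unknown lines are returned verbatim.
            unrecognized.append(line)
    return fields, unrecognized


sample = """Osoite: Mannerheiminaukio 2
Puhelin: +358 (9) 1733 6501
Opening times: 10:30 a.m.-5:30 p.m."""
fields, unrecognized = parse_labeled_lines(sample)
print(fields)
print(unrecognized)
```

Returning the leftover lines alongside the parsed fields mirrors the behavior described above, where information that cannot be handled is kept and not lost.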


Unmatched recognition accuracy

Recognition algorithms that try to reduce the problem of contact information parsing to a few standard recognition patterns reach their limits very quickly. They cannot deliver robust and reliable contact and address parsing in some very common situations:

  • When an unexpected address supplement is used
  • When unexpected punctuation characters are used in the source text or white spaces are misplaced
  • When items are not separated by new line characters or by consistent separator characters
  • When elements are copied from spreadsheets in which the data is arranged in a tabular format

Input data as diverse and complex as contact and address information cannot be captured in a limited number of recognition patterns or regular expressions. This is particularly true of international address data. In practice, only a small fraction of all textual contact information complies with standards. It is impossible to create a comprehensive list of all address formats and conventions ever used. And even if such a list were available, purely pattern-based recognition would become very inefficient beyond a certain level of complexity.

To meet these challenges, LoquiSoft has created a set of algorithms tailored specifically to the problem of address and contact information parsing.

RecogniContact splits up contact data as far as possible, independently of standard address formats, specific separator characters and consistent structure. With this strategy, RecogniContact achieves unmatched recognition results.


Quality measures

As in all software that processes semantic and language-dependent information (automatic spell-checking is another example), a certain residual error rate cannot be avoided in address parsing, because some information is unknown or ambiguous.

To ensure the quality of RecogniContact's recognition results, LoquiSoft uses the following methods:

  • Test database
    During every software update, a test database with thousands of manually tagged contact data records is used as a basis to verify and improve RecogniContact's parsing algorithms.

  • Flexible recognition rules
    The set of recognition rules within RecogniContact is highly flexible. New rules, exceptions, and exceptions to exceptions can be added without the risk of making the recognition algorithm slow, inefficient, or overly complex to manage.

  • User feedback
    Whenever users of our ContactCopy product find recognition problems, they can report them to us with just a mouse click. This valuable feedback has helped us eliminate problems and improve our recognition algorithms since ContactCopy's original release in 2007.

Reference clients

Our clients that use the RecogniContact technology in their products include