RecogniContact Contact Data & Address Parser
RecogniContact is LoquiSoft's solution for parsing international addresses and contact data. RecogniContact splits up a string containing textual contact information into fields such as first name, last name, street address, city, phone number etc.
Parses international addresses
Standardizes data: phone numbers, country names, etc.
Unmatched recognition performance
Supports 13 languages
Automatically identifies country for postal addresses
Automatically searches text for contact data
Automatically identifies a person’s gender
Automatically identifies mobile phone numbers
Understands all common formats and conventions
The RecogniContact technology is available in these LoquiSoft products:
Products for software developers
RecogniContact/Web – RecogniContact as a Web service
RecogniContact/COM – RecogniContact as a Windows COM component
Products for consumers
ContactCopy – Our user-friendly Windows tool for transferring contact information from virtually any source to Microsoft Outlook, Microsoft Excel, etc., with just a mouse click
Countries and languages
RecogniContact splits up contact information including postal address data for the following countries and regions. For these countries, RecogniContact includes a comprehensive database of place names, so that it can identify the country in a postal address even if the country is not explicitly specified:
- Europe (except countries that use the Greek or Cyrillic alphabet)
RecogniContact recognizes all commonly used strings for structuring contact information (e.g., name: address: phone: email: etc.) in the following languages:
For all countries of the world, RecogniContact splits up contact information without postal addresses.
The following preconditions apply:
- The data is noted in Latin letters (as opposed to, for example, Greek or Cyrillic letters).
- Language-dependent elements are specified in one of the 13 languages currently supported.
RecogniContact automatically identifies the country that contact information comes from. It uses the following information for this purpose:
- ZIP/postal code format and place name (the integrated database comprises more than 200,000 place names)
- Country codes in telephone numbers
- Country domains in email and Web addresses
This information is used to standardize phone numbers to a unified format and to add country information that is missing from a postal address.
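The three kinds of hints listed above can be combined by simple voting. The following is only a minimal sketch of that idea, not RecogniContact's actual algorithm: the function name `guess_country` and the tiny lookup tables are hypothetical stand-ins for the much larger integrated databases.

```python
import re
from collections import Counter

# Hypothetical sample tables; a real system would use far larger databases.
DIAL_CODES = {"43": "AT", "49": "DE", "33": "FR", "44": "GB"}
TLDS = {"at": "AT", "de": "DE", "fr": "FR", "uk": "GB"}
PLACES = {("1010", "wien"): "AT", ("10115", "berlin"): "DE", ("75001", "paris"): "FR"}


def guess_country(text):
    """Return the ISO country code supported by the most hints, or None."""
    votes = Counter()

    # Hint 1: a postal code followed by a known place name.
    for code, place in re.findall(r"\b(\d{4,5})\s+([A-Za-zÀ-ÿ]+)", text):
        country = PLACES.get((code, place.lower()))
        if country:
            votes[country] += 1

    # Hint 2: an international dialing prefix written as +NN or 00NN.
    for digits in re.findall(r"(?:\+|\b00)(\d{2})", text):
        if digits in DIAL_CODES:
            votes[DIAL_CODES[digits]] += 1

    # Hint 3: a country-code top-level domain in email or Web addresses.
    for tld in re.findall(r"\.([a-z]{2})\b", text.lower()):
        if tld in TLDS:
            votes[TLDS[tld]] += 1

    return votes.most_common(1)[0][0] if votes else None
```

In this sketch, a record such as "Ringstraße 1, 1010 Wien, Tel +43 1 5551234, anna@example.at" collects three votes for Austria although the country is never named. Text without any hint yields `None`.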
In addition to standardized formats for addresses and phone numbers, RecogniContact recognizes all other commonly used conventions for each country. In particular, RecogniContact does not require contact information elements to be separated by any specific separators, or separators to be used consistently throughout the text.
This is particularly helpful
- if contact data comes from sources (email messages, websites) where the items don't have any predefined structure, or
- if addresses are copied from tabular sources such as spreadsheets or tables on websites.
Even if there are no separators at all, RecogniContact can parse unambiguous data correctly.
Contact information fields
RecogniContact extracts the following fields from text containing contact information:
- Person-related fields
- Name prefixes (Mr., Dr., etc.)
- First name or initial letter
- Second name or initial letter
- Last name
- Suffix – titles such as Ph.D., MBA or name suffixes such as Junior, Jr.
- Position (technical director, marketing manager, etc.)
- Company/organization-related fields
- Company/organization name
- Street address
- ZIP/postal code of the street address
- Post office box address
- ZIP/postal code of the post office box address
- Place name
- Region information: state (USA), county (Ireland), province (Italy), canton (Switzerland), Bundesland (Germany) …
- Telephone numbers
- Fixed line
- Mobile phone
- Fax number
- Email address
- Website address
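For developers, the field list above maps naturally onto a record type. The sketch below is an assumption about how such a record could be modeled in client code. The class name `ContactRecord`, the attribute names, and the choice to allow several numbers per kind are hypothetical and do not describe the actual RecogniContact/Web or RecogniContact/COM interfaces.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ContactRecord:
    # Person-related fields
    prefix: Optional[str] = None
    first_name: Optional[str] = None
    second_name: Optional[str] = None
    last_name: Optional[str] = None
    suffix: Optional[str] = None
    position: Optional[str] = None
    # Company/organization-related fields
    company: Optional[str] = None
    street: Optional[str] = None
    postal_code: Optional[str] = None
    po_box: Optional[str] = None
    po_box_postal_code: Optional[str] = None
    place: Optional[str] = None
    region: Optional[str] = None
    # Telephone numbers (assumption: a record may hold several of each kind)
    phones: List[str] = field(default_factory=list)
    mobiles: List[str] = field(default_factory=list)
    faxes: List[str] = field(default_factory=list)
    email: Optional[str] = None
    website: Optional[str] = None

    def filled(self):
        """Return only the fields that hold a value."""
        return {k: v for k, v in asdict(self).items() if v}
```

A helper like `filled()` keeps sparse records readable, since most input texts populate only a handful of these fields.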
Phone number parser
If country information is available, RecogniContact automatically standardizes phone numbers to an international format. For some countries, regional codes are identified. If a phone number starts with the prefix of a mobile phone network, RecogniContact automatically assigns it to the mobile phone number field.
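The standardization step described above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: `standardize_phone` and `COUNTRY_RULES` are hypothetical names, and the table holds only a few sample prefixes for two countries.

```python
import re

# Hypothetical sample data: dialing code, trunk prefix, and a few
# mobile network prefixes per country.
COUNTRY_RULES = {
    "AT": {"dial": "43", "trunk": "0", "mobile": ("650", "660", "664", "676", "699")},
    "DE": {"dial": "49", "trunk": "0", "mobile": ("151", "160", "170", "171", "172")},
}


def standardize_phone(raw, country):
    """Return (number in +CC format, 'mobile' or 'fixed') for a known country."""
    rules = COUNTRY_RULES[country]
    digits = re.sub(r"[^\d+]", "", raw)  # drop spaces, slashes, brackets, dashes
    if digits.startswith("00"):
        digits = "+" + digits[2:]        # 0043... becomes +43...
    if digits.startswith("+"):
        number = digits                  # already international
    else:
        if digits.startswith(rules["trunk"]):
            digits = digits[len(rules["trunk"]):]  # strip the national trunk prefix
        number = "+" + rules["dial"] + digits
    # Classify by the national part, if the number belongs to this country.
    own_prefix = "+" + rules["dial"]
    national = number[len(own_prefix):] if number.startswith(own_prefix) else ""
    kind = "mobile" if national.startswith(rules["mobile"]) else "fixed"
    return number, kind
```

For example, `standardize_phone("0664 123 45 67", "AT")` yields `("+436641234567", "mobile")`. Notations such as "+43 (0) 664 ..." are not handled by this sketch and would need an extra rule.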
Integrated databases of known items
RecogniContact maintains a comprehensive database that contains, among other things, the following information:
- More than 200,000 place names in the USA, Canada, Australia, and the following European countries:
Austria, Belgium, Denmark, Finland, France (plus Monaco), Germany, Iceland, Ireland, Italy (plus San Marino and the Vatican), Luxembourg, The Netherlands, Norway, Portugal, Spain (plus Andorra), Sweden, Switzerland (plus Liechtenstein), United Kingdom
This place name database permits the identification of the country according to the postal address given, even if the country is not specified explicitly.
- 12,000 first names with gender information.
- The international country prefixes of all countries in the world, from +1 (USA & Canada) to +998 (Uzbekistan).
- Multilingual place names such as Vienna, Vienne, Wien, Wenen …
- Country-specific regional information, including state (USA), province (Italy), canton (Switzerland), Bundesland (Germany), county (Ireland) …
- Strings in 13 languages for the following elements:
- Country names
Germany, Deutschland, Allemagne, Duitsland
- Job titles
Director, Direktor, Directeur
- Common street identifiers
-street, -straße, rue, -straat
- Post office box identifiers
P.O. Box, Postfach, Boîte postale, Postbus
- Name prefixes
Mrs., Fr., Mme, Mevr
- Company types
Ltd, GmbH, Sarl, BV
- Strings used to structure contact information:
Name, Nom, Naam, Namn
If a contact record contains a person's name, RecogniContact automatically adds the person's gender from the first name.
First names that do not allow a conclusion about the person's gender, such as Alex, Cameron, Chris or Sasha, are handled accordingly: no gender is assigned from the name alone.
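A first-name lookup of this kind can be sketched as a plain dictionary with an explicit marker for ambiguous names. The table below is a hypothetical eight-entry sample standing in for the 12,000-name database, and `gender_for` is an illustrative name rather than part of any LoquiSoft API.

```python
# Hypothetical sample of a first-name table with gender information.
FIRST_NAMES = {
    "anna": "female",
    "maria": "female",
    "john": "male",
    "peter": "male",
    # Names that do not allow a conclusion are marked explicitly.
    "alex": "ambiguous",
    "cameron": "ambiguous",
    "chris": "ambiguous",
    "sasha": "ambiguous",
}


def gender_for(first_name):
    """Return 'female', 'male', 'ambiguous', or None for an unknown name."""
    return FIRST_NAMES.get(first_name.strip().lower())
```

Keeping "ambiguous" distinct from `None` lets a caller tell "known to be used for more than one gender" apart from "not in the database".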
Structuring elements, unrecognized information
RecogniContact recognizes structuring elements that are embedded in contact information (Name:, Address:, Tel:, Fax:) and uses them to help interpret the data. RecogniContact understands structuring elements in 13 different languages (see above).
Any information RecogniContact is unable to handle will be returned as an unrecognized item. This ensures that no information will be lost in the parsing process.
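The two ideas above, using embedded labels and never discarding input, can be illustrated with a small line-based sketch. It assumes one item per line, which the real product does not require. `split_labelled` and the `LABELS` table are hypothetical.

```python
import re

# Hypothetical label table: multilingual structuring element -> field name.
LABELS = {
    "name": "name", "nom": "name", "naam": "name", "namn": "name",
    "tel": "phone", "telefon": "phone",
    "fax": "fax",
    "email": "email", "e-mail": "email",
}

LABEL_RE = re.compile(r"^\s*([A-Za-z-]+)\s*:\s*(.+)$")


def split_labelled(text):
    """Return (fields, unrecognized) for text with one item per line."""
    fields, unrecognized = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LABEL_RE.match(line)
        name = LABELS.get(match.group(1).lower()) if match else None
        if name:
            fields[name] = match.group(2).strip()
        else:
            unrecognized.append(line)  # kept, so no information is lost
    return fields, unrecognized
```

Every non-empty input line ends up either in `fields` or in `unrecognized`, which mirrors the guarantee that nothing is lost during parsing.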
Unmatched recognition accuracy
Recognition algorithms that try to reduce the problem of contact information parsing to a few standard recognition patterns reach their limits very quickly. They cannot deliver robust and reliable contact and address parsing in some very common situations:
- When an unexpected address supplement is used
- When unexpected punctuation characters are used in the source text or white spaces are misplaced
- When items are not separated by new line characters or by consistent separator characters
- When elements are copied from spreadsheets in which the data is arranged in a tabular format
Input data as diverse and complex as contact and address information cannot be captured in a limited number of recognition patterns or regular expressions. This is particularly true of international address data. In practice, only a small fraction of all textual contact information complies with standards. It is impossible to create a comprehensive list of all address formats and conventions ever used in reality. And even if such a list were available, purely pattern-based recognition would become very inefficient beyond a certain level of complexity.
To meet these challenges, LoquiSoft has created a set of algorithms tailored specifically to the problem of address and contact information parsing.
RecogniContact splits up contact data as far as possible, independently of standard address formats, specific separator characters and consistent structure. With this strategy, RecogniContact achieves unmatched recognition results.
As in all software that processes semantic and language-dependent information (automatic spell-checking is a familiar example), a certain residual error rate cannot be avoided in address parsing, due to unknown or ambiguous information.
To ensure the quality of RecogniContact's recognition results, LoquiSoft uses the following methods:
- Test database
During every software update, a test database with thousands of manually tagged contact data records is used as a basis to verify and improve RecogniContact's parsing algorithms.
- Flexible recognition rules
The set of recognition rules within RecogniContact is highly flexible. New rules, exceptions and exceptions to exceptions can be added without the risk of making the recognition algorithm slow, inefficient or overly complex to manage.
- User feedback
Whenever users of our ContactCopy product find recognition problems, they can report them to us with just a mouse click. This valuable feedback has helped us eliminate problems and improve our recognition algorithms since ContactCopy's original release in 2007.
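The test database method listed above amounts to a regression check against manually tagged records. The following minimal sketch shows how such a check could be scored for any parser function. The name `field_accuracy` and the record layout (text plus a dictionary of expected fields) are assumptions made for illustration, not LoquiSoft's internal tooling.

```python
def field_accuracy(parse, tagged_records):
    """Score a parser against manually tagged records.

    `parse` maps a text to a dict of fields; `tagged_records` is a list of
    (text, expected_fields) pairs. Returns (accuracy, failures).
    """
    correct = total = 0
    failures = []
    for text, expected in tagged_records:
        actual = parse(text)
        for name, value in expected.items():
            total += 1
            if actual.get(name) == value:
                correct += 1
            else:
                # Keep enough detail to investigate each mismatch.
                failures.append((text, name, value, actual.get(name)))
    accuracy = correct / total if total else 1.0
    return accuracy, failures
```

Running the same scoring before and after a rule change shows whether a new rule or exception fixed the reported case without breaking records that were parsed correctly before.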
Our clients that use the RecogniContact technology in their products include