Use Case - Data-Mining - Cloud Access v02 Jan 5 2015

The YourSaleLeads SaaS Platform

The Platform provides the following functionality:

Data Integration

The YourSaleLeads platform is built on advanced data management technologies, including data enhancement for marketing segmentation and up-to-date data profiles.

Multi Sourced Prospect Data Warehouse

YourSaleLeads integrates prospect data from multiple proprietary and third-party databases, providing access to targeted business and consumer leads. The YourSaleLeads prospect database enables users to generate real-time marketing lists.

Automated Modeling and Segmentation

Automate statistical and predictive modeling to better segment, target, and communicate with your contacts. YourSaleLeads statistical modeling and segmentation delivers the right message to the right individual at the right time.


Use Case - Data-Mining v04 Jan 5 2015


YourSaleLeads Platform Database Aggregation & Calculation Infrastructure

The Sales Optimization Module (SOM) provides the following functionality: in response to customer data criteria, SOM generates several sets of databases from the pool of all available databases/data sources. Each set is optimized by two parameters (see the problem statement below): price and data quality.

Optimization Problem v02 Jan 5 2015

We expect to serve the following types of customer requests:

Detailed Requests: the user supplies a detailed set of categories accompanied by numerical values. For example: sales leads are generated on the basis of demographic criteria such as FICO credit score, income, age, household income (HHI), etc.

Generic Requests: the user supplies a generic request without proper categorization. Our goal is to convert it to a Detailed Request. For example: sales leads for a specific product, service, etc.

Mixed Requests: the user supplies some of the details, or some of the supplied details require adjustment or replacement.
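As an illustration only (the category names and values below are hypothetical, not the platform's actual schema), a Detailed Request of the kind described above could be represented as a simple set of category/value pairs:

```python
# A hypothetical Detailed Request: category names and threshold values
# are illustrative only, not the platform's actual schema.
detailed_request = {
    "fico_score_min": 680,      # minimum FICO credit score
    "age_range": (25, 54),      # target age bracket
    "hhi_min": 75_000,          # minimum household income, USD
    "state": "NY",              # geographic criterion
}

# Each key is a "category" in the sense used above; SOM will later
# decide which data source can satisfy which category.
for category, value in detailed_request.items():
    print(f"{category}: {value}")
```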


Processing Detailed Requests

Step 1.1

We start execution of a Type I (Detailed) request with the Sales Optimization Module (SOM). SOM provides the following functionality: in response to a customer's detailed request, SOM generates several sets of databases (DBs) from the pool of all available databases/data sources for further processing. Each DB in a set contains a subset of the categories from the customer's detailed request. For example, if a set consists of three DBs and the customer request contains five categories, then the first DB may contain Category 3; the second DB, Categories 1 and 5; and the third DB, Categories 2 and 4.
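A minimal sketch of this category-to-database assignment in Python (database names, prices, and category coverage are invented for illustration; the real SOM optimization criteria are defined by the sales team):

```python
# Hypothetical pool of data sources: each covers some categories and
# has a price per record. All names and numbers are illustrative.
sources = {
    "db_a": {"categories": {3}, "price": 0.02},
    "db_b": {"categories": {1, 5}, "price": 0.05},
    "db_c": {"categories": {2, 4}, "price": 0.03},
    "db_d": {"categories": {1, 2, 3}, "price": 0.10},
}

def cover_request(requested, sources):
    """Greedy sketch: pick the cheapest source covering each
    still-uncovered category; returns {source: categories it supplies}."""
    uncovered = set(requested)
    plan = {}
    while uncovered:
        # cheapest source that covers at least one uncovered category
        best = min(
            (s for s in sources if sources[s]["categories"] & uncovered),
            key=lambda s: sources[s]["price"],
        )
        plan[best] = sources[best]["categories"] & uncovered
        uncovered -= plan[best]
    return plan

plan = cover_request({1, 2, 3, 4, 5}, sources)
print(plan)  # e.g. {'db_a': {3}, 'db_c': {2, 4}, 'db_b': {1, 5}}
```

With these made-up prices the greedy pass reproduces the partition from the example above (Category 3 from one DB, Categories 1 and 5 from another, Categories 2 and 4 from a third); a real SOM would also weigh data quality parameters.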

Each set is optimized by a few parameters defined by the sales team. The optimization could be defined by the minimum or maximum price of data, or by the minimum price of data plus certain data quality parameters (for example, how recently the data was updated), and so on.

Step 1.2

At this step we employ a cluster of servers (possibly Hadoop) for speedy parallel processing: we query the DBs defined by SOM in parallel on separate data nodes. We store the resulting multiple lists of rows in an intermediary staging database (SDB). Each list creates an independent table in the SDB.
Each list contains a unique set of categories (see the Step 1.1 example) and one of the primary keys.
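As a self-contained sketch of this fan-out-then-stage pattern (threads and an in-memory SQLite database stand in for the cluster and the SDB; all rows are fake):

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Stand-in: in production each "query" would run on a separate data
# node (e.g. under Hadoop); here each fake source simply returns rows
# of (primary_key, payload) pairs. All data is illustrative.
def query_source(name):
    fake_rows = {
        "db_a": [(101, "cat3=x"), (102, "cat3=y")],
        "db_b": [(101, "cat1=a;cat5=b")],
    }
    return name, fake_rows[name]

# Query the SOM-selected sources in parallel.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(query_source, ["db_a", "db_b"]))

# Staging database (SDB): each result list gets its own table.
sdb = sqlite3.connect(":memory:")
for name, rows in results:
    sdb.execute(f"CREATE TABLE {name} (pk INTEGER, payload TEXT)")
    sdb.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
sdb.commit()

count_a = sdb.execute("SELECT COUNT(*) FROM db_a").fetchone()[0]
print(count_a)  # 2
```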

Step 1.3

At this stage we aggregate the multiple lists together. As a result, each row will contain all requested categories. Aggregation will include resolution of incompatibilities between similar fields, e.g. spelling errors, differences in formatting, geographical location variants, and so on. Such resolution will require the use of additional data sources. This is a standard, well-developed technology, and we do not expect major problems here.
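A toy illustration of such field resolution (the alias table and rows are invented; real resolution would draw on the additional data sources mentioned above):

```python
import re

# Sketch of resolving formatting/spelling incompatibilities between
# similar fields before merging rows; lookup table is illustrative.
STATE_ALIASES = {"new york": "NY", "n.y.": "NY", "ny": "NY"}

def normalize_state(raw):
    key = raw.strip().lower()
    return STATE_ALIASES.get(key, raw.strip().upper())

def normalize_phone(raw):
    digits = re.sub(r"\D", "", raw)       # keep digits only
    return digits[-10:] if len(digits) >= 10 else digits

rows = [
    {"pk": 101, "state": "New York", "phone": "(212) 555-0101"},
    {"pk": 101, "state": "n.y.",     "phone": "212.555.0101"},
]

# After normalization, the two rows for pk=101 agree and can be merged.
cleaned = [
    {**r, "state": normalize_state(r["state"]),
          "phone": normalize_phone(r["phone"])}
    for r in rows
]
print(cleaned[0] == cleaned[1])  # True
```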

We store clean lists and aggregated output lists in our proprietary Lists Data Warehouse (LDW) in separate dimension tables, while appending entries to a central fact table. This will be handy for serving future customer requests.
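A minimal star-schema sketch of the LDW layout described above (table and column names are hypothetical, with SQLite standing in for the warehouse):

```python
import sqlite3

# Minimal star-schema sketch for the LDW: one dimension table per
# cleaned list plus a central fact table. Schema is illustrative.
ldw = sqlite3.connect(":memory:")
ldw.executescript("""
    CREATE TABLE dim_demographics (pk INTEGER PRIMARY KEY, fico INTEGER);
    CREATE TABLE dim_geography    (pk INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE fact_leads (
        pk INTEGER,          -- shared primary key across dimensions
        request_id INTEGER   -- which customer request produced the lead
    );
""")

# Appending one aggregated lead touches each dimension plus the fact table.
ldw.execute("INSERT INTO dim_demographics VALUES (101, 720)")
ldw.execute("INSERT INTO dim_geography VALUES (101, 'NY')")
ldw.execute("INSERT INTO fact_leads VALUES (101, 1)")

# A future request can reassemble the full lead with joins.
row = ldw.execute("""
    SELECT d.fico, g.state FROM fact_leads f
    JOIN dim_demographics d ON d.pk = f.pk
    JOIN dim_geography g    ON g.pk = f.pk
""").fetchone()
print(row)  # (720, 'NY')
```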


Detailed Requests Processing v02 Jan 06 2015



Processing Generic Requests

Here the user supplies a generic request without proper categorization. Our goal is to convert it to a Detailed Request.

This is a typical predictive analytics and data-mining task. For example, we need to figure out the types of B2B or B2C buyers for a specific product. In the B2B case, we will look at the companies who have already bought the product. Then we analyze the parameters describing such companies. The set of such parameters describes a so-called pattern of a potential buyer. Using this pattern we can predict potential buyers. To do so, we compare our pattern with the pattern describing another B2B customer. If the patterns match, then we have a potential buyer. The same applies to the B2C case.

The critical part here is the accuracy of pattern creation. Nobody needs 50% results. Predictive analytics uses accurate statistical analysis for pattern creation. In our case, the tool for the job is logistic regression. Logistic regression is an excellent way to predict whether or not something will happen, and how confident we are in such predictions. It takes a number of numeric attributes into account and then uses them, through a training data set, to predict the probable outcomes in a comparable scoring data set (a set of new prospective customers).
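As a sketch of the scoring idea only (the training data and hyperparameters are invented, and a production system would use an established statistics package rather than hand-rolled gradient descent):

```python
import math

# Toy logistic-regression sketch in pure Python. Features are made up:
# x = (normalized company size, normalized yearly revenue),
# y = 1 if the company already bought the product.
X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
y = [1, 1, 1, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [0.0, 0.0]
b = 0.0
lr = 0.5
for _ in range(2000):                       # plain stochastic gradient descent
    for (x1, x2), yi in zip(X, y):
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = p - yi                        # gradient of the log-loss
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

def score(x1, x2):
    """Probability that a new prospect matches the buyer pattern."""
    return sigmoid(w[0] * x1 + w[1] * x2 + b)

print(round(score(0.85, 0.8), 2), round(score(0.15, 0.2), 2))
```

The trained model scores new prospects against the learned buyer pattern: attributes close to the known buyers yield a probability near 1, attributes close to the non-buyers a probability near 0.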

To implement such an approach, we need the following components:

  • Data-mine training data (referring to our example: information about the companies who have already bought the product, and possibly the circumstances surrounding the sale). We can get such data from multiple sources. Our task is to develop access to such sources.
  • Implement predictive analytics procedures. This will include the following stages: data preparation (cleaning, normalization, dealing with outliers, etc.), modelling (logistic regression algorithms), and deployment (putting the model into production together with machine learning procedures in order to keep the model permanently updated on new training data). We do not expect problems at this stage: there are multiple open-source software packages and inexpensive third-party tools.


Processing Mixed Requests

It will be identical to Generic Request processing but will require more interaction with the customer.


As the result of Detailed and Generic Request execution, we will accumulate the following resources:

    • Our Lists Data Warehouse (LDW). Using it, we will reuse stored data to serve future requests.
    • A library of patterns from multiple business and individual customer domains. Such a library will dramatically increase the accuracy of Detailed Requests.

US PTO Provisional Patent Application
Filed 17-AUG-2015


Fig. 1:  The System Architecture


Fig. 1 is the system architecture illustrating the logical flow of processes in accordance with various embodiments of the present invention. A User #130 of Fig. 1 initiates the process of generating qualified sales leads. First, the user needs to build a customer profile. Here he has two options: using one or multiple "Customer Profile Templates" #110, or engaging pre-built "Customer Profile Models" #120. In the case of #110, the user uses templates pre-built for the user's type of business and inputs/selects attribute values in the named fields. The user also has the option to create new named fields. In the case of #120, the user selects the models for the user's type of business from the list of models. The output of the #110 or #120 selection creates "The Final Customer Profile" #150. By using the "Model Changing Events Tracking System" #260, we adjust the final output of #150.

The "Subset of Data Sources" #170 for matching customer profile #150 is created by the optimization procedure "Data Sources Price & Quality Optimization" #100. The optimization procedure requires user input of the desired level of price and data quality. Process #100 takes this input and applies price and data quality optimization criteria to select the Subset of Data Sources #170 from the "Data Sources" #160 described by the "Metadata of Data Sources" #140. The process "Generate Qualified Data Leads" #180 performs matching between customer profile #150 and the Subset of Data Sources. The output of #180 populates the "Leads Database" #190.

The "Follow-Up Sales Team" #200 consumes the generated leads #190. All deals closed by #200 populate the "Close Deals Database" #250. Information about customer profiles from #250 is transferred to the "Model's Data Repository" #230 and used in the machine learning process "Adjust the Model" #210. The "Follow-Up Sales Team" #200 also generates additional attributes and values for adjusting customer profiles. Such data is transferred to the "Model's Data Repository" #230 and used in the machine learning processes "Adjust the Customer Model" #210 and "Adjust Customer Profile Template" #240.