AKS
What are Kubernetes and Azure Kubernetes Services (AKS)?
Kubernetes is open-source software that helps deploy and manage containerized applications at scale. It orchestrates a cluster of Azure virtual machines, schedules containers, automatically manages service discovery, incorporates load balancing, and tracks resource allocation. It also checks the health of individual resources and heals apps with auto-restart and auto-replication. AKS provides a managed Kubernetes service with automated provisioning, upgrading, monitoring, and on-demand scaling. (Source: https://azure.microsoft.com/en-us/services/kubernetes-service/#faq)
There are several ways to install an AKS cluster: ARM templates, Azure CLI, Terraform - the choice is yours.
To deploy Azure resources for CluedIn, you need to provide the CluedIn Partner GUID (e64d9978-e282-4d1c-9f2e-0eccb50582e4). How you provide the Partner GUID depends on how you deploy AKS:
Deploy AKS
Role-Based Access Control (RBAC)
To deploy and manage Azure resources, you need sufficient access rights. You can read more about this in the Microsoft documentation: "Manage access to your Azure environment with Azure role-based access control" and "Azure built-in roles".
You need the Contributor role at the subscription level. If you cannot get this role, ask someone with sufficient permissions to create the AKS cluster for you.
When you create a new AKS cluster in a resource group, Microsoft Azure automatically creates an infrastructure resource group (with the “MC_” prefix) to hold AKS-related resources such as disks, public IPs, and identities. To create a cluster, you therefore need permission to create resource groups in the subscription. To manage the cluster afterwards, you need the Contributor role on two resource groups: the group where you created the AKS cluster and the related infrastructure group.
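To make the two resource groups concrete, here is a minimal sketch that derives the default name of the infrastructure group. It assumes the cluster was created without a custom infrastructure group name, in which case Azure uses the pattern MC_<resource group>_<cluster name>_<location>. The resource group, cluster name, and location below are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical values for illustration only; replace with your own.
RESOURCE_GROUP="my-aks-rg"
CLUSTER_NAME="my-aks"
LOCATION="westeurope"

# Build the default infrastructure resource group name:
# MC_<resource group>_<cluster name>_<location>
infra_group_name() {
  printf 'MC_%s_%s_%s\n' "$1" "$2" "$3"
}

infra_group_name "$RESOURCE_GROUP" "$CLUSTER_NAME" "$LOCATION"
# prints MC_my-aks-rg_my-aks_westeurope
```

On an existing cluster, `az aks show --resource-group <group> --name <cluster> --query nodeResourceGroup --output tsv` returns the actual name, which is the safer source if a custom name may have been set. An administrator can then grant access to each group with `az role assignment create --assignee <user> --role Contributor --scope <resource group ID>`.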
Azure CLI
Walkthrough (Microsoft Docs): https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough
Microsoft’s instructions to deploy the Partner GUID: https://docs.microsoft.com/en-us/azure/marketplace/azure-partner-customer-usage-attribution#example-azure-cli
To install the Partner GUID, you need to add an environment variable to your terminal session.
Bash:
export AZURE_HTTP_USER_AGENT='pid-e64d9978-e282-4d1c-9f2e-0eccb50582e4'
echo "$AZURE_HTTP_USER_AGENT" # should print pid-e64d9978-e282-4d1c-9f2e-0eccb50582e4
PowerShell:
$env:AZURE_HTTP_USER_AGENT='pid-e64d9978-e282-4d1c-9f2e-0eccb50582e4' ;
$env:AZURE_HTTP_USER_AGENT # should print pid-e64d9978-e282-4d1c-9f2e-0eccb50582e4
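Because the variable only applies to the current terminal session, it can help to check it right before you run any deployment commands. The following Bash sketch sets the variable and verifies it. The helper name `partner_agent_is_set` is hypothetical and not part of the Azure CLI.

```shell
#!/usr/bin/env bash
PARTNER_GUID='e64d9978-e282-4d1c-9f2e-0eccb50582e4'
export AZURE_HTTP_USER_AGENT="pid-${PARTNER_GUID}"

# Hypothetical helper: succeeds only when the variable holds the expected value.
partner_agent_is_set() {
  [ "${AZURE_HTTP_USER_AGENT:-}" = "pid-${PARTNER_GUID}" ]
}

if partner_agent_is_set; then
  echo "Partner GUID is set for this session"
else
  echo "AZURE_HTTP_USER_AGENT is not set as expected" >&2
fi
```

With the check passing, run the cluster creation commands from the Microsoft walkthrough in the same session, for example `az group create --name <group> --location <location>` followed by `az aks create --resource-group <group> --name <cluster> --generate-ssh-keys`. A new terminal window needs the variable set again.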
ARM Template
Walkthrough (Microsoft Docs): https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-rm-template
Walkthrough with Partner GUID (CluedIn Docs): https://documentation.cluedin.net/kb/azure-customer-usage-attribution
Microsoft’s instructions to deploy the Partner GUID: https://docs.microsoft.com/en-us/azure/marketplace/azure-partner-customer-usage-attribution#add-a-guid-to-a-resource-manager-template
To deploy with the Partner GUID, you only need to add this deployment to the resources section:
{
  "apiVersion": "2020-06-01",
  "name": "pid-e64d9978-e282-4d1c-9f2e-0eccb50582e4",
  "type": "Microsoft.Resources/deployments",
  "properties": {
    "mode": "Incremental",
    "template": {
      "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "resources": []
    }
  }
},
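Before deploying, you may want to confirm that the Partner GUID deployment made it into your template. The sketch below writes a hypothetical minimal template to a temporary file, showing where the fragment sits inside the resources array, and then searches for the resource name with grep. The helper `has_partner_guid` is an illustrative name, and a real template would contain your AKS resources next to this entry.

```shell
#!/usr/bin/env bash
PARTNER_RESOURCE='pid-e64d9978-e282-4d1c-9f2e-0eccb50582e4'

# Hypothetical helper: succeeds when the template file has a resource
# whose "name" is the Partner GUID identifier.
has_partner_guid() {
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${PARTNER_RESOURCE}\"" "$1"
}

# Minimal example template, for demonstration only.
TEMPLATE_FILE="$(mktemp)"
cat > "$TEMPLATE_FILE" <<'EOF'
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "apiVersion": "2020-06-01",
      "name": "pid-e64d9978-e282-4d1c-9f2e-0eccb50582e4",
      "type": "Microsoft.Resources/deployments",
      "properties": {
        "mode": "Incremental",
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "resources": []
        }
      }
    }
  ]
}
EOF

if has_partner_guid "$TEMPLATE_FILE"; then
  echo "Partner GUID resource found"
else
  echo "Partner GUID resource missing" >&2
fi

rm -f "$TEMPLATE_FILE"
```

This is a text search only. It does not validate the template, so the deployment walkthrough linked above remains the reference for the actual rollout.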
Sizing
The recommended size of the AKS cluster depends on the amount of data you plan to process. You can scale the cluster up or down later as your needs change.
More than five million records
Nodepool | Nodepool Type | Nodepool Size | Node | Workload | CPU Request (Cores) | RAM Request | CPU Limit (Cores) | RAM Limit | Disk Type | Disk Size | Purpose | Taint | Toleration |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Processing | F16s_v2 | 15.7 Cores, 28 GB (Allocatable) | 1 | CluedIn Processing | 14 | 20GB | 15 | 24GB | - | - | Processing incoming and outgoing master data | | |
Processing | F16s_v2 | 15.7 Cores, 28 GB (Allocatable) | 2 | CluedIn Processing | 14 | 20GB | 15 | 24GB | - | - | Additional processing for historical ingestion (temporary) | | |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 1 | Neo4j | 7 | 28 | 7 | 28 | Standard SSD | 500GB | Graph Database | datalayerPool=true | datalayerPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 2 | ElasticSearch | 7 | 28 | 7 | 28 | Standard SSD | 500GB | Search Index | datalayerPool=true | datalayerPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 3 | SQL Server | 3 | 8 | 6 | 16 | Standard SSD | 750GB | Relational Database | datalayerPool=true | datalayerPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 3 | RabbitMQ | 1 | 4 | 2 | 4 | Standard SSD | 150GB | Service Bus | datalayerPool=true | datalayerPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 3 | Redis | 0.5 | 512Mi | 1 | 1 | Standard SSD | 32GB | Cache | datalayerPool=true | datalayerPool=true |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | Annotation | 0.125 | 64Mi | 1 | 0.512 | - | - | Annotation Service | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | GQL | 0.2 | 64Mi | 1 | 1 | - | - | GraphQL Layer | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | Submitter | 0.25 | 128Mi | 1 | 1 | - | - | Clue Submitter Service | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | UI | 0.25 | 256Mi | 0.75 | 512Mi | - | - | User Interface | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | Webapi | 0.25 | 256Mi | 0.5 | 512Mi | - | - | User Interface Wrapper | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 2 | CluedIn API | 1.5 | 6Gi | 2 | 10Gi | - | - | CluedIn Server WebApi | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 2 | CluedIn Crawling | 2 | 8Gi | 2 | 12Gi | - | - | CluedIn Crawling Pod | - | - |
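One way to sanity-check the table above is to add up the CPU requests of the workloads that share a node and compare the total with that node's allocatable cores. The sketch below does this with awk for the three nodes that host more than one workload. The node labels (datalayer-3, generic-1, generic-2) and the helper name `cpu_headroom` are made up for this example, and memory is left out because the table mixes units.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Reads lines of "<node label> <allocatable cores> <CPU request>"
# and prints the summed CPU request per node with a fits/OVER verdict.
cpu_headroom() {
  awk '
    {
      alloc[$1] = $2
      used[$1] += $3
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        printf "%s requested=%.3f allocatable=%.1f %s\n", k, used[k], alloc[k], (used[k] <= alloc[k] ? "fits" : "OVER")
      }
    }'
}

# CPU requests taken from the table for more than five million records.
cpu_headroom <<'EOF'
datalayer-3 7.6 3
datalayer-3 7.6 1
datalayer-3 7.6 0.5
generic-1 3.8 0.125
generic-1 3.8 0.2
generic-1 3.8 0.25
generic-1 3.8 0.25
generic-1 3.8 0.25
generic-2 3.8 1.5
generic-2 3.8 2
EOF
# prints:
# datalayer-3 requested=4.500 allocatable=7.6 fits
# generic-1 requested=1.075 allocatable=3.8 fits
# generic-2 requested=3.500 allocatable=3.8 fits
```

The same check can be repeated with your own numbers if you change the node sizes or move workloads between nodes.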
Less than five million records
Nodepool | Nodepool Type | Nodepool Size | Node | Workload | CPU Request (Cores) | RAM Request | CPU Limit (Cores) | RAM Limit | Disk Type | Disk Size | Purpose | Taint | Toleration |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Processing | F8s_v2 | 7.6 Cores, 14 GB (Allocatable) | 1 | CluedIn Processing | 7 | 12 | 7 | 12 | - | - | Processing incoming and outgoing master data | processingPool=true | processingPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 1 | Neo4j | 3.5 | 12 | 3.5 | 12 | Standard SSD | 250GB | Graph Database | datalayerPool=true | datalayerPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 1 | ElasticSearch | 3.5 | 12 | 3.5 | 12 | Standard SSD | 500GB | Search Index | datalayerPool=true | datalayerPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 2 | SQL Server | 3 | 8 | 6 | 16 | Standard SSD | 750GB | Relational Database | datalayerPool=true | datalayerPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 2 | RabbitMQ | 1 | 4 | 2 | 4 | Standard SSD | 150GB | Service Bus | datalayerPool=true | datalayerPool=true |
Datalayer | D8s_v4 | 7.6 Cores, 28 GB (Allocatable) | 2 | Redis | 0.5 | 512Mi | 1 | 1 | Standard SSD | 32GB | Cache | datalayerPool=true | datalayerPool=true |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | Annotation | 0.125 | 64Mi | 1 | 0.512 | - | - | Annotation Service | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | GQL | 0.2 | 64Mi | 1 | 1 | - | - | GraphQL Layer | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | Submitter | 0.25 | 128Mi | 1 | 1 | - | - | Clue Submitter Service | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | UI | 0.25 | 256Mi | 0.75 | 512Mi | - | - | User Interface | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 1 | Webapi | 0.25 | 256Mi | 0.5 | 512Mi | - | - | User Interface Wrapper | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 2 | CluedIn API | 1.5 | 6Gi | 2 | 10Gi | - | - | CluedIn Server WebApi | - | - |
Generic | D4_v3 | 3.8 Cores, 14 GB (Allocatable) | 2 | CluedIn Crawling | 2 | 8Gi | 2 | 12Gi | - | - | CluedIn Crawling Pod | - | - |