Using and Installing the Alation Data Catalog

What is the Alation Data Catalog?

Alation is a data catalog tool for searching, querying and collaborating on large data sets, and it uses machine learning to help users gain insights quickly. In my environment, a Teradata source database stores clickstream data that records user behavior on the website: clicks, orders, user interactions, everything!

Alation’s enterprise data catalog dramatically improves the productivity of analysts, increases the accuracy of analytics, and drives confident data-driven decision-making while empowering everyone in your organization to find, understand, and govern data.

https://www.alation.com/product/data-catalog/

We use the Alation Data Catalog to make sense of these petabytes of data. Alation also connects to our AWS Redis data sources and to a few on-site Power BI instances.

Pre-Reqs

Before installing the Alation Data Catalog, complete these key tasks to ensure a successful implementation:

  • Procure & configure Alation compute instance
  • Confirm network rules are in place
  • Obtain Alation email account and SMTP server details
  • Create DNS entries for Alation URL
  • Procure & Configure Alation Analytics V2 compute instance
  • Prepare Service Accounts and collect connection details for in-scope data sources

Ports needed for your security group:

Service              Direction   Port(s)    Destination
DNS                  Outbound    53         DNS server
Email (SMTP)         Outbound    25         0.0.0.0/0
Email (SMTPS)        Outbound    465        Email server
HTTP/HTTPS           Inbound     80, 443    Alation node
Management Console   Inbound     443        Alation node
LDAP                 Outbound    389        LDAP / AD server
LDAPS                Outbound    636        LDAP / AD server
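Once the security group is in place, it can be worth confirming the outbound rules from the Alation node itself before installing. Below is a minimal sketch, assuming bash and the coreutils timeout command are available on the node. The function names check_port and report_ports and the example host names are hypothetical placeholders, not part of Alation. The sketch only shows that a TCP connection can be opened, not that the service behind it works, and it cannot test UDP (for example, DNS over UDP).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be
# opened within the timeout (default 3 seconds). Uses bash's /dev/tcp
# pseudo-device, so it checks TCP only, not UDP.
check_port() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Print OK or BLOCKED for each host:port argument.
report_ports() {
  local target
  for target in "$@"; do
    if check_port "${target%:*}" "${target##*:}"; then
      echo "OK $target"
    else
      echo "BLOCKED $target"
    fi
  done
}

# Usage (host names are placeholders for your own servers):
# report_ports smtp.example.com:25 smtp.example.com:465 \
#              ldap.example.com:389 ldap.example.com:636
```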

How to install Alation

The first thing you need to do is contact Alation for a trial licence; other members of the project did this step for me. This guide is a high-level overview of how to install Alation.

  • Contact Alation for a trial licence and the installation files. You can install offline or via RPM or YUM. I recommend running Alation only on Linux.
  • Provision a server instance. I did this in AWS – here are the specs:

AWS instance: m5.2xlarge (8 vCPUs, 32 GB RAM)

Configure storage: three XFS file systems (80 GB root partition, 500 GB app partition mounted at /data, 750 GB backup partition mounted at /backup)
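If you provision with the AWS CLI rather than the console, the instance and its three volumes could be requested in one call. Below is a minimal sketch, assuming a configured AWS CLI. The function name launch_alation_instance, the gp3 volume type and the device names (/dev/xvda for the root volume, /dev/sdf and /dev/sdg for the app and backup volumes) are assumptions; the root device name in particular depends on the AMI you choose. On Nitro-based instance types such as m5, EBS volumes appear as NVMe devices (/dev/nvme1n1 and so on) and may not come up in the order you requested, so check with lsblk before partitioning.

```shell
#!/usr/bin/env bash
# Hypothetical helper: launch an m5.2xlarge with an 80 GB root volume plus
# 500 GB (app) and 750 GB (backup) EBS volumes. Assumes a configured AWS CLI.
launch_alation_instance() {
  local ami_id=$1 subnet_id=$2 sg_id=$3 key_name=$4
  # Device names are assumptions: the root device name depends on the AMI.
  local root='{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":80,"VolumeType":"gp3"}}'
  local app='{"DeviceName":"/dev/sdf","Ebs":{"VolumeSize":500,"VolumeType":"gp3"}}'
  local backup='{"DeviceName":"/dev/sdg","Ebs":{"VolumeSize":750,"VolumeType":"gp3"}}'
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type m5.2xlarge \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --key-name "$key_name" \
    --block-device-mappings "[$root,$app,$backup]" \
    --query 'Instances[0].InstanceId' --output text
}

# Usage (all IDs are placeholders):
# launch_alation_instance ami-xxxxxxxx subnet-xxxxxxxx sg-xxxxxxxx my-key-pair
```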

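# Mount points for the app and backup file systems
sudo mkdir -p /data /backup
# Initialize the two extra EBS volumes for LVM (example device names; check yours with lsblk)
sudo pvcreate /dev/nvme1n1 /dev/nvme2n1
# One volume group per volume
sudo vgcreate vg_xfs /dev/nvme1n1
sudo vgcreate vg_backup_xfs /dev/nvme2n1
# One logical volume per group, using all of its free space
sudo lvcreate -l 100%FREE -n data vg_xfs
sudo lvcreate -l 100%FREE -n backup vg_backup_xfs
# Create the XFS file systems
sudo mkfs.xfs /dev/vg_xfs/data
sudo mkfs.xfs /dev/vg_backup_xfs/backup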

/etc/fstab (look up each UUID with sudo blkid)
UUID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     /          xfs    defaults,noatime  1   1
UUID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     /data      xfs    defaults,noatime  1   1
UUID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     /backup    xfs    defaults,noatime  1   1

Then mount the new file systems:
sudo mount -a
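Instead of copying UUIDs by hand, the fstab entries for the two logical volumes can be generated from blkid. Below is a minimal sketch; the function name fstab_line is hypothetical, and the mount options mirror the entries shown above. Run it as root (for example inside sudo bash) so that blkid can read the devices and the output can be appended to /etc/fstab, and keep a backup copy of /etc/fstab before editing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an /etc/fstab entry for an XFS logical volume,
# looking up the volume's UUID with blkid. Fails if no UUID is found.
fstab_line() {
  local device=$1 mount_point=$2 uuid
  uuid=$(blkid -s UUID -o value "$device") || return 1
  [ -n "$uuid" ] || return 1
  printf 'UUID=%s  %s  xfs  defaults,noatime  1 1\n' "$uuid" "$mount_point"
}

# Usage (as root):
# fstab_line /dev/vg_xfs/data /data >> /etc/fstab
# fstab_line /dev/vg_backup_xfs/backup /backup >> /etc/fstab
# mount -a
```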
  • Download the Alation RPM package. You can download it from the Alation Customer Portal (https://customerportal.alationdata.com/) and copy it to the server for an offline install, or fetch it directly on the server with curl. You get your access token from the Alation Customer Portal.
curl -fL -H "Authorization: Token YOUR_ACCESS_TOKEN" -o alation-2021.2-8.2.1.137867.rpm https://customerportal.alationdata.com/api/build/137867/file/RPM/
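Because the download is authenticated, a wrong or expired token can leave you with a small error page saved under the .rpm name instead of a package. A quick sanity check, sketched below as a hypothetical helper named is_rpm, is to look for the four magic bytes (ED AB EE DB) that every RPM file starts with. If the rpm tool is available, rpm -K on the file is a more thorough check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file starts with the RPM lead magic
# bytes ED AB EE DB, i.e. it looks like an RPM package rather than an
# HTML or JSON error page. A missing file also fails the check.
is_rpm() {
  local magic
  magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
  [ "$magic" = "edabeedb" ]
}

# Usage:
# is_rpm alation-2021.2-8.2.1.137867.rpm || echo "download failed, check your token"
```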
[Image: Snapshot of the Alation website]
  • Next, install Alation:
sudo yum update -y
sudo rpm -ivh alation-2021.2-8.2.1.137867.rpm
sudo service alation init /data /backup
  • Now enter the Alation Shell
sudo service alation shell
  • You can look at the existing configuration by typing
alation_conf
  • Here is my recommended Alation configuration
alation_conf alation.profiling.v2.distribution.show_distribution_chart -s True
alation_conf alation.profiling.v2.distribution.max_unbatched_values -s 10
alation_conf alation.profiling.v2.distribution.batch_count -s 10
alation_conf alation.feature_flags.enable_profiling_v2 -s True
alation_conf alation.taskserver_timeouts.profileColumnV2 -s 120
alation_conf alation.feature_flags.enable_gbm_v2_connector_strategy -s True
alation_conf alation.feature_flags.enable_permissions_middleware_feature -s True
alation_conf alation.feature_flags.enable_swagger -s True
alation_conf alation.authentication.token.disable_v0_api_token_auth -s True
alation_conf alation.feature_flags.enable_lineage_v2 -s True
alation_conf alation.backup_v2.incr_backup -s True
alation_conf alation.backup_v2.incr_backup_versions -s 6
alation_conf alation.install.is_trial -s true
alation_conf nginx.use_ssl -s False
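If you apply these settings on more than one server, or want to re-apply them later, it can help to keep them as a single list and loop over it inside the Alation shell. Below is a minimal sketch, assuming a bash-compatible shell inside the Alation shell. The function name apply_conf and the DRY_RUN variable are hypothetical; the loop simply runs alation_conf KEY -s VALUE for each line, exactly like the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "key value" pairs from stdin and apply each one
# with alation_conf. Blank lines and lines starting with # are skipped.
# With DRY_RUN=1 it only prints the commands it would run.
apply_conf() {
  local key value
  while read -r key value; do
    case "$key" in
      ''|\#*) continue ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "alation_conf $key -s $value"
    else
      alation_conf "$key" -s "$value" || return 1
    fi
  done
}

# Usage (inside the Alation shell):
# apply_conf <<'EOF'
# alation.feature_flags.enable_profiling_v2 True
# alation.backup_v2.incr_backup_versions 6
# nginx.use_ssl False
# EOF
```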
  • Now enable backups
alation_action enable_backupv2
  • Restart the Alation server
alation_action restart_alation
  • Finally, configure an AWS Application Load Balancer in front of the Alation node. Note: it MUST be an Application Load Balancer (ALB), not a Classic or Network Load Balancer.
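The load balancer can be created in the AWS console, or scripted with the AWS CLI as sketched below. This is a minimal sketch under several assumptions. Because the configuration above sets nginx.use_ssl to False, the ALB is assumed to terminate TLS with an ACM certificate and forward plain HTTP to port 80 on the Alation node. The function name create_alation_alb, the names alation-tg and alation-alb, the internal scheme and the health-check settings are hypothetical choices, not Alation requirements. An ALB also needs subnets in at least two Availability Zones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an internal Application Load Balancer that
# terminates HTTPS on 443 and forwards HTTP to the Alation node on port 80.
# Assumes a configured AWS CLI. Arguments:
#   $1 VPC ID  $2 space-separated subnet IDs  $3 ALB security group ID
#   $4 Alation EC2 instance ID  $5 ACM certificate ARN
create_alation_alb() {
  local vpc_id=$1 subnets=$2 sg_id=$3 instance_id=$4 cert_arn=$5 tg_arn alb_arn

  # Target group: plain HTTP to the node, since nginx.use_ssl is False.
  # Assumption: treat redirects as healthy in case / redirects to a login page.
  tg_arn=$(aws elbv2 create-target-group --name alation-tg \
    --protocol HTTP --port 80 --vpc-id "$vpc_id" --target-type instance \
    --health-check-path / --matcher HttpCode=200-399 \
    --query 'TargetGroups[0].TargetGroupArn' --output text) || return 1

  aws elbv2 register-targets --target-group-arn "$tg_arn" \
    --targets "Id=$instance_id" || return 1

  # --type application: it must be an ALB. $subnets is deliberately
  # unquoted so that each subnet ID becomes a separate argument.
  alb_arn=$(aws elbv2 create-load-balancer --name alation-alb \
    --type application --scheme internal \
    --subnets $subnets --security-groups "$sg_id" \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text) || return 1

  # HTTPS listener that forwards all requests to the Alation target group.
  aws elbv2 create-listener --load-balancer-arn "$alb_arn" \
    --protocol HTTPS --port 443 --certificates "CertificateArn=$cert_arn" \
    --default-actions "Type=forward,TargetGroupArn=$tg_arn"
}

# Usage (all IDs and ARNs are placeholders):
# create_alation_alb vpc-xxxx "subnet-aaaa subnet-bbbb" sg-xxxx i-xxxx arn:aws:acm:...
```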