
Monitoring AWS Redshift With Elasticsearch

Elasticsearch can be used to gather logs and metrics from different cloud services for further monitoring with the Elastic Stack. In this post, we are going to get the monitoring data from AWS Redshift and make it available to Elastic Cloud. Some of the steps in this guide can be used for other AWS services as well.

Purpose of Monitoring

AWS Redshift is a petabyte-scale data warehouse service from Amazon. We want to answer the following question:

“Which users are running which queries on AWS Redshift, and which of those queries are putting a lot of pressure on the underlying system?”

To answer this question we need to get Redshift system metrics like read & write latency, query duration, CPU, disk space & network usage, together with the actual queries & the users that are running those queries. This way we can identify expensive queries and attempt to optimise them, or use the machine learning features in the Elastic Stack to predict high loads and increase the system capacity when necessary.

The system metrics have already been defined by AWS; you can find them via this link. These metrics are only accessible through AWS CloudWatch. It is possible to “consume” the metrics from CloudWatch using other applications like Logstash and Grafana, or AWS Lambda functions. The information regarding users and queries can be found in one of the Redshift system tables, which you can find here. This data can be retrieved with Logstash or AWS Lambda functions and sent to Elasticsearch.
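
To get a quick preview of which Redshift metrics CloudWatch currently exposes for your cluster, you can list them with the AWS CLI (a minimal sketch, assuming the CLI is configured with credentials that can read CloudWatch):

# List all Redshift metrics that CloudWatch currently has datapoints for
aws cloudwatch list-metrics --namespace AWS/Redshift --region eu-west-1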

To achieve our goal, we are going to have to take the following steps:

  1. Set up infrastructure
  2. Set up roll-over and indexes
  3. Send CloudWatch metrics to Elastic Cloud
  4. Send Redshift system table information to Elastic Cloud

1. Set up infrastructure

In this step we are first going to set up an Elastic Cloud cluster. Then we are going to create an EC2 instance and install the necessary applications on it.

Assumptions regarding Redshift
  • The Redshift cluster is located in the eu-west-1 region
  • The Redshift cluster name is redshift-poc
  • If you don’t have a cluster set up, you can follow this guide.
Set up an Elastic Cloud cluster

Elastic provides an Elasticsearch solution, Elastic Cloud, that is hosted on the cloud provider of your choice. Elastic Cloud has an easy-to-use interface and supports the different tools of the Elastic Stack, some of which are not available in other solutions. In addition, in Elastic Cloud all the X-Pack features are available for free. For a fast proof of concept, you can get a 14-day free trial and experience all the different features of the Elastic Stack. For this guide, you will need to create a basic Elasticsearch cluster on Elastic Cloud (https://cloud.elastic.co) on AWS in the eu-west-1 region. It takes only a couple of clicks to have your cluster up and running.
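
Once the cluster is up, a quick way to verify that it is reachable is to call the cluster root endpoint with the credentials that Elastic Cloud generated for you; the {es_cloud_url}, {port} and {password} placeholders are the same ones used in the Logstash configuration later in this guide:

# Should return a small JSON document with the cluster name and version
curl -u elastic:{password} https://{es_cloud_url}:{port}/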

Set up an EC2 instance

We are going to create an EC2 instance that has access to CloudWatch (for retrieving the metrics) and Redshift (for retrieving the system tables). On this EC2 instance we will install Logstash (to send metrics to Elastic Cloud) and Curator (to perform daily roll-overs). In the next steps, we will dive deeper into the purpose of each application.

1. Create an IAM Role that has the rights to get metrics from AWS CloudWatch:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:GetMetricData",
                "cloudwatch:ListMetrics"
            ],
            "Resource": "*"
        }
    ]
}

2. Create an EC2 instance based on this IAM role. See here how to do that.
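
If you prefer the command line, the sketch below shows roughly how the role, policy and instance profile could be wired together with the AWS CLI; the names (redshift-monitoring-ec2, cloudwatch-metrics) are only illustrative, and the policy JSON from step 1 is assumed to be saved locally as cloudwatch-metrics-policy.json:

# Create a role that EC2 instances are allowed to assume
aws iam create-role --role-name redshift-monitoring-ec2 \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Attach the CloudWatch policy from step 1 as an inline policy
aws iam put-role-policy --role-name redshift-monitoring-ec2 \
  --policy-name cloudwatch-metrics --policy-document file://cloudwatch-metrics-policy.json

# Create an instance profile and add the role to it, so the role can be
# selected when launching the EC2 instance
aws iam create-instance-profile --instance-profile-name redshift-monitoring-ec2
aws iam add-role-to-instance-profile --instance-profile-name redshift-monitoring-ec2 \
  --role-name redshift-monitoring-ec2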

3. Open the network connection between Redshift and the EC2 instance:
a) Go to the EC2 console > click on the EC2 instance > click on the security group > copy the group id.

b) Go to the Redshift console > click on the cluster name > click on the security group > add the security group of the EC2 instance to the Redshift inbound security rules. Take a look at the second rule in the picture.
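
The same inbound rule can also be sketched with the AWS CLI; {redshift_sg_id} and {ec2_sg_id} stand for the two security group ids you copied, and 5439 is the default Redshift port:

# Allow the EC2 instance's security group to reach Redshift on port 5439
aws ec2 authorize-security-group-ingress --group-id {redshift_sg_id} \
  --protocol tcp --port 5439 --source-group {ec2_sg_id}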

4. Install Logstash on your EC2 instance.

a) Check java -version and make sure the version is at least 8. If not, install it with yum.
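
For example, on an Amazon Linux instance this could look like the following (the java-1.8.0-openjdk package name is an assumption; use the Java 8 package available in your distribution):

java -version
# if Java is missing or too old:
sudo yum install -y java-1.8.0-openjdk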

b) Don’t forget to get the latest version of Logstash. The current version can be installed with:

yum install -y https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.rpm

then started with:

initctl start logstash

5. Install Curator from here.

yum install -y https://packages.elastic.co/curator/5/centos/6/Packages/elasticsearch-curator-5.5.1-1.x86_64.rpm

a) On the command line, try to run the curator command.

b) If the command comes back with an error, you may need to set the locale:

export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"

2. Set up roll-over and indexes

When the number of documents in an index in Elasticsearch becomes too big, one server cannot hold the whole index. In Elasticsearch, it is possible to scale horizontally by partitioning the index into a number of shards. There is also a hard limit on the number of documents that a single shard can contain, which is about 2,147,483,519. But what if your index outgrows the shards it was created with? Then you have to re-shard the data, which in Elasticsearch means creating a new index with the right number of shards and re-indexing the data.

For time series data, including logs and metrics, we can use a pattern called rollover. In this pattern, data is bucketed into a new index per time period (e.g. a day, a month or a year). To make searching across all the data possible, the aggregation of all indexes is accessible via an alias. Using this pattern we can avoid the unnecessary load on the system that would be caused by re-indexing when the data gets too big. Moreover, it gives us the possibility of managing hot and cold data based on arrival time. For example, we can remove very old data, or make reasonably old data read-only, which decreases the size of that data significantly.

In our case, we are going to put the metrics retrieved from CloudWatch on a specific date into `redshift_metrics_{date}`. To simplify the daily write path, the metrics of today are accessible via `redshift_metrics_daily`. All the metrics gathered over time will be accessible via the `search_redshift_metrics` alias. Similarly, we are going to put all the information retrieved from the Redshift system table on a specific date into `redshift_query_metadata_{date}`, and all those documents will be accessible via `search_redshift_query_metadata`.

1. Go to your Elastic Cloud Kibana instance and from the left menu select Dev Tools.

2. Create the following index and template in Elasticsearch for the Redshift CloudWatch metrics:

# PUT /<redshift_metrics-{now/d}-000001> with URI encoding:
PUT /%3Credshift_metrics-%7Bnow%2Fd%7D-000001%3E
{
  "aliases": {
    "redshift_metrics_daily": {}
  }
}

#This is the view over all the searchable metrics
PUT _template/all-redshift-metrics
{
  "index_patterns": "redshift_metrics-*",
  "aliases": {
    "search_redshift_metrics": {}
  }
}

3. Create the following index and template in Elasticsearch for Redshift system table data:

# PUT /<redshift_query_metadata-{now/d}-000001> with URI encoding:
PUT /%3Credshift_query_metadata-%7Bnow%2Fd%7D-000001%3E
{
  "aliases": {
    "redshift_query_metadata_daily": {}
  }
}

#This is the view over all the searchable query metadata
PUT _template/all-redshift-query-metadata
{
  "index_patterns": "redshift_query_metadata-*",
  "aliases": {
    "search_redshift_query_metadata": {}
  }
}

4. For this example, we have decided to roll over the data every day. Elasticsearch has an API to perform a rollover, but you have to call that API yourself at every interval at which you want the data to be rolled over. To automate this we have decided to use Curator, which is an Elasticsearch index management tool. After setting up Curator we are going to call it from a cron job that runs every day.
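
For reference, a manual rollover of the metrics alias from Dev Tools would look roughly like the request below; Curator will automate exactly this kind of call for us:

POST /redshift_metrics_daily/_rollover
{
  "conditions": {
    "max_age": "1d"
  }
}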

a) Create curator.yml and connect Curator to your Elastic Cloud cluster.

# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - https://{elasticsearch instance}
  http_auth: {user}:{password}
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  timeout: 30
  master_only: False

logging:
  loglevel: DEBUG
  logfile: /tmp/curator.log
  logformat: default
  blacklist: []

b) Create action.yml for Curator with the following content:

actions:
  1:
    action: rollover
    description: >-
      Rollover the index associated with index 'name', which should be in the
      form of prefix-000001 (or similar), or prefix-YYYY.MM.DD-1.
    options:
      disable_action: FALSE
      name: redshift_metrics_daily
      conditions:
        max_age: 1d
      extra_settings:
        index.number_of_shards: 3
        index.number_of_replicas: 1
  2:
    action: rollover
    description: >-
      Rollover the index associated with index 'name', which should be in the
      form of prefix-000001 (or similar), or prefix-YYYY.MM.DD-1.
    options:
      disable_action: FALSE
      name: redshift_query_metadata_daily
      conditions:
        max_age: 1d
      extra_settings:
        index.number_of_shards: 3
        index.number_of_replicas: 1

In this action file, action #1 rolls over the redshift_metrics index and action #2 rolls over redshift_query_metadata. The bucket size, or rollover limit, is defined by `max_age`. If you want other bucket sizes you can change that here (see the example after the list below). Valid units for max_age are:

d - days
h - hours
m - minutes
s - seconds
ms - milliseconds
micros - microseconds
nanos - nanoseconds
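
For example, to roll over weekly, or as soon as an index grows beyond a certain number of documents (the rollover action also accepts a max_docs condition), the conditions block could look like this; the numbers are only illustrative:

      conditions:
        max_age: 7d
        max_docs: 10000000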

c) Test running Curator and check the log file to see if everything is going smoothly:

curator --config curator.yml action.yml
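
Curator also supports a --dry-run flag, which logs what it would do without actually rolling anything over; this is useful for a first test:

curator --config curator.yml --dry-run action.yml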

5. Run Curator as a cron job that runs every night at 00:00.

a) Open the root crontab for editing:

crontab -l -u root
sudo crontab -e

b) When prompted to edit the crontab, use the line below, making sure to use fully qualified paths to the config, action and log files:

0 0 * * * /usr/local/bin/curator --config /tmp/curator-config.yml /tmp/curator-action.yml >> /tmp/curator.log 2>&1

3. Send CloudWatch metrics to Elastic Cloud

Now that we have set up the indexes, we are first going to send the metrics from CloudWatch to Elastic Cloud.

1. Connect CloudWatch to Logstash with the CloudWatch input plugin (installation example below):
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-cloudwatch.html
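
If the plugin is not bundled with your Logstash installation, it can be installed with the logstash-plugin tool (the path below assumes the RPM install from the previous step):

sudo /usr/share/logstash/bin/logstash-plugin install logstash-input-cloudwatch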

2. Create a Logstash pipeline that takes the Redshift metrics from CloudWatch as input and sends those metrics to Elasticsearch. Here is what logstash.conf will look like:

input {
  cloudwatch {
    namespace => "AWS/Redshift"
    metrics => [ "CPUUtilization", "DatabaseConnections", "HealthStatus", "MaintenanceMode", "NetworkReceiveThroughput",
                 "NetworkTransmitThroughput", "PercentageDiskSpaceUsed", "QueriesCompletedPerSecond", "QueryDuration",
                 "ReadIOPS", "ReadLatency", "ReadThroughput", "WLMQueriesCompletedPerSecond", "WLMQueryDuration",
                 "WriteIOPS", "WriteLatency", "WriteThroughput" ] # list of RedShift metrics that is available in cloudwatch
    filters => { "ClusterIdentifier"=> "redshift-poc"} #name of the red shift cluster
    region => "eu-west-1"
    interval => 300 #Set how frequently CloudWatch should be queried. Cannot go lower than 300
    period => 300 #Set the granularity of the returned datapoints. Must be at least 60 seconds and in multiples of 60.
  }
}
output {
      elasticsearch {
        hosts => ["https://{es_cloud_url}:{port}"]
        user => "elastic"
        password => "{password}"
        index => "redshift_metrics_daily" #alias for the daily metrics
    }
}
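
Before (re)starting the Logstash service, you can optionally validate the configuration file; the path to logstash.conf below is an assumption based on the default RPM layout:

sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf --config.test_and_exit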

4. Send Redshift system table information to Elastic Cloud

Logstash has a plugin to connect to any datastore over a JDBC connection by specifying the correct driver. But before setting up the Logstash configuration, we need to define our query in Redshift and then add it to the Logstash configuration.

1. We want to get the user id and the queries that each user has run. The stl_query table has all the information we are looking for (https://docs.aws.amazon.com/redshift/latest/dg/r_STL_QUERY.html).

/*query, starttime, endtime*/
select  * from stl_query  where userid > 1 and userid <> ( current_user_id) order by starttime desc;

2. In the query above we have excluded two users from the result.

a) The first one is the user with id 1. This is the system user, which runs many internal queries that would only clutter our data.

b) The other is the user that runs this monitoring query itself. This is to avoid the monitoring queries showing up in their own results. To make sure we are not missing any other queries from a real user, we are going to define a dedicated user that only runs this query:

/* unrestricted access to see other users' actions & queries, see: https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_USER.html#alter-user-syslog-access */
create user logstash_read_metadata password '{password}' syslog access unrestricted;

3. Make a JDBC connection to the Redshift DB via Logstash, see here.

a) Install the Redshift JDBC driver, which is available here, to a local folder, for example /etc/logstash/lib/.

b) Add an extra input step in logstash.conf with the JDBC connection. To make sure that both inputs are sent to their designated indexes, we use the type field as a variable that holds part of the index name.

input {
  cloudwatch {
    namespace => "AWS/Redshift"
    metrics => [ "CPUUtilization">, "DatabaseConnections", "HealthStatus", "MaintenanceMode", "NetworkReceiveThroughput",
                 "NetworkTransmitThroughput", "PercentageDiskSpaceUsed", "QueriesCompletedPerSecond", "QueryDuration",
                 "ReadIOPS", "ReadLatency", "ReadThroughput", "WLMQueriesCompletedPerSecond", "WLMQueryDuration",
                 "WriteIOPS", "WriteLatency", "WriteThroughput" ]
    filters => { "ClusterIdentifier" => "redshift-poc" }
    region => "eu-west-1"
    interval => 300
    period => 300
    type => "redshift_metrics"
  }
  jdbc {
    jdbc_driver_library => "/etc/logstash/lib/RedshiftJDBC42-1.2.12.1017.jar"
    jdbc_driver_class => "com.amazon.redshift.jdbc42.Driver"
    jdbc_connection_string => "jdbc:redshift://{redshift_instance}.eu-west-1.redshift.amazonaws.com:5439/poc"
    jdbc_user => "logstash_read_metadata"
    jdbc_password => "{password}"
    # the query will run every minute
    schedule => "* * * * *"
    # use_column_value => false will define sql_last_value as the last time the query ran.
    # see https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_configuring_multiple_sql_statements
    use_column_value => false
    # sql_last_value is a default logstash jdbc variable. It contains the time the previous query ran.
    # select current_user_id returns the userid we are running the query with; userid 1 is the system userid.
    # This is to avoid the monitoring queries triggering yet more queries in the results
    statement => "select * from stl_query where userid > 1 and userid <> (select current_user_id) and starttime > :sql_last_value"
    sql_log_level => "debug"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    jdbc_validate_connection => true
    clean_run => true
    type => "redshift_query_metadata"
  }
}
filter {
  mutate {
     strip => ["querytxt"]
  }
}
output {
     elasticsearch {
        hosts => ["https://{es_cloud_url}:{port}"]
        user => "elastic"
        password => "{password}"
        index => "%{type}_daily"
    }
}

There is a filter added between the input and the output. This filter is there because, after consuming the data, we noticed that there is a lot of leading and trailing whitespace in the `querytxt` column. With this filter we clean that up.
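
Once Logstash is restarted with this configuration, a quick sanity check from Kibana Dev Tools confirms that documents are arriving under both search aliases (a minimal example; any hit count above zero means the pipeline works):

GET search_redshift_metrics/_search
{
  "size": 1
}

GET search_redshift_query_metadata/_search
{
  "size": 1
}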

After following all of these steps we finally have all the data that we need available in Elasticsearch, and thanks to our roll-over strategy the indexes stay manageable. Now that we have all the data in one place, we can create dashboards in Kibana that combine the data gathered in the different indexes to answer the question we had at the beginning. In the future, we can also use the machine learning features of X-Pack to do anomaly detection and predictions.

Sources:

https://www.elastic.co/guide/en/logstash/current/index.html

https://hackernoon.com/aws-cloudwatch-monitoring-with-grafana-ace63e1ab507

http://www.devops-engineer.com/how-to-install-and-configure-logstash-in-aws/

https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html

https://www.elastic.co/blog/managing-time-based-indices-efficiently for roll-over indexes and strategies

https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html
