Understand the current method/technique/design/approach
- What is it?
- self-managed (not serverless): a stack the team must configure and maintain itself
- often used to collect, temporarily store, analyze and visualize log data
- enables teams to gain visibility into their systems, to ensure that apps are available and performing at all times
- Components
- Elasticsearch
- Open-source, full-text search and analysis engine, based on the Apache Lucene search engine
- Index, search and analyse log data; log data can be parsed and normalised before being indexed
- Logstash
- log aggregator that collects data from various input sources, performs transformations and enrichments, and ships the data to various output destinations
- Acts like a pipeline for Elasticsearch
- Can aggregate logs and event data from various data sources, including S3 buckets and CloudWatch
- Kibana
- Visualisation layer; works on top of Elasticsearch
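To make the "parse and normalise before indexing" idea concrete, here is a minimal Python sketch of what a pipeline stage does before data reaches Elasticsearch: parse a raw access-log line into a structured document, then build a body for Elasticsearch's `_bulk` API. The Apache-style log format, the field names and the index name are illustrative assumptions, not a fixed ELK schema.

```python
import json
import re

# Hypothetical Apache-style access-log line format; the regex and field
# names are assumptions for illustration.
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def normalise(line: str) -> dict:
    """Parse a raw log line into a structured document, roughly what a
    Logstash grok filter would do before the data is indexed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("unparseable log line")
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["bytes"] = int(doc["bytes"])
    return doc

def bulk_index_body(docs, index="logs-demo"):
    """Build the newline-delimited JSON body for Elasticsearch's _bulk API:
    an action line followed by the document source, one pair per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline
```

The resulting body would be POSTed to `/_bulk` on the Elasticsearch host with the `application/x-ndjson` content type.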
- Why
- requires less maintenance
- allows for scaling
- allows collection + processing from multiple data sources
- fulfills need in log management + analytics space
- Logs have always existed, but log management and analysis are now more important than ever because solution architectures have (1) Evolved into containers, microservices and orchestration infrastructure that are (2) Deployed on the cloud, and are (3) Generating ever-growing volumes of (log) data.
- This is where a centralised log management and analytics solution like the ELK Stack can help. With a dedicated log management and analysis tool, the engineers managing the infrastructure can quickly inspect the logs to ensure the solution, and its components, are available and performing as expected.
- A specific reason for the client could be that systems within the banking sector should rarely have downtime; they should always be up and running. Without an existing log management and analysis solution (or with one that cannot scale the way ELK can), the team's ability to prevent downtime in the first place is restricted; they cannot proactively identify bugs, security threats, etc.
- Pros
- Very powerful
- Range of capabilities quite impressive
- A fraction of the price of long-established solutions such as Splunk
- Cons
- Tricky to set up
- not specialised for managing log data out of the box
- customisation is needed for the specific data being managed
- each service must be configured separately; the components do not automatically talk to one another
- Others: https://www.chaossearch.io/blog/elk-stack-pros-and-cons
- Scaling issues: More data = more shards = more space
- Complex setup
- Headache to maintain
- routine maintenance: patching, upgrading OS packages, checking CPU, RAM and disk usage, and making adjustments when required
- Expensive
- the alternative
- https://www.toptal.com/aws/aws-elk-stack
- Install CloudWatch Agent on the target.
- Configure CloudWatch Agent to ship the logs to CloudWatch logs.
- Trigger invocation of Lambda functions to process the logs.
- The Lambda function would post messages to a Slack channel if a pattern is found.
- Where possible, apply a filter to the CloudWatch log groups to avoid calling the Lambda function for every single log event (which could ramp up the costs very quickly)
- limitations of solution
- Lambda has a default limit of 1,000 concurrent executions per region, which caps how many log batches can be processed in parallel
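The CloudWatch → Lambda → Slack steps above can be sketched as a Python Lambda handler. The `awslogs` base64 + gzip envelope is the format CloudWatch Logs subscriptions actually deliver to Lambda; the `ERROR` pattern and the `SLACK_WEBHOOK_URL` environment variable are assumptions for illustration.

```python
import base64
import gzip
import json
import os
import urllib.request

# Pattern to look for in log messages; an assumption for this sketch.
PATTERN = "ERROR"

def decode_awslogs(event: dict) -> dict:
    """CloudWatch Logs delivers each batch as base64-encoded, gzipped JSON
    under event["awslogs"]["data"]."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))

def handler(event, context=None):
    payload = decode_awslogs(event)
    # Collect the messages in this batch that match the pattern.
    matches = [e["message"] for e in payload["logEvents"]
               if PATTERN in e["message"]]
    if matches:
        text = "{} in {}:\n{}".format(
            PATTERN, payload["logGroup"], "\n".join(matches))
        # SLACK_WEBHOOK_URL is a hypothetical config value for this sketch;
        # if set, post the alert to the Slack incoming webhook.
        webhook = os.environ.get("SLACK_WEBHOOK_URL")
        if webhook:
            req = urllib.request.Request(
                webhook,
                data=json.dumps({"text": text}).encode(),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)
    return {"matched": len(matches)}
```

With a subscription filter on the log group, the handler is only invoked for batches that already passed the CloudWatch-side filter, keeping invocation counts (and cost) down.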
Literature Review
Existing technique/design/method/approach and how yours differ
- Comparison with Splunk, which currently has the advantage for data loading
- Splunk
- market leader
- numerous functionalities, but very expensive
- Datadog
Method/Approach
Your proposed new technique/design/method/approach
- how elk is deployed in an architecture
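As a starting point for the deployment discussion, a single-node ELK stack is often sketched with Docker Compose. The image tags, ports and disabled security settings below are assumptions for a local demo, not a production layout.

```yaml
# Sketch of a single-node ELK deployment; for local experimentation only.
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false   # demo only; keep security on in prod
    ports:
      - "9200:9200"
  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.0
    ports:
      - "5044:5044"   # Beats/agents ship logs here
    depends_on:
      - elasticsearch
  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
```

In a real architecture, the Logstash input would be fed by shippers (e.g. Filebeat) on the application hosts, and Elasticsearch would run as a multi-node cluster.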
References
https://logz.io/learn/complete-guide-elk-stack/#installing-elk