Introduction

I recently passed the AWS Certified SysOps Administrator - Associate exam and I've put together this post to outline how I prepared for the exam and notes I took along the way. If you want to learn more about the certification, check out this link from AWS.

About me


Everyone who starts preparing for this exam will have arrived by their own unique journey. For me, this is my sixth AWS certification, having already achieved the other two associate certifications, the Cloud Practitioner, and the two data specialties. Of the exams I have studied for so far, the SysOps exam overlapped most with the Solutions Architect Associate, mainly in the area of networking: study for both certs covers all parts of cloud networking extensively. Study for the other certifications also prepared me for questions on services such as S3 and DynamoDB. However, the SysOps exam covers a range of services that I had never encountered before. It is a tough exam to prepare for, requiring you to study a lot of different services: there are 65 listed in the study guide.

Where to begin

Every AWS certification has a page on the AWS certification website, and I always find it the best place to start: https://aws.amazon.com/certification/certified-sysops-admin-associate You will find details there about the exam, along with a study guide and sample questions. You'll also find links to the FAQs for the services and whitepapers to read. You might be tempted to ignore these, but they are worth reading. From there, you can attend the free Exam Readiness course provided by AWS on their Skill Builder site. This course is useful in that it gives you an outline of where you should focus. The exam breaks down into the following six domains, and the Exam Readiness course goes through the services that you need to study in each domain.

Domain % of Examination
Domain 1: Monitoring, Logging, and Remediation 20%
Domain 2: Reliability and Business Continuity 16%
Domain 3: Deployment, Provisioning, and Automation 18%
Domain 4: Security and Compliance 16%
Domain 5: Networking and Content Delivery 18%
Domain 6: Cost and Performance Optimization 12%

The instructors also cover more sample questions and walk through the answers. One big plus is the sample labs they run you through as well. The SysOps exam is different from all other AWS certification exams in that it includes a practical component. Most of the remainder of this article is built around these domains and the services outlined in the Exam Readiness course. Unless you're using all 65 services every day in your role, you are going to need an additional course to help you pass. I used Stephane Maarek's Udemy course and found the content excellent. It doesn't have the sandbox or labs that the CloudAcademy or A Cloud Guru courses provide, but I liked the content and Stephane's style. He does a lot of walk-throughs and provides very detailed slides that you can go back to. If you have access to any of these courses or similar, they all cover much the same content and are essential for preparing for the exam.
The excellent Andrew Brown makes his course available on the freeCodeCamp channel on YouTube. This course is just as good as the others mentioned and can be helpful if you're operating on a budget. If you can afford it though, please remember to donate: freeCodeCamp is a wonderful service and they appreciate all the help they can get. Before we dive in, it's worth considering what this exam is about. Obvious as it might seem, it is a SysOps exam, so services that are marketed as serverless will feature a lot less than services that need ops support. You will not be asked about the inner workings of Lambda, but you will need to know EC2 very well. You will need to know a lot more about RDS than DynamoDB. It's worth keeping that frame of reference in mind while studying. Personally, I have more experience with the managed and serverless services of AWS, so this was my first time having to get to grips with services like AWS Systems Manager and Config, while also having to dust off my notes on VPCs.

Domain 1: Monitoring, Logging, and Remediation (20%)

For domain 1, you need to be able to collect metrics and logs from your applications and infrastructure. You should be able to create alarms and notifications based on the logs and metrics collected, use all of them to monitor and troubleshoot your applications and infrastructure, and fix any issues you find. Automation is also a big factor: you should be thinking about how you can remediate issues automatically before they become bigger problems.

CloudWatch

CloudWatch Metrics

Most services push metrics to CloudWatch by default. EC2 publishes metrics every 5 minutes by default, and you can enable detailed monitoring (at a cost) to bring that down to every minute. CloudWatch can monitor CPU, network, and status checks of an EC2 instance by default, but it does not monitor the memory of your instance. If you need custom metrics or more fine-grained detail from your EC2 instances, you need to install the CloudWatch agent on your instance. Namespaces are used for storing metrics. CloudWatch metrics cannot be deleted and expire after 15 months.
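If you do end up publishing custom metrics (memory being the classic example), the call is CloudWatch's PutMetricData. A minimal boto3-oriented sketch, with a hypothetical namespace and metric name:

```python
# Sketch only: publishing a custom memory metric. Assumes boto3 is installed
# and the instance role allows cloudwatch:PutMetricData. "Custom/EC2" and
# "MemoryUtilization" are illustrative names, not AWS-defined ones.
from datetime import datetime, timezone

def build_metric_payload(instance_id, mem_used_percent):
    """Build the keyword arguments for cloudwatch.put_metric_data()."""
    return {
        "Namespace": "Custom/EC2",  # custom metrics live in your own namespace
        "MetricData": [{
            "MetricName": "MemoryUtilization",
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Timestamp": datetime.now(timezone.utc),
            "Value": mem_used_percent,
            "Unit": "Percent",
        }],
    }

# To actually publish (requires credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_data(**build_metric_payload("i-0abc123", 72.5))
```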

CloudWatch Agent

You should also understand the CloudWatch agent. It allows you to collect metrics and logs from Amazon EC2 instances and on-premises servers. If you use the agent, make sure you attach an IAM role to your instance with permissions to push to CloudWatch. With the agent, you can capture metrics at a maximum frequency of 1 second.

CloudWatch Logs

You can think of CloudWatch Logs as the data store for your application logs. You should understand how data gets in here, either by default or by being pushed in. This isn't just for AWS services; you can also push logs from your on-premises applications and infrastructure using the SDK.

A log event is a single line item detailing the event. This is what is pushed to CloudWatch. A log stream is a sequence of log events that share the same source. Each separate source of logs in CloudWatch Logs makes up a separate log stream. And a log group is a group of log streams that share the same retention, monitoring, and access control settings. You can define log groups and specify which streams to put into each group. There is no limit on the number of log streams that can belong to one log group.

You can use CloudWatch Logs Insights to query your logs with a custom AWS query language. Every log sent to CloudWatch comes with three system fields: @message, @logStream, and @timestamp. @message contains the raw unparsed log event, @logStream contains the name of the log stream that generated the event, and @timestamp contains the time at which the log event was added to CloudWatch. Logs Insights can also generate visualizations such as bar charts, line charts, and stacked area charts from the output of your queries.

You can monitor log events as they are sent to CloudWatch Logs by creating Metric Filters. Metric Filters turn log data into CloudWatch Metrics for graphing or alarming.
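A metric filter is just a pattern plus a transformation. As a rough sketch, the parameters you would pass to CloudWatch Logs' put_metric_filter to count ERROR lines (the log group, metric, and namespace names here are illustrative):

```python
# Sketch: parameters for logs.put_metric_filter() counting log lines that
# contain "ERROR". All names below are placeholders for illustration.
def build_error_metric_filter(log_group):
    return {
        "logGroupName": log_group,
        "filterName": "ErrorCount",
        "filterPattern": "ERROR",  # simple term match; JSON field patterns also exist
        "metricTransformations": [{
            "metricName": "ApplicationErrors",
            "metricNamespace": "Custom/App",
            "metricValue": "1",  # emit 1 per matching log event
        }],
    }
```

The resulting ApplicationErrors metric can then be graphed or alarmed on like any other CloudWatch metric.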

CloudWatch Dashboards

CloudWatch Dashboards are a good way to get a visual overview of your metrics. You can centralise metrics from multiple regions into a single dashboard.

CloudWatch Alarms

You can create a CloudWatch Alarm to monitor any CloudWatch metric in your account, including custom metrics. When you create an alarm, you choose:

To set a threshold, pick a target value and choose whether the alarm triggers when the metric is greater than (>), greater than or equal to (>=), less than (<), or less than or equal to (<=) that value.
One thing to consider: if you want your alarm to evaluate more frequently than the default collection frequency of the metric, you may need to enable detailed monitoring.
An alarm can be in three states: OK, INSUFFICIENT_DATA, and ALARM. You should understand how an alarm moves between these states.
An alarm can trigger an auto scaling action, publish to an SNS topic, or trigger an EC2 action such as terminate, reboot, or recover.
A composite alarm combines multiple alarms (and therefore metrics) into alarm hierarchies. You can then attach an action or notification at any level of the hierarchy.
Alarm history is available for 14 days.
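To build intuition for how an alarm decides its state, here is a small, self-contained simulation of the "M out of N datapoints" evaluation rule. This is an illustration of the logic, not AWS code, and it ignores missing-data handling:

```python
def evaluate_alarm(datapoints, threshold, comparison="GreaterThanThreshold",
                   datapoints_to_alarm=3, evaluation_periods=3):
    """Return the alarm state for the most recent `evaluation_periods` datapoints.

    Mirrors the CloudWatch "M out of N" rule: the alarm goes to ALARM when at
    least `datapoints_to_alarm` of the last `evaluation_periods` datapoints
    breach the threshold.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breach = {
        "GreaterThanThreshold": lambda v: v > threshold,
        "GreaterThanOrEqualToThreshold": lambda v: v >= threshold,
        "LessThanThreshold": lambda v: v < threshold,
        "LessThanOrEqualToThreshold": lambda v: v <= threshold,
    }[comparison]
    breaching = sum(1 for v in recent if breach(v))
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"
```

For example, with a CPU threshold of 80 and three evaluation periods, the series [50, 90, 95, 92] lands in ALARM, while [90, 95, 50] stays OK because only two of the last three points breach.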

EventBridge/CloudWatch Events

Amazon EventBridge used to be known as CloudWatch Events; EventBridge is CloudWatch Events repackaged and supercharged. In the exam, the two service names could be used interchangeably, but generally it's referred to as EventBridge. EventBridge is an event bus and is a great option for integrating different services and applications. Some services have direct integrations, like AWS Config and AWS Systems Manager, but EventBridge can be a good catch-all option for integrating services. In the context of the SysOps certification, all CloudWatch Alarm state changes are sent to EventBridge. From there you can create an EventBridge rule to trigger a Lambda function or Step Functions state machine to remediate the issue.
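An EventBridge rule is defined by an event pattern that incoming events must match. As a rough illustration of the matching semantics (exact-value matching only; real patterns also support prefix, numeric, and anything-but filters), here is a tiny matcher together with a pattern for alarm state changes going to ALARM:

```python
def matches_pattern(event, pattern):
    """Minimal EventBridge-style matcher: every pattern field must exist in
    the event, and leaf values (given as lists) must contain the event's
    value. Content filters are out of scope for this sketch."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches_pattern(event[key], expected):
                return False
        else:  # leaf: a list of allowed values
            if event[key] not in expected:
                return False
    return True

# Example rule pattern for CloudWatch alarm state changes entering ALARM:
alarm_pattern = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}
```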

SNS (for sending alerts)

Use Amazon SNS to deliver emails or SMS messages to people about a specific alarm state change. People and groups can subscribe to an SNS topic so that they are notified when an alarm state changes.

Other Services

CloudTrail

AWS CloudTrail provides visibility into user activity and API activity by recording actions taken on your account. Basically, if CloudWatch tracks what is happening in your system, CloudTrail tracks who is performing actions in your system.
CloudTrail records information about each action, including who made the request, the services used, the actions performed, parameters for the actions, and the response elements returned by the AWS service. CloudTrail is enabled by default on your account for management events, covering create, modify, and delete API calls and account activity. If you need more detailed events, you'll need to create a Trail and save it to S3. You can choose which events you want to include in the trail.
CloudTrail stores 90 days of activity. Again, if you need to store specific events for longer, you'll need to create a Trail and save it to S3.
Logs from CloudTrail can be sent to CloudWatch Logs where CloudWatch metrics and alarms can be built against them.
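CloudTrail records are JSON documents, and pulling out "who did what, where" is just a matter of reading a few fields. A small sketch with a hand-written sample record (the ARN and event are made up; real records carry many more fields):

```python
import json

def summarize_trail_event(record):
    """Answer 'who did what, where' from a single CloudTrail record."""
    who = record.get("userIdentity", {}).get("arn", "unknown")
    return f"{who} called {record['eventSource']}:{record['eventName']} in {record['awsRegion']}"

# A hand-written sample record for illustration:
sample = json.loads("""{
  "eventName": "TerminateInstances",
  "eventSource": "ec2.amazonaws.com",
  "awsRegion": "eu-west-1",
  "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}
}""")
```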

Config

AWS Config can also be considered a monitoring service. Config enables you to assess, audit, and evaluate the configurations of your AWS resources. It continuously monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired configurations. You configure Config by setting up rules with the desired configuration for a resource. Config will then track and record compliance with these rules over time. You can configure CloudWatch Events or SNS to alert you when a resource breaches a rule. AWS supplies pre-configured rules that you can use or you can write your own.
For example, you can use Config to track changes to CloudFormation stacks, EC2 instances and EBS volumes.
You can also use AWS Systems Manager Automation documents to take action based on AWS Config rules to remediate non-compliance with Config rules. This AWS blog gives a detailed overview of how these two services work together.

Health

The AWS Health Dashboard is a single place to learn about the availability and operations of AWS services. You can view the overall status of AWS services, and you can sign in to view personalized communications about your particular AWS account or organization. While the AWS Health Dashboard is a good way to view the overall status of each AWS service, it provides little insight into how the health of those services is impacting your resources. The AWS Personal Health Dashboard provides a personalized view of the health of the specific services that are powering your workloads and applications. You can configure EventBridge to get notifications for events that might affect your services and resources.

Service Quotas

Service Quotas is an AWS service that helps you manage your quotas for many AWS services, from one location. Along with looking up the quota values, you can also request a quota increase from the Service Quotas console.

Domain 2: Reliability and Business Continuity (16%)

For this domain, AWS wants the student to show that they know how to use the features of individual services to build a system that can remain in operation after an incident or in response to extra load. Having a system that can scale out during times of peak load and scale back in when the load subsides is important. This is where courses like Stephane Maarek's really start to come into their own.
You should also understand why it is important to architect your system so that it can run in a multi AZ configuration. Can your system remain online if an AWS availability zone goes offline within a region?

EC2 Auto Scaling and Elastic Load Balancers are essential to know for the exam. You must understand them in theory and in practice. You should spend time in the console setting up and integrating the two services.

EC2 Auto Scaling

ASGs help you keep your application available by allowing you to automatically add or remove EC2 instances according to conditions you define. They also help with fault tolerance, as an unhealthy EC2 instance can be terminated and replaced with a new one. ASGs are composed of many elements:

ASGs scale based on CloudWatch alarms and you can set up the group to scale in or out based on a CloudWatch metric. Auto-scaling will always try to ensure capacity is balanced across AZs.
You can set up simple scaling policies that add or remove instances when scaling in or out. You can also use target tracking to scale. This works by tracking a metric like CPU and works to keep all instances at a certain level, adding or removing nodes based on the CPU of the instances present in the group. It's definitely worth diving deeper into these for the exam.
Termination policy: which instances to terminate and in what order. You can terminate the oldest instance and/or instances with the oldest launch configuration. You can use the default termination policy or set your own.
Instance protection does not protect instances from manual termination initiated via the console or the API.
Health checks are how instances get removed from the group. You can use the EC2 status check or, if integrated with an ELB, the ELB health check.
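Target tracking is worth building some intuition for. Roughly speaking, it scales the group proportionally so that the per-instance metric lands near the target. This sketch approximates that calculation; the real service also applies instance warm-up, cooldowns, and scale-in controls:

```python
import math

def desired_capacity(current_capacity, current_metric, target, min_size, max_size):
    """Approximate target-tracking maths: scale capacity proportionally so
    the per-instance metric (e.g. average CPU) lands near the target,
    clamped to the group's min/max bounds."""
    raw = math.ceil(current_capacity * current_metric / target)
    return max(min_size, min(max_size, raw))
```

For example, 4 instances averaging 90% CPU against a 60% target want ceil(4 * 90 / 60) = 6 instances; the same group idling at 15% would shrink to the group minimum.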

Elastic Load Balancers

An ELB distributes incoming application traffic across multiple targets and virtual appliances in one or more Availability Zones (AZs). It exposes a single DNS endpoint for clients to access your application. You can set up ELBs to serve clients internal to your AWS network or clients outside your network.

There are 3 types of ELBs that you should know about for the exam.

When you use an ELB, the underlying application won't know where the request originated unless it references the X-Forwarded-For header in the request. It's worth understanding this concept for the exam.

You should also understand the error codes that an ELB can return.

Health checks are an integral part of ELBs. They are used to decide if traffic can be routed to an instance. They work differently from EC2 health checks in that they check whether traffic is getting through to the instance, not just that the instance is healthy. You can specify an ELB health check on an ASG, which might be more useful than the EC2 check.

Caching

Caching is also covered in this domain and it's worth understanding where you would use Elasticache and also caching with CloudFront. You could also be asked a question on caching for DynamoDB with DAX.

Route 53 routing

While we cover Route 53 in more depth in Domain 5: Networking and Content Delivery, it is also relevant for this domain. Using Route 53, you can architect a multi-region solution to failover when an application in one region becomes unavailable. You can do this by combining a Route 53 health check and an Alias record. Alias records allow you to point to an AWS Resource (ELB, S3 hosted Website, CloudFront distribution, Elastic Beanstalk, API Gateway, VPC Endpoint). Queries to alias records are free of charge.

RDS

RDS read replicas are for increased scalability. You can run read-only workloads against read replicas. You can enable automatic backups on a MySQL or MariaDB read replica, but not on SQL Server, Oracle, or PostgreSQL. You can take a manual snapshot of a PostgreSQL read replica.
A read replica can be promoted to its own standalone database instance, but once promoted, it is no longer linked to the primary instance.

RDS Multi-AZ is for increased availability. The standby copies created in a Multi-AZ setup are not readable. However, automated backups and DB snapshots are taken from the standby to avoid I/O suspension on the primary.

In the event of a failure, you should understand how you can restore your database and to which point you can recover it. RTO (Recovery Time Objective) is the amount of time you can take to restore your database. RPO (Recovery Point Objective) is the point in time to which you can restore it. You should also understand how to use replication to enable a restore in another region, and the difference between automated backups and snapshots. With automated backups, RDS takes a full backup of your database at a regular frequency (generally daily) and then backs up the transaction logs (generated when updates are made to the database) at a more regular frequency. Generally, your RPO is tied to how often the transaction logs are backed up. A point-in-time recovery restores the latest full backup and then applies all transaction logs up to the chosen point. You'll never get this down to realtime; it's generally within minutes.
Snapshots are taken manually and allow you to restore a copy of your data in a separate RDS instance.
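The point-in-time recovery idea can be made concrete with a little arithmetic: the best restorable point is the last full backup plus every complete transaction-log interval shipped since. A small illustration (the 5-minute interval mirrors how often RDS typically ships logs; it is an assumption here, not a guarantee):

```python
from datetime import datetime, timedelta

def recovery_point(full_backup_at, log_interval, target):
    """Latest restorable point at or before `target`: the full backup plus
    every complete transaction-log interval shipped since then. Shows why
    RPO is tied to log-shipping frequency, not full-backup frequency."""
    if target < full_backup_at:
        raise ValueError("target precedes the oldest full backup")
    whole_intervals = (target - full_backup_at) // log_interval
    return full_backup_at + whole_intervals * log_interval
```

So with a midnight full backup and 5-minute log shipping, a failure at 00:12 can be recovered to 00:10, losing at most one log interval of data.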

S3

For S3, you should understand how you can protect against accidental deletion of objects. To do this, you can use versioning, MFA Delete, and/or Object Lock. Cross-region replication can also help; it requires versioning to be enabled, and deletes are not replicated to the secondary region. These are all features of S3 worth knowing for this part of the exam.

Data Lifecycle Manager

You can use Amazon Data Lifecycle Manager to protect your EBS snapshots and EBS-backed AMIs.

Other Services

While you won't need to know serverless services like Lambda, DynamoDB, and SQS in depth, they may come up in the exam as they can help address different use cases. For this domain, you should know how to use an SQS queue to decouple components and how a queue can act as a buffer to absorb extra load on an application.

Domain 3: Deployment, Provisioning and Automation (18%)

With this domain, AWS is testing your knowledge on how to deploy and run your AWS systems hands free. You need to have a good understanding of the different services that can be used to deploy infrastructure across your accounts and also how you can keep them up to date with patches and changes.

CloudFormation

CloudFormation is AWS's infrastructure-as-code service and will more than likely come up in the exam. You should understand how to create an EC2 instance in CloudFormation and how to specify the different networking attributes. You specify resources for CloudFormation to create within a template. A template can be in either JSON or YAML format and consists of several sections. The top three sections (format version, description, and metadata) don't do much, and you just need to know that they exist.

Nested Stacks are stacks that are used in other stacks. They enable you to standardise the creation of common resources within an account.

StackSets work within an AWS Organisation to standardise resource creation across accounts. You define a StackSet in an administration account and then use it as the basis for deploying resources in all target accounts. The important thing to remember is that the deployment is controlled from a single administration account and used to deploy resources in one or more target accounts.

Before you deploy a stack, you can use a change set to preview the changes that will be made when the stack is deployed.

With the DeletionPolicy attribute you can preserve, and in some cases, backup a resource when its stack is deleted. You specify a DeletionPolicy attribute for each resource that you want to control. If a resource has no DeletionPolicy attribute, AWS CloudFormation deletes the resource by default.
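As an illustration, here is a minimal template (built as a Python dict and serialized to JSON) with a DeletionPolicy on two resources. The resource names are made up, and the properties are trimmed well below what a real deployment needs:

```python
import json

# Minimal CloudFormation template showing DeletionPolicy. Resource names and
# properties are illustrative only, not a deployable configuration.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "ProdDatabase": {
            "Type": "AWS::RDS::DBInstance",
            "DeletionPolicy": "Snapshot",  # take a final snapshot on stack delete
            "Properties": {"Engine": "mysql", "DBInstanceClass": "db.t3.micro",
                           "AllocatedStorage": "20"},
        },
        "AppBucket": {
            "Type": "AWS::S3::Bucket",
            "DeletionPolicy": "Retain",  # leave the bucket behind on stack delete
        },
    },
}

template_body = json.dumps(template, indent=2)
```

With this template, deleting the stack snapshots the database and leaves the bucket in place; any resource without a DeletionPolicy is simply deleted.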

Using the UpdatePolicy attribute you can specify how AWS CloudFormation handles updates to a number of services including Auto Scaling Groups, Elasticache, OpenSearch, Elasticsearch and Lambda. For the exam, it's good to know how it works with ASGs. For ASGs, this attribute can be set to:

A stack policy is a policy attached to a CloudFormation stack that controls if and how a resource can be updated. For example, if your stack creates a production database, you may want to prevent CloudFormation from changing the name of the database after it has been created. A stack policy can be added to prevent stack resources from being unintentionally updated or deleted during a stack update.

And finally, you'll need to know how to troubleshoot when your CloudFormation stack fails.

Other Services

There are other services within AWS that are also relevant when studying for this domain.
You can use EC2 Image Builder to create and manage AMIs. It simplifies the building, testing, and deployment of virtual machine and container images for use on AWS or on-premises, and it integrates with AWS Resource Access Manager, AWS Organizations, and Amazon ECR to enable sharing of automation scripts, recipes, and images across AWS accounts, which helps with maintaining standards across an organisation.
With AWS OpsWorks, you can use a managed Puppet or Chef service to manage your instances.
Elastic Beanstalk can be used to deploy your application code. By using Elastic Beanstalk, you push responsibility for managing the OS to AWS. As with CloudFormation, you should understand the different deployment options for Elastic Beanstalk: all at once, rolling, rolling with additional batches, and immutable deployments are all supported.
You can automate patching across your EC2 instances with AWS Systems Manager Patch Manager.
And to schedule automated updates or tasks, you can utilise EventBridge or AWS Config.

Fix deployment issues

You may get several questions in the exam concerning failed deployments, and you will have to troubleshoot them and pick the correct answer. Errors with deployments may not be related to your application at all; they could be region-specific service quota issues. You should understand what quotas are, why they exist, and how they can be changed. For example, the default limit on the number of instances of a particular type in each region is 20. If you get an InstanceLimitExceeded error when spinning up a new instance, it means that you are over your limit and will need to either terminate older instances or request an increase in your quota. If you get an InsufficientInstanceCapacity error, it means that AWS does not have enough instances of that type available in the AZ you are trying to launch into.
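Since these two errors look similar but call for different fixes, a simple mapping helps cement the distinction (this is a study aid, not an AWS API):

```python
def remediation_for(error_code):
    """Map the two capacity-related EC2 launch errors to their usual fix."""
    actions = {
        # your account's quota for this instance type is exhausted
        "InstanceLimitExceeded": "terminate unused instances or request a quota increase",
        # AWS itself lacks capacity in that AZ right now
        "InsufficientInstanceCapacity": "retry later, try another AZ, or pick another instance type",
    }
    return actions.get(error_code, "check the AWS error documentation")
```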

Domain 4: Security and Compliance (16%)

With this domain, AWS wants the student to show how they can utilise AWS services to implement policies to control access and ensure compliance. They also want to see how you can protect data at rest and in flight.
There are a lot of separate security services in AWS and I found the table on this page gives a very good overview of them.

Identity and Access Management (IAM)

IAM is at the core of all AWS services, working to ensure only those who have been granted access can execute API calls and actions. Generally, when you hear someone talking about implementing least privilege, it is with IAM. I cannot hope to cover it in this article, and you will need to study it in depth to pass the exam. You should start by reading the IAM FAQs; there is a lot of good material in there. The following videos also give a very good overview of the service.

AWS re:Inforce 2019: The Fundamentals of AWS Cloud Security (FND209-R)
https://www.youtube.com/watch?v=-ObImxw1PmI
AWS re:Invent 2019: [REPEAT 1] Getting started with AWS identity (SEC209-R1)
https://www.youtube.com/watch?v=Zvz-qYYhvMk&secd_iam5

For the exam, you will need to be able to look at the JSON of an IAM policy and understand what it does. With the IAM policy simulator, you can test and troubleshoot identity-based policies, IAM permissions boundaries, Organizations service control policies (SCPs), and resource-based policies.
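The core evaluation logic is worth internalizing: an explicit Deny always wins, everything else is implicitly denied, and an Allow must match both the action and the resource. Here is a deliberately tiny simulator that captures just that (no conditions, NotAction, or multi-policy merging; wildcards only support a trailing `*`):

```python
def is_allowed(policy, action, resource):
    """Tiny IAM evaluation sketch: explicit Deny beats Allow; everything
    else is implicitly denied."""
    def matches(pattern, value):
        if pattern.endswith("*"):
            return value.startswith(pattern[:-1])
        return pattern == value

    decision = False
    for stmt in policy.get("Statement", []):
        acts = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        ress = stmt["Resource"] if isinstance(stmt["Resource"], list) else [stmt["Resource"]]
        if any(matches(a, action) for a in acts) and any(matches(r, resource) for r in ress):
            if stmt["Effect"] == "Deny":
                return False  # explicit deny always wins
            decision = True
    return decision
```

Try it against a policy that allows s3:* on one bucket but explicitly denies s3:DeleteObject: reads are allowed, deletes are not, and anything outside S3 falls through to the implicit deny.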

Working with AWS Organizations

You must understand AWS Organizations and how it helps centrally manage and govern a multi-account environment for the exam.
The Using AWS Organizations for security page gives a very good overview of how it works with Control Tower, service control policies, and other services like CloudTrail, Config, and CloudFormation StackSets.
AWS Resource Access Manager (RAM) helps you securely share your resources across AWS accounts, within your organization or organizational units (OUs) in AWS Organizations, and with IAM roles and IAM users for supported resource types. IAM Access Analyzer, meanwhile, helps identify resources in your organization and accounts that are shared with an external entity.

Trusted Advisor

The Trusted Advisor service includes several security checks. It's a fairly passive service, only reporting on what it sees, but it can be useful. It can be configured to run at the organization level, so it can be good for checking compliance across all accounts.

GuardDuty

Amazon GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation. It integrates with Amazon Detective. You can also integrate GuardDuty with Amazon EventBridge to automate best practices, such as automating responses to new GuardDuty findings.

Detective

Amazon Detective helps customers conduct security investigations by distilling and organizing data from sources such as AWS CloudTrail, Amazon VPC Flow Logs, and Amazon GuardDuty into a graph model that summarizes resource behaviors and interactions observed across a customer's AWS environment.

Inspector

Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. It assesses applications for vulnerabilities or deviations from best practices. After performing an assessment, Amazon Inspector produces a detailed list of security findings prioritized by level of severity. Inspector does not automatically remediate issues but can integrate with Systems Manager to do so. One important thing to note about Inspector is that it works primarily with EC2 instances.

Parameter Store

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration and secrets. You can store data such as passwords, database strings, Amazon Machine Image (AMI) IDs, and license codes as parameter values. You can encrypt the parameters using KMS. It also integrates with CloudFormation.

Secrets Manager

AWS Secrets Manager is the go-to service for protecting secrets needed to access your applications, services, and IT resources. It integrates with database services like RDS and Redshift to rotate credentials and keep them in sync.

If you're wondering if you should use Parameter Store instead of Secrets Manager, have a read of this Handling Secrets with AWS article from Corey Quinn.

KMS

You must know this service for the exam. Questions will generally be on how you can use KMS with different services, not just on KMS itself, which makes sense: KMS is rarely used in isolation. It is the go-to service for encryption in AWS, so it will come up in the exam.
Start with the FAQs; I also found this video very useful for understanding the service.

AWS re:Inforce 2019: How Encryption Works in AWS (FND310-R)
https://www.youtube.com/watch?v=plv7PQZICCM

S3 Encryption

You should understand the four methods for encrypting data at rest in S3. These are: SSE-S3 (S3-managed keys), SSE-KMS (KMS-managed keys), SSE-C (customer-provided keys), and client-side encryption.
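With boto3, server-side encryption is selected per object via put_object parameters. A sketch of the parameter building (bucket, key, and KMS key alias are placeholders; SSE-C and client-side encryption need extra key material and are omitted here):

```python
def put_object_params(bucket, key, body, mode="SSE-S3", kms_key_id=None):
    """Build s3.put_object() parameters for the chosen server-side
    encryption mode. Only SSE-S3 and SSE-KMS are covered in this sketch."""
    params = {"Bucket": bucket, "Key": key, "Body": body}
    if mode == "SSE-S3":
        params["ServerSideEncryption"] = "AES256"
    elif mode == "SSE-KMS":
        params["ServerSideEncryption"] = "aws:kms"
        if kms_key_id:  # omit to fall back to the account's default aws/s3 key
            params["SSEKMSKeyId"] = kms_key_id
    return params

# Usage sketch (requires credentials):
# import boto3
# boto3.client("s3").put_object(**put_object_params("my-bucket", "report.csv",
#                                                   b"data", "SSE-KMS", "alias/my-key"))
```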

Certificate Manager

AWS Certificate Manager (ACM) handles the complexity of creating and managing public SSL/TLS certificates for your AWS based websites and applications. ACM can also be used to issue private SSL/TLS X.509 certificates that identify users, computers, applications, services, servers, and other devices internally. When you think of encrypting data in flight, ACM is a major part of it.

CloudHSM

AWS CloudHSM provides hardware security modules (HSM) in the AWS Cloud. An HSM is a computing device that processes cryptographic operations and provides secure storage for cryptographic keys. CloudHSM allows you to generate, store, import, export, and manage cryptographic keys, including symmetric keys and asymmetric key pairs.

Macie

Amazon Macie is a service that will scan data in S3 and discover any PII or sensitive fields that may be contained therein. If you get a question about identifying PII data or sensitive fields, chances are that Macie will be an option and a worthy candidate.

Firewall Manager

AWS Firewall Manager is a security management service you use to centrally configure and manage firewall rules and other protections across the AWS accounts and applications in your organization. Using Firewall Manager, you can roll out AWS WAF rules, create AWS Shield Advanced protections, configure and audit Amazon Virtual Private Cloud (Amazon VPC) security groups, and deploy AWS Network Firewalls. Use Firewall Manager to set up your protections just once and have them automatically applied across all accounts and resources within your organization, even as new resources and accounts are added.

Domain 5: Networking and Content Delivery (18%)

This is a very important domain and you'll definitely need a separate resource to help study it. AWS is testing the student's ability to implement networking features and connectivity, configure domains, DNS services, and content delivery, and troubleshoot network connectivity issues.

VPC Configuration

This is a very large topic and impossible to cover in detail here. All of the aforementioned cloud training providers cover it in depth. To get started, you could take a look at the VPC FAQs; these are some of the better FAQs on AWS.

A VPC can span multiple Availability Zones but a subnet must reside within a single Availability Zone. When you launch an Amazon EC2 instance, you must specify the subnet in which to launch the instance. The instance will be launched in the Availability Zone associated with the specified subnet. You are initially limited to launching 20 Amazon EC2 instances at any one time and a maximum VPC size of /16 (65,536 IP addresses) and a minimum of /28 (16 IP addresses).

Network ACLs are stateless. If you allow traffic in, you must also allow traffic out. NACLs operate at the subnet level.
Security Groups are stateful. If you allow traffic in, the response traffic is automatically allowed out. Security Groups only support allow rules; there is no explicit deny. They operate at the instance level.
You may get asked to troubleshoot a scenario where an EC2 instance is not reachable or communication between two components is blocked. Because they operate at different levels of the VPC, you will need to ensure that both the NACLs and Security Groups are working together.

CIDR blocks

When you set up a VPC and its subnets, you must specify a CIDR block of IP addresses for each. Each subnet's CIDR block must be a subset of the VPC's CIDR block. Remember that an octet holds 256 values (0-255), so if you need more addresses than that, you shift the prefix left into the next octet.

*/28 gives 16 IP addresses
*/24 gives 256 IP addresses (1 octet)
*/16 gives 65,536 IP addresses (2 octets)

When assigning CIDR blocks to subnets, they cannot overlap. For each CIDR block assigned to a subnet, AWS reserves 5 IP addresses for its own use.
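You can check the sizing rules above with Python's standard `ipaddress` module; the CIDR ranges below are illustrative:

```python
import ipaddress

# A hypothetical VPC with the maximum-sized /16 CIDR block.
vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)  # 65536 total addresses

# Carve out a /24 subnet. AWS reserves 5 addresses per subnet
# (network address, VPC router, DNS, future use, and broadcast).
subnet = ipaddress.ip_network("10.0.1.0/24")
usable = subnet.num_addresses - 5
print(usable)  # 251 usable addresses

# A subnet's CIDR must sit inside the VPC's CIDR (and subnets
# must not overlap each other).
print(subnet.subnet_of(vpc))  # True
```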

VPC Connectivity Options

You should also understand the different connectivity options to and from VPCs, and this whitepaper gives you a good place to start.

You need to know the difference between a VPN, Direct Connect and a Transit VPC and also how they can complement each other. Also know the difference between a Virtual Private Gateway and a Customer Gateway.

AWS PrivateLink is a highly available, scalable technology that enables you to privately connect your VPC to supported AWS services, services hosted by other AWS accounts (VPC endpoint services), and supported AWS Marketplace partner services.

Route 53

Amazon Route 53 provides highly available and scalable Domain Name System (DNS), domain name registration, and health-checking web services. You could start with the FAQs.
A hosted zone in Route 53 is analogous to a traditional DNS zone file; it represents a collection of records that can be managed together, belonging to a single parent domain name. There are several record types that you can use within a hosted zone, but for the exam the alias record is the most important.

An alias record is a Route 53 extension to DNS. It's similar to a CNAME record, but you can create an alias record both for the root domain, such as example.com, and for subdomains, such as www.example.com. You cannot create CNAME records for the zone apex.
If you see a question asking which type of Route 53 record you should use to point to an AWS service such as an ELB, an S3-hosted website, a CloudFront distribution, Elastic Beanstalk, API Gateway, or a VPC endpoint, the correct option is an alias record. Route 53 doesn't charge for alias queries to ELB load balancers or other AWS resources. Also, an alias record has a native health check.
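As a sketch, the request body for Route 53's ChangeResourceRecordSets API shows what makes an alias record different: it targets an AWS resource directly and carries no TTL of its own. All IDs and DNS names below are placeholders:

```python
# Sketch of a ChangeResourceRecordSets payload creating an alias
# record at the zone apex that points at an ELB. The hosted zone ID
# and load balancer DNS name are placeholder values.
alias_record_change = {
    "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "example.com.",  # alias records work at the zone apex
            "Type": "A",
            "AliasTarget": {
                "HostedZoneId": "Z00000000EXAMPLE",  # the ELB's zone ID (placeholder)
                "DNSName": "my-lb-1234.eu-west-1.elb.amazonaws.com.",
                "EvaluateTargetHealth": True,  # the native health check
            },
        },
    }]
}
# Note there is no "TTL" key: Route 53 manages the TTL for alias targets.
```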

Routing policies

Route 53 supports several routing policies: simple, weighted, latency-based, failover, geolocation, geoproximity, and multivalue answer routing. Know the use case for each.

TTL

The time for which a DNS resolver caches a response is set by a value called the time to live (TTL) associated with every record. Route 53 does not have a default TTL for any record type; you must always specify a TTL for each record so that caching DNS resolvers cache your DNS records for the length of time specified by the TTL.

Healthchecks

You can use Route 53 health checks to monitor the health of your resources and route traffic accordingly. A health check can monitor a specified resource (such as a web server), the status of other health checks, or the state of an Amazon CloudWatch alarm.

CloudFront for content delivery

Amazon CloudFront is a content delivery network (CDN) that delivers your content across the globe using the AWS network. Your read-only content can be cached at the edge in any of AWS's points of presence.
Your content can come from an origin server such as an Amazon S3 bucket, an Amazon EC2 instance, an Elastic Load Balancer, or a custom server outside of AWS.
An Origin Access Identity (OAI) allows you to restrict access to the contents of the bucket so that all users must use the CloudFront URL instead of a direct S3 URL. An OAI is a special CloudFront user that can access the files in the bucket and serve them to users. You should block public access to the S3 bucket to prevent users from accessing the files directly.
You can use an origin request policy to configure CloudFront to include cookies and HTTP request headers in origin requests.
You should understand how long it takes for new content to roll out to the different points of presence and what you can do to speed it up: you can create an invalidation to flush the cache, forcing edge locations to fetch the latest content from the origin.
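For reference, an invalidation is a single API call. The sketch below builds the parameters for CloudFront's CreateInvalidation operation; the distribution ID and path are placeholders:

```python
import time

# Sketch of the parameters for CloudFront's CreateInvalidation API,
# which flushes cached objects from every edge location.
# The distribution ID is a placeholder.
invalidation_params = {
    "DistributionId": "E1EXAMPLE",
    "InvalidationBatch": {
        "Paths": {
            "Quantity": 1,
            "Items": ["/images/*"],  # a wildcard flushes a whole prefix
        },
        # A unique value so that retries of the same request are idempotent.
        "CallerReference": str(int(time.time())),
    },
}
# With boto3 you would pass this as:
#   boto3.client("cloudfront").create_invalidation(**invalidation_params)
```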

S3 static website hosting

You can host a static website on S3. The website endpoint URL is derived from the bucket name and Region. You must update the bucket policy to allow public reads. You should also understand CORS in relation to allowing access to files in your S3 bucket.
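A minimal public-read bucket policy for static website hosting might look like the following sketch; the bucket name is a placeholder:

```python
import json

# A minimal bucket policy granting anonymous read access to all
# objects, as required for S3 static website hosting.
# The bucket name is a placeholder.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-example-bucket/*",  # objects, not the bucket
    }],
}
print(json.dumps(bucket_policy, indent=2))
```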

Troubleshooting networking and connectivity issues

You can collect and evaluate logs from VPC flow logs, ELB access logs, AWS WAF web ACL logs and CloudFront logs to help troubleshoot network issues.

AWS WAF vs AWS Shield

AWS WAF is a web application firewall that helps protect web applications or APIs against common web exploits that can affect availability, compromise security, or consume excessive resources. AWS WAF gives you control over how traffic reaches your applications by offering you the ability to create security rules that block common attack patterns, such as SQL injection or cross-site scripting. You also can create rules that filter out specific traffic patterns that you define.
AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS. AWS Shield provides always-on detection and automatic inline mitigations that minimize application downtime and latency, so there is no need to engage AWS Support to benefit from DDoS protection. There are two tiers of AWS Shield - Standard and Advanced. Standard operates at layer 3/4 whereas Advanced can operate at layer 7. You can use AWS Shield Standard with Amazon CloudFront and Amazon Route 53.
AWS Shield Advanced is available globally on all Amazon CloudFront, AWS Global Accelerator, and Amazon Route 53 edge locations.
AWS Shield Advanced also gives you 24x7 access to the AWS Shield Response Team (SRT) and cost protection against DDoS-related spikes in your Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing (ELB), Amazon CloudFront, AWS Global Accelerator, and Amazon Route 53 charges.

Domain 6: Cost and Performance Optimisation

I really like the way AWS puts both cost and performance optimisation in the same domain. It reinforces that performance optimisation in the cloud can lead to cost optimisation. Corey Quinn outlines this well in his article The Key to Unlock the AWS Billing Puzzle is Architecture.

The first part of this domain focuses on implementing cost optimization strategies, and the first place to start is understanding your existing costs. To do this you need data and a report. Tags let you categorise your resources so that you can report on their costs accurately, and AWS uses tags to drive a lot of its reporting. If you wish to use a tag for cost reporting, you must activate it as a cost allocation tag in the Billing and Cost Management console.

Resource Groups Tag Editor

With AWS Resource Groups, you can create, maintain, and view a collection of resources that share common tags. Tag Editor manages tags across services and AWS Regions. Tag Editor can perform a global search and can edit a large number of tags at one time. Resource Groups work within an AWS Organization and you can use Tag Editor to tag resources across all accounts.

Cost Explorer

AWS Cost Explorer is a reporting tool built into the AWS console that helps you view and analyze your costs over a time period. You can slice and dice your data by many dimensions, including cost allocation tags. Within the Cost Explorer console, you will also get a forecast of how much you will spend over the coming months and recommendations on how to cut costs.

Cost and usage report

For more comprehensive cost and usage data, you can enable the Cost and Usage Report and save its output to S3. You can receive reports that break down your costs by the hour, day, or month, by product or product resource, or by tags that you define yourself. AWS updates the report in your bucket once a day in comma-separated value (CSV) format.

Budgets and billing alarms

You can set custom budgets with the AWS Budgets service that alert you when you exceed your budgeted thresholds. You can also set up a billing alarm in CloudWatch that fires if you breach a certain spend threshold within a period of time.

Trusted Advisor

In addition to the security checks already mentioned, AWS Trusted Advisor also includes checks for costs.

Compute Optimizer

AWS Compute Optimizer recommends optimal AWS resources for your EC2 instances, EBS volumes and Lambda functions based on their usage data. For example, Optimizer can tell if you have over-provisioned an EC2 instance and may recommend that you save costs by right-sizing the instance to a smaller one.

EC2 Spot instances

If your workload allows it, EC2 Spot Instances can be a very cost-efficient way to run it. If you can run your jobs at off-peak times or start and stop them easily, you pay a fraction of what on-demand or even reserved instances would cost. Spot can be a very good way to run background processes where latency is not so important.

S3 Lifecycle Management

You should understand the different storage classes in S3 and where it makes sense to use them. Generally the cheaper the storage, the more expensive it is to access. Therefore you need to align the access patterns to the correct storage. Use cheaper storage for data that is not accessed often and more expensive storage for data that is accessed regularly.
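As an illustrative sketch (the prefix and timings are assumptions, not recommendations), a lifecycle configuration that tiers ageing data down to cheaper storage could look like this:

```python
# Sketch of an S3 lifecycle configuration that moves objects to
# cheaper storage classes as they age and finally expires them.
# The prefix and day counts are illustrative assumptions.
lifecycle_configuration = {
    "Rules": [{
        "ID": "archive-old-logs",
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
            {"Days": 90, "StorageClass": "GLACIER"},      # archive tier
        ],
        "Expiration": {"Days": 365},  # delete after a year
    }],
}
# With boto3 you would apply it as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle_configuration)
```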

The second part of this domain is to implement performance optimization strategies. This section is not about getting the cheapest performance but more about getting the best value for money. For example, if you need to reduce latency between applications running on separate EC2 instances, it may be worth paying for a placement group that ensures your instances are located close to one another. It is being aware that it is an option and understanding the costs associated with it.
Other examples include choosing the right EBS volume type to match your use case. Would paying more for Provisioned IOPS SSD (io1/io2) over General Purpose SSD (gp3) be a better choice for your application?
Turning on S3 Transfer Acceleration costs more but delivers objects into your bucket at a much faster rate. Splitting an object into parts and using a multipart upload can also increase the transfer rate and make the transfer more fault-tolerant: if one part fails, only that part is retried rather than the entire object.
With RDS, you can use metrics to identify any processes that are consuming resources beyond what you expect. A badly performing query can consume a disproportionate amount of resources and reduce performance of the database. Use RDS Proxy to more efficiently re-use and balance open database connections across all clients.
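The fault-tolerance benefit of multipart uploads comes down to simple arithmetic. Here is a hypothetical helper (the part size is an assumption; S3's real limits are a 5 MiB minimum part size and 10,000 parts per object):

```python
import math

# S3 multipart upload limits: parts must be at least 5 MiB
# (except the last) and an object can have at most 10,000 parts.
MIN_PART_SIZE = 5 * 1024 * 1024
MAX_PARTS = 10_000

def plan_parts(object_size, part_size=100 * 1024 * 1024):
    """Return the number of parts needed for a given part size
    (a hypothetical helper, not an AWS API)."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("parts must be at least 5 MiB")
    parts = math.ceil(object_size / part_size)
    if parts > MAX_PARTS:
        raise ValueError("use a larger part size")
    return parts

# A 5 GiB object uploaded in 100 MiB parts: if one part fails,
# only that 100 MiB is retried, not the whole 5 GiB.
print(plan_parts(5 * 1024**3))  # 52 parts
```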

Exam Labs

The SysOps exam is the only AWS exam with a practical element. I practised the labs and watched instructors work through them. If you're used to working in the AWS console, then combined with the instructions provided for each lab, you should be able to handle this part of the exam. My exam had three labs, and I had never touched the services involved in two of them before. I know AWS gets a hard time for inconsistent UX, but I have a different perspective after the exam: there is a general consistency between services, and practice with one will help you find your way around others.

Summary

I think this is the longest article I have ever written and it reflects the breadth of services that the exam covers. I enjoyed the exam and learned a lot from studying for it. I hope this article can help you with your own study. Please comment or ask questions. I'm generally pretty good at getting back to people.