Note: I've pushed this small update to change my name from Albert to Amy and haven't changed anything else at this point.

Observation is a critical skill to develop as a systems administrator. Most of the time it's plain old pattern recognition; even without knowing exactly what a metric means, you can learn a lot from its shape. Does it level out or are metrics swinging wildly? Learn to read between the lines: averages lie, and the larger the sample is, the larger the lie. Getting comfortable with raw metrics also allows you to better reason about higher-level displays such as OpsCenter graphs. Whatever metrics stack you use, it usually ends up having to trade off resolution for economy of storage; the exception is collectd and related tooling, configured at a 10s resolution by default.

Many of the times I'm asked to look at a cluster, IO is usually suspect. Sustained iowait is considered bad, and you should immediately look at the iowait % when latency climbs; this is part of why Cassandra systems want fast, predictable storage. Idle time is exactly as it sounds.

htop is the tool I reach for first. Back in the Bad Old Days, I had to switch between 3-4 tools to see what it puts on a single screen: per-core load and all the threads. Linux has always used a 1:1 threading model and even uses the global pid space for threads, so Java threads show up in ordinary tools; in htop, press H to toggle them, and with ps, the -L flag makes them show up. I almost always look at it on both healthy and problematic systems to build a feel for what normal looks like.

dstat has been a life saver too: it can report per-disk metrics and per-network interface, not to mention all of the usual CPU and memory numbers, and I run it with updates every 2 seconds. strace has helped me discover obscure failures more than any other tool; you can attach to a running process with -p PID. Running "go get github.com/tobert/pcstat" will place a pcstat binary in $GOPATH/bin that you can scp to any Linux server to see what is sitting in page cache. sjk-plus is great for JVM internals; it's a single jar so it's easy to carry around, and one feature that works particularly well shows the total amount of allocated heap space per thread.

Kernel tuning for network throughput is in the Linux section of this doc.

Finally, cassandra-stress. With 2.1 it's quite a bit easier to push machines to their limits on the system side, driving high IOPS on the storage while watching the client latency. For example, here's what my small-stress.sh looks like:
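As a minimal sketch (the node address, op counts, and thread counts are placeholders; size them against your own cluster), a wrapper along these lines:

    #!/bin/sh
    # small-stress.sh: tiny cassandra-stress wrapper (sketch; every number here
    # is a placeholder -- adjust to the cluster under test)
    NODE="${1:-127.0.0.1}"

    # load a modest data set, then read it back, logging latency for both phases
    cassandra-stress write n=1000000 -rate threads=50 -node "$NODE" -log file=write.log
    cassandra-stress read n=1000000 -rate threads=50 -node "$NODE" -log file=read.log

Watching dstat on the server while this runs ties the client latency numbers back to what the hardware is actually doing.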
Hardware still matters, and the vast majority of Cassandra instances run on x86 CPUs, so that's what I'll talk about. Today's x86 CPUs are all multi-core, and most of the Intel chips offer something called hyperthreading. A hyperthread is a virtual core or "sibling" core that allows a single core to interleave two streams of execution; e.g. on an i2.2xlarge, you will see 8 cores assigned to the system, half of them hyperthread siblings. When shopping CPUs, look at the cache column as much as the clocks: one step down from the top part, you will often see the cache size just below that of the most expensive CPU for a lot less money.

Some years back the industry moved from the Intel Pentium front-side bus architecture to Non-Uniform Memory Architecture. Really, it's two changes in one: modern x86 CPUs have integrated memory controllers, and in a multi-socket system there are two memory controllers, one per socket. When a CPU needs memory located on the other CPU's memory bus, it pays a latency penalty crossing the interconnect, so this is more important to look at than it has been in the past. The quickest way to tell if a machine is NUMA is to run "numactl --hardware". The fastest option is for multi-JVM setups on NUMA, where you can use numactl to pin each JVM to its own node. There's a great deep dive at http://frankdenneman.nl/2015/02/27/memory-deep-dive-numa-data-locality/.

Prior to the 2000's, RAM was often the biggest line item on server quotes. Now it's cheap enough that it can even make sense to go into the 512GB-2TB range (all the major server vendors have 2TB servers now) for economical reasons, but the arguments get shaky as soon as you start looking at recovery times: given that the preferred size of a Cassandra <= 2.1 node is still around 4-5TB of storage, you really want to scale out rather than up whenever possible. Also remember how slow memory really is: an L1 cache hit is 0.5 nanoseconds, while reading from RAM takes 100ns. In reality, RAM is even slower than that in a multi-core world, since cores often contend for the same memory, and at lower memory clock speeds, latency will be higher still. I have more numbers in http://tobert.github.io/post/2014-11-13-slides-disk-latency-and-other-random-numbers.html.

Clocks deserve a mention, since there are a few ways the kernel can get time. Most x86 machines have a clocksource called "tsc", which stands for Time Stamp Counter; it's the cheap one. There are alternatives available, such as HPET and ACPI; the problem with these clocks is that they sit on a bus, take an order of magnitude or more time to read, and are synchronous, which may introduce additional hiccups. Newer kernels have put a lot of work into making the TSC more stable, so it's worth measuring before you give up on it. While we're on time: synchronize clocks across the cluster with NTP, since Cassandra resolves write conflicts by timestamp.

Over the last few years, the cost of power for datacenters has become a more and more prominent consideration when buying hardware. To that end, even Xeon processors now have power management features built into them, and a lot of the time this stuff is enabled out of the box with BIOS defaults that limit performance; the deeper the sleep state, the higher the latency cost to come out of it. http://jpbempel.blogspot.gr/2015/09/why-bios-settings-matter-and-not-size.html is a good read on this. On RHEL6, CentOS6, and other older LTS distros, the default idle driver is worth checking (see below).

One more scheduling note: child processes inherit scheduling policies, so if you set priority or affinity on the shell that starts Cassandra, every JVM thread gets it too. That lets us do some prioritization and pinning outside of Cassandra so we don't have to wait for code changes, and you can move interrupts over to a core the data threads aren't using.
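Pulling the checks from this section together, these are the standard numactl and sysfs invocations; the paths shown are the stock Linux locations rather than anything distro-specific:

    # is this box NUMA, and which node owns which memory?
    numactl --hardware

    # which clocksource is in use? "tsc" is the cheap one
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource

    # which idle/C-state driver is managing the cores?
    cat /sys/devices/system/cpu/cpuidle/current_driver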
On the OS side of memory management, there is a nuclear option: run with no swap at all. Short of that, always set /proc/sys/vm/swappiness to 1, just in case. Modern distros also ship reasonable default limits for services, which should remove most of the problems around pam_limits(8). Transparent hugepages get a bad rap, but for a big JVM heap they're an optimization that should be enabled; the critical thing to watch out for today with THP is the defrag behavior. The easiest way to check is to grep for AnonHugePages in /proc/meminfo: if the AnonHugePages total is slightly larger than your heap, you're all set with THP.

jemalloc is worth enabling, and support for it is built into Cassandra's startup scripts. The package is usually called "jemalloc"; on common distros this is usually a yum/apt-get install away, and failing that you can simply install the .so file in /usr/local/lib or similar.

If you haven't read the bit about offheap from above, please check that out; 1-2GB of offheap memory should be sufficient for most systems. The memtable size should be left at 25% of the heap as well. For some workloads it may go as high as 50% of the heap, but start at 20-25% and see what happens; with lots of active tables, a smaller value may be better, but watch out for compaction cost.

Compaction is particularly heavy on allocation (as observed through jvisualvm or sjk-plus), and Cassandra moves large amounts of memory, making any opportunity to avoid a GC lock a big win. The biggest problem with huge numbers of compactors is the amount of GC pressure they create, so the easy thing to do is lower concurrent_compactors. The thread priority flags in Cassandra's JVM arguments aren't decoration; they're there so that compaction can be set to a lower priority than request handling.

On durability: sometimes disabling durability on the CF, or putting the CL on its own device, is the fix; a failure can lose data, so please be careful if you try this route. The commitlog is sequential writes behind single-writer locks, so keeping it separate is still a good idea even on very fast storage. TL;DR, the workaround is turning off segment recycling, which causes Cassandra to drop segments after use rather than reuse them.

Table compression has served me well, particularly on EC2 ephemeral disks, but turning compression off is sometimes faster: in 2.1, uncompressed tables will use a reader with a 64k buffer. And when adding in Solr, you will almost always want to increase the heap.

Before the JVM settings, a caveat: copying Cassandra's settings into other applications is not the answer, because many of the things that are good for Cassandra are bad for smaller/simpler apps. The posts floating around have some good stuff in there, but be careful not to haphazardly combine all the settings you find; many of the stock numbers are an old tradition, dating back to 2010 or 2011.

Start by enabling GC logging. The overhead is tiny, so there's no reason to have it disabled, especially in production. Now that you have GC logging enabled, you have a choice: stick with CMS (the default) and tune it, or switch to G1. CMS takes real work to get it right, but it's worth the effort when it works out, whereas with G1 you can get good performance without a lot of tweaking. In general, our default heap size of 8GB is a good starting point for most systems; I've seen it set anywhere from 8GB to 32GB in production. Under CMS, size the young generation carefully: if eden is sized badly, objects are held longer than necessary, causing promotion, which leads to memory compaction, which is where the painful pauses come from.

G1 is usually good out of the box and can be tuned further when it isn't. It can scale to over 256GB of RAM and down to 1GB (6GB minimum is recommended), and I do not recommend using G1 on any JRE older than Java 8. Perhaps my favorite feature in G1 is that the eden size is calculated dynamically to meet the pause target, so it is critical that you comment out the -Xmn line when switching to G1. Where CMS aims to spend about 1% of time in GC by default, G1 allows for up to 10%. Nothing is guaranteed, though: STW pauses on fast machines might hover around 120ms and never come near the default 200ms target, while on lower-end hardware (mobile CPUs, EC2) it was observed that throughput suffered a bit. That said, here's the settings I use for G1 with cassandra-stress and many of the clusters I look at:
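I won't claim this is the one true flag list; treat it as a sketch of a sane G1 baseline for cassandra-env.sh, where the pause target is just G1's own default made explicit, and verify everything against your GC logs:

    # G1 baseline for cassandra-env.sh (sketch -- verify against GC logs)
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
    JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=200"      # G1's default pause target
    JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"   # helps reference-heavy heaps
    # do NOT set -Xmn: a fixed new size defeats G1's dynamic eden sizing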
Cassandra relies on a standard filesystem for storage and can support a variety of underlying storage configurations. On mount options: with relatime, the kernel will batch atime updates in memory for a while and update inodes only occasionally, which makes the old noatime tweak mostly obsolete. On an SSD a stray atime write is barely penalized, but on an HDD that IO will probably cause a seek, which is exactly the thing we're trying to avoid. mount sometimes reports the wrong values for these options, so to be safe always set them in /etc/fstab. As for the filesystem itself, ext4 and XFS are both fine choices; btrfs has a reputation for being unreliable and should not be deployed to production.

LVM can be handy for disk management: vgscan will scan all the drives in the system looking for LVM metadata, vgdisplay will show the volume groups it finds, and LVM also includes mirroring and striping modules based on dm-raid. It also offers inline migration of data between drives (pvmove).

For controllers, using the onboard AHCI SATA controller is fine for most systems. Hardware RAID cards bring their own problems: they present virtual block devices to the OS, making it difficult to probe drives directly or read their SMART data. Even when using RAID5 and RAID6, software RAID is preferable, since modern CPUs handle the parity math easily and you keep direct access to the drives; it also takes the proprietary card out of the cost and hassle. If you're stuck with a RAID card, go through its management tool carefully; you want all of the items in the "Tunables" section set deliberately rather than left at factory defaults. The other thing to keep an eye out for is write-through v.s. write-back caching: these cards ship with onboard cache coming in sizes of 512MB or even bigger these days, and with a battery present on the card, write-back caching can provide incredible speedups.

Most hard drives made in the last decade support a protocol called NCQ for queueing and reordering requests, and SAS queueing is better still; since NL-SAS is basically a SATA drive with a SAS interface, the price difference is small, which is a useful way to convince people to move to SAS. This may change as time marches on, so be sure to check the data sheet for the drive. The trick to getting HDDs to perform as well as possible is to avoid seeks, because latency on spinning drives is dominated by disk access time. Spinning drives in particular can benefit from JBOD rather than RAID0, since each drive's separate command queue distributes seeks better rather than having every drive service every request; drive failures aren't a matter of if but when, and with JBOD a dead drive only takes part of the data with it.

With SSDs, the critical thing to watch out for today is how they handle erases: the slowest part of an SSD is erasing previously used cells, and write numbers are best at 128-256K, which tends to be the size of "erase blocks" on the flash. One trick is to leave a partition that will not be used, which gives the drive plenty of spare flash cells for the wear leveling controller in the drive to use; this is a potential SSD optimization, but the data so far is inconclusive. At the high end, check out NVMe: an interface for PCI-Express (PCIe) flash devices that is optimized for parallelism, unlike traditional block interfaces. In the cloud, Amazon EBS "io1" or gp2 = (notbad), go for it; that, too, may change as time marches on.

It often surprises me how little discussion there is around network design and hardware for Cassandra clusters. The most common and easy to address problem is use of virtual IO: fully emulated NICs are slow, and the best option is often referred to as vNICs, i.e. paravirtual drivers. A good 10gig NIC is worth it on dense nodes. On EC2, the newer generations of instances seem to be claiming enhanced networking across the board (note that it requires VPC); alternatively, enable message coalescing in Cassandra (available in 2.1.5 or later), which costs a little CPU utilization, but the latency benefits are sometimes worth it. Kernel network tuning lives in sysctl: many distros have you edit /etc/sysctl.conf, and in modern distros, drop files into /etc/sysctl.d.

Finally, readahead and the IO scheduler. Some distros ship readahead defaults that are really awful for databases; beware that setting readahead very high is just as bad, since it steals page cache and wrecks random read latency, so start with 128 (64K) and adjust in powers of two while measuring. The choice of IO scheduler matters less than people argue about: deadline is a fine default, and if cgroups is in play, stick with CFQ; that said, if you're stuck on one or the other for some reason, don't lose sleep about it. These settings will probably get reverted on the next kernel upgrade or reboot, so apply them from a boot script, per drive and on a RAID device too (adjust to the local configuration), as sketched below:
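A sketch of those settings; the device names and the readahead value are examples, so check your own topology with lsblk first:

    # 128 sectors = 64K readahead; a good starting point, tune with measurements
    blockdev --setra 128 /dev/sda

    # deadline is a safe default; use cfq when blkio cgroups are in play
    echo deadline > /sys/block/sda/queue/scheduler

    # and on a RAID device (adjust to the local configuration):
    blockdev --setra 128 /dev/md0

Wrap these in something that runs at boot (rc.local, a udev rule, or a systemd unit) so they survive reboots.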