
Re: Events Cluster Edition


I can't speak to whether or not the individual's problem was resolved; however, I can speak to the notion of writing to the same object. Attempting to write to the same table from different nodes is often used as a measuring stick for horizontal scalability - which is something shared disk clusters are horrible at. While it is possible to write to an object from any node of the cluster, you will suffer performance hits while doing so - how bad depends on the situation. For example, if inserting into a heap table or a table ordered by a monotonic index (e.g. trade_date), all the inserts land after the last page. As a common page, that page has to be synchronized in the caches of all cluster nodes - which essentially means that any write from any node is competing against every other node. A point to remember is that we are not just talking data pages but also index pages - which, due to the nature of sorted leaf pages, are a highly contested area for cache synchronization. Further, keep in mind that inserting a single row often means 20+ I/Os, because we also need to traverse each index tree: with ~6 indexes at an index level of 5, that's 30 I/Os right there. And if inserts are happening on different nodes, it is likely we are attempting cache synchronization on multiple pages while negotiating locking on many more for read consistency of the intermediate nodes of the index trees, etc.
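
To make that arithmetic concrete, here is a rough ASE T-SQL sketch of the kind of schema being described - a trade table ordered on a monotonic trade_date plus a handful of nonclustered indexes. The table, column and index names are made up for illustration, not taken from the thread; the comments just walk through why every insert converges on the same last page and why a single-row insert can cost on the order of 30 page accesses.

    -- Hypothetical schema illustrating the last-page hot spot described above.
    create table trades (
        trade_id    numeric(18,0)  not null,
        trade_date  datetime       not null,   -- monotonically increasing key
        symbol      varchar(12)    not null,
        qty         int            not null,
        price       money          not null,
        account_id  int            not null
    )
    go

    -- Clustered index on a monotonic key: every new row sorts after the current
    -- last row, so inserts from every node converge on the same last data page,
    -- which then has to be kept synchronized across all node caches.
    create clustered index trades_cdx on trades (trade_date)
    go

    -- A typical set of ~6 indexes in total.  If each B-tree is ~5 levels deep,
    -- a single insert touches roughly 6 x 5 = 30 index pages on top of the data
    -- page - and in a shared disk cluster any of those pages may need cache
    -- synchronization if another node currently holds it.
    create index trades_idx1 on trades (trade_id)
    go
    create index trades_idx2 on trades (symbol, trade_date)
    go
    create index trades_idx3 on trades (account_id, trade_date)
    go
    create index trades_idx4 on trades (account_id, symbol)
    go
    create index trades_idx5 on trades (price)
    go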


Some have tried to alleviate this problem by partitioning the tables/indexes and binding different partitions to different nodes - e.g. A-F on node1, G-K on node2, etc. (see the sketch below). The problem becomes apparent when a transaction inserts both an 'F' row and a 'J' row: the immediate question is whether you treat this as an oddity and do the cache synchronization as a rare event, or whether you do a 2PC across the nodes. Given the index problem above, early tests showed that query fragmentation - sending parts of the query to other nodes - was much faster than cache synchronization, but there is considerable overhead in the 2PC layer compared to an SMP implementation, in addition to the network latency. Fundamentally, even RAC customers agree that horizontal scaling works best when there is no contention between nodes - which implies either implicit or explicit application partitioning, in conjunction with database features that aid the separation.
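
As a rough illustration of that scheme (table, partition names and boundary values are hypothetical, and whether it helps depends entirely on the workload), an ASE range-partitioned table might look something like this - with the assumption that the application or its connection routing keeps each range's traffic on its "own" node, since the partitions themselves are not pinned to nodes:

    -- Hypothetical range partitioning by the leading characters of the key.
    -- The intent: node1 handles p_af traffic, node2 handles p_gk, and so on.
    create table customers (
        last_name   varchar(40) not null,
        first_name  varchar(40) not null,
        cust_id     int         not null
    )
    partition by range (last_name) (
        p_af values <= ('FZZZ'),
        p_gk values <= ('KZZZ'),
        p_lz values <= (MAX)
    )
    go

    -- The weak spot: a single transaction that touches both an 'F...' row and a
    -- 'J...' row now spans ranges "owned" by different nodes, so you either pay
    -- for cross-node cache synchronization or for 2PC-style coordination.
    begin tran
        insert customers values ('Fisher', 'Ann',  1001)   -- p_af range (node1)
        insert customers values ('Jones',  'Bill', 1002)   -- p_gk range (node2)
    commit tran
    go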


Where this technique has worked best - and this is a common use case with RAC - is DSS systems, where query fragmentation/distributed parallel query processing can provide performance boosts for large queries. For OLTP systems, however, we have found that the real impediments to scaling are elsewhere, and that attempting to scale horizontally often produces negative results. For example, one of the biggest bottlenecks to OLTP scaling is I/O processing - especially on HDDs, whether in a SAN or not (and most SDCs require a SAN for shared storage). One customer test went from 3K inserts/sec on an HDD-based SAN to 100K inserts/sec on SSD - a roughly 30x scaling factor that horizontal scaling could not have achieved, simply because the bottleneck was the I/O subsystem, which would have been shared. With 16sp02, we have added a number of features which we have seen improve performance by 2x-7x - better than horizontal scaling, which is often <<2x even for 2 nodes.


The funny thing is that the most oft-quoted reason for horizontal scaling is "start small/grow big". In reality, the economics are heavily against you on that one. Five years ago, the top-end HP DL580 was roughly $100K and supported a maximum of 32 cores. Today, that same box has double the cores at the same price. So it simply makes sense - especially from a power and cooling standpoint - to rip and replace.


Having said all of that, ASE CE 16sp01 (just released in December for Linux) includes support for RDMA. One of the biggest impediments to internode communication in ASE CE 15.7 was the reliance on UDP - a packet-framing protocol with considerable latency built into the network handling. RDMA is similar to disk I/O DMA in that it provides much lower-latency access directly to the remote data instead. Early tests showed this improved general CIPC performance by at least 35%, and in the case of "badly partitioned" applications, by 200%. Does that mean we are now suggesting horizontal scalability as a recognized use case for ASE CE??? No. It does mean, however, that a lot of applications may benefit from the dramatically reduced CIPC overhead. Will you be able to scale horizontally with a specific app??? Who knows - only testing can say for sure. However, the odds are not good, just from a DBMS/SDC science perspective.

