SQL Server

      1. Sharding pattern:

https://msdn.microsoft.com/en-us/library/dn589797.aspx

      2. High Availability (HA) options:

Clustering, replication, synchronous database mirroring, Always On availability groups.

      3. Database disaster recovery (DR) techniques:

Asynchronous database mirroring, log shipping, SAN/VM replication.

      4. Scalability feature in SQL Azure:

SQL Azure Federations, which use sharding. https://msdn.microsoft.com/en-us/magazine/hh848258.aspx

      5. Difference between HA and DR?

High Availability deals with the failure of a single server, whereas DR covers contingency measures for the failure of an entire data center or multiple servers.

http://www.sqlshack.com/sql-server-disaster-recovery/

      6. Failover clustering: a SQL Server instance is installed on shared storage. It provides the infrastructure that supports high-availability and disaster-recovery scenarios for hosted server applications. If a cluster node fails, the services that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover. There is a short period of downtime while SQL Server fails over.

 

 

From <http://www.sqlshack.com/sql-server-disaster-recovery/>

 

 

 

      7. Index questions:

https://www.simple-talk.com/sql/performance/14-sql-server-indexing-questions-you-were-too-shy-to-ask/

      8. INSTEAD OF trigger usage
      9. Scaling out in SQL Server:

https://msdn.microsoft.com/en-us/library/aa479364.aspx

      10. Collation:

A set of rules that defines how data is sorted and compared.
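As a quick illustration (the Latin1_General collation names are standard SQL Server collations; the literal values are made up):

```sql
-- Case-insensitive collation: 'abc' and 'ABC' compare as equal.
SELECT CASE WHEN 'abc' = 'ABC' COLLATE Latin1_General_CI_AS
            THEN 'equal' ELSE 'different' END AS ci_compare;

-- Case-sensitive collation: the same comparison now fails.
SELECT CASE WHEN 'abc' = 'ABC' COLLATE Latin1_General_CS_AS
            THEN 'equal' ELSE 'different' END AS cs_compare;

-- Collation also drives sort order:
SELECT name FROM sys.databases ORDER BY name COLLATE Latin1_General_BIN;
```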

      11. Different solutions to scale out:

Scalable Shared Databases

Peer-to-Peer Replication

Linked Servers and Distributed Queries

Distributed Partitioned Views

Data-Dependent Routing

SQL Azure Federation

Solution comparison (update frequency / ability to change the application / data partitionability / data coupling):

      • Scalable Shared Databases: read only; little or no application change required; no partitioning requirement; no coupling requirement.
      • Peer-to-Peer Replication: read mostly, no conflicts; little or no application change required; no partitioning requirement; no coupling requirement.
      • Linked Servers: minimize cross-database updates; minor application changes; partitioning not generally required; very important to have low coupling.
      • Distributed Partitioned Views: frequent updates OK; some application changes may be required; partitioning very important; little coupling impact.
      • Data-Dependent Routing: frequent updates OK; significant application changes possible; partitioning very important; low coupling may help some applications.

 

From <https://msdn.microsoft.com/en-us/library/aa479364.aspx>

 

      12. MySQL sharding & clustering tool:

http://vitess.io/

      13. Partitioning concepts:

https://docs.oracle.com/cd/B28359_01/server.111/b32024/partition.htm

      14. Query performance – index seek, scan
      15. Locking
      16. Cold backup vs hot backup
      17. Query to find the 2nd-largest salary in each dept (the sample below applies the same windowing pattern to monthly view counts)

with ranks as (
    select
        year(entrydate) as [year],
        month(entrydate) as [month],
        views,
        row_number() over (partition by year(entrydate), month(entrydate)
                           order by views desc) as [rank]
    from product
)
select
    t1.[year],
    t1.[month],
    max(case when t1.[rank] = 1 then t1.views else 0 end) as [best],
    max(case when t1.[rank] = 2 then t1.views else 0 end) as [second best]
from ranks t1
where t1.[rank] in (1, 2)
group by t1.[year], t1.[month]

 

From <http://stackoverflow.com/questions/4440598/query-to-find-the-first-and-second-largest-value-from-a-group>
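Adapting the same pattern to the question as asked — a sketch only, assuming a hypothetical emp table with dept_id, emp_name, and salary columns (not taken from the link):

```sql
-- 2nd-largest salary per department; DENSE_RANK treats equal salaries
-- as a single rank, so ties on the top salary don't hide the answer.
WITH ranked AS (
    SELECT dept_id, emp_name, salary,
           DENSE_RANK() OVER (PARTITION BY dept_id
                              ORDER BY salary DESC) AS rnk
    FROM emp
)
SELECT dept_id, emp_name, salary
FROM ranked
WHERE rnk = 2;
```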

 

 

      18. Benefits of partitioning:

https://msdn.microsoft.com/en-us/library/ms190787.aspx

      19. Multi-tenant data architecture:

https://msdn.microsoft.com/en-us/library/aa479086.aspx

      20. Horizontal table partitioning in SQL Server:

Rows of one logical table are split across multiple partitions by a partition key (e.g. date ranges), and each partition can be placed on its own filegroup.
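The usual mechanics are partition function → partition scheme → table; a minimal sketch with hypothetical object names and boundary dates:

```sql
-- Partition boundaries: each year of orders goes to its own partition.
CREATE PARTITION FUNCTION pf_OrderYear (date)
    AS RANGE RIGHT FOR VALUES ('2013-01-01', '2014-01-01');

-- Map every partition to a filegroup (all to PRIMARY here for simplicity).
CREATE PARTITION SCHEME ps_OrderYear
    AS PARTITION pf_OrderYear ALL TO ([PRIMARY]);

-- The table is created on the scheme, partitioned by OrderDate.
CREATE TABLE dbo.Orders (
    OrderId   int   NOT NULL,
    OrderDate date  NOT NULL,
    Total     money NOT NULL
) ON ps_OrderYear (OrderDate);
```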

 


      21. Current isolation level?

SELECT CASE transaction_isolation_level
    WHEN 0 THEN 'Unspecified'
    WHEN 1 THEN 'ReadUncommitted'
    WHEN 2 THEN 'ReadCommitted'
    WHEN 3 THEN 'Repeatable'
    WHEN 4 THEN 'Serializable'
    WHEN 5 THEN 'Snapshot' END AS TRANSACTION_ISOLATION_LEVEL
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID

 

From <http://stackoverflow.com/questions/1038113/how-to-find-current-transaction-level>


      22. Diff between ROW_NUMBER, RANK and DENSE_RANK

ROW_NUMBER numbers the rows 1, 2, 3, 4, ... in the given order, even when values tie. RANK also orders by the given key, but tied rows share a rank and the following ranks are skipped, e.g. 1, 2, 2, 4, 5. DENSE_RANK leaves no gaps in the ranking sequence, e.g. 1, 2, 2, 3, 3, 4, 5.
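A self-contained comparison (the inline VALUES list is just sample data):

```sql
SELECT score,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,   -- 1, 2, 3, 4, 5
       RANK()       OVER (ORDER BY score DESC) AS rnk,       -- 1, 2, 2, 4, 5
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk  -- 1, 2, 2, 3, 4
FROM (VALUES (90), (80), (80), (70), (60)) AS t(score);
```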

      23. What is NTILE

NTILE(n) distributes the rows of an ordered partition into n roughly equal groups and returns the group number (1 to n) for each row; when the rows don't divide evenly, the earlier groups get the extra rows.
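For example, splitting five sample rows into two groups:

```sql
-- Five rows into two groups: the extra row goes to the first group,
-- so bucket is 1 for 90, 80, 70 and 2 for 60, 50.
SELECT score, NTILE(2) OVER (ORDER BY score DESC) AS bucket
FROM (VALUES (90), (80), (70), (60), (50)) AS t(score);
```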

      24. What are ACID properties?

ACID Properties

      • Transaction processing must guarantee the consistency and recoverability of SQL Server databases.
      • Ensures all transactions are performed as a single unit of work regardless of hardware or system failure.
      • A – Atomicity C – Consistency I – Isolation D – Durability
        • Atomicity: Each transaction is treated as all or nothing – it either commits or aborts.
        • Consistency: ensures that a transaction won’t allow the system to arrive at an incorrect logical state – the data must always be logically correct.  Consistency is honored even in the event of a system failure.
        • Isolation: separates concurrent transactions from the updates of other incomplete transactions. SQL Server accomplishes isolation among transactions by locking data or creating row versions.
        • Durability: After a transaction commits, the durability property ensures that the effects of the transaction persist even if a system failure occurs. If a system failure occurs while a transaction is in progress, the transaction is completely undone, leaving no partial effects on data.

 

From <http://blog.sqlauthority.com/2012/11/15/sql-server-concurrency-basics-guest-post-by-vinod-kumar/>

 

 

      25. How to handle concurrency in SQL Server

 

Concurrency Models

      • Pessimistic Concurrency
        • Default behavior: acquire locks to block access to data that another process is using.
        • Assumes that enough data modification operations are in the system that any given read operation is likely affected by a data modification made by another user (assumes conflicts will occur).
        • Avoids conflicts by acquiring a lock on data being read so no other processes can modify that data.
        • Also acquires locks on data being modified so no other processes can access the data for either reading or modifying.
        • Readers block writer, writers block readers and writers.
      • Optimistic Concurrency
        • Assumes that there are sufficiently few conflicting data modification operations in the system that any single transaction is unlikely to modify data that another transaction is modifying.
        • Default behavior of optimistic concurrency is to use row versioning to allow data readers to see the state of the data before the modification occurs.
        • Older versions of the data are saved so a process reading data can see the data as it was when the process started reading and not affected by any changes being made to that data.
        • Processes modifying the data are unaffected by processes reading the data because the reader is accessing a saved version of the data rows.
        • Readers do not block writers and writers do not block readers, but writers can and will block writers.

 

From <http://blog.sqlauthority.com/2012/11/15/sql-server-concurrency-basics-guest-post-by-vinod-kumar/>

 

 

      26. Isolation levels
        • Read Uncommitted
        • Read Committed
        • Repeatable Read
        • Snapshot
        • Serializable

READ UNCOMMITTED

Specifies that statements can read rows that have been modified by other transactions but not yet committed.

Transactions running at the READ UNCOMMITTED level do not issue shared locks to prevent other transactions from modifying data read by the current transaction. READ UNCOMMITTED transactions are also not blocked by exclusive locks that would prevent the current transaction from reading rows that have been modified but not committed by other transactions. When this option is set, it is possible to read uncommitted modifications, which are called dirty reads. Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the isolation levels.
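Setting it session-wide, or per table via the equivalent hint (dbo.Orders is a hypothetical table):

```sql
-- Session-wide: every subsequent read in this session may be dirty.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.Orders;

-- Per-table equivalent for a single query:
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);
```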

 

      27. Clustered Index and non-clustered indexes

Clustered Indexes and Heaps

Clustered indexes and heaps are two different ways to organize the data in tables. A clustered index consists of index pages as well as data pages. This means that, although the name clustered index suggests that this is an index, it is not just an index, but also contains the table data. A clustered index is organized as a B-tree, where the nonleaf nodes are index pages and the leaf nodes are data pages. The data in the clustered index is ordered with respect to the column(s) constituting the clustered index. Pages at any level (whether leaf or nonleaf) in the B-tree are linked to the previous and next pages at the same level. A table can only have one clustered index and the index is always identified by index_id = 1 in the catalog tables (such as sys.indexes, sys.partitions, and so on). For more information on the organization of clustered indexes, see Clustered Index Structures in SQL Server 2005 Books Online.

A heap consists only of data pages. Neither the data pages nor the physical placement of the pages are guaranteed to be in any particular order. A heap is always identified by index_id = 0 in the catalog tables. For more information on the organization of heaps, see Heap Structures in SQL Server 2005 Books Online.

Whether it is organized as a heap or a clustered index, a table can have zero or more nonclustered indexes. A nonclustered index is organized as a B-tree. Unlike a clustered index, a nonclustered index consists of only index pages. The leaf nodes in a nonclustered index are not data pages, but contain row locators for individual rows in the data pages. A nonclustered index is identified by an index_id that is greater than one in the catalog tables. For more information on the organization of nonclustered indexes, see Nonclustered Index Structures in SQL Server 2005 Books Online.

 

From <https://technet.microsoft.com/library/Cc917672>
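In DDL terms (hypothetical table and column names):

```sql
-- One clustered index per table: the data pages ARE the leaf level.
CREATE CLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);

-- Nonclustered: a separate b-tree whose leaf rows locate the data;
-- INCLUDE adds non-key columns to the leaf so more queries are covered.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (Total);
```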

 

      28. What is the structure of storage of a table in the clustered / non-clustered / heap cases?

A table with a clustered index (e.g. one created by a PK) is stored as a b-tree: the root and intermediate nodes are index pages, and the leaf nodes are the data pages themselves. A non-clustered index is a separate b-tree whose leaf rows point to that data (via the clustering key, or via a RID for a heap). A heap has no imposed ordering; it is just a collection of data pages tracked by IAM pages.

 

 

From <https://technet.microsoft.com/en-us/library/ms189051(v=sql.105).aspx>

 

 

 

https://technet.microsoft.com/en-us/library/ms177484(v=sql.105).aspx

https://technet.microsoft.com/en-us/library/ms188270(v=sql.105).aspx

https://technet.microsoft.com/en-us/library/ms177443(v=sql.105).aspx

https://technet.microsoft.com/en-us/library/ms189051(v=sql.105).aspx

 

      29. What is IAM (Index Allocation Map):

An Index Allocation Map (IAM) page maps the extents in a 4-gigabyte (GB) part of a database file used by an allocation unit. An allocation unit is one of three types:

      • IN_ROW_DATA
        Holds a partition of a heap or index.
      • LOB_DATA
        Holds large object (LOB) data types, such as xml, varbinary(max), and varchar(max).
      • ROW_OVERFLOW_DATA
        Holds variable-length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060-byte row size limit.

Each partition of a heap or index contains at least an IN_ROW_DATA allocation unit. It may also contain a LOB_DATA or ROW_OVERFLOW_DATA allocation unit, depending on the heap or index schema.

https://technet.microsoft.com/en-us/library/ms187501(v=sql.105).aspx

 

      30. Database Synchronization and the Microsoft Sync Framework:

Database synchronization providers are a part of the Microsoft® Sync Framework. Sync Framework is a comprehensive synchronization platform that enables developers to add synchronization capabilities to applications, services, and devices. Sync Framework solves the problem of how to synchronize any type of data in any store using any protocol over any topology. Fundamental to Sync Framework is the ability to support offline and collaboration of data between any types of endpoints (such as device to desktop, device to server, etc.).

Sync Framework database synchronization providers enable synchronization between ADO.NET-enabled databases. Since the database synchronization providers are part of the Sync Framework, any database that uses these providers can then also exchange information with other data sources that are supported by Sync Framework, such as web services, file systems, or custom data stores.

The primary focus of this document will be on synchronizing information between database systems and how Sync Framework helps developers avoid many of the common issues associated with OCAs.

 

From <https://msdn.microsoft.com/en-us/sync/bb887608.aspx>

 

SQL database Active Geo-Replication enables you to configure up to 4 readable secondary databases in the same or different data center locations (regions). Secondary databases are available for querying and for failover in the case of a data center outage or the inability to connect to the primary database.

The Active Geo-Replication feature implements a mechanism to provide database redundancy within the same Microsoft Azure region or in different regions (geo-redundancy). Active Geo-Replication asynchronously replicates committed transactions from a database to up to four copies of the database on different servers, using read committed snapshot isolation (RCSI) for isolation. When Active Geo-Replication is configured, a secondary database is created on the specified server. The original database becomes the primary database. The primary database asynchronously replicates committed transactions to each of the secondary databases. While at any given point the secondary database might be slightly behind the primary database, the secondary data is guaranteed to always be transactionally consistent with changes committed to the primary database.

 

From <https://azure.microsoft.com/en-in/documentation/articles/sql-database-geo-replication-overview/>
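Configuration can be done from the portal or PowerShell as well as T-SQL; a hedged sketch of the T-SQL form, with hypothetical server and database names:

```sql
-- Run on the master database of the primary Azure SQL server.
-- Creates a readable secondary of ContosoDb on server contoso-dr.
ALTER DATABASE ContosoDb
    ADD SECONDARY ON SERVER [contoso-dr]
    WITH (ALLOW_CONNECTIONS = ALL);
```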

 

      31. On an emp table I create a composite PK on columns (id, name). Which is faster: select * from emp where id = 1001; select * from emp where name = 'abc'; or select * from emp where id = 1001 and name = 'abc'?

The composite index is ordered by id first, then name. The predicate on id alone and the combined predicate on id and name can both seek on the leading column (the combined one is the most selective), while the predicate on name alone cannot use the leading column and falls back to a scan, making it the slowest.

      32. How to get the query plan in Prod?

SELECT UseCounts, Cacheobjtype, Objtype, TEXT, query_plan
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
CROSS APPLY sys.dm_exec_query_plan(plan_handle)

 

From <http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan>

 

      33. Localization implementation in DB:

Create a language table with ID and name, and reference that ID from a per-language text table for whichever entity needs to be localized.
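A minimal schema sketch of that idea (all table and column names hypothetical):

```sql
CREATE TABLE dbo.Language (
    LanguageId int          NOT NULL PRIMARY KEY,
    Name       nvarchar(50) NOT NULL   -- e.g. 'en-US', 'de-DE'
);

-- One row of localized text per (entity, language) pair.
CREATE TABLE dbo.ProductText (
    ProductId   int           NOT NULL,
    LanguageId  int           NOT NULL REFERENCES dbo.Language (LanguageId),
    Description nvarchar(500) NOT NULL,
    PRIMARY KEY (ProductId, LanguageId)
);
```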

      34. Latest features in SQL 2014

In-Memory OLTP

For OLTP, the most exciting new feature in SQL Server 2014 is In-Memory OLTP (aka “Hekaton”), which allows you to move individual tables to special in-memory structures. The performance boost can be as huge as 30x. There are a number of limitations and special requirements for these tables, so they won’t work under every circumstance. But when they do, your OLTP performance will go through the roof. This is better than other in-memory solutions that require the entire database to be placed in memory. You can get more performance by converting existing stored procedures into in-memory procedures, too.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide2>
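The moving parts, sketched with placeholder database, path, and bucket-count values: a memory-optimized filegroup must exist before the table option can be used.

```sql
-- A memory-optimized filegroup must exist first.
ALTER DATABASE Sales ADD FILEGROUP Sales_mod CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE Sales ADD FILE (NAME = 'Sales_mod', FILENAME = 'C:\Data\Sales_mod')
    TO FILEGROUP Sales_mod;

-- Memory-optimized table with a hash index on the key.
CREATE TABLE dbo.SessionState (
    SessionId int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload   varbinary(8000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```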

 

Managed Backup to Azure

There are plenty of small and midsize shops without qualified DBAs on staff. Quite often these shops don’t discover that their backups aren’t being handled properly until it’s too late. True to its name, Managed Backup automatically backs up your database (or your instance) based on your defined recovery interval and workload patterns. When the system determines the data has changed significantly enough, it takes a backup to Azure. This feature only works with Azure blob storage. But since your backups are already offsite, there’s no need to worry about tapes.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide3>

 

Azure VMs for Availability replicas

With SQL Server 2014, you can define an Availability Group replica that resides in Azure. When a primary failure happens, you have to fail over manually, but you will be up and running very quickly. And as long as your primary is online you can still push your reporting to the Azure replica to offload that activity from production. If you need reliable, off-site HA but don’t have a second site, then this feature is for you. Just pick the location you’d like when you create the Azure VM, and you’re set.

 

 


 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide4>

 

SQL Server Data Files in Azure

Data Files in Azure is just what it sounds like: Your database runs locally in your data center, while the database files themselves live in an Azure blob container. This can offer advantages in DR and migration. But depending on the size of the database and its workload, the potential performance cost of pushing the data for every transaction across the Internet could be prohibitive. A better use of this feature may be to store the data files in an Azure VM in the same data center. This can also get you around the current limitation of having only 16 mounted disks in an Azure VM.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide5>

 

Updateable columnstore indexes

Columnstore indexes, introduced in SQL Server 2012, brought a dramatic boost to data warehouse performance, but with a hitch: they couldn't be updated. With SQL Server 2014, now they can. This means you no longer have to drop and re-create columnstore indexes every time you need to load your warehouse tables. Not only that, but updatability also means you may be able to look at columnstore indexes for certain OLTP applications. The caveat is that you must have a clustered columnstore index on the table; nonclustered columnstore indexes remain read-only.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide6>
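In DDL terms (table and column names hypothetical):

```sql
-- Convert a fact table to a clustered columnstore index.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;

-- Unlike in SQL Server 2012, the table stays writable:
INSERT INTO dbo.FactSales (SaleDate, Amount) VALUES ('2014-06-01', 19.99);
```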

 

Resource Governor for I/O

Disk I/O is typically the most constrained resource of a database system, and often a large or rogue query will take up more precious I/O resources than you can afford. Microsoft has finally given us some control over runaway I/O. With Resource Governor for I/O, you can now put queries into their own resource pool and limit the amount of I/O per volume they’re allowed. MIN_IOPS_PER_VOLUME and MAX_IOPS_PER_VOLUME set the minimum and maximum reads or writes per second allowed by a process in a disk volume.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide7>

 

Resource Governor I/O control, continued

MIN_IOPS_PER_VOLUME reserves a minimum number of I/O transactions per second, while MAX_IOPS_PER_VOLUME provides a maximum number. This maximum doesn’t limit the number of I/O operations a query can perform, but merely keeps it from monopolizing a disk. This way your large queries can still run, but other things will run as well. A good use of I/O control is to reserve some IOPS for administrators to be able to investigate issues when the disks are overloaded.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide8>
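A sketch of the setup (pool and group names hypothetical; routing sessions into the group additionally needs a classifier function, which is omitted here):

```sql
-- Cap a reporting workload at 200 IOPS per volume, reserving 50 for it.
CREATE RESOURCE POOL ReportingPool
    WITH (MIN_IOPS_PER_VOLUME = 50, MAX_IOPS_PER_VOLUME = 200);

CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;

ALTER RESOURCE GOVERNOR RECONFIGURE;
```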

 

Delayed durability

In SQL Server, changes to data are written to the log first. This is called write ahead logging (WAL). Control isn’t returned to the application until the log record has been written to disk (a process referred to as “hardening”). Delayed durability allows you to return control back to the application before the log is hardened. This can speed up transactions if you have issues with log performance. Nothing is free, though, and here you sacrifice recoverability. Should the database go down before the log is committed to disk, then you lose those transactions forever. It may be worth the risk if your log performance is severely degrading application response times.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide9>
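The two-step opt-in looks like this (database, table, and key values are placeholders):

```sql
-- Allow delayed durability at the database level...
ALTER DATABASE Sales SET DELAYED_DURABILITY = ALLOWED;

-- ...then opt in per transaction: COMMIT returns before the log flushes.
BEGIN TRANSACTION;
UPDATE dbo.Orders SET Total = Total * 1.1 WHERE OrderId = 42;
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);
```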

 

SSD buffer pool extension

Creating a buffer pool extension for SQL Server 2014 is like being able to define a different page file in Windows. As data pages move into memory, they begin to fill up the buffer pool. If the buffer pool fills up, the less frequently used pages will be paged to disk. Then when they’re needed again, they’ll be swapped with something else in the buffer pool and moved back into memory. The buffer pool extension option allows you to define an SSD as a buffer file location. Because SSD is so much faster than spinning disk, the paging is considerably quicker, which increases performance dramatically in some cases. You can define a buffer pool extension file up to 32 times the size of your memory.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide10>
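Enabling the extension is a one-liner (the file path and size here are placeholders):

```sql
-- Place a 64 GB extension file on an SSD volume.
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'F:\SSDCACHE\bpe.bpe', SIZE = 64 GB);

-- Check the current state:
SELECT path, current_size_in_kb, state_description
FROM sys.dm_os_buffer_pool_extension_configuration;
```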

 

Incremental statistics

Updating statistics in SQL Server is the very definition of redundant work. Whenever statistics need to be rebuilt, you can’t just update the new items — you have to update everything. This means that a table with 200 million rows and only 40 million changes will need to update all 200 million rows in order to pick up those changes. Incremental statistics in SQL Server 2014 allow you to update just those rows that have changed and merge them with what’s already there. This can have a big impact on query performance in some configurations.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide11>
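A sketch, assuming dbo.Orders is a partitioned table (incremental statistics require one) and partition 3 is the one that changed:

```sql
-- Per-partition statistics instead of whole-table statistics.
CREATE STATISTICS stat_OrderDate
    ON dbo.Orders (OrderDate)
    WITH INCREMENTAL = ON;

-- Later, refresh only the partition that actually changed:
UPDATE STATISTICS dbo.Orders (stat_OrderDate)
    WITH RESAMPLE ON PARTITIONS (3);
```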

 

Lock priority of online operations

You can now specify a lock priority for online re-indexing. In previous versions of SQL Server, long-running queries could block re-indexing operations, chewing up your maintenance window while your re-index op sits waiting, doing nothing. In SQL Server 2014, you can specify how your re-index operation will handle being blocked. You specify how long it will wait and what to do when the wait is over. Will you have it follow traditional behavior and wait indefinitely? Will you have it terminate and move to the next table? Or will you kill the blocking query, so your re-indexing can complete? It’s your choice.

 

From <http://www.infoworld.com/article/2606990/database/147960-10-best-new-features-in-SQL-Server-2014.html#slide12>
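The "kill the blockers" choice looks like this (index and table names hypothetical):

```sql
-- Online rebuild that waits at low priority for up to 5 minutes,
-- then kills whatever is blocking it so the rebuild can proceed.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (ONLINE = ON (
    WAIT_AT_LOW_PRIORITY (MAX_DURATION = 5 MINUTES,
                          ABORT_AFTER_WAIT = BLOCKERS)));
```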

 


 

 
