Friday, December 5, 2014

ORMs Suck?

As systems become more complex, software developers must rely upon more abstractions. Each abstraction tries to hide complexity, letting a developer write software that "handles" the many variations of modern computing.
However, the "leaky abstraction" law claims that developers of reliable software must learn the abstraction's underlying details anyway.

Example:
The SQL language abstracts away the procedural steps for querying a database, allowing one to merely define what one wants. But certain SQL queries are thousands of times slower than other, logically equivalent queries. At an even higher level of abstraction, ORM systems, which isolate object-oriented code from the implementation of object persistence in a relational database, still force the programmer to think in terms of databases, tables, and native SQL queries as soon as the performance of ORM-generated queries becomes a concern.
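To make the leak concrete, here is a minimal, hypothetical JPA-style sketch (the entity names and mappings are illustrative, not from any real schema). The naive version triggers the classic N+1 query pattern; fixing it means thinking in SQL/JOIN terms again:

import java.util.List;
import javax.persistence.*;

// Hypothetical entities used only to illustrate the point.
@Entity
class PurchaseOrder {
    @Id Long id;
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    List<OrderLine> lines;
}

@Entity
class OrderLine {
    @Id Long id;
    @ManyToOne PurchaseOrder order;
    double price;
}

class ReportService {
    // Looks like plain object navigation, but a lazy-loading ORM issues
    // 1 SELECT for the orders plus 1 SELECT per order for its lines (N+1).
    double totalRevenueNaive(EntityManager em) {
        List<PurchaseOrder> orders =
            em.createQuery("SELECT o FROM PurchaseOrder o", PurchaseOrder.class)
              .getResultList();                          // 1 query
        double total = 0;
        for (PurchaseOrder o : orders) {
            for (OrderLine line : o.lines) {             // + N queries
                total += line.price;
            }
        }
        return total;
    }

    // The fix drags you back into JOIN thinking: fetch everything in one statement.
    double totalRevenueTuned(EntityManager em) {
        List<PurchaseOrder> orders = em.createQuery(
                "SELECT DISTINCT o FROM PurchaseOrder o JOIN FETCH o.lines",
                PurchaseOrder.class).getResultList();    // single query with a JOIN
        double total = 0;
        for (PurchaseOrder o : orders) {
            for (OrderLine line : o.lines) {
                total += line.price;
            }
        }
        return total;
    }
}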


Tuesday, December 2, 2014

Will NoSQL kill the DBA position?

I'm afraid it's unlikely. You see, every technology, new or old, has its advantages and drawbacks. The reality is that NoSQL stands for "Not Only SQL" rather than "No SQL", so the relational database isn't going away anytime soon. Organizations still need transaction atomicity for their mission-critical business transactions. When a customer makes a credit card payment or a trader executes a trade, that business transaction needs to be committed to disk in real time, and the state of the transaction and its data needs to be consistent wherever the data is updated and accessed from. NoSQL solutions are so fast, elastic, scalable, and available because they are basically in-memory distributed data stores that run across multiple nodes and physical servers, meaning that if one node fails, there are plenty of others to take over. So when a transaction writes to one node, it's incredibly fast, but that data then has to be replicated across all nodes in the NoSQL grid before it is truly consistent.
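As a concrete illustration of the atomicity those systems rely on, here is a minimal JDBC sketch (the connection string, table and column names are hypothetical): both updates commit together or not at all.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PaymentTransaction {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, for illustration only.
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/bank", "app", "secret")) {
            con.setAutoCommit(false);
            try (PreparedStatement debit = con.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = con.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setBigDecimal(1, new BigDecimal("100.00"));
                debit.setLong(2, 1L);
                debit.executeUpdate();

                credit.setBigDecimal(1, new BigDecimal("100.00"));
                credit.setLong(2, 2L);
                credit.executeUpdate();

                con.commit();   // atomic and durable: both updates or neither
            } catch (Exception e) {
                con.rollback(); // any failure undoes the whole business transaction
                throw e;
            }
        }
    }
}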

Read/write consistency is still something that NoSQL-based solutions have yet to conquer; they sacrifice ACID for read/write performance, elasticity, and high availability. So while NoSQL enables applications to scale to high transaction concurrency and data volumes, it needs to work in parallel with relational databases to ensure data consistency.


Monday, October 20, 2014

Network problems between MongoDB nodes

PROBLEM:
MongoDB replica sets provide high availability through replication and automated failover. We have a cluster comprising three nodes: replicas "mentos-a" and "mentos-b", plus an arbiter. The problem is that every X seconds the PRIMARY steps down and the cluster fails over to the other node.

SOLUTION:
A downed node is detected by a loss of heartbeats and heartbeat responses. Heartbeats are sent every two seconds, and a heartbeat response times out after 10 seconds; a member that has not responded within that window is marked as down. So it is common for the election process to take about 10 seconds to start.

We can change the number of seconds that the replica set members wait for a successful heartbeat from each other. If a member does not respond in time, other members mark the delinquent member as inaccessible.
In the following example we raise heartbeatTimeoutSecs from its default of 10 seconds to 30 seconds:

rs0:PRIMARY> cfg = rs.conf();
{
 "_id" : "rs0",
 "version" : 2,
 "members" : [
  {
   "_id" : 0,
   "host" : "mentos-a:27017"
  },
  {
   "_id" : 1,
   "host" : "mentos-b:27017"
  },
  {
   "_id" : 2,
   "host" : "mentos-c:27017",
   "arbiterOnly" : true
  }
 ]
}
rs0:PRIMARY> cfg["settings"] = { heartbeatTimeoutSecs : 30 }
{ "heartbeatTimeoutSecs" : 30 }
rs0:PRIMARY> rs.reconfig(cfg);
{ "down" : [ "mentos-a:27017" ], "ok" : 1 }
rs0:PRIMARY> rs.conf()
{
 "_id" : "rs0",
 "version" : 3,
 "members" : [
  {
   "_id" : 0,
   "host" : "mentos-a:27017"
  },
  {
   "_id" : 1,
   "host" : "mentos-b:27017"
  },
  {
   "_id" : 2,
   "host" : "mentos-c:27017",
   "arbiterOnly" : true
  }
 ],
 "settings" : {
  "heartbeatTimeoutSecs" : 30
 }
}


Sunday, October 19, 2014

Push/Pull to git repos without entering your credentials all the time

PROBLEM:
You have to enter your git credentials every time you pull or push over HTTPS

SOLUTION:
Well, do this...

nano ~/.netrc

# Put the following 8 lines in the ~/.netrc file
# change "yourSecret" to the secret token you use to pull/push (not your GitHub login password)
# change "yourLogin" to your GitHub login
# Note one limitation: the password in the .netrc file must not contain spaces, since .netrc is parsed on spaces, tabs and newlines

machine github.com
login yourLogin
password yourSecret
protocol https
machine api.github.com
login yourLogin
password yourSecret
protocol https

# set chmod permissions (600 - owner can read and write)
chmod 600 ~/.netrc

# try get latest from master (and see you don't need to enter your credentials)
git checkout master
git pull origin master


Thursday, October 2, 2014

Why skip Windows 9?

PROBLEM:
Why is Microsoft skipping Windows 9?

SOLUTION:
One of the technical reasons is that too many legacy apps don't use the correct versioning APIs and instead infer the version from the OS name string.
You can see an example of this kind of check in the following legacy Java code:
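A representative sketch (the check on System.getProperty("os.name") is the commonly cited pattern; the class name here is made up):

public class LegacyOsCheck {
    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        // Intended to match "Windows 95" and "Windows 98", but it would also
        // match a hypothetical "Windows 9" and send it down the legacy path.
        if (osName.startsWith("Windows 9")) {
            System.out.println("Running the Windows 95/98 compatibility path");
        } else {
            System.out.println("Running the modern code path");
        }
    }
}

Rather than matching on names, code should use the versioning APIs (for example, comparing os.version numbers), which is exactly the practice the legacy apps skip.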



See more details here and here.

Monday, September 29, 2014

How to fix Shellshock (CVE-2014-6271 bash vulnerability)

PROBLEM:
You need to fix the remote code execution vulnerability in bash (Shellshock)

SOLUTION:

Ubuntu
sudo apt-get update && sudo apt-get install bash
See more details here

Centos
sudo yum update bash
See more details here

Do I need to reboot or restart services after installing this update?
No, a reboot of your system or a restart of your services is not required. The vulnerability is in the initial import of the process environment from the kernel, which only happens when bash starts. After the update that fixes this issue is installed, newly started bash processes use the patched code and are not vulnerable; processes that are already running performed that import when they started and will not do it again.
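If you want to verify from code (say, a JVM-based health check) that the bash on a host is patched, you can reproduce the classic environment-variable test. A minimal sketch, assuming bash is on the PATH:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ShellshockCheck {
    public static void main(String[] args) throws Exception {
        // Equivalent to: env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
        ProcessBuilder pb = new ProcessBuilder("bash", "-c", "echo this is a test");
        pb.environment().put("x", "() { :;}; echo vulnerable");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        boolean vulnerable = false;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains("vulnerable")) {
                    vulnerable = true; // a vulnerable bash executed the injected command
                }
            }
        }
        p.waitFor();
        System.out.println(vulnerable ? "bash appears vulnerable to CVE-2014-6271"
                                      : "bash appears patched");
    }
}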

Shellshock vs. Heartbleed
Heartbleed allowed remote access to a small amount of data in the memory of affected machines. Shellshock enables pre-auth remote injection of arbitrary commands, which is potentially far more dire.


Sunday, February 9, 2014

Find TOP 20 problems in your SQL Server

PROBLEM:
You need to investigate problems with your SQL Server database and gather some useful information about performance

SOLUTION:
First, do not lock anything, and do not get held up by any locks
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

Show the version of SQL Server
SELECT @@VERSION 

Get property info about the server instance. IsIntegratedSecurityOnly: 1 = Windows authentication only. ProductLevel contains the RTM or SP level. Is the SP the correct one?!
SELECT 
    SERVERPROPERTY('ServerName') AS [ServerName]
    ,SERVERPROPERTY('InstanceName') AS [InstanceName]
    ,SERVERPROPERTY('MachineName') AS [MachineName]
    ,SERVERPROPERTY('Edition') AS [Edition]
    ,SERVERPROPERTY('ProductVersion') AS [ProductVersion]
    ,SERVERPROPERTY('ProductLevel') AS [ProductLevel]
    ,SERVERPROPERTY('IsIntegratedSecurityOnly') AS [IsIntegratedSecurityOnly]
    ,SERVERPROPERTY('IsClustered') AS [IsClustered]

Get OS information. CPU/Memory/last reboot time useful
SELECT cpu_count AS [Logical CPUs]
, cpu_count / hyperthread_ratio AS [Physical CPUs]
, CAST(physical_memory_in_bytes / 1024.0 / 1024.0 /1024.0 AS DECIMAL(28,2)) AS [Memory (GB)]
, DATEADD(ss, -(ms_ticks / 1000), GetDate()) AS [Start DateTime]
--  , sqlserver_start_time AS [Start DateTime]  -- In 2008+
FROM sys.dm_os_sys_info

Show SQL Server database info. Check compatibility level. is_read_committed_snapshot_on (1 is good for concurrency). recovery_model (want simple on non-prod boxes. Bulk_logged for prod). page_verify - want CHECKSUM
SELECT name, compatibility_level, recovery_model_desc, page_verify_option_desc, is_read_committed_snapshot_on
FROM sys.databases ORDER BY name

Show SQL Server configuration info. Priority boost - should be off (0). Cost threshold for parallelism (evaluate with MAXDOP). Max degree of parallelism (1 or 8?). Max server memory (MB) - evaluate in context of server memory. CLR enabled - generally disable, unless needed. Optimize for ad hoc workloads - often recommended to have on
SELECT name, description, value_in_use 
FROM sys.configurations 
WHERE NAME IN(
  'clr enabled'
, 'max degree of parallelism'   
, 'cost threshold for parallelism'  
, 'max server memory (MB)'          -- Set appropriately
, 'optimize for ad hoc workloads'   -- should be 1.
, 'priority boost'                  -- should be 0
)
ORDER BY name

Identify what is causing waits
WITH Waits AS
(SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,
100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK'
,'SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR', 'LOGMGR_QUEUE','CHECKPOINT_QUEUE'
,'REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH','BROKER_TASK_STOP','CLR_MANUAL_EVENT'
,'CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT'
,'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP','ONDEMAND_TASK_QUEUE'))

SELECT W1.wait_type, 
CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
CAST(W1.pct AS DECIMAL(12, 2)) AS pct,
CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 99; -- percentage threshold

How much time is spent swapping threads - above 20% is bad
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2)) AS [%signal (cpu) waits],
CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2)) AS [%resource waits]
FROM sys.dm_os_wait_stats WITH (NOLOCK) OPTION (RECOMPILE);

CPU usage per database. Is a single DB hogging CPU? Maybe needs a separate server
--USE [YourDatabaseName]
WITH DB_CPU_Stats
AS
(SELECT DatabaseID, DB_Name(DatabaseID) AS [DatabaseName], 
 SUM(total_worker_time) AS [CPU_Time_Ms]
 FROM sys.dm_exec_query_stats AS qs
 CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID] 
              FROM sys.dm_exec_plan_attributes(qs.plan_handle)
              WHERE attribute = N'dbid') AS F_DB
 GROUP BY DatabaseID)
SELECT ROW_NUMBER() OVER(ORDER BY [CPU_Time_Ms] DESC) AS [row_num],
       DatabaseName, [CPU_Time_Ms], 
       CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) 
       OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPUPercent]
FROM DB_CPU_Stats
WHERE DatabaseID > 4 -- exclude system databases
AND DatabaseID <> 32767 -- exclude ResourceDB
ORDER BY [CPUPercent] DESC OPTION (RECOMPILE);

Memory usage per database. Is a single DB hogging memory? Maybe needs a separate server
--USE [YourDatabaseName]
SELECT 
    ISNULL(DB_NAME(database_id), 'ResourceDb') AS DatabaseName
    , CAST(COUNT(row_count) * 8.0 / (1024.0) AS DECIMAL(28,2))  AS [Size (MB)] 
    , CAST(COUNT(row_count) * 8.0 / (1024.0) / 1024.0 AS DECIMAL(28,2)) AS [Size (GB)] 
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY [Size (MB)] DESC

Find the physical disk I/O latency values. Under 20ms is OK; above that needs investigating
--USE [YourDatabaseName]
SELECT DB_NAME(database_id) AS DatabaseName
 , file_id
 , io_stall_read_ms / num_of_reads AS 'Average read time'
 , io_stall_write_ms / num_of_writes AS 'Average write time'
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
WHERE num_of_reads > 0 
AND num_of_writes > 0
ORDER BY DatabaseName

Check reads per write, by database. Should a DB be separated out for OLAP vs. OLTP? OLAP often needs more indexes. A NULL DatabaseName means the query was run ad hoc or prepared
--USE [YourDatabaseName]
SELECT TOP 10 
        DB_NAME(qt.dbid) AS DatabaseName
        , SUM(total_logical_reads) AS [Total Reads]
        , SUM(total_logical_writes) AS [Total Writes]
        , SUM(total_logical_reads) / CASE WHEN SUM(total_logical_writes) = 0 THEN 1 ELSE SUM(total_logical_writes) END AS [Reads Per Write]
        , SUM(qs.execution_count) AS [Execution count]
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
GROUP BY DB_NAME(qt.dbid)
ORDER BY [Reads Per Write] DESC;

Check space used. What is DB size, and how much is unused?
--USE [YourDatabaseName]
EXEC sp_spaceused
-- Database statistics settings
-- is_auto_create_stats_on should be on (1)
-- is_auto_update_stats_on should be on (1)
-- is_auto_update_stats_async_on should be off (0)
SELECT name
, is_auto_create_stats_on AS [AutoCreateStatistics]
, is_auto_update_stats_on AS [AutoUpdateStatistics]
, is_auto_update_stats_async_on AS [AutoUpdateStatisticsAsync]
FROM master.sys.databases
ORDER BY name

Check the state of your statistics. When last updated?! Maybe needs intelligent stats utility?
--USE [YourDatabaseName]
SELECT 
 ss.name AS SchemaName
 , st.name AS TableName
 , s.name AS IndexName 
 , STATS_DATE(s.id,s.indid) AS 'Statistics Last Updated'
 , s.rowcnt AS 'Row Count'    
 , s.rowmodctr AS 'Number Of Changes' 
 , CAST((CAST(s.rowmodctr AS DECIMAL(28,8))/CAST(s.rowcnt AS DECIMAL(28,2)) * 100.0)
     AS DECIMAL(28,2)) AS '% Rows Changed'
FROM sys.sysindexes s 
INNER JOIN sys.tables st ON st.[object_id] = s.[id]
INNER JOIN sys.schemas ss ON ss.[schema_id] = st.[schema_id]
WHERE s.id > 100 -- user table/index
AND s.indid > 0  -- Clustered and non-clustered indexes
AND s.rowcnt >= 500 -- want at least 500 rows
ORDER BY 'Statistics Last Updated' DESC 

Missing indexes can be indicative of other bad practices in a given DB
--USE [YourDatabaseName]
SELECT 
 DB_NAME(database_id) AS DatabaseName
 , COUNT(*) AS [Missing Index Count]
FROM sys.dm_db_missing_index_details
GROUP BY DB_NAME(database_id)
ORDER BY [Missing Index Count] DESC

What are the most costly missing indexes?
--USE [YourDatabaseName]
SELECT TOP 40 -- Too many INCLUDE cols in top entries...  
 ROUND(avg_total_user_cost * avg_user_impact   
  * (user_seeks + user_scans),0) AS [Total Cost]  
 , [statement] AS [Table Name]
 , equality_columns
 , inequality_columns
 , included_columns
FROM  sys.dm_db_missing_index_groups g 
INNER JOIN sys.dm_db_missing_index_group_stats s 
ON s.group_handle = g.index_group_handle 
INNER JOIN sys.dm_db_missing_index_details d 
ON d.index_handle = g.index_handle
ORDER BY [Total Cost] DESC

Identify missing PKs. Most tables should have a PK (for referential integrity and FKs), unless there is a good reason not to
--USE [YourDatabaseName]
SELECT DB_NAME(DB_ID()) AS DatabaseName, SCHEMA_NAME(schema_id) AS SchemaName,name AS TableName
FROM sys.tables
WHERE OBJECTPROPERTY(OBJECT_ID,'TableHasPrimaryKey') = 0
ORDER BY SchemaName, TableName;

Identify heaps that have non-clustered indexes. Heaps with non-clustered indexes are often better converted to a clustered index based on one of the NC indexes
--USE [YourDatabaseName]
SELECT 
 DB_NAME(DB_ID()) AS DatabaseName
 , ss.name AS SchemaName
 , st.NAME AS TableName
 , i.NAME AS IndexName
 , i.type_desc
 , si.rowcnt
INTO #HeapWithIndexes2
FROM sys.indexes I 
LEFT JOIN sys.sysindexes SI ON SI.indid = I.index_Id AND i.object_id = si.id
INNER JOIN sys.tables st ON st.[object_id] = si.[id]
INNER JOIN sys.schemas ss ON ss.[schema_id] = st.[schema_id]
WHERE 1=2 -- returns no rows: this just creates the empty temp table with the correct schema

Check which table names have a heap...
--USE [YourDatabaseName]
SELECT DISTINCT ss.name AS schemaName, st.name
INTO #Heaps
FROM sys.indexes I 
LEFT JOIN sys.sysindexes SI ON SI.indid = I.index_Id 
       AND i.object_id = si.id
INNER JOIN sys.tables st ON st.[object_id] = si.[id]
INNER JOIN sys.schemas ss ON ss.[schema_id] = st.[schema_id]
WHERE si.id > 100 
AND i.type_desc = 'HEAP'
INTERSECT
SELECT DISTINCT ss.name AS schemaName, st.name
FROM sys.indexes I 
LEFT JOIN sys.sysindexes SI ON SI.indid = I.index_Id 
       AND i.object_id = si.id
INNER JOIN sys.tables st ON st.[object_id] = si.[id]
INNER JOIN sys.schemas ss ON ss.[schema_id] = st.[schema_id]
WHERE si.id > 100 
AND i.type_desc != 'HEAP'

Now capture all indexes for the heap tables found above...
--USE [YourDatabaseName]
INSERT INTO #HeapWithIndexes2 
SELECT
 DB_NAME(DB_ID()) AS DatabaseName
 , ss.name AS SchemaName
 , st.NAME AS TableName
 , i.NAME AS IndexName
 , i.type_desc
 , si.rowcnt
FROM sys.indexes I 
LEFT JOIN sys.sysindexes SI ON SI.indid = I.index_Id AND i.object_id = si.id
INNER JOIN sys.tables st ON st.[object_id] = si.[id]
INNER JOIN sys.schemas ss ON ss.[schema_id] = st.[schema_id]
INNER JOIN #Heaps h ON st.name = h.name AND ss.name = h.schemaName
WHERE si.id > 100 
 
SELECT * FROM #HeapWithIndexes2 ORDER BY DatabaseName, SchemaName, TableName, IndexName
 
DROP TABLE #Heaps
DROP TABLE #HeapWithIndexes2

Find the longest-running queries. This allows targeted improvements based on query duration, and the query plan may offer hints for improvements
SELECT TOP 20 
 CAST(total_elapsed_time / 1000000.0 AS DECIMAL(28, 2)) AS [Total Duration (s)]
 , CAST(total_worker_time * 100.0 / total_elapsed_time AS DECIMAL(28, 2)) AS [% CPU]
 , CAST((total_elapsed_time - total_worker_time)* 100.0 / total_elapsed_time AS DECIMAL(28, 2)) AS [% Waiting] 
 , execution_count
 , CAST(total_elapsed_time / 1000000.0 / execution_count AS DECIMAL(28, 2)) AS [Average Duration (s)]
 , SUBSTRING (qt.text,(qs.statement_start_offset/2) + 1, 
    ((CASE WHEN qs.statement_end_offset = -1 
    THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2 
    ELSE qs.statement_end_offset
    END - qs.statement_start_offset)/2) + 1) AS [Individual Query] 
 , SUBSTRING(qt.text,1,100) AS [Parent Query]
 , DB_NAME(qt.dbid) AS DatabaseName 
 , qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
INNER JOIN sys.dm_exec_cached_plans as cp 
      ON qs.plan_handle=cp.plan_handle
WHERE total_elapsed_time > 0
ORDER BY total_elapsed_time DESC


Show execution plan for all sessions in SQL Server

PROBLEM:
How do you show the execution plan for all sessions?

SOLUTION:
Run this before starting, to clear the plan cache (use with care on a production server)
DBCC FREEPROCCACHE

Show execution plan for all sessions
SELECT TOP 1000 qs.execution_count, 
      StoredProcedure = Db_name(qp.dbid) + '..' + Object_name(qp.objectid, qp.dbid), 
      StatementDefinition = Substring (st.text,
       ( qs.statement_start_offset / 2 ) + 1, 
         ( 
           ( CASE qs.statement_end_offset 
             WHEN -1 THEN Datalength(st.text) 
             ELSE qs.statement_end_offset 
             END - qs.statement_start_offset
           ) / 2
         ) + 1
      ), 
      query_plan, 
      st.text, 
      total_elapsed_time 
FROM
   sys.dm_exec_query_stats AS qs 
   CROSS apply sys.Dm_exec_sql_text(qs.sql_handle) AS st 
   CROSS apply sys.Dm_exec_query_plan (qs.plan_handle) qp 
WHERE
   st.encrypted = 0 
ORDER BY
   total_elapsed_time / QS.execution_count DESC, 
   qs.execution_count, 
   3 


Sunday, January 5, 2014

In Memory MongoDB

PROBLEM:
You want to use MongoDB as an in-memory RAM-only store

SOLUTION:
You can use MongoDB and all its features as an in-memory, RAM-only store! Its performance should be pretty impressive: around 20k writes per second, scaling roughly linearly with the number of cores.
See the following great link: How to use MongoDB as a pure in-memory DB
