Dynamics GP – How I figured out which user deleted a row from a table using the SQL fn_dump_dblog function and DBCC PAGE command

Introduction

This post is similar to a previous post I did when investigating deletions on a SOP table; however, it takes things one step further to confirm beyond doubt which user deleted the data.

Along with similar techniques to the previous post, I also use a stored procedure and a SQL Agent job to create an alert that highlights when the deletion occurs, as well as DBCC PAGE to query the contents of an 8 KB data page in a restored SQL data file.

The issue

We had an ongoing issue at a client where, randomly and without explanation, a distribution line would go missing on posted payables transactions. This is very strange as altering distribution lines isn’t allowed once a transaction has been posted, and the distribution lines must balance before the transaction can be posted.

In order to get to the bottom of this I created a SQL stored procedure that would email me and the client whenever it identified any transactions with missing distributions. The stored procedure runs via a SQL Agent job on a 15 minute schedule. As this had occurred in the past, but we had no idea when or how, the idea behind the alert was that it would give us a 15 minute window in which to figure out what had happened, i.e. I could start SQL investigations and the client could ask users what they were doing.

The stored procedure I used for the alert is below:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[CheckMissingDists]
AS
BEGIN

	DECLARE @NumOfMissingDistsOpen AS int
	DECLARE @NumOfMissingDistsHist AS int

	-- Distributions recorded in PM80600 that have no matching row in the HISTORY distribution table (PM30600)
	SELECT @NumOfMissingDistsHist = COUNT(*) FROM PM80600 pmdist
		LEFT JOIN PM30600 pm
			ON pmdist.VCHRNMBR=pm.VCHRNMBR AND pmdist.DSTSQNUM=pm.DSTSQNUM
		INNER JOIN PM30200 pmhdr
			ON pmdist.VCHRNMBR=pmhdr.VCHRNMBR AND pmdist.DOCTYPE=pmhdr.DOCTYPE
		WHERE pm.VCHRNMBR IS NULL

	-- Distributions recorded in PM80600 that have no matching row in the OPEN distribution table (PM10100)
	SELECT @NumOfMissingDistsOpen = COUNT(*) FROM PM80600 pmdist
		LEFT JOIN PM10100 pm
			ON pmdist.VCHRNMBR=pm.VCHRNMBR AND pmdist.DSTSQNUM=pm.DSTSQNUM
		INNER JOIN PM20000 pmhdr
			ON pmdist.VCHRNMBR=pmhdr.VCHRNMBR AND pmdist.DOCTYPE=pmhdr.DOCTYPE
		WHERE pm.VCHRNMBR IS NULL

	-- Send an email alert if either check finds a missing distribution
	IF (@NumOfMissingDistsHist > 0 OR @NumOfMissingDistsOpen > 0)
	BEGIN
		EXEC msdb.dbo.sp_send_dbmail
			@profile_name = 'Email Profile',
			@recipients = 'myemail@email.com;clientsemail@email.com',
			@importance = 'High',
			@subject = 'Potential missing distributions',
			@body = 'Potential missing distributions'
	END

	RETURN

END

I then created a SQL Agent job to execute the stored procedure every 15 minutes.
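If you prefer to script the job rather than create it through the SQL Server Agent UI in SSMS, something along these lines sets up an equivalent job (the job name, schedule name and database name are purely illustrative):

USE msdb;
GO
-- Create the job, add a step that runs the alert procedure, schedule it every
-- 15 minutes and attach the job to the local server.
EXEC dbo.sp_add_job @job_name = N'Check Missing PM Distributions';

EXEC dbo.sp_add_jobstep
	@job_name = N'Check Missing PM Distributions',
	@step_name = N'Run CheckMissingDists',
	@subsystem = N'TSQL',
	@database_name = N'DBName',
	@command = N'EXEC dbo.CheckMissingDists;';

EXEC dbo.sp_add_jobschedule
	@job_name = N'Check Missing PM Distributions',
	@name = N'Every 15 minutes',
	@freq_type = 4,               -- daily
	@freq_interval = 1,
	@freq_subday_type = 4,        -- units of minutes
	@freq_subday_interval = 15;

EXEC dbo.sp_add_jobserver @job_name = N'Check Missing PM Distributions';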

** Please note the alert query above runs on potentially large tables (PM30200), so it may take a while and lock tables on some systems.

This had all been in place for a while, and nothing happened, until yesterday when I received an alert to inform me of missing distributions.

At this point I emailed the client to ask them to check what users were doing, and I connected to the SQL server to confirm the document(s) with missing distributions.

Confirming the affected document

First of all I ran the query below, which is based on the code from my alert. This highlighted that the transaction in question was in the OPEN PM tables.

I then ran the query below to find out which transaction had the missing distributions.
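It was essentially a detail version of the alert code, something along these lines, returning the voucher number, document type and the distribution sequence numbers that exist in PM80600 but are missing from the OPEN distribution table (PM10100):

SELECT pmdist.VCHRNMBR, pmdist.DOCTYPE, pmdist.DSTSQNUM
FROM PM80600 pmdist
	LEFT JOIN PM10100 pm
		ON pmdist.VCHRNMBR=pm.VCHRNMBR AND pmdist.DSTSQNUM=pm.DSTSQNUM
	INNER JOIN PM20000 pmhdr
		ON pmdist.VCHRNMBR=pmhdr.VCHRNMBR AND pmdist.DOCTYPE=pmhdr.DOCTYPE
WHERE pm.VCHRNMBR IS NULL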

This shows that the distribution with DSTSQNUM 32768 is no longer in the PM10100 table.

Finally, I logged into Dynamics GP, found the invoice and confirmed a distribution was definitely missing.

SQL Investigation

Now that I had confirmed the issue had recurred, I commenced my SQL investigations.

As I knew the issue had happened within a 15 minute window, I needed to query the SQL transaction log backup file covering that period to find all the deletions in that time frame. (I couldn’t query the active log file using fn_dblog as a log backup had already occurred, clearing those records from the active log.) To do this I ran the query below:

select	
	[Transaction ID]
	,[begin time] as BeginTime
	,[End Time] as EndTime
	,SUSER_SNAME ([transaction sid]) as UserID
	,object_name(object_id) as TableName
	,[Lock Information]
FROM fn_dump_dblog 
	(NULL, NULL, N'DISK', 1, N'D:\<Path to Transaction Log backup file>\Transaction Log Backup File.trn', 
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	 DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT) as log1
left join sys.partitions part 
	 on part.partition_id=log1.partitionid
where  
	[transaction id] IN
(select 
	[transaction id]
 FROM fn_dump_dblog 
	(NULL, NULL, N'DISK', 1, N'D:\<Path to Transaction Log backup file>\Transaction Log Backup File.trn',
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
	DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT) as log1
left join sys.partitions part 
	on part.partition_id=log1.partitionid
where 
	operation = 'LOP_DELETE_ROWS' and 
	object_name(object_id) in ('PM10100')) 
order by [Transaction ID]

** Please note this query is a hybrid of a query taken from the SQLskills blog here: https://www.sqlskills.com/blogs/paul/using-fn_dblog-fn_dump_dblog-and-restoring-with-stopbeforemark-to-an-lsn/

This query highlighted a delete performed by User ID “sa” in the time frame I was looking for, as per below.

As there were other deletes on PM10100 around the same time, I needed a way to confirm that this particular delete removed the distribution from my transaction. (The other deletes could be genuine, i.e. a transaction moving from the OPEN to the HISTORY tables.)

As I couldn’t figure out a way of identifying exactly what data had been deleted using just the information returned by fn_dump_dblog, I utilised the information recorded in the “Lock Information” column. (As it didn’t fit in the screenshot above, I’ve expanded it below.)

As you can see this provides a Page ID and a Hash Value for the data that was locked and therefore removed.

As this data had now gone from the LIVE SQL database, I restored the database to a new database called “TEST1” at a point in time prior to the issue, and used DBCC PAGE to view the contents of Page ID 3813166.
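DBCC PAGE needs trace flag 3604 switched on to send its output to the client rather than the error log. A sketch of the command is below; it assumes the page lives in data file 1 (the file ID appears alongside the page ID in the lock resource):

DBCC TRACEON (3604);
DBCC PAGE ('TEST1', 1, 3813166, 3);   -- database name, file ID, page ID, print option (3 = rows with full detail)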

And hey presto that Page did contain the data for the missing distribution from this transaction!

Therefore I can now say with certainty that the user “sa” must have somehow removed the distribution from the table. What’s more, I’m fairly certain that no users use this login to access GP, so something else using this account must have removed the distribution. (Further investigations now commence.)

Conclusion

This process shows how you can use various SQL tools to pinpoint exactly who deleted a row of data from a table in a SQL database.

Please note there might be a more efficient way to do this; however, as fn_dump_dblog and DBCC PAGE are undocumented tools, I couldn’t find much information on how to approach it, so I devised this method.

I hope this helps someone in future!

Thanks for reading!

Dynamics GP – SQL parameter sniffing issue causing a routine to hang

Background

The new year started off with a very interesting ticket logged on our support desk. The premise of the issue was a routine that would normally finish in seconds, or at most minutes, not appearing to start, let alone finish. The user had tried crashing out of GP and even asking everyone else to log off, however the problem persisted.

The Issue

The issue was occurring when running an “Extract” routine in the third party module Nolans Advanced Bank Reconciliation. This routine extracts all the data from the GL into third party tables based on parameters specified by the user.

The window used to initiate the process is below.

** Please note the window accepts a couple of parameters

After entering the above parameters and clicking “Extract” the process “hung” on 0% as per below:

The client left this for over an hour and the progress didn’t move. It remained on 0% the whole time. At this point they crashed out of Dynamics GP manually.

The Troubleshooting Process

The first thing I did to troubleshoot this issue was to set up a SQL trace so I could log the SQL calls made after clicking “Extract”. I then set off the process again and checked the trace. This highlighted that the stored procedure running the extract routine was ncExtractGLTransactions, as per below.

I left this running for a few minutes and nothing happened. Next, I wanted to confirm nothing was blocking the SQL process. To do this I used Adam Machanic’s awesome free SQL monitoring tool “sp_whoisactive” (http://whoisactive.com). After running this I could see nothing was blocking the process.
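For anyone who hasn’t used it, a basic call looks like this (the @find_block_leaders parameter flags any session sitting at the head of a blocking chain):

-- Show all active sessions and highlight any sessions leading a blocking chain
EXEC dbo.sp_WhoIsActive @find_block_leaders = 1;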

Therefore the conclusion at this point was that the process was running (via the ncExtractGLTransactions stored procedure) but for some reason it was taking a very long time to complete.

As this routine normally runs daily without issue my SQL senses had me thinking this could be a case of “bad” parameter sniffing.

Parameter sniffing and execution plan reuse

At this point I’ll try and give a very short overview of parameter sniffing and execution plan creation.

When a stored procedure runs, the SQL query optimiser first checks whether an execution plan already exists in the plan cache for that stored procedure. If one doesn’t exist, an execution plan is created and cached using the inputs provided (i.e. the parameter inputs are “sniffed” and used to create the execution plan). This execution plan is then used to execute the query. If the stored procedure is executed again, SQL will reuse that execution plan and therefore save the time it would take to create and compile a new one. (This is why parameter sniffing is generally regarded as a good thing.)

However, issues can arise if the execution plan is reused with parameters that would give completely different results. For example, if the stored procedure is first executed with parameters that would only give a very small result set, an execution plan might be created using index seeks and lookups. However, if it’s then run with parameters that would produce a very large result set, index seeks and lookups might not be the optimal way to retrieve the data; in that case it might be much quicker to perform table/index scans.
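As a simplified, hypothetical illustration (the procedure and table below are made up, not taken from the client’s system):

-- Hypothetical procedure: the plan cached for the first execution is reused by later ones.
CREATE PROCEDURE dbo.GetOrdersByCustomer (@CustomerID int)
AS
BEGIN
	SELECT OrderID, OrderDate, OrderTotal
	FROM dbo.Orders                   -- hypothetical table
	WHERE CustomerID = @CustomerID;
END
GO

-- First call: a customer with a handful of orders, so the optimiser builds (and caches)
-- a plan based on an index seek plus key lookups.
EXEC dbo.GetOrdersByCustomer @CustomerID = 42;

-- Later call: a customer with millions of orders reuses the same seek-based plan,
-- which can be far slower than the scan the optimiser would have chosen had this
-- been the first execution.
EXEC dbo.GetOrdersByCustomer @CustomerID = 7;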

The Immediate Fix

To see if parameter sniffing was causing the issue I decided to find the execution plan for the stored procedure and evict it from the plan cache thus forcing SQL to create a new plan based on the inputs the client was using the next time the stored procedure was executed.

Please note there are other ways to do this, such as marking the stored procedure for recompilation using the system stored procedure “sp_recompile”, however I chose the plan cache route. This is mainly because I get to have more of a nosy around 🙂 i.e. running this query you can see when the execution plan was created and gather a whole host of other information.

To find the execution plan I used the script below:

SELECT plan1.query_plan,qs.plan_handle, a.attrlist, qs.*,est.*
FROM   sys.dm_exec_query_stats qs
CROSS  APPLY sys.dm_exec_sql_text(qs.sql_handle) est
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) plan1
CROSS  APPLY (SELECT epa.attribute + '=' + convert(nvarchar(127), epa.value) + '   '
              FROM   sys.dm_exec_plan_attributes(qs.plan_handle) epa
              WHERE  epa.is_cache_key = 1
              ORDER  BY epa.attribute
              FOR    XML PATH('')) AS a(attrlist)
WHERE  est.objectid = object_id ('dbo.ncExtractGLTransactions')
  AND  est.dbid     = db_id('DBName') 

** Please note I acquired this script from someone else, either via a blog or a book, however as it was such a long time ago I can’t recall exactly who to credit.

I then took the plan handle and ran the query below to evict the execution plan:

DBCC FREEPROCCACHE (<plan handle>)

After this had been done I ran the process again and it completed as normal! (I also ran the first query again and could see, based on the plan creation date, that a brand new execution plan had been created.)

The Long Term Fix

Going forwards I could “fix” this by adding the “WITH RECOMPILE” option to the stored procedure, so that an execution plan is created but never cached when it’s executed. Therefore every time the stored procedure is run a brand new, optimal execution plan will be created. The only downside is a CPU hit while the execution plan is created.

Alternatively, I could add the OPTION (RECOMPILE) hint to the statement within the stored procedure that is causing the issue, so a plan is still created and cached for the procedure, but the troublesome statement is recompiled each time it runs.
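As a sketch using a made-up procedure (I wouldn’t alter a third party stored procedure such as ncExtractGLTransactions without the vendor’s input), the two options look something like this:

-- Option 1: recompile the whole procedure on every execution, so no plan is cached.
ALTER PROCEDURE dbo.MyExtractProc (@StartDate datetime, @EndDate datetime)
WITH RECOMPILE
AS
BEGIN
	SELECT JRNENTRY, TRXDATE, ACTINDX
	FROM dbo.GL20000                  -- GL open year transactions (columns for illustration)
	WHERE TRXDATE BETWEEN @StartDate AND @EndDate;
END
GO

-- Option 2: recompile only the troublesome statement; the rest of the procedure
-- still benefits from a cached plan.
ALTER PROCEDURE dbo.MyExtractProc (@StartDate datetime, @EndDate datetime)
AS
BEGIN
	SELECT JRNENTRY, TRXDATE, ACTINDX
	FROM dbo.GL20000
	WHERE TRXDATE BETWEEN @StartDate AND @EndDate
	OPTION (RECOMPILE);
END
GO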

Final thoughts

Another thing to bear in mind is that execution plans don’t stay in the plan cache forever, which is why you won’t necessarily come across this issue very often.

For example, index maintenance and updating statistics, which are routines you might run frequently on your SQL Server databases, will invalidate the associated execution plans, causing them to be recompiled the next time the stored procedure is run. Therefore it’s possible that if I’d left this issue overnight and index maintenance had run, this would have invalidated the plan; the next time the user ran the routine a new execution plan would have been created and the routine would have run as normal. This gives the impression the issue has “fixed itself” when in fact a new execution plan was created and used, which resolved the issue.

I hope this helps anyone else who comes across routines that suddenly run slowly when they had previously run fine. It might also explain why, when you leave issues overnight, things can appear to have magically fixed themselves when you try again the next day 🙂

Thanks for reading!

Why did my Microsoft SQL Transaction log grow so big and how do I shrink it and reclaim the space?

Background

Working primarily on support for Microsoft Dynamics ERP systems, I often see tickets logged where the SQL transaction log has grown uncontrollably and is filling (or almost filling) the drive where it’s located. This post aims to explain the most common reason why this happens and some steps you can take to truncate and shrink the SQL transaction log and reclaim the space. Please note I only mention the FULL and SIMPLE recovery models as these are the recovery models I have the most experience with.

What is the SQL transaction log?

The transaction log is a fundamental part of the SQL database. All changes that are made to the database are described and written into the SQL transaction log before the changes are written to the main database file.

My basic/novice understanding of how this works, based on a typical UPDATE transaction, is as follows:

  1. An UPDATE statement is issued to update some data in the database.
  2. If the page containing the data being updated isn’t in memory the page is retrieved into memory.
  3. The data is changed in memory and log records describing the changes are written to the transaction log.
  4. At some point later the updated page in memory is written to the SQL database file when the CHECKPOINT runs. (this can be before the UPDATE statement COMMITS)
  5. UPDATE statement COMMITS

It’s also important to highlight that this process of logging to the transaction log happens whether the database is in the FULL or the SIMPLE recovery model.
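As a rough way to see this for yourself on a test database, the undocumented fn_dblog function lists the log records an UPDATE generates regardless of the recovery model (treat this as a sketch rather than something to run against a production database):

-- Show recent log records for the current database; a simple UPDATE typically appears
-- as LOP_MODIFY_ROW records wrapped in a LOP_BEGIN_XACT / LOP_COMMIT_XACT pair.
SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
FROM fn_dblog (NULL, NULL)
WHERE Operation IN ('LOP_BEGIN_XACT', 'LOP_MODIFY_ROW', 'LOP_COMMIT_XACT');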

SQL Recovery Models and Log Truncation

If the database is in the SIMPLE recovery model, the log records are automatically truncated when they are no longer needed. This means the space within the transaction log file can be reused and, in theory, the transaction log file won’t grow. (There are caveats here; for example, deleting a large number of rows in a single SQL transaction would still cause the log file to grow. Generally, though, the log file will be much more manageable in the SIMPLE recovery model.)

When the database is in the FULL recovery model the transaction log records aren’t automatically truncated when they are no longer needed. Therefore new database changes will be written to the end of the log and the log file will continue to expand and grow.

The only way to truncate the log, so the space can be reused, is to perform a SQL transaction log backup. When you do this log records are written to the SQL transaction log backup file and portions (VLFs) of the transaction log can be used again*. If a SQL transaction log backup isn’t performed the transaction log will continue growing and growing until it fills the disk. (assuming auto growth is switched on).

Therefore the most common reason I have experienced for a transaction log file to have grown extremely large is because the database is in the FULL recovery model and LOG backups haven’t been taken for a long time.

* It’s important to note that the space within the transaction log is just marked as available again. The transaction log file itself isn’t physically shrunk by the transaction log backup.

How to reclaim the space. Truncate, Shrink or both?

A transaction log backup doesn’t physically shrink the transaction log file on disk. The transaction log backup truncates the log records that are no longer needed making the space within the log file available for use again.

Therefore, if you have a very large transaction log file, the first thing I do is check the recovery model and the available space in the log using the script below. (Run this against the target database.)

SELECT 
	[name] as [Database Name]
	,[recovery_model_desc] as [Recovery Model]
	,log_reuse_wait_desc as [Log Reuse Description]
	,total_log_size_in_bytes*1.0/1024/1024 as [Total Log Size Mb]
	,used_log_space_in_bytes*1.0/1024/1024 as [Log Used Space Mb]
	,(total_log_size_in_bytes - used_log_space_in_bytes)*1.0/1024/1024 as [Free Log Space Mb]
	,((total_log_size_in_bytes - used_log_space_in_bytes)*1.0/1024/1024)/((total_log_size_in_bytes*1.0/1024/1024))*100 as [% Space Free in Log]
FROM sys.dm_db_log_space_usage loginfo
inner join sys.databases dbs on dbs.database_id=loginfo.database_id;

If the “% Space Free in Log” is very small, the “Recovery Model” isn’t SIMPLE and the “Log Reuse Description” is LOG_BACKUP, you need to truncate and then shrink the log to reclaim the space. For example, this log file needs truncating and then shrinking to reclaim the space:

If the “% Space Free in Log” is very large and the recovery model is FULL, then you just need to shrink the transaction log to reclaim the space. For example, after backing up the log you can see the “% Space Free in Log” has increased, so this log file just needs shrinking.

Truncating the log

If after running the script above you have determined you need to truncate the log file before shrinking it, you can perform a log backup to do this using the command below:

BACKUP LOG [DBName]
TO DISK = N'C:\<Location>\<FileName>' WITH NOFORMAT, NOINIT,
NAME = N'DBName-Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

If you re-run the script you should notice the “% Space Free in Log” has increased, as the log has been truncated, and you can now move on to shrinking the log to reclaim that free space.

However, what if the log has grown so large you don’t have enough disk space to perform a transaction log backup? How can you truncate the log? In this situation I try the following:

  1. Backup the transaction log to NUL.

This discards all the data as the backup is taken and therefore doesn’t use up disk space creating the backup (basically backing up to nothing). The syntax for this is below:

BACKUP LOG <DBName> TO DISK = 'NUL';

After doing this you can shrink the log. You’ll also need to perform a FULL backup immediately afterwards, because backing up to NUL breaks the log backup chain.
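For example (the path and file name are just placeholders):

-- Take a full backup straight after the NUL log backup so you have a valid restore point again
BACKUP DATABASE [DBName]
TO DISK = N'C:\<Location>\DBName_Full.bak' WITH INIT, STATS = 10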

  2. Change the recovery model of the database to SIMPLE and back to FULL.

This is a little more drastic and should be a last resort. It breaks the log chain, so after shrinking the log file it’s imperative to take a FULL backup to restart the chain. You also need to ensure no features are in use that rely on the log, such as replication or mirroring. You can change the recovery model using the statements below:

ALTER DATABASE <DBName> SET RECOVERY SIMPLE;
ALTER DATABASE <DBName> SET RECOVERY FULL;   -- switch back once the log file has been shrunk

IMPORTANT: Please note after backing up the log file it’s wise to keep the transaction log backup file handy in case you ever need to restore it.

Shrinking the log file

Once you have truncated the transaction log, you can shrink the log file to reclaim the space on disk. One thing to note is not to shrink the log too much. If you do, it’s inevitable the log file will have to grow again when a large operation occurs, e.g. a rebuild of an index on a very large table, which has to be logged in the transaction log. I tend to shrink the log to about 10% of the size of the database.

To shrink the log you do the following:

DBCC SHRINKFILE ('<TransactionLogLogicalFileName>' , 1024);

This example will shrink the log file to 1 GB (the target size is specified in MB).

You can find out the “Logical File Name” of the transaction log file using the statement below against the database in question:

select name as [LogicalFileName] 
from sys.database_files 
where type_desc = 'LOG'

As an example, if I run the following shrink operation against my example database it shrinks the log to approximately 400 MB:

USE [LargeLogFileDB]
GO
DBCC SHRINKFILE (N'LargeLogFileDB_log' , 400)
GO

If I now run the original script I can see the size of the transaction log has shrunk and therefore the space on disk has been reclaimed:

Going forwards

To prevent this happening again, ensure you take regular LOG backups of all databases in the FULL recovery model. This may also entail checking any third party backup software you use that can back up the log file.

Also, if point in time recovery isn’t needed, consider changing the database recovery model to SIMPLE so the transaction log auto truncates and you’ll never have to worry about a runaway transaction log again. (I use this for practice databases etc. I often get asked why the logs have grown for those databases, as they are hardly ever used by end users; however, if you have regular index maintenance jobs on those databases, this is all logged in the transaction log, and if the log isn’t backed up it will grow and grow.)

Conclusion

I’m more of a SQL enthusiast than a SQL expert, however these are the general steps I take to resolve the issue of a very large transaction log. There are so many good blog posts online on this subject that I’d highly recommend anyone wanting to learn more about this topic takes a look at them. (Google is your friend here :)).

I hope this post helps other Dynamics ERP System Admins out there that come across large transaction log files.

Thanks for reading!