Fill factor

Fill factor tells how much space on a page can be used for storing records.
It applies only to indexes.
100% (and 0%, which means the same thing) = use the whole page = 8 KB.
A fill factor other than 100% means you reserve some free space on each page.

The fill factor is applied when you create, rebuild, or reorganize an index.
With CREATE and REBUILD you can set a new value for the fill factor.
With REORGANIZE you cannot set a new value; the existing/default value is used.
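For example (table and index names are illustrative), setting the fill factor at create and rebuild time:

```sql
-- set the fill factor when creating the index (illustrative names)
CREATE INDEX idx_example ON dbo.someTable (id)
WITH (FILLFACTOR = 80);

-- change it at rebuild time; the value is stored in the index metadata
ALTER INDEX idx_example ON dbo.someTable
REBUILD WITH (FILLFACTOR = 80);

-- REORGANIZE reuses the stored fill factor; you cannot pass a new one here
ALTER INDEX idx_example ON dbo.someTable REORGANIZE;
```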

 

Depending on your scenario, you can use a 100% or a lower fill factor:

  • if inserts happen only at the end of the index (e.g. an identity clustered column) and no varchar columns are updated = use a 100% fill factor
  • if inserts are random (e.g. a GUID as the clustered key column) or variable-length columns are updated = use a lower fill factor. Monitor it (once per week, per 2 weeks, or per month) to see if you still have a high number of page splits. If yes, reduce the fill factor by another 5% (or 10%) and re-monitor.

 

Combined with such a scenario, a lower fill factor gives you:

PROS:

  • fewer page splits

CONS:

  • additional storage
  • additional pages to be read

Example:

We are using two tables with the same fields, but one will have fill factor = 100 and the other fill factor = 60.

drop table if exists dbo.fillFact_Full;
create table dbo.fillFact_Full
(
id int not null,
myChar char(985) not null,
myVarChar varchar(500) null
)
GO
CREATE UNIQUE CLUSTERED INDEX idx_fillFact_Full_id ON dbo.fillFact_Full
(id);
GO
drop table if exists dbo.fillFact_60;
create table dbo.fillFact_60
(
id int not null,
myChar char(985) not null,
myVarChar varchar(500) null
)
GO
CREATE UNIQUE CLUSTERED INDEX idx_fillFact_60_id ON dbo.fillFact_60
(id);
GO

Initial setup: insert 20 records; at this point the two tables are identical.
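The initial load could be sketched like this (the values are illustrative; any 20 identical rows will do):

```sql
-- insert ids 1..20 into the first table, then copy them to the second,
-- so both tables start with identical content
;WITH n AS
(
    SELECT TOP (20) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
    FROM sys.objects
)
INSERT INTO dbo.fillFact_Full (id, myChar, myVarChar)
SELECT id, REPLICATE('x', 985), NULL
FROM n;

INSERT INTO dbo.fillFact_60 (id, myChar, myVarChar)
SELECT id, myChar, myVarChar
FROM dbo.fillFact_Full;
```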

fillFactor_init_setup

Now rebuild with fill factor = 60 for table 2, “fillFact_60”, and with fill factor = 100 for table “fillFact_Full”, and look at the space and pages used:
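The rebuild step can be sketched like this (index names taken from the setup script above):

```sql
-- apply the two fill factors via an index rebuild
ALTER INDEX idx_fillFact_Full_id ON dbo.fillFact_Full
REBUILD WITH (FILLFACTOR = 100);

ALTER INDEX idx_fillFact_60_id ON dbo.fillFact_60
REBUILD WITH (FILLFACTOR = 60);

-- compare reserved space and data pages
EXEC sp_spaceused 'dbo.fillFact_Full';
EXEC sp_spaceused 'dbo.fillFact_60';

-- the fill factor as stored in the metadata
SELECT OBJECT_NAME(object_id) AS tableName, name AS indexName, fill_factor
FROM sys.indexes
WHERE object_id IN (OBJECT_ID('dbo.fillFact_Full'), OBJECT_ID('dbo.fillFact_60'));
```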

fillFactor_after_rebuild

 

fillFactor_after_rebuild_spaceUsed.jpg

So, definitely, the space used is much higher for the table with fill factor = 60.

What about pages read when doing a select (more than 1 record, e.g. select *)?

fillFactor_pagesRead.jpg

Again, more logical reads for the table with fill factor = 60.

 

Let’s look at the location of each record in the pages; for this we will use the undocumented function %%physloc%% and sys.fn_PhysLocCracker(%%physloc%%).

fillFactor_physicalLocation.jpg

 

And now let’s update some records, or insert into a particular page, based on the clustered key value.

--update some records with id = 9, 12
--for id = 9, 12 : page = 392
update dbo.fillFact_Full
set myVarChar = LEFT(REPLICATE('update:' + CAST(id AS VARCHAR(50)), 500), 500)
where id IN (9, 12)
--for id = 9, 12 : page = 400
update dbo.fillFact_60
set myVarChar = LEFT(REPLICATE('update:' + CAST(id AS VARCHAR(50)), 500), 500)
where id IN (9, 12)
--see the location of each record:
select flc.*, f.* from dbo.fillFact_Full as f CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) as flc
select flc.*, f.* from dbo.fillFact_60 as f CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) as flc

 

fillFactor_afterUpdate.jpg

We can see that in the table with fill factor = 100, the record with id = 9 was able to store the additional data in place, but the record with id = 12 was not: it needed a new page, page id = 465. Additionally, half of the records, from id = 12 up to 24, were moved to it.

For the table with fill factor = 60, both updates fit in the same page, page id = 400.

 

For inserts it is the same: adding some records has the same effect as the updates above.

fillFactor_afterInsert.jpg

code can be found here

S

Page Split – monitor

To monitor page splits, we can use:

  • Extended Events: the sqlserver.transaction_log event
  • reading the transaction log with fn_dblog: the LOP_DELETE_SPLIT operation marks the bad page splits
  • the performance counter 'Page Splits/sec': note that it counts good splits + bad splits

The fn_dblog check looks like this:
SELECT
     COUNT(*) AS countOfSplits
    , AllocUnitName AS idxName
FROM
     fn_dblog(NULL, NULL)
WHERE
     Operation = 'LOP_DELETE_SPLIT'
GROUP BY
     AllocUnitName
ORDER BY
    countOfSplits DESC
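For the Extended Events bullet, a minimal session might look like this (session and target names are examples; 11 is the map value for LOP_DELETE_SPLIT on recent versions — verify it in sys.dm_xe_map_values):

```sql
-- count bad page splits (LOP_DELETE_SPLIT) per database_id
CREATE EVENT SESSION [track_bad_page_splits] ON SERVER
ADD EVENT sqlserver.transaction_log
(
    WHERE operation = 11  -- 11 = LOP_DELETE_SPLIT (check sys.dm_xe_map_values)
)
ADD TARGET package0.histogram
(
    SET filtering_event_name = N'sqlserver.transaction_log',
        source              = N'database_id',
        source_type         = (0)  -- 0 = bucket on an event column
);

ALTER EVENT SESSION [track_bad_page_splits] ON SERVER STATE = START;
```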
S

 

 

Page Split

A page split occurs when there is not enough space in a particular data page to insert/update a specific record. There are 3 patterns that generate page splits:

  • end/sequential/identity inserts = good page splits = when we insert at the end of the table. Example: imagine you have an identity field which is also the PK and clustered index, so every new record gets id + 1
  • mid-table inserts = bad page splits
  • updates = bad page splits = when we update a record on a page that does not have enough space to accommodate the extra bytes. It needs extra space and voilà, page split

A bad page split is bad because additional information needs to be saved in the transaction log:

  • allocating a new page
  • moving half of the records to the new page (insert into the new page and delete from the old page = mark them as deleted)
  • updating the link pointers in the pages

 

Here is a sample of it:

We have a simple table with an id field as the clustered key:

--create a table with an id and some string, so that 2 records fit in a page
drop table if exists dbo.pageSplit;
create table dbo.pageSplit
(
id int not null,
myChar char(3500) not null,
constraint pk_pageSplit_id PRIMARY KEY CLUSTERED(id)
)
go

Now, let’s start by inserting one record with id = 3, and see the log data generated.

begin transaction
insert into dbo.pageSplit(id,myChar)
values(3,REPLICATE('3',3500));

select database_transaction_log_bytes_used from sys.dm_tran_database_transactions where database_id = DB_ID('db_workspace')
--3796
commit transaction

 

Log data generated

pageSplit_Insert_Value_3

 

Next step, insert value 2:

begin transaction
insert into dbo.pageSplit(id,myChar)
values(2,REPLICATE('2',3500));

select database_transaction_log_bytes_used from sys.dm_tran_database_transactions where database_id = DB_ID('db_workspace')
--3756
commit transaction

pageSplit_Insert_Value_2

pageSplit_Insert_Value_2_after

 

Step 3, insert a record with value 1.

begin transaction
insert into dbo.pageSplit(id,myChar)
values(1,REPLICATE('1',3500));

select database_transaction_log_bytes_used from sys.dm_tran_database_transactions where database_id = DB_ID('db_workspace')
--5784
commit transaction

The log data generated is 5784 bytes, about 54% more than the previous insert, because of the page split. Wow…

You can monitor it with performance counters:

SQLServer:AccessMethods:Page Splits/Sec

Or from dmvs:

select cntr_value
from sys.dm_os_performance_counters
where counter_name = 'Page Splits/sec'

Code here

S

BI dev position

Working at one company…

I’m more of a database developer, but at some point I received an offer for a BI dev position. The financial part was great, and the location (close to me, a few minutes’ walk)
was great, so I accepted.

It’s a consultancy company, so we were working for an external client. The main project was divided into 5-6 sub-projects. I was allocated to one of them, along with 2-3 devs, 2 QAs and 1 BA, plus 1 Project Manager and 1 Line Manager belonging to the whole project.

The main objective of the project was to create a data-movement SSIS project: from the client’s OLTP/DWH to a continental DWH, and also to update the flow from the continental DWH to the global DWH.

Starting the development, I encountered my first warning: the mapping document, which described the fields to take and transform. Every day we got a new mapping document, and there were days when we received more than one. I started saving them to disk, and after 20+ files I stopped saving them: too many. And there was nothing to show what had changed from one version to another, which led to forgotten or skipped changes. You know, scanning every field (200+) by hand, checking it…

The second warning was the way we accessed the client’s resources. We connected through a VDI. No problem with that way of working, but getting into the system was problematic: every week our accounts or VDIs were inaccessible. And once in, the VDI was very slow. If you opened SSMS and Visual Studio and then tried to open an Excel file, you had to wait… Very frustrating!!!

Another warning: we had a table with no primary key, and we needed some field(s) for the incremental load, but nothing fit. So we asked the client to provide a primary key for that table (or a combination of fields that would guarantee uniqueness). It took a month or more just to add that existing field to the exported table; the field was present in the solution, but not in the table.
Then we got under time pressure, and the LM told us we needed to stay overtime and work weekends…??!! No need: the SSIS solution was ready and just needed small changes to use the new primary key.

Another warning: the SSIS solution created some CSV files and placed them in a default location. In the meeting we had agreed that the location should be in the same data center as the supplier DWH, and would be a default location. That is how all 3 developers understood and remembered it, but only the PM remembered that we should push the files to the main data center. And his reaction was like nothing I had ever encountered, under any human standard (“I feel like hitting someone”, and he looked at me, because I had developed most of it; the export-location part was my co-worker’s work, which I fully agreed with). The change itself took about 5 minutes: just create a variable and expose it in the config file so it would be configurable.

Another warning: I don’t know how it is in other places, but after every important meeting with the client, the PM should write up the minutes of the meeting and send them to all participants (to be sure we all understood the same thing). Here that was never done.

At some point I was asked to present to the client the technical specification of a sub-project I hadn’t worked on, because the main developers were off the project or had left the company. I refused, and a colleague presented it.

The main project should have ended sometime in the summer, but at some point it was extended until the end of August.

Once the project was close to its end, I was out of it, on the bench waiting for another project. In the meantime I started searching for a new job by myself, outside the company, and started talking with another company.

During this, I was brought back onto the project (I didn’t know why at that point; because of my skills? I don’t think so) and the new final term was now the end of October. No problem, just endure it, the end is close.
Later I found out that the client had extended the final term by two more months (until December) and additionally wanted only 1 dev and 1 QA, no PM, and the LM only for 1 day per week. The project was supposed to enter a maintenance state, where you would only change a formula or some field mapping. And I was the one to remain on the project. That is why I was brought back: the former dev and the PM were moving to a new project…
Then I found out that the maintenance was not only maintenance: they also wanted development.

The project was made of 5-6 sub-projects, of which I had worked on only 2-3. The others I didn’t know much about.

I kept talking with the other company, but the recruitment process kept being delayed.
At some point I had to decide fast:

– I didn’t yet have an answer from the other company (good or bad)
– the other dev would start a new project, leaving me on the old one
– if I got a positive answer while my colleague was starting the new project, no one would be available to continue the present project…
– knowing that the final term had already been prolonged 4 times, I bet this end term would also change.

Taking this into account, I rolled the dice (hoping for a positive answer) and handed in my resignation. Under these conditions, my colleague remained on the main project.
After 3 weeks I finally received a notification from the second company: I wasn’t chosen for the position, so my bet was wrong.

A longer Christmas vacation. Until Easter?

Did I do right or not?

 

 

S

Transaction Log information

To get information about the transaction log, we usually use:

DBCC SQLPERF(LOGSPACE) = how much log space is used

DBCC LOGINFO = information about VLFs (virtual log files)

 

But there are new DMVs that can do the same things, and maybe more:

for DBCC SQLPERF we can use sys.dm_db_log_space_usage
for DBCC LOGINFO we can use sys.dm_db_log_info
sys.dm_db_log_stats can also be used
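A quick sketch of the new DMVs (run in the context of the database you care about):

```sql
-- DBCC SQLPERF(LOGSPACE) replacement: log size and usage for the current db
SELECT total_log_size_in_bytes / 1048576.0 AS totalLogSizeMB,
       used_log_space_in_bytes / 1048576.0 AS usedLogSpaceMB,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

-- DBCC LOGINFO replacement: one row per virtual log file
SELECT file_id, vlf_size_mb, vlf_active, vlf_status
FROM sys.dm_db_log_info(DB_ID());
```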

 

 

S

Sql 2016 Insert Select Paralel

Select … into newTable could be a parallel operation, but Insert … Select could NOT.

 

Starting with SQL Server 2016, Insert … Select can be parallel.

There are some additional conditions:

  • use TABLOCK:

 insert into aTable with (TABLOCK)

select ….

  • compatibility level 130
  • heap or clustered columnstore target table (not a row-store clustered-index table)
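Putting the conditions together, a minimal sketch (table and column names are made up):

```sql
-- target is a heap (no clustered index), so the parallel insert can apply
drop table if exists dbo.salesCopy;
create table dbo.salesCopy
(
    id     int   not null,
    amount money not null
);

-- TABLOCK on the target is required for the insert to go parallel
insert into dbo.salesCopy with (TABLOCK) (id, amount)
select id, amount
from dbo.sales;  -- assumes an existing dbo.sales source table
```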

 

S

Sql server 2017

New in SQL Server 2017:

  • the possibility to process graph data. See here

 

Graph data processing gives us the ability to create two new types of tables:

node – holds the entities and can have properties
edge – holds the relations between nodes/entities.

There are some hidden (pseudo) columns:

$node_id
$edge_id
$from_id
$to_id

 
It also adds a new clause: MATCH.
Here is an example from the Microsoft blog:

SELECT Person2.Name
FROM Person Person1, Friends, Person Person2
WHERE MATCH(Person1-(Friends)->Person2)
AND Person1.Name = 'John';
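To run the MATCH query above, the node and edge tables could be created and populated like this (the property columns are illustrative):

```sql
-- node table: entities
CREATE TABLE Person
(
    ID   INT PRIMARY KEY,
    Name VARCHAR(100)
) AS NODE;

-- edge table: relations between nodes
CREATE TABLE Friends
(
    StartDate DATE
) AS EDGE;

INSERT INTO Person (ID, Name) VALUES (1, 'John'), (2, 'Mary');

-- connect John -> Mary; $from_id / $to_id take the nodes' $node_id values
INSERT INTO Friends VALUES
(
    (SELECT $node_id FROM Person WHERE ID = 1),
    (SELECT $node_id FROM Person WHERE ID = 2),
    '2017-01-01'
);
```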

see it here

 

Good stuff, Microsoft!

 

S