Thursday, April 23, 2015

This is an expanded explanation of Incrementation, in relation to a discussion on r/excel.

Here is the post I made, for reference:




"Incrementation.

I have been the bearer of a solution many times because of it. For example, I had a data management job where I need to put prices into a platform. The vendor, however, is a bag of dicks and sent a horrid .csv file. Each item has 4 to 9 rows. 3 of them standard for each item, then the rest are description fragments up to 6 rows long.

This file takes 3 days for the average person to go through and manually merge all of the descriptions.

I jump in there and with a few extra columns make it to minutes of work.
Since there is a blank row in between each item, I can make to the right (lets say F):

=IF(E2="",0,F1+1) [instead of having a name in the header row, the formula needs to be kick started with a 0]

This creates a column that counts off how long each item is.

Then to the right of that:

=IF(F2=0,"",IF(F3>F2,"",F2))

Dont know if this is necessary, but makes it easier for me to navigate and interpret, blanks out everything but the final rows in the items. Then I can make a big nasty nested IF that concatenates based on the number being reported. Something like (in, lets say H5):

=IF(G5="","",IF(G5=3,C5,IF(G5=4,CONCATENATE(C4,". ",C5),IF(G5=5,CONCATENATE(C3,". ",C4,". ",C5)...)

Boom. Copy column H, paste elsewhere, remove blank rows, you have all of your merged descriptions in about 7 minutes time.

Learning how to impose incrementation into a dataset to tick off an attribute is a powerful trick that empowers you to create datasets that allow you to manipulate the data in specific and useful ways."




I'll start by building up an example file:



This is the general idea. In this file, each item has one row that is always there. Then the file goes on to include additional rows for fragments of the description. It almost perfectly copies the file that I worked with that vendors would send to me, and that I then would need to reformat to upload into the platform I maintained. Just like this, but think in the neighborhood of 7500 items per file.

So, the first step:

 =IF(E2="",0,F1+1) [instead of having a name in the header row, the formula needs to be kick started with a 0]




Removing the space to make the formula functional, and then autofilling:



Then on to the next column:

=IF(F2=0,"",IF(F3>F2,"",F2))

 

Space removed and formula autofilled:



So now that the incrementation is in place, we can write a formula that does what existing workers were spending days on end doing:

=IF(G5="","",IF(G5=3,C5,IF(G5=4,CONCATENATE(C4,". ",C5),IF(G5=5,CONCATENATE(C3,". ",C4,". ",C5)...)



Space removed and formula autofilled:



 Do some clerical work, copy column G, paste it elsewhere, remove blanks, go back to your original spreadsheet, delete column c, delete blank rows, paste your new list of descriptions back into the original spreadsheet, and viola, you just did in 10 minutes what took others days on end of manual copy pasting, or CONCATENATING one item at a time.



 "Incrementation" is not a magical keystroke command that fixes things, but it is a powerful tool for the manipulation of data.

In this instance, the hurtle was that the simple process, merging the descriptions, didn't have a quick fix because there were 3 different variations on "merge the descriptions" (Merge 2 rows of descriptions, merge 3 rows of descriptions, and merge 4 rows of descriptions). A couple simple incrementation formulas created a key that identified which variation each item fell into, and allowed us to create a formula that would perform the correct function dependent on which variation was present.