I’ve been playing around with my shark data. I know this is not the sort of data you would normally ingest into Splunk, but when I’m testing different visualizations or new apps I like to use data that interests me. The downside is that the data is not always clean. Below I’m going to show you how to clean up dirty mixed-case data directly from search.
Dirty Data
In this case I have a field whose values are in mixed case, so the same value comes back in three different versions: SCUBA diving, scuba diving and Scuba Diving.
Splunk search is not case sensitive when matching field values, so one search finds all three versions. A chart, however, treats each spelling as a separate value, and I want Scuba Diving to be one value, not three.
There are a couple of different ways you can do this. One is to configure props.conf; check out the Admin manual for details. The other, shown here, is to fix the values at search time.
Problem – Mixed Case Data
source="attacks.csv" Country=* Year>1966 Activity=*scuba*diving | top limit=20 Activity
BEFORE
A quick eval function converts all of the values to the same case, so they are counted together as one category.
Lowercase
eval Activity=lower(Activity)
Uppercase
eval Activity=upper(Activity)
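Dropped into the original search, the lowercase version can be sketched as follows. It reuses the source and field names from the search above and changes nothing else.

```spl
source="attacks.csv" Country=* Year>1966 Activity=*scuba*diving
| eval Activity=lower(Activity)
| top limit=20 Activity
```

The eval has to come before top so that the values are normalized before they are counted. Swapping lower for upper gives the uppercase variant.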
Propercase
source="attacks.csv" Country=* Year>1966 Activity=*scuba*diving | eval Activity = lower(Activity) | makemv delim=" " Activity | mvexpand Activity | eval A = substr(Activity, 1, 1) | eval B = substr(Activity,2) | eval A = upper(A) | eval Activity = A.B | fields - A, B | mvcombine Activity | eval Activity = mvjoin(Activity, " ") | top limit=20 Activity
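The search above lowercases the value, splits it into words, capitalizes the first letter of each word and joins the words back together. On Splunk 8.0 or later, where the mvmap eval function is available, the same idea can probably be written without mvexpand and mvcombine. Treat this as an untested sketch rather than a drop-in replacement; words is a hypothetical temporary field name that is not part of the original data.

```spl
source="attacks.csv" Country=* Year>1966 Activity=*scuba*diving
| eval words=split(lower(Activity), " ")
| eval Activity=mvjoin(mvmap(words, upper(substr(words, 1, 1)).substr(words, 2)), " ")
| fields - words
| top limit=20 Activity
```

Because this version never expands one event into several, it does not depend on mvcombine putting the words back together.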
BEFORE AND AFTER
Whilst this sort of cleaning activity is more commonly needed for static or periodic data dumps, for example exports from databases, we also see many instances and use cases with machine and real-time data. A recent example was a requirement to clean up web logs generated from an online shop. Most of the cleaning was related to user-created data such as search terms, as well as shop inventory items. I hope this has helped you find and clean up any dirty data issues you may have.
And as per usual, if you have this sort of work that you need help with, the App Assembly is very happy to offer consulting services for all your Splunk needs.
One Comment on “Cleaning Dirty Data- Splunking JAWS”
It works quite well for me