Sunday, November 19, 2017

Decimate Data with JavaScript's Filter Function

In 2011, the JavaScript standard added several new array functions that greatly simplify working with large datasets.  One of the challenges of working with "big data" comes when you go to chart it: rendering every individual data point can seriously hamper the performance of your chart, increasing render times, locking up the browser for a while, or even crashing it.

As I skimmed through several internet postings describing elaborate methods for decimating data (that is, reducing the resolution of a dataset through the application of some function), I realized that the built-in filter function could do all the work easily.

Consider an array of values:

var a = "4.901712886872069,4.905571030847647,4.909414346738851,4.913242948087607,4.91705694713669,4.92085645484947,4.92464158092928,4.928412433838424,4.932169120816823,4.93591174790032,4.939640419938632,4.94335524061298,4.947056312453386,4.950743736855651,4.954417614098028,4.958078043357581,4.961725122726249,4.965358949226622,4.968979618827425,4.972587226458727,4.976181866026878,4.97976363042917,4.983332611568248,4.986888900366257,4.990432586778738,4.9939637598082856,4.997550054547602,5.001123533723662,5.004684288602818,5.008232409479947,5.011767985692191,5.015291105632452,5.018801856762651,5.022300325626761,5.025786597863601,5.029260758219422,5.032722890560263,5.036173077884096,5.039611402332775,5.043037945203758,5.046452786961651,5.049856007249539,5.053247684900134,5.05662789794673,5.059996723633982,5.063354238428486,5.066700518029206,5.070035637377705,5.07335967066822,5.076672691357564,5.079974772174871,5.083265985131165,5.086546401528796,5.0898160919706985,5.0930751263695155,5.096323573956563,5.099561503290659,5.102788982266802,5.106006078124715,5.109212857457251,5.112469599977525,5.115715770546765,5.1189514375797165,5.122176668829165,5.125391531394446,5.128596091729822,5.131790415652724,5.134974568351865,5.138148614395217,5.141312617737873,5.144466641729777,5.147610749123334,5.150745002080902,5.153869462182165,5.156984190431394,5.160089247264595,5.163184692556541,5.166270585627705,5.169346985251077,5.172413949658882,5.175471536549192,5.178519803092442,5.181558805937842,5.184588601219694,5.187609244563616,5.190620791092663,5.193623295433372,5.1966168117216975,5.199601393608879,5.202577094267201,5.205543966395682,5.208556759340414,5.211560502621755,5.214555250442706,5.21754105652075,5.220517974093625,5.223486055925034,5.2264453543102425,5.229395921081616,5.232337807614062,5.235271064830402,5.238195743206656,5.241111892777258,5.2440195631401885,5.246918803462037,5.249809662482995,5.252692188521762,5.255566429480405,5.258432432849123,5.261290245710962,5.264139914746451,5.266981486238185,5.269815006075326,5.272640519758055,5.275458072401956,5.278267708742335,5.281069473138485,5.283863409577888,5.286649561680356,5.289427972702121,5.292198685539861,5.295011909539312,5.2978172415063565,5.300614725596722,5.303404405596596,5.306186324926734,5.308960526646518,5.311727053457957,5.314485947709622".split(',');

Now let's say I want to decimate that so a small status chart only ever shows around 10 data points.  A few more or less is fine, as long as I'm not showing ever more data as I append additional batches to this initial set.  The following call to filter uses the modulo operator as a simple reduction function.

a.filter(function (elem, index, array) { return index % Math.round(array.length / 10) === 0; });

...yields a set of 10 data points: one for every index whose remainder, when divided by the dynamic value Math.round(array.length / 10), is zero.  Math.round simply turns the dynamic value into a whole number, which plays well with the modulo operator (%) against the integer indexes of the array.

"4.901712886872069,4.950743736855651,4.997550054547602,5.043037945203758,5.086546401528796,5.128596091729822,5.169346985251077,5.208556759340414,5.246918803462037,5.283863409577888"

Now any charting we may do with this data will continue to tell the story, sacrificing resolution in exchange for performance.
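
If you need this in more than one place, the same idea wraps up nicely as a small helper.  Here is a minimal sketch; the decimate name and targetPoints parameter are my own, not anything built into JavaScript:

function decimate(data, targetPoints) {
    // Keep every Nth element, where N is chosen so roughly targetPoints survive.
    var step = Math.max(1, Math.round(data.length / targetPoints));
    return data.filter(function (elem, index) {
        return index % step === 0;
    });
}

var reduced = decimate(a, 10);  // roughly 10 points from the array above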

Thursday, November 3, 2016

New Tools, New Skills

Every so often in my career, I find that I have been using a set of tools for a long enough period of time that they have been surpassed by the market.  For me, this has been Eclipse for Java development, and SVN for code check-in and check-out.  All the cool kids are using IntelliJ and git these days, so with some time off between contracts, I have spent the week getting familiar and functional with these tools.  I wish I had done so sooner.

Even at the Neon release, Eclipse still has some way to go to catch IntelliJ.  While the environments are set up very differently, enough conventions are alike that a few tutorials are all it takes to get up to speed.  IntelliJ does more for the developer and in doing so saves a lot of time.  Eclipse continues the tradition of an open-ended environment made better through many, many plugins.  It doesn't get too opinionated, but that sometimes leaves the way forward on a particular path a little less clear.  IntelliJ, by contrast, takes a slightly more opinionated view of how things should be done, so while there is plenty to learn in moving from one to the other, there isn't as much to learn to become productive.

SVN, a long-time bacon-saver for many developers, myself included, is a centralized repository system.  While it is simple to understand and use, forking and merging code is often not as straightforward as it could be.  You can check out from a particular branch, but merging has always felt like surgery to me.  Git, and specifically the very nice website built around it, GitHub, makes this clearer and more automated.  While the git console, especially paired with certificate-based authentication, can be cryptic and frustrating, using GitHub and the integrated support in IntelliJ IDEA makes it as painless to use as SVN, and much less painful when it comes to merging branches.

Another tool I have been longing to embrace for some years now is JUnit.  I didn't even know what assertive testing was a few years ago.  I've always written my own test code, but never thought much beyond automating calls to my API to make sure things did what I expected.  There is a whole bevy of testing techniques that go well beyond this, and I've picked them up one at a time as client-driven work has allowed.  While I've been trying to get a recent client to let me re-baseline a few projects taken in from the off-shore labs and rebuild them as Test Driven Development projects from the ground up, I have now had the time to do this on my own and find the experience gratifying.  Thinking about your code from a testing perspective puts new mental focus on lean code that follows the DRY and SOLID principles.

This meshes very well with my shift to daily workouts this year.  Self-discipline, I find, suits me, and using these technologies has made that discipline both easier to keep and more meaningful in terms of the consistency and reliability of my code.  I'm looking forward to future projects and contracts, and expect the future to bring more improvements to the tools we love and use.

Sunday, December 20, 2015

Retooling with Abstraction

Below you will find a presentation I prepared over a year ago for a customer who was contemplating replacing their entire software backbone, moving from one legacy full stack to another.  Their reasons for contemplating this are ultimately not material to the discussion, except that they provided the catalyst for my own thought process.  I could see momentum building behind the "do something" mantra and sought to help them avoid rushing off the cliff in a way that would result in huge disruptions.

My slide deck was intended as an introduction to both the MVC design pattern and software abstraction as a concept, presented at a time when they would most benefit from adopting the sort of approach it represented.  Ultimately they took a path similar to the one I outlined, but with a critical difference: they moved to more SaaS systems rather than building their own.

If you have questions, please feel free to ask me.  I've helped large companies do this sort of transition on many platforms - the tools themselves are not as important as the way they are employed.

The Case for Retooling and Abstraction

Feel free to share this if it makes the conversation with your management or stakeholders easier.  I only ask that you share it from the source link so that I have some idea as to how widely it is used.

Monday, December 14, 2015

Free File Recovery Tool: PhotoRec


CG Security has a free tool for photo and file recovery called PhotoRec.

http://www.cgsecurity.org/wiki/PhotoRec

You can get it for most file systems as a stand-alone tool that runs in a command-line-style interface.

I used it this weekend to recover a friend's files that went missing after his upgrade to Windows 10.  So, a note about that - if you have any data outside your home path, usually C:\Users\yourname, you may lose it when you upgrade to Windows 10 unless you take steps to back it up.

In my friend's case, the files were still on his computer, by the grace of God, and not actually overwritten with new data by the installation of Windows.

I'll guess that if you're reading this post, you've lost some files (or, more likely, have a friend who has), so let's review a couple of things everyone should do before the walk-through.

1. Always back up your data.  Put it on another computer, a server, or a cloud service such as OneDrive, Dropbox, or Google Drive, or set up a home backup server.

2. Use an automated tool to make your file backups, wait for it..., automatic.  There's nothing worse than having spent money or time on a backup solution that provides no benefit because you forgot to use it.

3. On Windows, as with Linux, user files should always be kept in the user's home directory.

So, here's a quick how-to for PhotoRec, in this case on a Windows laptop with two USB ports.

1. Download the tool to a USB drive from which you will run it.

2. Scrounge up a couple of extra USB drives for the recovered data.

3. Boot the system and plug in the USB drives.

4. Run PhotoRec, follow the default selections for the most part, then navigate to your second USB drive and use it as the target for recovering your data.

It's pretty much that simple.  I do suggest investigating the file filters before you run the tool, though.  It will find everything that hasn't been totally obliterated, so narrowing the search filter will save you time and keep huge numbers of false positives from winding up in the recovery folders the tool creates.  You will still need to review the recovered data and cherry-pick the things you wanted.

We were looking for office documents in our example and found a surprising number of things that were not exactly office documents in our recovered-files folder.  You can easily spot the valid files by looking for complete file names and turning on the Authors column in Windows Explorer.  You can also use Windows Search on the recovery-target USB drive with advanced options to search for text within the recovered files.

While this process is pretty easy for a technically capable person, it does require some experience to pull off without making matters worse. If you need a hand, leave a comment - I'm happy to help for a reasonable fee.

Tuesday, November 10, 2015

Refactoring During Requirements Analysis

Refactoring is the process of reviewing computer code to look for improvements.  Generally these improvements identify redundancies and replace repeated code with general-purpose (or special-purpose) code, relocate variables for easier maintenance, follow from performance reviews and the resulting tweaks, and so on.  Given a proper project life cycle, this should be an ongoing effort.  But there is a not-so-obvious time, before the writing of code even commences, when the refactoring mindset is highly applicable.

I'll use a recent case by way of illustration.  I spent the better part of last November through June writing code to handle the transformation of payroll data from one system, through what was effectively an in-memory pivot table, out to another format, and then uploading the resulting data to a remote web service.  The rules for pivoting the data were different for almost every state.  In our case, I wound up creating a base class for the timesheet workflow and then subclasses to handle each specific location's special rules.  In all, we identified three requisite subclasses to effectively handle every state in the US.
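
To make that shape concrete, here is a minimal sketch of the pattern in JavaScript.  The actual project was not written this way, and the names (TimesheetWorkflow, CaliforniaWorkflow, pivot, applyStateRules) are mine, purely for illustration:

function TimesheetWorkflow() {}

// The base class owns the shared steps: pivot the raw rows, then apply rules.
TimesheetWorkflow.prototype.process = function (rows) {
    var pivoted = this.pivot(rows);
    return this.applyStateRules(pivoted);
};

TimesheetWorkflow.prototype.pivot = function (rows) {
    // The common in-memory pivot logic shared by every state would live here.
    return rows;
};

TimesheetWorkflow.prototype.applyStateRules = function (pivoted) {
    // Default behavior: no location-specific handling.
    return pivoted;
};

// Each location-specific subclass overrides only the rules that differ.
function CaliforniaWorkflow() {}
CaliforniaWorkflow.prototype = Object.create(TimesheetWorkflow.prototype);
CaliforniaWorkflow.prototype.applyStateRules = function (pivoted) {
    // California's special rules would be implemented here.
    return pivoted;
};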

Getting to the point where I could clearly see the overlapping functionality for each workflow took some time.  We went through nine iterations to finally get California's ruleset working correctly, and wound up making changes to the superclass a few times along the way.  But before we even got to that, I started with a look at the customer-supplied flow charts.  They were tweaked by the project manager / business analyst, and then I had a go at them.  We worked back and forth with the business as I found logical black holes and contradictions, and rules were decided as we went.  This "refactoring" of the paper flow chart provided an iteration phase during design that gave us a pretty good, though far from perfect, topographical view of the class structure that would result before I began writing the code.

Semantics being what they are, we discovered during testing that several terms suffered from assumed, and incorrect, definitions.  We thought we knew what the business was saying because they used common terms, but within the context of their business flow, some of those common terms had special meanings.  Had we refactored, or iterated with a view to refinement and correction, during the requirements-gathering process as well as the design process, we might have caught those issues.  Some of them were pretty sneaky, though.  Let that be a cautionary tale: always include an insanely detailed glossary of terms.

So, next time you're faced with a complex task, take it in several passes, make paper models like sketches and flow charts and really question your assumptions and dig into the details.  There's always pressure to rush into the act of producing, whether building, coding, or making drawings.  Resist the headlong rush, take the time to really understand the goals, and go over it again and again until you're sure you've got it.  Even then, you'll be refactoring later.  :-)

Friday, October 30, 2015

Finding out what works - with Fuzzing

Fuzzing is an interesting concept.  From a software-testing perspective, it's a means of finding out where code will break by flooding it with a steady stream of inputs, good, bad, or indifferent, to see if you can break it.  The test input is literally garbage: fuzzy data, purpose-made to be random enough to find the things you didn't think of that might break your code.

I like to use fuzzing for a secondary purpose: to find out what works, not what is unexpectedly broken.  I remember hearing about an undocumented command in a product I was working with.  It took integers as arguments and displayed a different UI element for each integer value passed.  It occurred to me to flood it, using a loop, with a series of numbers well beyond the published range to see what else might be hiding in the product code.  I was rewarded with several hits, some of them very useful.  You might not ever want to build your application around undocumented features, but sometimes they are stable and useful, and it can be fun to show off a bit by being able to leverage them.

My recent example of using PLINK to send commands over TCP to a smart TV presented a potential surface to attack with this method.  I wanted to find out whether, for any of the published four-character command codes, some serially entered four-digit number would combine to make a command that returned additional useful information from the TV.  Unfortunately, my testing struck out, but the simple CMD batch file I came up with is very useful for documenting the results.



@echo off
echo Fuzzing Interface...
set cmdlist=TVNM MNRD SWVN IPPV WIDE
setlocal ENABLEDELAYEDEXPANSION

for %%a in (%cmdlist%) do (
        for /L %%n in (1,1,9999) do (
                REM Build a candidate command such as TVNM1 and write it to cmd.txt
                set "cmd=%%a%%n  "
                echo !cmd! >cmd.txt
                echo !cmd!
                REM Send the command to the TV and capture its reply
                plink 169.254.253.20 -P 10002 -raw < cmd.txt >results.txt
                timeout 1
                REM Clear r first so an empty reply is not mistaken for the previous one
                set "r="
                set /P r=< results.txt
                if [!r!] EQU [OK] echo !cmd! !r!>> OKcommands.txt
                if [!r!] NEQ [ERR] echo !cmd! !r!>> goodcommands.txt
                if [!r!] EQU [] echo no response
        )
)
echo Done.
@echo on

First, this generates a command file, called cmd.txt, by combining a command sequence with an integer.  So our first command would be TVNM1, the next TVNM2, and so on up to TVNM9999, the maximum of the input range.

Next, the code uses this command file as input to PLINK, and sends the response from PLINK to results.txt.  

Then it reads the results file into a variable, r, which is evaluated to decide whether the command that was sent gets appended to the file of commands that simply return OK, or to the file of commands that return something other than ERR.

It's not perfect; there are some gaps in the code that I decided not to take the time to close, because we had moved past the point in the project where it would provide useful information.  For one, the command text should always be exactly 8 characters plus a carriage return, and this code doesn't trim the resulting command down as the integers grow.  I could also do a better job of separating the useful output into different files, but a glance at goodcommands.txt told me what I wanted to know.

There are lots of ways to use this approach, and it can be employed in almost any system or language.  A Windows CMD batch file is probably one of the simpler, if still powerful, ways to use the method.

Wednesday, October 21, 2015

Using PLINK and Batch Files to Control a Smart TV

Editor's Note
What do you know - some new content already.  I think I've pulled all I'm going to pull out of the old archive.  So much of what remains holds little relevance today that it just doesn't make sense to bring it over.  Going forward, new posts will appear as time and opportunity permit.




I had a customer requirement recently to control an 80" television from a small computer plugged into its HDMI port.  The little computer has one solitary USB port, and the television a single TCP/IP port.  With a USB-to-Ethernet dongle, I was able to connect the computer to the television's network port.

The TV in this case is a Sharp AQUOS... a really nice TV if you have $1400 and want a picture the size of a large window.  Communicating with it proved to be a challenge, but with enough research and trial and error, I figured out what to do (many thanks to the internet and Google Search).

First, it calls for PuTTY, the open-source telnet client, rather than the basic Windows telnet client, because PuTTY provides the ability to specify "raw" as the protocol, which is essential.  Second, I used PLINK, the command-line version from the same developer, instead of PuTTY itself.  From there, I wrote two files for each command: a batch file with the command line, and a command file to send to the TV.

TVON.BAT: 


plink 169.254.11.133 -P 10002 -raw < plinkTVON.txt 


This command connects to the IP address the TV has assigned to its port.  We use what is called a crossover cable, a specially wired network cable, to make the connection without any other network hardware between the TV and the computer.  The TV listens for commands on a configurable port number, which defaults to 10002.  Using < plinkTVON.txt allows us to pass the content of the text file over the connection.

plinkTVON.txt 


POWR1___{crlf} 

The Sharp televisions take 8-character commands with a carriage return at the end.  The text file lets me put a carriage return where I show {crlf} above, which "executes" the command on the television.  Underscores are spaces.

Now, one thing that puzzled me: I couldn't actually turn the TV back on once I had sent a POWR0___ command.  As it turns out, you have to send RSPW1___ to tell the TV to remain in a standby state capable of accepting the POWR1___ command.  The documentation does not make this clear, and it was by the grace of God and a really random internet posting that I learned about that little bit of undocumented goodness.

Now we can schedule the television to turn on and off to save power when it's not being used to show videos and slide shows at the customer's place of business.  This makes digital signage and internal communications a lot more configurable and powerful.  There is more to the overall solution, which I may have time to discuss later, but this was the difficult part I thought worth sharing.