icd9x.hlp

.-
Help for ^icd9x^                                                      Bill Rising
.-

Includes and Excludes Observations based on ICD-9 Codes
-------------------------------------------------------

    ^icd9x^ [^if^ exp] [^in^ range] [^,^
        ^ind^iag(code list) ^exd^iag(code list) ^di^agvar(varlist)
        ^inp^roc(code list) ^exp^roc(code list) ^pr^ocvar(varlist)
        ^k^eep(varlist) ^d^rop(varlist) ^which^
        ^sub^set ^replace^
        ^nocheck^ ]


Description
-----------

^icd9x^ is used for taking subsets of a data set by including or excluding
various ICD-9 codes. The user can choose between actually taking a subset of
the original data set and just marking those cases which should be included
or excluded. The former is conceptually easier; the latter allows more
powerful cascading of conditions.


Options
-------

^indiag^ specifies diagnosis codes which should specifically be included,
    ^exdiag^ specifies diagnosis codes which should specifically be excluded,
    ^inproc^ specifies procedure codes which should specifically be included,
    and ^exproc^ specifies procedure codes which should specifically be
    excluded. See ^Codelists^ below for how to specify code lists.

^diagvar^ and ^procvar^ specify the variables which should be searched for the
    pertinent codes. These can be given in standard Stata varlist form, i.e.
    abbreviations, ranges and wildcards are allowed. Note: the ^diagvar^
    variables must be of type str5 and the ^procvar^ variables of type str4,
    which means that the codes are stored without decimal points (diagnosis
    250.00 as "25000", procedure 51.23 as "5123").

^subset^ tells the command to actually take a subset of the data set in memory.

^replace^ tells the command to do the subsetting regardless of the `clean'
    status of the file, i.e. regardless of whether the data in memory have
    been saved since the last change.

^keep^ and ^drop^ give the names of variables which will be used to mark the
    observations which are to be included/excluded. These can be used
    whether or not the ^subset^ option is specified.
    If ^subset^ is not specified and the variables do not exist, they will be
    created. The final selection consists of the observations for which
    keep==1 and drop==0 (i.e. keep&!drop). If only inclusions are used, only
    ^keep^ need be specified; if only exclusions are used, only ^drop^ need be
    specified.

^which^ specifies that the marking variable will be set to the number of the
    first variable in the ^procvar^/^diagvar^ varlist in which an appropriate
    code is found. This is useful, for example, for finding ^which^ surgical
    procedure was the one of interest.

^nocheck^ tells the program not to check whether the codelists contain valid
    ICD-9 codes. This speeds up the program slightly, but is really only
    useful when calling ^icd9x^ from within another .ado file.


Codelists
---------

There is a great deal of flexibility in the lists of ICD-9 codes which can be
specified in the codelists. A codelist can be a list of single codes, ranges
of codes, or a mixture of the two.

^Single codes^ are specified by giving the full ICD-9 code when the code is
specific, or the leading digits if the code should include all its
sub-codes. Thus specifying a diagnosis code of

     2      will mark all the codes from 200 to 299.99
     25     will mark all the codes from 250 to 259.99
     250    will mark all the codes from 250 to 250.99
     250.0  will mark all the codes from 250.0 to 250.09
     250.00 will mark 250.00 only

Be sure to specify leading zeros for codes below 100 (e.g. 042, not 42)!
E- and V-codes work in an identical fashion.

^Code Ranges^ are specified by giving the starting and ending codes of a
range. Codes are not filled out for ranges as they are for single codes, so
specifying 250-250.9 will not include the codes from 250.91 to 250.99.
Ranges may also be specified for E- and V-codes.

Case is unimportant for the E- and V-codes.
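
The single-code and range rules above can be sketched in a few lines of
Python. This is a hypothetical re-implementation for illustration only, not
the code in icd9x.ado, and the function names (split_letter, stored_value,
typed_value, matches) are made up. It assumes the codes are stored without
decimal points, as the str5/str4 requirement under ^diagvar^ and ^procvar^
implies, with the implied decimal point after three digits for numeric and
E-codes and after two digits for V-codes and procedure codes. Single codes are
matched as prefixes of the stored code; range end points are compared
numerically and are not filled out.

```python
# Hypothetical illustration of the codelist rules (not the icd9x.ado source).
# Assumptions: stored codes have no decimal points ("25091" = 250.91,
# "E9701" = E970.1, "V3000" = V30.00, procedure "5123" = 51.23); the implied
# decimal point follows 3 digits for numeric and E-codes, 2 for V-codes and
# 2 for procedure codes.
from decimal import Decimal

DIAG_INT_DIGITS = {"": 3, "E": 3, "V": 2}


def split_letter(code):
    """Upper-case a code and split off a leading E or V, if any."""
    s = code.strip().upper()
    letter = s[0] if s[:1] in ("E", "V") else ""
    return letter, s[len(letter):]


def stored_value(stored, proc=False):
    """Turn a stored code into (letter, number): '25091' -> ('', 250.91)."""
    letter, digits = split_letter(stored)
    n = 2 if proc else DIAG_INT_DIGITS[letter]
    if len(digits) > n:
        digits = digits[:n] + "." + digits[n:]
    return letter, Decimal(digits)


def typed_value(code):
    """Turn a code as typed in a codelist into (letter, number): 'e970' -> ('E', 970)."""
    letter, number = split_letter(code)
    return letter, Decimal(number)


def matches(stored, codelist, proc=False):
    """True if the stored code is covered by any item in the codelist."""
    s = stored.strip().upper()
    if not s:  # missing code
        return False
    for item in codelist.split():
        if "-" in item:
            # Range: end points are NOT filled out, so compare numerically.
            lo_text, hi_text = item.split("-")
            lo, hi = typed_value(lo_text), typed_value(hi_text)
            letter, value = stored_value(s, proc)
            if letter == lo[0] == hi[0] and lo[1] <= value <= hi[1]:
                return True
        elif s.startswith(item.upper().replace(".", "")):
            # Single code: leading digits cover all sub-codes (prefix match).
            return True
    return False


print(matches("25091", "250"))        # True: 250 covers 250.00-250.99
print(matches("25091", "250.0"))      # False: 250.0 covers 250.0-250.09
print(matches("25091", "250-250.9"))  # False: range end is not filled out
print(matches("E9701", "e970-E978"))  # True: case is unimportant
print(matches("04200", "42"))         # False: leading zero required
```

In this sketch 250.91 falls inside the single code 250 but outside the range
250-250.9, mirroring the range example above.
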
Simple Examples---------------All these examples will presume a data set which contains the diagnosis variables ^pdiagcd^, ^sdiagcd1^, ..., ^sdiagcd9^, and surgical codes ^surgcd1^, ..., ^surgcd6^.Suppose that you wished to mark all those cases (using the variable ohboy)which had principal E-codes of E970 to E978 or E990 to E999, specify    ^icd9x, indiag(e970-E978 E990-e999) diagvar(pdiag) keep(ohboy)^Note that capitalization of E or V is unimportant (and forget that E- andV-codes cannot be principal diagnoses).Suppose that you wished to pull out all cases which had any diagnosis of265.5, but no diagnosis of 286.0-286.4 or 286.6-286.9. This could be doneby typing    ^icd9x, indiag(265.5) exdiag(286.0-286.4 286.6-286.9)^        ^diagvar(pdiag sdiag*) subset^Suppose that instead you wished to just mark the cases of interest(instead of subsetting the data set). This could be done by typing    ^icd9x, indiag(265.5) exdiag(286.0-286.4 286.6-286.9)^        ^diagvar(pdiag sdiag*) keep(good) drop(bad)^Assuming that the variables ^good^ and ^bad^ did not yet exist, this wouldform them, setting ^good^ to 1 for those observations which had a diagnosis code of 265.5, and setting ^bad^ to 1 for those observations which had oneof the excluded codes. To form the data set with only these observations,type ^keep if good&!bad^.To put the number of the first surgical code which had procedure 51.23 inthe marking variable, try    ^icd9x, inproc(51.23) procvar(surgcd*) keep(lap) whichAll the 51.23's will be marked, as usual, except that if the 51.23 showedup in surgcd3, the variable ^lap^ would contain a 3 because ^surgcd3^ isthe third variable in the varlist ^surgcd*^.A More Complicated Example--------------------------Note that selections are done by the equivalent of boolean `or's within each call. When using subsetting `and's are easy --- just run the command several times. 

When marking, the keep and drop variables allow the use of boolean `and's,
though in a somewhat roundabout fashion.

Suppose that you wanted all patients who had *both* an ICD-9 procedure of 50
*and* an ICD-9 procedure of 35. Doing this via subsetting is easy:

    ^icd9x, inproc(50) procvar(surg*) subset^
    ^icd9x, inproc(35) procvar(surg*) subset^

Using marking is not as simple. Specifying inproc(50 35) will not work, since
it will mark all observations with an ICD-9 procedure of 50 *or* 35. Instead,
you need to call the program twice, with two intervening steps:

    ^icd9x, inproc(50) procvar(surg*) keep(good)^
    ^gen byte bad = !good^
    ^drop good^
    ^icd9x, inproc(35) procvar(surg*) keep(good)^

The poor souls having both procedures are then kept via ^keep if good&!bad^.


Notes
-----

Yes, Virginia, there *is* a good reason for using marks instead of subsets.
Very simply put: if a series of counts must be made on a large data set, it
is faster to mark and count than to keep reading the large data set from
disk.

Also: the drop variable should be used in place of ^if^ or ^in^ qualifiers.
Instead of ^if bling^, for example, just type

    ^gen byte bad=!bling^
    ^icd9x .... drop(bad)^

This is an unfortunate complication caused by the generalization needed to
`play' with big data sets: using either ^if^ or ^in^ will fill the
observations which don't satisfy the ^if^ or ^in^ condition with missing
values.


Author
------

Bill Rising (brising@@jhsph.edu)
Dept. of Biostatistics
Johns Hopkins University
615 N. Wolfe St.
Baltimore, MD 21205-2179