-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnwchem101_v1.tex
330 lines (255 loc) · 16.2 KB
/
nwchem101_v1.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
\documentclass[letterpaper,12pt]{article}
\title{NWChem 101: a tutorial for experienced quantum chemists
\footnote{
Copyright (C) 2009, Jeff R. Hammond. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. See \texttt{<http://www.gnu.org/copyleft/fdl.html>} for the full text of this license.
}
}
\author{Jeff R. Hammond \\
Leadership Computing Facility \\
Argonne National Laboratory \\
Argonne, IL 60439 \\
\texttt{[email protected]}}
\date{Last edited: \today}
\begin{document}
\maketitle
% \begin{abstract}
%
% \end{abstract}
\section{Introduction}\label{sec:Introduction}
The document (101) introduces the molecular quantum chemistry capability of NWChem in pedagogical manner. It should not replace the \textit{NWChem User Manual} (UM), but rather supplement it. The UM is organized by functionality and documents all options while 101 is organized from the most basic features to the most advanced.
\newpage
\subsection{Input File Structure}\label{sec:InputFileStructure}
Figure \ref{fig:WaterSimple} is a basis NWChem input file which introduces a number of key elements. The important input elements are described briefly in Table~\ref{tab:BasicInput}.
The \texttt{echo} directive should be considered essential --- it is the best way to prevent data rot. It is common to modify input files throughout a project, thus the only way to preserve the exact settings used for a given calculation is to retain the input file with the corresponding output. It is tedious to extract the original input settings from the output file, and in some circumstances, these may not be available at all (e.g. when \texttt{print low} or \texttt{print none} are used).
The \texttt{start} directive is also essentially since it controls the names of files that NWChem will create. It is critical that the RTDB file be unique, as running two jobs simultaneously with the same prefix will frequently result in an error or incorrect behavior. Using \texttt{title} is optional and less useful than \texttt{echo} with a commented input file.
Comments can be introduced to input files using the \texttt{\#} symbol. These comments will be echoed along with the rest of the input file. One can comment after a regular input line as demonstrated in the example above. Suggested comments include: (1) a description of where the input geometry came from (e.g. ``this is the B3LYP/6-31G optimized geometry''), (2) the citation output from the EMSL Basis Set Exchange (\texttt{bse.pnl.gov}), when this is the source of the basis set input, and (3) annotation of undocumented options users, system-specific paramters and \texttt{set} directives (described in Section~\ref{sec:SetDirectives}). Other important characters include \verb";" and \verb"\", which terminate and extend input file lines, respectively.
The \texttt{memory}, \texttt{permanent\_dir} and \texttt{scratch\_dir} directives are the most important system-dependent settings one must determine. If left blank, NWChem will be limited to 400 MB of memory and use the current working directory (cwd) for all permanent and scratch files. The memory directive requires great care for some methods, and will be discussed in detail in Section~\ref{sec:ParallelPerformance} and Section~\ref{sec:TCEInputFiles}. The simple rule is that the total memory usage should be no more than 80-90\% of the physical memory, due to the memory used by the kernel, MPI and other system libraries. The directories used for permanent and scratch files are intended to be on shared and local filesystems, respectively. The RTDB and other critical files which are not too large in size must be seen by all nodes, whereas scratch files, such as two-electron integrals, are designed to be accessed locally. As will often be the case, the TCE is an exception to this rule, as some four-index transformation algorithms make intensive use of the shared filesystem, which should not be \texttt{/home} on most supercomputers, which have high-performance shared filesystems for scratch, often instead of local disk.
\begin{figure}
\caption{A very simple input file.}
\label{fig:WaterSimple}
\begin{verbatim}
1 echo
2 start water
3 title "a simple calculation of water"
4 # COMMENT: these options are system specific
5 memory total 2000 mb
6 permanent_dir /home/jeff/scratch/nwchem
7 scratch_dir /home/jeff/scratch/nwchem
8 geometry angstroms # angstroms are the default units
9 O 0.00000000 0.00000000 0.11726921
10 H 0.75698224 0.00000000 -0.46907685
11 H -0.75698224 0.00000000 -0.46907685
12 end
13 basis
14 * library 6-31G*
15 end
16 task scf energy
\end{verbatim}
\end{figure}
\begin{table}[!hp]
\label{tab:BasicInput}
\caption{Important input directives to be used in any type of calculation. Line numbers refer to Figure~\ref{fig:WaterSimple}. See Section~\ref{sec:InputFileStructure} more detail descriptions.}
\begin{tabular}{lcl}
\hline\hline
directive & line & purpose \\
\hline
\texttt{echo} & 1 & prints the input file to output stream \\
\texttt{start} & 2 & initializes the RTDB and other files using given prefix \\
\texttt{title} & 3 & this title will be printed with each module's output \\
\texttt{\#} & 4,8 & stops parsing at the \texttt{\#} symbol \\
\texttt{memory} & 5 & configures memory usage \\
\texttt{permanent\_dir} & 6 & the directory where persistent files will be stored \\
\texttt{scratch\_dir} & 7 & the directory where scratch files will be stored \\
\texttt{geometry} & 8 & input block for the molecular geometry \\
\texttt{basis} & 13 & input block for the basis set \\
\texttt{task} & 16 & NWChem will execute the directive given here \\
\hline\hline
\end{tabular}
\end{table}
\newpage
\subsection{Geometry Input}\label{sec:GeometryInput}
Geometry input may be specified in both cartesian and zmatrix format. Point-group symmetry will be detected automatically but in some cases it must be set manually. First, not all modules implement non-degenerate point-group symmetry, in which case the symmetry must be set to $D_{2h}$ or one its subgroups (see Figure~\ref{fig:NeonGeometry}). Some modules detect non-degenerate point-groups and disable symmetry altogether so it recommended that one set the appropriate symmetry manually for TCE jobs. In other cases, it is desirable to use symmetry to generate a molecule from a fragment. An example is given in Figure~\ref{fig:C60geometry}. Finally, an example of a non-trivial zmatrix geometry input is given in Figure~\ref{fig:HexafluorobenzeneZMatrix}.
The UM is an excellent resource for the geometry input block so we will not elaborate further on this topic.
\begin{figure}
\caption{The geometry input for an atom with a forced reduction in symmetry.}
\label{fig:NeonGeometry}
\begin{verbatim}
1 geometry
2 symmetry d2h
3 Ne 0.0 0.0 0.0
4 end
\end{verbatim}
\end{figure}
\begin{figure}
\caption{Geometry input for C$_{60}$ using symmetry completion.}
\label{fig:C60geometry}
\begin{verbatim}
1 geometry print xyz units bohr
2 symmetry d2h
3 C 0.0000000 6.5776597 1.3147448
4 C 2.2251790 5.7277170 2.6899811
5 C 1.3752363 4.3524807 4.9151601
6 C 4.3524807 4.9151601 1.3752363
7 C 2.6899811 2.2251790 5.7277170
8 C 4.9151601 1.3752363 4.3524807
9 C 5.7277170 2.6899811 2.2251790
10 C 6.5776597 1.3147448 0.0000000
11 C 1.3147448 0.0000000 6.5776597
12 end
\end{verbatim}
\end{figure}
\begin{figure}
\caption{Geometry input for hexafluorobenzene using the zmatrix format.}
\label{fig:HexafluorobenzeneZMatrix}
\begin{verbatim}
1 geometry units bohr
2 zmatrix
3 X
4 C 1 RXC
5 C 1 RXC 2 A60
6 C 1 RXC 3 A60 2 D180
7 C 1 RXC 4 A60 3 D180
8 C 1 RXC 5 A60 4 D180
9 C 1 RXC 6 A60 5 D180
10 F 1 RXF 2 A60 7 D180
11 F 1 RXF 3 A60 2 D180
12 F 1 RXF 4 A60 3 D180
13 F 1 RXF 5 A60 4 D180
14 F 1 RXF 6 A60 5 D180
15 F 1 RXF 7 A60 6 D180
16 constants
17 RXC 2.63428
18 RXF 5.14195
19 A60 60.00
20 D180 180.00
21 end
22 end
\end{verbatim}
\end{figure}
\newpage
\subsection{Basis Set Input}\label{sec:BasisSetInput}
The two best ways to specify the basis set in an input are to (1) use the NWChem basis set library (BASLIB) and (2) copy the input generated by the EMSL Basis Set Exchange (BSE). While the NWChem basis set library is extensive, very recent or highly specialized basis sets must be obtained from the BSE. Common basis sets in the BASLIB include those from the Pople series (e.g. 6-31G*), the Dunning series (e.g. cc-pVDZ to d-aug-cc-pV6Z for many elements) and specialized basis sets such as Roos ANO DZ, Sadlej pVTZ (POL), Ahlrichs, DGauss and DeMon fitting basis sets, and many basis sets for heavy elements (e.g. LANL2DZ and CRENBL ECP).
Figure~\ref{fig:BasisLibrary} demonstrates the various ways to use the BASLIB. One can specify basis sets for each atom or for all the atoms, provided the basis set is defined for all atoms. One can also specify more than one basis set for each atom, which is needed for adding diffuse functions in some cases. The use of quotation marks and/or underscore characters is necessary for some basis sets.
One must carefully consider the type of angular functions used in conjunction with different basis sets. For example, the Pople basis sets were parameterized for use with cartesian d-functions, and the use of spherical functions will lead to incorrect energies compared to other codes. For the Dunning basis sets, one should use spherical angular functions, both to avoid linear dependency but also because the cost of the extra functions for higher angular momentum is prohibitive in correlated calculations. The default is cartesian angular functions, so one must explicitly specify \texttt{spherical} when required.
We will see later that one can name basis sets arbitrarily and swap them as needed as part of advanced techniques for convergence or as part of a multi-task input file.
\begin{figure}
\caption{Specification of the three types of basis sets used in NWChem. One must specify the atomic orbital basis set (\texttt{"ao basis"} is the default if not specified) for all calculations, while \texttt{"ri-mp2 basis"} and \texttt{"cd basis"} are used for RI-MP2 and DFT with density-fitting, respectively.}
\label{fig:BasisLibrary}
\begin{verbatim}
1 basis "ao basis" cartesian
2 C library 6-311G*
3 H library 6-31G
4 end
5 basis "ri-mp2 basis" spherical print
6 * library "cc-pVTZ-RI"
7 * library "aug-cc-pVTZ-RI_diffuse"
8 end
9 basis "cd basis" spherical
10 * library ahlrichs_coulomb_fitting
11 end
\end{verbatim}
\end{figure}
\newpage
\subsection{Task Input}\label{sec:TaskInput}
The full scope of options for the \texttt{task} directive are documented in UM 5.10.1, but we review the basic options here. The format of the \texttt{task} directive is ``\texttt{task <theory> <operation>}'' such as ``\texttt{task scf frequencies}'' or ``\texttt{task mp2 optimize}.'' Analytic gradients and frequencies are not available for all methods, so one should be careful not to request these operations for such methods if numerical derivatives are undesirable.
In multi-task input files (see Section~\ref{sec:MultipleTasks} for details) it is often desirable to continue the calculation despite the failure of one task. This can be effected by adding \texttt{ignore} to the end of the task input line.
\begin{table}[!hp]
\label{tab:TaskOptions}
\caption{A list of operations which are efficient (i.e. do not use numerical derivatives) for popular methods. Notation: E=energy, G=gradients (geometry optimization), H=hessian (vibrational frequencies). The availability of open-shell variants and excited-states are also listed.}
\begin{tabular}{cccc}
\hline\hline
method & efficient operations & references & excited-states \\
\hline
Hartree-Fock & E, G, H & RHF, ROHF, UHF & Yes \\
DFT (GGA/hybrid) & E, G, H & RHF, UHF & Yes \\
DFT (meta-GGA) & E, G & RHF, UHF & No \\
DFT (DHDF) & E & RHF, UHF & No \\
MP2 (semidirect) & E, G & RHF, ROHF, UHF & No \\
MP2 (direct) & E, G & RHF & No \\
RIMP2 & E, G & RHF, UHF & No \\
CCSD/CCSD(T) & E, G & RHF & No \\
TCE & E, G$^{\dag}$ & RHF, ROHF, UHF & Yes \\
\hline\hline
\end{tabular}
$^{\dag}$ The TCE gradients calculations are less efficient than those for the energy alone due to the use of spin-orbital integrals.
\end{table}
\newpage
\section{Basic Input Files}\label{sec:BasisInputFiles}
A number of non-trivial examples of various capability are presented in this section.
\newpage
\subsection{Hartree-Fock (HF)}\label{sec:HartreeFock}
There are a variety of ways to run a Hartree-Fock calculation in NWChem. Both the SCF module \textit{and} the DFT module can perform a Hartree-Fock (HF) calculation. As the solvers in the two modules are different, the distinction between the two modules for HF is non-trivial. Other critical aspects of the HF input are the initial guess and the storage of two-electron integrals. A good initial guess will reduce the number of iterations dramatically, which is critical for when the two-electron integrals must be recomputed each iteration (the direct method). In the semidirect approach, integrals may be cached in memory, on disk, or both, and the storage used for each is controlled by the user.
The various ways to run HF will be presented as extensions to the input file in Figure~\ref{fig:WaterSimple}, so line numbers will start at 16.
\begin{figure}
% \caption{A very simple input file.}
\label{fig:DirectSCF}
\begin{verbatim}
16 scf
17 direct
18 end
19 task scf energy
\end{verbatim}
\end{figure}
\begin{figure}
% \caption{A very simple input file.}
\label{fig:DirectDFT}
\begin{verbatim}
16 dft
17 xc HFexch
18 direct
19 end
20 task dft energy
\end{verbatim}
\end{figure}
\begin{figure}
% \caption{A very simple input file.}
\label{fig:SemidirectSCF1}
\begin{verbatim}
16 scf
17 # use 100MW = 800 MB of memory for caching integrals
18 semidirect memsize 100000000 filesize 0
19 end
20 task scf energy
\end{verbatim}
\end{figure}
\begin{figure}
% \caption{A very simple input file.}
\label{fig:SemidirectSCF2}
\begin{verbatim}
16 scf
17 # use 100MW = 400 MB of memory and 8 GB of disk for caching integrals
18 semidirect memsize 50000000 filesize 1000000000
19 end
20 task scf energy
\end{verbatim}
\end{figure}
\newpage
\subsection{Density-Function-Theory (DFT)}\label{sec:DensityFunctionTheory}
\newpage
\subsection{Perturbation Theory (MP2)}\label{sec:PerturbationTheory}
\newpage
\subsection{Coupled-Cluster Theory (CCSD)}\label{sec:CoupledClusterTheory}
\newpage
\subsection{Multi-Configuration SCF (MCSCF)}\label{sec:MultiConfigurationSCF}
\newpage
\subsection{Computing Molecular Properties}\label{sec:MolecularProperties}
\newpage
\section{Advanced Input Files}\label{sec:AdvancedInputFiles}
\newpage
\subsection{Parallel Submission and Performance Optimization}\label{sec:ParallelPerformance}
\newpage
\subsection{Multiple Tasks}\label{sec:MultipleTasks}
\newpage
\subsection{Converging SCF/DFT}\label{sec:ConvergingSCF}
\newpage
\subsection{Effective Core Potential (ECP) Basis Sets}\label{sec:ECPBasisSets}
\newpage
\subsection{Custom Density Functionals}\label{sec:CustomDensityFunctionals}
\newpage
\subsection{Undocumented Options}\label{sec:UndocumentedOptions}
\newpage
\subsection{Set Directives}\label{sec:SetDirectives}
\newpage
\section{TCE Input Files}\label{sec:TCEInputFiles}
\newpage
\end{document}