Other Required Files.
OK, you have a web host, so it is time to upload your pages. Naturally, you have to upload all the HTML pages and all the images associated with them. Most hosts provide a file manager, but FTP is much faster and easier. I use the Mozilla Firefox web browser with the "FireFTP" add-on. For sheer convenience, nothing beats it.
There are a number of extra pages which need to be added. When a browser requests a directory that has no default HTML file in it, the server will usually show a listing of everything in that directory. This isn't what you want! The most important directory to keep people out of is 'cgi-bin'. You probably don't want your PHP directories made public either. You must put an "index.html" file in each of these directories. This is NOT the same 'index.html' that is your front page! These protective "index.html" files can be made like a regular "Not Found" page, or they could be polite but firm as in "Sorry, you are not permitted in this area.", or they could be amusing such as "Hey, you just found the edge of the world! Don't go any farther or you'll fall off!". Simply put, it doesn't matter what they contain, they just have to be present.
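If you have several directories to protect, the placeholder files can be created in one go on your own computer before uploading. The following is only a minimal sketch in Python: the function name, the placeholder wording and the example directory names are my own assumptions, not something your host requires.

```python
from pathlib import Path

# Hypothetical placeholder text; as noted above, the content does not matter.
PLACEHOLDER = """<!DOCTYPE html>
<html>
<head><title>Not Found</title></head>
<body><p>Sorry, you are not permitted in this area.</p></body>
</html>
"""


def add_placeholder_indexes(site_root, directories):
    """Write a placeholder index.html into each listed directory that lacks one.

    Directories that do not exist are skipped, and an existing index.html
    is never overwritten. Returns the list of files that were created.
    """
    created = []
    for name in directories:
        target = Path(site_root) / name / "index.html"
        if target.parent.is_dir() and not target.exists():
            target.write_text(PLACEHOLDER, encoding="utf-8")
            created.append(target)
    return created
```

For example, add_placeholder_indexes("public_html", ["cgi-bin", "php"]) would fill in a local copy of the site, assuming those two directory names, ready to be uploaded by FTP.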
But that isn't all! Soon you will start to notice a number of "Not Found" items appearing in your logs. There are three non-HTML files that should be on your server, that are often not even mentioned in the hosting company's help, or by the control panel software guides.
The first is "favicon.ico". I always thought that this was optional but after looking at the raw logs of this site, I noticed that Internet Explorer version 7 requested this three or four times in a row, then finding it not present, failed to load any page. Perhaps this is just a bug in IE7 but to be on the safe side, I quickly made up a favicon. It looks nice anyway!
A favicon is NOT just a regular bitmap renamed with the ico extension; it has to be saved in the real Windows icon format. A favicon is normally a 16 x 16 pixel image that can include transparency if desired. I made mine with a regular graphics program, saved it as a PNG, and then used the (Windows) program "png2ico.exe" to convert it into "favicon.ico". This new "favicon.ico" file is then uploaded into the root directory of your server (normally public_html). Don't put it in your images folder! Now there are no problems with visitors using IE7.
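If you are not sure whether a file is a real icon or just a renamed image, the first few bytes will tell you. An icon file begins with a small header (two zero bytes, the type number 1, then the number of images), followed by a 16-byte directory entry per image that starts with its width and height. Here is a rough sketch in Python that reads that header; the function name is my own invention and it checks the header only, not the image data.

```python
import struct


def describe_ico(data):
    """Return a list of (width, height) pairs, one per image in the ICO data.

    Raises ValueError when the bytes do not start with an ICO header,
    which is what happens with a bitmap or PNG that was merely renamed.
    """
    if len(data) < 6:
        raise ValueError("too short to be an ICO file")
    # Header: reserved (always 0), type (1 = icon), number of images.
    reserved, kind, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or kind != 1 or count == 0:
        raise ValueError("not an ICO file (wrong header)")
    if len(data) < 6 + 16 * count:
        raise ValueError("ICO directory is truncated")
    sizes = []
    for i in range(count):
        # Each 16-byte directory entry begins with width and height bytes.
        width, height = struct.unpack_from("<BB", data, 6 + 16 * i)
        # A stored value of 0 means 256 pixels.
        sizes.append((width or 256, height or 256))
    return sizes
```

Calling describe_ico(open("favicon.ico", "rb").read()) on a favicon made as described above should list a single 16 x 16 image.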
After a few days or a week at most, you will start being visited by bots. These are the search engine crawlers, spiders, robots - they have so many names! Googlebot is normally the first. The first thing a bot looks for is "robots.txt". If it is not present, it doesn't matter because the bot will scan your site anyway. However, you will get a "Not Found" recorded in the error logs. This can be quickly remedied by just putting an empty file named "robots.txt" in your root directory. More likely, you will want to keep these bots out of your cgi-bin and other sensitive directories, otherwise you could find your humorous Not Found pages listed in Google!
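We then have to add a few lines to "robots.txt". A "User-agent" line says which bots the rules apply to (the * means all of them), and each directory to be kept private gets its own "Disallow" line beneath it, like this: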
User-agent: *
Disallow: /cgi-bin
The "robots.txt" file could be as simple as this or much more complex. You can find information and sample files in many places on the web. Google it!
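Before uploading, you can check that your "robots.txt" really says what you meant. Python comes with a parser for these files (urllib.robotparser), so a small sketch like the one below can build the file from a list of directories and then ask whether a given path may be fetched. The two function names and the "/php" directory are only examples of mine.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed):
    """Return robots.txt text asking all bots to stay out of the given paths."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed]
    return "\n".join(lines) + "\n"


def is_allowed(robots_text, path, agent="*"):
    """Ask the standard-library parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)
```

With text = build_robots_txt(["/cgi-bin", "/php"]), is_allowed(text, "/cgi-bin/form.cgi") comes back False while is_allowed(text, "/index.html") comes back True. Bear in mind that this only shows how a well-behaved bot would read the file; nothing forces a bot to obey it.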
Now that your site has been found by the search engines, we can give them a little further help. A sitemap will allow the bots to cover your entire site easily and will inform them when pages are added or updated. The easiest way to create a sitemap is to do it online by visiting http://www.xml-sitemaps.com
This service will scan your site, then allow you to download various versions of the generated sitemap. You need the uncompressed "sitemap.xml" to be placed in your root directory. You can also download a "sitemap.html" which can be useful for human readers too - particularly blind users. Remember to make a new sitemap each time you update your site.
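If you keep a copy of the site on your own computer, a basic "sitemap.xml" can also be generated there. The sketch below is only an illustration under some assumptions: it lists every ".html" file it finds, uses each file's modification date as the "lastmod" value, and needs to be told your site's address (www.example.com here is a stand-in). The function name and the skip_dirs parameter are my own.

```python
from datetime import datetime, timezone
from pathlib import Path
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol at sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(site_root, base_url, skip_dirs=()):
    """Return sitemap XML text listing every .html file under site_root.

    Top-level directories named in skip_dirs (for example "cgi-bin")
    are left out, so the protective index.html files are not listed.
    """
    root = Path(site_root)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in sorted(root.rglob("*.html")):
        rel = page.relative_to(root)
        if rel.parts[0] in skip_dirs:
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + rel.as_posix()
        modified = datetime.fromtimestamp(page.stat().st_mtime, tz=timezone.utc)
        ET.SubElement(url, "lastmod").text = modified.strftime("%Y-%m-%d")
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The returned text would then be saved as "sitemap.xml" (in UTF-8) and uploaded to the root directory, for instance after calling build_sitemap("public_html", "http://www.example.com", skip_dirs=("cgi-bin",)).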
Now there shouldn't be any "Not Found" errors in your logs. You can take one more important step by checking for broken links. Again, this is easily achieved using online checkers.
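Links between your own pages can be checked on your computer before you upload anything. The following Python sketch is a simplified stand-in for an online checker, not a replacement: it only looks at "href" and "src" values that point inside the site and reports those whose target file is missing. Links to other sites are skipped, and the function and class names are made up for this example.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href and src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_local_links(site_root):
    """Return (page, link) pairs where a local link points at a missing file."""
    root = Path(site_root)
    broken = []
    for page in sorted(root.rglob("*.html")):
        collector = _LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            parts = urlparse(link)
            if parts.scheme or parts.netloc or not parts.path:
                continue  # external link, mailto:, or a bare #fragment
            path = unquote(parts.path)
            if path.startswith("/"):
                target = root / path.lstrip("/")  # relative to the site root
            else:
                target = page.parent / path  # relative to the current page
            if not target.exists():
                broken.append((page.relative_to(root).as_posix(), link))
    return broken
```

Running find_broken_local_links("public_html") on a local copy gives a list of page and link pairs to fix; an empty list means every local link has a file behind it.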
Having added files, it is now time to remove some. Most hosts give you a few scripts pre-installed in your cgi-bin directory. If you don't intend to use them, remove them as they could be a security risk. Even if you do want to keep them, it is a good idea to rename them. Use the weirdest names that you can think of to prevent their possible use by hackers. Unfortunately, this renaming does not work for PHP packages made up of several scripts, unless you edit each script to correct the names it uses internally. As common PHP scripts aren't very secure, you have to hide them as much as possible, so rename the outer containing directory instead.
As a nice gesture that you will come to appreciate in time, you can keep an archive. That is, each time you change your pages, keep the old pages in a separate folder. Later, you will probably enjoy looking back to see how your site evolved.
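Keeping the archive is easier if it is automatic. As a small sketch, assuming you edit a local copy of the site, this Python function copies the whole folder into a dated subfolder of an archive directory each time you call it. The function name and folder names are only examples.

```python
from datetime import datetime
from pathlib import Path
import shutil


def archive_site(site_root, archive_root, stamp=None):
    """Copy the whole site into archive_root/<stamp> and return the new folder.

    The stamp defaults to the current date and time. Keep archive_root
    outside site_root, otherwise the archive would be copied into itself.
    An existing folder with the same stamp is never overwritten.
    """
    stamp = stamp or datetime.now().strftime("%Y-%m-%d_%H%M%S")
    destination = Path(archive_root) / stamp
    shutil.copytree(site_root, destination)
    return destination
```

Calling archive_site("public_html", "site_archive") just before each round of changes leaves you with one dated snapshot per update.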