

I have a massive flat text file with ~4M records that I would like to read into MySQL using Ruby/Rails, Python, or any other method. Each record looks like:

    business_name,address,city,state,zip,employee_name

If there are multiple employees at a business, there is one row for each employee, repeating the business name, address, and so on.

I'd like to read this into two tables, with the employees table holding a foreign key, like so (a concrete sketch of both tables follows below):

    businesses (id int, name varchar, address varchar, city varchar, etc.)
    employees (id int, name varchar, business_id int)

How do I go about efficiently reading this file in so that the businesses and employees are created and referenced as needed? Note: it is NOT safe to assume that the file is sorted.

I've tried reading the file into SQL first and then using SQL to get a unique list of business_name/address/city/state combinations that I could then insert into the businesses table, but my server runs out of memory, and that is with only a quarter of the records loaded.
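For concreteness, a minimal sketch of those two tables; the column names, sizes, and types are assumptions based on the description above, not a fixed schema:

    CREATE TABLE businesses (
        id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name    VARCHAR(255),
        address VARCHAR(255),
        city    VARCHAR(100),
        state   VARCHAR(50),
        zip     VARCHAR(20)
    );

    CREATE TABLE employees (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name        VARCHAR(255),
        business_id INT UNSIGNED    -- points at businesses.id
    );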

Try using LOAD DATA INFILE into an ISAM table with few or no indexes. ISAM should reduce memory consumption and get all the data in there the fastest possible way. I have never run out of memory when doing this with larger data sets than yours, on pretty old servers with a lot less memory than is used today.
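A sketch of what that first load could look like. It assumes the file lives at /path/to/records.csv and uses a MyISAM staging table (MyISAM being the usual stand-in in current MySQL for the old ISAM engine); the table name, column sizes, and latin1 charset are placeholder choices:

    CREATE TABLE staging (
        business_name VARCHAR(255),
        address       VARCHAR(255),
        city          VARCHAR(100),
        state         VARCHAR(50),
        zip           VARCHAR(20),
        employee_name VARCHAR(255)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
    -- no indexes yet, so the load stays fast; latin1 is assumed so that the
    -- composite index added later fits within MyISAM's key-length limit

    LOAD DATA INFILE '/path/to/records.csv'
    INTO TABLE staging
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (business_name, address, city, state, zip, employee_name);

If any field can itself contain commas, a FIELDS ENCLOSED BY '"' clause would be needed as well.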

Once the data is imported, use ALTER TABLE to add the required indexes, in essence everything needed to uniquely identify a business (name, address, state, etc.). This may take some time.

Create another ISAM table having an auto increment primary key as well as all the columns from the first table that are required to uniquely identify a business (e.g. the ones that were indexed, but nothing related to employees). Add a UNIQUE key which includes all columns in the table except the primary key.

Use REPLACE INTO to fill the second table with data from the first, and in the process you will generate unique numeric keys for all the businesses.
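Continuing the sketch under the same assumptions, with staging as the raw import table and the businesses table sketched with the question standing in for the "second table" (the answer creates it as a fresh ISAM table; here the UNIQUE key is simply added to the existing sketch):

    -- 1. Index everything needed to uniquely identify a business; on ~4M rows
    --    this is the step that may take some time
    ALTER TABLE staging
        ADD INDEX idx_business (business_name, address, city, state, zip);

    -- 2. businesses already has an auto increment primary key; add a UNIQUE key
    --    covering every column except that key
    ALTER TABLE businesses
        ADD UNIQUE KEY uniq_business (name, address, city, state, zip);

    -- 3. REPLACE INTO collapses duplicates through the UNIQUE key, leaving one
    --    row (and one numeric id) per distinct business
    REPLACE INTO businesses (name, address, city, state, zip)
    SELECT business_name, address, city, state, zip
    FROM staging;

The answer stops there; the remaining step, not spelled out above, would be joining staging back to businesses on those same identifying columns to fill in each employee's business_id.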
