YOLO is a well-known and popular object detection architecture. The YOLOv5 implementation is also very easy to use, because the detect.py script handles things like downloading the model and reading from various input sources. The TensorFlow Object Detection API could learn a thing or two from this… You can get everything from GitHub, either via git or by downloading a ZIP.
For video output to work on Windows, you might need to install the opencv-contrib-python package via conda or pip. All other requirements are listed in requirements.txt – if you use Anaconda, try installing them with conda first before resorting to pip.
What’s really fun with YOLO is that you can directly use your webcam for real-time object detection. For example, this command runs the high-resolution YOLOv5l6 model on the first webcam (source 0) without saving the results:
python detect.py --weights yolov5l6.pt --source 0 --nosave
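If you would rather start the demo from a Python script than from the shell, a small wrapper around the same command could look like the sketch below. The function names build_detect_command and run_detection are made up for this example and are not part of YOLOv5; the flags are the ones from the command above, and the script is assumed to be started from the folder that contains detect.py.

```python
import subprocess
import sys


def build_detect_command(weights="yolov5l6.pt", source="0", nosave=True):
    """Return the argument list for the detect.py call shown above.

    Hypothetical helper for illustration; only flags from the original
    command (--weights, --source, --nosave) are used.
    """
    cmd = [sys.executable, "detect.py", "--weights", weights, "--source", str(source)]
    if nosave:
        cmd.append("--nosave")
    return cmd


def run_detection(**options):
    """Start detect.py and wait until it exits.

    Assumes the current working directory is the YOLOv5 folder.
    """
    return subprocess.run(build_detect_command(**options), check=True)


# Example: run_detection(source=0) would start the webcam demo.
```

Using sys.executable instead of a plain "python" keeps the call inside the currently active conda or virtualenv environment.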
I ran into the issue that the webcam image size was stuck at 640×640. Changing the default size in detect.py via
parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[1280,720], help='inference size h,w')
was not enough. You also need to edit datasets.py at the point where the video capture is started:
cap = cv2.VideoCapture(s, cv2.CAP_DSHOW)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
Note that I’ve set 1920×1080 here, but my built-in webcam is limited to 1280×720 – hence the smaller default size in detect.py. You can up that to 1920×1080 as well if you have a Full HD webcam.
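To find out what your webcam actually delivers before editing any files, you can request a resolution and read back what the driver reports. The following is a minimal sketch under a few assumptions: request_resolution and check_webcam are hypothetical helper names, the DirectShow backend is only used on Windows, and how a driver reacts to an unsupported size differs between cameras (many fall back to a supported mode, but that is not guaranteed).

```python
import sys

# Numeric ids of the two OpenCV properties used in the edit above
CAP_PROP_FRAME_WIDTH = 3   # same value as cv2.CAP_PROP_FRAME_WIDTH
CAP_PROP_FRAME_HEIGHT = 4  # same value as cv2.CAP_PROP_FRAME_HEIGHT


def request_resolution(cap, width, height):
    """Ask the capture for width x height and return the size it reports back."""
    cap.set(CAP_PROP_FRAME_WIDTH, width)
    cap.set(CAP_PROP_FRAME_HEIGHT, height)
    # cap.get() returns floats, so convert to whole pixels
    return (int(cap.get(CAP_PROP_FRAME_WIDTH)), int(cap.get(CAP_PROP_FRAME_HEIGHT)))


def check_webcam(index=0, width=1920, height=1080):
    """Open a webcam, request a resolution and print what the driver reports."""
    import cv2  # imported here so request_resolution works without OpenCV

    # DirectShow only exists on Windows; let OpenCV pick a backend elsewhere
    backend = cv2.CAP_DSHOW if sys.platform == "win32" else cv2.CAP_ANY
    cap = cv2.VideoCapture(index, backend)
    try:
        if not cap.isOpened():
            raise RuntimeError(f"Could not open webcam {index}")
        actual = request_resolution(cap, width, height)
        print(f"requested {width}x{height}, got {actual[0]}x{actual[1]}")
        return actual
    finally:
        cap.release()


# Example: check_webcam(0, 1920, 1080)
```

If the reported size is smaller than the requested one, that reported size is a reasonable starting point for the values in detect.py.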